AWS SysOps Exam

Billing and Cost Management

User-Defined Cost Allocation Tags

User-defined tags are tags that you define, create, and apply to resources. After you have created and applied them, you can activate them in the Billing and Cost Management console for cost allocation tracking.

The detailed steps are:

  1. Log in to the AWS Management Console of the new account
  2. Use the Tag Editor to create the new user-defined tags
  3. Use the Cost Allocation Tag manager in the payer account to mark the tags as cost allocation tags

Certificate Manager (ACM)

When you request a public certificate, AWS Certificate Manager (ACM) generates a public/private key pair.

Application Load Balancer (ALB)

If an application runs on EC2 instances in an Auto Scaling Group behind an Application Load Balancer (ALB), and the company wants the application to scale based on the number of incoming requests, SysOps needs to

  • Use a target tracking scaling policy based on ALB’s RequestCountPerTarget metric
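As a minimal sketch of what such a policy could look like, the Python below builds the keyword arguments for the Auto Scaling PutScalingPolicy API. The group name, policy name, resource label, and target value are hypothetical placeholders, not values from a real environment.

```python
def request_count_policy(asg_name, resource_label, target_per_instance):
    """Build keyword arguments for the Auto Scaling PutScalingPolicy API."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-request-count",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Format: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>
                "ResourceLabel": resource_label,
            },
            # Desired average number of requests per target
            "TargetValue": float(target_per_instance),
        },
    }


policy = request_count_policy(
    "web-asg",  # hypothetical Auto Scaling Group name
    "app/web-alb/0123456789abcdef/targetgroup/web-tg/fedcba9876543210",
    1000,
)
print(policy["TargetTrackingConfiguration"]["TargetValue"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("autoscaling").put_scaling_policy(**policy)`.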


A company has deployed its infrastructure using AWS CloudFormation. Recently, the company made manual changes to the infrastructure. A SysOps Administrator is tasked with determining what was changed and updating the CloudFormation template. He/she needs to

  • Use drift detection on the CloudFormation stack. Use the output to update the CloudFormation template and redeploy the stack


A stack set lets you create stacks in AWS accounts across regions by using a single AWS CloudFormation template. All the resources included in each stack are defined by the stack set’s AWS CloudFormation template.


If a third-party company uses federation to authenticate users and grant AWS permissions, SysOps can use CloudTrail to find the federated identity's user name.

Management Events Logging

Management events logging provides visibility into management operations that are performed on resources in your AWS account.

Data Events Logging

Data events logging provides visibility into the resource operations performed on or within a resource. These are also known as data plane operations. Data events are often high-volume activities.


  • S3 object-level API activity, e.g. GetObject, DeleteObject, PutObject API operations
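A sketch of how S3 data event logging might be described for the CloudTrail PutEventSelectors API is given below. The bucket name is a hypothetical placeholder, and the function only builds the selector dictionary without calling AWS.

```python
def s3_data_event_selector(bucket_names):
    """Build one CloudTrail event selector that logs S3 object-level events."""
    return {
        "ReadWriteType": "All",           # log both read and write operations
        "IncludeManagementEvents": True,  # keep management events as well
        "DataResources": [
            {
                "Type": "AWS::S3::Object",
                # A trailing slash selects all objects in the bucket
                "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
            }
        ],
    }


selector = s3_data_event_selector(["example-logs-bucket"])  # hypothetical bucket
print(selector["DataResources"][0]["Values"])
```

With boto3, the selector could be applied as `boto3.client("cloudtrail").put_event_selectors(TrailName=..., EventSelectors=[selector])`, where the trail name is whatever trail already exists in the account.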


To collect memory metrics every minute, SysOps needs to

  1. Enable detailed monitoring on the instance in Amazon CloudWatch
  2. Publish the memory metrics using Amazon CloudWatch Agent
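For the second step, a minimal CloudWatch agent configuration could look like the sketch below. It assumes a Linux instance, where the agent exposes the `mem_used_percent` measurement. The 60-second interval matches the one-minute requirement above.

```python
import json


def memory_agent_config(interval_seconds=60):
    """Build a minimal CloudWatch agent config that publishes memory usage."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    # Assumption: Linux agent, which reports memory as a percentage
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


config = memory_agent_config()
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of content that would be saved as the agent's configuration file on the instance.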


If an EC2 instance has stopped responding and the system status checks are failing, SysOps needs to

  • Stop and then start the EC2 instance so that it can be launched on a new host

Reboot vs Stop/Start

  • Rebooting an EC2 instance keeps it on the same physical host machine
  • Stopping and starting an EC2 instance moves it to a new physical host machine

Auto Scaling Group (ASG)

Auto Scaling Group is configured to determine the health status of EC2 instances using EC2 status checks.

If we want to analyse the unhealthy instances before termination, we can use EC2 Auto Scaling Group Lifecycle Hook to pause instance termination after the instance has been removed from service.
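A lifecycle hook of this kind could be described as in the sketch below, which builds the keyword arguments for the Auto Scaling PutLifecycleHook API. The hook name, group name, and timeout are hypothetical values chosen for illustration.

```python
def termination_hook(asg_name, pause_seconds=3600):
    """Build keyword arguments for the Auto Scaling PutLifecycleHook API."""
    return {
        "LifecycleHookName": f"{asg_name}-inspect-before-terminate",  # hypothetical
        "AutoScalingGroupName": asg_name,
        # Pause the instance when it is about to be terminated
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        # How long the instance waits before the default result applies
        "HeartbeatTimeout": pause_seconds,
        # Proceed with termination once the timeout expires
        "DefaultResult": "CONTINUE",
    }


hook = termination_hook("web-asg", pause_seconds=1800)  # hypothetical group name
print(hook["LifecycleTransition"])
```

With boto3, this could be applied as `boto3.client("autoscaling").put_lifecycle_hook(**hook)`.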

Operating System

Patching EC2 instances is the customer's responsibility.


To solve alerts of high CPU utilization from a Memcached-based ElastiCache cluster, SysOps can

  • add additional nodes to the ElastiCache cluster
  • create an Auto Scaling Group for the ElastiCache cluster

If the eviction count metric is high whilst other components are normal, SysOps needs to

  • Scale the ElastiCache cluster by adding additional nodes

Elastic Load Balancer (ELB)

An ELB is a software-based load balancer which can be set up and configured in front of a collection of AWS EC2 instances. The ELB serves as a single entry point for consumers of the EC2 instances and distributes incoming traffic across all machines available to receive requests.

If the security team wants to track application requests by the originating IP and the EC2 instance that processes the request, a SysOps Admin can use Elastic Load Balancing access logs to provide that information.


IAM Role

To securely access credentials that are stored in AWS Systems Manager Parameter Store, SysOps can create an IAM Role for the EC2 instances and grant the role permission to read the Systems Manager parameters.

To access the AWS Management Console with Security Assertion Markup Language (SAML), SysOps can map the role attribute to an AWS role. The AWS role is assigned IAM policies that govern access to AWS resources.


If you are running a serverless application in AWS Lambda and there is an expected traffic increase, SysOps needs to ensure the concurrency limit for the Lambda function is higher than the expected number of simultaneous function executions.
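A rough way to estimate the expected simultaneous executions is to multiply the request rate by the average duration of one invocation. The sketch below applies that estimate. The traffic figures and limits are made-up numbers for illustration.

```python
import math


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Estimate simultaneous executions as request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def limit_is_sufficient(concurrency_limit, requests_per_second, avg_duration_seconds):
    """Return True when the limit is higher than the estimated concurrency."""
    needed = required_concurrency(requests_per_second, avg_duration_seconds)
    return concurrency_limit > needed


# Hypothetical traffic: 400 requests per second, 0.5 seconds per invocation
needed = required_concurrency(400, 0.5)
print(needed)
print(limit_is_sufficient(1000, 400, 0.5))
print(limit_is_sufficient(150, 400, 0.5))
```

The estimate is an average, so short bursts may need more headroom than it suggests.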


If your NAT instance shows high latency as the network grows, SysOps needs to replace the NAT instance with a NAT gateway.

Comparison between the old NAT Instance and NAT Gateway


  • NAT Instance is a generic Amazon Linux AMI that is configured to perform NAT
  • NAT Gateway is software that is optimised for handling NAT traffic


  • NAT Instance depends on the bandwidth of the EC2 instance type
  • NAT Gateway can scale up to 45Gbps


If the security team finds that some employees have been using individual AWS accounts that are not under the company's control, SysOps needs to

  • Send each existing account an invitation from the central organisation

Service Control Policies (SCPs)

AWS Organizations helps you centrally govern your environment and use Service Control Policies (SCPs) to set permission guardrails with fine-grained controls, written like AWS IAM policies.

SysOps can set up notifications for whenever combined billing exceeds a certain threshold for all AWS accounts within a company. To achieve that, SysOps needs to

  1. Set up AWS Organizations and enable Consolidated Billing
  2. In the Payer Account
    1. Enable Billing Alerts in the Billing and Cost Management console
    2. Set up a billing alarm in Amazon CloudWatch
    3. Publish an SNS message when the alarm triggers
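The alarm from the steps above could be described roughly as in this sketch, which builds the keyword arguments for the CloudWatch PutMetricAlarm API. The threshold, alarm name, and SNS topic ARN are hypothetical. Billing metrics are published in the us-east-1 region, so the alarm would be created there.

```python
def billing_alarm(threshold_usd, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"combined-billing-over-{threshold_usd}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic that receives a message when the alarm triggers
        "AlarmActions": [sns_topic_arn],
    }


alarm = billing_alarm(500, "arn:aws:sns:us-east-1:111122223333:billing-alerts")
print(alarm["AlarmName"])
```

With boto3, the dictionary could be passed as `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)` from the payer account.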

If the security team discovers that some employees are using AWS services in ways that violate company policies, and a SysOps Administrator needs to prevent all users of an account, including the root user, from performing certain restricted actions, the SysOps Administrator needs to

  • Apply Service Control Policies (SCPs) to allow approved actions only
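An allow-list SCP could be sketched as below. The approved actions are hypothetical examples. Note that an allow list only takes effect once the default FullAWSAccess policy is no longer attached to the target, which is worth verifying before relying on it.

```python
import json


def allow_list_scp(approved_actions):
    """Build an SCP document that allows only the approved actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(approved_actions),
                # Allow statements in an SCP apply to all resources
                "Resource": "*",
            }
        ],
    }


# Hypothetical set of approved service actions
scp = allow_list_scp({"s3:*", "ec2:*", "cloudwatch:*"})
print(json.dumps(scp, indent=2))
```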

Relational Database Service (RDS)

To ensure minimal downtime of a web application in the event the database suffers a failure, SysOps can modify the DB instance to be a Multi-AZ deployment, making the change outside of business hours.

To have a daily backup of the RDS database in a separate security account, SysOps needs to

  1. Create an RDS snapshot with AWS CLI create-db-snapshot command
  2. Share it with the security account
  3. Create a copy of the shared snapshot in the security account


Amazon RDS Multi-AZ deployments do not fail over automatically in response to database operations, such as

  • long running queries
  • deadlocks
  • database corruption errors

RDS automatically fails over to the standby database in cases such as

  • A storage failure on primary database
  • The database instance type was changed


Aurora is fault-tolerant by design and adding a read replica can increase availability.

Route 53

If your web application has a new version that needs to be rolled out, SysOps can use an Amazon Route 53 weighted routing policy to gradually move traffic from the old version to the new one.
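A gradual shift with weighted records could be sketched as follows. The function builds a change batch for the Route 53 ChangeResourceRecordSets API. The record name, targets, set identifiers, and TTL are hypothetical.

```python
def weighted_change_batch(record_name, old_target, new_target, new_weight):
    """Build a change batch that splits traffic between two versions.

    new_weight is the share (0 to 100) sent to the new version.
    """
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must be between 0 and 100")

    def record(identifier, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": identifier,  # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,  # short TTL so weight changes take effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Changes": [
            record("old-version", old_target, 100 - new_weight),
            record("new-version", new_target, new_weight),
        ]
    }


# Hypothetical rollout step: send 10% of traffic to the new version
batch = weighted_change_batch(
    "app.example.com", "old.example.com", "new.example.com", 10
)
print([c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]])
```

Repeating the call with a larger `new_weight` moves more traffic to the new version. With boto3, the batch could be sent with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.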

ALIAS Record

An ALIAS record is a virtual host record type, which is used to point one domain name to another one, almost the same as a CNAME. The important difference is that ALIAS can coexist with other records on that name.

CNAME Record

A CNAME record points your domain or subdomain to a destination hostname, which is then resolved to its IP address. If the IP address of the destination hostname changes, you do not need to change your DNS records, because the CNAME follows the hostname.


  • You can have multiple ALIAS records, but only one CNAME record
  • CNAME and ALIAS records must point to a name

Service Catalog

AWS Service Catalog allows IT administrators to create, manage, and distribute catalogs of approved products to end users, who can then access the products they need in a personalized portal.

Storage Gateway

AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.

Storage Gateway enables you to reduce your on-premises storage footprint and associated costs by leveraging Amazon S3 cloud storage.

Systems Manager

If a SysOps Administrator tries to use AWS Systems Manager Session Manager to initiate an SSH session with a Linux EC2 instance and cannot find the target instance in the Session Manager console, the SysOps Administrator needs to

  1. Add Systems Manager permissions to the instance profile
  2. Install the Systems Manager Agent (SSM Agent) on the target instance


Static Website Hosting

When using the static website hosting feature of S3, if you receive a 403 Forbidden (Access Denied) error, SysOps needs to add a bucket policy that grants everyone read access to the bucket objects.
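A public-read bucket policy of this kind could be sketched as below. The bucket name is a hypothetical placeholder. The bucket's Block Public Access settings would also need to permit public policies for this to take effect.

```python
import json


def public_read_policy(bucket_name):
    """Build a bucket policy that lets anyone read objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The /* suffix targets the objects, not the bucket itself
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


site_policy = public_read_policy("example-static-site")  # hypothetical bucket
print(json.dumps(site_policy, indent=2))
```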


If the requirement is to archive data and retain it for at least 7 years, a SysOps Admin needs to configure

  • AWS S3 Glacier Vault Lock policy

S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.
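A vault lock policy for the 7-year retention case could look like the sketch below, which denies archive deletion until an archive is old enough. The vault ARN is hypothetical, and 7 years is approximated as 7 x 365 days, which ignores leap days.

```python
import json

RETENTION_DAYS = 7 * 365  # assumption: 7 years approximated without leap days


def vault_lock_policy(vault_arn, retention_days=RETENTION_DAYS):
    """Build a vault lock policy that blocks deletion of young archives."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Deny applies while the archive is younger than the retention period
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


lock_policy = vault_lock_policy(
    "arn:aws:glacier:us-east-1:111122223333:vaults/example-archive"  # hypothetical
)
print(json.dumps(lock_policy, indent=2))
```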

Virtual Private Cloud (VPC)

If a developer has connectivity issues with a particular port, SysOps needs to check

  • The Security Group is correctly configured to allow that port
  • The Network ACL is using the default configuration

If the checks above do not reveal the problem, VPC Flow Logs show whether the traffic was accepted or rejected.

VPC Endpoint

A VPC endpoint enables you to privately connect your VPC to supported AWS services and VPC endpoint services powered by AWS PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
Endpoints are virtual devices. They allow communication between instances in your VPC and services without imposing availability risks or bandwidth constraints on your network traffic.

Web Application Firewall (WAF)

AWS Web Application Firewall (WAF) is the most commonly used solution for protection from XSS and web application attacks.

If you observe a large number of rogue HTTP requests on an Application Load Balancer, SysOps can use AWS WAF rate-based blacklisting to block this traffic when it exceeds a defined threshold.

If you observe 404 errors being sent to one IP address every minute, SysOps should use WAF to block this suspected malicious activity.

General Network


ping operates by sending Internet Control Message Protocol (ICMP) echo request packets to the target host and waiting for an ICMP echo reply.

Common Vulnerabilities and Exposures report (CVE)

To get HTTP layer 7 status code, you can use

  • Application Load Balancer (ALB) access logs
  • CloudFront access logs

Network Address Translation

Network Address Translation (NAT) is a method of remapping an IP address space into another by modifying network address information in the IP header of packets while they are in transit across a traffic routing device.

Internet Gateway

An Internet Gateway is a logical connection between an Amazon VPC and the Internet.

  • It is not a physical device
  • Only one Internet Gateway can be associated with a VPC
  • It does not limit the bandwidth of Internet connectivity
    • The only limitation on bandwidth is the size of the Amazon EC2 instance, and it applies to all the traffic

If a VPC does not have an Internet Gateway, then the resources in the VPC cannot be accessed from the Internet.

NAT vs Internet Gateway

A NAT Gateway does similar things to an Internet Gateway, but with two main differences:

  1. NAT Gateway allows resources in a private subnet to access the Internet. Think yum update, external database connections, wget calls, etc.
  2. NAT Gateway only works one way. The Internet at large cannot get through your NAT to your private resources unless you explicitly allow it.

Create vs Apply in Kubernetes

kubectl create is so-called Imperative Management. With this approach you tell the Kubernetes API what you want to create, replace, or delete, not what you want your Kubernetes cluster to look like.

kubectl apply is part of the Declarative Management approach, where changes that you may have applied to a live object (e.g. through scale) are maintained even if you apply other changes to the object.

Both approaches are valid ways to work in production.

Table Lock Issues in PostgreSQL

Recently we needed to move some large datasets from AWS Redshift to AWS Aurora on a daily basis.

Intuitively, I thought this process would be very straightforward, because both Redshift and Aurora are nothing but Postgres instances and we could use all the Postgres tooling (pg_dump, pg_restore, COPY, etc.) to transfer the data. But in reality, nothing is hard until you start to implement and write the actual code to do the work.

There are a few things that were not considered at the beginning of this project.

  1. On the Redshift side, the source data changes quite frequently
  2. On the Aurora side, there are lots of user operations applied to the data, e.g. creating customised views linked to the source data, and granting various permissions to different users
  3. The most important and critical issue is that query concurrency was out of scope at the beginning. Schemas and tables are locked while a user query runs against those tables, and hence the data update process hangs until the locks are released.

The table lock problem seems to be the most critical issue. Luckily, we can use the system view pg_locks to inspect the current lock information. pg_locks provides a global view of all locks in the database cluster, not only those relevant to the current database. Although its relation column can be joined against pg_class.oid to identify locked relations, this will only work correctly for relations in the current database (those for which the database column is either the current database’s OID or zero).

prd=> SELECT relation::regclass, locktype, database, relation, pid, mode, granted, fastpath FROM pg_locks;
relation | locktype | database | relation | pid | mode | granted | fastpath
pg_stat_database | relation | 13934 | 11703 | 54371 | AccessShareLock | t | t
| virtualxid | | | 54371 | ExclusiveLock | t | t
pg_locks | relation | 21465 | 11577 | 68877 | AccessShareLock | t | t
| virtualxid | | | 68877 | ExclusiveLock | t | t
powerusers_xerocard.puxcuc_created | relation | 21465 | 351168910 | 64656 | AccessShareLock | t | t
powerusers_xerocard.puxcuc_usrdeets | relation | 21465 | 322042383 | 64656 | AccessShareLock | t | t
powerusers_xerocard.puxcuc_usrlogin | relation | 21465 | 322042377 | 64656 | AccessShareLock | t | t
powerusers_xerocard.ix_xc_uc_status | relation | 21465 | 314731513 | 64656 | AccessShareLock | t | t
powerusers_xerocard.ix_xc_uc_email | relation | 21465 | 314731512 | 64656 | AccessShareLock | t | t
powerusers_xerocard.ix_xc_uc_userid | relation | 21465 | 314731511 | 64656 | AccessShareLock | t | t
powerusers_xerocard.puxcoc_orgid | relation | 21465 | 314731271 | 64656 | AccessShareLock | t | t
powerusers_xerocard.puxcoc_orgflags | relation | 21465 | 314731270 | 64656 | AccessShareLock | t | t
powerusers_xerocard.ix_xc_userorgrole_role | relation | 21465 | 314731528 | 64656 | AccessShareLock | t | t
powerusers_xerocard.ix_xc_userorgrole | relation | 21465 | 314731487 | 64656 | AccessShareLock | t | t
digitalmarketing_dbo.google_xerogoogleidlink | relation | 21465 | 165584573 | 64656 | AccessShareLock | t | t
digitalmarketing_dbo.core_dimuseraccount | relation | 21465 | 165585662 | 64656 | AccessShareLock | t | t
powerusers_xerocard.organisationmilestones | relation | 21465 | 307437428 | 64656 | AccessShareLock | t | t
powerusers_xerocard.usercard | relation | 21465 | 307432249 | 64656 | AccessShareLock | t | t
powerusers_xerocard.organisationcard | relation | 21465 | 307432243 | 64656 | AccessShareLock | t | t
powerusers_xerocard.userorganisationrole | relation | 21465 | 307437422 | 64656 | AccessShareLock | t | t
| virtualxid | | | 64656 | ExclusiveLock | t | t
digitalmarketing_dbo."IX_core_dimuseraccount" | relation | 21465 | 165594407 | 64656 | AccessShareLock | t | f
powerusers_xerocard.puxcom_orgid | relation | 21465 | 314731283 | 64656 | AccessShareLock | t | f
pg_database_datname_index | relation | 0 | 2671 | 54371 | AccessShareLock | t | f
pg_database | relation | 0 | 1262 | 54371 | AccessShareLock | t | f
digitalmarketing_dbo.pk_xerogoogleidlink | relation | 21465 | 314752517 | 64656 | AccessShareLock | t | f
digitalmarketing_dbo.uq_dimuseraccount_xuser | relation | 21465 | 165594411 | 64656 | AccessShareLock | t | f
digitalmarketing_dbo.pk_dimuseraccount | relation | 21465 | 165594409 | 64656 | AccessShareLock | t | f
pg_database_oid_index | relation | 0 | 2672 | 54371 | AccessShareLock | t | f
digitalmarketing_dbo.ix_google_xerogoogleidlink | relation | 21465 | 314752522 | 64656 | AccessShareLock | t | f

pg_locks provides lots of useful information, but here we just choose the following fields

  • locktype. Type of the lockable object: relation, extend, page, tuple, transactionid, virtualxid, object, userlock or advisory
  • database. OID of the database in which the lock target exists, or zero if the target is a shared object, or null if the target is a transaction ID
  • relation. OID of the relation targeted by the lock, or null if the target is not a relation or part of a relation.
  • pid. Process ID of the server process holding or awaiting this lock, or null if the lock is held by a prepared transaction.
  • mode. Name of the lock mode held or desired by this process.
  • granted. True if lock is held, false if lock is awaited
  • fastpath. True if lock was taken via fast path, false if taken via main lock table
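
SELECT pl.locktype, pl.database, pl.relation, pl.pid, pl.mode, pl.granted, pl.fastpath
FROM pg_locks pl
LEFT JOIN pg_stat_activity psa
ON pl.pid = psa.pid;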

We can even combine the pg_locks and pg_stat_activity views to check the query age with the following statement

SELECT
a.datname AS db_name,
c.relname AS relation_name,
l.relation::regclass AS table_name,
-- l.transactionid,
-- a.query,
age(now(), a.query_start) AS query_age
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
JOIN pg_class c ON c.oid = l.relation
ORDER BY a.query_start;


Get AWS EMR Cluster Info with Powershell

In order to get information from an existing EMR cluster, we can use

PS S:\> Get-EMRCluster -ClusterId $ClusterId

The command returns an object of type Amazon.ElasticMapReduce.Model.Cluster.

The Cluster object provides the following attributes that may be useful

  • MasterPublicDnsName. The DNS name of the master node.
  • NormalizedInstanceHours. An approximation of the cost of the cluster.
  • ReleaseLabel. The release label of Amazon EMR.
  • Status. The current status details about the cluster.

There is another function to extract the EMR Instance Group information, and that is

PS S:\> Get-EMRInstanceGroupList -ClusterId $ClusterId

This function returns Amazon.ElasticMapReduce.Model.InstanceGroup objects, one per instance group. We can then use the results to extract information such as InstanceType and RunningInstanceCount for the different instance groups.

The following are some attributes we may use

  • BidPrice. The maximum Spot price you are willing to pay for EC2 instances.
  • InstanceGroupType. The type of the instance group. Valid values are MASTER, CORE, or TASK.
  • InstanceType. The EC2 instance type for all instances in the instance group.
  • RunningInstanceCount. The number of instances currently running in this instance group.
  • Status. The current status of the instance group.

For example, if we want to know the instance type and running instance counts for CORE Node instance group, we can use

$instanceGroup = Get-EMRInstanceGroupList  -ClusterId $ClusterId
write-host $instanceGroup.InstanceType
# r5.4xlarge r5.4xlarge r5.4xlarge

write-host $instanceGroup.RunningInstanceCount
# 40 30 1

You will see three values in each result, and in this output the order is TASK, CORE, and MASTER. So, to get the instance type and running instance count for the CORE instance group, we should use

write-host $instanceGroup.InstanceType[1]
# r5.4xlarge

write-host $instanceGroup.RunningInstanceCount[1]
# 30