This section focuses on the following exam objectives:
Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization
Task Statement 1.2: Identify and remediate issues using monitoring and availability metrics.
◆◆◆◆◆◆
Key Concepts: CloudWatch Alarms and SNS
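Amazon CloudWatch alarms continuously evaluate metrics and change state (OK, ALARM, or INSUFFICIENT_DATA) when a metric crosses a defined threshold for a specified number of evaluation periods. Amazon Simple Notification Service (SNS) is commonly used to notify administrators when an alarm transitions to the ALARM state. For automated remediation, alarm actions can invoke AWS Lambda functions directly, perform EC2 actions (stop, terminate, reboot, recover), or trigger Auto Scaling policies. To run an AWS Systems Manager (SSM) Automation runbook, route the alarm state change through an Amazon EventBridge rule.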
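The state decision described above can be sketched in a few lines. This is a simplified model, not CloudWatch's implementation. It assumes every period has a datapoint, so CloudWatch's configurable missing-data treatment is ignored, and `evaluate_alarm` is a hypothetical helper. With N evaluation periods and M datapoints to alarm, the alarm enters ALARM when at least M of the last N datapoints breach the threshold.

```python
from typing import Sequence

# Comparison operator names as used in CloudWatch alarm definitions.
COMPARATORS = {
    "GreaterThanThreshold": lambda value, threshold: value > threshold,
    "GreaterThanOrEqualToThreshold": lambda value, threshold: value >= threshold,
    "LessThanThreshold": lambda value, threshold: value < threshold,
    "LessThanOrEqualToThreshold": lambda value, threshold: value <= threshold,
}


def evaluate_alarm(
    datapoints: Sequence[float],
    threshold: float,
    comparison: str,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return ALARM, OK, or INSUFFICIENT_DATA (simplified: no missing-data handling)."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = list(datapoints)[-evaluation_periods:]  # the most recent N periods
    breaching = sum(1 for v in window if COMPARATORS[comparison](v, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# CPU alarm: 3 of the last 5 datapoints above 80 percent.
cpu = [45.0, 82.5, 91.0, 60.0, 88.0]
print(evaluate_alarm(cpu, 80.0, "GreaterThanThreshold", 5, 3))  # ALARM
```

Raising datapoints to alarm above the number of evaluation periods is not allowed in CloudWatch. Setting M lower than N ("M out of N") makes an alarm tolerant of a single noisy datapoint without waiting for a fully consecutive run.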
Common Metrics to Monitor
Typical Issues and Remediation Actions
| Issue | Likely Cause | Recommended Remediation |
|---|---|---|
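| High EC2 CPU utilization | Traffic spikes, inefficient application logic | Scale out with Auto Scaling, optimize application code, or move to a larger or compute-optimized instance type (on burstable T-series instances, also check CPUCreditBalance) |
| RDS performance degradation | Inefficient queries; insufficient CPU, memory, or IOPS | Tune queries, scale up the instance class, add read replicas for read-heavy workloads |
| ALB targets marked unhealthy | Failing health checks, misconfigured targets | Confirm target security groups allow traffic from the ALB, verify the health check path and port, fix application health check responses |
| Auto Scaling not triggering | Incorrect policies, alarm thresholds, or cooldown settings | Adjust scaling thresholds, shorten cooldown periods, and confirm the group is not already at its maximum capacity |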
| IAM role or policy failures | Missing or incorrect permissions | Validate policies using the IAM Policy Simulator |
| Logs not appearing in CloudWatch | Agent misconfiguration or missing permissions | Restart CloudWatch Agent and verify IAM role permissions |
📌 Exam Tips
Understand how to interpret CloudWatch alarms and determine appropriate corrective actions. Know how to configure an alarm action that publishes to an SNS topic and how to subscribe endpoints (such as email addresses or Lambda functions) to that topic. Recognize when Auto Scaling, database tuning, or load balancer reconfiguration is the most effective solution.
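To illustrate the alarm-to-SNS path, here is a sketch of a Lambda function subscribed to the SNS topic an alarm publishes to. It assumes a single-metric alarm, whose notification includes AlarmName, NewStateValue, and a Trigger object with Namespace and MetricName. The `REMEDIATION_HINTS` mapping and the function name are hypothetical.

```python
import json

# Hypothetical first-response hints keyed by (namespace, metric name).
REMEDIATION_HINTS = {
    ("AWS/EC2", "CPUUtilization"): "Scale out or profile the application",
    ("AWS/ApplicationELB", "UnHealthyHostCount"): "Check target security groups and health check settings",
    ("AWS/RDS", "ReadLatency"): "Tune slow queries, scale up, or add read replicas",
}


def handle_alarm_notification(event, context=None):
    """Lambda handler for an SNS topic that receives CloudWatch alarm notifications."""
    results = []
    for record in event["Records"]:
        # SNS delivers the alarm notification as a JSON string in Sns.Message.
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # ignore transitions back to OK or INSUFFICIENT_DATA
        trigger = alarm.get("Trigger", {})
        key = (trigger.get("Namespace"), trigger.get("MetricName"))
        results.append({
            "alarm": alarm.get("AlarmName"),
            "hint": REMEDIATION_HINTS.get(key, "Investigate manually"),
        })
    return results
```

Filtering on NewStateValue matters because an alarm that publishes on both ALARM and OK transitions would otherwise trigger remediation when the problem clears.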
Key Concepts: Amazon EventBridge
Amazon EventBridge (formerly CloudWatch Events) enables event-driven automation by matching incoming events to predefined rules. When an event pattern matches, EventBridge can trigger targets such as AWS Lambda functions, SNS notifications, or Systems Manager automation workflows.
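Pattern matching can be sketched as a simplified model that assumes only exact-value matching. Every field named in the pattern must appear in the event with one of the listed values, nested objects are matched recursively, and fields the pattern omits are ignored. EventBridge's content filters (prefix, numeric, anything-but, exists) are not modeled, and `matches` is a hypothetical helper, not an AWS API.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style matching (exact values and nesting only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Pattern lists allowed values; an array in the event matches
            # if any of its elements is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True  # fields absent from the pattern are ignored


stopped_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(stopped_pattern, sample_event))  # True
```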
Common Event Sources
Creating an EventBridge Rule
```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["stopped"]
  }
}
```
Common Event-Driven Use Cases
| Use Case | Trigger | Action |
|---|---|---|
| Automatically restart stopped EC2 instances | EC2 state change | Start instance via Lambda |
| Notify administrators of AWS service issues | AWS Health event | Send notification through SNS |
| Remediate insecure security group changes | AWS Config violation | Revert configuration using SSM |
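For the first use case in the table, a Lambda target might look like the sketch below. It assumes the EC2 state-change event carries `instance-id` and `state` under `detail`, and that the function's execution role allows `ec2:StartInstances`. The function name is hypothetical. The `ec2` parameter exists only so a stub client can be injected for testing.

```python
def restart_stopped_instance(event, context=None, ec2=None):
    """Start the instance named in an EC2 state-change event if it stopped.

    `ec2` is a boto3 EC2 client; it is a parameter only so tests can inject a stub.
    """
    detail = event["detail"]
    if detail.get("state") != "stopped":
        return {"action": "none", "state": detail.get("state")}
    if ec2 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    instance_id = detail["instance-id"]
    ec2.start_instances(InstanceIds=[instance_id])
    return {"action": "started", "instance": instance_id}
```

In practice, scope the rule or the function (for example, with an instance tag check) so that intentional stops are not immediately undone.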
📌 Exam Tips
Know how EventBridge rules work, how to define event patterns, and which services can be used as targets. Be familiar with using AWS Health events for proactive monitoring and automated remediation.
Key Concepts: AWS Config and Systems Manager Automation
AWS Config continuously evaluates resource configurations against defined compliance rules. AWS Systems Manager Automation executes remediation steps defined in SSM runbooks (Automation documents). AWS Config remediation actions run these runbooks against noncompliant resources, either on demand or automatically, which makes the pair a common way to enforce compliance and correct configuration drift.
Remediation Workflow
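The detect, remediate, and re-evaluate loop that AWS Config and SSM Automation implement can be sketched in miniature. Everything here is a hypothetical stand-in. Resources are plain dictionaries, `is_compliant` plays the role of a Config rule, and `remediate` plays the role of an SSM runbook. The REMEDIATED label is this sketch's own; Config itself reports COMPLIANT or NON_COMPLIANT.

```python
from typing import Callable, Dict, List

Resource = Dict[str, object]


def remediation_cycle(
    resources: List[Resource],
    is_compliant: Callable[[Resource], bool],
    remediate: Callable[[Resource], None],
) -> Dict[str, str]:
    """Evaluate each resource, remediate failures, then re-evaluate them."""
    status: Dict[str, str] = {}
    for resource in resources:
        rid = str(resource["id"])
        if is_compliant(resource):  # the "Config rule" evaluation
            status[rid] = "COMPLIANT"
            continue
        remediate(resource)  # the "SSM runbook" step
        # Re-evaluate after the change, as Config does when it records it.
        status[rid] = "REMEDIATED" if is_compliant(resource) else "NON_COMPLIANT"
    return status


# Hypothetical rule: S3 buckets must have Block Public Access enabled.
buckets = [
    {"id": "logs-bucket", "block_public_access": True},
    {"id": "assets-bucket", "block_public_access": False},
]


def blocks_public_access(bucket: Resource) -> bool:
    return bool(bucket["block_public_access"])


def enable_block_public_access(bucket: Resource) -> None:
    bucket["block_public_access"] = True  # stand-in for a runbook call


print(remediation_cycle(buckets, blocks_public_access, enable_block_public_access))
# {'logs-bucket': 'COMPLIANT', 'assets-bucket': 'REMEDIATED'}
```

The re-evaluation step is the important design point. A remediation that runs but does not actually fix the resource leaves it NON_COMPLIANT, which is the signal to investigate the runbook or its permissions.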
Common AWS Config Rules and Automated Actions
| Config Rule | Violation Detected | Automated Remediation |
|---|---|---|
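| Unencrypted EBS volumes | Volume created without encryption | Use an SSM runbook to replace the volume with an encrypted copy (existing volumes cannot be encrypted in place); enable EBS encryption by default |
| Public S3 buckets | Public access allowed | Enable S3 Block Public Access and remove public grants from the bucket policy |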
| Overly permissive security groups | Open access on restricted ports | Remove offending rules |
| Root account usage | Root user activity detected | Notify administrators via SNS |
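The security group row can be sketched as a remediation function. The sketch makes several assumptions. The restricted ports are 22 (SSH) and 3389 (RDP). Only IPv4 `0.0.0.0/0` ranges on TCP or all-protocol rules are checked. The group dictionary is shaped like an entry from the EC2 `describe_security_groups` response. `revoke_security_group_ingress` is the boto3 EC2 client method; the helper names are hypothetical.

```python
RESTRICTED_PORTS = {22, 3389}  # assumption: SSH and RDP
OPEN_CIDR = "0.0.0.0/0"


def offending_permissions(group: dict) -> list:
    """Return ingress permissions that expose a restricted port to 0.0.0.0/0.

    `group` is shaped like one entry of describe_security_groups()["SecurityGroups"].
    Only IPv4 ranges on TCP or all-protocol rules are checked in this sketch.
    """
    offending = []
    for perm in group.get("IpPermissions", []):
        if not any(r.get("CidrIp") == OPEN_CIDR for r in perm.get("IpRanges", [])):
            continue
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            exposed = True  # all protocols and ports
        elif protocol in ("tcp", "6"):
            low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
            exposed = any(low <= port <= high for port in RESTRICTED_PORTS)
        else:
            exposed = False
        if exposed:
            # Revoke only the open CIDR; narrower ranges on the same rule stay.
            revoke = {k: perm[k] for k in ("IpProtocol", "FromPort", "ToPort") if k in perm}
            revoke["IpRanges"] = [{"CidrIp": OPEN_CIDR}]
            offending.append(revoke)
    return offending


def remediate_security_group(group: dict, ec2) -> list:
    """Revoke offending rules with a boto3 EC2 client (or a test stub)."""
    offending = offending_permissions(group)
    if offending:
        ec2.revoke_security_group_ingress(GroupId=group["GroupId"], IpPermissions=offending)
    return offending
```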
Example AWS CLI Command
```bash
# Start the SSM Automation runbook for one EBS volume (placeholder volume ID).
aws ssm start-automation-execution \
  --document-name "AWSConfigRemediation-EncryptEBSVolume" \
  --parameters VolumeId=vol-12345678
```
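The same execution can be started from Python with the boto3 SSM client, which also makes it easy to wait for the result. This is a sketch that assumes a client is passed in (a stub in tests). The API expects each parameter value as a list of strings. The set of terminal statuses below is a subset of those the API can return, and `run_runbook` is a hypothetical helper.

```python
import time

# A subset of AutomationExecutionStatus values after which execution has ended.
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def run_runbook(ssm, document_name, parameters, poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Start an SSM Automation execution and poll until it reaches a terminal status.

    `ssm` is a boto3 SSM client or a stub. Parameter values are wrapped in
    lists because the API expects a list of strings per parameter.
    """
    execution_id = ssm.start_automation_execution(
        DocumentName=document_name,
        Parameters={name: [value] for name, value in parameters.items()},
    )["AutomationExecutionId"]
    for _ in range(max_polls):
        execution = ssm.get_automation_execution(AutomationExecutionId=execution_id)
        status = execution["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return execution_id, status
        sleep(poll_seconds)
    raise TimeoutError(f"{execution_id} not finished after {max_polls} polls")


# Usage with a real client (runbook name and volume ID as in the CLI example):
# import boto3
# run_runbook(boto3.client("ssm"), "AWSConfigRemediation-EncryptEBSVolume",
#             {"VolumeId": "vol-12345678"})
```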
📌 Exam Tips
Understand how AWS Config detects violations and how SSM Automation runbooks perform remediation. Be able to configure EventBridge rules to trigger automated responses and identify which resource types can be remediated automatically.
Analyze Scenario-Based Questions Carefully
Exam questions often present troubleshooting scenarios requiring you to select the most effective remediation strategy. Pay close attention to the service involved and the metric or event driving the issue.
Prioritize Automation and Cost Efficiency
AWS best practices favor automated remediation over manual intervention. Combining EventBridge with Lambda or SSM Automation is often the preferred approach. Pairing AWS Config with automatic remediation helps maintain continuous compliance while reducing operational overhead.
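Understand IAM Requirements
Automated remediation fails if the service performing it lacks permissions. A Lambda function needs an execution role that allows its remediation calls. SSM Automation runbooks, including those AWS Config runs as remediation actions, typically execute under an AutomationAssumeRole that Systems Manager can assume. EventBridge needs permission to invoke each target: a resource-based policy on a Lambda function, or an IAM role for targets such as SSM Automation.
Service Selection Quick Reference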
| Use Case | Recommended Service |
|---|---|
| Detecting and alerting on service issues | AWS Health, EventBridge, SNS |
| Automating remediation of misconfigurations | AWS Config, SSM Automation |
| Restarting EC2 instances automatically | CloudWatch alarm EC2 actions (reboot, recover), or EventBridge with Lambda |
| Querying logs during troubleshooting | CloudWatch Logs Insights |
| Monitoring performance and availability | CloudWatch Metrics and Alarms |
| Tracking security and compliance violations | AWS Config |