
Remediating Issues Using Monitoring and Availability Metrics

This section focuses on the following exam objectives:

Content Domain 1: Monitoring, Logging, Analysis, Remediation, and Performance Optimization
Task Statement 1.2: Identify and remediate issues using monitoring and availability metrics.

◆◆◆◆◆◆

1. Troubleshooting and Corrective Actions Using Notifications and Alarms

Key Concepts
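Amazon CloudWatch alarms continuously evaluate metrics and change state when a defined threshold is breached for a specified number of evaluation periods. Amazon Simple Notification Service (SNS) is commonly configured as an alarm action to notify administrators when an alarm transitions to the ALARM state. Alarms can also act directly on EC2 instances (stop, terminate, reboot, or recover) and on Auto Scaling groups. For broader automated remediation, AWS Lambda functions and AWS Systems Manager (SSM) Automation runbooks can be triggered through Amazon EventBridge rules that match alarm state-change events.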
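As a minimal sketch of the alarm-plus-SNS setup described above, the function below builds the keyword arguments for CloudWatch's PutMetricAlarm API. You would pass them to `boto3.client("cloudwatch").put_metric_alarm(**request)`. The instance ID, topic ARN, 80% threshold, and alarm-name format are assumptions for illustration only.

```python
def build_cpu_alarm_request(instance_id: str, topic_arn: str,
                            threshold: float = 80.0, periods: int = 3) -> dict:
    """Alarm when average EC2 CPU stays above `threshold` for `periods` 5-minute periods.

    The name format and default threshold are illustrative assumptions.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming convention
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                  # seconds in each evaluation period
        "EvaluationPeriods": periods,   # consecutive breaching periods before ALARM
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],    # SNS topic notified on transition to ALARM
        "OKActions": [topic_arn],       # also notify when the alarm returns to OK
        "TreatMissingData": "missing",
    }
```

To have the alarm reboot the instance instead of (or as well as) notifying, add an EC2 action ARN such as `arn:aws:automate:us-east-1:ec2:reboot` to `AlarmActions`. Adjust the Region to match the instance.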

Common Metrics to Monitor

  • EC2 Instances: CPU utilization, instance status checks, disk space (disk metrics require the CloudWatch agent)
  • Amazon RDS: Freeable memory, database connections, read/write latency
  • Elastic Load Balancers: Unhealthy host count, request latency
  • Auto Scaling Groups: Instance launch and termination activity
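The metrics listed above correspond to specific CloudWatch namespaces and metric names. The sketch below records those pairs and builds `MetricDataQueries` entries for the GetMetricData API, for use with `boto3.client("cloudwatch").get_metric_data(MetricDataQueries=..., StartTime=..., EndTime=...)`.

Assumptions: the `Average` statistic, the 5-minute period, and the caller-supplied dimensions are illustrative choices. Disk space is left out because the CloudWatch agent publishes it with extra dimensions such as the mount path. Launch and termination activity is read from the group's scaling activity history rather than from a metric.

```python
from typing import Dict, List, Tuple

# (namespace, metric name) pairs for the default metrics listed above.
KEY_METRICS: Dict[str, List[Tuple[str, str]]] = {
    "EC2": [("AWS/EC2", "CPUUtilization"), ("AWS/EC2", "StatusCheckFailed")],
    "RDS": [("AWS/RDS", "FreeableMemory"), ("AWS/RDS", "DatabaseConnections"),
            ("AWS/RDS", "ReadLatency"), ("AWS/RDS", "WriteLatency")],
    # Target health metrics are reported with LoadBalancer and TargetGroup dimensions.
    "ALB": [("AWS/ApplicationELB", "UnHealthyHostCount"),
            ("AWS/ApplicationELB", "TargetResponseTime")],
    # Group metrics must be enabled on the Auto Scaling group first.
    "AutoScaling": [("AWS/AutoScaling", "GroupInServiceInstances"),
                    ("AWS/AutoScaling", "GroupDesiredCapacity")],
}


def build_metric_queries(service: str, dimensions: List[Dict[str, str]],
                         period: int = 300, stat: str = "Average") -> List[dict]:
    """Return MetricDataQueries entries for CloudWatch GetMetricData."""
    queries = []
    for i, (namespace, metric_name) in enumerate(KEY_METRICS[service]):
        queries.append({
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return queries
```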

Typical Issues and Remediation Actions

| Issue | Likely Cause | Recommended Remediation |
| --- | --- | --- |
| High EC2 CPU utilization | Traffic spikes, inefficient application logic, or exhausted CPU credits on burstable (T-series) instances | Scale out with Auto Scaling, optimize application code, or move to a larger or compute-optimized instance type |
| RDS performance degradation | High latency, memory pressure | Increase instance size, tune queries, add read replicas |
| ALB targets marked unhealthy | Failing health checks, misconfigured targets | Verify that security groups allow health check traffic, fix application health check responses |
| Auto Scaling not triggering | Incorrect scaling policies or cooldown settings | Adjust scaling thresholds and reduce cooldown periods |
| IAM role or policy failures | Missing or incorrect permissions | Validate policies using the IAM Policy Simulator |
| Logs not appearing in CloudWatch | Agent misconfiguration or missing permissions | Verify the agent configuration and IAM role permissions, then restart the CloudWatch agent |
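The IAM Policy Simulator from the table above can also be called programmatically through IAM's SimulatePrincipalPolicy API, as `boto3.client("iam").simulate_principal_policy(PolicySourceArn=role_arn, ActionNames=[...])`. This sketch only interprets the response and lists the actions that were not allowed. The example actions in the test (the CloudWatch Logs calls an agent needs) are illustrative assumptions.

```python
from typing import List


def denied_actions(simulation_response: dict) -> List[str]:
    """Return the actions the IAM Policy Simulator did not allow.

    `simulation_response` has the shape returned by SimulatePrincipalPolicy.
    Each EvalDecision is "allowed", "explicitDeny", or "implicitDeny".
    """
    return [
        result["EvalActionName"]
        for result in simulation_response.get("EvaluationResults", [])
        if result["EvalDecision"] != "allowed"
    ]
```

An `implicitDeny` means no policy grants the action. An `explicitDeny` points to a Deny statement that must be found and changed.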

📌 Exam Tips
Understand how to interpret CloudWatch alarms and determine appropriate corrective actions. Know how to configure an SNS topic as an alarm action and subscribe endpoints, such as email addresses, to that topic. Recognize when Auto Scaling, database tuning, or load balancer reconfiguration is the most effective solution.


2. Using Amazon EventBridge to Trigger Remediation Actions

Key Concepts
Amazon EventBridge (formerly CloudWatch Events) enables event-driven automation by matching incoming events to predefined rules. When an event pattern matches, EventBridge can trigger targets such as AWS Lambda functions, SNS notifications, or Systems Manager automation workflows.
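To illustrate how an event is matched against a rule's pattern, here is a deliberately simplified local matcher. It is not EventBridge's implementation. It covers only exact-value matching: every pattern field must be present, a list of values means "any of these", and nested objects are matched recursively. Content filters such as prefix or numeric matching, and array-valued event fields, are not modelled.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if `event` satisfies `pattern` (simplified EventBridge semantics).

    Supported: each pattern key must be present in the event; a list of values
    matches when the event's scalar value equals any element (OR); a nested
    dict is matched recursively (AND across keys).
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

Fields present in the event but absent from the pattern (for example `instance-id`) are ignored. This is why short patterns can match large events.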

Common Event Sources

  • AWS Health events (service disruptions or maintenance)
  • EC2 instance state changes
  • Auto Scaling lifecycle events
  • AWS Config compliance violations
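The event sources above each publish under a fixed `source` and `detail-type`. The sketch below collects one pattern per source and serializes it for the `EventPattern` field of the PutRule API.

Assumptions: the dictionary keys are hypothetical labels, and the `NON_COMPLIANT` filter on the Config pattern is an illustrative narrowing. Auto Scaling lifecycle action events are sent only for groups that have lifecycle hooks configured.

```python
import json

# One pattern per event source listed above. Keys are hypothetical labels.
EVENT_PATTERNS = {
    "aws_health": {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
    },
    "ec2_state_change": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    },
    # Emitted only when the Auto Scaling group has lifecycle hooks.
    "asg_lifecycle": {
        "source": ["aws.autoscaling"],
        "detail-type": [
            "EC2 Instance-launch Lifecycle Action",
            "EC2 Instance-terminate Lifecycle Action",
        ],
    },
    # Narrowed to resources that have just become non-compliant.
    "config_noncompliant": {
        "source": ["aws.config"],
        "detail-type": ["Config Rules Compliance Change"],
        "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
    },
}


def pattern_json(name: str) -> str:
    """Serialize a pattern for the EventPattern field of PutRule (a JSON string)."""
    return json.dumps(EVENT_PATTERNS[name])
```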

Creating an EventBridge Rule

  1. Select an Event Source
    Choose an AWS service event or define a custom application event.
  2. Define an Event Pattern
    Example: Detect when an EC2 instance transitions to the stopped state:
    {
      "source": ["aws.ec2"],
      "detail-type": ["EC2 Instance State-change Notification"],
      "detail": { "state": ["stopped"] }
    }
  3. Specify a Target Action
    • Send notifications using SNS
    • Invoke a Lambda function
    • Execute an SSM Automation runbook
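The three steps above can be sketched with the EventBridge and Lambda APIs. The function creates the rule from a pattern, attaches a Lambda function as the target, and grants EventBridge permission to invoke that function. The clients are passed in, which also lets the sketch be exercised with stubs.

Assumptions: the target ID `remediation-fn` and the statement-ID format are hypothetical. In real use, `events` and `lambda_client` would be `boto3.client("events")` and `boto3.client("lambda")`.

```python
import json


def create_rule_with_lambda_target(events, lambda_client, rule_name: str,
                                   pattern: dict, function_arn: str) -> str:
    """Create an EventBridge rule that invokes a Lambda function; return the rule ARN."""
    # Steps 1 and 2: event source and pattern, passed as a JSON string.
    rule_arn = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )["RuleArn"]

    # Step 3: the target to invoke when the pattern matches.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "remediation-fn", "Arn": function_arn}],  # hypothetical target ID
    )

    # Lambda targets also need a resource-based policy that lets
    # EventBridge (events.amazonaws.com) invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID format
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

SNS targets similarly need a topic policy that allows `events.amazonaws.com` to publish. SSM Automation targets use an IAM role that EventBridge assumes instead.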

Common Event-Driven Use Cases

| Use Case | Trigger | Action |
| --- | --- | --- |
| Automatically restart stopped EC2 instances | EC2 state change | Start instance via Lambda |
| Notify administrators of AWS service issues | AWS Health event | Send notification through SNS |
| Remediate insecure security group changes | AWS Config violation | Revert configuration using SSM |
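For the first use case in the table, a minimal Lambda handler might read the instance ID from the EC2 state-change event and start the instance again. The function's execution role would need `ec2:StartInstances`.

As written, this sketch restarts every instance that stops, including intentional stops. A real rule would likely narrow the pattern, for example to specific instance IDs. The optional `ec2` parameter is an assumption added so the handler can be tested with a stub.

```python
def handler(event, context, ec2=None):
    """Start the EC2 instance named in an 'EC2 Instance State-change Notification' event."""
    if ec2 is None:
        import boto3  # imported lazily so the handler can be tested with a stub client
        ec2 = boto3.client("ec2")

    detail = event.get("detail", {})
    # Ignore any state other than "stopped" (the rule pattern should already filter these).
    if detail.get("state") != "stopped":
        return {"started": []}

    instance_id = detail["instance-id"]
    ec2.start_instances(InstanceIds=[instance_id])
    return {"started": [instance_id]}
```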

📌 Exam Tips
Know how EventBridge rules work, how to define event patterns, and which services can be used as targets. Be familiar with using AWS Health events for proactive monitoring and automated remediation.


3. Automated Remediation Using AWS Config and Systems Manager Automation

Key Concepts
AWS Config continuously evaluates resource configurations against defined compliance rules. AWS Systems Manager Automation enables automated execution of remediation steps through SSM runbooks (automation documents). Together, these services detect configuration drift and correct it automatically, keeping resources compliant without manual intervention.

Remediation Workflow

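  1. Define an AWS Config rule to detect non-compliant resources.
  2. Choose an AWS-managed SSM Automation runbook, or create a custom one, that specifies the remediation actions.
  3. Attach the runbook to the rule as an AWS Config remediation action (manual or automatic), or use an Amazon EventBridge rule that matches Config compliance-change events to trigger it.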
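For the native option in step 3, a Config rule is linked to a runbook through a remediation configuration. The sketch below builds one entry for the PutRemediationConfigurations API, used as `boto3.client("config").put_remediation_configurations(RemediationConfigurations=[cfg])`.

Assumptions: the rule name, runbook name, resource parameter name, role ARN, and retry settings are supplied by the caller and are illustrative. The runbook's own documentation defines which parameter receives the resource ID.

```python
def build_remediation_config(rule_name: str, document_name: str, resource_param: str,
                             role_arn: str, automatic: bool = True,
                             max_attempts: int = 5, retry_seconds: int = 60) -> dict:
    """Build one RemediationConfigurations entry linking a Config rule to an SSM runbook."""
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document_name,
        "Parameters": {
            # Config substitutes the non-compliant resource's ID at run time.
            resource_param: {"ResourceValue": {"Value": "RESOURCE_ID"}},
            # Role the runbook assumes to act on the resource.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
        },
        "Automatic": automatic,                  # False = remediate manually from the console
        "MaximumAutomaticAttempts": max_attempts,
        "RetryAttemptSeconds": retry_seconds,
    }
```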

Common AWS Config Rules and Automated Actions

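| Config Rule | Violation Detected | Automated Remediation |
| --- | --- | --- |
| Unencrypted EBS volumes | Volume created without encryption | Replace the volume with an encrypted copy using SSM (existing volumes cannot be encrypted in place); enable EBS encryption by default to prevent recurrence |
| Public S3 buckets | Public access allowed | Update bucket policy |
| Overly permissive security groups | Open access on restricted ports | Remove offending rules |
| Root account usage | Root user activity detected | Notify administrators via SNS |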

Example AWS CLI Command

aws ssm start-automation-execution \
  --document-name "AWSConfigRemediation-EncryptEBSVolume" \
  --parameters VolumeId=vol-12345678
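The same execution can be started from Python. The sketch below starts a runbook and polls its status until it finishes. The runbook name and volume ID in the test are copied from the CLI example above.

Assumptions: the four statuses treated as final, the polling interval, and the poll limit are illustrative. The API defines further statuses (for example approval-related ones), and this loop simply keeps polling through them. `ssm` would be `boto3.client("ssm")`.

```python
import time
from typing import Dict

# Statuses treated as final in this sketch; other statuses keep the loop polling.
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def run_automation_and_wait(ssm, document_name: str, parameters: Dict[str, str],
                            poll_seconds: float = 10.0, max_polls: int = 60,
                            sleep=time.sleep) -> str:
    """Start an SSM Automation runbook and return its final status."""
    execution_id = ssm.start_automation_execution(
        DocumentName=document_name,
        # Automation parameters are passed as lists of strings.
        Parameters={name: [value] for name, value in parameters.items()},
    )["AutomationExecutionId"]

    for _ in range(max_polls):
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id)["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"automation {execution_id} not finished after {max_polls} polls")
```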

📌 Exam Tips
Understand how AWS Config detects violations and how SSM Automation runbooks perform remediation. Be able to configure EventBridge rules to trigger automated responses and identify which resource types can be remediated automatically.


Key Exam Guidance

Analyze Scenario-Based Questions Carefully
Exam questions often present troubleshooting scenarios requiring you to select the most effective remediation strategy. Pay close attention to the service involved and the metric or event driving the issue.

Prioritize Automation and Cost Efficiency
AWS best practices favor automated remediation over manual intervention. Combining EventBridge with Lambda or SSM is often the preferred approach. Using AWS Config with automation ensures continuous compliance and reduces operational overhead.

Understand IAM Requirements

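  • EventBridge must have permission to invoke its targets: resource-based policies for Lambda functions and SNS topics, or an IAM role that EventBridge assumes for targets such as SSM Automation.
  • SSM Automation needs an IAM role (typically passed as the AutomationAssumeRole parameter) with permission to act on the target resources, and managed instances need an instance profile that lets the SSM Agent communicate with Systems Manager.
  • The CloudWatch agent needs an instance role with permission to publish metrics and logs to CloudWatch.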
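Each role mentioned above starts with a trust policy that names the AWS service allowed to assume it. Examples are `ssm.amazonaws.com` for an Automation role, `events.amazonaws.com` for an EventBridge target role, and `ec2.amazonaws.com` for an instance profile role used by the CloudWatch agent or SSM Agent.

This minimal sketch returns that trust policy as JSON, suitable for `boto3.client("iam").create_role(RoleName=..., AssumeRolePolicyDocument=...)`. The permissions themselves are attached separately, and the role name is left to the caller.

```python
import json


def trust_policy(service_principal: str) -> str:
    """Return an IAM trust policy that lets one AWS service assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
        }],
    })
```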

Choosing the Right AWS Service

| Use Case | Recommended Service |
| --- | --- |
| Detecting and alerting on service issues | AWS Health, EventBridge, SNS |
| Automating remediation of misconfigurations | AWS Config, SSM Automation |
| Restarting EC2 instances automatically | CloudWatch Alarms (built-in EC2 reboot or recover actions), Lambda |
| Querying logs during troubleshooting | CloudWatch Logs Insights |
| Monitoring performance and availability | CloudWatch Metrics and Alarms |
| Tracking security and compliance violations | AWS Config |
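For the Logs Insights row above, a query is started asynchronously and its results are polled. The sketch below runs one query and flattens each result row into a dictionary.

Assumptions: the example query (the 20 most recent messages containing "ERROR"), the log group name in the test, and the poll limits are illustrative. `logs` would be `boto3.client("logs")`, and `start`/`end` are epoch seconds.

```python
import time
from typing import Dict, List

# Illustrative query: the 20 most recent log events whose message contains "ERROR".
ERROR_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | sort @timestamp desc"
    " | limit 20"
)


def run_insights_query(logs, log_group: str, start: int, end: int,
                       query: str = ERROR_QUERY, poll_seconds: float = 1.0,
                       max_polls: int = 60, sleep=time.sleep) -> List[Dict[str, str]]:
    """Run a CloudWatch Logs Insights query and return each result row as a dict."""
    query_id = logs.start_query(logGroupName=log_group, startTime=start,
                                endTime=end, queryString=query)["queryId"]
    for _ in range(max_polls):
        response = logs.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        sleep(poll_seconds)  # not finished yet (for example Scheduled or Running)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```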