
AWS EC2 provides reliable virtual servers, but individual instances still fail. Hardware failures, runaway processes, full disks, application crashes — any of these can take an EC2-hosted application offline. Monitoring your EC2 instances means combining AWS-native tools with external uptime monitoring for complete visibility.
The most important check for an EC2-hosted application is an external HTTP monitor pointing at your domain or IP:
Monitor: https://yourdomain.com
Expected status: 200
Interval: 1 minute
This validates the complete path from the internet to your application: DNS → Elastic IP or Load Balancer → EC2 instance → web server → application.
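The check described above can be approximated in a few lines of Python. This is a minimal sketch rather than a description of how any monitoring service works internally: the function name `check_http`, the injectable `opener` parameter, and the result fields are illustrative assumptions. A real monitor would run it on a schedule from a machine outside AWS.

```python
import time
import urllib.error
import urllib.request


def check_http(url, expected_status=200, timeout=10, opener=urllib.request.urlopen):
    """Fetch url once and report whether it returned the expected status.

    `opener` is injectable so the check can be exercised without network access.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            error = None
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError: the server answered, but with an error.
        status, error = exc.code, str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, TLS error, or timeout: nothing answered.
        status, error = None, str(exc)
    return {
        "url": url,
        "up": status == expected_status,
        "status": status,
        "elapsed_seconds": time.monotonic() - started,
        "error": error,
    }


# Usage (needs network access):
# print(check_http("https://yourdomain.com"))
```

Note that `urlopen` follows redirects, so a redirect that ends in a 200 response counts as up in this sketch.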
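An EC2 instance can appear healthy in CloudWatch while being completely unreachable externally, for example when the failure sits elsewhere on that path: in DNS, at the Elastic IP or load balancer, or in the web server or application.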
External monitoring from Domain Monitor catches what AWS internal metrics miss.
CloudWatch provides built-in EC2 metrics that complement external monitoring:
Basic Monitoring (free): the standard EC2 metrics at 5-minute granularity.
Detailed Monitoring (charged): the same metrics at 1-minute granularity.
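EC2 has two built-in status checks: the system status check, which covers the underlying AWS host and infrastructure, and the instance status check, which covers the software and network configuration of the instance itself.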
Set up CloudWatch Alarms on StatusCheckFailed to get notified of instance-level failures. This is complementary to external monitoring — it catches low-level failures before they cause application unavailability.
aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-StatusCheck-Failed" \
  --alarm-description "EC2 instance status check failed" \
  --metric-name StatusCheckFailed \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-xxxxxxxxxxxx \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:region:account-id:your-sns-topic \
  --statistic Maximum
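For teams that manage alarms from Python rather than the shell, the same alarm can be expressed as keyword arguments for boto3's `put_metric_alarm`. This is a sketch under assumptions: the helper name `status_check_alarm_params` is hypothetical, the instance ID and SNS topic ARN are the same placeholders used in the CLI command, and the boto3 call is left commented out because it needs boto3 and AWS credentials.

```python
def status_check_alarm_params(instance_id, sns_topic_arn):
    """Build keyword arguments that mirror the flags of the CLI alarm command."""
    return {
        "AlarmName": "EC2-StatusCheck-Failed",
        "AlarmDescription": "EC2 instance status check failed",
        "MetricName": "StatusCheckFailed",
        "Namespace": "AWS/EC2",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Period": 60,              # seconds per evaluation window
        "EvaluationPeriods": 2,    # both windows must breach the threshold
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
        "Statistic": "Maximum",
    }


# Usage (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **status_check_alarm_params(
#         "i-xxxxxxxxxxxx", "arn:aws:sns:region:account-id:your-sns-topic"
#     )
# )
```

Keeping the parameters in a plain function makes it easy to create the same alarm for every instance in a loop.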
If you're using Auto Scaling groups, health checks determine when instances are replaced:
{
  "HealthCheckType": "ELB",
  "HealthCheckGracePeriod": 300
}
With ELB health check type, Auto Scaling uses your load balancer's health check results. An instance failing load balancer health checks is terminated and replaced.
For HTTP health checks at the load balancer:
Path: /health
Protocol: HTTP
Port: traffic-port
Healthy threshold: 2
Unhealthy threshold: 3
Timeout: 5 seconds
Interval: 30 seconds
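To get a feel for how these settings translate into detection time, the arithmetic can be sketched as below. It is an approximation only: it assumes checks run exactly once per interval and ignores the timeout and any jitter, and the function names are illustrative.

```python
def seconds_to_unhealthy(interval_seconds, unhealthy_threshold):
    """Approximate time for a failing target to be marked unhealthy."""
    return interval_seconds * unhealthy_threshold


def seconds_to_healthy(interval_seconds, healthy_threshold):
    """Approximate time for a recovered target to be marked healthy again."""
    return interval_seconds * healthy_threshold


print(seconds_to_unhealthy(30, 3))  # 90
print(seconds_to_healthy(30, 2))    # 60
```

With the values above, a failing instance keeps receiving traffic for roughly 90 seconds before the load balancer stops routing to it, so lowering the interval or the unhealthy threshold shortens that window at the cost of more sensitivity to brief blips.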
If your EC2 instances sit behind an Application Load Balancer (ALB), monitor the ALB endpoint:
Monitor: https://yourdomain.com/health
(or the ALB DNS name directly)
This validates that the ALB is routing traffic to healthy instances. Monitor the ALB endpoint rather than individual instance IPs — the ALB endpoint represents what users actually experience.
EC2 applications typically terminate SSL either at the load balancer or on the instance itself.
Either way, monitor your SSL certificate with advance expiry alerts. An alert that fires 30 days before expiry leaves time to renew before an expired certificate causes an outage.
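A 30-day warning comes down to reading the certificate's `notAfter` date and comparing it with the current time. The sketch below uses only Python's standard library. The function names and the 30-day default are illustrative assumptions, and `fetch_not_after` needs network access to the host.

```python
import socket
import ssl
import time


def fetch_not_after(hostname, port=443, timeout=10):
    """Return the served certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days between `now` (epoch seconds) and the certificate's notAfter date."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_renewal(not_after, warn_days=30, now=None):
    """True once the certificate is within warn_days of expiry."""
    return days_until_expiry(not_after, now) <= warn_days


# Usage (needs network access):
# print(needs_renewal(fetch_not_after("yourdomain.com")))
```

Because the default context verifies the certificate, a certificate that has already expired makes `fetch_not_after` raise an SSL error instead of returning a date, which is itself worth alerting on.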
Add a health endpoint to your application running on EC2:
# Flask example
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    return jsonify({'status': 'ok'}), 200
This endpoint is used by the load balancer health check and by your external monitor.
Keep it lightweight: it only needs to verify that the application process is running.
An EC2 instance lives in a single AWS region, and outages can be regional, affecting your instance while other regions are fine. Checking from multiple geographic locations shows whether a failure (or a recovery) is seen everywhere or only from some networks.
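One way to act on results from several locations is to classify them before alerting. The following is a rough sketch: the function name `classify_outage` and the location names are hypothetical, and real monitoring services apply their own rules.

```python
def classify_outage(results):
    """Classify per-location check results.

    `results` maps a probe location name to True (check passed) or False.
    """
    if not results:
        raise ValueError("no probe results to classify")
    failures = [location for location, ok in results.items() if not ok]
    if not failures:
        return "up"
    if len(failures) == len(results):
        return "down everywhere"
    return "down from: " + ", ".join(sorted(failures))


print(classify_outage({"london": True, "virginia": True, "sydney": True}))
print(classify_outage({"london": False, "virginia": True, "sydney": True}))
print(classify_outage({"london": False, "virginia": False, "sydney": False}))
```

A failure seen from every location points at the application or its AWS region, while a failure seen from only one location more likely points at a network problem between that probe and your instance.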
| Failure Type | Detection |
|---|---|
| Application process crash | External HTTP monitor |
| EC2 hardware failure | CloudWatch StatusCheckFailed |
| Full disk | CloudWatch agent disk usage alarm, e.g. `disk_used_percent` (not a default EC2 metric) |
| High CPU | CloudWatch CPUUtilization alarm |
| SSL certificate expiry | External SSL monitor |
| Domain expiry | Domain expiry monitor |
| Load balancer routing failure | External HTTP monitor |
Use Domain Monitor for external HTTP and SSL monitoring; CloudWatch alarms for EC2 infrastructure health.
Monitor your AWS EC2 applications externally at Domain Monitor — the layer that confirms users can actually reach your instance.