
CI/CD pipelines automate the path from code commit to production deployment — but that automation can fail, and failed deployments can take your site down. Monitoring your deployment pipeline is part of a complete website reliability strategy.
A bad deployment is one of the most common causes of website downtime. Automated CI/CD pipelines can:
Uptime monitoring catches the result of these failures (the site is down or returning errors), while pipeline monitoring helps you understand the cause and take faster corrective action.
The most valuable integration: verify production uptime immediately after every deployment.
Configure your deployment pipeline to wait for uptime monitors to confirm health before completing:
# GitHub Actions example
- name: Deploy to production
  run: ./deploy.sh
- name: Wait for deployment health
  run: |
    for i in {1..10}; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://yourdomain.com/health)
      if [ "$STATUS" = "200" ]; then
        echo "Deployment healthy"
        exit 0
      fi
      echo "Attempt $i: Status $STATUS, retrying..."
      sleep 30
    done
    echo "Deployment health check failed!"
    exit 1
This fails the pipeline if the application isn't healthy within 5 minutes of deployment — and the failed pipeline status tells you to investigate.
For more sophisticated pipelines, trigger automatic rollback if the health check fails:
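#!/bin/bash
# deploy-with-rollback.sh
echo "Deploying new version..."
./deploy.sh

# Poll the health endpoint up to 6 times, 30 seconds apart (about 3 minutes)
echo "Checking deployment health..."
for i in {1..6}; do
  # --max-time keeps a hung connection from stalling the loop
  STATUS=$(curl -s --max-time 10 -o /dev/null -w "%{http_code}" https://yourdomain.com/health)
  if [ "$STATUS" = "200" ]; then
    echo "Deployment successful"
    exit 0
  fi
  sleep 30
done

# No healthy response in time: restore the previous version and fail the run
echo "Deployment failed health check, rolling back..."
./rollback.sh
exit 1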
When deploying, set a maintenance window to suppress false alerts:
This prevents your team from being paged during expected downtime, while still alerting if the deployment maintenance window expires and the site is still down.
Track these pipeline metrics to measure CI/CD reliability:
Deployment frequency: How often are you deploying? Higher frequency with good monitoring is healthier than infrequent, large deployments.
Deployment failure rate: What percentage of deployments fail? A rising failure rate indicates declining code quality or test coverage.
Time to recovery: When a deployment fails, how long to roll back or fix? This is your deployment MTTR.
Change failure rate: What percentage of deployments cause a production incident? Industry benchmarks (DORA metrics) target < 15% for elite performers.
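To make the difference between the two failure percentages concrete, here is a minimal sketch that computes them from a deployment log. The log format (one line per deployment: timestamp, pipeline result, whether it caused a production incident) and the function name `deploy_metrics` are assumptions made for this example, not a standard format or tool.

```shell
#!/bin/bash
# deploy-metrics.sh (hypothetical): summarise a deployment log.
# Assumed log format, one deployment per line, comma-separated:
#   timestamp,pipeline result (success|failed),caused incident (yes|no)

deploy_metrics() {
  # $1: path to the log file
  awk -F',' '
    NF >= 3 {
      total++
      if ($2 == "failed") failed++      # the pipeline itself failed
      if ($3 == "yes")    incidents++   # the release caused a production incident
    }
    END {
      if (total == 0) { print "no deployments recorded"; exit }
      printf "deployments: %d\n", total
      printf "deployment failure rate: %.1f%%\n", 100 * failed / total
      printf "change failure rate: %.1f%%\n", 100 * incidents / total
    }
  ' "$1"
}
```

Calling `deploy_metrics deployments.csv` (a hypothetical file name) prints the three figures. Deployment frequency and time to recovery would need the timestamps as well, which this sketch ignores.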
CI/CD pipelines often include scheduled jobs — nightly builds, weekly reports, database migrations. Use heartbeat monitoring to verify these run on schedule:
# At the end of your scheduled workflow
- name: Ping heartbeat
  if: success()
  run: curl -s https://monitoring-url/ping/YOUR_TOKEN
The if: success() condition ensures the heartbeat only pings when the job succeeds — not on failure. This means missed heartbeats indicate either a job failure or a scheduling problem.
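The same ping-only-on-success pattern can be applied to scheduled jobs that run outside GitHub Actions, such as cron jobs. The sketch below is one possible wrapper, not a prescribed implementation: the function names `run_with_heartbeat` and `ping_heartbeat` are hypothetical, and the URL is the placeholder from the workflow example.

```shell
#!/bin/bash
# run-with-heartbeat.sh (hypothetical wrapper for cron-style jobs)

# Placeholder URL taken from the workflow example; replace with your own.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://monitoring-url/ping/YOUR_TOKEN}"

# -f makes curl exit non-zero on HTTP errors; --retry covers brief network blips.
ping_heartbeat() {
  curl -fsS --max-time 10 --retry 3 -o /dev/null "$HEARTBEAT_URL"
}

# run_with_heartbeat CMD [ARGS...]
# Runs the job, pings only when it exits 0, and passes the job's exit code through.
run_with_heartbeat() {
  "$@"
  local status=$?
  if [ "$status" -eq 0 ]; then
    ping_heartbeat || echo "warning: heartbeat ping failed" >&2
  else
    echo "job exited with status $status, heartbeat not sent" >&2
  fi
  return "$status"
}
```

A job would then be started as `run_with_heartbeat ./nightly-backup.sh`, where the script name is only an illustration. Because the wrapper returns the job's own exit code, existing failure handling around the job keeps working.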
After each deployment, run a minimal set of smoke tests that verify critical functionality:
#!/bin/bash
# smoke-test.sh
BASE_URL="https://yourdomain.com"

# check URL EXPECTED_STATUS: exit non-zero if the URL returns a different status code
check() {
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$1")
  if [ "$STATUS" != "$2" ]; then
    echo "FAIL: $1 returned $STATUS (expected $2)"
    exit 1
  fi
  echo "OK: $1"
}

# Check the homepage and other critical endpoints
check "$BASE_URL" 200
check "$BASE_URL/health" 200
check "$BASE_URL/api/health" 200
check "$BASE_URL/login" 200
Run smoke tests as a step in your deployment pipeline. Failed smoke tests trigger rollback.
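One way to wire smoke tests to rollback is a small glue function like the sketch below. It assumes the `./deploy.sh`, `./smoke-test.sh`, and `./rollback.sh` scripts from the earlier examples exist; the function name `deploy_with_smoke_tests` and its optional arguments are hypothetical.

```shell
#!/bin/bash
# deploy-with-smoke-tests.sh (hypothetical glue script)

# deploy_with_smoke_tests [DEPLOY_CMD] [SMOKE_CMD] [ROLLBACK_CMD]
# Defaults point at the scripts used elsewhere in this article.
deploy_with_smoke_tests() {
  local deploy_cmd="${1:-./deploy.sh}"
  local smoke_cmd="${2:-./smoke-test.sh}"
  local rollback_cmd="${3:-./rollback.sh}"

  # Assumption: a failed deploy step leaves nothing to test, so stop here.
  # Whether it should also roll back depends on how your deploy script behaves.
  if ! "$deploy_cmd"; then
    echo "Deploy step failed, skipping smoke tests" >&2
    return 1
  fi

  if "$smoke_cmd"; then
    echo "Smoke tests passed"
    return 0
  fi

  echo "Smoke tests failed, rolling back..." >&2
  "$rollback_cmd" || echo "Rollback also failed, manual intervention needed" >&2
  return 1
}
```

The function returns non-zero whenever the release did not end up healthy, so calling it as the last command of a pipeline step marks the run as failed even when the rollback itself succeeds.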
Your external uptime monitoring sees the real user experience during deployments. This is valuable data:
Correlating your monitoring timeline with deployment events (available in CI/CD tool logs) helps you understand the user impact of your deployment strategy.
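As a rough sketch of that correlation, the function below matches downtime events against deployment times. The inputs are assumptions for this example: two text files with one Unix epoch timestamp (in seconds) per line, exported from your CI/CD logs and your uptime monitor. The function name `correlate_deploys` and the 10-minute default window are also hypothetical.

```shell
#!/bin/bash
# correlate-deploys.sh (hypothetical)

# correlate_deploys DEPLOYS_FILE INCIDENTS_FILE [WINDOW_SECONDS]
correlate_deploys() {
  local deploys_file="$1"    # deployment finish times, one epoch per line
  local incidents_file="$2"  # downtime start times, one epoch per line
  local window="${3:-600}"   # how long after a deploy an incident is attributed to it

  awk -v window="$window" '
    NF == 0 { next }                                  # skip blank lines
    FILENAME == ARGV[1] { deploys[++n] = $1 + 0; next }  # first file: deployments
    {
      # Second file: find the latest deployment at or before this incident.
      t = $1 + 0
      found = 0
      for (i = 1; i <= n; i++)
        if (deploys[i] <= t && (!found || deploys[i] > latest)) {
          latest = deploys[i]
          found = 1
        }
      if (found && t - latest <= window)
        printf "incident %d started %d s after deploy %d\n", t, t - latest, latest
      else
        printf "incident %d not near any deploy\n", t
    }
  ' "$deploys_file" "$incidents_file"
}
```

Incidents that start shortly after a deploy are likely caused by it; ones reported as not near any deploy point to other causes such as infrastructure or traffic.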
Monitor the outcome of every deployment in real time at Domain Monitor.