Monitoring CI/CD Pipelines and Deployment Health

CI/CD pipelines automate the path from code commit to production deployment — but that automation can fail, and failed deployments can take your site down. Monitoring your deployment pipeline is part of a complete website reliability strategy.

Why Pipeline Monitoring Matters

Bad deployments are one of the most common causes of website downtime. An automated CI/CD pipeline can:

Deploy broken code to production
Fail silently: the pipeline reports success but the application is broken
Deploy partially: some instances are updated and others are not, so users are served mixed versions
Break rollbacks: the deployment tooling itself fails, leaving you stuck on broken code

Uptime monitoring catches the result of these failures (the site is down or returning errors), while pipeline monitoring helps you understand the cause and take faster corrective action.

Integrating Uptime Monitoring with Deployments

The most valuable integration: verify production uptime immediately after every deployment.

Post-Deployment Health Check

Configure your deployment pipeline to poll a health endpoint and confirm the application is healthy before completing:

# GitHub Actions example
- name: Deploy to production
  run: ./deploy.sh

- name: Wait for deployment health
  run: |
    for i in {1..10}; do
      STATUS=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" https://yourdomain.com/health)
      if [ "$STATUS" = "200" ]; then
        echo "Deployment healthy"
        exit 0
      fi
      echo "Attempt $i: Status $STATUS, retrying..."
      sleep 30
    done
    echo "Deployment health check failed!"
    exit 1

This fails the pipeline if the application isn't healthy within roughly 5 minutes of deployment (10 attempts, 30 seconds apart), and the failed pipeline status tells you to investigate.
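The retry loop can also be factored into a reusable function. The sketch below is one possible shape, not a feature of any CI tool: `fetch_status` and `wait_for_health` are hypothetical names. The requirement of several consecutive 200 responses is an added assumption, meant to avoid passing on a single lucky response while instances are still restarting.

```bash
#!/bin/bash
# Hypothetical helpers; names and parameters are illustrative assumptions.

# Print the HTTP status code for a URL ("000" if the request itself fails).
fetch_status() {
    curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$1"
}

# wait_for_health URL ATTEMPTS INTERVAL_SECONDS REQUIRED_STREAK
# Returns 0 once REQUIRED_STREAK consecutive checks return 200,
# or 1 if that never happens within ATTEMPTS checks.
wait_for_health() {
    local url="$1" attempts="$2" interval="$3" required="$4"
    local streak=0 status i
    for ((i = 1; i <= attempts; i++)); do
        status=$(fetch_status "$url")
        if [ "$status" = "200" ]; then
            streak=$((streak + 1))
            if [ "$streak" -ge "$required" ]; then
                echo "Healthy after $i checks"
                return 0
            fi
        else
            # Any non-200 response resets the streak
            streak=0
            echo "Attempt $i: status $status"
        fi
        # Skip the pause after the final attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
    done
    echo "Health check failed"
    return 1
}
```

Called as `wait_for_health https://yourdomain.com/health 10 30 2 || exit 1`, this keeps the same budget of 10 attempts at 30-second intervals while requiring two healthy responses in a row.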

Automatic Rollback on Health Failure

For more sophisticated pipelines, trigger automatic rollback if the health check fails:

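#!/bin/bash
# deploy-with-rollback.sh
# Deploy, poll the health endpoint, and roll back if it never returns 200.

echo "Deploying new version..."
if ! ./deploy.sh; then
    # A failed deploy script may leave a partial release behind
    echo "Deploy script failed, rolling back..."
    ./rollback.sh
    exit 1
fi

echo "Checking deployment health..."
# Up to 6 attempts, 30 seconds apart (about 3 minutes in total)
for i in {1..6}; do
    # --max-time keeps a hung connection from stalling the loop
    STATUS=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" https://yourdomain.com/health)
    if [ "$STATUS" = "200" ]; then
        echo "Deployment successful"
        exit 0
    fi
    echo "Attempt $i: Status $STATUS, retrying..."
    sleep 30
done

echo "Deployment failed health check, rolling back..."
./rollback.sh
exit 1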

Maintenance Windows During Deployments

When deploying, set a maintenance window to suppress false alerts:

Set maintenance window in your monitoring tool (e.g., 10 minutes)
Deploy
Verify health manually or via pipeline
Close maintenance window (or let it expire)

This prevents your team from being paged during expected downtime. If the site is still down when the maintenance window expires, alerting resumes as normal.
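The four steps above can be wrapped in a single function so the window is always closed once the deployment command finishes. This is a sketch under stated assumptions: `MONITOR_API` and both endpoints are hypothetical placeholders, not any real monitoring tool's API, and closing the window after a failed deploy (rather than letting it expire) is a design choice made here so that alerts resume sooner.

```bash
#!/bin/bash
# Sketch only: MONITOR_API and both endpoints below are hypothetical
# placeholders. Substitute the calls your monitoring tool documents.
MONITOR_API="${MONITOR_API:-https://monitoring.example.com/api}"

# Open a maintenance window lasting the given number of minutes.
open_window() {
    curl -fsS -X POST "$MONITOR_API/maintenance" -d "minutes=$1" >/dev/null
}

# Close the window early so normal alerting resumes.
close_window() {
    curl -fsS -X DELETE "$MONITOR_API/maintenance" >/dev/null
}

# deploy_in_window MINUTES COMMAND [ARGS...]
# Runs COMMAND inside a maintenance window and returns its exit code.
deploy_in_window() {
    local minutes="$1"
    shift
    open_window "$minutes" || return 1
    local rc=0
    "$@" || rc=$?
    # Close in both cases: after a failed deploy, alerts should resume
    # right away rather than staying silent until the window expires.
    close_window || echo "Warning: could not close window; it will expire on its own" >&2
    return "$rc"
}
```

A pipeline step could then run `deploy_in_window 10 ./deploy.sh`, followed by the health check.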

Monitoring Pipeline Reliability

Track these pipeline metrics to measure CI/CD reliability:

Deployment frequency: How often are you deploying? Higher frequency with good monitoring is healthier than infrequent, large deployments.

Deployment failure rate: What percentage of deployment runs fail to complete? A rising failure rate often points to declining code quality or test coverage.

Time to recovery: When a deployment fails, how long does it take to roll back or fix? This is your deployment MTTR.

Change failure rate: What percentage of completed deployments cause a production incident? DORA's State of DevOps reports have placed elite performers at 15% or below.
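These rates are simple ratios, so they can be computed from any deployment log. The sketch below assumes a hypothetical CSV format of `deploy_id,outcome,recovery_minutes`, where the outcome is `ok`, `failed` (the run did not complete), or `incident` (it completed but caused a production incident). The function names are illustrative, and your CI/CD tool's export will likely need reshaping first.

```bash
#!/bin/bash
# Assumed input on stdin: deploy_id,outcome,recovery_minutes
# (a hypothetical format chosen for this example).

# outcome_rate OUTCOME
# Prints the percentage of rows whose outcome matches OUTCOME.
outcome_rate() {
    awk -F, -v want="$1" '
        NF >= 2 { total++; if ($2 == want) hits++ }
        END {
            if (total == 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * hits / total
        }'
}

# mean_recovery
# Prints the mean of the recovery_minutes column, ignoring empty values.
mean_recovery() {
    awk -F, '
        NF >= 3 && $3 != "" { sum += $3; n++ }
        END {
            if (n == 0) { print "n/a"; exit }
            printf "%.1f\n", sum / n
        }'
}
```

With a log of four deployments (two `ok`, one `failed` that took 12 minutes to recover, and one `incident` that took 30), `outcome_rate failed` and `outcome_rate incident` each print `25.0`, and `mean_recovery` prints `21.0`.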

Heartbeat Monitoring for Scheduled Jobs

CI/CD pipelines often include scheduled jobs — nightly builds, weekly reports, database migrations. Use heartbeat monitoring to verify these run on schedule:

# At the end of your scheduled workflow
- name: Ping heartbeat
  if: success()
  run: curl -fsS --retry 3 --max-time 10 https://monitoring-url/ping/YOUR_TOKEN

The if: success() condition makes the intent explicit: the step runs only when every earlier step in the job succeeded, which is also GitHub Actions' default when no condition is given. A missed heartbeat therefore indicates either a job failure or a scheduling problem.
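On the monitoring side, a heartbeat is usually judged missed when no ping arrives within the expected interval plus some grace period. The function below is a minimal sketch of that rule, not any specific tool's implementation: `heartbeat_overdue` and its parameters are hypothetical names, and all times are Unix epoch seconds.

```bash
#!/bin/bash
# heartbeat_overdue LAST_PING_EPOCH INTERVAL_SECONDS GRACE_SECONDS [NOW_EPOCH]
# Exits 0 when the next ping is overdue, 1 otherwise.
# NOW_EPOCH defaults to the current time; passing it makes the rule testable.
heartbeat_overdue() {
    local last="$1" interval="$2" grace="$3"
    local now="${4:-$(date +%s)}"
    # The job may run up to GRACE_SECONDS late before it counts as missed
    local deadline=$((last + interval + grace))
    [ "$now" -gt "$deadline" ]
}
```

For a nightly job (interval 86400) with a one-hour grace period, `heartbeat_overdue "$LAST_PING" 86400 3600 && echo "Job missed"` reports a miss only once more than 25 hours have passed since the last ping.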

Deployment Smoke Tests

After each deployment, run a minimal set of smoke tests that verify critical functionality:

#!/bin/bash
# smoke-test.sh
BASE_URL="https://yourdomain.com"

# check URL EXPECTED_STATUS
# Exits non-zero on the first URL that returns an unexpected status.
check() {
    STATUS=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$1")
    if [ "$STATUS" != "$2" ]; then
        echo "FAIL: $1 returned $STATUS (expected $2)"
        exit 1
    fi
    echo "OK: $1"
}

check "$BASE_URL" 200             # homepage
check "$BASE_URL/health" 200
check "$BASE_URL/api/health" 200
check "$BASE_URL/login" 200

Run smoke tests as a step in your deployment pipeline. The script exits non-zero on the first failed check, which fails the step and can be used to trigger a rollback.
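Stopping at the first failure hides any later ones. As a variation, the sketch below runs every check and reports all failures before returning a non-zero status. `fetch_status` and `run_smoke_tests` are hypothetical names, and the one-pair-per-line input format is an assumption made for this example.

```bash
#!/bin/bash
# Hypothetical helpers; names and input format are illustrative assumptions.

# Print the HTTP status code for a URL ("000" if the request itself fails).
fetch_status() {
    curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$1"
}

# run_smoke_tests
# Reads "URL EXPECTED_STATUS" pairs from stdin, checks every one,
# and returns non-zero if any of them failed.
run_smoke_tests() {
    local failures=0 url expected status
    while read -r url expected; do
        [ -z "$url" ] && continue   # skip blank lines
        # </dev/null keeps the request from consuming the list on stdin
        status=$(fetch_status "$url" </dev/null)
        if [ "$status" = "$expected" ]; then
            echo "OK: $url"
        else
            echo "FAIL: $url returned $status (expected $expected)"
            failures=$((failures + 1))
        fi
    done
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}
```

The list of checks can be piped in, for example `printf '%s\n' "https://yourdomain.com/health 200" "https://yourdomain.com/login 200" | run_smoke_tests || ./rollback.sh`.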

What External Monitoring Sees During Deployments

Your external uptime monitoring sees the real user experience during deployments. This is valuable data:

Rolling deployments: Brief response time increase as new instances start; no visible downtime if rolling is done correctly
Big-bang deployments: Brief downtime, visible to your monitors if the check interval is shorter than the outage
Failed deployments: Sustained errors or downtime until rollback

Correlating your monitoring timeline with deployment events (available in CI/CD tool logs) helps you understand the user impact of your deployment strategy.
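One way to do this correlation is to match outage start times against deployment times. The sketch below assumes you can export both as files of Unix epoch seconds, one per line. The function name `outages_after_deploys` and the file layout are hypothetical, and real exports from monitoring or CI/CD tools will need converting first.

```bash
#!/bin/bash
# outages_after_deploys WINDOW_SECONDS DEPLOYS_FILE OUTAGES_FILE
# Both files hold one Unix epoch timestamp per line (an assumed format).
# Prints each outage that began within WINDOW_SECONDS after a deployment.
# Assumes DEPLOYS_FILE is non-empty.
outages_after_deploys() {
    local window="$1" deploys="$2" outages="$3"
    awk -v window="$window" '
        # First file: remember every deployment time
        NR == FNR { deploy[++n] = $1; next }
        # Second file: compare each outage against every deployment
        {
            for (i = 1; i <= n; i++) {
                delay = $1 - deploy[i]
                if (delay >= 0 && delay <= window) {
                    printf "outage at %s started %d s after deploy at %s\n", $1, delay, deploy[i]
                }
            }
        }' "$deploys" "$outages"
}
```

Running `outages_after_deploys 600 deploys.txt outages.txt` lists outages that began within 10 minutes of a deployment. Outages that match no deployment are left out, which helps separate deployment-related incidents from the rest.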

Monitor the outcome of every deployment in real time at Domain Monitor.
