Testing website monitoring setup with simulated downtime, alert verification and runbook validation

How to Test Your Website Monitoring Setup

Most teams set up monitoring, confirm that the dashboard shows green, and consider the job done. Then the first real incident reveals that alerts were misconfigured, the on-call engineer did not receive the notification, or the monitoring check was looking at the wrong endpoint.

Testing your monitoring setup before a real incident is essential. This guide covers how to validate every layer of your monitoring stack.

Why Testing Monitoring Is Non-Negotiable

A monitoring system that has never been tested is a monitoring system you cannot trust. Common failure modes discovered only during a real incident:

  • Alert emails going to a spam folder
  • Phone numbers or Slack webhook URLs no longer valid
  • On-call schedule not updated after team changes
  • Monitor checking a CDN-cached URL that stays green even when the origin is down
  • SSL expiry alert threshold set too far ahead of expiry, generating constant noise until someone mutes it
  • Heartbeat monitor grace period set incorrectly, missing short failures

The cost of discovering these in a test is a few minutes of effort. The cost of discovering them during a production incident can be hours of undetected downtime.

Testing HTTP Uptime Monitors

Simulating Downtime

The simplest test: temporarily return a non-200 status from your application.

Maintenance mode (nginx returns 503 for every request):

# nginx: return 503 until you remove this block and reload nginx
location / {
    return 503 "Testing monitoring";
}

Express.js test endpoint:

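const express = require('express');
const app = express();

// Set SIMULATE_DOWN=true and restart the process to simulate an outage;
// unset it and restart again to trigger recovery.
app.get('/', (req, res) => {
  if (process.env.SIMULATE_DOWN === 'true') {
    return res.status(503).json({ error: 'Service unavailable' });
  }
  // ... normal handler
});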

After triggering the simulated failure, verify:

  1. The monitor detects the failure (check dashboard)
  2. The alert fires within the expected timeframe
  3. The alert reaches the correct destination (email, SMS, Slack)
  4. The alert content is correct (right monitor name, right URL, right status code)
  5. When you restore the service, a recovery alert fires

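Allow enough simulated downtime for the alert to trigger. Many monitoring tools require 2-3 consecutive failed checks before alerting, so the shortest outage that reliably fires an alert is roughly your check interval multiplied by that number. A 1-minute failure may not trigger anything; with 1-minute checks, 5 minutes of downtime gives a comfortable margin.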
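To see why short outages can slip through, here is a minimal sketch of a consecutive-failure rule, assuming a monitor that alerts after `threshold` failed checks in a row and sends a recovery alert on the next successful check. The names `evaluateChecks` and `worstCaseAlertDelaySeconds` are illustrative, not any tool's API, and some tools also require several successes before declaring recovery.

```javascript
// Hypothetical alert rule: a DOWN alert after `threshold` consecutive failed
// checks, then a RECOVERY alert on the first successful check after that.
// `results` is an array of booleans, one per check (true = check passed).
function evaluateChecks(results, threshold) {
  const alerts = [];
  let consecutiveFailures = 0;
  let down = false;
  results.forEach((ok, i) => {
    if (ok) {
      if (down) alerts.push({ check: i, type: 'recovery' });
      consecutiveFailures = 0;
      down = false;
    } else {
      consecutiveFailures += 1;
      if (!down && consecutiveFailures >= threshold) {
        alerts.push({ check: i, type: 'down' });
        down = true;
      }
    }
  });
  return alerts;
}

// Worst case: the outage starts just after a passing check, so the
// threshold-th failing check happens threshold * interval later.
// Notification delivery time is not included.
function worstCaseAlertDelaySeconds(intervalSeconds, threshold) {
  return intervalSeconds * threshold;
}
```

With 1-minute checks and a threshold of 3, an outage that starts just after a check needs about 3 minutes before the alert fires, before any delivery delay, which is why 5 minutes of simulated downtime is a reasonable margin.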

Testing Content Verification

If your monitors check for specific text content (e.g., verifying a string that should appear on your homepage), test the negative case:

  1. Temporarily remove or rename the expected string in your response
  2. Verify the monitor detects the content mismatch and alerts

This validates that your content checks are actually working, not just checking that the server returns 200.
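The negative case comes down to one rule: a 200 response whose body is missing the expected string must fail. A minimal sketch, assuming the monitor passes only when the status is 2xx and the body contains the configured string; `contentCheckPasses` and `runContentCheck` are hypothetical helpers, and the second uses the global `fetch` built into Node.js 18 and later.

```javascript
// Pass only if the status is 2xx AND the expected text is present.
function contentCheckPasses(status, body, expected) {
  return status >= 200 && status < 300 && body.includes(expected);
}

// Live version: fetch the page (redirects are followed by default) and
// apply the same rule to the real response.
async function runContentCheck(url, expected) {
  const res = await fetch(url);
  const body = await res.text();
  return contentCheckPasses(res.status, body, expected);
}
```

While the string is removed, `runContentCheck` against the same URL should return `false`, matching the alert you expect the monitor to send.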

Testing from Multiple Locations

If you use multi-location monitoring, verify that each location reports independently. A useful test: restrict access from one geographic region (firewall rule or geo-block) and confirm the monitor for that region fires while others remain green.
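With the geo-block in place, the expected outcome can be written as a simple check over one round of results. This sketch assumes a hypothetical result shape mapping each location name to whether its check passed; you would map your tool's API or export onto it.

```javascript
// `results`: hypothetical { locationName: boolean } for one check round.
// True only if the blocked region failed and every other region passed.
function onlyBlockedRegionFails(results, blockedRegion) {
  if (!(blockedRegion in results)) return false; // region is not monitored at all
  return Object.entries(results).every(([location, ok]) =>
    location === blockedRegion ? !ok : ok
  );
}
```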

Testing SSL Certificate Monitoring

SSL expiry alerts are often set up and forgotten. Test them by:

Reviewing configured thresholds: Check that your alert fires at 60, 30, and 14 days before expiry — not just one threshold.

Checking the right domains: List all domains your SSL monitor covers. Missing a subdomain is a common oversight.

Verifying alert routing: SSL expiry alerts often go to a different team (DevOps, platform) than uptime alerts (engineering on-call). Confirm the routing is correct.

You can also use SSL Labs to check your certificate details and verify that your monitoring tool is reporting the same expiry date.
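For an independent expiry date to compare with your monitoring tool, you can read the certificate directly with Node's built-in `tls` module. This is a sketch: `getCertExpiry` and `crossedThresholds` are illustrative names, and the 60/30/14-day thresholds mirror the ones suggested above.

```javascript
const tls = require('node:tls');

// Connect to host:port and resolve with the leaf certificate's expiry date.
function getCertExpiry(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      // rejectUnauthorized: false lets us read the certificate even if it is
      // already expired or otherwise invalid.
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        socket.end();
        resolve(new Date(cert.valid_to));
      }
    );
    socket.on('error', reject);
  });
}

// Which configured thresholds (days before expiry) have already been crossed?
function crossedThresholds(expiry, now, thresholds = [60, 30, 14]) {
  const daysLeft = Math.floor((expiry - now) / 86_400_000);
  return thresholds.filter((t) => daysLeft <= t);
}
```

For example, `getCertExpiry('example.com').then((d) => console.log(crossedThresholds(d, new Date())))` lists which alerts should already have fired. If your tool disagrees, check its thresholds and which hostname it is probing.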

Testing Domain Expiry Monitoring

Domain expiry alerts work on longer timescales than SSL alerts (domains are typically registered for a year or more, while many certificates, such as Let's Encrypt's, last only 90 days), but the test approach is similar:

  1. Verify the expiry date reported by your monitoring tool matches the actual WHOIS expiry date
  2. Confirm the alert threshold is set far enough in advance (60 days minimum)
  3. Verify the alert goes to someone who has authority to renew the domain
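The first two checks are simple date comparisons. A minimal sketch, assuming you have the expiry date from your monitoring dashboard and from a WHOIS or RDAP lookup as ISO 8601 UTC strings; `checkDomainExpiryConfig` is a hypothetical helper. The third check, who receives the alert, still needs a human.

```javascript
// Compare the calendar date (YYYY-MM-DD, UTC) reported by the monitoring tool
// with the WHOIS/RDAP expiry, and check the alert lead time is >= 60 days.
function checkDomainExpiryConfig(reportedExpiry, whoisExpiry, alertLeadDays) {
  return {
    datesMatch: reportedExpiry.slice(0, 10) === whoisExpiry.slice(0, 10),
    leadTimeOk: alertLeadDays >= 60,
  };
}
```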

For WHOIS monitoring, a real record change is hard to simulate. Instead, confirm that your tool reports the current registrar and nameservers correctly, and review the alert configuration for registrar and nameserver changes.
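Since a real registrar change is hard to stage, one option is to exercise the comparison logic itself. The sketch below assumes a hypothetical snapshot shape `{ registrar, nameservers }` and reports what changed between two lookups. Normalising case, trailing dots and order avoids alerting on formatting differences that do not change the delegation.

```javascript
// previous/current: hypothetical { registrar: string, nameservers: string[] }.
// Returns human-readable descriptions of what changed (empty if nothing did).
function diffWhois(previous, current) {
  const changes = [];
  if (previous.registrar !== current.registrar) {
    changes.push(`registrar: ${previous.registrar} -> ${current.registrar}`);
  }
  // DNS names are case-insensitive, a trailing dot is equivalent, order is irrelevant.
  const norm = (list) => list.map((ns) => ns.toLowerCase().replace(/\.$/, '')).sort();
  const before = norm(previous.nameservers);
  const after = norm(current.nameservers);
  if (before.join(',') !== after.join(',')) {
    changes.push(`nameservers: ${before.join(', ')} -> ${after.join(', ')}`);
  }
  return changes;
}
```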

Testing Heartbeat Monitoring

Heartbeat monitors detect missed cron jobs and background processes. Testing them is straightforward: simply do not send the expected ping.

Method 1: Disable the job temporarily

# Comment out the cron job
# 0 3 * * * /scripts/backup.sh && curl https://domain-monitor.io/heartbeat/abc123

Wait for the grace period to expire and verify the alert fires.

Method 2: Send the ping with a test flag. Some heartbeat services support a test mode that triggers an alert without requiring you to wait for a missed interval.

Method 3: Check the "last ping" timestamp. Verify the monitoring dashboard shows the correct last ping time. If the timestamp is older than the job's schedule implies, the job may have stopped running; while that gap is still within the grace period, no alert will have fired yet.
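The states behind the "last ping" check can be sketched as follows, assuming the job should ping every `intervalMs` and the monitor allows `graceMs` of lateness before alerting. The function name and state labels are illustrative; heartbeat services differ in exactly when the grace period starts counting.

```javascript
// lastPing and now: Date objects or epoch milliseconds.
function heartbeatState(lastPing, now, intervalMs, graceMs) {
  const sinceLast = now - lastPing;
  if (sinceLast <= intervalMs) return 'ok';
  if (sinceLast <= intervalMs + graceMs) return 'late'; // stale, but no alert yet
  return 'down'; // grace period exceeded: an alert should have fired
}
```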

See how to monitor cron jobs for heartbeat implementation details.

Testing Alert Routing and Escalation

Email Alerts

  • Send a test alert and verify receipt, including checking spam folders
  • Verify the email address is current and monitored (not a departed employee's inbox)
  • Check that the email contains enough information to act on (URL, status code, timestamp)

SMS Alerts

  • Verify phone numbers are current
  • Test across timezones if your team is distributed — SMS delivery can vary internationally
  • Confirm that SMS works outside business hours (carrier restrictions, Do Not Disturb settings)

Slack/Teams Webhooks

Slack webhook URLs stop working when the integration that created them is removed or reinstalled, so test them actively:

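curl -sS -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert from monitoring system"}' \
  -w '\nHTTP %{http_code}\n' \
  YOUR_WEBHOOK_URL

If Slack responds with HTTP 200 and the body "ok", the webhook is valid and the test message appears in the channel. Any other response means the webhook no longer works; recreate the webhook integration.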
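The same check can be scripted, for example as part of a scheduled test. A minimal sketch using the global `fetch` in Node.js 18+; it treats anything other than HTTP 200 with the body `ok` as broken. Note that every run posts a real message to the channel.

```javascript
// Post a test message to a Slack incoming webhook and classify the reply.
async function testSlackWebhook(webhookUrl) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: 'Test alert from monitoring system' }),
  });
  return classifyWebhookResponse(res.status, await res.text());
}

// A working webhook answers 200 with the plain-text body "ok".
function classifyWebhookResponse(status, body) {
  if (status === 200 && body.trim() === 'ok') return 'valid';
  return `broken (HTTP ${status}: ${body.trim()})`;
}
```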

PagerDuty/Opsgenie Escalation

Test the full escalation chain:

  1. Send a test incident
  2. Verify it reaches the first on-call engineer
  3. If unacknowledged within the escalation window, verify it escalates to the secondary
  4. Test the acknowledgment flow (engineer acknowledges, escalation stops, incident is resolved)
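Before sending the test incident, it helps to write down what should happen. This sketch models a simple linear policy, assuming each level gets `windowMinutes` to acknowledge before the next level is paged. Real PagerDuty and Opsgenie policies can also repeat or notify several people per level, so treat `expectedNotifications` (a hypothetical helper) as a checklist generator rather than a model of either product.

```javascript
// levels: ordered list of who is paged at each escalation level.
// ackAfterMinutes: when the incident was acknowledged (Infinity = never).
// Returns who should have been notified, and at which minute.
function expectedNotifications(levels, windowMinutes, ackAfterMinutes = Infinity) {
  const notified = [];
  for (let i = 0; i < levels.length; i++) {
    const pagedAt = i * windowMinutes;
    if (pagedAt >= ackAfterMinutes) break; // acknowledged before this level was paged
    notified.push({ who: levels[i], atMinute: pagedAt });
  }
  return notified;
}
```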

See what is on-call management for escalation policy design.

Testing Status Page Updates

If your monitoring tool auto-updates a status page during incidents, verify this works:

  1. Trigger a simulated downtime
  2. Confirm the status page updates automatically
  3. Verify the update is visible publicly (not cached by CDN)
  4. Confirm recovery also updates the status page
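For step 3, one approach is to compare what an ordinary visitor receives with a request that tries to bypass the cache. This sketch assumes you know the wording your page shows during an incident (`expectedText`) and uses a made-up `nocache` query parameter; if your CDN ignores query strings when caching, the second request may also be served from cache.

```javascript
// Append a throwaway query parameter so the URL differs from the cached one.
function cacheBustedUrl(pageUrl, nonce) {
  const url = new URL(pageUrl);
  url.searchParams.set('nocache', String(nonce));
  return url.toString();
}

async function fetchText(url, headers = {}) {
  const res = await fetch(url, { headers });
  return res.text();
}

// visibleToVisitors: what the public URL returns (possibly a CDN copy).
// visibleAtOrigin: what a cache-busted request returns.
async function checkStatusPage(pageUrl, expectedText) {
  const publicBody = await fetchText(pageUrl);
  const originBody = await fetchText(cacheBustedUrl(pageUrl, Date.now()), {
    'Cache-Control': 'no-cache',
  });
  return {
    visibleToVisitors: publicBody.includes(expectedText),
    visibleAtOrigin: originBody.includes(expectedText),
  };
}
```

If `visibleAtOrigin` is true but `visibleToVisitors` is false, the update exists but a cached copy is hiding it from the public.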

Creating a Monitoring Test Checklist

Run through this checklist quarterly or after any significant infrastructure change:

  • Simulate HTTP downtime and verify alert fires and reaches on-call
  • Simulate recovery and verify recovery alert fires
  • Verify SSL certificate expiry dates are correct in the monitoring dashboard
  • Verify domain expiry dates are correct
  • Test each alert channel (email, SMS, Slack, PagerDuty)
  • Verify heartbeat monitors received a ping recently (check last-seen timestamps)
  • Review on-call schedule — are all contacts current?
  • Check monitor list — are there any domains or endpoints you should be monitoring that are not listed?
  • Test status page auto-update
  • Review alert thresholds — are they still appropriate?

After a Real Incident

Every real incident is also a monitoring test. In your post-incident report, include a monitoring review section:

  • When did monitoring first detect the issue?
  • How long between detection and the first alert being received?
  • Did all configured alert channels fire?
  • Were there any false negatives — things that should have alerted but did not?
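The first two questions are timestamp arithmetic and the third is a set difference. A minimal sketch, assuming an incident timeline with the hypothetical field names shown (ISO 8601 strings); adapt them to however your incident tool exports timestamps.

```javascript
// timeline: hypothetical { startedAt, detectedAt, firstAlertReceivedAt,
// configuredChannels, channelsThatFired }.
function monitoringReview(timeline) {
  const t = (key) => Date.parse(timeline[key]);
  const minutes = (ms) => Math.round(ms / 60000);
  return {
    timeToDetectMin: minutes(t('detectedAt') - t('startedAt')),
    timeToAlertMin: minutes(t('firstAlertReceivedAt') - t('detectedAt')),
    // Channels that were configured but never delivered an alert.
    missingChannels: timeline.configuredChannels.filter(
      (c) => !timeline.channelsThatFired.includes(c)
    ),
  };
}
```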

Use the answers to improve your monitoring setup before the next incident. The goal is continuous improvement: each incident should make your monitoring more reliable than it was before.


Run a monitoring test at Domain Monitor — verify your alerts, SSL checks, and heartbeat monitors are all working correctly.
