How to Test Your Website Monitoring Setup

Most teams set up monitoring, confirm that the dashboard shows green, and consider the job done. Then the first real incident reveals that alerts were misconfigured, the on-call engineer did not receive the notification, or the monitoring check was looking at the wrong endpoint.

Testing your monitoring setup before a real incident is essential. This guide covers how to validate every layer of your monitoring stack.

Why Testing Monitoring Is Non-Negotiable

A monitoring system that has never been tested is a monitoring system you cannot trust. Common failure modes discovered only during a real incident:

Alert emails going to a spam folder
Phone numbers or Slack webhook URLs no longer valid
On-call schedule not updated after team changes
Monitor checking a CDN-cached URL that stays green even when the origin is down
SSL alert threshold set too far ahead of expiry, generating constant noise until the alert is muted
Heartbeat monitor grace period set too long, so short failures are missed

The cost of discovering these in a test is close to zero. The cost of discovering them during a production incident is measured in hours of undetected downtime.

Testing HTTP Uptime Monitors

Simulating Downtime

The simplest test: temporarily return a non-200 status from your application.

Maintenance mode (temporary 503 response):

# nginx: return 503 for every request while this block is active.
# Remove it and reload nginx to end the test.
location / {
    return 503 "Testing monitoring";
}

Express.js test endpoint:

const express = require('express');
const app = express();

// Start the process with SIMULATE_DOWN=true to force a 503 from the root route
app.get('/', (req, res) => {
  if (process.env.SIMULATE_DOWN === 'true') {
    return res.status(503).json({ error: 'Service unavailable' });
  }
  // ... normal handler
});

After triggering the simulated failure, verify:

The monitor detects the failure (check dashboard)
The alert fires within the expected timeframe
The alert reaches the correct destination (email, SMS, Slack)
The alert content is correct (right monitor name, right URL, right status code)
When you restore the service, a recovery alert fires

Allow at least 5 minutes of simulated downtime so the alert has time to trigger. Many monitoring tools require 2-3 consecutive failures before alerting, so a 1-minute failure may not trigger anything.
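The arithmetic behind that recommendation can be sketched as follows. This is a simplified model, not how any particular tool works: it assumes a fixed check interval and an alert on the Nth consecutive failed check, and the function names are hypothetical.

```python
def worst_case_seconds_to_alert(check_interval_s: int, failures_required: int) -> int:
    """Upper bound on the time between the outage starting and the alert firing.

    Assumes checks run at a fixed interval and the alert fires on the Nth
    consecutive failed check. In the worst case the outage starts just after
    a successful check, so the first failed check is a full interval away.
    """
    if check_interval_s <= 0 or failures_required <= 0:
        raise ValueError("interval and failure count must be positive")
    return check_interval_s * failures_required


def outage_long_enough(outage_s: int, check_interval_s: int, failures_required: int) -> bool:
    """True if a simulated outage of outage_s seconds is guaranteed to alert."""
    return outage_s >= worst_case_seconds_to_alert(check_interval_s, failures_required)
```

With a 60-second interval and 3 required failures, the bound is 180 seconds, so a 60-second outage can go unnoticed while a 300-second outage cannot. Real tools may add retries or notification delays on top of this.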

Testing Content Verification

If your monitors check for specific text content (e.g., verifying a string that should appear on your homepage), test the negative case:

Temporarily remove or rename the expected string in your response
Verify the monitor detects the content mismatch and alerts

This validates that your content checks are actually working, not just checking that the server returns 200.
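The difference between a status-only check and a content check can be illustrated with a small sketch. The function name and return strings are hypothetical, and a real monitoring tool may apply different rules.

```python
def evaluate_content_check(status_code: int, body: str, expected_text: str) -> str:
    """Classify a response roughly the way a keyword monitor might."""
    if status_code != 200:
        return "down: unexpected status"
    if expected_text not in body:
        # The server answered 200, but the page is not the one we expect.
        return "down: content mismatch"
    return "up"
```

A response with status 200 and the expected string missing is the case the negative test exercises: a status-only check would report it as healthy.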

Testing from Multiple Locations

If you use multi-location monitoring, verify that each location reports independently. A useful test: restrict access from one geographic region (firewall rule or geo-block) and confirm the monitor for that region fires while others remain green.
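The expected outcome of that geo-block test can be written down as a check. This is only a sketch: the region names and the shape of the results dictionary are assumptions, since each tool exposes per-location results differently.

```python
def geo_block_test_passed(results: dict[str, bool], blocked_region: str) -> bool:
    """results maps a region name to True if that region's check succeeded.

    The test passes only when the blocked region reports a failure and
    every other region still reports success.
    """
    if blocked_region not in results:
        raise KeyError(f"no result recorded for region {blocked_region!r}")
    blocked_failed = not results[blocked_region]
    others_ok = all(ok for region, ok in results.items() if region != blocked_region)
    return blocked_failed and others_ok
```

If every region turns red, the block was too broad. If every region stays green, the locations are probably not being evaluated independently.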

Testing SSL Certificate Monitoring

SSL expiry alerts are often set up and forgotten. Test them by:

Reviewing configured thresholds: Check that your alert fires at 60, 30, and 14 days before expiry — not just one threshold.

Checking the right domains: List all domains your SSL monitor covers. Missing a subdomain is a common oversight.

Verifying alert routing: SSL expiry alerts often go to a different team (DevOps, platform) than uptime alerts (engineering on-call). Confirm the routing is correct.

You can also use SSL Labs to check your certificate details and verify that your monitoring tool is reporting the same expiry date.
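For an independent cross-check of the expiry date, a minimal sketch using only the Python standard library is shown below. The function names are hypothetical, and the default thresholds mirror the 60, 30, and 14 day values mentioned above.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443) -> str:
    """Return the certificate's notAfter string as reported by the server."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left before a notAfter date such as 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires_at - current) // 86400)


def crossed_thresholds(days_left: int, thresholds=(60, 30, 14)) -> list[int]:
    """Return the alert thresholds (in days) that have already been reached."""
    return [t for t in thresholds if days_left <= t]
```

Calling days_until_expiry(fetch_not_after("example.com")) gives a number to compare against what the monitoring dashboard shows. The hostname here is a placeholder.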

Testing Domain Expiry Monitoring

Domain expiry alerts work on longer timescales than SSL alerts (domains typically renew annually, while many certificates renew every 90 days), but the test approach is similar:

Verify the expiry date reported by your monitoring tool matches the actual WHOIS expiry date
Confirm the alert threshold is set far enough in advance (60 days minimum)
Verify the alert goes to someone who has authority to renew the domain

For WHOIS monitoring, confirm that your tool reports the current registrar and nameservers correctly, then review the alert configuration for registrar and nameserver changes.

Testing Heartbeat Monitoring

Heartbeat monitors detect missed cron jobs and background processes. Testing them is straightforward: simply do not send the expected ping.

Method 1: Disable the job temporarily

# Comment out the cron job
# 0 3 * * * /scripts/backup.sh && curl https://domain-monitor.io/heartbeat/abc123

Wait for the grace period to expire and verify the alert fires.

Method 2: Send the ping with a test flag

Some heartbeat services support a test mode that triggers an alert without requiring you to wait for a missed interval.

Method 3: Check the "last ping" timestamp

Verify the monitoring dashboard shows the correct last ping time. If the timestamp is stale, the job may have stopped running without triggering an alert (if within the grace period).
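The relationship between the last ping, the expected interval, and the grace period can be sketched as below. This is an assumed model of how a heartbeat monitor decides that a ping is overdue; actual services may compute the deadline differently.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(
    last_ping: datetime,
    interval: timedelta,
    grace: timedelta,
    now: datetime,
) -> bool:
    """True once the next ping is later than its deadline plus the grace period."""
    deadline = last_ping + interval + grace
    return now > deadline
```

For a daily job with a one-hour grace period, a last ping at 03:00 on one day means no alert is expected before 04:00 the next day. That is how long to wait after disabling the job in Method 1.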

See how to monitor cron jobs for heartbeat implementation details.

Testing Alert Routing and Escalation

Email Alerts

Send a test alert and verify receipt, including checking spam folders
Verify the email address is current and monitored (not a departed employee's inbox)
Check that the email contains enough information to act on (URL, status code, timestamp)

SMS Alerts

Verify phone numbers are current
Test across timezones if your team is distributed — SMS delivery can vary internationally
Confirm that SMS works outside business hours (carrier restrictions, Do Not Disturb settings)

Slack/Teams Webhooks

Slack webhook URLs stop working when the integration is removed or reinstalled. Test them actively:

curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert from monitoring system"}' \
  YOUR_WEBHOOK_URL

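If the request returns HTTP 200 and the test message appears in the channel, the webhook is valid (add -i to the curl command to see the status line). If not, recreate the webhook integration.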
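The same webhook test can be scripted so it runs on a schedule. The sketch below uses only the Python standard library; the function names are hypothetical and the webhook URL is whatever your integration provides.

```python
import json
import urllib.error
import urllib.request


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the same JSON POST that the curl command sends."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def webhook_is_valid(webhook_url: str) -> bool:
    """Send a test message and report whether the endpoint answered with HTTP 200."""
    request = build_webhook_request(webhook_url, "Test alert from monitoring system")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP error statuses (HTTPError is a subclass) and network failures.
        return False
```

A True result only shows that the endpoint accepted the request. Someone still needs to confirm the message landed in the intended channel.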

PagerDuty/Opsgenie Escalation

Test the full escalation chain:

Send a test incident
Verify it reaches the first on-call engineer
If unacknowledged within the escalation window, verify it escalates to the secondary
Test the acknowledgment flow (engineer acknowledges, escalation stops, incident is then resolved)

See what is on-call management for escalation policy design.
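When running the escalation test, it helps to know in advance who should be paged at each point in time. The sketch below models a simple escalation chain. The chain format and function name are hypothetical and are not PagerDuty or Opsgenie APIs, and it assumes the last level keeps being paged once reached.

```python
def expected_responder(minutes_unacknowledged: int, chain: list[tuple[str, int]]) -> str:
    """Return who should currently be paged for an unacknowledged incident.

    chain is an ordered list of (responder, minutes they have to acknowledge).
    """
    if not chain:
        raise ValueError("escalation chain must not be empty")
    elapsed = 0
    for responder, window in chain:
        elapsed += window
        if minutes_unacknowledged < elapsed:
            return responder
    # Past every window: assume the final level remains responsible.
    return chain[-1][0]
```

With a chain of primary (15 minutes) then secondary (15 minutes), an incident left unacknowledged for 15 minutes should already have reached the secondary. If the test page arrives later than that, the escalation window is not configured as intended.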

Testing Status Page Updates

If your monitoring tool auto-updates a status page during incidents, verify this works:

Trigger a simulated downtime
Confirm the status page updates automatically
Verify the update is visible publicly (not cached by CDN)
Confirm recovery also updates the status page

Creating a Monitoring Test Checklist

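Run through this checklist quarterly or after any significant infrastructure change:

Simulated downtime triggers an alert, and recovery triggers a recovery alert
Content checks fail when the expected string is missing
Each monitoring location reports independently
SSL thresholds, covered domains, and alert routing are correct
Domain expiry date matches WHOIS, and alerts go to someone who can renew
A missed heartbeat ping raises an alert after the grace period
Email, SMS, and Slack/Teams test alerts reach current recipients
The escalation chain reaches the secondary on-call when unacknowledged
The status page updates on failure and on recovery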

After a Real Incident

Every real incident is also a monitoring test. In your post-incident report, include a monitoring review section:

When did monitoring first detect the issue?
How long between detection and the first alert being received?
Did all configured alert channels fire?
Were there any false negatives — things that should have alerted but did not?

Use the answers to improve your monitoring setup before the next incident. The goal is continuous improvement: each incident should make your monitoring more reliable than it was before.
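The first two review questions reduce to simple timestamp arithmetic. A minimal sketch, with hypothetical field names, assuming you can pull the three timestamps from your incident timeline:

```python
from datetime import datetime


def monitoring_review(started: datetime, detected: datetime, alert_received: datetime) -> dict:
    """Compute the two delays a post-incident monitoring review asks about."""
    if not (started <= detected <= alert_received):
        raise ValueError("timestamps must be in chronological order")
    return {
        "time_to_detect": detected - started,
        "time_to_notify": alert_received - detected,
    }
```

Tracking these two numbers across incidents shows whether changes to check intervals or alert routing are actually shortening the gap between failure and response.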
