
Setting up uptime monitoring is straightforward. Setting it up well — in a way that catches real problems fast, minimises false alarms, and gives you useful data for improving reliability — takes a bit more thought.
This guide covers the best practices that separate effective monitoring from monitoring that generates noise and gets ignored.
The most common monitoring mistake is checking the wrong thing. A monitor on your server's IP address bypasses DNS. A monitor pointing at your homepage's HTML doesn't test your API. A monitor with no content verification doesn't catch blank pages caused by database failures.
Best practice: Monitor the URL your users use, from outside your infrastructure, checking for the content or response code that confirms the service is working.
For most applications, this means:
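As one illustration, a check that verifies both the response code and the page content might look like the sketch below. The function names, the 10-second timeout, and the idea of matching a fixed piece of text are assumptions made for this example, not features of any particular monitoring product.

```python
import http.client
import urllib.error
import urllib.request


def evaluate_response(status: int, body: str, expected_text: str) -> bool:
    """Pass only if the status is 200 and the expected text is in the body."""
    return status == 200 and expected_text in body


def check_url(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Fetch the public URL (hostname, not IP) and verify status and content."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expected_text)
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, or an HTTP error status.
        return False
```

Because the check goes through the hostname, a DNS failure counts as a failed check, and a blank page returned with a 200 status fails the content test.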
Check frequency determines your maximum detection time — the worst-case gap between when a failure starts and when you're alerted.
| Check Interval | Max Detection Time | Best For |
|---|---|---|
| 30 seconds | 30 seconds | High-traffic production, SLA-sensitive |
| 1 minute | 1 minute | Standard production applications |
| 5 minutes | 5 minutes | Internal tools, lower-criticality |
| 15+ minutes | 15+ minutes | Development/staging environments |
For most production websites, 1-minute checks provide an excellent balance of responsiveness and cost. For revenue-critical e-commerce or high-stakes SaaS applications, 30-second checks are justified.
For more detail on this decision, see the guide on how to choose your monitoring check frequency.
A monitor checking from a single location gives you a partial picture. If that location has a network problem, you get false positives. If your site is down only in certain regions, you miss the incident entirely.
Best practice: Check from at least 3 geographically distributed locations. This provides:
Multi-location uptime monitoring is covered in depth in its own guide.
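To make the idea concrete, here is a minimal sketch of how results from several locations could be combined into one status. The location names and the three status labels are hypothetical. A "partial" result is deliberately ambiguous: it may be a regional outage or a network problem at the monitoring location, so it is worth investigating but is not proof that the site is down everywhere.

```python
def classify(results: dict[str, bool]) -> str:
    """Combine per-location results (True = check passed) into one status."""
    failed = [location for location, ok in results.items() if not ok]
    if not failed:
        return "up"
    if len(failed) == len(results):
        # Every location agrees, so a monitor-side network blip is unlikely.
        return "down"
    # Some locations fail: regional outage or a problem at those locations.
    return "partial"


status = classify({"london": True, "new-york": False, "singapore": True})
```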
A single failed check can be caused by a momentary network blip between the monitoring server and your site — not a real outage. Without confirmation counts, you'll receive alerts for transient failures that resolve in seconds.
Best practice: Require 2 consecutive failures before alerting.
With 1-minute checks and 2-failure confirmation:
If you use 30-second checks with a 3-failure confirmation, you detect real outages within 90 seconds — fast enough for almost any use case.
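The confirmation logic and the detection-time arithmetic can be sketched as follows. The class and function names are made up for this example. The worst case assumes the failure begins immediately after a passing check, so the first failed check arrives one full interval later.

```python
class FailureConfirmation:
    """Fire an alert only after N consecutive failed checks."""

    def __init__(self, required_failures: int = 2) -> None:
        self.required_failures = required_failures
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if check_passed:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, on the check that reaches the threshold.
        return self.consecutive_failures == self.required_failures


def worst_case_detection_seconds(interval_seconds: int, required_failures: int) -> int:
    """Worst-case time from failure start to alert."""
    return interval_seconds * required_failures
```

With these definitions, `worst_case_detection_seconds(30, 3)` gives 90 and `worst_case_detection_seconds(60, 2)` gives 120, and a single failed check followed by a pass never triggers an alert.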
SSL certificate expiry is one of the most preventable causes of website downtime. An expired certificate causes browsers to block access with a security warning — effectively taking your site offline for most users.
Best practice: Set up SSL certificate monitoring with alerts at:
Most certificate setups (including Let's Encrypt, which is designed to be renewed automatically by an ACME client) handle renewal without manual work, but automation can fail. Early warnings give you time to intervene before expiry.
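A rough sketch of an expiry check using Python's standard `ssl` module is shown below. The function names are illustrative, and the warning threshold is left as a parameter because the right values depend on your own renewal process.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    ctx = ssl.create_default_context()
    # Note: an already expired certificate fails verification here and raises
    # ssl.SSLCertVerificationError, which should itself be treated as an alert.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter time."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def should_warn(days_left: int, warn_days: int) -> bool:
    """True once the certificate is within `warn_days` of expiry."""
    return days_left <= warn_days
```

Typical use would be `should_warn(days_until_expiry(fetch_not_after("example.com"), datetime.now(timezone.utc)), warn_days)`, run once a day for each hostname.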
Domain expiry can be even more damaging than SSL expiry: once a domain lapses, it can eventually be registered by someone else. Domain expiry monitoring with 60-day advance alerts gives you ample time to renew before that happens.
Good monitoring with bad alerting is still ineffective. Alerts need to reach the right person through the right channel at the right time.
Best practice for alert routing:
Avoid routing all alerts to a shared email inbox — critical alerts get buried. Use dedicated channels for monitoring alerts.
See the full guide on how to set up downtime alerts.
Planned maintenance — deployments, database migrations, infrastructure work — should not generate alerts. Schedule maintenance windows to suppress alerts during these periods.
Without maintenance windows:
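Alert suppression during a scheduled window can be sketched like this. The `MaintenanceWindow` type and function names are hypothetical, and the example assumes timezone-aware UTC timestamps throughout. Checks keep running during the window; only the alert is suppressed.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class MaintenanceWindow(NamedTuple):
    start: datetime  # inclusive
    end: datetime    # exclusive


def in_maintenance(now: datetime, windows: list[MaintenanceWindow]) -> bool:
    """True if `now` falls inside any scheduled window."""
    return any(w.start <= now < w.end for w in windows)


def should_alert(check_failed: bool, now: datetime, windows: list[MaintenanceWindow]) -> bool:
    """Alert on a failure unless it happens during planned maintenance."""
    return check_failed and not in_maintenance(now, windows)


windows = [
    MaintenanceWindow(
        start=datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc),
        end=datetime(2025, 1, 10, 3, 0, tzinfo=timezone.utc),
    )
]
```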
Your website depends on services beyond your own code. A failure in any of these causes your site to fail, even if your application is perfectly healthy:
Best practice: Monitor the health endpoints of critical third-party services. Many major services publish public status pages — subscribe to these.
For your own dependencies, set up a separate monitor for each one. The guide on monitoring third-party API dependencies explains this approach.
Uptime monitoring is not a set-and-forget operation. Review your monitoring setup quarterly:
Create a runbook that documents:
This is invaluable during incidents when you're stressed and need to act quickly. It also ensures the setup survives team changes.
Only monitoring your homepage: Critical API failures often don't affect the homepage. Monitor key endpoints separately.
Using only email alerts: Email isn't reliable enough for immediate incident notification. Use SMS or push notifications for downtime.
Setting thresholds too aggressively: 1-failure confirmation on a 30-second interval generates noise. Tune for signal-to-noise ratio.
Ignoring SSL and domain monitoring: These are entirely preventable failures. Set up the warnings.
Never testing your alerting: If alerts are never tested, you discover a routing misconfiguration only during a real incident. Send a test alert through every channel after each change.
Implement these best practices with Domain Monitor — uptime monitoring with multi-location checks, SMS alerts, SSL monitoring, and maintenance windows.