
When your uptime monitoring fires an alert at 2am, someone needs to be ready to respond. On-call management is the system that ensures there's always a designated person ready to handle incidents — with clear escalation paths when the primary responder can't be reached.
For any production service that requires high availability, someone needs to be reachable 24/7. Without a formal on-call system:
An on-call system makes the responsibility explicit, fair, and well-understood.
An on-call rotation is a schedule defining who is the designated incident responder at any given time. Common rotation patterns:
Weekly rotation: One person is primary on-call for a week at a time. Simple to schedule, but can be exhausting for the on-call person.
Daily handoff: On-call shifts change daily. More complex to schedule but distributes the burden more evenly.
Follow-the-sun: For global teams, on-call shifts align with working hours in each time zone: the European team covers European hours and the US team covers US hours. No one is on call outside their working day.
Pooled rotation: A group of on-call engineers share responsibility, rotating primary and secondary positions.
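As a rough illustration of the weekly pattern, the sketch below computes who is primary and secondary for any given date. The roster names, the rotation start date, and the choice to make next week's primary this week's secondary are all assumptions made for the example, not something a particular tool prescribes.

```python
from datetime import date

# Hypothetical roster; the names are placeholders.
ENGINEERS = ["alice", "bob", "carol", "dave"]

# Assumed anchor: a Monday on which the first engineer's week begins.
ROTATION_START = date(2024, 1, 1)


def on_call_for(day, roster=ENGINEERS, start=ROTATION_START):
    """Return (primary, secondary) for the week containing `day`."""
    # Whole weeks elapsed since the rotation started.
    weeks_elapsed = (day - start).days // 7
    primary = roster[weeks_elapsed % len(roster)]
    # Assumption: next week's primary backs up this week's primary.
    secondary = roster[(weeks_elapsed + 1) % len(roster)]
    return primary, secondary


if __name__ == "__main__":
    print(on_call_for(date(2024, 1, 10)))
```

A daily handoff could use the same idea by counting days instead of weeks.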
Most on-call systems have at least two tiers:
Primary on-call: Receives initial alerts. Expected to acknowledge within 5-10 minutes and begin investigation.
Secondary on-call: Receives escalated alerts if the primary doesn't acknowledge within the escalation timeout. Backup when the primary is unreachable.
This two-tier system prevents alerts from going unacknowledged — if the primary is asleep with phone on silent, the secondary catches it.
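The two-tier logic can be sketched as a small function that decides who should be paged at a given moment. The 10-minute timeout is an assumed value taken from the upper end of the acknowledgement window above, and the function name and parameters are hypothetical.

```python
# Assumed escalation timeout, in minutes (upper end of the 5-10 minute window).
PRIMARY_ACK_TIMEOUT_MIN = 10


def page_targets(minutes_since_alert, acknowledged, primary, secondary):
    """Return the list of people who should currently be paged."""
    if acknowledged:
        # Someone has taken ownership; stop paging.
        return []
    if minutes_since_alert < PRIMARY_ACK_TIMEOUT_MIN:
        # Still inside the primary's acknowledgement window.
        return [primary]
    # Timeout passed without acknowledgement: bring in the secondary too.
    return [primary, secondary]
```

Continuing to page the primary after escalation is a design choice in this sketch; some teams page only the secondary once the timeout passes.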
Configure your uptime monitoring to deliver alerts at the right severity to the right people:
| Severity | Initial Alert | Escalation |
|---|---|---|
| P1 (complete outage) | SMS to primary on-call | After 5 min: SMS to secondary |
| P2 (major degradation) | SMS to primary on-call | After 10 min: Slack to team |
| P3 (partial issue) | Slack to team | Manual escalation if needed |
| P4 (minor) | Next business day | |
The downtime alerts guide covers configuring multiple recipients and alert channels.
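One way to keep routing rules like those in the table reviewable is to store them as data. The sketch below is a hypothetical illustration, not the configuration format of any particular monitoring tool; the `Route` class and the channel strings such as `sms:primary` are invented for the example. P4 is left out because it has no automatic escalation to model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    initial: str                       # channel for the first alert
    escalate_after_min: Optional[int]  # None means manual escalation only
    escalation: Optional[str]          # channel used once the timeout passes


# Values mirror the P1-P3 rows of the severity table.
ROUTES = {
    "P1": Route("sms:primary", 5, "sms:secondary"),
    "P2": Route("sms:primary", 10, "slack:team"),
    "P3": Route("slack:team", None, None),
}


def notifications(severity, minutes_unacked):
    """Return the channels that should have been notified so far."""
    route = ROUTES[severity]
    targets = [route.initial]
    timeout = route.escalate_after_min
    if timeout is not None and minutes_unacked >= timeout:
        targets.append(route.escalation)
    return targets
```

For example, `notifications("P2", 7)` returns only the initial SMS target, because the assumed 10-minute P2 timeout has not yet elapsed.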
An escalation policy defines what happens when alerts aren't acknowledged:
Escalation ensures critical incidents always get a response, even when individual people are unreachable.
On-call duty is stressful. Teams that don't manage it well experience:
Mitigation strategies:
When the on-call engineer is paged at 3am, they shouldn't need to remember everything about the system. Write runbooks for your most common incidents:
Good runbooks reduce mean time to recovery dramatically. See also: incident response plan template.
| Tool | What It Provides |
|---|---|
| PagerDuty | Full on-call rotation, escalation, incident management |
| OpsGenie | On-call scheduling, alerts, escalation |
| VictorOps (Splunk) | Incident response platform with on-call features |
| Better Uptime | Built-in on-call with monitoring |
| Domain Monitor | Monitoring + configurable multi-contact alerting |
For small teams, configuring multiple alert recipients with priority escalation in Domain Monitor handles basic on-call routing without a dedicated tool. As the team grows, dedicated on-call tools provide more sophisticated rotation management.
If your team currently handles incidents informally ("whoever notices the alert deals with it"), transitioning to formal on-call:
The transition is uncomfortable but worth it — clarity about who is responsible reduces both response time and team stress.
Set up alert routing for your on-call team at Domain Monitor.