
If you've committed to a 99.9% uptime SLA, you've implicitly created an error budget: 0.1% of the time, or about 8.76 hours per year, during which your service is allowed to be unavailable. How you spend that budget determines how quickly you can ship features and how much risk you can take with deployments.
Error budgets are a concept from Site Reliability Engineering (SRE) popularised by Google, and they provide a systematic framework for making reliability vs. velocity trade-offs.
An error budget is the maximum amount of unreliability you're willing to tolerate in a given period, derived directly from your uptime target.
Error budget = 1 − SLA target
For common SLA targets:
| Uptime SLA | Error Budget (Annual) | Error Budget (Monthly) |
|---|---|---|
| 99% | 3.65 days | 7.3 hours |
| 99.5% | 1.83 days | 3.65 hours |
| 99.9% | 8.76 hours | 43.8 minutes |
| 99.95% | 4.38 hours | 21.9 minutes |
| 99.99% | 52.6 minutes | 4.38 minutes |
See "What does 99.9% uptime really mean?" for a more detailed breakdown of what these numbers mean in practice.
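The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch, assuming a 365-day year and a month of one twelfth of a year (730 hours), which is what the table's values imply; the function name `error_budget` is illustrative and not part of any tool.

```python
from datetime import timedelta

# Assumed period lengths: a 365-day year and an average month (730 hours).
YEAR = timedelta(days=365)
MONTH = YEAR / 12


def error_budget(sla_percent: float, period: timedelta) -> timedelta:
    """Allowed downtime for an uptime target over a period: (1 - target) * period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period * (1 - sla_percent / 100)


for target in (99.0, 99.5, 99.9, 99.95, 99.99):
    annual = error_budget(target, YEAR)
    monthly = error_budget(target, MONTH)
    print(
        f"{target}%: {annual.total_seconds() / 3600:.2f} h/year, "
        f"{monthly.total_seconds() / 60:.1f} min/month"
    )
```

If your SLA is measured over a different window, such as a rolling 30 days, pass that window as `period` instead.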
The key insight of the error budget model is this: if you're not spending your error budget, you're being too conservative. An organisation with a 99.9% SLA that achieves 99.99% uptime has "left error budget on the table" — they could have shipped more features, experimented more aggressively, or deployed more frequently.
Error budgets give product and engineering teams a shared language for reliability trade-offs:
This removes the subjective argument of "is it safe to ship?" and replaces it with a data-driven answer: "how much budget do we have left?"
Organisations with mature reliability practices document their error budget policies explicitly:
Your uptime SLA is your starting point. If you haven't formally defined one, start with a realistic target based on your infrastructure and team capacity.
This is where website uptime monitoring becomes essential. You can't track error budget consumption without accurate uptime data.
Your monitoring tool's reports give you:
If your monthly error budget for a 99.9% SLA is 43.8 minutes, and you had 12 minutes of downtime this month:
Budget consumed: 12 / 43.8 = 27.4%
Budget remaining: 72.6%
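The same arithmetic as a small Python helper, in case you would rather script it than work it out by hand. The function name `budget_status` is an assumption made for this sketch.

```python
def budget_status(downtime_minutes: float, budget_minutes: float) -> tuple[float, float]:
    """Return (consumed %, remaining %) of an error budget."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    consumed = downtime_minutes / budget_minutes * 100
    # Remaining goes negative once the budget is overspent.
    return consumed, 100 - consumed


consumed, remaining = budget_status(downtime_minutes=12, budget_minutes=43.8)
print(f"Budget consumed: {consumed:.1f}%")    # 27.4%
print(f"Budget remaining: {remaining:.1f}%")  # 72.6%
```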
Track your error budget consumption week over week. A budget that's 80% consumed in week 2 of 4 is a signal to slow down.
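One way to turn that weekly check into a number is to compare the share of budget consumed with the share of the period elapsed. The sketch below assumes an even-pace baseline (budget spent in proportion to time), which is a simplification chosen for illustration; the name `burn_ratio` is hypothetical.

```python
def burn_ratio(consumed_fraction: float, elapsed_fraction: float) -> float:
    """Ratio of budget consumed to time elapsed.

    A value above 1.0 means the budget is being spent faster than an
    even pace would allow; below 1.0 means there is headroom.
    """
    if not 0 < elapsed_fraction <= 1:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    return consumed_fraction / elapsed_fraction


# The example from the text: 80% of the budget consumed by week 2 of 4.
ratio = burn_ratio(consumed_fraction=0.80, elapsed_fraction=2 / 4)
if ratio > 1:
    print(f"Burning budget {ratio:.1f}x faster than an even pace: slow down")
else:
    print(f"On pace ({ratio:.1f}x)")
```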
Error budgets originated in large-scale SRE organisations like Google, but the concept scales down to small teams and products.
A simplified approach for smaller organisations:
You don't need a formal error budget policy to benefit from this thinking. Simply knowing that you have X minutes of allowed downtime this month — and knowing how much you've used — changes how you make deployment decisions.
Any period where your service is unavailable or degraded below your SLA threshold consumes error budget. This includes:
Your uptime monitoring records all of these, providing the data you need for accurate budget calculations.
Error budget consumption is the product of incident frequency and incident duration. Reducing either reduces budget consumption:
This is why MTTR and error budget tracking work together — minimising MTTR directly reduces error budget consumption.
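The relationship can be sketched in a couple of lines: because MTTR is a mean, the number of incidents multiplied by MTTR gives total downtime. The incident counts and recovery times below are made-up values for illustration only, and `expected_downtime_minutes` is a hypothetical helper.

```python
MONTHLY_BUDGET_MINUTES = 43.8  # monthly budget for a 99.9% SLA, from the table above


def expected_downtime_minutes(incidents: int, mttr_minutes: float) -> float:
    """Total downtime = incident count x mean time to recovery."""
    return incidents * mttr_minutes


# Hypothetical scenario: three incidents in a month, before and after halving MTTR.
for mttr in (20, 10):
    downtime = expected_downtime_minutes(incidents=3, mttr_minutes=mttr)
    share = downtime / MONTHLY_BUDGET_MINUTES * 100
    status = "within budget" if downtime <= MONTHLY_BUDGET_MINUTES else "over budget"
    print(f"MTTR {mttr} min -> {downtime:.0f} min downtime ({share:.0f}% of budget, {status})")
```

With the same three incidents, a 20-minute MTTR overspends the budget while a 10-minute MTTR stays inside it.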
You don't need complex tooling to start tracking error budgets. A monitoring tool like Domain Monitor provides the uptime data you need. A simple spreadsheet can calculate budget consumption from that data.
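If you prefer a script to a spreadsheet, the same calculation is only a few lines of Python. This sketch assumes you can get outage start and end times out of your monitoring tool in some form; the record layout and the sample timestamps are hypothetical and do not reflect any particular tool's export format.

```python
from datetime import datetime, timedelta

# Hypothetical outage records as (start, end) pairs; sample data only.
outages = [
    (datetime(2024, 3, 4, 9, 15), datetime(2024, 3, 4, 9, 22)),      # 7 minutes
    (datetime(2024, 3, 18, 22, 40), datetime(2024, 3, 18, 22, 45)),  # 5 minutes
]

MONTHLY_BUDGET = timedelta(minutes=43.8)  # 99.9% SLA


def total_downtime(records) -> timedelta:
    """Sum the duration of every (start, end) outage record."""
    return sum((end - start for start, end in records), timedelta())


used = total_downtime(outages)
consumed_pct = used / MONTHLY_BUDGET * 100  # timedelta / timedelta gives a float
print(f"Downtime this month: {used.total_seconds() / 60:.0f} minutes")
print(f"Budget consumed: {consumed_pct:.1f}%")
```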
As your reliability practices mature, dedicated SLO tracking tools (Grafana SLO, Datadog SLOs, Google Cloud SLOs) can automate the calculation and alerting.
Track your uptime data with precision at Domain Monitor.