
SLI, SLO, and SLA. Three acronyms that look similar, get used interchangeably by people who should know better, and represent meaningfully different things. If you're running a SaaS product and trying to think seriously about reliability, understanding the distinction matters.
This guide explains each concept clearly and shows how small SaaS teams can apply them without the overhead of a dedicated SRE team.
They build on each other. SLIs are the raw data. SLOs are your internal goals for that data. SLAs are the promises you make externally based on your confidence in meeting those goals.
An SLI is a quantitative measurement of some aspect of your service's behaviour. It answers the question: how is the service performing right now, and historically?
Common SLIs for SaaS products:
Availability — The percentage of time the service is accessible and returning successful responses. Typically measured as: (successful requests / total requests) × 100. A request returning a 500 error counts as a failure.
Latency — How long it takes for requests to complete. Usually expressed as a percentile: "99th percentile response time is under 400ms."
Error rate — The percentage of requests returning errors. The complement of availability: if availability is 99.7%, the error rate is 0.3%.
Throughput — Requests processed per second. Relevant when capacity is a concern.
Freshness — For data-driven features, how recent is the data? A dashboard that shows data from 3 hours ago when it should be real-time is failing a freshness SLI even if it's technically "up."
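The first three SLIs above can be computed directly from request logs. A minimal sketch, assuming a hypothetical log of (HTTP status, latency in ms) pairs — availability per the formula given, error rate as its complement, and latency via the nearest-rank 99th percentile:

```python
import math

# Hypothetical request log: (HTTP status, latency in ms) per request.
requests = [(200, 120), (200, 95), (500, 30), (200, 310), (200, 88),
            (200, 150), (503, 45), (200, 210), (200, 99), (200, 130)]

successes = sum(1 for status, _ in requests if status < 500)
availability = successes / len(requests) * 100   # (successful / total) x 100
error_rate = 100 - availability                  # complement of availability

latencies = sorted(ms for _, ms in requests)
p99_index = math.ceil(0.99 * len(latencies)) - 1  # nearest-rank percentile
p99_latency = latencies[p99_index]

print(f"availability: {availability:.1f}%")  # 80.0%
print(f"error rate:   {error_rate:.1f}%")    # 20.0%
print(f"p99 latency:  {p99_latency}ms")      # 310ms
```

Note the choice to count only 5xx responses as failures: a 404 usually reflects a bad request, not a broken service, so it shouldn't burn your availability.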
SLIs are measurements, not judgements. "We had 99.7% availability last month" is an SLI. "That's good enough" is an SLO judgement. "We guarantee 99.9%" is an SLA commitment.
An SLO is the internal target you set for an SLI. It answers: how good does our service need to be?
Examples:
Availability — 99.9% of requests succeed, measured over a rolling 30-day window.
Latency — 99% of requests complete in under 400ms.
Freshness — dashboard data is never more than 5 minutes stale.
SLOs are internal commitments — you set them to guide engineering decisions and prioritisation. Breaching an SLO is a signal that reliability work needs attention, not necessarily that customers are going to be compensated.
The concept of an error budget comes from SLOs. If your SLO is 99.9% availability, your monthly error budget is 0.1% of that month — about 43 minutes of allowed downtime.
Error budgets make reliability concrete. If you've used 80% of your error budget with two weeks left in the month, you know the rest of the month needs to be very clean. If you haven't used much of your budget, you have room to deploy aggressively or run experiments.
For small teams, formal error budget tracking may be overkill, but the concept is useful: how much availability can we afford to lose this month while still meeting our targets?
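Even the informal version of that question is easy to answer with arithmetic. A sketch of the 99.9% example above, where downtime_minutes_so_far is a hypothetical input from your monitoring:

```python
slo = 0.999
minutes_in_month = 30 * 24 * 60                 # 43,200 minutes in a 30-day month
budget_minutes = (1 - slo) * minutes_in_month   # ~43.2 minutes of allowed downtime

downtime_minutes_so_far = 35
budget_used = downtime_minutes_so_far / budget_minutes

print(f"budget: {budget_minutes:.1f} min, used: {budget_used:.0%}")
# With 35 minutes of downtime, ~81% of the budget is already spent.
```

The same two lines of arithmetic give the figures quoted earlier: at 99.99% the budget shrinks to roughly 4.3 minutes per month.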
A common mistake: setting aspirational SLOs you can't actually meet. An SLO of 99.99% means you can only be down for 4.4 minutes per month — achievable for large infrastructure teams, extremely difficult for a small team without redundancy.
Start by measuring your actual SLI data. Set your SLO based on what you can realistically achieve and sustain, then improve over time. An SLO you consistently miss is useless; one you consistently meet gives you a baseline to build from.
An SLA is a formal, contractual commitment to customers about service performance. Unlike SLOs (internal goals), SLAs have external consequences: service credits, refunds, contract clauses.
A typical SaaS SLA might read:
Domain Monitor guarantees 99.9% monthly uptime for the monitoring service. In the event of downtime below this threshold, affected customers are eligible for service credits of 10% of their monthly fee for each 0.1% below the guaranteed level, up to 50% of monthly fees.
SLAs should be more conservative than SLOs. If your SLO is 99.9%, your SLA might commit to 99.5% — giving yourself a buffer between the target you shoot for and the level you legally guarantee.
You don't want to be paying service credits because you just barely missed your internal target. The gap between SLO and SLA is your safety margin.
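The credit structure in the example SLA above is mechanical enough to express in a few lines. A sketch, assuming credits are calculated per full 0.1% step below the guaranteed level:

```python
def service_credit(uptime_pct: float, monthly_fee: float) -> float:
    """Credit per the example SLA: 10% of the monthly fee for each
    0.1% below 99.9% uptime, capped at 50% of the monthly fee."""
    guaranteed = 99.9
    if uptime_pct >= guaranteed:
        return 0.0
    # Round to dodge floating-point noise in the subtraction.
    shortfall_steps = round((guaranteed - uptime_pct) / 0.1, 6)
    credit_pct = min(shortfall_steps * 10, 50)
    return monthly_fee * credit_pct / 100

service_credit(99.9, 100)   # 0.0  -- SLA met, no credit
service_credit(99.6, 100)   # 30.0 -- three 0.1% steps below
service_credit(98.0, 100)   # 50.0 -- capped at half the monthly fee
```

Running the numbers like this before publishing an SLA is worth doing: the cap determines your worst-case exposure in a catastrophic month.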
Do you need an SLA at all? Not always. Many early-stage SaaS products operate without formal SLAs, and customers accept that. As you move upmarket — particularly to enterprise customers — SLAs become expected. Procurement processes often require them.
If you're not ready for formal SLAs, a public commitment on your website ("we target 99.9% uptime, tracked on our status page") gives customers visibility without creating legal obligations.
You can't set meaningful SLOs without measurement. At minimum: uptime monitoring on your key endpoints, response time measurement, and error tracking in your application.
Run your monitoring for 30–90 days before setting targets. See what you actually achieve. Use that baseline to set realistic SLOs.
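One way to turn that baseline into a target is to take your worst observed month and round down to the nearest standard tier. A sketch with hypothetical measured values:

```python
# Three months of measured availability SLI data (hypothetical).
monthly_availability = [99.82, 99.91, 99.74]

baseline = min(monthly_availability)   # worst observed month: 99.74
# Standard SLO tiers; pick the tightest one you actually cleared.
candidates = [99.0, 99.5, 99.9, 99.95, 99.99]
slo = max(c for c in candidates if c <= baseline)

print(slo)  # 99.5 -- a target this team can sustain, to be tightened later
```

Using the worst month rather than the average is deliberately conservative: it's the month you'd have breached in.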
For a small SaaS, two or three SLOs are plenty: an availability target for your core service, a latency target for your API or key user journeys, and, if you have data-driven features, a freshness target.
Review them quarterly. As your reliability improves, tighten them.
A public status page shows your uptime history and active incidents. It's the foundation of transparency with customers and supports any SLA claims. See how to create a public status page.
Once your SLIs are measured, your SLOs are consistently met, and customers are asking — introduce SLAs. Set them below your SLOs, define the credit structure, and put it in your terms of service.
For more on SLAs and uptime reporting, see uptime SLA guide, what does 99.9% uptime really mean, and how to interpret uptime reports.