
SLI vs SLO vs SLA: A Practical Guide for Small SaaS Teams

SLI, SLO, and SLA. Three acronyms that look similar, get used interchangeably by people who should know better, and represent meaningfully different things. If you're running a SaaS product and trying to think seriously about reliability, understanding the distinction matters.

This guide explains each concept clearly and shows how small SaaS teams can apply them without the overhead of a dedicated SRE team.

The One-Sentence Definitions

  • SLI (Service Level Indicator) — A measurement of how your service is actually performing
  • SLO (Service Level Objective) — A target you set for that measurement
  • SLA (Service Level Agreement) — A contractual commitment to customers, with consequences if you breach it

They build on each other. SLIs are the raw data. SLOs are your internal goals for that data. SLAs are the promises you make externally based on your confidence in meeting those goals.


SLI: Service Level Indicator

An SLI is a quantitative measurement of some aspect of your service's behaviour. It answers the question: how is the service performing right now, and historically?

Common SLIs for SaaS products:

Availability — The percentage of time the service is accessible and returning successful responses. Typically measured as: (successful requests / total requests) × 100. A request returning a 500 error counts as a failure.
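
A minimal Python sketch of the availability formula above. It assumes you already have the HTTP status codes for the measurement window; the function name `availability_pct` is illustrative. Following the example in the text, only 5xx responses count as failures. Treating 4xx client errors as successes is a common choice, but not a universal one.

```python
from typing import Optional, Sequence


def availability_pct(status_codes: Sequence[int]) -> Optional[float]:
    """Availability SLI: successful requests / total requests x 100.

    Only 5xx server errors count as failures here; 4xx client errors
    are treated as successful responses (an assumption, adjust to taste).
    Returns None when there were no requests in the window.
    """
    total = len(status_codes)
    if total == 0:
        return None
    failures = sum(1 for code in status_codes if 500 <= code <= 599)
    return (total - failures) / total * 100


# Four requests, one of which returned a 500:
print(availability_pct([200, 200, 500, 404]))  # 75.0
```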

Latency — How long it takes for requests to complete. Usually expressed as a percentile: "99th percentile response time is under 400ms."
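
One way to compute a latency percentile is the nearest-rank method, sketched below in Python. The helper name `percentile` and the sample values are hypothetical. Monitoring and APM tools may use interpolating methods instead, which give slightly different numbers on small samples.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies = [120, 80, 95, 300, 110]  # response times in ms
print(percentile(latencies, 95))  # 300
```

With only five samples, the 95th percentile is simply the slowest request. Percentiles start to mean something once you have hundreds or thousands of requests in the window.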

Error rate — The percentage of requests returning errors. It is the complement of availability: if 99.7% of requests succeed, the error rate is 0.3%.

Throughput — Requests processed per second. Relevant when capacity is a concern.

Freshness — For data-driven features, how recent is the data? A dashboard that shows data from 3 hours ago when it should be real-time is failing a freshness SLI even if it's technically "up."
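
A freshness check can be as simple as comparing the timestamp of the newest data point with a maximum acceptable age. The sketch below assumes timezone-aware timestamps. The 5-minute threshold for "real-time" is a hypothetical example, not a standard. A freshness SLI is then the share of checks over a period that pass.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_updated: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True if the newest data point is no older than max_age.

    Both datetimes must be timezone-aware so the subtraction is valid.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
three_hours_ago = now - timedelta(hours=3)
print(is_fresh(three_hours_ago, timedelta(minutes=5), now))  # False
```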

What SLIs Aren't

SLIs are measurements, not judgements. "We had 99.7% availability last month" is an SLI. "That's good enough" is an SLO judgement. "We guarantee 99.9%" is an SLA commitment.


SLO: Service Level Objective

An SLO is the internal target you set for an SLI. It answers: how good does our service need to be?

Examples:

  • Availability SLO: 99.9% uptime measured over a rolling 30-day window
  • Latency SLO: 95th percentile response time under 300ms
  • Error rate SLO: fewer than 0.1% of API requests return 5xx errors
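
The "rolling 30-day window" in the first example means the figure is always computed over the most recent 30 days, not over a calendar month. Below is a minimal sketch that assumes you store one (successful, total) request count per day. The class name is hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingAvailability:
    """Availability over the most recent `window_days` days of request counts."""

    def __init__(self, window_days: int = 30) -> None:
        # Each entry is (successful_requests, total_requests) for one day;
        # deque(maxlen=...) drops the oldest day automatically.
        self.days: Deque[Tuple[int, int]] = deque(maxlen=window_days)

    def add_day(self, successful: int, total: int) -> None:
        self.days.append((successful, total))

    def availability_pct(self) -> Optional[float]:
        total = sum(t for _, t in self.days)
        if total == 0:
            return None
        successful = sum(s for s, _ in self.days)
        return successful / total * 100


window = RollingAvailability(window_days=30)
for _ in range(29):
    window.add_day(successful=10_000, total=10_000)
window.add_day(successful=9_000, total=10_000)  # one bad day
print(round(window.availability_pct(), 3))  # 99.667
```

This version weights by request volume, so a busy day counts for more than a quiet one. Counting minutes up versus minutes down is the usual alternative when your uptime monitor, rather than your logs, is the data source.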

SLOs are internal commitments — you set them to guide engineering decisions and prioritisation. Breaching an SLO is a signal that reliability work needs attention, not necessarily that customers are going to be compensated.

Error Budgets

The concept of an error budget comes from SLOs. If your SLO is 99.9% availability, your monthly error budget is 0.1% of that month — about 43 minutes of allowed downtime.
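
This is where the 43-minute figure comes from, assuming a 30-day window (30 × 24 × 60 = 43,200 minutes):

```latex
\begin{aligned}
\text{error budget} &= (1 - \text{SLO}) \times \text{window length} \\
&= (1 - 0.999) \times (30 \times 24 \times 60)\ \text{min} \\
&= 0.001 \times 43\,200\ \text{min} = 43.2\ \text{min}
\end{aligned}
```

The same formula gives 4.32 minutes for a 99.99% SLO. A 31-day month stretches the 99.9% budget to 44.64 minutes.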

Error budgets make reliability concrete. If you've used 80% of your error budget with two weeks left in the month, you know the rest of the month needs to be very clean. If you haven't used much of your budget, you have room to deploy aggressively or run experiments.
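
The "80% used with two weeks left" situation can be checked mechanically by comparing the fraction of budget spent with the fraction of the window that has passed. The function below is a hypothetical sketch, assuming downtime is tracked in minutes over a 30-day window.

```python
def budget_status(slo_pct: float, downtime_min: float,
                  days_elapsed: float, window_days: int = 30) -> dict:
    """Compare how much error budget is spent with how much of the window has passed."""
    budget_min = (100 - slo_pct) / 100 * window_days * 24 * 60
    consumed = downtime_min / budget_min  # fraction of budget used
    elapsed = days_elapsed / window_days  # fraction of window gone
    return {
        "budget_min": budget_min,
        "consumed": consumed,
        "elapsed": elapsed,
        # Spending budget faster than time passes means the rest of the
        # window has to be cleaner than the first part was.
        "burning_faster_than_time": consumed > elapsed,
    }


status = budget_status(slo_pct=99.9, downtime_min=34.5, days_elapsed=16)
print(f"{status['consumed']:.0%} of budget used, "
      f"{status['elapsed']:.0%} of the month elapsed")
# 80% of budget used, 53% of the month elapsed
```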

For small teams, formal error budget tracking may be overkill, but the concept is useful: how much availability can we afford to lose this month while still meeting our targets?

Setting Realistic SLOs

A common mistake: setting aspirational SLOs you can't actually meet. An SLO of 99.99% allows only about 4.3 minutes of downtime per 30-day month. That is achievable for large infrastructure teams, but extremely difficult for a small team without redundancy.

Start by measuring your actual SLI data. Set your SLO based on what you can realistically achieve and sustain, then improve over time. An SLO you consistently miss is useless; one you consistently meet gives you a baseline to build from.


SLA: Service Level Agreement

An SLA is a formal, contractual commitment to customers about service performance. Unlike SLOs (internal goals), SLAs have external consequences: service credits, refunds, contract clauses.

A typical SaaS SLA might read:

Domain Monitor guarantees 99.9% monthly uptime for the monitoring service. If monthly uptime falls below this threshold, affected customers are eligible for service credits of 10% of their monthly fee for each 0.1% below the guaranteed level, up to 50% of monthly fees.
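
Here is a sketch of how the credit clause above could be computed. The wording doesn't say how a partial 0.1% shortfall is treated, so this version assumes any started step counts as a full step, which means any breach earns at least 10%. It uses `Decimal` because plain floats can land just above or below a step boundary. The function and constant names are hypothetical.

```python
from decimal import ROUND_CEILING, Decimal

GUARANTEED_UPTIME = Decimal("99.9")  # SLA commitment, percent
STEP = Decimal("0.1")                # each 0.1% of shortfall...
CREDIT_PER_STEP = Decimal("10")      # ...earns 10% of the monthly fee
MAX_CREDIT = Decimal("50")           # capped at 50% of the monthly fee


def sla_credit_percent(measured_uptime: str) -> Decimal:
    """Service credit (percent of monthly fee) for a measured monthly uptime."""
    uptime = Decimal(measured_uptime)
    if uptime >= GUARANTEED_UPTIME:
        return Decimal("0")
    shortfall = GUARANTEED_UPTIME - uptime
    # Assumption: a partial step rounds up to a full step.
    steps = (shortfall / STEP).to_integral_value(rounding=ROUND_CEILING)
    return min(steps * CREDIT_PER_STEP, MAX_CREDIT)


print(sla_credit_percent("99.7"))  # 20
```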

SLA vs SLO: The Relationship

SLAs should be more conservative than SLOs. If your SLO is 99.9%, your SLA might commit to 99.5% — giving yourself a buffer between the target you shoot for and the level you legally guarantee.
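
Expressing the buffer in minutes makes it easier to reason about. The sketch below uses the 99.9% SLO and 99.5% SLA from the example and assumes a 30-day window. The helper names are hypothetical.

```python
def allowed_downtime_min(target_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime a target permits over the window."""
    return (100 - target_pct) / 100 * window_days * 24 * 60


def sla_margin_min(slo_pct: float, sla_pct: float, window_days: int = 30) -> float:
    """Extra downtime you can absorb after missing the SLO before breaching the SLA."""
    if sla_pct > slo_pct:
        raise ValueError("the SLA should not be stricter than the SLO")
    return (allowed_downtime_min(sla_pct, window_days)
            - allowed_downtime_min(slo_pct, window_days))


print(f"{sla_margin_min(slo_pct=99.9, sla_pct=99.5):.1f} minutes of margin")
# 172.8 minutes of margin
```

In other words, a 99.5% SLA permits 216 minutes of downtime against the SLO's 43.2, so you could miss your internal target by nearly three hours before owing anyone a credit.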

You don't want to be paying service credits because you just barely missed your internal target. The gap between SLO and SLA is your safety margin.

Do Small SaaS Teams Need SLAs?

Not always. Many early-stage SaaS products operate without formal SLAs and customers accept that. As you move upmarket — particularly to enterprise customers — SLAs become expected. Procurement processes often require them.

If you're not ready for formal SLAs, a public commitment on your website ("we target 99.9% uptime, tracked on our status page") gives customers visibility without creating legal obligations.


Practical Implementation for Small Teams

Step 1: Instrument Your SLIs

You can't set meaningful SLOs without measurement. At minimum:

  • Uptime monitoring — Use an external service that checks your application every minute from multiple locations. Domain Monitor records uptime percentages and provides historical reports — create a free account to start tracking.
  • Error rate logging — Log HTTP response codes and track 5xx error rates
  • Latency monitoring — Most APM tools and some uptime monitors record response times
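
To make it concrete what a single uptime check records, here is a minimal sketch using only Python's standard library. It runs one request from one location. A real monitor repeats this every minute from several regions and stores the results. Counting only 2xx and 3xx responses as "up" is an assumption. The names `run_check` and `fetch` are hypothetical, and `fetch` can be swapped out for testing.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # None if no HTTP response was received
    latency_ms: float


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch `url` and return the final HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code


def run_check(url: str, fetch: Callable[[str], int] = http_status) -> CheckResult:
    status: Optional[int] = None
    start = time.monotonic()
    try:
        status = fetch(url)
    except OSError:
        # DNS failures, refused connections and timeouts
        # (URLError and TimeoutError are both OSError subclasses).
        pass
    latency_ms = (time.monotonic() - start) * 1000
    ok = status is not None and 200 <= status < 400
    return CheckResult(ok=ok, status=status, latency_ms=latency_ms)


# Example (needs network access; the URL is a placeholder):
# print(run_check("https://example.com/health"))
```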

Step 2: Establish Baseline SLIs

Run your monitoring for 30–90 days before setting targets. See what you actually achieve. Use that baseline to set realistic SLOs.
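
One simple, hypothetical heuristic for turning a baseline into a target is to pick the tightest value from a short list of common targets that every baseline month actually met. The candidate list below is an assumption, and so is the rule itself.

```python
from typing import Optional, Sequence

# Candidate targets, tightest first (an assumed ladder of common values).
CANDIDATE_SLOS = (99.99, 99.95, 99.9, 99.5, 99.0)


def suggest_slo(monthly_availability_pct: Sequence[float]) -> Optional[float]:
    """Return the tightest candidate SLO that every baseline month met."""
    if not monthly_availability_pct:
        raise ValueError("collect some baseline data first")
    worst = min(monthly_availability_pct)
    for target in CANDIDATE_SLOS:
        if worst >= target:
            return target
    return None  # baseline is below every candidate: fix reliability first


print(suggest_slo([99.97, 99.93, 99.96]))  # 99.9
```

Basing the choice on the worst month rather than the average fits the advice to pick a level you can sustain, not one you hit on a good month.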

Step 3: Set SLOs at the Right Granularity

For a small SaaS, two or three SLOs are plenty:

  • Availability: 99.9% monthly (allows ~43 minutes of downtime per month)
  • API latency: 95th percentile under 500ms
  • Error rate: under 0.5% of requests
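
A monthly or quarterly review can start from a simple pass/fail check of the measured SLIs against these three targets. The sketch below hard-codes the thresholds from the list above. The data class and function names are hypothetical, and the measured values are assumed to come from your monitoring.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MonthlySLIs:
    availability_pct: float
    p95_latency_ms: float
    error_rate_pct: float


def slo_report(m: MonthlySLIs) -> Dict[str, bool]:
    """Check one month of measured SLIs against the three example SLOs."""
    return {
        "availability >= 99.9%": m.availability_pct >= 99.9,
        "p95 latency < 500ms": m.p95_latency_ms < 500,
        "error rate < 0.5%": m.error_rate_pct < 0.5,
    }


for name, met in slo_report(MonthlySLIs(99.95, 620.0, 0.2)).items():
    print(f"{'MET ' if met else 'MISS'}  {name}")
```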

Review them quarterly. As your reliability improves, tighten them.

Step 4: Publish Your Status Page

A public status page shows your uptime history and active incidents. It's the foundation of transparency with customers and supports any SLA claims. See how to create a public status page.

Step 5: Introduce SLAs When You Need Them

Once your SLIs are measured, your SLOs are consistently met, and customers are asking — introduce SLAs. Set them below your SLOs, define the credit structure, and put it in your terms of service.


Also in This Series

For more on SLAs and uptime reporting, see uptime SLA guide, what does 99.9% uptime really mean, and how to interpret uptime reports.
