
Observability is the ability to understand the internal state of a system by examining its external outputs. For web systems and applications, an observable system is one where you can answer questions like "why is this slow?", "where is this error coming from?", and "what changed right before this broke?" — without having to deploy new code to get the answers.
The term comes from control theory and has been adopted by the software industry, particularly the DevOps and SRE communities, as a framework for thinking about monitoring and debugging production systems.
Observability is traditionally built on three signal types:
Metrics are numeric measurements collected over time — the "what" of your system's health.
Examples include response times, error rates, database query times, and memory usage.
Metrics are efficient to store and query. They're ideal for dashboards, alerting, and trend analysis. Uptime monitoring response times are a form of metric collection.
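As a rough illustration of how raw measurements become metrics, the sketch below reduces a handful of check results to a few numbers. The sample data, the `summarize` function, and the nearest-rank percentile helper are all hypothetical and not taken from any particular monitoring tool.

```python
import math
from statistics import mean

# Hypothetical samples: (response time in ms, HTTP status) for ten checks.
samples = [
    (120, 200), (135, 200), (110, 200), (900, 500), (125, 200),
    (140, 200), (130, 200), (115, 200), (2100, 503), (128, 200),
]


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce raw check results to a few numeric metrics."""
    times = [ms for ms, _ in samples]
    server_errors = sum(1 for _, status in samples if status >= 500)
    return {
        "count": len(samples),
        "avg_ms": mean(times),
        "p50_ms": percentile(times, 50),
        "p95_ms": percentile(times, 95),
        "error_rate": server_errors / len(samples),
    }


summary = summarize(samples)
```

In this made-up data the two slow, failing requests pull the average well above the median, which is one reason dashboards tend to show percentiles alongside averages.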
Logs are time-stamped records of events — the "what happened" of your system.
Examples:

    2026-03-17 14:23:01 ERROR: Database connection timeout after 30s
    2026-03-17 14:23:05 INFO: Retrying database connection
    2026-03-17 14:24:00 INFO: Database connection restored

Logs provide the narrative context that metrics lack. When your monitoring alert fires, logs tell you why.
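Log lines become much easier to query once they are parsed into structured records. The following is a minimal sketch that assumes every line follows the `timestamp LEVEL: message` shape of the sample lines above. The `LOG_LINE` pattern and the `parse` helper are illustrative names, not part of any logging library.

```python
import re
from datetime import datetime

# Assumed format: "YYYY-MM-DD HH:MM:SS LEVEL: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+): (?P<message>.*)$"
)

raw = """\
2026-03-17 14:23:01 ERROR: Database connection timeout after 30s
2026-03-17 14:23:05 INFO: Retrying database connection
2026-03-17 14:24:00 INFO: Database connection restored"""


def parse(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }


events = [event for event in map(parse, raw.splitlines()) if event]
errors = [event for event in events if event["level"] == "ERROR"]

# Time between the first error and the last event (the recovery message here).
outage = events[-1]["time"] - errors[0]["time"]
```

With the records structured this way, questions like "how long did the database outage last?" become a subtraction rather than a manual read through the file.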
Distributed traces follow a request through multiple services — the "where did time go" of your system.
In a microservices architecture, a single user request might touch 10 services. A trace shows each hop, how long it took, and where errors occurred. This makes traces essential for monitoring microservices but less critical for simple monolithic applications.
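To make the idea of a trace concrete, here is a simplified sketch that models one request as a list of spans and asks which downstream hop took longest and which ones failed. The `Span` class, the service names, and the timings are invented for illustration. Real tracing systems record more detail, such as trace and parent span identifiers.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One hop of a request: which service handled it and when."""
    service: str
    start_ms: int
    end_ms: int
    error: bool = False

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# Hypothetical trace: the gateway span wraps three downstream calls.
trace = [
    Span("gateway", 0, 480),
    Span("auth", 10, 40),
    Span("cart", 45, 120),
    Span("payments", 125, 470, error=True),
]

root, children = trace[0], trace[1:]

# Where did the time go? Pick the downstream span with the longest duration.
slowest = max(children, key=lambda span: span.duration_ms)

# Where did it fail? Collect the services whose spans are marked as errors.
failing = [span.service for span in trace if span.error]
```

In this invented example the `payments` hop accounts for most of the request time and is also the one that failed, which is the kind of answer a trace viewer surfaces at a glance.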
The terms monitoring and observability are often used interchangeably, but they are distinct.
You can have monitoring without observability — a simple uptime check tells you your site is down but not why. Observability without monitoring is also incomplete — you can have detailed internal data but no alerting when things break.
The two work together: monitoring provides the trigger ("something is wrong"), observability provides the investigation capability ("here's why").
Uptime monitoring is the external availability layer of observability — it answers the most fundamental question: can users reach your service?
That external, user-facing perspective is the one that matters most. It sits alongside the other observability layers as follows:
| Layer | Question Answered | Tool Type |
|---|---|---|
| External availability | Can users reach my site? | Uptime monitoring |
| Infrastructure metrics | Is my server healthy? | APM / infrastructure monitoring |
| Application metrics | Is my application behaving correctly? | APM |
| Logs | What events led to this issue? | Log aggregation |
| Traces | Where in the request lifecycle did this fail? | Distributed tracing |
For most small-to-medium websites and applications, external uptime monitoring is the most important observability layer to have in place. A synthetic uptime check that confirms your site is accessible and responding is more actionable than a pile of internal metrics that no one watches.
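A synthetic uptime check can be very small. The sketch below uses only Python's standard library and treats any 2xx or 3xx response as "up". The function names, the 10-second timeout, and that definition of "up" are assumptions made for illustration, and a hosted monitoring service would normally also run checks from several locations and handle alerting.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses raise HTTPError but still carry a status code.
        return exc.code


def check(url, fetch=fetch_status, clock=time.monotonic):
    """Run one availability check and report status and elapsed time."""
    started = clock()
    try:
        status, error = fetch(url), None
    except OSError as exc:
        # DNS failures, refused connections and timeouts all derive from OSError.
        status, error = None, str(exc)
    elapsed_ms = (clock() - started) * 1000
    return {
        "url": url,
        "up": status is not None and 200 <= status < 400,
        "status": status,
        "elapsed_ms": elapsed_ms,
        "error": error,
    }
```

Calling `check("https://example.com")` performs a real request. The `fetch` parameter exists so the check logic can be exercised with a stand-in function instead of the network.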
You don't need to implement all three pillars at once. A pragmatic approach:
Level 1 — Availability: External HTTP uptime monitoring, SSL monitoring, domain expiry monitoring. This catches "is it down?" problems. → Domain Monitor
Level 2 — Application errors: Error tracking (Sentry, Rollbar) to capture application exceptions. This catches "is it broken in a way uptime monitoring misses?" problems.
Level 3 — Performance metrics: APM tool (Datadog, New Relic, Elastic APM) for response time distribution, database query times, and memory metrics. This catches "is it slow?" problems.
Level 4 — Distributed tracing: Distributed tracing (Jaeger, Zipkin, OpenTelemetry) for complex microservices architectures. Most applications don't need this level.
Most teams benefit enormously from Levels 1-2 and should add Level 3 when performance becomes a concern. Level 4 is primarily for distributed systems at scale.
OpenTelemetry is an open-source observability framework that provides standardised APIs and SDKs for collecting metrics, logs, and traces. It's vendor-neutral, meaning you can collect data once and send it to multiple backends.
Web applications can be instrumented with OpenTelemetry to automatically collect timing data, error rates, and request traces, providing a rich observability foundation.
Observability is a spectrum. You don't need enterprise-grade observability tooling for a small website — but you do need to be able to answer the question "is my site up and responding?" reliably.
Start with external uptime monitoring. It's the foundation of any observability stack and the most actionable signal for most teams. Build from there as your system complexity and scale demand it.
Start your observability journey with uptime monitoring at Domain Monitor.