
When your website goes down, the first question is always "what broke?" Nine times out of ten, the answer involves the database. A slow query, an exhausted connection pool, a disk that's 99% full — database problems are one of the most common causes of website downtime, and they're often the hardest to diagnose quickly.
Database monitoring is the practice of continuously tracking database performance, availability, and resource usage to catch problems before they cause outages. This guide explains what it involves, why it matters, and how database health directly affects your website's uptime.
Your database is the backbone of most web applications. When it struggles, your entire site feels it immediately. Here's how common database problems translate to user-facing failures:
Most web applications connect to databases through a connection pool — a pre-established set of database connections that are reused for requests. This pool has a maximum size (often 10-100 connections).
When your application receives more concurrent requests than your connection pool can handle, new requests queue up waiting for a connection. If the queue fills up, requests fail with connection errors. Users see 500 errors or timeout pages.
What causes it:
What to monitor: Current connection count vs. maximum connections. Alert when you reach 80% of the pool limit.
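The 80% rule above can be sketched as a small check. This is a minimal illustration, not tied to any particular pool library: `PoolStatus`, `pool_utilization`, and `should_alert` are hypothetical names, and the in-use, maximum, and waiting counts are assumed to come from whatever statistics your pool or database exposes.

```python
from dataclasses import dataclass


@dataclass
class PoolStatus:
    """Hypothetical snapshot of a connection pool (names are illustrative)."""
    in_use: int        # connections currently checked out
    max_size: int      # configured pool limit
    waiting: int = 0   # requests queued for a free connection


def pool_utilization(status: PoolStatus) -> float:
    """Return the fraction of the pool in use, from 0.0 to 1.0."""
    if status.max_size <= 0:
        raise ValueError("max_size must be positive")
    return status.in_use / status.max_size


def should_alert(status: PoolStatus, threshold: float = 0.80) -> bool:
    """Alert at or above the threshold, or whenever requests are queuing."""
    return pool_utilization(status) >= threshold or status.waiting > 0


if __name__ == "__main__":
    status = PoolStatus(in_use=17, max_size=20)
    print(f"{pool_utilization(status):.0%} used, alert={should_alert(status)}")
```

Treating any queued request as alert-worthy is a design choice in this sketch: a non-empty queue means the pool is already saturated, whatever the percentage says.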
A single slow SQL query can block an entire web page from loading. If that query is called on every page view, your entire site becomes slow or unresponsive.
Query performance degrades when:
What to monitor: Slow query logs (queries exceeding a time threshold), query execution counts, and average query duration.
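As a rough sketch of how those three numbers relate, the function below summarizes a batch of query timings. It assumes you already collect `(query, duration_ms)` pairs somewhere, such as from a slow query log or application instrumentation; `summarize_queries` and the 1000 ms default threshold are illustrative choices, not part of any database's API.

```python
from collections import defaultdict
from statistics import mean


def summarize_queries(samples, slow_threshold_ms=1000.0):
    """Group (query, duration_ms) samples by query text.

    Returns {query: {"count", "avg_ms", "slow_count"}}.
    """
    durations = defaultdict(list)
    for query, duration_ms in samples:
        durations[query].append(duration_ms)

    return {
        query: {
            "count": len(values),                 # execution count
            "avg_ms": mean(values),               # average duration
            "slow_count": sum(1 for v in values if v > slow_threshold_ms),
        }
        for query, values in durations.items()
    }


if __name__ == "__main__":
    samples = [("SELECT a", 10.0), ("SELECT a", 30.0), ("SELECT b", 1500.0)]
    for query, stats in summarize_queries(samples).items():
        print(query, stats)
```

A query with a modest average but a high execution count can cost more in total than a single slow one, which is why the counts sit alongside the durations.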
When a database's disk fills up completely, it stops accepting writes. INSERT, UPDATE, and DELETE operations all fail. For most web applications, this means the site effectively stops working — users can't log in, can't save data, can't complete transactions.
What to monitor: Disk usage percentage. Alert at 70% and 85% — disk fills up faster than you expect once you're past 70%.
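A two-tier disk check along those lines might look like the sketch below. It uses Python's standard `shutil.disk_usage`; the function names and the "warning" and "critical" labels are assumptions for illustration, and the path should be whichever volume holds your database's data directory.

```python
import shutil


def disk_alert_level(used_fraction, warn_at=0.70, critical_at=0.85):
    """Map a used-space fraction (0.0 to 1.0) to an alert level."""
    if used_fraction >= critical_at:
        return "critical"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"


def check_disk(path="/"):
    """Return (used_fraction, alert_level) for the volume containing path."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return used_fraction, disk_alert_level(used_fraction)


if __name__ == "__main__":
    fraction, level = check_disk("/")
    print(f"{fraction:.0%} used: {level}")
```

On a managed database you usually cannot run this on the host itself, so the same thresholds would be applied to the provider's storage metric instead.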
Many applications use read replicas to distribute query load. The primary database handles writes; replicas handle reads. Replication lag is the delay between a write on the primary and that write appearing on the replica.
High replication lag means users reading from a replica see stale data. With extreme lag, replicas fall so far behind that they become unusable; reads then shift back to the primary, which can overload it.
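To make the definition concrete, here is a minimal sketch that computes lag from two timestamps and decides where a read should go. The 5-second limit and both function names are assumptions for illustration. How you obtain the timestamps depends on the database; on a PostgreSQL replica, for instance, `pg_last_xact_replay_timestamp()` reports when the last transaction was replayed.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(primary_commit_at, replica_replayed_at):
    """Lag is how far the replica's last applied write trails the primary's."""
    # Clamp at zero so small clock differences never yield a negative lag.
    return max(primary_commit_at - replica_replayed_at, timedelta(0))


def choose_read_target(lag, max_acceptable=timedelta(seconds=5)):
    """Read from the replica unless its data is too stale (assumed limit)."""
    return "replica" if lag <= max_acceptable else "primary"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    lag = replication_lag(now, now - timedelta(seconds=2))
    print(lag.total_seconds(), choose_read_target(lag))
```

Falling back to the primary keeps reads fresh, but it is exactly the behavior that can overload the primary when every replica lags at once, so the lag itself still needs an alert.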
Database software crashes too, though less often than application code. PostgreSQL, MySQL, MongoDB, and other databases can crash due to:
When the database process crashes, every application request fails immediately.
The most basic database check: is the database reachable and accepting connections? This is typically done via a lightweight check such as a test query (SELECT 1).
If you're using Domain Monitor, TCP port monitoring lets you check whether your database port is accepting connections, even if you can't expose the database to external HTTP checks. See our guide on TCP monitoring for more.
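For a sense of what a TCP port check does under the hood, the sketch below attempts a plain TCP connection using Python's standard `socket` module. `port_is_open` is a hypothetical helper, and the host and port in the example are placeholders (5432 is PostgreSQL's default port).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder target: a PostgreSQL server on the local machine.
    print(port_is_open("127.0.0.1", 5432))
```

An open port only shows that something is listening. It does not prove the database can answer queries, which is why a SELECT 1 style check is the stronger test where you can run one.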
Tracks query performance metrics:
Tracks database server resource usage:
Your website monitoring tells you that something is wrong. Your database monitoring tells you why.
Consider this scenario: your external uptime monitor alerts you that your website is returning 503 errors. You check your application logs and see "Connection refused to database". Your database monitoring shows connection count at 100% of the pool limit and 50 connections waiting.
You now know exactly what's wrong and can act:
Without database monitoring, this diagnosis could take 30 minutes instead of 3.
For more on responding to outages effectively, see essential methods for dealing with unscheduled website downtime.
Cloud-managed databases (AWS RDS, Google Cloud SQL, Azure Database, PlanetScale, Supabase) provide some monitoring out of the box, but you still need to configure alerts:
AWS RDS key metrics to monitor:
- DatabaseConnections: number of active connections
- FreeStorageSpace: alert before it hits zero
- ReadLatency and WriteLatency: disk I/O latency, which affects query performance
- ReplicaLag: for read replica setups
- CPUUtilization: database server load
Google Cloud SQL key metrics:
- database/cpu/utilization
- database/disk/utilization
- database/postgresql/num_backends (connections)
Several tools specialize in database monitoring:
Most of these work alongside external website monitoring — the database monitoring tells you about internal health, while external monitoring confirms whether users are affected.
Even without specialized tooling, you can set up basic database monitoring today:
- Slow query logging. MySQL: slow_query_log = 1, threshold long_query_time = 1; PostgreSQL: log_min_duration_statement = 1000
- Connection counts: information_schema.processlist (MySQL) or pg_stat_activity (PostgreSQL)
- External uptime monitoring pointed at your application's /health endpoint
- A check that the /health endpoint includes database connection status
Database problems are among the most common causes of website downtime. Monitoring your database means knowing about connection exhaustion, slow queries, and disk pressure before they cascade into user-facing failures.
Start with the basics: availability monitoring, disk usage alerts, and connection count tracking. Add slow query logging and performance metrics as you grow. And make sure your application's health endpoint reflects the database status so your external uptime monitor can alert you the moment a database issue starts affecting users.
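A health endpoint that reflects database status can be as simple as running a trivial query and mapping the outcome to an HTTP status code. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for your real database driver. `database_is_healthy` and `health_response` are hypothetical names, and wiring the result into your web framework's /health route is left out.

```python
import json
import sqlite3


def database_is_healthy(connection):
    """Run a trivial query; any driver error counts as unhealthy."""
    try:
        connection.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def health_response(connection):
    """Return (status_code, json_body) for a /health endpoint."""
    healthy = database_is_healthy(connection)
    body = json.dumps({
        "status": "ok" if healthy else "degraded",
        "database": "up" if healthy else "down",
    })
    # A non-200 code lets an external uptime monitor flag the failure.
    return (200 if healthy else 503), body


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(health_response(conn))
```

Returning 503 rather than a 200 with an error message in the body matters here, because most external monitors alert on the status code.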
Domain Monitor handles external monitoring and TCP port checks — pair it with database-level monitoring for full coverage of your data layer.