
The database is almost always the bottleneck, and often the failure point. When MySQL goes down — or gets slow enough that it might as well be down — your entire application suffers. The challenge is that database problems rarely announce themselves with an obvious error. Instead, you see slow page loads, intermittent 500 errors, and timeouts that are difficult to trace back to their source without proper monitoring.
The first layer: is MySQL accepting connections at all?
Add a database connectivity check to your application's health endpoint:
```python
# Flask / Python
from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import text

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://user:password@localhost/app'
db = SQLAlchemy(app)

@app.route('/health')
def health():
    try:
        db.session.execute(text('SELECT 1'))
        db_status = 'ok'
    except Exception as e:
        db_status = str(e)
    return jsonify({
        'status': 'ok' if db_status == 'ok' else 'degraded',
        'database': db_status
    }), 200 if db_status == 'ok' else 503
```
```php
// Laravel
Route::get('/health', function () {
    try {
        DB::select('SELECT 1');
        $db = 'ok';
    } catch (\Exception $e) {
        $db = $e->getMessage();
    }
    $status = $db === 'ok' ? 200 : 503;
    return response()->json([
        'status' => $db === 'ok' ? 'ok' : 'degraded',
        'database' => $db,
    ], $status);
});
```
Point your uptime monitor at this endpoint. A 503 response tells you MySQL is unreachable before users notice.
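If you want to run your own scheduled check instead of (or alongside) a hosted monitor, the polling logic is small. A minimal sketch using only the standard library; the URL is a placeholder and the verdict names are illustrative, not part of any monitoring API:

```python
# Minimal external poller for a /health endpoint like the one above.
# The URL is a placeholder -- point it at your own application.
import json
import urllib.error
import urllib.request

def classify_health(status_code, body):
    """Map an HTTP response to a monitoring verdict."""
    if status_code == 200 and body.get('database') == 'ok':
        return 'healthy'
    return 'degraded'

def poll(url='https://example.com/health', timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, json.load(resp))
    except urllib.error.HTTPError:
        # A 503 from the endpoint means the database check failed
        return 'degraded'
    except (urllib.error.URLError, TimeoutError):
        # The application itself is unreachable
        return 'down'
```

Run it from cron every minute and alert on anything other than 'healthy'. Distinguishing 'degraded' (app up, database failing) from 'down' (app unreachable) speeds up diagnosis.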
For more granular monitoring, check MySQL directly:
```bash
# Quick connectivity test
mysqladmin -u monitor_user -p'password' -h 127.0.0.1 ping
# Returns "mysqld is alive" or fails with an error
```
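If you'd rather not install the MySQL client on the monitoring host, a TCP-level probe of port 3306 is a rough substitute. It's weaker than mysqladmin ping (it only proves something is accepting connections on the port, not that MySQL can execute queries), but it needs no credentials. A sketch:

```python
# TCP-level check: is anything accepting connections on the MySQL port?
# Weaker than "mysqladmin ping" -- no MySQL handshake is performed.
import socket

def port_open(host='127.0.0.1', port=3306, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeout, and unreachable host
        return False
```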
Create a dedicated read-only monitoring user with minimal permissions:
```sql
CREATE USER 'monitor'@'localhost' IDENTIFIED BY 'monitor_password';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'monitor'@'localhost';
FLUSH PRIVILEGES;
```
A database that's up but running slow queries is almost as bad as one that's down. Slow queries cause timeouts, connection pool exhaustion, and cascading failures.
```sql
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking over 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-queries.log';

-- Also log queries that don't use indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';
```
Make these permanent in /etc/mysql/mysql.conf.d/mysqld.cnf:
```ini
[mysqld]
slow_query_log = 1
long_query_time = 1
slow_query_log_file = /var/log/mysql/slow-queries.log
log_queries_not_using_indexes = 1
```
mysqldumpslow summarises the slow query log:
```bash
# Show top 10 slowest queries by total time
mysqldumpslow -s t -t 10 /var/log/mysql/slow-queries.log

# Show queries with the most occurrences
mysqldumpslow -s c -t 10 /var/log/mysql/slow-queries.log
```
pt-query-digest (Percona Toolkit) provides more detailed analysis including fingerprinting and statistics.
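The core idea behind fingerprinting is normalisation: strip out the literal values so that "the same query with different parameters" collapses into one entry. A rough illustration of the concept, not Percona's actual algorithm:

```python
# Rough sketch of query fingerprinting: normalise away literal values so
# identical query shapes group together. Illustrative only -- pt-query-digest's
# real normalisation handles many more cases.
import re
from collections import Counter

def fingerprint(sql):
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", '?', s)   # collapse string literals
    s = re.sub(r'\b\d+\b', '?', s)   # collapse numeric literals
    s = re.sub(r'\s+', ' ', s)       # normalise whitespace
    return s

def top_queries(queries, n=10):
    """Count occurrences of each fingerprint, most frequent first."""
    return Counter(fingerprint(q) for q in queries).most_common(n)
```

With this, SELECT * FROM users WHERE id = 42 and SELECT * FROM users WHERE id = 7 share a fingerprint, which is exactly why digest tools can tell you which query *shape* dominates your slow log rather than drowning you in individual instances.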
Connection pool exhaustion causes "Too many connections" errors that take your application down even while MySQL itself is running fine.
```sql
-- Check current connections vs maximum
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Threads_running';

-- Check connection usage percentage
SELECT
    @@max_connections AS max_connections,
    COUNT(*) AS current_connections,
    ROUND(COUNT(*) * 100 / @@max_connections, 1) AS usage_pct
FROM information_schema.processlist;
```
Alert when connection usage exceeds 80% of max_connections. At that point, you're at risk of exhaustion under any traffic spike.
Add a scheduled check to your monitoring:
```bash
#!/bin/bash
USAGE=$(mysql -u monitor -p'password' -e "SELECT ROUND(COUNT(*) * 100 / @@max_connections) FROM information_schema.processlist;" -s -N 2>/dev/null)
if [ "$USAGE" -gt 80 ]; then
    # Send alert
    echo "MySQL connection usage at ${USAGE}%" | mail -s "MySQL Connection Warning" [email protected]
fi
```
If you're running MySQL replication (primary-replica setup for read scaling or high availability), replication lag is a critical metric. High lag means your replicas are serving stale data.
```sql
-- On the replica
SHOW REPLICA STATUS\G

-- Key field: Seconds_Behind_Source (or Seconds_Behind_Master in older versions)
-- 0 = replica is current
-- High numbers = replica is falling behind
```
Alert on Seconds_Behind_Source exceeding your threshold — typically 30–60 seconds for most applications. For near-real-time applications, alert on anything above 5 seconds.
A replication script for scheduled monitoring:
```python
import pymysql

def alert(message):
    # Hook into your alerting system (email, Slack, PagerDuty, ...)
    print(message)

def check_replication_lag(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        cursor = conn.cursor(pymysql.cursors.DictCursor)
        cursor.execute("SHOW REPLICA STATUS")
        status = cursor.fetchone()
        if status is None:
            return None  # Not a replica
        lag = status.get('Seconds_Behind_Source', 0)
        running = status.get('Replica_SQL_Running', 'No')
        if running != 'Yes':
            alert(f"Replication SQL thread is not running on {host}")
        elif lag and lag > 60:
            alert(f"Replication lag is {lag}s on {host}")
        return lag
    finally:
        conn.close()
```
MySQL tables grow. Running out of disk space causes MySQL to crash or stop accepting writes immediately. Monitor disk usage on your MySQL data directory:
```bash
df -h /var/lib/mysql
```
Alert when disk usage exceeds 80%. Running out of space at 100% gives you no time to react.
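The same check is easy to script for scheduled monitoring. A sketch using only the standard library; the threshold and data directory path follow the advice above, and the print-based alert is a placeholder for your real alerting hook:

```python
# Scheduled disk-space check for the MySQL data directory.
# The alert mechanism here (print) is a placeholder.
import shutil

def usage_pct(used, total):
    """Percentage of capacity in use, rounded to one decimal."""
    return round(used * 100 / total, 1)

def check_disk(path='/var/lib/mysql', threshold=80):
    du = shutil.disk_usage(path)
    pct = usage_pct(du.used, du.total)
    if pct > threshold:
        print(f"ALERT: {path} is {pct}% full")  # replace with your alerting
    return pct
```

Run it from cron hourly. Checking percentage rather than absolute free space means the same script works across servers with different disk sizes.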
All the database-specific monitoring above is complementary to application-level uptime monitoring. The quickest way to know when database problems are affecting users is a health check endpoint that tests the connection.
Domain Monitor checks your application's health endpoint every minute. When MySQL goes down or becomes slow enough that queries time out, your health check returns 503 and you're alerted immediately. Create a free account.
For broader monitoring context, see uptime monitoring best practices and website monitoring checklist for developers.