[Image: Queue worker monitoring dashboard showing job processing rates, failed jobs, queue depth, and worker health across Laravel Horizon and BullMQ]

How to Monitor Queue Workers on Laravel, Node.js, and Python Apps

Queue workers are silent infrastructure. When they're healthy, nobody notices. When they fail, jobs accumulate silently, emails stop sending, reports stop generating, and webhooks stop processing — and nobody knows until a user complains or you happen to check.

Unlike a crashed web server (which immediately returns errors), a dead queue worker leaves your application appearing healthy from the outside while background work quietly piles up. This is what makes queue monitoring different from uptime monitoring — and why it needs a separate approach.

What You're Actually Monitoring

Queue worker monitoring has three distinct concerns:

Worker liveness — Are workers running at all? A worker process that has crashed, been OOM-killed, or failed to restart after a deploy means no jobs are being processed.

Queue depth — How many jobs are waiting? A growing queue indicates workers can't keep up with inflow, even if workers are technically running.

Job failure rate — Are jobs completing successfully? Workers can be running while most jobs are failing, which is just as bad as no workers.

All three need monitoring. Any one of them can be the failure point.
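The three signals can be folded into a single health status for alerting. A minimal sketch — the thresholds (1,000 jobs, 5% failures) are illustrative example values, not from any particular library:

```javascript
// Combine the three queue signals into one status string.
// Thresholds are example values — tune them for your workload.
function queueHealth({ workerAlive, queueDepth, failureRate }) {
    if (!workerAlive) {
        return 'down';       // liveness failure: nothing is processing at all
    }
    if (queueDepth > 1000 || failureRate > 0.05) {
        return 'degraded';   // workers are running but falling behind or failing
    }
    return 'ok';
}
```

Collapsing to a single status makes the HTTP mapping trivial: ok becomes 200, anything else becomes 503, and your uptime monitor only has to check the response code.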


Laravel Horizon

Laravel's Horizon dashboard gives you a UI overview, but it's not enough on its own for production alerting.

Health Check Endpoint

Expose a dedicated health check for your queue system:

// routes/api.php
Route::get('/health/queue', function () {
    $failedJobs = DB::table('failed_jobs')->count();
    $horizonStatus = Cache::get('horizon:status', 'inactive');

    $health = [
        'status' => $horizonStatus === 'running' ? 'ok' : 'degraded',
        'horizon' => $horizonStatus,
        'failed_jobs' => $failedJobs,
    ];

    $statusCode = $horizonStatus === 'running' ? 200 : 503;
    return response()->json($health, $statusCode);
});

Horizon writes its status to the cache — horizon:status will be running, paused, or inactive. Your uptime monitor can check this endpoint every minute and alert when the status isn't running.

Queue Depth Monitoring

Route::get('/health/queue', function () {
    $queues = ['default', 'emails', 'reports'];
    $depths = [];

    foreach ($queues as $queue) {
        $depths[$queue] = Queue::size($queue);
    }

    $maxDepth = max(array_values($depths));
    $status = $maxDepth > 1000 ? 'degraded' : 'ok';

    return response()->json([
        'status' => $status,
        'queues' => $depths,
        'horizon' => Cache::get('horizon:status'),
    ], $status === 'ok' ? 200 : 503);
});

Horizon Alerts Configuration

In config/horizon.php:

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => 'auto',
            'maxProcesses' => 10,
            'minProcesses' => 3,
            'tries' => 3,
            'timeout' => 60,
        ],
    ],
],

'metrics' => [
    'trim_snapshots' => [
        'job' => 24,
        'queue' => 24,
    ],
],

'waits' => [
    'redis:default' => 60,  // Alert if jobs wait > 60 seconds
],

When a queue's wait time exceeds its configured threshold, Horizon fires a Laravel\Horizon\Events\LongWaitDetected event — register a listener for it (or use Horizon's notification routing, e.g. Horizon::routeSlackNotificationsTo) to turn long waits into alerts.

Heartbeat Monitoring with Laravel Horizon

For jobs that should run on a schedule, use a heartbeat pattern. Dispatch the heartbeat through the queue so the timestamp is only written when a worker actually processes the job — writing it directly from the scheduled command would only prove the scheduler is alive, not the workers:

// In a scheduled command that runs every 5 minutes
class QueueHeartbeat extends Command
{
    public function handle()
    {
        // The closure is pushed onto the queue and runs on a worker
        dispatch(function () {
            // TTL outlives the 10-minute alert threshold below
            Cache::put('queue:heartbeat', now()->timestamp, 900);
        });
    }
}

// Health check
Route::get('/health/queue', function () {
    $heartbeat = Cache::get('queue:heartbeat');
    $age = $heartbeat ? now()->timestamp - $heartbeat : null;

    $status = (!$heartbeat || $age > 600) ? 'degraded' : 'ok';

    return response()->json([
        'status' => $status,
        'heartbeat_age_seconds' => $age,
    ], $status === 'ok' ? 200 : 503);
});

Node.js / BullMQ

BullMQ is the standard choice for Node.js queue processing. Its built-in metrics make monitoring straightforward.

Health Check with Queue Stats

const { Queue } = require('bullmq');
const IORedis = require('ioredis');

// BullMQ expects an ioredis connection (or connection options), not a node-redis client
const connection = new IORedis(process.env.REDIS_URL, { maxRetriesPerRequest: null });
const emailQueue = new Queue('emails', { connection });
const reportQueue = new Queue('reports', { connection });

app.get('/health/queue', async (req, res) => {
    try {
        const [emailCounts, reportCounts] = await Promise.all([
            emailQueue.getJobCounts('waiting', 'active', 'failed', 'completed'),
            reportQueue.getJobCounts('waiting', 'active', 'failed', 'completed'),
        ]);

        const totalFailed = emailCounts.failed + reportCounts.failed;
        const totalWaiting = emailCounts.waiting + reportCounts.waiting;

        const degraded = totalFailed > 50 || totalWaiting > 1000;

        res.status(degraded ? 503 : 200).json({
            status: degraded ? 'degraded' : 'ok',
            queues: {
                emails: emailCounts,
                reports: reportCounts,
            },
        });
    } catch (err) {
        res.status(503).json({ status: 'error', message: err.message });
    }
});

Worker Liveness Check

BullMQ workers expose an isRunning() method, but if your worker runs in a separate process your health endpoint can't call it directly — use a heartbeat instead:

// In your worker process — write a heartbeat to Redis
const { Worker } = require('bullmq');

const worker = new Worker('emails', processEmailJob, { connection });

worker.on('ready', () => {
    console.log('Worker ready');
    setInterval(async () => {
        // ioredis syntax; the key expires if the worker stops refreshing it
        await connection.set('worker:emails:heartbeat', Date.now(), 'EX', 120);
    }, 30000);
});

worker.on('error', (err) => {
    console.error('Worker error:', err);
});

// In your health check — read the heartbeat
app.get('/health/queue', async (req, res) => {
    const heartbeat = await connection.get('worker:emails:heartbeat');
    const age = heartbeat ? Date.now() - parseInt(heartbeat) : null;
    const workerAlive = age !== null && age < 90000; // 90 second threshold

    res.status(workerAlive ? 200 : 503).json({
        status: workerAlive ? 'ok' : 'degraded',
        worker_age_ms: age,
    });
});

Stalled Job Detection

BullMQ automatically marks jobs as stalled if a worker dies mid-processing:

const { QueueEvents } = require('bullmq');

const queueEvents = new QueueEvents('emails', { connection });

queueEvents.on('stalled', ({ jobId }) => {
    console.error(`Job ${jobId} stalled — worker may have died`);
    // Send alert to your monitoring system
});

queueEvents.on('failed', ({ jobId, failedReason }) => {
    console.error(`Job ${jobId} failed: ${failedReason}`);
});
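To turn those events into the job-failure-rate signal, keep a rolling window of recent outcomes and alert when the failure share crosses a threshold. A sketch — this class, its window size, and its thresholds are illustrative, not part of BullMQ:

```javascript
// Rolling window of recent job outcomes; alert when the failure rate
// exceeds a threshold. Window size and threshold are example values.
class FailureRateTracker {
    constructor(windowSize = 100, threshold = 0.05, minSamples = 20) {
        this.results = [];
        this.windowSize = windowSize;
        this.threshold = threshold;
        this.minSamples = minSamples;
    }

    record(failed) {
        this.results.push(failed);
        if (this.results.length > this.windowSize) this.results.shift();
    }

    rate() {
        if (this.results.length === 0) return 0;
        return this.results.filter(Boolean).length / this.results.length;
    }

    shouldAlert() {
        // Require a minimum sample so one early failure doesn't page anyone
        return this.results.length >= this.minSamples && this.rate() > this.threshold;
    }
}

// Wiring it to the QueueEvents listeners above:
// queueEvents.on('completed', () => tracker.record(false));
// queueEvents.on('failed', () => tracker.record(true));
```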

Python / Celery

Celery is the standard queue processing library for Python. Monitoring requires a combination of the Celery inspection API and external health checks.

Health Check Endpoint

import os

from celery import Celery
from flask import Flask, jsonify

app = Flask(__name__)
celery = Celery('tasks', broker=os.environ['REDIS_URL'])

@app.route('/health/queue')
def queue_health():
    try:
        # Check if any workers are responding
        inspect = celery.control.inspect(timeout=2)
        active = inspect.active()

        if not active:
            return jsonify({
                'status': 'degraded',
                'error': 'No workers responding'
            }), 503

        worker_count = len(active)
        total_active_jobs = sum(len(jobs) for jobs in active.values())

        return jsonify({
            'status': 'ok',
            'workers': worker_count,
            'active_jobs': total_active_jobs,
        })
    except Exception as e:
        return jsonify({'status': 'error', 'error': str(e)}), 503

Queue Depth via Redis

import redis

r = redis.from_url(os.environ['REDIS_URL'])

@app.route('/health/queue')
def queue_health():
    queues = ['celery', 'emails', 'reports']
    depths = {}

    for queue in queues:
        depths[queue] = r.llen(queue)

    max_depth = max(depths.values()) if depths else 0
    status = 'degraded' if max_depth > 1000 else 'ok'

    return jsonify({
        'status': status,
        'queues': depths,
    }), 200 if status == 'ok' else 503

Celery Beat for Scheduled Heartbeats

# tasks.py
import os
import time

import redis

@celery.task
def queue_heartbeat():
    r = redis.from_url(os.environ['REDIS_URL'])
    # TTL outlives the 10-minute alert threshold so the key
    # doesn't expire between writes
    r.setex('queue:heartbeat', 900, int(time.time()))

# Beat schedule — runs the heartbeat task every 5 minutes
celery.conf.beat_schedule = {
    'queue-heartbeat': {
        'task': 'tasks.queue_heartbeat',
        'schedule': 300.0,
    },
}

@app.route('/health/queue')
def queue_health():
    heartbeat = r.get('queue:heartbeat')
    if not heartbeat:
        return jsonify({'status': 'degraded', 'error': 'No heartbeat'}), 503

    age = int(time.time()) - int(heartbeat)
    if age > 600:  # 10 minutes
        return jsonify({'status': 'degraded', 'heartbeat_age': age}), 503

    return jsonify({'status': 'ok', 'heartbeat_age': age})

What to Alert On

| Signal | Threshold | Severity |
| --- | --- | --- |
| Worker not running | Immediate | P1 |
| Health endpoint returning 503 | Immediate | P1 |
| Queue depth growing | >1000 jobs | P2 |
| Job failure rate | >5% failure rate | P2 |
| Heartbeat missed | >10 minutes | P1 |
| Jobs stalled | Any | P2 |

The most reliable approach is a dedicated /health/queue endpoint that your uptime monitor checks every minute. It encapsulates all the queue-specific logic, and you get the same alerting path as your main application uptime.
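On the monitor side, the decision reduces to mapping the endpoint's response onto the severities in the table above. A sketch of that mapping — the field names follow the example endpoints in this post, and the depth threshold is an example value; adjust both to whatever your endpoint actually returns:

```javascript
// Map an HTTP status code + parsed /health/queue body to a severity, or
// null when no alert is needed. Thresholds are illustrative.
function classifyAlert(statusCode, body) {
    if (statusCode !== 200 || !body || body.status !== 'ok') {
        return 'P1';  // endpoint down or degraded: page immediately
    }
    const depths = body.queues ? Object.values(body.queues) : [];
    if (depths.some((d) => d > 1000)) {
        return 'P2';  // depth growing even though status is ok: investigate
    }
    return null;      // healthy, no alert
}
```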

Application Uptime Monitoring

Domain Monitor monitors your health check endpoints — including queue-specific ones — from multiple global locations every minute. Point it at /health/queue alongside your main health check and you'll know immediately when workers go down, not when a user reports that their email never arrived. Create a free account.
