
How to Monitor Signup, Email Verification, and Password Reset Flows

Signup and password reset flows have a dependency that most other flows don't: email delivery. A user can complete the signup form perfectly, get a 200 response, and still be stuck — because the verification email never arrived.

From a monitoring perspective, this makes these flows harder to check. Your web server is up. Your database is writing records. But email sending is broken, and you won't know until users start asking why they never received their verification link.


The Three Failure Points

1. Form submission endpoint — The POST to /register or /auth/signup fails or returns an error. This is the easiest failure to detect with standard monitoring.

2. Email queue — The signup handler queues an email but the queue worker isn't processing it, or the connection to your email provider (SendGrid, Postmark, SES) has failed. The endpoint returns 200, the record is written, and nothing else happens.

3. Email delivery — The email is sent but filtered as spam, bounced, or delayed by the provider. This is outside your direct control but still affects your users.

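Standard uptime monitoring catches failure 1. Failure 2 needs additional monitoring, because the endpoint still returns 200 while no email goes out. Failure 3 happens on the provider or recipient side, so neither check can see it.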
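As a rough illustration of how the three failure points map onto signals you can observe, the sketch below classifies a set of readings. The function name `classify_failure`, its parameters, and the 500-job default are assumptions made for this example and do not come from any library. Note that failure 3 cannot be told apart from a healthy system using these signals alone.

```python
from typing import Optional


def classify_failure(endpoint_status: int, queue_depth: int,
                     provider_ok: bool,
                     max_queue_depth: int = 500) -> Optional[str]:
    """Map monitoring signals to the failure point they suggest.

    Hypothetical helper for illustration. Returns None when no failure is
    visible, which does not rule out failure 3 (delivery problems).
    """
    # Failure 1: the signup endpoint itself returns an error status.
    if endpoint_status >= 400:
        return 'form_submission'
    # Failure 2: the endpoint is fine, but mail is not leaving the system.
    if not provider_ok or queue_depth >= max_queue_depth:
        return 'email_queue'
    # Nothing visible from the inside; spam filtering or bounces still pass.
    return None
```

A reading of `classify_failure(200, 900, True)` points at the email queue even though the endpoint looks healthy, which is exactly the case plain uptime checks miss.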


Health Check for Signup and Auth Email Flows

# Flask
# Assumes app, db and get_queue_size() are defined by your application.
import os

import sendgrid
from flask import jsonify


@app.route('/health/email-flows')
def email_flows_health():
    checks = {}

    # 1. Check email queue depth
    queue_depth = get_queue_size('emails')
    checks['email_queue_depth'] = queue_depth
    checks['email_queue_status'] = 'ok' if queue_depth < 500 else 'degraded'

    # 2. Check email provider connectivity
    try:
        sg = sendgrid.SendGridAPIClient(api_key=os.environ['SENDGRID_API_KEY'])
        # Lightweight API check: don't send email, just verify credentials
        sg.client.scopes.get()
        checks['email_provider'] = 'ok'
    except Exception as e:
        checks['email_provider'] = f'error: {e}'

    # 3. Check the users table is reachable. This is a read-only query,
    #    so it does not prove that new user records can be written.
    try:
        db.execute('SELECT COUNT(*) FROM users')
        checks['user_db'] = 'ok'
    except Exception as e:
        checks['user_db'] = f'error: {e}'

    # Overall status is ok only if every individual check is ok
    status = 'ok' if all(v == 'ok' for v in [
        checks['email_queue_status'],
        checks['email_provider'],
        checks['user_db']
    ]) else 'degraded'

    return jsonify({'status': status, **checks}), (200 if status == 'ok' else 503)
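
// Node.js
// Assumes app (Express), connection (Redis), transporter (Nodemailer) and db
// are defined by your application. Queue is assumed to be BullMQ's Queue class.
const { Queue } = require('bullmq');

// Create the queue handle once, not on every health check request
const emailQueue = new Queue('emails', { connection });

app.get('/health/email-flows', async (req, res) => {
    const checks = {};

    try {
        // Check email queue depth
        const counts = await emailQueue.getJobCounts('waiting', 'delayed');
        const depth = counts.waiting + counts.delayed;
        checks.emailQueueDepth = depth;
        checks.emailQueue = depth < 500 ? 'ok' : 'degraded';

        // Check SMTP/provider connectivity
        await transporter.verify();
        checks.emailProvider = 'ok';

        // Check the users table is reachable (read-only, does not prove writes work)
        await db.query('SELECT COUNT(*) FROM users');
        checks.userDb = 'ok';

        // Numeric values (the queue depth) are informational, not pass/fail
        const allOk = Object.values(checks).every(v => v === 'ok' || typeof v === 'number');
        res.status(allOk ? 200 : 503).json({ status: allOk ? 'ok' : 'degraded', ...checks });
    } catch (err) {
        // Any check that throws ends the run; report what completed so far
        res.status(503).json({ status: 'error', error: err.message, ...checks });
    }
});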

Monitoring Password Reset Specifically

Password reset is the flow users trigger when they're locked out. If password reset is broken, they have no way back in. That's a churn event.

The password reset flow has the same email dependency as signup, with one additional risk: token expiry. Password reset tokens expire (typically after 1–24 hours). If your queue is backlogged and the email arrives after the token expires, the user's reset attempt fails even though the email eventually arrived.
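To make the token expiry risk concrete, here is a minimal sketch that estimates how long a newly queued email will wait and compares that against the token lifetime. The function names, the 40 jobs per minute throughput used in the usage note, and the `safety_fraction` parameter are all assumptions for illustration; measure your own worker's throughput before picking a threshold.

```python
def estimated_delay_seconds(queue_depth: int, jobs_per_minute: float) -> float:
    """Estimate how long a newly queued email waits before it is sent."""
    if jobs_per_minute <= 0:
        # A stalled worker never drains the queue.
        return float('inf')
    return queue_depth / jobs_per_minute * 60


def reset_email_at_risk(queue_depth: int, jobs_per_minute: float,
                        token_ttl_seconds: int,
                        safety_fraction: float = 0.5) -> bool:
    """Return True if the queue delay uses up too much of the token lifetime.

    safety_fraction is a hypothetical margin: with 0.5, the check trips once
    the estimated delay exceeds half of the token TTL.
    """
    delay = estimated_delay_seconds(queue_depth, jobs_per_minute)
    return delay > token_ttl_seconds * safety_fraction
```

With an assumed throughput of 40 jobs per minute, 200 queued jobs mean a wait of about 300 seconds, which is safe against a one-hour token. At 2,000 queued jobs the wait grows to about 3,000 seconds and the same token is at risk.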

# Flask
# Assumes app, jsonify, get_queue_size() and a redis-py style cache client
# are defined by your application.
@app.route('/health/password-reset')
def password_reset_health():
    try:
        # Check the reset token store is accessible with a write/read round trip
        reset_token_key = 'health:test-reset-token'
        cache.set(reset_token_key, 'test', ex=60)
        # Explicit check instead of assert, which is skipped under python -O
        if cache.get(reset_token_key) is None:
            raise RuntimeError('token store round trip failed')

        # Check email queue isn't backed up beyond token TTL risk
        queue_depth = get_queue_size('emails')
        queue_ok = queue_depth < 200  # Conservative: don't want delays > ~5 minutes

        return jsonify({
            'status': 'ok' if queue_ok else 'degraded',
            'token_store': 'ok',
            'email_queue_depth': queue_depth,
        }), (200 if queue_ok else 503)
    except Exception as e:
        return jsonify({'status': 'error', 'error': str(e)}), 503

Alerting Thresholds

| Signal | Threshold | Why |
| --- | --- | --- |
| Email queue depth | > 500 jobs | Emails backing up; users waiting for verification |
| Email provider ping | Any failure | Provider unreachable; no emails sending |
| Password reset queue | > 200 jobs | Risk of token expiry before email arrives |
| /health/email-flows response time | > 3s | Degradation in email or DB layer |
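The thresholds above can be expressed as a small evaluation function. This is a sketch only: `evaluate_alerts` and the alert names it returns are hypothetical, and it assumes signup and password reset emails share one queue, so the same depth reading is tested against both limits.

```python
from typing import List


def evaluate_alerts(email_queue_depth: int, provider_ok: bool,
                    response_time_seconds: float) -> List[str]:
    """Return the names of the alerts that should fire for these readings."""
    alerts = []
    if email_queue_depth > 500:
        # Emails backing up; users waiting for verification.
        alerts.append('email_queue_depth')
    if not provider_ok:
        # Provider unreachable; no emails sending.
        alerts.append('email_provider')
    if email_queue_depth > 200:
        # Reset emails may arrive after their token has expired.
        alerts.append('password_reset_queue')
    if response_time_seconds > 3:
        # The health endpoint itself is slow.
        alerts.append('health_response_time')
    return alerts
```

A depth of 250 with a healthy provider fires only the password reset alert, which reflects the tighter limit that flow needs.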

What Breaks These Flows

  • Email provider API key expired or rotated — all emails silently fail
  • Queue worker crashed — jobs accumulate, emails never send (see how to monitor queue workers)
  • SMTP credentials changed — authentication to mail server fails
  • Spam filter blocking verification domain — emails send but never arrive
  • Redis down — token store inaccessible; password reset tokens can't be created or verified

Monitoring with Domain Monitor

Domain Monitor monitors your /health/email-flows and /health/password-reset endpoints every minute. When email sending breaks — an API key rotation, a queue worker crash, a provider outage — you're alerted before the support tickets arrive. Create a free account.

