
How to Monitor Signup, Email Verification, and Password Reset Flows

Signup and password reset flows have a dependency that most other flows don't: email delivery. A user can complete the signup form perfectly, get a 200 response, and still be stuck — because the verification email never arrived.

From a monitoring perspective, this makes these flows harder to check. Your web server is up. Your database is writing records. But email sending is broken, and you won't know until users start asking why they never received their verification link.


The Three Failure Points

1. Form submission endpoint — The POST to /register or /auth/signup fails or returns an error. This is the easiest failure to detect with standard monitoring.

2. Email queue — The signup handler queues an email but the queue worker isn't processing it, or the connection to your email provider (SendGrid, Postmark, SES) has failed. The endpoint returns 200, the record is written, and nothing else happens.

3. Email delivery — The email is sent but filtered as spam, bounced, or delayed by the provider. This is outside your direct control but still affects your users.

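Standard uptime monitoring catches failure 1. Failure 2 returns a 200 to the user, so it needs a dedicated health check that looks at the queue and the email provider directly. Failure 3 happens after the email leaves your system, so it is largely outside what a health endpoint can see.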


Health Check for Signup and Auth Email Flows

# Flask
import os

import sendgrid
from flask import Flask, jsonify

app = Flask(__name__)

# get_queue_size() and db are application-specific helpers defined elsewhere.


@app.route('/health/email-flows')
def email_flows_health():
    checks = {}

    # 1. Check email queue depth
    try:
        queue_depth = get_queue_size('emails')
        checks['email_queue_depth'] = queue_depth
        checks['email_queue_status'] = 'ok' if queue_depth < 500 else 'degraded'
    except Exception as e:
        checks['email_queue_status'] = f'error: {e}'

    # 2. Check email provider connectivity
    try:
        sg = sendgrid.SendGridAPIClient(api_key=os.environ['SENDGRID_API_KEY'])
        # Lightweight API check: don't send email, just verify the credentials
        sg.client.scopes.get()
        checks['email_provider'] = 'ok'
    except Exception as e:
        checks['email_provider'] = f'error: {e}'

    # 3. Check the users table is reachable.
    # This is a read-only query: it avoids creating real records,
    # but it does not prove that writes succeed.
    try:
        db.execute('SELECT 1 FROM users LIMIT 1')
        checks['user_db'] = 'ok'
    except Exception as e:
        checks['user_db'] = f'error: {e}'

    # Healthy only if every individual check reported 'ok'
    status = 'ok' if all(v == 'ok' for v in [
        checks['email_queue_status'],
        checks['email_provider'],
        checks['user_db']
    ]) else 'degraded'

    return jsonify({'status': status, **checks}), 200 if status == 'ok' else 503
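
// Node.js
// Assumes Express, BullMQ, and Nodemailer, which match the calls used here.
// `app`, `connection`, `transporter`, and `db` are configured elsewhere.
const { Queue } = require('bullmq');

// Create the queue handle once. Constructing it inside the handler
// would open a new Redis connection on every health check.
const emailQueue = new Queue('emails', { connection });

app.get('/health/email-flows', async (req, res) => {
    const checks = {};

    try {
        // Check email queue depth
        const counts = await emailQueue.getJobCounts('waiting', 'delayed');
        const depth = counts.waiting + counts.delayed;
        checks.emailQueueDepth = depth;
        checks.emailQueue = depth < 500 ? 'ok' : 'degraded';

        // Check SMTP/provider connectivity
        await transporter.verify();
        checks.emailProvider = 'ok';

        // Check the users table is reachable (read-only; does not prove writes succeed)
        await db.query('SELECT 1 FROM users LIMIT 1');
        checks.userDb = 'ok';

        // Numeric entries (the queue depth) are informational, not pass/fail
        const allOk = Object.values(checks).every(v => v === 'ok' || typeof v === 'number');
        res.status(allOk ? 200 : 503).json({ status: allOk ? 'ok' : 'degraded', ...checks });
    } catch (err) {
        // Any check that throws lands here; checks completed so far are still reported
        res.status(503).json({ status: 'error', error: err.message, ...checks });
    }
});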
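
The Flask example calls get_queue_size without defining it. What it looks like depends entirely on your queue backend. As a minimal sketch, assuming the queue is a plain Redis list and that its key is named "queue:emails" (both are assumptions for illustration, not something the examples above confirm), it could be as small as this. The client is passed in explicitly so the function can be exercised without a live Redis server.

```python
from typing import Protocol


class SupportsLlen(Protocol):
    """Anything with a Redis-style llen(); a redis-py client satisfies this."""

    def llen(self, name: str) -> int: ...


def get_queue_size(client: SupportsLlen, queue_name: str, prefix: str = "queue:") -> int:
    """Return the number of pending jobs in a Redis list-backed queue.

    The "queue:" key prefix is a hypothetical naming convention.
    """
    return int(client.llen(f"{prefix}{queue_name}"))


def queue_status(depth: int, threshold: int = 500) -> str:
    """Map a queue depth to the same ok/degraded labels the health check uses."""
    return "ok" if depth < threshold else "degraded"
```

With redis-py this would be called as get_queue_size(redis.Redis(), 'emails'). Job-queue libraries such as Celery or RQ keep their own bookkeeping, so for those the library's own inspection API is the better source for depth.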

Monitoring Password Reset Specifically

Password reset is the flow users trigger when they're locked out. If password reset is broken, they have no way back in. That's a churn event.

The password reset flow has the same email dependency as signup, with one additional risk: token expiry. Password reset tokens expire (typically after 1–24 hours). If your queue is backlogged and the email arrives after the token expires, the user's reset attempt fails even though the email eventually arrived.

# Flask
# `cache` is assumed to be a redis-py client; get_queue_size() is defined elsewhere.
@app.route('/health/password-reset')
def password_reset_health():
    try:
        # Check the reset token store accepts a write and returns it on read
        reset_token_key = 'health:test-reset-token'
        cache.set(reset_token_key, 'test', ex=60)
        # Explicit check rather than assert, which is skipped under python -O
        if cache.get(reset_token_key) is None:
            raise RuntimeError('reset token store did not return the test key')

        # Check the email queue isn't backed up enough to risk token expiry
        queue_depth = get_queue_size('emails')
        queue_ok = queue_depth < 200  # Conservative: keeps delays under roughly 5 minutes

        return jsonify({
            'status': 'ok' if queue_ok else 'degraded',
            'token_store': 'ok',
            'email_queue_depth': queue_depth,
        }), 200 if queue_ok else 503
    except Exception as e:
        return jsonify({'status': 'error', 'error': str(e)}), 503
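
The 200-job threshold only means "about 5 minutes" at one particular worker throughput (roughly 40 emails per minute, which is an inferred figure, not a measured one). The sketch below derives the status from queue depth, throughput, and token TTL instead of a fixed number. The function names and the 10% TTL budget are hypothetical choices for illustration.

```python
def estimated_delay_minutes(queue_depth: int, jobs_per_minute: float) -> float:
    """Estimate how long a newly queued email waits before it is sent."""
    if jobs_per_minute <= 0:
        # A stalled worker means the email never goes out
        return float("inf")
    return queue_depth / jobs_per_minute


def reset_queue_status(
    queue_depth: int,
    jobs_per_minute: float,
    token_ttl_minutes: float,
    max_ttl_fraction: float = 0.1,
) -> str:
    """Compare the estimated delivery delay against the reset token lifetime.

    max_ttl_fraction is the share of the token's lifetime we are willing
    to spend waiting in the queue (an assumed budget, tune to taste).
    """
    delay = estimated_delay_minutes(queue_depth, jobs_per_minute)
    if delay >= token_ttl_minutes:
        return "critical"  # token would expire before the email is sent
    if delay > token_ttl_minutes * max_ttl_fraction:
        return "degraded"
    return "ok"
```

For example, 200 queued jobs at 40 jobs per minute is a 5 minute delay, which fits inside a 6 minute budget for a 60 minute token. The same 200 jobs at 10 jobs per minute is a 20 minute delay and would be flagged.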

Alerting Thresholds

| Signal | Threshold | Why |
|---|---|---|
| Email queue depth | > 500 jobs | Emails backing up; users waiting for verification |
| Email provider ping | Any failure | Provider unreachable; no emails sending |
| Password reset queue | > 200 jobs | Risk of token expiry before email arrives |
| /health/email-flows response time | > 3s | Degradation in email or DB layer |
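
One way to keep these thresholds in a single place is a small evaluator that turns raw signals into alert names. This is a sketch, not a prescribed design: the Signals fields and the alert names are made up for illustration, and it assumes password reset emails share the 'emails' queue, as they do in the code examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    email_queue_depth: int          # jobs waiting in the shared email queue
    provider_ok: bool               # result of the email provider ping
    health_response_seconds: float  # response time of /health/email-flows


def alerts_for(signals: Signals) -> List[str]:
    """Return the names of every threshold from the table that is breached."""
    alerts: List[str] = []
    if signals.email_queue_depth > 500:
        alerts.append("email_queue_backlog")
    if not signals.provider_ok:
        alerts.append("email_provider_unreachable")
    if signals.email_queue_depth > 200:
        alerts.append("password_reset_token_risk")
    if signals.health_response_seconds > 3.0:
        alerts.append("health_check_slow")
    return alerts
```

A depth of 250 raises only the password reset alert, which matches the intent of the table: reset emails are time-sensitive before verification emails are.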

What Breaks These Flows

  • Email provider API key expired or rotated — all emails silently fail
  • Queue worker crashed — jobs accumulate, emails never send (see how to monitor queue workers)
  • SMTP credentials changed — authentication to mail server fails
  • Spam filter blocking verification domain — emails send but never arrive
  • Redis down — token store inaccessible; password reset tokens can't be created or verified
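
For the crashed-worker case, queue depth can be slow to react when traffic is low, because a handful of stuck jobs never crosses the threshold. A common complement is a heartbeat: the worker records a timestamp on a timer or after each job, and the health check looks at how old it is. The sketch below shows only the comparison; where the timestamp is stored, the function name, and the 120 second limit are all assumptions.

```python
from typing import Optional


def worker_status(
    last_heartbeat: Optional[float],
    now: float,
    max_age_seconds: float = 120.0,
) -> str:
    """Classify a queue worker from the time of its last heartbeat.

    last_heartbeat and now are Unix timestamps in seconds.
    """
    if last_heartbeat is None:
        return "missing"  # the worker has never reported in
    age = now - last_heartbeat
    return "ok" if age <= max_age_seconds else "stale"
```

In a health endpoint, last_heartbeat would be read from wherever the worker writes it (a cache key, for instance) and now would be time.time().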

Monitoring with Domain Monitor

Domain Monitor monitors your /health/email-flows and /health/password-reset endpoints every minute. When email sending breaks — an API key rotation, a queue worker crash, a provider outage — you're alerted before the support tickets arrive. Create a free account.

