
Signup and password reset flows have a dependency that most other flows don't: email delivery. A user can complete the signup form perfectly, get a 200 response, and still be stuck — because the verification email never arrived.
From a monitoring perspective, this makes these flows harder to check. Your web server is up. Your database is writing records. But the email sending is broken, and you won't know until users start asking why they never received their verification link.
These flows can fail at three points:
1. Form submission endpoint: The POST to /register or /auth/signup fails or returns an error. This is the easiest failure to detect with standard monitoring.
2. Email queue: The signup handler queues an email but the queue worker isn't processing it, or the connection to your email provider (SendGrid, Postmark, SES) has failed. The endpoint returns 200, the record is written, and nothing else happens.
3. Email delivery: The email is sent but filtered as spam, bounced, or delayed by the provider. This is outside your direct control but still affects your users.
Standard uptime monitoring catches failure 1. Failure 2 needs dedicated checks on the email queue and the provider connection.
# Flask
# Assumes app, db, and get_queue_size() are defined elsewhere in the application.
import os

import sendgrid
from flask import jsonify


@app.route('/health/email-flows')
def email_flows_health():
    checks = {}

    # 1. Check email queue depth
    queue_depth = get_queue_size('emails')
    checks['email_queue_depth'] = queue_depth
    checks['email_queue_status'] = 'ok' if queue_depth < 500 else 'degraded'

    # 2. Check email provider connectivity
    try:
        sg = sendgrid.SendGridAPIClient(api_key=os.environ['SENDGRID_API_KEY'])
        # Lightweight API check: don't send email, just verify credentials
        sg.client.scopes.get()
        checks['email_provider'] = 'ok'
    except Exception as e:
        checks['email_provider'] = f'error: {e}'

    # 3. Check the users table is reachable
    try:
        # Read-only query: confirms connectivity without creating real records.
        # It does not prove that writes succeed.
        db.execute('SELECT COUNT(*) FROM users')
        checks['user_db'] = 'ok'
    except Exception as e:
        checks['user_db'] = f'error: {e}'

    status = 'ok' if all(v == 'ok' for v in [
        checks['email_queue_status'],
        checks['email_provider'],
        checks['user_db'],
    ]) else 'degraded'
    return jsonify({'status': status, **checks}), 200 if status == 'ok' else 503
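// Node.js
// Assumes BullMQ for the queue, and that app, connection, transporter
// (Nodemailer), and db are configured elsewhere in the application.
const { Queue } = require('bullmq');

// Create the queue handle once. Constructing it inside the handler would
// open a new Redis connection on every health check.
const emailQueue = new Queue('emails', { connection });

app.get('/health/email-flows', async (req, res) => {
  const checks = {};
  try {
    // Check email queue depth
    const counts = await emailQueue.getJobCounts('waiting', 'delayed');
    const depth = counts.waiting + counts.delayed;
    checks.emailQueueDepth = depth;
    checks.emailQueue = depth < 500 ? 'ok' : 'degraded';

    // Check SMTP/provider connectivity
    await transporter.verify();
    checks.emailProvider = 'ok';

    // Check the users table is reachable (read-only; does not prove writes succeed)
    await db.query('SELECT COUNT(*) FROM users');
    checks.userDb = 'ok';

    // Numeric values (the queue depth) are informational and don't affect status
    const allOk = Object.values(checks).every(v => v === 'ok' || typeof v === 'number');
    res.status(allOk ? 200 : 503).json({ status: allOk ? 'ok' : 'degraded', ...checks });
  } catch (err) {
    // Any thrown check lands here; checks holds whatever passed before the failure
    res.status(503).json({ status: 'error', error: err.message, ...checks });
  }
});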
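Queue depth can stay under the threshold while a worker is stalled if signup volume is low, which is the second failure described earlier. One complementary signal is the age of the oldest pending job. The following is a minimal sketch, not part of the checks above: the function names and the 120-second limit are assumptions for illustration, and how you obtain the enqueue timestamps depends on your queue library.

```python
import time
from typing import Iterable, Optional


def oldest_pending_age(enqueued_at: Iterable[float],
                       now: Optional[float] = None) -> float:
    """Return the age in seconds of the oldest pending job (0.0 if none).

    enqueued_at holds Unix timestamps of jobs still waiting in the queue.
    """
    now = time.time() if now is None else now
    timestamps = list(enqueued_at)
    if not timestamps:
        return 0.0
    # Clamp at zero in case of small clock differences between hosts
    return max(0.0, now - min(timestamps))


def queue_age_status(enqueued_at: Iterable[float],
                     max_age_seconds: float = 120.0,  # assumed limit
                     now: Optional[float] = None) -> str:
    """Return 'ok' while the oldest job is younger than the limit."""
    age = oldest_pending_age(enqueued_at, now)
    return 'ok' if age < max_age_seconds else 'degraded'
```

A single job that has been waiting for several minutes reports as degraded here even though the queue depth is 1, which a depth-only check would pass.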
Password reset is the flow users trigger when they're locked out. If password reset is broken, they have no way back in. That's a churn event.
The password reset flow has the same email dependency as signup, with one additional risk: token expiry. Password reset tokens expire (typically after 1–24 hours). If your queue is backlogged and the email arrives after the token expires, the user's reset attempt fails even though the email eventually arrived.
# Assumes cache is a Redis client and get_queue_size() is defined elsewhere.
@app.route('/health/password-reset')
def password_reset_health():
    try:
        # Check reset token store is accessible with a write/read round trip
        reset_token_key = 'health:test-reset-token'
        cache.set(reset_token_key, 'test', ex=60)
        if cache.get(reset_token_key) is None:
            # Explicit check rather than assert, which is stripped under python -O
            raise RuntimeError('reset token store round trip failed')

        # Check email queue isn't backed up beyond token TTL risk
        queue_depth = get_queue_size('emails')
        queue_ok = queue_depth < 200  # Conservative: don't want delays > ~5 minutes

        return jsonify({
            'status': 'ok' if queue_ok else 'degraded',
            'token_store': 'ok',
            'email_queue_depth': queue_depth,
        }), 200 if queue_ok else 503
    except Exception as e:
        return jsonify({'status': 'error', 'error': str(e)}), 503
| Signal | Threshold | Why |
|---|---|---|
| Email queue depth | > 500 jobs | Emails backing up — users waiting for verification |
| Email provider ping | Any failure | Provider unreachable — no emails sending |
| Password reset queue | > 200 jobs | Risk of token expiry before email arrives |
| /health/email-flows response time | > 3s | Degradation in email or DB layer |
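The thresholds in the table can be expressed as a small evaluation function. This is a sketch for illustration only: the function name, parameter names, and alert labels are assumptions, and the numeric limits are copied from the table.

```python
from typing import List


def evaluate_alerts(email_queue_depth: int,
                    provider_ok: bool,
                    reset_queue_depth: int,
                    response_seconds: float) -> List[str]:
    """Return the names of the alerts that should fire for these readings."""
    alerts = []
    if email_queue_depth > 500:
        alerts.append('email_queue_depth')      # emails backing up
    if not provider_ok:
        alerts.append('email_provider')         # provider unreachable
    if reset_queue_depth > 200:
        alerts.append('password_reset_queue')   # token expiry risk
    if response_seconds > 3.0:
        alerts.append('health_response_time')   # slow email or DB layer
    return alerts
```

Returning a list rather than a single status keeps the signals separate, so an alert can say which threshold was crossed.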
Domain Monitor monitors your /health/email-flows and /health/password-reset endpoints every minute. When email sending breaks — an API key rotation, a queue worker crash, a provider outage — you're alerted before the support tickets arrive. Create a free account.