
File upload flows are harder to monitor than most endpoints. A standard uptime check against your upload endpoint will almost always return 200 — the endpoint exists and responds. But the actual upload process involves storage providers, processing queues, and size/timeout constraints that break in ways the endpoint status doesn't reveal.
At the same time, file upload endpoints are prone to false alerts if you monitor them naively — timeout thresholds set for normal HTTP requests will fire on legitimate large uploads.
Here's how to monitor uploads correctly.
A file upload typically involves:
1. The client POSTs the file to your upload endpoint
2. The server validates the file and streams it to a storage provider (S3 or compatible)
3. An upload record is written to the database
4. A processing job is enqueued (thumbnails, scanning, transcoding)
5. A worker picks up the job and processes the file
Monitoring the upload endpoint only covers step 1. Steps 2–5 are where most real failures occur.
# Flask — assumes an existing `app` (Flask instance) and `db` handle
import os

import boto3
from flask import jsonify

@app.route('/health/uploads')
def upload_health():
    checks = {}

    # 1. Test storage provider connectivity (S3/compatible)
    try:
        s3 = boto3.client(
            's3',
            aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
            aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
        )
        # Write a tiny test object and delete it
        bucket = os.environ['S3_BUCKET']
        s3.put_object(Bucket=bucket, Key='health-check/test.txt', Body=b'ok')
        s3.delete_object(Bucket=bucket, Key='health-check/test.txt')
        checks['storage'] = 'ok'
    except Exception as e:
        checks['storage'] = f'error: {str(e)}'

    # 2. Test processing queue depth (get_queue_size is your queue client's
    # depth lookup — e.g. a Redis LLEN or a Celery inspect call)
    queue_depth = get_queue_size('file-processing')
    checks['processing_queue_depth'] = queue_depth
    checks['processing_queue'] = 'ok' if queue_depth < 100 else 'degraded'

    # 3. Test database connectivity for upload records
    try:
        db.execute('SELECT COUNT(*) FROM uploads')
        checks['uploads_db'] = 'ok'
    except Exception as e:
        checks['uploads_db'] = f'error: {str(e)}'

    all_ok = all(v == 'ok' for v in [
        checks.get('storage'),
        checks.get('processing_queue'),
        checks.get('uploads_db'),
    ])
    return (jsonify({'status': 'ok' if all_ok else 'degraded', **checks}),
            200 if all_ok else 503)
// Node.js / AWS SDK v3 — assumes an existing Express `app`, `db` handle,
// and Redis `connection` for the queue
import { S3Client, PutObjectCommand, DeleteObjectCommand } from '@aws-sdk/client-s3';
import { Queue } from 'bullmq'; // or whichever library backs your processing queue

app.get('/health/uploads', async (req, res) => {
  const checks = {};
  try {
    // Test S3 connectivity with a tiny write
    const s3 = new S3Client({ region: process.env.AWS_REGION });
    await s3.send(new PutObjectCommand({
      Bucket: process.env.S3_BUCKET,
      Key: 'health-check/test.txt',
      Body: Buffer.from('ok'),
    }));
    await s3.send(new DeleteObjectCommand({
      Bucket: process.env.S3_BUCKET,
      Key: 'health-check/test.txt',
    }));
    checks.storage = 'ok';

    // Test processing queue depth
    const queue = new Queue('file-processing', { connection });
    const counts = await queue.getJobCounts('waiting', 'active');
    checks.processingQueueDepth = counts.waiting;
    checks.processingQueue = counts.waiting < 100 ? 'ok' : 'degraded';

    // Test DB connectivity for upload records
    await db.query('SELECT COUNT(*) FROM uploads');
    checks.uploadsDb = 'ok';

    const allOk = ['storage', 'processingQueue', 'uploadsDb']
      .every(k => checks[k] === 'ok');
    res.status(allOk ? 200 : 503).json({ status: allOk ? 'ok' : 'degraded', ...checks });
  } catch (err) {
    res.status(503).json({ status: 'error', error: err.message, ...checks });
  }
});
Monitoring POST /api/upload directly will either:
- pass while the pipeline is broken, because the endpoint responds 200 without exercising storage, the queue, or processing, or
- fire false alerts, because legitimate large uploads exceed a generic timeout threshold.
Instead, monitor /health/uploads, which tests each component independently without performing an actual upload.
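An external probe against the health endpoint only needs to confirm two things: the response is a 200 and the reported status is "ok". A minimal sketch using only the standard library (the base URL and 10-second timeout are illustrative):

```python
import json
import urllib.error
import urllib.request

def check_upload_health(base_url, timeout=10):
    """Return True when /health/uploads reports every component healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/uploads", timeout=timeout) as resp:
            body = json.loads(resp.read())
            return resp.status == 200 and body.get('status') == 'ok'
    except (urllib.error.URLError, ValueError):
        # 503 (degraded), unreachable host, or malformed JSON all count as unhealthy
        return False
```

Note that a 503 from the endpoint raises `HTTPError` under `urllib`, so the except clause is what turns a "degraded" response into a failed check rather than a crashed probe.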
If you do monitor an upload endpoint, set the timeout threshold to match your expected upload duration. A 30-second timeout threshold for an endpoint that legitimately handles 200MB files will fire false alerts constantly.
Better: separate the timeout thresholds by endpoint type:
- Regular API endpoints: a short threshold (a few seconds), so genuine slowdowns surface quickly.
- Upload endpoints: a threshold sized to your maximum allowed file size and the slowest connection you realistically support.
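Deriving the upload threshold from worst-case transfer time keeps it honest as your size limit changes. A sketch, where the 200 MB limit, 1 MB/s floor bandwidth, and 15-second overhead are illustrative assumptions, not recommendations:

```python
# Illustrative assumptions — substitute your own limits and measurements
MAX_UPLOAD_BYTES = 200 * 1024 * 1024        # your enforced file-size limit
MIN_CLIENT_BANDWIDTH_BPS = 1 * 1024 * 1024  # slowest connection you support (1 MB/s)
SERVER_OVERHEAD_S = 15                      # validation, storage write, DB insert

def upload_timeout_seconds(max_bytes=MAX_UPLOAD_BYTES,
                           min_bandwidth=MIN_CLIENT_BANDWIDTH_BPS,
                           overhead=SERVER_OVERHEAD_S):
    """Worst-case transfer time plus server-side overhead."""
    return max_bytes / min_bandwidth + overhead

API_TIMEOUT_S = 10                            # short threshold for normal endpoints
UPLOAD_TIMEOUT_S = upload_timeout_seconds()   # 215.0s for the assumptions above
```

With these numbers the upload threshold comes out to 215 seconds, an order of magnitude above the API threshold, which is exactly why sharing one value between the two endpoint types guarantees false alerts.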
If your architecture generates signed S3 URLs and uploads go directly to S3 (not through your server), your upload endpoint just generates a URL — it's fast and easily monitored. The actual upload bypasses your server entirely.
In this case, monitor:
- The URL-generation endpoint (it's fast, so a normal uptime check works).
- Storage itself, via the /health/uploads storage check.
- The post-upload pipeline (queue depth and database records), so you notice when files land in S3 but never get processed.
Domain Monitor monitors your /health/uploads endpoint every minute. When your S3 credentials expire, your processing queue backs up, or your storage provider has an incident, you know within a minute — before users start complaining that their uploads aren't processing. Create a free account.