
Monitoring Serverless Functions: AWS Lambda, Vercel, and Cloudflare Workers

Serverless functions promise zero infrastructure management — but they still fail. Cold starts add latency. Functions time out. Memory limits get hit. Deployment errors break handlers. Without monitoring, these failures are invisible until users complain.

How Serverless Functions Fail

Serverless failure modes differ from traditional servers:

  • Cold starts — function takes 100-3000ms to initialise on first invocation (or after idle period)
  • Timeout — function exceeds configured execution limit (default: 3-15s depending on platform)
  • Memory limit exceeded — function killed mid-execution
  • Deployment error — new code deployed with a syntax error or misconfigured handler
  • Dependency failure — function calls a database or API that's down
  • Concurrency throttling — too many simultaneous invocations; excess requests fail or queue
  • Region failure — cloud region experiencing issues; functions unavailable

External HTTP monitoring catches most of these — if your function-backed endpoint returns an error or times out, the monitor fails.

External HTTP Monitoring for Serverless Endpoints

The foundation of serverless monitoring is the same as any web application: external HTTP checks on your production endpoints.

For a serverless API:

Monitor: https://api.yourdomain.com/v1/health
Expected status: 200
Content check: {"status":"ok"}
Interval: 1 minute

For a Vercel-deployed application:

Monitor: https://yourapp.vercel.app/api/health
Expected status: 200

This validates the complete path: DNS → CDN edge → serverless platform → function → response.
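The logic an external monitor applies is simple enough to sketch. The function below is illustrative (the names `checkEndpoint`, `expectedStatus`, and `mustContain` are not any particular product's API): fetch the URL, compare status and body, record the response time. Any network-level failure counts as downtime.

```javascript
// Sketch of an external HTTP check: fetch the endpoint, verify the
// status code and a content substring, and time the round trip.
// fetchImpl is injectable for testing; defaults to the global fetch
// (Node.js 18+).
async function checkEndpoint(url, { expectedStatus = 200, mustContain = '' } = {}, fetchImpl = fetch) {
  const started = Date.now();
  try {
    const res = await fetchImpl(url);
    const body = await res.text();
    return {
      up: res.status === expectedStatus && body.includes(mustContain),
      status: res.status,
      responseTimeMs: Date.now() - started,
    };
  } catch (err) {
    // DNS failure, TLS error, or connection timeout all count as downtime
    return { up: false, status: null, responseTimeMs: Date.now() - started, error: String(err) };
  }
}
```

A real monitoring service runs this from several locations on a schedule and alerts when checks fail consecutively.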

A dedicated /health endpoint in your function should be lightweight — just return 200. Don't call the database in health checks for serverless (it adds cost and latency).
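A minimal health handler can look like this. The sketch below assumes a Lambda proxy-style response shape; the handler name is illustrative:

```javascript
// Minimal health handler (Lambda proxy-style response shape).
// No database or API calls: the goal is only to confirm the function
// can start, run, and respond.
const healthHandler = async () => ({
  statusCode: 200,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ status: 'ok' }),
});
```

The body matches the content check configured above, so the same endpoint serves both the status and content assertions of the monitor.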

Monitoring Cold Starts

Cold starts are a performance issue unique to serverless. Monitor response time data from your uptime checks to detect when cold starts are affecting user experience:

  • Set a response time alert threshold (e.g., alert if response time > 5 seconds)
  • Use a monitoring service with multi-location checks; cold starts appear in the data as intermittent response-time spikes rather than sustained slowness

For latency-sensitive functions, consider:

  • Provisioned concurrency (AWS Lambda) — keeps instances warm, eliminates cold starts at a cost
  • Minimum instances (Cloud Run) — keeps at least one instance warm
  • Regular check intervals in your monitoring act as keep-warm requests for low-traffic functions
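You can also flag cold starts from inside the function itself. Module scope survives across warm invocations of the same container, so a module-level flag is true exactly once per container lifetime — a common pattern, sketched here with illustrative names:

```javascript
// Module scope persists across warm invocations of the same container,
// so this flag is true only on the first invocation after a cold start.
let coldStart = true;

const handler = async (event) => {
  const wasCold = coldStart;
  coldStart = false;
  // Emit a structured log line; count/alert on wasCold in your log tooling
  console.log(JSON.stringify({ wasCold, ts: Date.now() }));
  return { statusCode: 200, body: JSON.stringify({ coldStart: wasCold }) };
};
```

Correlating these log entries with response-time spikes in your external monitoring confirms whether cold starts are the cause.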

AWS Lambda Monitoring

Beyond external HTTP monitoring, AWS provides Lambda-specific tools:

CloudWatch Metrics (built-in):

  • Duration — execution time per invocation
  • Errors — invocations that failed, whether from an unhandled exception, a timeout, or running out of memory
  • Throttles — invocations rejected due to concurrency limits
  • ConcurrentExecutions — simultaneous invocations

CloudWatch Alarms on these metrics complement external monitoring:

  • Alert on error rate > 1% of invocations
  • Alert on throttling > 0 (any throttling is a problem)
  • Alert on P99 duration > threshold
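As a sketch of the throttling alarm, here are the parameters you would pass to CloudWatch's PutMetricAlarm API (for example via `PutMetricAlarmCommand` in `@aws-sdk/client-cloudwatch`). The function name and SNS topic ARN are placeholders:

```javascript
// Alarm parameters for "any throttling is a problem": fire if the
// Throttles metric sums above zero in any one-minute period.
// FunctionName and the SNS topic ARN below are placeholders.
const throttleAlarmParams = {
  AlarmName: 'my-function-throttles',
  Namespace: 'AWS/Lambda',
  MetricName: 'Throttles',
  Dimensions: [{ Name: 'FunctionName', Value: 'my-function' }],
  Statistic: 'Sum',
  Period: 60,                        // evaluate per minute
  EvaluationPeriods: 1,
  Threshold: 0,
  ComparisonOperator: 'GreaterThanThreshold',
  TreatMissingData: 'notBreaching',  // no invocations is not throttling
  AlarmActions: ['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
};
```

`TreatMissingData: 'notBreaching'` matters for low-traffic functions: without it, quiet periods with no data points can flip the alarm state.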

External monitoring tells you about user-facing availability. CloudWatch tells you about internal Lambda behaviour.

Vercel Function Monitoring

Vercel Functions (including Next.js API routes deployed on Vercel) are monitored through:

External HTTP monitoring: Point your uptime monitor at your function's endpoint. Vercel deployment monitoring covers the full approach.

Vercel Analytics: Vercel provides built-in Web Vitals and function execution metrics in their dashboard.

Function timeout awareness: Vercel Functions have execution limits (Hobby: 10s, Pro: 60s). Functions that regularly approach these limits need optimisation or migration to longer-running solutions.

Cloudflare Workers Monitoring

Cloudflare Workers run at the edge (150+ global locations) with near-zero cold starts. Monitoring challenges:

  • Workers respond from the nearest edge node — monitoring from multiple locations tests different nodes
  • Failures may be regional (one edge location having issues while others work fine)

Use multi-location uptime monitoring to detect regional Worker failures.

Cloudflare Analytics: Workers analytics shows requests, errors, and CPU time per worker.
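A Worker can expose the same kind of lightweight health route as any other function. A minimal sketch in the module Worker shape (in a deployed Worker this object would be the `export default`):

```javascript
// Minimal Worker sketch with a /health route for external monitors.
// In a real Worker this object is the module's default export.
const worker = {
  async fetch(request) {
    const { pathname } = new URL(request.url);
    if (pathname === '/health') {
      return new Response(JSON.stringify({ status: 'ok' }), {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
      });
    }
    return new Response('Not found', { status: 404 });
  },
};
```

Because every edge location serves this route, multi-location checks against it exercise different nodes and surface regional failures.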

Heartbeat Monitoring for Serverless Background Jobs

Serverless background processing (SQS-triggered Lambda, Cloudflare Queues consumers) is invisible to HTTP monitoring. Use heartbeat monitoring to verify that background jobs are actually running:

```javascript
// Lambda handler for background processing (Node.js 18+ runtime,
// where fetch is available globally)
export const handler = async (event) => {
    // Process each SQS message in the batch
    for (const record of event.Records) {
        await processMessage(record);
    }

    // Signal successful processing; this line is only reached
    // if no message above threw
    await fetch('https://monitoring-url/ping/YOUR_TOKEN');
};
```

Configure the heartbeat monitor with a grace period matching your expected processing interval.

Alert Strategy for Serverless

| Failure type | Detection | Alert |
| --- | --- | --- |
| Endpoint returning errors | External HTTP monitor | SMS + Slack |
| High error rate | CloudWatch alarm | Slack |
| Throttling | CloudWatch alarm | Slack (investigate concurrency limits) |
| Slow response times | External monitor threshold | Slack |
| Background job stopped | Heartbeat monitor | SMS |
| SSL certificate expiry | SSL monitor | Email (30 days in advance) |

Serverless architectures benefit from the same downtime alert configuration as traditional applications — the delivery mechanisms are identical.


Monitor serverless function endpoints from outside your cloud infrastructure at Domain Monitor.

