
AI applications — chatbots, writing assistants, code generators, AI-powered search — depend on external LLM APIs (OpenAI, Anthropic, Google Gemini) and model inference services. When these services are unavailable, your application breaks. Monitoring AI applications requires tracking both your own infrastructure and the AI services it depends on.
An AI-powered application typically has more dependencies than a traditional web app: your frontend and API, the LLM provider's API, model inference services, and background processing pipelines. Every link in this chain can fail, and monitoring needs to cover each one.
Start with external monitoring on your application's user-facing endpoints:
```
Monitor: https://yourapp.com
Expected status: 200
Interval: 1 minute

Monitor: https://yourapp.com/api/health
Expected status: 200
Content check: {"status":"ok"}
```
This confirms your infrastructure is accessible, regardless of what's happening with the AI backend.
Build a health endpoint that probes your AI dependencies:
```python
import os

import httpx
from fastapi import FastAPI
from fastapi.responses import JSONResponse

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

app = FastAPI()


@app.get("/health")
async def health_check():
    results = {"status": "ok", "dependencies": {}}

    # Test OpenAI API connectivity with a lightweight request
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(
                "https://api.openai.com/v1/models",
                headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            )
        if response.status_code == 200:
            results["dependencies"]["openai"] = "ok"
        else:
            results["dependencies"]["openai"] = "degraded"
            results["status"] = "degraded"
    except Exception:
        results["dependencies"]["openai"] = "error"
        results["status"] = "degraded"

    # FastAPI needs an explicit JSONResponse to set a non-200 status code;
    # returning a (body, status) tuple is a Flask idiom that won't work here.
    status_code = 200 if results["status"] == "ok" else 503
    return JSONResponse(content=results, status_code=status_code)
```
Monitor this endpoint to catch AI API availability issues before users encounter them.
Major LLM providers publish status pages. Subscribe to their email or webhook notifications for incident alerts. Status page updates often lag actual incidents, though, so also monitor your own health endpoint (as above) for real-time detection.
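Many provider status pages are hosted on Atlassian Statuspage, which exposes a machine-readable summary endpoint (typically at `/api/v2/summary.json` — verify the exact URL for each provider). A sketch that polls it and extracts any components that are not fully operational:

```python
import json
from urllib import request


def degraded_components(summary: dict) -> list[str]:
    """Return names of components that are not fully operational,
    given a Statuspage-style summary payload."""
    return [
        c["name"]
        for c in summary.get("components", [])
        if c.get("status") != "operational"
    ]


def fetch_summary(status_url: str) -> dict:
    # e.g. "https://status.openai.com/api/v2/summary.json"
    # (assumed URL pattern -- confirm for your provider)
    with request.urlopen(status_url, timeout=10) as resp:
        return json.load(resp)
```

Running this on a schedule gives you programmatic access to the same information as the status page, without waiting for an email.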
For full coverage, also set up an HTTP monitor on the provider's API health endpoint:
```
Monitor: https://api.openai.com/v1/models
Headers: Authorization: Bearer {your-api-key}
Expected status: 200
Interval: 5 minutes
```
Note: Some providers may rate-limit these checks — use a generous interval (5 minutes) and test with a lightweight endpoint.
Response time is particularly critical for AI applications — LLM inference takes seconds, and degraded API performance has a significant UX impact.
Track response time trends in your monitoring data and set a response time alert threshold. If your /api/chat endpoint starts averaging more than 10 seconds, that warrants investigation even if it's technically "available."
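One way to implement such a threshold is a rolling window over recent request latencies, alerting when the average drifts above the limit. A minimal sketch (the window size and minimum sample count are arbitrary choices to tune for your traffic):

```python
from collections import deque


class LatencyTracker:
    """Rolling average of recent response times with an alert threshold."""

    def __init__(self, window: int = 50, threshold_s: float = 10.0):
        self.samples = deque(maxlen=window)  # oldest samples drop off
        self.threshold_s = threshold_s

    def record(self, seconds: float) -> None:
        self.samples.append(seconds)

    @property
    def average(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def should_alert(self) -> bool:
        # Require a minimum number of samples so a single slow
        # request doesn't trigger a page
        return len(self.samples) >= 10 and self.average > self.threshold_s
```

Record a sample after each /api/chat request and check `should_alert()`; averaging over a window catches sustained degradation rather than one-off spikes.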
AI applications often have background processing, such as document embedding jobs and index rebuilds. Use heartbeat monitoring for these:
```python
import os

import httpx

HEARTBEAT_URL = os.environ["HEARTBEAT_URL"]
MONITOR_TOKEN = os.environ["MONITOR_TOKEN"]


async def index_documents(docs):
    # Process and embed documents
    await embed_and_store_documents(docs)

    # Signal successful completion; if this ping stops arriving,
    # the heartbeat monitor raises an alert
    async with httpx.AsyncClient() as client:
        await client.get(f"{HEARTBEAT_URL}/ping/{MONITOR_TOKEN}")
```
If you're running Model Context Protocol (MCP) servers that provide tools and context to AI agents, monitor their HTTP endpoints just like any other API service:
```
Monitor: https://your-mcp-server.com/health
Expected status: 200
Interval: 1 minute
```
MCP servers are increasingly part of production AI agent architectures — their availability directly affects AI agent reliability.
Design your application to degrade gracefully when AI services are unavailable:
```python
import anthropic
import openai

# openai_client and anthropic_client are assumed to be
# initialized elsewhere in the application


async def generate_response(user_message: str):
    try:
        # Try the primary AI provider
        return await openai_client.chat(user_message)
    except openai.APIError:
        try:
            # Fall back to the secondary provider
            return await anthropic_client.messages(user_message)
        except anthropic.APIError:
            # Final fallback: a graceful static response
            return {"message": "AI assistant is temporarily unavailable. "
                               "Please try again shortly."}
```
Fallback providers and graceful error messages maintain user experience during AI service outages.
Given the dependency complexity of AI applications, layer your alerts:
| Failure Type | Alert Channel | Why |
|---|---|---|
| Your frontend/API down | SMS + Slack | Users can't access the app |
| AI API degraded | Slack | Investigate, may need fallback |
| AI API down | SMS + Slack | Activate fallback provider |
| Background job missed | Slack | Processing pipeline broken |
| SSL certificate expiry | Email (30 days) | Preventable outage |
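The routing table above reduces to a simple mapping in code. A sketch with hypothetical channel names — wire `channels_for` into whatever actually sends the SMS, Slack, or email notification:

```python
# Failure types map to the channels from the table above.
ROUTES = {
    "frontend_down": ["sms", "slack"],
    "ai_api_degraded": ["slack"],
    "ai_api_down": ["sms", "slack"],
    "background_job_missed": ["slack"],
    "ssl_expiry": ["email"],
}


def channels_for(failure_type: str) -> list[str]:
    # Default to Slack so unknown failure types are never silently dropped
    return ROUTES.get(failure_type, ["slack"])
```

Keeping the mapping in one place makes escalation policy easy to review and change as the dependency chain grows.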
See how to set up downtime alerts for complete alert configuration.
AI applications face unique SLA challenges: you can't guarantee 99.9% availability if your AI provider has scheduled maintenance. Include appropriate carve-outs in your SLA for upstream provider outages and maintenance windows. Document your AI provider dependencies in your SLA and update your public status page with component status for each dependency.
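The carve-out matters because serial dependencies multiply: the composite availability of a chain where every link must be up is the product of the individual availabilities. With illustrative (not real) figures:

```python
def composite_availability(*availabilities: float) -> float:
    """Availability of a chain of serial dependencies (each must be up)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Illustrative numbers: your stack at 99.95%, the LLM provider at 99.5%
combined = composite_availability(0.9995, 0.995)
print(f"{combined:.4%}")  # roughly 99.45% -- below a 99.9% promise
```

Even with a highly reliable stack of your own, a 99.5% provider caps what you can honestly commit to without a carve-out.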
Monitor your AI application's availability from end to end at Domain Monitor.