
AI applications — chatbots, writing assistants, code generators, AI-powered search — depend on external LLM APIs (OpenAI, Anthropic, Google Gemini) and model inference services. When these services are unavailable, your application breaks. Monitoring AI applications requires tracking both your own infrastructure and the AI services it depends on.
An AI-powered application typically has more dependencies than a traditional web app: your own frontend and API, the LLM provider's API, any model inference services, and the background jobs that feed them. Every link in this chain can fail, so monitoring needs to cover each one.
Start with external monitoring on your application's user-facing endpoints:
Monitor: https://yourapp.com
Expected status: 200
Interval: 1 minute
Monitor: https://yourapp.com/api/health
Expected status: 200
Content check: {"status":"ok"}
This confirms your infrastructure is accessible, regardless of what's happening with the AI backend.
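To make the check concrete, here is a minimal sketch of what an external monitor does on each interval, using only the standard library. The `check_endpoint` helper and its parameters are illustrative, not any specific monitoring product's API:

```python
import json
import urllib.request


def check_endpoint(url, expected_status=200, content_check=None):
    """Return True if the endpoint responds with the expected status
    and (optionally) contains the expected JSON key/value pairs."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if resp.status != expected_status:
                return False
            if content_check is not None:
                body = json.loads(resp.read().decode())
                # Every expected key/value must appear in the response body
                return all(body.get(k) == v for k, v in content_check.items())
            return True
    except Exception:
        # Connection errors, timeouts, and non-2xx responses all count as down
        return False
```

A real monitoring service runs this kind of probe from multiple regions and alerts when consecutive checks fail.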
Build a health endpoint that probes your AI dependencies:
import os

import httpx
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

@app.get("/health")
async def health_check():
    results = {"status": "ok", "dependencies": {}}

    # Test OpenAI API connectivity
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(
                "https://api.openai.com/v1/models",
                headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            )
        results["dependencies"]["openai"] = "ok" if response.status_code == 200 else "degraded"
    except Exception:
        results["dependencies"]["openai"] = "error"
        results["status"] = "degraded"

    # Return 503 when degraded so external monitors see the failure
    status_code = 200 if results["status"] == "ok" else 503
    return JSONResponse(content=results, status_code=status_code)
Monitor this endpoint to catch AI API availability issues before users encounter them.
Major LLM providers publish status pages; subscribe to their email or webhook notifications for incident alerts. Status page updates often lag actual incidents, so also monitor your own health endpoint (as above) for real-time detection.
For full coverage, also set up an HTTP monitor on the provider's API health endpoint:
Monitor: https://api.openai.com/v1/models
Headers: Authorization: Bearer {your-api-key}
Expected status: 200
Interval: 5 minutes
Note: Some providers may rate-limit these checks — use a generous interval (5 minutes) and test with a lightweight endpoint.
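A check against the provider's API should distinguish "the provider is down" from "our check got rate-limited." A standard-library sketch (the `check_provider` helper and its return values are illustrative):

```python
import urllib.error
import urllib.request


def check_provider(url, api_key):
    """Probe a provider endpoint without treating our own rate
    limiting as a provider outage."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        # urlopen raises HTTPError for 4xx/5xx, so reaching here means success
        with urllib.request.urlopen(req, timeout=10):
            return "up"
    except urllib.error.HTTPError as e:
        # HTTP 429 means our check was rate-limited, not that the API is down
        return "rate_limited" if e.code == 429 else "down"
    except Exception:
        # DNS failure, connection refused, timeout
        return "down"
```

Treating `rate_limited` as a non-alerting state avoids false pages when the check interval is too aggressive.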
Response time is particularly critical for AI applications — LLM inference takes seconds, and degraded API performance has a significant UX impact.
Track response time trends in your monitoring data and set an alert threshold: if your /api/chat endpoint starts taking more than 10 seconds on average, that warrants investigation even if it's technically "available."
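A rolling-window tracker along these lines could drive such an alert (the class name, window size, and threshold are illustrative, not a specific tool's API):

```python
from collections import deque


class LatencyTracker:
    """Rolling average of recent response times with an alert threshold."""

    def __init__(self, window=50, threshold_s=10.0):
        self.samples = deque(maxlen=window)
        self.threshold_s = threshold_s

    def record(self, seconds):
        self.samples.append(seconds)

    @property
    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def should_alert(self):
        # Only alert once the window is full, so a single slow request
        # after startup doesn't page anyone
        return len(self.samples) == self.samples.maxlen and self.average > self.threshold_s
```

Averaging over a window smooths out single slow requests while still catching sustained degradation.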
AI applications often have background processing, such as document embedding and indexing jobs. Use heartbeat monitoring for these:
import os

import httpx

HEARTBEAT_URL = os.environ["HEARTBEAT_URL"]
MONITOR_TOKEN = os.environ["MONITOR_TOKEN"]

async def index_documents(docs):
    # Process and embed documents
    await embed_and_store_documents(docs)

    # Signal successful completion; if this ping stops arriving,
    # the heartbeat monitor alerts that the job failed or never ran
    async with httpx.AsyncClient() as client:
        await client.get(f"{HEARTBEAT_URL}/ping/{MONITOR_TOKEN}")
If you're running Model Context Protocol (MCP) servers that provide tools and context to AI agents, monitor their HTTP endpoints just like any other API service:
Monitor: https://your-mcp-server.com/health
Expected status: 200
Interval: 1 minute
MCP servers are increasingly part of production AI agent architectures — their availability directly affects AI agent reliability.
Design your application to degrade gracefully when AI services are unavailable:
async def generate_response(user_message: str):
    try:
        # Try primary AI provider
        return await openai_client.chat(user_message)
    except openai.APIError:
        try:
            # Fall back to secondary provider
            return await anthropic_client.messages(user_message)
        except anthropic.APIError:
            # Final fallback: a graceful static response
            return {"message": "AI assistant is temporarily unavailable. Please try again shortly."}
Fallback providers and graceful error messages maintain user experience during AI service outages.
Given the dependency complexity of AI applications, layer your alerts:
| Failure Type | Alert Channel | Why |
|---|---|---|
| Your frontend/API down | SMS + Slack | Users can't access the app |
| AI API degraded | Slack | Investigate, may need fallback |
| AI API down | SMS + Slack | Activate fallback provider |
| Background job missed | Slack | Processing pipeline broken |
| SSL certificate expiry | Email (30 days) | Preventable outage |
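The routing table above can be sketched as a simple lookup; the failure-type keys and channel names here are illustrative placeholders for whatever your alerting system uses:

```python
# Map each failure type to the channels that should be notified
ALERT_ROUTES = {
    "frontend_down": ["sms", "slack"],
    "ai_api_degraded": ["slack"],
    "ai_api_down": ["sms", "slack"],
    "background_job_missed": ["slack"],
    "ssl_expiry": ["email"],
}


def route_alert(failure_type):
    """Return the channels for a failure type, escalating unknown
    failure types to every channel as a safe default."""
    return ALERT_ROUTES.get(failure_type, ["sms", "slack", "email"])
```

Keeping the routing in one table makes it easy to review which failures page a human versus which just post to a channel.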
See how to set up downtime alerts for complete alert configuration.
AI applications face unique SLA challenges: you can't guarantee 99.9% availability if your AI provider has scheduled maintenance. Include appropriate carve-outs in your SLA for upstream provider outages and maintenance windows. Document your AI provider dependencies in your SLA and update your public status page with component status for each dependency.
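When negotiating those targets, it helps to translate an SLA percentage into an actual downtime budget. A quick helper (the function name is illustrative):

```python
def downtime_budget_minutes(sla_percent, days=30):
    """Minutes of allowed downtime per period for a given SLA target.

    For example, 99.9% over 30 days allows roughly 43 minutes of downtime,
    which an upstream provider's maintenance window alone can consume.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)
```

Comparing this budget against your providers' historical incident durations shows whether a target is realistic without carve-outs.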
Monitor your AI application's availability from end to end at Domain Monitor.