
How to Monitor Kubernetes Pod Uptime and Health

Kubernetes is the de facto standard for container orchestration — but running workloads in Kubernetes doesn't mean your services are automatically reliable. Pods crash, deployments roll out broken configurations, services lose their endpoints. Monitoring Kubernetes pod health from both inside and outside the cluster gives you the complete visibility you need to catch failures before users do.

This guide covers external HTTP monitoring, Kubernetes-native health probes, and how to combine them for robust uptime assurance.

Why External Monitoring Matters for Kubernetes

Kubernetes has built-in health checking (liveness and readiness probes) that restarts unhealthy containers and keeps unready pods out of service. But these internal checks only tell you what's happening inside the cluster. They don't answer the critical question: can users actually reach your service?

External monitoring detects:

  • Ingress controller failures
  • Load balancer misconfiguration
  • DNS resolution problems
  • Network policy blocking external traffic
  • TLS termination failures

A pod can be perfectly healthy inside the cluster while being completely unreachable from the internet. External HTTP monitoring catches exactly this class of failure.

Kubernetes Health Probe Types

Kubernetes provides three probe types you should configure for every production workload:

Liveness Probe

Determines whether a container is still healthy. If the liveness probe fails failureThreshold consecutive times, Kubernetes restarts the container.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

The /healthz endpoint should return 200 if the application process is alive, 500 if it's in an unrecoverable state.

Readiness Probe

Determines if a pod is ready to receive traffic. Pods that fail the readiness probe are removed from the Service's endpoint list — they won't receive requests until they recover.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2

The readiness probe is especially valuable during deployments and startup. It prevents traffic from reaching pods that are still initialising or waiting for dependencies (database connections, cache warmup).

Startup Probe

For slow-starting containers, a startup probe gives the application time to initialise before liveness probes take over:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

With failureThreshold: 30 and periodSeconds: 10, the application gets up to 300 seconds to start before the container is killed. Liveness and readiness probes don't begin until the startup probe has succeeded once.

Implementing Health Endpoints

Your application needs proper health endpoints that probes can hit. A robust pattern separates liveness from readiness:

Liveness (/healthz): Returns 200 if the process is running. Should almost never fail — only return 500 if the process is truly broken and needs a restart.

Readiness (/ready): Returns 200 only when all dependencies are available:

  • Database connection is healthy
  • Cache connection is healthy
  • Required background jobs are running
  • Necessary configuration is loaded

Returns 503 (Service Unavailable) when any dependency is unavailable. This gracefully takes the pod out of rotation without killing it.
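
The split above is easy to sketch in code. Here's a minimal, framework-agnostic Python sketch of the two handlers — the check names and callables are placeholders of my own, not part of any particular service:

```python
def liveness():
    # /healthz: only report failure when the process is truly broken;
    # a 500 here tells Kubernetes to restart the container, so keep
    # this check trivially cheap and dependency-free
    return 200, "ok"


def readiness(checks):
    # /ready: `checks` maps a dependency name to a zero-arg callable
    # that returns True when the dependency is healthy
    failing = []
    for name, probe in checks.items():
        try:
            if not probe():
                failing.append(name)
        except Exception:
            failing.append(name)
    if failing:
        # 503 takes the pod out of the Service endpoint list
        # without triggering a restart
        return 503, "not ready: " + ", ".join(sorted(failing))
    return 200, "ok"
```

Wire these to /healthz and /ready in whatever HTTP framework the service already uses; the key design point is that readiness aggregates dependency checks while liveness stays trivially cheap.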

External Uptime Monitoring for Kubernetes Services

Once your Kubernetes service is exposed via an Ingress or LoadBalancer, set up external HTTP monitoring on the public endpoint.

For a service at https://api.yourdomain.com:

Monitor: https://api.yourdomain.com/healthz
Method: GET
Expected status: 200
Check interval: 1 minute
Alert channels: SMS + Slack

This confirms the entire path from the internet to your application: DNS → load balancer → ingress → service → pod → application.
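
The check itself is just an HTTP GET with a timeout and a status assertion. A minimal sketch using Python's standard library (the URL and timeout values are illustrative):

```python
import urllib.error
import urllib.request


def check_endpoint(url, timeout=10):
    """Return True if the endpoint answers 200 within the timeout.

    Any network error (DNS failure, refused connection, timeout,
    TLS failure) counts as down -- exactly the class of failures
    external monitoring exists to catch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A real monitoring service adds scheduling, multi-location probes, and alert routing on top, but the pass/fail decision is this simple.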

Multi-Location Monitoring

Use multi-location monitoring to detect regional failures. A Kubernetes cluster in eu-west-1 with a misconfigured CDN edge might be inaccessible from users in North America while appearing fine from Europe.

Checking from multiple geographic locations surfaces these routing issues immediately.

Monitoring During Rolling Deployments

Kubernetes rolling deployments update pods incrementally, but a bad deployment can still cause visible errors. Configure your monitoring with a 2-failure confirmation count — this avoids false alarms during the brief unavailability that can occur as pods are replaced.

With a 1-minute check interval and 2-failure confirmation:

  • Transient pod restarts: no alert (single failure, resolves quickly)
  • Bad deployment breaking the service: alert within 2 minutes
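
The confirmation logic is simple to state precisely: fire only when the failure streak reaches the threshold, and reset on any success. A sketch (the class and method names are my own, not a specific product's API):

```python
class FailureConfirmation:
    """Fire an alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, check_ok):
        # Returns True exactly once, when the failure streak first
        # reaches the threshold; any success resets the streak.
        if check_ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold
```

With a 1-minute interval, a single failed check during a rolling update never alerts, while a genuinely broken deployment alerts on the second consecutive failure.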

This balance between speed and noise reduction is discussed in detail in how to set up downtime alerts.

SSL Certificate Monitoring

If your Kubernetes Ingress terminates TLS (common with cert-manager and Let's Encrypt), monitor your certificates:

# cert-manager Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
spec:
  dnsNames:
    - api.yourdomain.com
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod

cert-manager auto-renews certificates, but automation can fail. An external SSL certificate monitor with 30-day advance warnings ensures you're notified before any auto-renewal failure becomes a user-facing problem.
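
An external expiry check boils down to reading the certificate's notAfter field and counting days. A sketch using Python's standard ssl module (the helper names and the 30-day threshold are illustrative):

```python
import datetime
import socket
import ssl


def fetch_not_after(host, port=443, timeout=10):
    # Open a TLS connection and read the peer certificate's expiry
    # string, e.g. 'Jun  1 12:00:00 2030 GMT'
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after, now=None):
    # Parse the notAfter format returned by ssl.getpeercert()
    expires = datetime.datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    now = now or datetime.datetime.utcnow()
    return (expires - now).days


def should_warn(not_after, threshold_days=30, now=None):
    # Warn once the certificate is within the advance-warning window
    return days_remaining(not_after, now) <= threshold_days
```

Run this on a schedule against the public hostname — not against the Secret inside the cluster — so a cert-manager renewal failure is caught from the same vantage point your users have.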

Alerting on Pod-Level Issues

For pod-level visibility inside the cluster, combine external monitoring with Kubernetes-native tooling:

| Issue | Detection Method |
| --- | --- |
| Service unreachable externally | External HTTP monitor |
| Pod crash loop | Kubernetes events + Alertmanager |
| High pod restart count | Prometheus kube-state-metrics |
| Certificate expiry | External SSL monitor |
| Deployment failure | Kubernetes deployment status |

External monitoring from Domain Monitor handles the external visibility layer. Prometheus and Alertmanager handle internal cluster metrics.

Common Kubernetes Monitoring Failures

Forgetting to expose health endpoints: Probes pointing at a path the application doesn't serve fail on every check (a 404 response, or a refused connection if nothing listens on the probe port), causing constant pod restarts. Implement /healthz and /ready in every service.

Liveness probe too aggressive: A liveness probe with a short initialDelaySeconds on a slow-starting application will restart healthy pods. Use startup probes for slow applications.

Only monitoring internally: Relying solely on Kubernetes probes means ingress failures, DNS issues, and network policy problems go undetected. Always add external monitoring.

No readiness probe: Without a readiness probe, traffic is sent to pods immediately on start — before the application has connected to the database or finished initialising. Always implement readiness probes.

Incident Response for Kubernetes Outages

When your external monitor alerts on a Kubernetes service:

  1. Check the external endpoint — is it returning an error code or timing out?
  2. Check Kubernetes pod status with kubectl get pods -n <namespace> — are pods running?
  3. Check recent deployments with kubectl rollout history deployment/<name> — did something change?
  4. Check ingress status — is the ingress controller healthy?
  5. Roll back if needed with kubectl rollout undo deployment/<name>

Your monitoring tool's timestamp tells you exactly when the failure started — correlate this with deployment and event timelines.


Monitor your Kubernetes services from outside the cluster at Domain Monitor.
