
How to Monitor Kubernetes Pod Uptime and Health

Kubernetes is the de facto standard for container orchestration — but running workloads in Kubernetes doesn't mean your services are automatically reliable. Pods crash, deployments roll out broken configurations, services lose their endpoints. Monitoring Kubernetes pod health from both inside and outside the cluster gives you the complete visibility you need to catch failures before users do.

This guide covers external HTTP monitoring, Kubernetes-native health probes, and how to combine them for robust uptime assurance.

Why External Monitoring Matters for Kubernetes

Kubernetes has built-in health checking (liveness and readiness probes) that restarts unhealthy containers and keeps unready pods out of service. But these internal checks only tell you what's happening inside the cluster. They don't answer the critical question: can users actually reach your service?

External monitoring detects:

  • Ingress controller failures
  • Load balancer misconfiguration
  • DNS resolution problems
  • Network policy blocking external traffic
  • TLS termination failures

A pod can be perfectly healthy inside the cluster while being completely unreachable from the internet. External HTTP monitoring catches exactly this class of failure.

Kubernetes Health Probe Types

Kubernetes provides three probe types you should configure for every production workload:

Liveness Probe

Determines whether a container is still healthy. If the liveness probe fails failureThreshold consecutive times, Kubernetes restarts the container.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

The /healthz endpoint should return 200 if the application process is alive, 500 if it's in an unrecoverable state.

Readiness Probe

Determines if a pod is ready to receive traffic. Pods that fail the readiness probe are removed from the Service's endpoint list — they won't receive requests until they recover.

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2

The readiness probe is especially valuable during deployments and startup. It prevents traffic from reaching pods that are still initialising or waiting for dependencies (database connections, cache warmup).

Startup Probe

For slow-starting containers, a startup probe gives the application time to initialise before liveness probes take over:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

With failureThreshold: 30 and periodSeconds: 10, the application gets up to 300 seconds to start before the container is killed. Liveness and readiness probes don't begin until the startup probe has succeeded once.

Implementing Health Endpoints

Your application needs proper health endpoints that probes can hit. A robust pattern separates liveness from readiness:

Liveness (/healthz): Returns 200 if the process is running. Should almost never fail — only return 500 if the process is truly broken and needs a restart.

Readiness (/ready): Returns 200 only when all dependencies are available:

  • Database connection is healthy
  • Cache connection is healthy
  • Required background jobs are running
  • Necessary configuration is loaded

Returns 503 (Service Unavailable) when any dependency is unavailable. This gracefully takes the pod out of rotation without killing it.
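
The split above is easy to sketch in code. Here's a minimal, framework-agnostic Python sketch of the two handlers — the check names and callables are placeholders of my own, not part of any particular service:

```python
def liveness():
    # /healthz: only report failure when the process is truly broken;
    # a 500 here tells Kubernetes to restart the container, so keep
    # this check trivially cheap and dependency-free
    return 200, "ok"


def readiness(checks):
    # /ready: `checks` maps a dependency name to a zero-arg callable
    # that returns True when the dependency is healthy
    failing = []
    for name, probe in checks.items():
        try:
            if not probe():
                failing.append(name)
        except Exception:
            failing.append(name)
    if failing:
        # 503 takes the pod out of the Service endpoint list
        # without triggering a restart
        return 503, "not ready: " + ", ".join(sorted(failing))
    return 200, "ok"
```

Wire these to /healthz and /ready in whatever HTTP framework the service already uses; the key design point is that readiness aggregates dependency checks while liveness stays trivially cheap.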

External Uptime Monitoring for Kubernetes Services

Once your Kubernetes service is exposed via an Ingress or LoadBalancer, set up external HTTP monitoring on the public endpoint.

For a service at https://api.yourdomain.com:

Monitor: https://api.yourdomain.com/healthz
Method: GET
Expected status: 200
Check interval: 1 minute
Alert channels: SMS + Slack

This confirms the entire path from the internet to your application: DNS → load balancer → ingress → service → pod → application.
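
The check itself is just an HTTP GET with a timeout and a status assertion. A minimal sketch using Python's standard library (the URL and timeout values are illustrative):

```python
import urllib.error
import urllib.request


def check_endpoint(url, timeout=10):
    """Return True if the endpoint answers 200 within the timeout.

    Any network error (DNS failure, refused connection, timeout,
    TLS failure) counts as down -- exactly the class of failures
    external monitoring exists to catch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A real monitoring service adds scheduling, multi-location probes, and alert routing on top, but the pass/fail decision is this simple.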

Multi-Location Monitoring

Use multi-location monitoring to detect regional failures. A Kubernetes cluster in eu-west-1 with a misconfigured CDN edge might be inaccessible from users in North America while appearing fine from Europe.

Checking from multiple geographic locations surfaces these routing issues immediately.

Monitoring During Rolling Deployments

Kubernetes rolling deployments update pods incrementally, but a bad deployment can still cause visible errors. Configure your monitoring with a 2-failure confirmation count — this avoids false alarms during the brief unavailability that can occur as pods are replaced.

With a 1-minute check interval and 2-failure confirmation:

  • Transient pod restarts: no alert (single failure, resolves quickly)
  • Bad deployment breaking the service: alert within 2 minutes
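
The confirmation logic is simple to state precisely: fire only when the failure streak reaches the threshold, and reset on any success. A sketch (the class and method names are my own, not a specific product's API):

```python
class FailureConfirmation:
    """Fire an alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, check_ok):
        # Returns True exactly once, when the failure streak first
        # reaches the threshold; any success resets the streak.
        if check_ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold
```

With a 1-minute interval, a single failed check during a rolling update never alerts, while a genuinely broken deployment alerts on the second consecutive failure.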

This balance between speed and noise reduction is discussed in detail in how to set up downtime alerts.

SSL Certificate Monitoring

If your Kubernetes Ingress terminates TLS (common with cert-manager and Let's Encrypt), monitor your certificates:

# cert-manager Certificate resource
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
spec:
  dnsNames:
    - api.yourdomain.com
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod

cert-manager auto-renews certificates, but automation can fail. An external SSL certificate monitor with 30-day advance warnings ensures you're notified before any auto-renewal failure becomes a user-facing problem.
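
An external expiry check boils down to reading the certificate's notAfter field and counting days. A sketch using Python's standard ssl module (the helper names and the 30-day threshold are illustrative):

```python
import datetime
import socket
import ssl


def fetch_not_after(host, port=443, timeout=10):
    # Open a TLS connection and read the peer certificate's expiry
    # string, e.g. 'Jun  1 12:00:00 2030 GMT'
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after, now=None):
    # Parse the notAfter format returned by ssl.getpeercert()
    expires = datetime.datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    now = now or datetime.datetime.utcnow()
    return (expires - now).days


def should_warn(not_after, threshold_days=30, now=None):
    # Warn once the certificate is within the advance-warning window
    return days_remaining(not_after, now) <= threshold_days
```

Run this on a schedule against the public hostname — not against the Secret inside the cluster — so a cert-manager renewal failure is caught from the same vantage point your users have.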

Alerting on Pod-Level Issues

For pod-level visibility inside the cluster, combine external monitoring with Kubernetes-native tooling:

| Issue | Detection Method |
| --- | --- |
| Service unreachable externally | External HTTP monitor |
| Pod crash loop | Kubernetes events + Alertmanager |
| High pod restart count | Prometheus kube-state-metrics |
| Certificate expiry | External SSL monitor |
| Deployment failure | Kubernetes deployment status |

External monitoring from Domain Monitor handles the external visibility layer. Prometheus and Alertmanager handle internal cluster metrics.

Common Kubernetes Monitoring Failures

Forgetting to expose health endpoints: Probes pointing at a path the application doesn't serve fail on every check (a 404 response, or a refused connection if nothing listens on the probe port), causing constant pod restarts. Implement /healthz and /ready in every service.

Liveness probe too aggressive: A liveness probe with a short initialDelaySeconds on a slow-starting application will restart healthy pods. Use startup probes for slow applications.

Only monitoring internally: Relying solely on Kubernetes probes means ingress failures, DNS issues, and network policy problems go undetected. Always add external monitoring.

No readiness probe: Without a readiness probe, traffic is sent to pods immediately on start — before the application has connected to the database or finished initialising. Always implement readiness probes.

Incident Response for Kubernetes Outages

When your external monitor alerts on a Kubernetes service:

  1. Check the external endpoint — is it returning an error code or timing out?
  2. Check Kubernetes pod status with kubectl get pods -n <namespace> — are pods running?
  3. Check recent deployments with kubectl rollout history deployment/<name> — did something change?
  4. Check ingress status — is the ingress controller healthy?
  5. Roll back if needed with kubectl rollout undo deployment/<name>

Your monitoring tool's timestamp tells you exactly when the failure started — correlate this with deployment and event timelines.


Monitor your Kubernetes services from outside the cluster at Domain Monitor.
