
How to Troubleshoot Kubernetes Pods: We Debug 50+ Pods a Week (2026 Guide)

Engineering Team · 2026-04-26

When a Kubernetes pod is not working, you need to find the root cause fast. At Tasrie IT Services, our engineers debug over 50 pod issues per week across client clusters. The process is always the same: check the pod status, read the events, check the logs, fix the problem.

This guide covers every pod state you will encounter, the exact commands to diagnose each one, and the fixes that work in production.

The Pod Debugging Workflow

Every pod issue follows the same three-step diagnostic pattern. Run these commands in order and you will find the root cause within minutes:

Step 1: Check Pod Status

kubectl get pods -n <namespace>
kubectl get pods -n <namespace> -o wide  # includes node assignment and IP

The STATUS column tells you where to focus:

| Status | Meaning | Next Step |
| --- | --- | --- |
| Pending | Pod cannot be scheduled | Check events for scheduling failures |
| ContainerCreating | Image pulling or volume mounting | Check events for pull/mount errors |
| Running | Container is running | Check if the app is actually healthy |
| CrashLoopBackOff | Container keeps crashing | Check logs from the previous instance |
| ImagePullBackOff | Cannot pull container image | Check image name, tag, and registry auth |
| ErrImagePull | Image pull failed | Same as ImagePullBackOff |
| OOMKilled | Out of memory | Increase memory limit or fix memory leak |
| Error | Container exited with error | Check logs for application error |
| Terminating | Pod is being deleted | Check finalizers if stuck |
| Init:Error | Init container failed | Check init container logs |
| Init:CrashLoopBackOff | Init container keeps crashing | Check init container logs |

Step 2: Read Events and Describe

kubectl describe pod <pod-name> -n <namespace>

Focus on two sections in the output:

  1. Conditions — shows the pod’s lifecycle state (PodScheduled, Initialized, ContainersReady, Ready)
  2. Events — shows the chronological history of what happened (scheduling, pulling, starting, failing)

The Events section is the single most useful piece of diagnostic information. It tells you exactly what Kubernetes tried to do and where it failed.
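
If the describe output is long, you can also pull just the pod's events with a field selector; a quick variant (pod name and namespace are placeholders):

# Events for one specific pod, oldest first
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by='.lastTimestamp'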

Step 3: Check Logs

# Current container logs
kubectl logs <pod-name> -n <namespace>

# Previous container logs (essential for CrashLoopBackOff)
kubectl logs <pod-name> -n <namespace> --previous

# Specific container in a multi-container pod
kubectl logs <pod-name> -n <namespace> -c <container-name>

# Follow logs in real time
kubectl logs <pod-name> -n <namespace> -f

# Last 100 lines
kubectl logs <pod-name> -n <namespace> --tail=100

For a comprehensive guide on kubectl logs, see our kubectl logs guide.

Troubleshooting Pending Pods

A pod stuck in Pending means the Kubernetes scheduler cannot find a suitable node.

kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"

Insufficient Resources

Event message: 0/3 nodes are available: 3 Insufficient cpu or 3 Insufficient memory

The cluster does not have enough allocatable CPU or memory to satisfy the pod’s resource requests.

# Check available resources on each node
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check what the pod is requesting
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

Fixes:

  • Reduce the pod’s resource requests if they are over-provisioned (see the sketch after this list)
  • Add more nodes to the cluster
  • Remove or scale down other workloads to free capacity
  • Check if the cluster autoscaler is configured and working
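
As a sketch of the first fix above, right-sized requests in a container spec might look like this; the numbers are purely illustrative, not a recommendation:

# Trim over-provisioned requests so the scheduler can place the pod
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 512Mi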

Node Selector or Affinity Mismatch

Event message: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector

# Check the pod's node selector
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}'

# Check node labels
kubectl get nodes --show-labels

Fix: Either update the pod’s nodeSelector/nodeAffinity to match existing node labels, or add the required labels to nodes.
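
For example, if the pod selects on a label the nodes do not carry, you can add it to a node (the disktype=ssd key/value is purely illustrative):

# Add the missing label so the pod's nodeSelector matches
kubectl label nodes <node-name> disktype=ssd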

Taint and Toleration Mismatch

Event message: 0/3 nodes are available: 3 node(s) had taint {key: value}, that the pod didn't tolerate

# Check node taints
kubectl describe nodes | grep Taints

Fix: Add the appropriate toleration to the pod spec, or remove the taint from the nodes. See our guide on Kubernetes taints and tolerations for details.
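
A minimal toleration sketch, assuming a taint such as dedicated=gpu:NoSchedule (key, value, and effect are placeholders for whatever your nodes actually carry):

spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"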

PVC Not Bound

Event message: persistentvolumeclaim "data-pvc" not found or unbound PersistentVolumeClaims

kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

Fix: Create the missing PVC, fix the StorageClass reference, or ensure the storage provisioner is running.
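
If the claim simply does not exist, a minimal PVC manifest looks roughly like this; the storage class and size are assumptions to adjust for your cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: <namespace>
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: <storage-class>
  resources:
    requests:
      storage: 10Gi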

Troubleshooting CrashLoopBackOff

CrashLoopBackOff means the container starts, crashes, and Kubernetes keeps restarting it with exponential backoff (10s, 20s, 40s, up to 5 minutes).

# Check exit code and reason
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Last State"

# Check previous container logs
kubectl logs <pod-name> -n <namespace> --previous

Common causes and fixes:

| Exit Code | Cause | Fix |
| --- | --- | --- |
| 1 | Application error | Check logs for the error message |
| 126 | Command cannot execute | Fix file permissions or entrypoint path |
| 127 | Command not found | Fix the image or entrypoint command |
| 137 | OOMKilled (SIGKILL) | Increase memory limit |
| 139 | Segfault | Fix application code or update base image |
| 143 | SIGTERM (graceful) | Usually normal during rolling updates |

For detailed CrashLoopBackOff fixes, see our CrashLoopBackOff troubleshooting guide.

Troubleshooting ImagePullBackOff

The container runtime cannot pull the specified image.

kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events:"

Common causes:

  1. Wrong image name or tag — Check for typos in the deployment spec
  2. Private registry without credentials — Create and reference an imagePullSecret
  3. Image tag does not exist — Verify the tag exists in the registry
  4. Registry rate limiting — Docker Hub limits anonymous pulls to 100 per 6 hours
  5. Network policy blocking egress — The node cannot reach the registry

# Check imagePullSecrets on the pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'

# Verify the secret exists
kubectl get secret <secret-name> -n <namespace>

# Test image pull manually on the node
sudo crictl pull <image-name>

Fix for private registries:

# Create a Docker registry secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>

Then reference it in your pod spec:

spec:
  imagePullSecrets:
    - name: regcred

Troubleshooting OOMKilled Pods

Exit code 137 with reason OOMKilled means the container exceeded its memory limit.

# Confirm OOMKilled
kubectl describe pod <pod-name> -n <namespace> | grep -B 5 "OOMKilled"

# Check current memory usage
kubectl top pod <pod-name> -n <namespace>

# Check memory limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'

Fixes:

  • Increase the memory limit if the application genuinely needs more
  • Fix memory leaks in the application
  • For JVM apps, set -Xmx to roughly 75% of the container memory limit (see the sketch after this list)
  • Consider using Vertical Pod Autoscaler to right-size automatically
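
A sketch combining the memory-limit and JVM fixes; the 1Gi limit and -Xmx768m values are illustrative, and JAVA_TOOL_OPTIONS is one common way to pass JVM flags, assuming your base image honors it:

resources:
  requests:
    memory: 1Gi
  limits:
    memory: 1Gi
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xmx768m"  # roughly 75% of the 1Gi limit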

For a complete OOMKilled troubleshooting guide, see our dedicated post on fixing OOMKilled in Kubernetes.

Troubleshooting Probe Failures

Liveness, readiness, and startup probe failures cause different symptoms:

  • Liveness probe failure — Container gets restarted (can cause CrashLoopBackOff)
  • Readiness probe failure — Pod removed from Service endpoints (no traffic)
  • Startup probe failure — Container killed before liveness probe takes over

# Check probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].livenessProbe}' | jq .
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .

# Check events for probe failures
kubectl describe pod <pod-name> -n <namespace> | grep -i "unhealthy\|probe"

Common fixes:

  • Increase initialDelaySeconds for slow-starting applications (see the sketch after this list)
  • Increase timeoutSeconds if the health endpoint is slow under load
  • Add a startup probe for applications with variable startup times
  • Ensure the probe endpoint does not check downstream dependencies (liveness probes should only check if the process is alive)
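
A probe configuration sketch that applies these fixes; the paths, port, and timings are placeholders to adapt to your application:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30  # up to ~5 minutes to start before liveness takes over
livenessProbe:
  httpGet:
    path: /healthz  # should only confirm the process is alive
    port: 8080
  initialDelaySeconds: 15
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5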

Troubleshooting Pods Stuck in Terminating

Pods stuck in Terminating state will not go away on their own.

# Check for finalizers
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'

# Check the node status (pod may be on a NotReady node)
kubectl get nodes

Common causes:

  • The node where the pod runs is NotReady (see our Node NotReady troubleshooting guide)
  • A finalizer is blocking deletion
  • The container is not responding to SIGTERM within terminationGracePeriodSeconds

Fixes:

# If the node is down and the pod will never terminate gracefully
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

# If a finalizer is blocking
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'

Use force deletion sparingly. For stateful workloads, force-deleting can cause data corruption if the pod is actually still running on a partitioned node.

Troubleshooting Init Container Failures

Init containers run before the main application container starts. If an init container fails, the pod stays in Init:Error or Init:CrashLoopBackOff.

# Check which init container failed
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Init Containers"

# Check init container logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name>

Common causes:

  • The init container is waiting for a service or dependency that is not ready yet, such as a database that has not finished migrating (see the sketch after this list)
  • Missing ConfigMap or Secret referenced by the init container
  • Network policy preventing the init container from reaching an external service
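
A sketch of the first case: an init container that blocks until a database Service answers (the service name and port are hypothetical):

spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z <db-service> 5432; do echo waiting for db; sleep 2; done"]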

Advanced Debugging with Ephemeral Containers

For distroless or minimal images where you cannot exec into the container, use ephemeral debug containers:

# Attach a debug container to a running pod
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>

# Debug with a full networking toolkit
kubectl debug -it <pod-name> -n <namespace> --image=nicolaka/netshoot --target=<container-name>

This lets you inspect the pod’s filesystem, network stack, and process list without modifying the original container image.

Pod Debugging Cheat Sheet

# Quick status overview
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'

# Find all non-running pods across all namespaces
kubectl get pods -A | grep -v Running | grep -v Completed

# Get pod resource usage
kubectl top pods -n <namespace> --sort-by=memory

# Check pod events across entire namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20

# Get pod YAML for detailed inspection
kubectl get pod <pod-name> -n <namespace> -o yaml

# Exec into a running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Copy files from a pod for analysis
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file

For a broader view of Kubernetes troubleshooting including cluster and node-level issues, see our comprehensive Kubernetes troubleshooting guide.


Stop Spending Hours Debugging Kubernetes Pods

Pod issues in production cost real money. Every minute a pod is in CrashLoopBackOff or stuck Pending is a minute your application is degraded or down. Our engineers at Tasrie IT Services debug these issues daily and can help you build the observability and automation to catch them before they impact users.

Our Kubernetes consulting services help you:

  • Set up proactive monitoring with Prometheus alerts for pod failures, restarts, and resource exhaustion
  • Build automated remediation workflows that fix common pod issues without human intervention
  • Train your team on systematic debugging methodology so every on-call engineer can resolve issues fast

Get expert Kubernetes support →
