When a Kubernetes pod is not working, you need to find the root cause fast. At Tasrie IT Services, our engineers debug over 50 pod issues per week across client clusters. The process is always the same: check the pod status, read the events, check the logs, fix the problem.
This guide covers every pod state you will encounter, the exact commands to diagnose each one, and the fixes that work in production.
The Pod Debugging Workflow
Every pod issue follows the same three-step diagnostic pattern. Run these commands in order and you will find the root cause within minutes:
Step 1: Check Pod Status
kubectl get pods -n <namespace>
kubectl get pods -n <namespace> -o wide # includes node assignment and IP
The STATUS column tells you where to focus:
| Status | Meaning | Next Step |
|---|---|---|
| Pending | Pod cannot be scheduled | Check events for scheduling failures |
| ContainerCreating | Image pulling or volume mounting | Check events for pull/mount errors |
| Running | Container is running | Check if app is actually healthy |
| CrashLoopBackOff | Container keeps crashing | Check logs from previous instance |
| ImagePullBackOff | Cannot pull container image | Check image name, tag, and registry auth |
| ErrImagePull | Image pull failed | Same as ImagePullBackOff |
| OOMKilled | Out of memory | Increase memory limit or fix memory leak |
| Error | Container exited with error | Check logs for application error |
| Terminating | Pod is being deleted | Check finalizers if stuck |
| Init:Error | Init container failed | Check init container logs |
| Init:CrashLoopBackOff | Init container keeps crashing | Check init container logs |
Step 2: Read Events and Describe
kubectl describe pod <pod-name> -n <namespace>
Focus on two sections in the output:
- Conditions — shows the pod’s lifecycle state (PodScheduled, Initialized, ContainersReady, Ready)
- Events — shows the chronological history of what happened (scheduling, pulling, starting, failing)
The Events section is the single most useful piece of diagnostic information. It tells you exactly what Kubernetes tried to do and where it failed.
Step 3: Check Logs
# Current container logs
kubectl logs <pod-name> -n <namespace>
# Previous container logs (essential for CrashLoopBackOff)
kubectl logs <pod-name> -n <namespace> --previous
# Specific container in a multi-container pod
kubectl logs <pod-name> -n <namespace> -c <container-name>
# Follow logs in real time
kubectl logs <pod-name> -n <namespace> -f
# Last 100 lines
kubectl logs <pod-name> -n <namespace> --tail=100
For a comprehensive guide on kubectl logs, see our kubectl logs guide.
Troubleshooting Pending Pods
A pod stuck in Pending means the Kubernetes scheduler cannot find a suitable node.
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"
Insufficient Resources
Event message: 0/3 nodes are available: 3 Insufficient cpu or 3 Insufficient memory
The cluster does not have enough allocatable CPU or memory to satisfy the pod’s resource requests.
# Check available resources on each node
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check what the pod is requesting
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
Fixes:
- Reduce the pod’s resource requests if they are over-provisioned
- Add more nodes to the cluster
- Remove or scale down other workloads to free capacity
- Check if the cluster autoscaler is configured and working
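As a sketch of right-sizing requests (the pod name, image, and values below are illustrative, not recommendations), a container spec with requests close to real usage might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                # hypothetical pod name
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:          # what the scheduler reserves; keep close to real usage
          cpu: 250m
          memory: 256Mi
        limits:            # hard caps enforced at runtime
          cpu: 500m
          memory: 512Mi
```

Compare the requests against `kubectl top pod` output over time: if actual usage sits well below the requests, lowering them frees schedulable capacity without touching the workload.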
Node Selector or Affinity Mismatch
Event message: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector
# Check the pod's node selector
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}'
# Check node labels
kubectl get nodes --show-labels
Fix: Either update the pod’s nodeSelector/nodeAffinity to match existing node labels, or add the required labels to nodes.
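A minimal sketch of the first option, assuming a hypothetical `disktype: ssd` label that at least one node carries:

```yaml
spec:
  nodeSelector:
    disktype: ssd      # hypothetical label; must exist on at least one node
```

For the second option, `kubectl label node <node-name> disktype=ssd` adds the label to a node so existing selectors start matching.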
Taint and Toleration Mismatch
Event message: 0/3 nodes are available: 3 node(s) had taint {key: value}, that the pod didn't tolerate
# Check node taints
kubectl describe nodes | grep Taints
Fix: Add the appropriate toleration to the pod spec, or remove the taint from the nodes. See our guide on Kubernetes taints and tolerations for details.
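A sketch of the toleration side of the fix, assuming a hypothetical taint `dedicated=gpu:NoSchedule` on the nodes:

```yaml
spec:
  tolerations:
    - key: dedicated       # must match the taint key reported in the event
      operator: Equal
      value: gpu
      effect: NoSchedule
```

The alternative is removing the taint entirely: `kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-` (the trailing `-` deletes the taint).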
PVC Not Bound
Event message: persistentvolumeclaim "data-pvc" not found or unbound PersistentVolumeClaims
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Fix: Create the missing PVC, fix the StorageClass reference, or ensure the storage provisioner is running.
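If the PVC is missing, a minimal claim matching the `data-pvc` name from the event might look like this (the StorageClass name and size are assumptions; they must match what your cluster provides):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # must reference an existing StorageClass
  resources:
    requests:
      storage: 10Gi
```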
Troubleshooting CrashLoopBackOff
CrashLoopBackOff means the container starts, crashes, and Kubernetes keeps restarting it with exponential backoff (10s, 20s, 40s, up to 5 minutes).
# Check exit code and reason
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Last State"
# Check previous container logs
kubectl logs <pod-name> -n <namespace> --previous
Common causes and fixes:
| Exit Code | Cause | Fix |
|---|---|---|
| 1 | Application error | Check logs for the error message |
| 126 | Command cannot execute | Fix file permissions or entrypoint path |
| 127 | Command not found | Fix the image or entrypoint command |
| 137 | OOMKilled (SIGKILL) | Increase memory limit |
| 139 | Segfault | Fix application code or update base image |
| 143 | SIGTERM (graceful) | Usually normal during rolling updates |
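The signal-related rows in this table follow one rule: an exit code above 128 means the container was killed by signal number `code - 128`, so 137 = 128 + 9 (SIGKILL) and 143 = 128 + 15 (SIGTERM). A small sketch of that decoding:

```python
import signal

def decode_exit_code(code: int) -> str:
    """Map a container exit code to a human-readable cause."""
    if code > 128:
        sig = code - 128  # codes above 128 mean "killed by signal (code - 128)"
        name = signal.Signals(sig).name
        return f"killed by signal {sig} ({name})"
    return f"exited with status {code}"

print(decode_exit_code(137))  # killed by signal 9 (SIGKILL)
print(decode_exit_code(143))  # killed by signal 15 (SIGTERM)
```

This is why 137 almost always points at the OOM killer or an eviction, while 143 usually just means the container shut down cleanly when asked.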
For detailed CrashLoopBackOff fixes, see our CrashLoopBackOff troubleshooting guide.
Troubleshooting ImagePullBackOff
The container runtime cannot pull the specified image.
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events:"
Common causes:
- Wrong image name or tag — Check for typos in the deployment spec
- Private registry without credentials — Create and reference an `imagePullSecret`
- Image tag does not exist — Verify the tag exists in the registry
- Registry rate limiting — Docker Hub throttles anonymous pulls, so heavy unauthenticated pulling from shared node IPs gets rejected
- Network policy blocking egress — The node cannot reach the registry
# Check imagePullSecrets on the pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'
# Verify the secret exists
kubectl get secret <secret-name> -n <namespace>
# Test image pull manually on the node
sudo crictl pull <image-name>
Fix for private registries:
# Create a Docker registry secret
kubectl create secret docker-registry regcred \
--docker-server=<registry-url> \
--docker-username=<username> \
--docker-password=<password> \
-n <namespace>
Then reference it in your pod spec:
spec:
  imagePullSecrets:
    - name: regcred
Troubleshooting OOMKilled Pods
Exit code 137 with reason OOMKilled means the container exceeded its memory limit.
# Confirm OOMKilled
kubectl describe pod <pod-name> -n <namespace> | grep -B 5 "OOMKilled"
# Check current memory usage
kubectl top pod <pod-name> -n <namespace>
# Check memory limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'
Fixes:
- Increase the memory limit if the application genuinely needs more
- Fix memory leaks in the application
- For JVM apps, set `-Xmx` to 75% of the container memory limit
- Consider using Vertical Pod Autoscaler to right-size automatically
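A sketch of the JVM sizing advice, assuming a 512Mi limit (the image name is hypothetical; `JAVA_TOOL_OPTIONS` is read automatically by the JVM at startup):

```yaml
spec:
  containers:
    - name: app
      image: my-java-app:latest     # hypothetical image
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-Xmx384m"         # ~75% of the 512Mi limit below
      resources:
        limits:
          memory: 512Mi
```

The 25% headroom covers non-heap memory (metaspace, thread stacks, direct buffers) that the JVM uses beyond `-Xmx`.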
For a complete OOMKilled troubleshooting guide, see our dedicated post on fixing OOMKilled in Kubernetes.
Troubleshooting Probe Failures
Liveness, readiness, and startup probe failures cause different symptoms:
- Liveness probe failure — Container gets restarted (can cause CrashLoopBackOff)
- Readiness probe failure — Pod removed from Service endpoints (no traffic)
- Startup probe failure — Container killed before liveness probe takes over
# Check probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].livenessProbe}' | jq .
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .
# Check events for probe failures
kubectl describe pod <pod-name> -n <namespace> | grep -i "unhealthy\|probe"
Common fixes:
- Increase
initialDelaySecondsfor slow-starting applications - Increase
timeoutSecondsif the health endpoint is slow under load - Add a startup probe for applications with variable startup times
- Ensure the probe endpoint does not check downstream dependencies (liveness probes should only check if the process is alive)
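The fixes above can be sketched in a single container spec (the `/healthz` path and port 8080 are assumptions about your application):

```yaml
spec:
  containers:
    - name: app
      startupProbe:               # gives slow starters up to 30 x 10s = 300s
        httpGet:
          path: /healthz          # hypothetical endpoint; no dependency checks
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:              # only runs after the startup probe succeeds
        httpGet:
          path: /healthz
          port: 8080
        timeoutSeconds: 5
        periodSeconds: 10
```

With a startup probe in place, `initialDelaySeconds` on the liveness probe becomes largely unnecessary, since liveness checks only begin once startup has succeeded.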
Troubleshooting Pods Stuck in Terminating
Pods stuck in Terminating state will not go away on their own.
# Check for finalizers
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# Check the node status (pod may be on a NotReady node)
kubectl get nodes
Common causes:
- The node where the pod runs is NotReady (see our Node NotReady troubleshooting guide)
- A finalizer is blocking deletion
- The container is not responding to SIGTERM within `terminationGracePeriodSeconds`
Fixes:
# If the node is down and the pod will never terminate gracefully
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
# If a finalizer is blocking
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
Use force deletion sparingly. For stateful workloads, force-deleting can cause data corruption if the pod is actually still running on a partitioned node.
Troubleshooting Init Container Failures
Init containers run before the main application container starts. If an init container fails, the pod stays in Init:Error or Init:CrashLoopBackOff.
# Check which init container failed
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Init Containers"
# Check init container logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name>
Common causes:
- The init container is waiting for a service that does not exist yet (e.g., database migration)
- Missing ConfigMap or Secret referenced by the init container
- Network policy preventing the init container from reaching an external service
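For the wait-for-a-service case, a common pattern is an init container that blocks until the dependency resolves. A sketch, assuming a hypothetical `db-service` listening on 5432:

```yaml
spec:
  initContainers:
    - name: wait-for-db        # hypothetical; blocks until the DB accepts connections
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
```

If this init container loops forever, the pod stays in `Init:0/1`, which points you straight at the missing service or blocking network policy.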
Advanced Debugging with Ephemeral Containers
For distroless or minimal images where you cannot exec into the container, use ephemeral debug containers:
# Attach a debug container to a running pod
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>
# Debug with a full networking toolkit
kubectl debug -it <pod-name> -n <namespace> --image=nicolaka/netshoot --target=<container-name>
This lets you inspect the pod’s filesystem, network stack, and process list without modifying the original container image.
Pod Debugging Cheat Sheet
# Quick status overview
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'
# Find all non-running pods across all namespaces
kubectl get pods -A | grep -v Running | grep -v Completed
# Get pod resource usage
kubectl top pods -n <namespace> --sort-by=memory
# Check pod events across entire namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
# Get pod YAML for detailed inspection
kubectl get pod <pod-name> -n <namespace> -o yaml
# Exec into a running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Copy files from a pod for analysis
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file
For a broader view of Kubernetes troubleshooting including cluster and node-level issues, see our comprehensive Kubernetes troubleshooting guide.
Stop Spending Hours Debugging Kubernetes Pods
Pod issues in production cost real money. Every minute a pod is in CrashLoopBackOff or stuck Pending is a minute your application is degraded or down. Our engineers at Tasrie IT Services debug these issues daily and can help you build the observability and automation to catch them before they impact users.
Our Kubernetes consulting services help you:
- Set up proactive monitoring with Prometheus alerts for pod failures, restarts, and resource exhaustion
- Build automated remediation workflows that fix common pod issues without human intervention
- Train your team on systematic debugging methodology so every on-call engineer can resolve issues fast