When a Kubernetes pod is not working, you need to find the root cause fast. At Tasrie IT Services, our engineers debug over 50 pod issues per week across client clusters. The process is always the same: check the pod status, read the events, check the logs, fix the problem.
This guide covers every pod state you will encounter, the exact commands to diagnose each one, and the fixes that work in production.
The Pod Debugging Workflow
Every pod issue follows the same three-step diagnostic pattern. Run these commands in order and you will find the root cause within minutes:
Step 1: Check Pod Status
kubectl get pods -n <namespace>
kubectl get pods -n <namespace> -o wide # includes node assignment and IP
The STATUS column tells you where to focus:
| Status | Meaning | Next Step |
|---|---|---|
| Pending | Pod cannot be scheduled | Check events for scheduling failures |
| ContainerCreating | Image pulling or volume mounting | Check events for pull/mount errors |
| Running | Container is running | Check if app is actually healthy |
| CrashLoopBackOff | Container keeps crashing | Check logs from previous instance |
| ImagePullBackOff | Cannot pull container image | Check image name, tag, and registry auth |
| ErrImagePull | Image pull failed | Same as ImagePullBackOff |
| OOMKilled | Out of memory | Increase memory limit or fix memory leak |
| Error | Container exited with error | Check logs for application error |
| Terminating | Pod is being deleted | Check finalizers if stuck |
| Init:Error | Init container failed | Check init container logs |
| Init:CrashLoopBackOff | Init container keeps crashing | Check init container logs |
Step 2: Read Events and Describe
kubectl describe pod <pod-name> -n <namespace>
Focus on two sections in the output:
- Conditions — shows the pod’s lifecycle state (PodScheduled, Initialized, ContainersReady, Ready)
- Events — shows the chronological history of what happened (scheduling, pulling, starting, failing)
The Events section is the single most useful piece of diagnostic information. It tells you exactly what Kubernetes tried to do and where it failed.
Step 3: Check Logs
# Current container logs
kubectl logs <pod-name> -n <namespace>
# Previous container logs (essential for CrashLoopBackOff)
kubectl logs <pod-name> -n <namespace> --previous
# Specific container in a multi-container pod
kubectl logs <pod-name> -n <namespace> -c <container-name>
# Follow logs in real time
kubectl logs <pod-name> -n <namespace> -f
# Last 100 lines
kubectl logs <pod-name> -n <namespace> --tail=100
For a comprehensive guide on kubectl logs, see our kubectl logs guide.
Troubleshooting Pending Pods
A pod stuck in Pending means the Kubernetes scheduler cannot find a suitable node.
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"
Insufficient Resources
Event message: 0/3 nodes are available: 3 Insufficient cpu or 3 Insufficient memory
The cluster does not have enough allocatable CPU or memory to satisfy the pod’s resource requests.
# Check available resources on each node
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check what the pod is requesting
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
Fixes:
- Reduce the pod’s resource requests if they are over-provisioned
- Add more nodes to the cluster
- Remove or scale down other workloads to free capacity
- Check if the cluster autoscaler is configured and working
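As a sketch of right-sizing requests (the pod name, image, and values below are illustrative, not recommendations), a container spec with requests close to real usage might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                # hypothetical pod name
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:          # what the scheduler reserves; keep close to real usage
          cpu: 250m
          memory: 256Mi
        limits:            # hard caps enforced at runtime
          cpu: 500m
          memory: 512Mi
```

Compare the requests against `kubectl top pod` output over time: if actual usage sits well below the requests, lowering them frees schedulable capacity without touching the workload.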
Node Selector or Affinity Mismatch
Event message: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector
# Check the pod's node selector
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}'
# Check node labels
kubectl get nodes --show-labels
Fix: Either update the pod’s nodeSelector/nodeAffinity to match existing node labels, or add the required labels to nodes.
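A minimal sketch of the first option, assuming a hypothetical `disktype: ssd` label that at least one node carries:

```yaml
spec:
  nodeSelector:
    disktype: ssd      # hypothetical label; must exist on at least one node
```

For the second option, `kubectl label node <node-name> disktype=ssd` adds the label to a node so existing selectors start matching.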
Taint and Toleration Mismatch
Event message: 0/3 nodes are available: 3 node(s) had taint {key: value}, that the pod didn't tolerate
# Check node taints
kubectl describe nodes | grep Taints
Fix: Add the appropriate toleration to the pod spec, or remove the taint from the nodes. See our guide on Kubernetes taints and tolerations for details.
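A sketch of the toleration side of the fix, assuming a hypothetical taint `dedicated=gpu:NoSchedule` on the nodes:

```yaml
spec:
  tolerations:
    - key: dedicated       # must match the taint key reported in the event
      operator: Equal
      value: gpu
      effect: NoSchedule
```

The alternative is removing the taint entirely: `kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-` (the trailing `-` deletes the taint).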
PVC Not Bound
Event message: persistentvolumeclaim "data-pvc" not found or unbound PersistentVolumeClaims
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Fix: Create the missing PVC, fix the StorageClass reference, or ensure the storage provisioner is running.
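If the PVC is missing, a minimal claim matching the `data-pvc` name from the event might look like this (the StorageClass name and size are assumptions; they must match what your cluster provides):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # must reference an existing StorageClass
  resources:
    requests:
      storage: 10Gi
```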
Troubleshooting CrashLoopBackOff
CrashLoopBackOff means the container starts, crashes, and Kubernetes keeps restarting it with exponential backoff (10s, 20s, 40s, up to 5 minutes).
# Check exit code and reason
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Last State"
# Check previous container logs
kubectl logs <pod-name> -n <namespace> --previous
Common causes and fixes:
| Exit Code | Cause | Fix |
|---|---|---|
| 1 | Application error | Check logs for the error message |
| 126 | Command cannot execute | Fix file permissions or entrypoint path |
| 127 | Command not found | Fix the image or entrypoint command |
| 137 | OOMKilled (SIGKILL) | Increase memory limit |
| 139 | Segfault | Fix application code or update base image |
| 143 | SIGTERM (graceful) | Usually normal during rolling updates |
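The signal-related rows in this table follow one rule: an exit code above 128 means the container was killed by signal number `code - 128`, so 137 = 128 + 9 (SIGKILL) and 143 = 128 + 15 (SIGTERM). A small sketch of that decoding:

```python
import signal

def decode_exit_code(code: int) -> str:
    """Map a container exit code to a human-readable cause."""
    if code > 128:
        sig = code - 128  # codes above 128 mean "killed by signal (code - 128)"
        name = signal.Signals(sig).name
        return f"killed by signal {sig} ({name})"
    return f"exited with status {code}"

print(decode_exit_code(137))  # killed by signal 9 (SIGKILL)
print(decode_exit_code(143))  # killed by signal 15 (SIGTERM)
```

This is why 137 almost always points at the OOM killer or an eviction, while 143 usually just means the container shut down cleanly when asked.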
For detailed CrashLoopBackOff fixes, see our CrashLoopBackOff troubleshooting guide.
Troubleshooting ImagePullBackOff
The container runtime cannot pull the specified image.
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events:"
Common causes:
- Wrong image name or tag — Check for typos in the deployment spec
- Private registry without credentials — Create and reference an `imagePullSecret`
- Image tag does not exist — Verify the tag exists in the registry
- Registry rate limiting — Docker Hub throttles anonymous pulls, so heavy unauthenticated pulling from shared node IPs gets rejected
- Network policy blocking egress — The node cannot reach the registry
# Check imagePullSecrets on the pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'
# Verify the secret exists
kubectl get secret <secret-name> -n <namespace>
# Test image pull manually on the node
sudo crictl pull <image-name>
Fix for private registries:
# Create a Docker registry secret
kubectl create secret docker-registry regcred \
--docker-server=<registry-url> \
--docker-username=<username> \
--docker-password=<password> \
-n <namespace>
Then reference it in your pod spec:
spec:
  imagePullSecrets:
    - name: regcred
Troubleshooting OOMKilled Pods
Exit code 137 with reason OOMKilled means the container exceeded its memory limit.
# Confirm OOMKilled
kubectl describe pod <pod-name> -n <namespace> | grep -B 5 "OOMKilled"
# Check current memory usage
kubectl top pod <pod-name> -n <namespace>
# Check memory limits
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources.limits.memory}'
Fixes:
- Increase the memory limit if the application genuinely needs more
- Fix memory leaks in the application
- For JVM apps, set `-Xmx` to 75% of the container memory limit
- Consider using Vertical Pod Autoscaler to right-size automatically
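A sketch of the JVM sizing advice, assuming a 512Mi limit (the image name is hypothetical; `JAVA_TOOL_OPTIONS` is read automatically by the JVM at startup):

```yaml
spec:
  containers:
    - name: app
      image: my-java-app:latest     # hypothetical image
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-Xmx384m"         # ~75% of the 512Mi limit below
      resources:
        limits:
          memory: 512Mi
```

The 25% headroom covers non-heap memory (metaspace, thread stacks, direct buffers) that the JVM uses beyond `-Xmx`.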
For a complete OOMKilled troubleshooting guide, see our dedicated post on fixing OOMKilled in Kubernetes.
Troubleshooting Probe Failures
Liveness, readiness, and startup probe failures cause different symptoms:
- Liveness probe failure — Container gets restarted (can cause CrashLoopBackOff)
- Readiness probe failure — Pod removed from Service endpoints (no traffic)
- Startup probe failure — Container killed before liveness probe takes over
# Check probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].livenessProbe}' | jq .
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .
# Check events for probe failures
kubectl describe pod <pod-name> -n <namespace> | grep -i "unhealthy\|probe"
Common fixes:
- Increase
initialDelaySecondsfor slow-starting applications - Increase
timeoutSecondsif the health endpoint is slow under load - Add a startup probe for applications with variable startup times
- Ensure the probe endpoint does not check downstream dependencies (liveness probes should only check if the process is alive)
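The fixes above can be sketched in a single container spec (the `/healthz` path and port 8080 are assumptions about your application):

```yaml
spec:
  containers:
    - name: app
      startupProbe:               # gives slow starters up to 30 x 10s = 300s
        httpGet:
          path: /healthz          # hypothetical endpoint; no dependency checks
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:              # only runs after the startup probe succeeds
        httpGet:
          path: /healthz
          port: 8080
        timeoutSeconds: 5
        periodSeconds: 10
```

With a startup probe in place, `initialDelaySeconds` on the liveness probe becomes largely unnecessary, since liveness checks only begin once startup has succeeded.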
Troubleshooting Pods Stuck in Terminating
Pods stuck in Terminating state will not go away on their own.
# Check for finalizers
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# Check the node status (pod may be on a NotReady node)
kubectl get nodes
Common causes:
- The node where the pod runs is NotReady (see our Node NotReady troubleshooting guide)
- A finalizer is blocking deletion
- The container is not responding to SIGTERM within `terminationGracePeriodSeconds`
Fixes:
# If the node is down and the pod will never terminate gracefully
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
# If a finalizer is blocking
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
Use force deletion sparingly. For stateful workloads, force-deleting can cause data corruption if the pod is actually still running on a partitioned node.
Troubleshooting Init Container Failures
Init containers run before the main application container starts. If an init container fails, the pod stays in Init:Error or Init:CrashLoopBackOff.
# Check which init container failed
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Init Containers"
# Check init container logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name>
Common causes:
- The init container is waiting for a service that does not exist yet (e.g., database migration)
- Missing ConfigMap or Secret referenced by the init container
- Network policy preventing the init container from reaching an external service
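For the wait-for-a-service case, a common pattern is an init container that blocks until the dependency resolves. A sketch, assuming a hypothetical `db-service` listening on 5432:

```yaml
spec:
  initContainers:
    - name: wait-for-db        # hypothetical; blocks until the DB accepts connections
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
```

If this init container loops forever, the pod stays in `Init:0/1`, which points you straight at the missing service or blocking network policy.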
Advanced Debugging with Ephemeral Containers
For distroless or minimal images where you cannot exec into the container, use ephemeral debug containers:
# Attach a debug container to a running pod
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>
# Debug with a full networking toolkit
kubectl debug -it <pod-name> -n <namespace> --image=nicolaka/netshoot --target=<container-name>
This lets you inspect the pod’s filesystem, network stack, and process list without modifying the original container image.
Pod Debugging Cheat Sheet
# Quick status overview
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'
# Find all non-running pods across all namespaces
kubectl get pods -A | grep -v Running | grep -v Completed
# Get pod resource usage
kubectl top pods -n <namespace> --sort-by=memory
# Check pod events across entire namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
# Get pod YAML for detailed inspection
kubectl get pod <pod-name> -n <namespace> -o yaml
# Exec into a running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Copy files from a pod for analysis
kubectl cp <namespace>/<pod-name>:/path/to/file ./local-file
For a broader view of Kubernetes troubleshooting including cluster and node-level issues, see our comprehensive Kubernetes troubleshooting guide.
Stop Spending Hours Debugging Kubernetes Pods
Pod issues in production cost real money. Every minute a pod is in CrashLoopBackOff or stuck Pending is a minute your application is degraded or down. Our engineers at Tasrie IT Services debug these issues daily and can help you build the observability and automation to catch them before they impact users.
Our Kubernetes consulting services help you:
- Set up proactive monitoring with Prometheus alerts for pod failures, restarts, and resource exhaustion
- Build automated remediation workflows that fix common pod issues without human intervention
- Train your team on systematic debugging methodology so every on-call engineer can resolve issues fast