
How to Fix CrashLoopBackOff Kubernetes Pod: We Fix 30+ a Week (2026 Guide)

Engineering Team 2026-04-26

CrashLoopBackOff is the most common pod failure state in Kubernetes. It means a container starts, crashes, and Kubernetes keeps restarting it with exponentially increasing delays (10s, 20s, 40s, capped at 5 minutes). At Tasrie IT Services, we fix over 30 CrashLoopBackOff issues per week across client clusters, and the root cause is almost always one of six things.

This guide walks through each cause with the exact commands to diagnose it and the fix that works.

Quick Diagnosis: Two Commands

Run these two commands and you will know the root cause within 60 seconds:

# Step 1: Check the exit code and reason
kubectl describe pod <pod-name> -n <namespace>

# Step 2: Check the previous container's logs
kubectl logs <pod-name> -n <namespace> --previous

In the describe output, look for the Last State section:

Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      ...
  Finished:     ...

The exit code tells you exactly what happened:

Exit Code | Signal  | Meaning           | Most Likely Cause
--------- | ------- | ----------------- | -----------------------------------------------
0         | —       | Success           | Container completed normally (should it be a Job instead?)
1         | —       | Application error | Bug in code, missing config, unhandled exception
126       | —       | Cannot execute    | Wrong file permissions on entrypoint
127       | —       | Command not found | Wrong entrypoint/CMD in Dockerfile
137       | SIGKILL | Killed            | OOMKilled or forced termination
139       | SIGSEGV | Segfault          | Application crash, often in native code
143       | SIGTERM | Terminated        | Graceful shutdown (normal during rollouts)

Fix 1: Application Error (Exit Code 1)

Exit code 1 is a generic application error. The application started but crashed because of a code issue, missing dependency, or configuration problem.

# Check what the application printed before crashing
kubectl logs <pod-name> -n <namespace> --previous

Common causes we see:

Missing Environment Variables

The application expects environment variables that are not set in the pod spec.

# Check what env vars are configured
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].env}' | jq .

# Check if env vars reference ConfigMaps or Secrets that exist
kubectl get configmap -n <namespace>
kubectl get secret -n <namespace>

Fix: Add the missing environment variables to the deployment spec, or create the missing ConfigMap/Secret.
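
For example, a minimal deployment snippet wiring variables from a ConfigMap and a Secret (myapp, myapp-config, and myapp-secrets are placeholder names):

containers:
- name: myapp
  image: myapp:latest
  env:
  - name: LOG_LEVEL              # set directly in the pod spec
    value: "info"
  envFrom:
  - configMapRef:
      name: myapp-config         # must exist in the same namespace
  - secretRef:
      name: myapp-secrets        # must exist in the same namespace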

Database Connection Failure

The application tries to connect to a database that is not reachable.

# Test connectivity from the pod's network
kubectl run dbtest --image=busybox --rm -it --restart=Never -n <namespace> -- nc -zv <db-host> <db-port>

Fix: Check the database hostname, port, credentials, and network policies. Ensure the database service is running and the pod can reach it.
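
If the database is healthy but simply starts slower than the application, an init container that waits for the port is a common mitigation. A minimal sketch, assuming busybox and placeholder host/port values:

initContainers:
- name: wait-for-db
  image: busybox:1.36
  # Block pod startup until the database port accepts connections
  command: ['sh', '-c', 'until nc -z <db-host> <db-port>; do echo "waiting for db"; sleep 2; done']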

Missing Configuration File

The application expects a configuration file that was not mounted.

# Check volume mounts
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].volumeMounts}' | jq .

# Check if ConfigMap exists
kubectl get configmap <configmap-name> -n <namespace>
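
Fix: Mount the ConfigMap at the path the application expects. A minimal sketch with placeholder names, assuming the app reads its config from /etc/myapp:

containers:
- name: myapp
  image: myapp:latest
  volumeMounts:
  - name: app-config
    mountPath: /etc/myapp        # directory the app reads config from
volumes:
- name: app-config
  configMap:
    name: myapp-config           # must exist before the pod can start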

For more on ConfigMap issues, see our Kubernetes ConfigMap guide.

Fix 2: OOMKilled (Exit Code 137)

Exit code 137 means the container was killed by SIGKILL, almost always because the Linux OOM killer terminated it for exceeding the memory limit.

# Confirm OOMKilled
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Last State"
# Look for: Reason: OOMKilled

# Check memory limit vs actual usage
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].resources.limits.memory}'
kubectl top pod <pod-name> -n <namespace>

Fixes:

Increase Memory Limit

If the application genuinely needs more memory:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Fix Memory Leak

If memory usage keeps growing over time, the application has a memory leak. Check:

  • Java/JVM apps: Cap the heap (-Xmx) at roughly 75% of the container memory limit. The JVM uses memory outside the heap (metaspace, thread stacks, native memory) that must fit within the remaining 25% (see the sketch after this list)
  • Node.js apps: Check for event listener leaks, unclosed database connections, or growing caches
  • Python apps: Check for circular references, growing global state, or unclosed file handles
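
For the JVM case above, a safer pattern than a hard-coded -Xmx is sizing the heap relative to the container limit with MaxRAMPercentage (JDK 10+). A minimal sketch; the names and limit value are placeholders:

containers:
- name: myapp
  image: myapp:latest
  env:
  - name: JAVA_TOOL_OPTIONS               # picked up automatically by the JVM
    value: "-XX:MaxRAMPercentage=75.0"    # heap = 75% of the container limit
  resources:
    limits:
      memory: "1Gi"                       # heap tops out around 768Mi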

Use Vertical Pod Autoscaler

If you are not sure what the right memory limit should be, use VPA to automatically right-size based on actual usage.
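
A minimal VPA manifest in recommendation-only mode, assuming the VPA components are installed in the cluster (names are placeholders):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommend only; read values with: kubectl describe vpa myapp-vpa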

For a dedicated deep dive, see our OOMKilled troubleshooting guide.

Fix 3: Command Not Found (Exit Code 127)

Exit code 127 means the container’s entrypoint command does not exist in the image.

# Check what command the pod is trying to run
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].command}'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].args}'

Common causes:

  • Typo in the command or args field
  • Setting command in the pod spec, which overrides the Dockerfile’s ENTRYPOINT (see the example after this list)
  • The binary exists in a different path inside the container
  • Using a distroless or minimal image that does not include a shell
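
To illustrate the second bullet: command in the pod spec completely replaces the image’s ENTRYPOINT, so a wrong path there produces exit code 127 even when the image itself is fine. A sketch with hypothetical paths:

# Dockerfile in the image:
#   ENTRYPOINT ["/usr/local/bin/myapp"]

containers:
- name: myapp
  image: myapp:latest
  command: ["/bin/myapp"]              # overrides ENTRYPOINT; wrong path -> exit 127
  # command: ["/usr/local/bin/myapp"]  # correct path, or omit to use the ENTRYPOINT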

Fix:

# Debug by running the image with an interactive shell (if available)
kubectl run debug --image=<image>:<tag> --rm -it --restart=Never -- /bin/sh

# List files to find the correct binary path
ls /usr/local/bin/
which <expected-binary>

If the image is distroless and has no shell, use an ephemeral debug container:

kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>

Fix 4: Permission Denied (Exit Code 126)

Exit code 126 means the entrypoint file exists but cannot be executed.

Common causes:

  • The entrypoint script does not have execute permissions
  • The container runs as a non-root user but the script is owned by root
  • A security context prevents execution

# Check security context
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].securityContext}' | jq .

# Check if the pod runs as non-root
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.securityContext}' | jq .

Fix: Add execute permissions to the entrypoint script in the Dockerfile:

RUN chmod +x /app/entrypoint.sh

Or fix the security context in the pod spec:

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000

Fix 5: Liveness Probe Killing the Container

A misconfigured liveness probe can cause CrashLoopBackOff by repeatedly killing the container before it has a chance to start.

# Check liveness probe configuration
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].livenessProbe}' | jq .

# Check events for probe failures
kubectl describe pod <pod-name> -n <namespace> | grep -i "unhealthy\|liveness"

Common causes:

  • initialDelaySeconds too short for the application’s startup time
  • The probe endpoint is wrong (wrong path or port)
  • The probe timeout is too short for the endpoint’s response time
  • The liveness probe checks downstream dependencies (it should only check if the process is alive)

Fix:

# Add a startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
  # Gives the app up to 300 seconds to start

# Keep the liveness probe simple
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  timeoutSeconds: 5

Key principle: Liveness probes should answer “is this process fundamentally broken?” — not “are all my dependencies working?” Use readiness probes for dependency checks.
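
A readiness probe that covers dependencies might look like this sketch, assuming the app exposes a /ready endpoint that verifies its downstream connections. Failing readiness only removes the pod from Service endpoints; it never restarts the container:

readinessProbe:
  httpGet:
    path: /ready         # hypothetical endpoint that checks dependencies
    port: 8080
  periodSeconds: 10
  failureThreshold: 3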

Fix 6: Container Starts and Exits Immediately (Exit Code 0)

If the container exits with code 0, it completed successfully. This is only a problem if the container was supposed to keep running.

Common causes:

  • The container runs a one-time command (like a database migration) but is defined as a Deployment instead of a Job
  • The entrypoint script finishes without starting a long-running process
  • A shell script does not call exec to replace the shell with the application process

Fix for scripts that exit too early:

# Bad: script finishes, container exits
#!/bin/bash
echo "Starting app"
./start-app &

# Good: script replaces itself with the app process
#!/bin/bash
echo "Starting app"
exec ./start-app

Fix for one-time tasks: Use a Job or CronJob instead of a Deployment:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:latest
        command: ["./migrate.sh"]
      restartPolicy: Never

Debugging CrashLoopBackOff When Logs Are Empty

Sometimes the container crashes so fast that there are no logs, even with --previous.

# Check if there are any logs at all
kubectl logs <pod-name> -n <namespace> --previous 2>&1

# If "previous terminated container not found", the container never started
# Check events instead
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"

If there are truly no logs:

  1. Run the image manually to see what happens:

kubectl run debug --image=<image>:<tag> --rm -it --restart=Never -- /bin/sh
# Inside the container, manually run the entrypoint

  2. Use an ephemeral debug container:

kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name> -- /bin/sh

  3. Check if it is a CreateContainerConfigError (not actually CrashLoopBackOff):

kubectl describe pod <pod-name> -n <namespace> | grep "Warning"
# Look for: "Error: configmap not found" or "Error: secret not found"

CrashLoopBackOff During Deployments

When a new deployment causes CrashLoopBackOff, roll back immediately and debug in a non-production environment.

# Roll back to the previous version
kubectl rollout undo deployment/<deployment-name> -n <namespace>

# Verify the rollback succeeded
kubectl rollout status deployment/<deployment-name> -n <namespace>

Then debug the failing image:

# Run the failing image as a standalone pod for debugging
kubectl run debug-crash --image=<failing-image>:<tag> --restart=Never -n <namespace> -- sleep 3600

# Exec in and test manually
kubectl exec -it debug-crash -n <namespace> -- /bin/sh

# Clean up when done
kubectl delete pod debug-crash -n <namespace>

For deployment rollback procedures, see our Kubernetes deployment troubleshooting guide.

Preventing CrashLoopBackOff

  1. Set up proper health checks — use startup probes for slow-starting apps, liveness probes for crash detection, readiness probes for traffic management
  2. Set memory limits based on actual usage — use VPA recommendations or kubectl top pod data
  3. Validate container images before deployment — run smoke tests in CI/CD
  4. Use --dry-run=server before applying deployments to catch config errors
  5. Monitor restart counts with Prometheus alerts: alert when kube_pod_container_status_restarts_total increases rapidly

For the broader Kubernetes troubleshooting methodology, see our production troubleshooting guide.


Stop CrashLoopBackOff From Disrupting Production

CrashLoopBackOff in production means your application is down and users are affected. Our engineers at Tasrie IT Services fix these issues daily and can help you build the monitoring, CI/CD gates, and deployment strategies to catch them before they reach production.

Our Kubernetes consulting services include:

  • Production debugging support for immediate CrashLoopBackOff resolution
  • CI/CD pipeline hardening with pre-deployment validation and automated rollbacks
  • Monitoring setup with alerts for pod restart counts, exit codes, and resource exhaustion

Get expert Kubernetes support →
