A Kubernetes Deployment that will not roll out, gets stuck, or keeps creating failing pods is one of the most common issues teams face. At Tasrie IT Services, we debug deployment issues daily across client clusters. The good news is that deployments follow a predictable lifecycle, and once you understand it, troubleshooting becomes systematic.
This guide covers every deployment failure mode we have encountered, with the exact commands and fixes for each one.
Quick Diagnosis: Check Deployment Status
Start with these three commands to understand what is happening:
# Check deployment status
kubectl get deployment <deployment-name> -n <namespace>
# Check rollout status
kubectl rollout status deployment/<deployment-name> -n <namespace>
# Check deployment events and conditions
kubectl describe deployment <deployment-name> -n <namespace>
The kubectl rollout status command tells you whether the rollout is progressing, complete, or failed. If it hangs, the deployment is stuck.
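If you would rather not leave it hanging, you can bound the wait so the command exits with a non-zero status instead (the timeout value below is illustrative):
kubectl rollout status deployment/<deployment-name> -n <namespace> --timeout=120s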
The kubectl describe deployment output contains:
- Conditions — the `Available`, `Progressing`, and `ReplicaFailure` conditions
- Events — a chronological history of scaling actions
- Replicas — desired vs current vs ready vs available counts
Understanding Deployment Conditions
| Condition | Status | Meaning |
|---|---|---|
| Progressing | True (reason: ReplicaSetUpdated) | Rollout is actively creating new pods |
| Progressing | True (reason: NewReplicaSetAvailable) | Rollout completed successfully |
| Progressing | False (reason: ProgressDeadlineExceeded) | Rollout timed out |
| Available | True | Minimum required pods are running |
| Available | False | Not enough pods are ready |
| ReplicaFailure | True | Cannot create pods (quota, admission webhook) |
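A quick way to dump all three conditions at once, without reading the full describe output, is a jsonpath query along these lines:
kubectl get deployment <deployment-name> -n <namespace> \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'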
Troubleshooting Failed Rollouts
Rollout Stuck: Pods in CrashLoopBackOff
The most common rollout failure. New pods are created but keep crashing, preventing the rollout from completing.
# Check the new ReplicaSet's pods
kubectl get replicasets -n <namespace> -l app=<app-label>
kubectl get pods -n <namespace> -l app=<app-label> | grep -v Running
# Check the failing pod's logs
kubectl logs <failing-pod> -n <namespace> --previous
# Check events on the failing pod
kubectl describe pod <failing-pod> -n <namespace>
Common causes:
- Application bug in the new version
- Missing or wrong environment variables
- Missing ConfigMap or Secret
- Wrong command or entrypoint in the container image
- Insufficient memory causing OOMKilled
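The OOMKilled case can be confirmed straight from the pod status rather than digging through events (assuming a single-container pod):
# Prints "OOMKilled" if the last restart was caused by running out of memory
kubectl get pod <failing-pod> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'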
Fix: Either fix the issue in the container image/config or roll back:
# Roll back to the previous version
kubectl rollout undo deployment/<deployment-name> -n <namespace>
# Roll back to a specific revision
kubectl rollout history deployment/<deployment-name> -n <namespace>
kubectl rollout undo deployment/<deployment-name> -n <namespace> --to-revision=<N>
For detailed CrashLoopBackOff troubleshooting, see our CrashLoopBackOff fix guide.
Rollout Stuck: ProgressDeadlineExceeded
By default, Kubernetes waits 600 seconds (10 minutes) for a deployment to make progress. If no new pods become ready within that time, the rollout is marked as failed.
# Check deployment conditions
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Progressing")]}'
Common causes:
- New pods failing readiness probes
- New pods stuck in `Pending` (insufficient resources)
- Image pull taking too long
- Init containers timing out
Fix the deadline (if the application legitimately needs more time):
spec:
  progressDeadlineSeconds: 1200  # 20 minutes
Or investigate why pods are not becoming ready:
# Find the new ReplicaSet
kubectl get rs -n <namespace> -l app=<app-label> --sort-by='.metadata.creationTimestamp' | tail -1
# Check pods in the new ReplicaSet
kubectl get pods -n <namespace> -l pod-template-hash=<hash>
kubectl describe pod <pod-name> -n <namespace>
Rollout Stuck: Pods in Pending
New pods cannot be scheduled to any node.
kubectl describe pod <pending-pod> -n <namespace> | grep -A 10 "Events:"
Common causes and fixes:
| Event Message | Cause | Fix |
|---|---|---|
| Insufficient cpu | Not enough CPU on any node | Scale the cluster or reduce resource requests |
| Insufficient memory | Not enough memory on any node | Scale the cluster or reduce resource requests |
| node(s) had taint | Taint/toleration mismatch | Add tolerations or remove taints |
| node(s) didn't match Pod's node affinity | Node selector/affinity mismatch | Fix labels or affinity rules |
| persistentvolumeclaim not found | Missing PVC | Create the PVC |
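To confirm whether the cluster genuinely has no room, compare what each node has left against what the pending pod requests (both commands are standard kubectl; the custom-columns output is just one way to slice it):
# Requests already committed on each node
kubectl describe nodes | grep -A 8 "Allocated resources"
# Allocatable capacity per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory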
For more on scheduling issues, see our Kubernetes taints and tolerations guide.
Rollout Stuck: ImagePullBackOff
New pods cannot pull the container image.
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Events:" | grep -i "image\|pull"
Common causes:
- Typo in image name or tag
- Image tag does not exist in the registry
- Private registry without `imagePullSecrets`
- Docker Hub rate limiting
Quick check:
# Verify image exists (if using Docker Hub)
docker manifest inspect <image>:<tag>
# Check if imagePullSecrets are configured
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.template.spec.imagePullSecrets}'
# Check if the secret exists and contains valid credentials
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
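If the secret is missing or its credentials are wrong, a typical fix is to recreate it and reference it from the pod template. The names below are placeholders; adapt them to your registry:
# Create (or recreate) the registry credential
kubectl create secret docker-registry <secret-name> \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>
Then reference it in the deployment:
spec:
  template:
    spec:
      imagePullSecrets:
        - name: <secret-name>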
Rollout Stuck: Readiness Probe Failures
New pods start but never become ready, so the deployment never progresses.
# Check for Unhealthy events
kubectl describe pod <pod-name> -n <namespace> | grep -i "unhealthy\|readiness"
# Test the health endpoint manually
kubectl exec -it <pod-name> -n <namespace> -- curl -s localhost:<port>/health
Common causes:
- Wrong probe path or port
- `initialDelaySeconds` too short for the application's startup time
- Application listening on a different port than configured
- Probe endpoint checking downstream dependencies that are unavailable
Fix:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Give the app time to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
For slow-starting applications, add a startup probe:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
  # Gives the app up to 300 seconds (30 * 10) to start
Troubleshooting Deployment Scaling Issues
Deployment Not Scaling Up
# Check current replicas vs desired
kubectl get deployment <deployment-name> -n <namespace>
# Check if HPA is managing the deployment
kubectl get hpa -n <namespace>
# Check HPA status
kubectl describe hpa <hpa-name> -n <namespace>
Common causes:
- HPA target metric is below the scale-up threshold
- Metrics server not running
- Pod resource requests not set (HPA needs requests to calculate utilisation)
- Cluster out of capacity
# Check if metrics server is running
kubectl get pods -n kube-system | grep metrics-server
kubectl top pods -n <namespace> # If this fails, metrics server has issues
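If the problem is missing resource requests, set them on every container in the pod template so the HPA has something to calculate utilisation against (the values below are illustrative, not a recommendation):
spec:
  template:
    spec:
      containers:
        - name: <container-name>
          resources:
            requests:
              cpu: 250m
              memory: 256Mi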
Deployment Not Scaling Down
# Check PodDisruptionBudget
kubectl get pdb -n <namespace>
# Check if HPA minReplicas is preventing scale-down
kubectl get hpa <hpa-name> -n <namespace> -o jsonpath='{.spec.minReplicas}'
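Also note that autoscaling/v2 HPAs wait out a scale-down stabilization window (300 seconds by default) before removing replicas, so a deployment that appears stuck at a high replica count may simply be inside that window. The window is configurable on the HPA itself (a sketch; tune the value to your workload):
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # default; lower it for faster scale-down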
Troubleshooting Deployment Updates
Changes Not Taking Effect
You updated the deployment spec but nothing happened.
# Check if the deployment spec actually changed
kubectl get deployment <deployment-name> -n <namespace> -o yaml | grep -A 5 "image:"
# Check rollout history
kubectl rollout history deployment/<deployment-name> -n <namespace>
Common causes:
- You changed a field that does not trigger a rollout (like replicas)
- The deployment is paused
# Check if deployment is paused
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.paused}'
# Resume if paused
kubectl rollout resume deployment/<deployment-name> -n <namespace>
Fields that trigger a rollout (new ReplicaSet):
- `spec.template.spec.containers[*].image`
- `spec.template.spec.containers[*].env`
- `spec.template.metadata.labels`
- `spec.template.metadata.annotations`
- Any change under `spec.template`
Fields that do NOT trigger a rollout:
- `spec.replicas`
- `spec.strategy`
- `spec.minReadySeconds`
- `spec.progressDeadlineSeconds`
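For example, changing the image via kubectl set image edits spec.template and therefore creates a new ReplicaSet, while kubectl scale only touches spec.replicas and reuses the existing one:
# Triggers a rollout (new ReplicaSet)
kubectl set image deployment/<deployment-name> <container-name>=<image>:<new-tag> -n <namespace>
# Does NOT trigger a rollout
kubectl scale deployment/<deployment-name> --replicas=5 -n <namespace>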
Rolling Update Causing Downtime
By default, Kubernetes performs a rolling update with maxSurge: 25% and maxUnavailable: 25%, which means up to a quarter of the old pods can be taken down before their replacements are ready. Those defaults, or other misconfigurations, can cause brief outages.
# Check the deployment strategy
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.strategy}'
Prevent downtime during rollouts:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Create 1 extra pod at a time
      maxUnavailable: 0    # Never remove a pod until the new one is ready
  minReadySeconds: 10      # Wait 10s after a pod is ready before continuing
Also ensure:
- Readiness probes are configured and working
- `terminationGracePeriodSeconds` is long enough for graceful shutdown
- A `preStop` hook is defined to allow connections to drain (sketched below)
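A minimal sketch of the last two points, assuming a generic HTTP service; the 10-second sleep is an arbitrary drain window, not a recommendation:
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # must cover the preStop sleep plus shutdown time
      containers:
        - name: <container-name>
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]  # let load balancers drain before SIGTERM arrives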
For monitoring rollout progress, see our kubectl rollout status guide.
Troubleshooting Deployment Rollbacks
Rollback Not Working
# Check rollout history
kubectl rollout history deployment/<deployment-name> -n <namespace>
# Check if there is a previous revision to roll back to
kubectl rollout history deployment/<deployment-name> -n <namespace> --revision=1
Common issues:
- `revisionHistoryLimit` set to 0 (no previous ReplicaSets are kept)
- The previous revision has the same issues
Check revision history limit:
kubectl get deployment <deployment-name> -n <namespace> -o jsonpath='{.spec.revisionHistoryLimit}'
If it is 0 or very low, increase it:
spec:
  revisionHistoryLimit: 10  # Keep last 10 revisions
Rollback Caused the Same Problem
If rolling back does not fix the issue, the problem is likely not in the container image but in the environment:
- ConfigMap or Secret was changed independently of the deployment
- A dependent service (database, cache, API) is down
- A network policy was recently applied that blocks traffic
- PVC or storage backend is failing
# Check if ConfigMaps changed recently
kubectl get configmap -n <namespace> -o yaml | head -20
# Check dependent services
kubectl get pods -n <namespace>
kubectl get endpoints -n <namespace>
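Recent namespace events, sorted by time, often reveal what actually changed around the incident (standard kubectl; nothing cluster-specific assumed):
# Show the 20 most recent events in the namespace
kubectl get events -n <namespace> --sort-by=.lastTimestamp | tail -20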
Debugging Deployment YAML Issues
Validate Before Applying
# Dry-run to catch YAML errors before applying
kubectl apply -f deployment.yaml --dry-run=client
# Server-side dry-run (also validates admission webhooks)
kubectl apply -f deployment.yaml --dry-run=server
# Diff against current state
kubectl diff -f deployment.yaml
Common YAML Mistakes
- Label selector mismatch — `spec.selector.matchLabels` must match `spec.template.metadata.labels`:
# This will fail - labels don't match
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: my-app  # Mismatch!
- Port mismatch — `containerPort` in the pod spec must match the port the application listens on
- Resource requests exceeding limits — requests cannot be greater than limits
- Invalid characters in labels — labels may only contain alphanumeric characters, `-`, `_`, and `.`
Deployment Troubleshooting Cheat Sheet
# Full deployment status overview
kubectl get deploy,rs,pods -n <namespace> -l app=<app-label>
# Check why rollout is stuck
kubectl rollout status deployment/<name> -n <namespace>
kubectl describe deployment <name> -n <namespace>
# Check new vs old ReplicaSet
kubectl get rs -n <namespace> -l app=<app-label> --sort-by='.metadata.creationTimestamp'
# Check failing pods in new ReplicaSet
kubectl get pods -n <namespace> -l pod-template-hash=<new-rs-hash>
kubectl logs <failing-pod> -n <namespace> --previous
# Quick rollback
kubectl rollout undo deployment/<name> -n <namespace>
# Check rollout history
kubectl rollout history deployment/<name> -n <namespace>
# Force restart all pods (useful for picking up ConfigMap changes)
kubectl rollout restart deployment/<name> -n <namespace>
For more on restarting deployments, see our kubectl restart deployment guide.
For a broader Kubernetes troubleshooting methodology, see our comprehensive troubleshooting guide.
Need Help With Kubernetes Deployments?
Failed deployments in production cost money and erode user trust. Our engineers at Tasrie IT Services troubleshoot deployment issues daily and can help you build CI/CD pipelines and deployment strategies that minimise risk.
Our Kubernetes consulting services include:
- Deployment strategy design with rolling updates, canary releases, and blue-green deployments
- CI/CD pipeline setup with ArgoCD, Flux, or traditional pipelines
- Automated rollback and alerting that catches failed deployments before they impact users