CrashLoopBackOff on Azure Kubernetes Service (AKS) has all the same causes as on any Kubernetes cluster, plus a set of Azure-specific issues that only appear on AKS. At Tasrie IT Services, we manage dozens of AKS clusters for clients across Europe and the Middle East. Many CrashLoopBackOff issues we see are caused by Azure-specific configurations that do not exist on EKS or GKE.
This guide covers both the general CrashLoopBackOff debugging process and the Azure-specific causes you need to check when running on AKS.
Quick Diagnosis on AKS
Start with the standard Kubernetes debugging commands:
# Check pod status
kubectl get pods -n <namespace>
# Get exit code and events
kubectl describe pod <pod-name> -n <namespace>
# Check previous container logs
kubectl logs <pod-name> -n <namespace> --previous
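If you only need the last exit code, a jsonpath query pulls it straight out of the pod status (index 0 assumes a single-container pod; adjust otherwise):
# Print the last termination exit code of the first container
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'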
If the standard debugging does not reveal the cause, move to the Azure-specific checks below.
General CrashLoopBackOff Fixes (Apply to All Clusters)
Before checking Azure-specific issues, rule out the common causes:
| Exit Code | Cause | Fix |
|---|---|---|
| 1 | Application error | Check logs for error messages, missing env vars, connection failures |
| 127 | Command not found | Fix entrypoint command or image |
| 137 | OOMKilled | Increase memory limit or fix memory leak |
| 139 | Segfault | Update base image or check architecture mismatch |
For detailed exit code debugging, see our CrashLoopBackOff error guide.
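To spot the worst offenders across a namespace before digging into individual pods, sorting by restart count is a useful first pass (again assuming single-container pods):
# List pods ordered by restart count of the first container
kubectl get pods -n <namespace> \
  --sort-by='.status.containerStatuses[0].restartCount'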
Azure-Specific Cause 1: Azure Key Vault CSI Driver Failures
Many AKS applications use the Azure Key Vault Provider for Secrets Store CSI Driver to mount secrets from Azure Key Vault. When this fails, the pod enters CrashLoopBackOff because the application cannot read its secrets.
# Check if the SecretProviderClass exists
kubectl get secretproviderclass -n <namespace>
# Check CSI driver pods
kubectl get pods -n kube-system -l app=secrets-store-csi-driver
# Check the SecretProviderClass events
kubectl describe secretproviderclass <name> -n <namespace>
# Check pod events for mount errors
kubectl describe pod <pod-name> -n <namespace> | grep -i "key vault\|csi\|secret"
Common error messages:
| Error | Cause | Fix |
|---|---|---|
| failed to get keyvault client | Identity not configured | Set up Workload Identity or managed identity |
| keyvault forbidden | Identity lacks access policy | Add Get/List permissions in Key Vault access policies |
| secret not found | Secret name wrong or does not exist | Verify secret name in Key Vault matches SecretProviderClass |
| SecretProviderClass not found | Missing SecretProviderClass resource | Create the SecretProviderClass in the correct namespace |
Fix Key Vault access with Workload Identity:
# Check if the service account has the workload identity annotation
kubectl get sa <service-account> -n <namespace> -o yaml | grep azure.workload.identity
# Verify the federated credential exists
az identity federated-credential list --identity-name <identity-name> --resource-group <rg>
# Check Key Vault access policies
az keyvault show --name <vault-name> --query 'properties.accessPolicies'
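For reference, a minimal SecretProviderClass wired to Workload Identity looks like the sketch below; the resource name, client ID, vault name, tenant ID, and secret name are illustrative placeholders you must replace:
# Minimal SecretProviderClass sketch for Workload Identity (placeholders are illustrative)
kubectl apply -n <namespace> -f - <<'EOF'
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-keyvault            # referenced by the pod's CSI volume
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<client-id>"     # client ID of the managed identity
    keyvaultName: "<vault-name>"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: <secret-name>
          objectType: secret
EOF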
Azure-Specific Cause 2: Azure CNI IP Exhaustion
AKS with Azure CNI assigns a real Azure VNET IP to each pod. When the subnet runs out of IPs, new pods cannot get an IP address and may crash during startup.
# Check subnet IP usage
az network vnet subnet show --resource-group <rg> --vnet-name <vnet> --name <subnet> --query 'ipConfigurations | length(@)'
# Check the subnet address prefix (to work out total IP capacity)
az network vnet subnet show --resource-group <rg> --vnet-name <vnet> --name <subnet> --query addressPrefix
# Check the Azure CNI pods
kubectl get pods -n kube-system -l k8s-app=azure-cni
kubectl logs -n kube-system -l k8s-app=azure-cni --tail=20
Signs of IP exhaustion:
- Pods stuck in ContainerCreating before entering CrashLoopBackOff
- Events showing Failed to allocate address
- New nodes failing to join the cluster
Fixes:
- Expand the subnet address range
- Switch to Azure CNI Overlay mode, which assigns pod IPs from a separate overlay address space instead of the VNET subnet
- Switch to kubenet networking (simpler but with limitations)
- Reduce the number of pods per node with --max-pods (see the sketch below)
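Note that --max-pods is typically fixed at node pool creation, so lowering it usually means adding a replacement pool; a sketch (pool name and node count are illustrative):
# Add a node pool with a lower pods-per-node ceiling, then drain the old pool
az aks nodepool add \
  --cluster-name <cluster-name> \
  --resource-group <rg> \
  --name smallpods \
  --node-count 3 \
  --max-pods 30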
Azure-Specific Cause 3: Managed Identity and AAD Issues
Applications using Azure Managed Identity (via Workload Identity or AAD Pod Identity) crash when they cannot authenticate to Azure services.
# Check Workload Identity webhook
kubectl get pods -n kube-system -l azure-workload-identity.io/system=true
# Check if the service account is annotated correctly
kubectl get sa <sa-name> -n <namespace> -o yaml
# Check pod for identity-related env vars
kubectl exec <pod-name> -n <namespace> -- env | grep AZURE
Common errors in application logs:
| Error | Cause | Fix |
|---|---|---|
| ManagedIdentityCredential authentication failed | Workload Identity not configured | Configure federated credential for the service account |
| DefaultAzureCredential failed | No Azure identity available | Ensure identity is assigned and RBAC is correct |
| AADSTS700016: Application not found | Wrong client ID | Verify the client ID in the service account annotation |
Fix Workload Identity setup:
# Create user-assigned managed identity
az identity create --name <identity-name> --resource-group <rg>
# Create federated credential
az identity federated-credential create \
--name <fedcred-name> \
--identity-name <identity-name> \
--resource-group <rg> \
--issuer $(az aks show -n <cluster-name> -g <rg> --query oidcIssuerProfile.issuerUrl -o tsv) \
--subject system:serviceaccount:<namespace>:<sa-name>
# Annotate the service account
kubectl annotate sa <sa-name> -n <namespace> \
azure.workload.identity/client-id=<client-id>
# Label the service account
kubectl label sa <sa-name> -n <namespace> \
azure.workload.identity/use=true
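After annotating and labelling, restart the workload so the mutating webhook processes freshly created pods, then confirm the injected environment; note that on recent Workload Identity releases the azure.workload.identity/use=true label belongs on the pod template rather than the service account:
# Restart so the webhook mutates new pods
kubectl rollout restart deployment/<deployment-name> -n <namespace>
# The webhook injects AZURE_CLIENT_ID, AZURE_TENANT_ID,
# AZURE_FEDERATED_TOKEN_FILE and AZURE_AUTHORITY_HOST
kubectl exec <pod-name> -n <namespace> -- env | grep '^AZURE_'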
Azure-Specific Cause 4: AKS Node Pool Issues
CrashLoopBackOff can be caused by AKS node pool problems that do not appear on other platforms.
VM Size Mismatch
If the node pool uses a VM size with little headroom above the pod's resource requests, the pod schedules but gets CPU-throttled or OOM-killed once load pushes it past what the node can actually provide.
# Check node pool VM size
az aks nodepool list --resource-group <rg> --cluster-name <cluster-name> -o table
# Check node resources
kubectl describe node <node-name> | grep -A 5 "Allocatable"
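To see how much of the node's capacity is already spoken for, compare the allocated-resources summary against the allocatable figures and list what is scheduled there:
# Requested vs allocatable on the node
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
# Pods currently scheduled on that node
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>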
Ephemeral OS Disk Full
AKS nodes using ephemeral OS disks have limited disk space tied to the VM cache size. Container images and logs can fill this quickly.
# Check node disk pressure
kubectl describe node <node-name> | grep DiskPressure
# SSH into node and check disk
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
# Inside the debug pod: chroot /host df -h
Fix: Use a larger VM size or switch to managed OS disks.
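Disk type is also fixed at node pool creation, so the switch means adding a replacement pool; a sketch (pool name and disk size are illustrative):
# Add a node pool backed by a managed OS disk instead of ephemeral
az aks nodepool add \
  --cluster-name <cluster-name> \
  --resource-group <rg> \
  --name manageddisk \
  --node-osdisk-type Managed \
  --node-osdisk-size 128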
Accelerated Networking Conflicts
Some container networking configurations conflict with Azure Accelerated Networking, causing intermittent pod crashes.
# Check if accelerated networking is enabled
az vmss show --resource-group <node-rg> --name <vmss-name> --query 'virtualMachineProfile.networkProfile.networkInterfaceConfigurations[0].enableAcceleratedNetworking'
Azure-Specific Cause 5: Azure Container Registry (ACR) Pull Failures
AKS pods can end up in CrashLoopBackOff if the initial image pull succeeds but subsequent pulls fail (with imagePullPolicy: Always), for example after the cluster's pull permissions on ACR change.
# Check ACR integration
az aks check-acr --name <cluster-name> --resource-group <rg> --acr <acr-name>
# Check if AKS has ACR pull role
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerRegistry/registries/<acr-name> --query "[?principalName=='<kubelet-identity>']"
Fix ACR pull permissions:
# Attach ACR to AKS (adds AcrPull role)
az aks update --name <cluster-name> --resource-group <rg> --attach-acr <acr-name>
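If permissions check out, confirm the tag itself still exists; retention policies or cleanup jobs that delete tags produce the same pull failures on restart:
# Verify the repository and tag are still present in ACR
az acr repository show-tags --name <acr-name> --repository <repository> -o table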
Azure-Specific Cause 6: Network Security Group (NSG) Blocking Traffic
NSG rules can block outbound traffic that the application needs, causing it to crash when it cannot reach Azure services or external APIs.
# Check effective NSG rules on the node subnet
az network nsg rule list --nsg-name <nsg-name> --resource-group <rg> -o table
# Test connectivity from inside a pod
kubectl run nettest --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 --rm -it --restart=Never -- wget -qO- https://management.azure.com/health 2>&1
Ports that must be open for AKS:
- 443 — API server, Azure services
- 9000 — Tunnelfront/aks-link
- 1194 — UDP for tunnel communication
- 123 — NTP (UDP)
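A throwaway pod can probe those paths from inside the cluster; a sketch reusing the busybox image from above (assumes the busybox build ships TLS-enabled wget, as in the earlier connectivity test):
# DNS resolution and outbound HTTPS checks from inside the cluster
kubectl run nettest --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
  --rm -it --restart=Never -- sh -c \
  'nslookup mcr.microsoft.com && wget -qO- https://mcr.microsoft.com/v2/'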
AKS Debugging Tools
Azure Monitor and Container Insights
# Enable Container Insights if not already enabled
az aks enable-addons --addon monitoring --name <cluster-name> --resource-group <rg>
# Query container logs in Azure Monitor
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "ContainerLogV2 | where PodName == '<pod-name>' | order by TimeGenerated desc | take 50"
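Restart counts are also recorded in the KubePodInventory table, which is a quick way to rank crash-looping pods across the cluster (a sketch; schema as documented for Container Insights):
# Rank pods by container restarts over the last hour
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "KubePodInventory | where TimeGenerated > ago(1h) | summarize Restarts = max(ContainerRestartCount) by Name | top 10 by Restarts"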
AKS Diagnostics
# Run AKS diagnostics
az aks get-credentials --resource-group <rg> --name <cluster-name>
az aks show --resource-group <rg> --name <cluster-name> --query 'provisioningState'
# Check AKS cluster health
az aks show --resource-group <rg> --name <cluster-name> --query 'powerState'
kubectl debug on AKS Nodes
# AKS-specific node debugging (uses Mariner base image)
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
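Once inside, chroot to the host to reach the node's own tooling; a few checks that often explain node-level crashes (paths assume a standard AKS Mariner node with containerd):
# Run from inside the debug pod
chroot /host crictl ps -a                      # includes recently exited containers
chroot /host journalctl -u kubelet --no-pager | tail -n 50
chroot /host df -h /var/lib/containerd         # image/layer disk usage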
AKS CrashLoopBackOff Checklist
- [ ] Check exit code and logs (kubectl describe pod, kubectl logs --previous)
- [ ] Check if Key Vault CSI driver is failing (SecretProviderClass, identity)
- [ ] Check subnet IP availability (az network vnet subnet show)
- [ ] Check Workload Identity configuration (service account annotations)
- [ ] Check ACR pull permissions (az aks check-acr)
- [ ] Check NSG rules for blocked outbound traffic
- [ ] Check node pool VM size and disk pressure
- [ ] Check Azure service health (status.azure.com)
For general CrashLoopBackOff debugging that applies to all Kubernetes platforms, see our CrashLoopBackOff pod fix guide. For a broader overview of Kubernetes troubleshooting, see our production troubleshooting guide.
Need Help With AKS CrashLoopBackOff Issues?
Azure AKS adds a layer of complexity that requires Azure-specific expertise. Our engineers at Tasrie IT Services manage dozens of AKS clusters and can help you resolve CrashLoopBackOff issues fast and build the infrastructure to prevent them.
Our AKS consulting services include:
- AKS incident response with engineers who understand Azure networking, identity, and storage
- Workload Identity and Key Vault integration setup and troubleshooting
- AKS cluster optimisation for reliability, security, and performance
We also provide consulting for EKS and GKE if you run multi-cloud Kubernetes.