
How to Fix OOMKilled in Kubernetes: We Resolved 500+ Cases (2026 Guide)

Engineering Team 2026-04-26

OOMKilled is the second most common pod failure in Kubernetes after CrashLoopBackOff. It means the Linux Out-of-Memory (OOM) killer terminated your container because it exceeded its memory limit. At Tasrie IT Services, we have resolved over 500 OOMKilled incidents across client clusters, and the fix is almost always one of four things: the limit is too low, the JVM heap is misconfigured, there is a memory leak, or the application is caching too aggressively.

This guide covers how to diagnose the exact cause and apply the right fix.

Confirming OOMKilled

# Check pod status
kubectl describe pod <pod-name> -n <namespace>

Look for this in the output:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 26 Apr 2026 10:00:00 +0000
  Finished:     Mon, 26 Apr 2026 10:15:32 +0000

Exit code 137 = 128 + 9 (SIGKILL). The container was forcibly killed by the Linux kernel’s OOM killer.
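
To find every pod in a namespace whose container was last terminated by the OOM killer, a one-liner along these lines works (a sketch; it assumes jq is installed locally):

# List pods whose last container termination reason was OOMKilled
kubectl get pods -n <namespace> -o json | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'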

Important distinction: OOMKilled can happen at two levels:

  1. Container-level OOMKilled — The container exceeded its Kubernetes memory limit. This is the most common case.
  2. Node-level OOMKilled — The node itself ran out of memory and the kernel killed a container to free memory, even if the container was within its limit.

# Check if it is container-level (memory limit set)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].resources.limits.memory}'

# Check if node is under memory pressure
kubectl describe node <node-name> | grep MemoryPressure

Quick Fix: Is the Memory Limit Just Too Low?

Before investigating memory leaks, check if the container simply needs more memory than its limit allows.

# Check the memory limit
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

# Check peak memory usage (if the pod is currently running)
kubectl top pod <pod-name> -n <namespace> --containers

If the memory usage consistently approaches the limit before getting killed, the limit is probably too low.

Fix: Increase the memory limit:

resources:
  requests:
    memory: "512Mi"   # Used for scheduling
  limits:
    memory: "1Gi"     # Hard cap — OOMKilled if exceeded

How to choose the right limit:

  • Set the limit to 1.5-2x the typical memory usage
  • Monitor actual usage with kubectl top pod or Prometheus over several days (see the query sketch after this list)
  • Use Vertical Pod Autoscaler (VPA) to get data-driven recommendations
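
If Prometheus is scraping cAdvisor metrics, a query along these lines shows peak working-set memory per container, which is the figure Kubernetes uses for eviction decisions (a sketch; the pod regex and 7-day window are placeholders):

max_over_time(container_memory_working_set_bytes{namespace="<namespace>", pod=~"<deployment-name>-.*"}[7d])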

Fix for JVM Applications (Java, Kotlin, Scala)

JVM applications are the most common source of OOMKilled we see. The problem is usually that the JVM uses more memory than just the heap, and teams set -Xmx equal to the container limit.

Understanding JVM Memory

Total JVM Memory = Heap + Metaspace + Thread Stacks + Native Memory + Code Cache + GC + Buffers

The heap (-Xmx) is only one part. A JVM with -Xmx1g in a container with a 1Gi limit will get OOMKilled because the total footprint adds up to far more than 1 GiB:

Component | Typical Size
Heap (-Xmx) | 1024 MB
Metaspace | 50-200 MB
Thread stacks (200 threads × 1 MB) | 200 MB
Code cache | 50-100 MB
Native memory | 50-200 MB
GC overhead | 50-100 MB
Total | ~1500-1800 MB

The Fix: Set Heap to 75% of Container Limit

env:
  - name: JAVA_OPTS
    value: "-Xmx768m -Xms256m -XX:MaxMetaspaceSize=128m"
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Modern JVM alternative (available since Java 10 and backported to 8u191) — use container-aware flags:

env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=25.0"

-XX:MaxRAMPercentage=75.0 automatically sets the heap to 75% of the detected container memory limit. This is the recommended approach for containerised JVM applications.
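
One caveat: JAVA_OPTS only takes effect if your start script passes it to the java command. JAVA_TOOL_OPTIONS is read by the JVM itself, so a variant like the following works regardless of the launcher (same flags, different variable):

env:
  - name: JAVA_TOOL_OPTIONS   # picked up automatically by the JVM
    value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=25.0"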

Verify JVM Memory Usage

# Check the JVM's native memory breakdown (requires the app to run with -XX:NativeMemoryTracking=summary)
kubectl exec -it <pod-name> -n <namespace> -- jcmd 1 VM.native_memory summary

# If jcmd is not available, check what heap size the JVM detects inside the container
kubectl exec -it <pod-name> -n <namespace> -- java -XshowSettings:all -version 2>&1 | grep -i "heap\|memory"

Fix for Node.js Applications

Node.js caps the V8 old-space heap by default: historically around 1.5 GB on 64-bit systems, while newer versions size it based on available memory. Either way, for containers with smaller memory limits you must set --max-old-space-size so the heap fits inside the limit.

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"
resources:
  limits:
    memory: "1Gi"

Common Node.js memory leak sources:

  • Event listeners not being removed
  • Growing arrays or objects in module-level scope
  • Unclosed database connections
  • Express.js middleware that accumulates data per request

Debug Node.js memory:

# Take a heap snapshot of the running app (assumes the Node process is PID 1 and was
# started with --heapsnapshot-signal=SIGUSR2, shown below); the snapshot file is
# written to the app's working directory
kubectl exec -it <pod-name> -n <namespace> -- kill -USR2 1

# Quick look at heap usage; note this starts a fresh Node process, so for the app's
# real numbers log process.memoryUsage() from inside the application itself
kubectl exec -it <pod-name> -n <namespace> -- node -e "console.log(process.memoryUsage())"
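
The snapshot signal has to be enabled when the process starts. A minimal sketch, assuming your Node version accepts these flags in NODE_OPTIONS (the values are illustrative):

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768 --heapsnapshot-signal=SIGUSR2"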

Fix for Python Applications

Python applications can consume more memory than expected: per-object overhead is high, and the CPython allocator often holds on to freed memory rather than returning it to the operating system.

Common causes:

  • Large Pandas DataFrames loaded into memory
  • Growing dictionaries or lists
  • Django/Flask caching accumulating data
  • Celery workers not releasing memory between tasks (see the worker-recycling sketch at the end of this section)

As with the other runtimes, set an explicit memory limit based on observed usage:

resources:
  limits:
    memory: "512Mi"

Debug Python memory with tracemalloc from inside the application (running it via kubectl exec starts a fresh interpreter that cannot see your app's allocations):

import tracemalloc

tracemalloc.start()

# ... your app's key operations run here ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)

Fix for Go Applications

Go applications rarely have traditional memory leaks, but goroutine leaks are common.

# Check goroutine count (assumes the app exposes net/http/pprof on port 6060)
kubectl exec -it <pod-name> -n <namespace> -- curl -s 'localhost:6060/debug/pprof/goroutine?debug=1' | head -5

# Download a heap profile (no -t flag, so the binary output is not mangled by a TTY)
kubectl exec -i <pod-name> -n <namespace> -- curl -s localhost:6060/debug/pprof/heap > heap.prof

Common Go memory issues:

  • Goroutine leaks (blocked goroutines accumulating)
  • Large slice allocations without proper garbage collection
  • CGo memory not tracked by Go runtime
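
To dig into those profiles interactively, port-forwarding plus go tool pprof works well. A sketch, assuming the pprof endpoint is on port 6060:

# Forward the pprof port to your machine
kubectl port-forward <pod-name> -n <namespace> 6060:6060

# In another terminal, open an interactive heap profile session
# (inside pprof, 'top' lists the largest allocators and 'web' renders a call graph)
go tool pprof http://localhost:6060/debug/pprof/heap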

Detecting Memory Leaks

A memory leak causes OOMKilled to happen after the pod has been running for some time (hours or days), not immediately after startup.

Using kubectl top

# Watch memory usage over time
watch -n 10 kubectl top pod <pod-name> -n <namespace>

If memory usage continuously grows without plateauing, you have a leak.
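
To keep a record you can review later, a simple loop that appends timestamped samples works. A sketch; adjust the interval and output file to taste:

# Append a timestamped memory sample every 60 seconds (stop with Ctrl-C)
while true; do
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(kubectl top pod <pod-name> -n <namespace> --no-headers)"
  sleep 60
done >> pod-memory.log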

Using Prometheus

Query memory usage over time:

container_memory_usage_bytes{pod="<pod-name>", namespace="<namespace>"}

If the graph shows a steady upward trend without flattening, investigate the leak.
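
To put a number on the trend, deriv() estimates the growth rate in bytes per second over a window (the one-hour range below is a placeholder):

deriv(container_memory_working_set_bytes{pod="<pod-name>", namespace="<namespace>"}[1h])

A rate that stays positive and never returns to zero is a strong sign of a leak.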

Using VPA Recommendations

The Vertical Pod Autoscaler analyses actual memory consumption and recommends appropriate limits:

# Install VPA (if not installed) using the upstream autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA in recommendation mode
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: <namespace>
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <deployment-name>
  updatePolicy:
    updateMode: "Off"  # Recommendation only, no auto-update
EOF

# Check recommendations after a few hours
kubectl describe vpa myapp-vpa -n <namespace>
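
Once recommendations are available, the target values can also be pulled out directly. A sketch; the field path assumes a single container:

kubectl get vpa myapp-vpa -n <namespace> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'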

Node-Level OOMKilled (No Memory Limit Set)

If your container does not have a memory limit, it can still be OOMKilled by the node’s kernel OOM killer when the node runs out of memory.

# Check if the node has memory pressure
kubectl describe node <node-name> | grep MemoryPressure

# Check kubelet eviction thresholds
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz | jq '.kubeletconfig.evictionHard'

# Check what is consuming memory on the node
kubectl top pods --all-namespaces --field-selector spec.nodeName=<node-name> --sort-by=memory

Fix: Always set memory requests and limits. Pods with no requests or limits at all are classified as BestEffort QoS and are the first to be evicted during memory pressure.

resources:
  requests:
    memory: "256Mi"   # Minimum guaranteed memory
  limits:
    memory: "512Mi"   # Maximum allowed memory

QoS classes and eviction priority:

QoS Class | When Applied | Eviction Priority
Guaranteed | requests == limits for all resources | Last (most protected)
Burstable | requests < limits | Middle
BestEffort | No requests or limits set | First (least protected)

For critical workloads, use Guaranteed QoS (set requests equal to limits) to minimise the chance of being evicted.
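
Guaranteed QoS requires requests to equal limits for every resource on every container. A minimal sketch, followed by a quick check of the class Kubernetes assigned:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"        # equal to the request
    memory: "512Mi"    # equal to the request

# Verify the assigned QoS class
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'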

Prevention: Stop OOMKilled Before It Happens

Set Up Prometheus Alerts

groups:
  - name: oom-alerts
    rules:
      - alert: ContainerNearOOMLimit
        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.85 and container_spec_memory_limit_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is using {{ $value | humanizePercentage }} of its memory limit"

      - alert: ContainerOOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"

For full Prometheus setup, see our Prometheus monitoring guide for Kubernetes.

Resource Right-Sizing Checklist

  1. Monitor actual usage with kubectl top pod or Prometheus for at least 24 hours before setting limits
  2. Set limits to 1.5-2x typical usage to handle spikes
  3. Set requests to typical (P50) usage for accurate scheduling
  4. Use VPA recommendations for data-driven sizing
  5. Account for language-specific overhead (JVM metaspace, Node.js V8 overhead, Python object overhead)
  6. Test under load — memory usage during peak traffic may be much higher than idle
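
For item 3, a P50 of working-set memory over a representative window gives a sensible request value. A sketch; the pod regex and 7-day range are placeholders:

quantile_over_time(0.5, container_memory_working_set_bytes{namespace="<namespace>", pod=~"<deployment-name>-.*"}[7d])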

Common Mistakes

  1. Setting requests equal to limits unnecessarily — this wastes cluster capacity. Only do this for critical workloads that need Guaranteed QoS
  2. Not accounting for JVM non-heap memory — the heap is only 50-70% of total JVM memory
  3. Setting limits based on idle usage — always test under realistic load
  4. Ignoring ephemeral storage — large temporary files count against the container’s memory if written to tmpfs-backed emptyDir volumes
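
For mistake 4, capping memory-backed emptyDir volumes keeps temporary files from silently eating into the container's memory budget. A sketch; the volume name and size are illustrative:

volumes:
  - name: scratch
    emptyDir:
      medium: Memory     # tmpfs, so contents count against container memory
      sizeLimit: 128Mi   # kubelet evicts the pod if usage exceeds this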

For broader Kubernetes troubleshooting, see our production troubleshooting guide.


Stop OOMKilled From Disrupting Your Applications

OOMKilled errors cause unexpected restarts, data loss, and degraded user experience. Our engineers at Tasrie IT Services have resolved hundreds of OOMKilled cases and can help you right-size your workloads and build monitoring to catch memory issues before they crash your pods.

Our Kubernetes consulting services include:

  • Resource right-sizing with VPA setup and data-driven memory limit recommendations
  • Memory leak diagnosis with profiling and heap analysis for JVM, Node.js, Python, and Go applications
  • Monitoring and alerting with Prometheus alerts for memory usage trends and OOMKilled events

Get expert Kubernetes support →
