
How to Fix OOMKilled in Kubernetes: We Resolved 500+ Cases (2026 Guide)

Engineering Team 2026-04-26

OOMKilled is the second most common pod failure in Kubernetes after CrashLoopBackOff. It means the Linux Out-of-Memory (OOM) killer terminated your container because it exceeded its memory limit. At Tasrie IT Services, we have resolved over 500 OOMKilled incidents across client clusters, and the fix is almost always one of four things: the limit is too low, the JVM heap is misconfigured, there is a memory leak, or the application is caching too aggressively.

This guide covers how to diagnose the exact cause and apply the right fix.

Confirming OOMKilled

# Check pod status
kubectl describe pod <pod-name> -n <namespace>

Look for this in the output:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 26 Apr 2026 10:00:00 +0000
  Finished:     Mon, 26 Apr 2026 10:15:32 +0000

Exit code 137 = 128 + 9 (SIGKILL). The container was forcibly killed by the Linux kernel’s OOM killer.
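
To find every pod in a namespace whose container was last terminated by the OOM killer, a one-liner along these lines works (a sketch; it assumes jq is installed locally):

# List pods whose last container termination reason was OOMKilled
kubectl get pods -n <namespace> -o json | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'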

Important distinction: OOMKilled can happen at two levels:

  1. Container-level OOMKilled — The container exceeded its Kubernetes memory limit. This is the most common case.
  2. Node-level OOMKilled — The node itself ran out of memory and the kernel killed a container to free memory, even if the container was within its limit.

# Check if it is container-level (memory limit set)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].resources.limits.memory}'

# Check if node is under memory pressure
kubectl describe node <node-name> | grep MemoryPressure

Quick Fix: Is the Memory Limit Just Too Low?

Before investigating memory leaks, check if the container simply needs more memory than its limit allows.

# Check the memory limit
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

# Check peak memory usage (if the pod is currently running)
kubectl top pod <pod-name> -n <namespace> --containers

If the memory usage consistently approaches the limit before getting killed, the limit is probably too low.

Fix: Increase the memory limit:

resources:
  requests:
    memory: "512Mi"   # Used for scheduling
  limits:
    memory: "1Gi"     # Hard cap — OOMKilled if exceeded

How to choose the right limit:

  • Set the limit to 1.5-2x the typical memory usage
  • Monitor actual usage with kubectl top pod or Prometheus over several days (see the query sketch after this list)
  • Use Vertical Pod Autoscaler (VPA) to get data-driven recommendations
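
If Prometheus is scraping cAdvisor metrics, a query along these lines shows peak working-set memory per container, which is the figure Kubernetes uses for eviction decisions (a sketch; the pod regex and 7-day window are placeholders):

max_over_time(container_memory_working_set_bytes{namespace="<namespace>", pod=~"<deployment-name>-.*"}[7d])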

Fix for JVM Applications (Java, Kotlin, Scala)

JVM applications are the most common source of OOMKilled we see. The problem is usually that the JVM uses more memory than just the heap, and teams set -Xmx equal to the container limit.

Understanding JVM Memory

Total JVM Memory = Heap + Metaspace + Thread Stacks + Native Memory + Code Cache + GC + Buffers

The heap (-Xmx) is only one part. A JVM with -Xmx1g in a container with a 1Gi limit will get OOMKilled because the total footprint adds up to far more than 1 GiB:

Component | Typical Size
Heap (-Xmx) | 1024 MB
Metaspace | 50-200 MB
Thread stacks (200 threads × 1 MB) | 200 MB
Code cache | 50-100 MB
Native memory | 50-200 MB
GC overhead | 50-100 MB
Total | ~1500-1800 MB

The Fix: Set Heap to 75% of Container Limit

env:
  - name: JAVA_OPTS
    value: "-Xmx768m -Xms256m -XX:MaxMetaspaceSize=128m"
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Modern JVM alternative (available since Java 10 and backported to 8u191) — use container-aware flags:

env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=25.0"

-XX:MaxRAMPercentage=75.0 automatically sets the heap to 75% of the detected container memory limit. This is the recommended approach for containerised JVM applications.
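
One caveat: JAVA_OPTS only takes effect if your start script passes it to the java command. JAVA_TOOL_OPTIONS is read by the JVM itself, so a variant like the following works regardless of the launcher (same flags, different variable):

env:
  - name: JAVA_TOOL_OPTIONS   # picked up automatically by the JVM
    value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=25.0"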

Verify JVM Memory Usage

# Check the JVM's native memory breakdown (requires the app to run with -XX:NativeMemoryTracking=summary)
kubectl exec -it <pod-name> -n <namespace> -- jcmd 1 VM.native_memory summary

# If jcmd is not available, check what heap size the JVM detects inside the container
kubectl exec -it <pod-name> -n <namespace> -- java -XshowSettings:all -version 2>&1 | grep -i "heap\|memory"

Fix for Node.js Applications

Node.js caps the V8 old-space heap by default: historically around 1.5 GB on 64-bit systems, while newer versions size it based on available memory. Either way, for containers with smaller memory limits you must set --max-old-space-size so the heap fits inside the limit.

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"
resources:
  limits:
    memory: "1Gi"

Common Node.js memory leak sources:

  • Event listeners not being removed
  • Growing arrays or objects in module-level scope
  • Unclosed database connections
  • Express.js middleware that accumulates data per request

Debug Node.js memory:

# Take a heap snapshot of the running app (assumes the Node process is PID 1 and was
# started with --heapsnapshot-signal=SIGUSR2, shown below); the snapshot file is
# written to the app's working directory
kubectl exec -it <pod-name> -n <namespace> -- kill -USR2 1

# Quick look at heap usage; note this starts a fresh Node process, so for the app's
# real numbers log process.memoryUsage() from inside the application itself
kubectl exec -it <pod-name> -n <namespace> -- node -e "console.log(process.memoryUsage())"
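
The snapshot signal has to be enabled when the process starts. A minimal sketch, assuming your Node version accepts these flags in NODE_OPTIONS (the values are illustrative):

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768 --heapsnapshot-signal=SIGUSR2"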

Fix for Python Applications

Python applications can consume more memory than expected: per-object overhead is high, and the CPython allocator often holds on to freed memory rather than returning it to the operating system.

Common causes:

  • Large Pandas DataFrames loaded into memory
  • Growing dictionaries or lists
  • Django/Flask caching accumulating data
  • Celery workers not releasing memory between tasks (see the worker-recycling sketch at the end of this section)

As with the other runtimes, set an explicit memory limit based on observed usage:

resources:
  limits:
    memory: "512Mi"

Debug Python memory with tracemalloc from inside the application (running it via kubectl exec starts a fresh interpreter that cannot see your app's allocations):

import tracemalloc

tracemalloc.start()

# ... your app's key operations run here ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)

Fix for Go Applications

Go applications rarely have traditional memory leaks, but goroutine leaks are common.

# Check goroutine count (assumes the app exposes net/http/pprof on port 6060)
kubectl exec -it <pod-name> -n <namespace> -- curl -s 'localhost:6060/debug/pprof/goroutine?debug=1' | head -5

# Download a heap profile (no -t flag, so the binary output is not mangled by a TTY)
kubectl exec -i <pod-name> -n <namespace> -- curl -s localhost:6060/debug/pprof/heap > heap.prof

Common Go memory issues:

  • Goroutine leaks (blocked goroutines accumulating)
  • Large slice allocations without proper garbage collection
  • CGo memory not tracked by Go runtime
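
To dig into those profiles interactively, port-forwarding plus go tool pprof works well. A sketch, assuming the pprof endpoint is on port 6060:

# Forward the pprof port to your machine
kubectl port-forward <pod-name> -n <namespace> 6060:6060

# In another terminal, open an interactive heap profile session
# (inside pprof, 'top' lists the largest allocators and 'web' renders a call graph)
go tool pprof http://localhost:6060/debug/pprof/heap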

Detecting Memory Leaks

A memory leak causes OOMKilled to happen after the pod has been running for some time (hours or days), not immediately after startup.

Using kubectl top

# Watch memory usage over time
watch -n 10 kubectl top pod <pod-name> -n <namespace>

If memory usage continuously grows without plateauing, you have a leak.
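
To keep a record you can review later, a simple loop that appends timestamped samples works. A sketch; adjust the interval and output file to taste:

# Append a timestamped memory sample every 60 seconds (stop with Ctrl-C)
while true; do
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(kubectl top pod <pod-name> -n <namespace> --no-headers)"
  sleep 60
done >> pod-memory.log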

Using Prometheus

Query memory usage over time:

container_memory_usage_bytes{pod="<pod-name>", namespace="<namespace>"}

If the graph shows a steady upward trend without flattening, investigate the leak.
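
To put a number on the trend, deriv() estimates the growth rate in bytes per second over a window (the one-hour range below is a placeholder):

deriv(container_memory_working_set_bytes{pod="<pod-name>", namespace="<namespace>"}[1h])

A rate that stays positive and never returns to zero is a strong sign of a leak.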

Using VPA Recommendations

The Vertical Pod Autoscaler analyses actual memory consumption and recommends appropriate limits:

# Install VPA (if not installed) using the upstream autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA in recommendation mode
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: <namespace>
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <deployment-name>
  updatePolicy:
    updateMode: "Off"  # Recommendation only, no auto-update
EOF

# Check recommendations after a few hours
kubectl describe vpa myapp-vpa -n <namespace>
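
Once recommendations are available, the target values can also be pulled out directly. A sketch; the field path assumes a single container:

kubectl get vpa myapp-vpa -n <namespace> -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'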

Node-Level OOMKilled (No Memory Limit Set)

If your container does not have a memory limit, it can still be OOMKilled by the node’s kernel OOM killer when the node runs out of memory.

# Check if the node has memory pressure
kubectl describe node <node-name> | grep MemoryPressure

# Check kubelet eviction thresholds
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz | jq '.kubeletconfig.evictionHard'

# Check what is consuming memory on the node
kubectl top pods --all-namespaces --field-selector spec.nodeName=<node-name> --sort-by=memory

Fix: Always set memory requests and limits. Pods with no requests or limits at all are classified as BestEffort QoS and are the first to be evicted during memory pressure.

resources:
  requests:
    memory: "256Mi"   # Minimum guaranteed memory
  limits:
    memory: "512Mi"   # Maximum allowed memory

QoS classes and eviction priority:

QoS Class | When Applied | Eviction Priority
Guaranteed | requests == limits for all resources | Last (most protected)
Burstable | requests < limits | Middle
BestEffort | No requests or limits set | First (least protected)

For critical workloads, use Guaranteed QoS (set requests equal to limits) to minimise the chance of being evicted.
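
Guaranteed QoS requires requests to equal limits for every resource on every container. A minimal sketch, followed by a quick check of the class Kubernetes assigned:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"        # equal to the request
    memory: "512Mi"    # equal to the request

# Verify the assigned QoS class
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'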

Prevention: Stop OOMKilled Before It Happens

Set Up Prometheus Alerts

groups:
  - name: oom-alerts
    rules:
      - alert: ContainerNearOOMLimit
        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.85 and container_spec_memory_limit_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is using {{ $value | humanizePercentage }} of its memory limit"

      - alert: ContainerOOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"

For full Prometheus setup, see our Prometheus monitoring guide for Kubernetes.

Resource Right-Sizing Checklist

  1. Monitor actual usage with kubectl top pod or Prometheus for at least 24 hours before setting limits
  2. Set limits to 1.5-2x typical usage to handle spikes
  3. Set requests to typical (P50) usage for accurate scheduling
  4. Use VPA recommendations for data-driven sizing
  5. Account for language-specific overhead (JVM metaspace, Node.js V8 overhead, Python object overhead)
  6. Test under load — memory usage during peak traffic may be much higher than idle
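
For item 3, a P50 of working-set memory over a representative window gives a sensible request value. A sketch; the pod regex and 7-day range are placeholders:

quantile_over_time(0.5, container_memory_working_set_bytes{namespace="<namespace>", pod=~"<deployment-name>-.*"}[7d])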

Common Mistakes

  1. Setting requests equal to limits unnecessarily — this wastes cluster capacity. Only do this for critical workloads that need Guaranteed QoS
  2. Not accounting for JVM non-heap memory — the heap is only 50-70% of total JVM memory
  3. Setting limits based on idle usage — always test under realistic load
  4. Ignoring ephemeral storage — large temporary files count against the container’s memory if written to tmpfs-backed emptyDir volumes
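
For mistake 4, capping memory-backed emptyDir volumes keeps temporary files from silently eating into the container's memory budget. A sketch; the volume name and size are illustrative:

volumes:
  - name: scratch
    emptyDir:
      medium: Memory     # tmpfs, so contents count against container memory
      sizeLimit: 128Mi   # kubelet evicts the pod if usage exceeds this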

For broader Kubernetes troubleshooting, see our production troubleshooting guide.


Stop OOMKilled From Disrupting Your Applications

OOMKilled errors cause unexpected restarts, data loss, and degraded user experience. Our engineers at Tasrie IT Services have resolved hundreds of OOMKilled cases and can help you right-size your workloads and build monitoring to catch memory issues before they crash your pods.

Our Kubernetes consulting services include:

  • Resource right-sizing with VPA setup and data-driven memory limit recommendations
  • Memory leak diagnosis with profiling and heap analysis for JVM, Node.js, Python, and Go applications
  • Monitoring and alerting with Prometheus alerts for memory usage trends and OOMKilled events

Get expert Kubernetes support →
