Kubernetes clusters are expensive because they are easy to over-provision and hard to monitor at the resource level. The abstraction that makes Kubernetes powerful — pods, deployments, namespaces — also hides the cost of every decision.
We audit Kubernetes clusters as part of our cost optimisation engagements. The average cluster we review wastes 40-60% of its compute budget. This playbook covers exactly how we find and fix that waste.
## Why Kubernetes Costs Spiral
Three things drive Kubernetes cost overruns:
1. Over-requested resources. Developers set CPU and memory requests based on worst-case assumptions. A pod requesting 2 CPU cores and 4GB of memory while actually using 0.3 cores and 800MB wastes 85% of its CPU allocation (and roughly 80% of its memory). Those wasted resources still cost money because they reserve node capacity.
2. Cluster autoscaler adds nodes, rarely removes them. When pods request more resources, the autoscaler adds nodes. When load drops, pods still hold their requests, so nodes stay. Clusters grow but rarely shrink.
3. No cost visibility per team/service. Without namespace-level cost allocation, nobody owns the cost of their workloads. Teams have no incentive to optimise because they do not see the bill.
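The waste figure in the first driver is just (requested - used) / requested, which you can sanity-check with a one-liner:

```shell
# Waste for the example pod: 2 cores requested, 0.3 cores actually used
awk 'BEGIN { printf "CPU waste: %.0f%%\n", (2 - 0.3) / 2 * 100 }'
# prints: CPU waste: 85%
```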
## The 7-Step Optimisation Playbook
### Step 1: Measure Actual Resource Usage (Day 1)
Before changing anything, understand what your pods actually consume vs what they request.
```bash
# Current usage for all pods, sorted by CPU
kubectl top pods --all-namespaces --sort-by=cpu

# Requests per container in a specific namespace (compare against the usage above)
kubectl get pods -n production -o json | \
  jq -r '.items[] | .metadata.name as $pod | .spec.containers[] |
    $pod + "/" + .name + " CPU-req:" + (.resources.requests.cpu // "none") +
    " Mem-req:" + (.resources.requests.memory // "none")'
```
For proper analysis, install Prometheus and query historical utilisation:
```promql
# Average CPU usage vs requests over 7 days
avg_over_time(
  rate(container_cpu_usage_seconds_total{namespace="production"}[5m])[7d:1h]
)
/
avg_over_time(
  kube_pod_container_resource_requests{resource="cpu", namespace="production"}[7d:1h]
)
```
What we typically find:
- 60-80% of pods use less than 30% of requested CPU
- 50-70% of pods use less than 40% of requested memory
- 10-20% of pods have no resource requests at all (unbounded, dangerous)
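The unbounded pods in that last bullet are worth closing off first. A LimitRange gives any container that omits requests a sane default instead of running unbounded. A minimal sketch, with illustrative values you should replace with defaults derived from your own metrics:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:           # applied when a container sets no limits
      memory: 512Mi
```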
### Step 2: Right-Size Pod Requests (20-40% savings)
This is the single biggest lever. Reducing resource requests frees node capacity, which allows the cluster autoscaler to remove nodes.
Before right-sizing:
```yaml
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    memory: 4Gi
```
After right-sizing (based on 14 days of metrics):
```yaml
resources:
  requests:
    cpu: 400m      # was 2000m; pod averages 300m, peaks at 600m
    memory: 1Gi    # was 4Gi; pod averages 700Mi, peaks at 900Mi
  limits:
    memory: 1.5Gi  # headroom for spikes; no CPU limit, to avoid throttling
```
Automate with VPA:
The Vertical Pod Autoscaler can recommend or automatically adjust resource requests based on actual usage:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
```
Set updateMode: "Off" initially to get recommendations without automatic changes. Review the suggestions, then switch to "Auto" once you trust the recommendations.
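To review those recommendations, inspect the VPA's status (for example with `kubectl get vpa web-vpa -o yaml`). The relevant fields look roughly like this; the values below are illustrative:

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: web
      lowerBound:        # below this, the container is likely starved
        cpu: 150m
        memory: 300Mi
      target:            # the request VPA would set in Auto mode
        cpu: 350m
        memory: 820Mi
      upperBound:        # above this is considered wasteful
        cpu: 1200m
        memory: 2Gi
```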
Real result: A production cluster with 120 pods had total CPU requests of 180 cores. After right-sizing based on 14-day metrics, total requests dropped to 75 cores — a 58% reduction. The cluster autoscaler removed 8 nodes, saving $3,200/month.
### Step 3: Optimise Node Types (15-40% savings)
Not all workloads need the same instance type. Match node pools to workload characteristics:
| Workload Type | Recommended Instance | Why |
|---|---|---|
| General web services | m7g.large (Graviton) | Best price-performance ratio |
| Memory-intensive (caches, JVM) | r7g.large (Graviton) | Optimised memory-to-CPU ratio |
| CPU-intensive (builds, processing) | c7g.large (Graviton) | Optimised CPU-to-memory ratio |
| Burstable (low-traffic services) | t4g.medium (Graviton) | Cheap baseline with burst |
| CI/CD and batch | Spot instances | 60-90% cheaper, interruption-tolerant |
Use Karpenter to automatically select optimal instance types:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      # nodeClassRef to an EC2NodeClass omitted for brevity
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["arm64"]  # Graviton by default
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - m7g.medium
        - m7g.large
        - m7g.xlarge
        - c7g.large
        - r7g.large
  limits:
    cpu: 200
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```
Karpenter's consolidation feature is critical: it actively repacks pods onto fewer (or cheaper) nodes when utilisation drops, whereas the Cluster Autoscaler's scale-down is far more conservative and cannot replace nodes with cheaper instance types.
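Because consolidation works by evicting and rescheduling pods, pair it with PodDisruptionBudgets so voluntary disruptions never take a service below its safe replica count (Karpenter respects PDBs). A minimal example, assuming an `app: web` label on the pods:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2      # never evict below 2 ready replicas
  selector:
    matchLabels:
      app: web
```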
### Step 4: Implement Namespace Cost Allocation (Governance)
Without cost visibility per team, nobody optimises. Set up namespace-level cost tracking:
Using Kubecost (free tier):
```bash
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"
```
Kubecost provides per-namespace, per-deployment, and per-pod cost breakdowns, both in its UI and via its Allocation API. Share these reports with each team monthly.
Using Prometheus + custom dashboards:
```promql
# Monthly cost estimate per namespace (simplified, with example rates)
sum by (namespace) (
    kube_pod_container_resource_requests{resource="cpu"} * 0.0425                  # $ per CPU-hour
  +
    kube_pod_container_resource_requests{resource="memory"} / 1073741824 * 0.005   # $ per GiB-hour
) * 730  # hours per month
```
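The same arithmetic works as a back-of-the-envelope check for a single namespace. For example, 10 requested cores plus 24 GiB at the placeholder rates above:

```shell
# 10 cores + 24 GiB requested, at $0.0425/core-hour and $0.005/GiB-hour
awk 'BEGIN { printf "~$%.0f/month\n", (10 * 0.0425 + 24 * 0.005) * 730 }'
# prints: ~$398/month
```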
Real result: After deploying Kubecost and sharing namespace costs with team leads, one client saw teams voluntarily reduce their resource requests by 25% within two months — no enforcement needed, just visibility.
### Step 5: Scale Down Non-Production Clusters (40-70% savings)
Development and staging clusters run 24/7 but are typically used only during business hours. Options:
Option A: Scale workloads to zero outside hours (the autoscaler then removes the empty nodes)
```yaml
# CronJob to scale dev and staging workloads down at 19:00 on weekdays
# (schedules are UTC unless spec.timeZone is set)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down
spec:
  schedule: "0 19 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          # Needs a ServiceAccount bound to a Role that allows
          # scaling deployments in the target namespaces
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              # replicas=0 discards the previous count; record it first
              # (e.g. in an annotation) so a morning job can restore it
              kubectl scale deployment --all --replicas=0 -n dev
              kubectl scale deployment --all --replicas=0 -n staging
          restartPolicy: OnFailure
```
Option B: Use Karpenter with aggressive consolidation
Set `consolidateAfter: 0s` on non-production node pools so idle nodes are removed immediately.
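On a Karpenter v1 NodePool, that is the `disruption` block (fragment only; the rest of the NodePool is as in Step 3):

```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 0s   # remove or replace underutilised nodes immediately
```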
Option C: Use virtual clusters (vcluster)
Run multiple lightweight virtual clusters on a single physical cluster (for example, `vcluster create dev-team-a --namespace team-dev-a`). Dev teams get their own “cluster” without the cost of dedicated nodes.
### Step 6: Use HPA for Variable Workloads (10-30% savings)
Instead of running enough replicas for peak load 24/7, scale based on actual demand with HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
```
Key settings:
- `averageUtilization: 70`: scale up when average CPU exceeds 70% of requests
- `stabilizationWindowSeconds: 300`: wait 5 minutes before scaling down to avoid flapping
- `value: 25` per `periodSeconds: 60`: remove at most 25% of replicas per minute, so reductions are gradual and do not thrash
### Step 7: Clean Up Abandoned Resources (Quick wins)
Every cluster has orphaned resources that cost money:
```bash
# Find PVCs not mounted by any pod: compare Bound PVC names against
# the claims referenced by pod volumes (refine per namespace if names collide)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[].spec.volumes[]? | .persistentVolumeClaim.claimName // empty' | \
  sort -u > /tmp/used-pvcs.txt
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.status.phase=="Bound") | .metadata.name' | \
  sort -u | comm -23 - /tmp/used-pvcs.txt

# Find deployments scaled to 0 (check how long against your change history)
kubectl get deployments --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.replicas==0) |
    .metadata.namespace + "/" + .metadata.name'

# Find services with no endpoints
kubectl get endpoints --all-namespaces -o json | \
  jq -r '.items[] | select(.subsets==null or .subsets==[]) |
    .metadata.namespace + "/" + .metadata.name'
```
## Optimisation Results Summary
From a recent cluster audit (140-node EKS cluster, $42,000/month):
| Strategy | Before | After | Monthly Savings |
|---|---|---|---|
| Pod right-sizing | 240 cores requested | 95 cores | $6,200 |
| Graviton migration | x86 nodes | arm64 (Graviton3) | $4,800 |
| Spot for non-critical | All on-demand | 40% spot mix | $3,600 |
| Non-prod scheduling | 24/7 dev + staging | Business hours only | $2,800 |
| HPA for web tier | 15 replicas fixed | 3-15 dynamic | $1,400 |
| Orphan cleanup | 12 unused PVCs, idle LBs | Removed | $800 |
| Total | $42,000/mo | $22,400/mo | $19,600 (47%) |
Annualised savings: $235,200.
## Tools Comparison
| Tool | Best For | Cost | K8s Native |
|---|---|---|---|
| Kubecost | Namespace cost allocation | Free tier | Yes |
| Prometheus + Grafana | Custom metrics and dashboards | Free | Yes |
| AWS Cost Explorer | Account-level analysis | Free | No |
| Cast AI | Automated optimisation | Paid | Yes |
| Finout | Multi-cloud cost tracking | Paid | Yes |
| VPA | Automated right-sizing | Free | Yes |
| Karpenter | Node optimisation | Free | Yes |
We start with free tools (Kubecost free tier, Prometheus, VPA, Karpenter) and only recommend paid platforms for multi-cluster enterprises where the cost of the tool is justified by the scale of savings.
## Want a Kubernetes Cost Audit?
We audit Kubernetes clusters and typically find 40-60% waste. Every engagement includes a detailed savings report with prioritised recommendations and implementation support.
Our Kubernetes cost optimisation services include:
- Cluster cost audit — identify waste across pods, nodes, storage, and networking
- Right-sizing implementation — VPA setup and manual optimisation
- Node optimisation — Graviton migration, Karpenter setup, spot integration
- Cost governance — Kubecost deployment, namespace-level reporting, budget alerts
- Ongoing management — monthly reviews and continuous optimisation
We also help teams set up EKS, AKS, and GKE clusters with cost optimisation built in from day one.