KUBERNETES

FinOps for Kubernetes: We Reduced Cluster Costs 45% (Playbook)

Engineering Team 2026-03-19

Kubernetes clusters are expensive because they are easy to over-provision and hard to monitor at the resource level. The abstraction that makes Kubernetes powerful — pods, deployments, namespaces — also hides the cost of every decision.

We audit Kubernetes clusters as part of our cost optimisation engagements. The average cluster we review wastes 40-60% of its compute budget. This playbook covers exactly how we find and fix that waste.

Why Kubernetes Costs Spiral

Three things drive Kubernetes cost overruns:

1. Over-requested resources. Developers set CPU and memory requests based on worst-case assumptions. A pod requesting 2 CPU cores and 4GB memory that actually uses 0.3 cores and 800MB wastes 85% of its allocated resources. Those wasted resources still cost money because they reserve node capacity.

2. Cluster autoscaler adds nodes, rarely removes them. When pods request more resources, the autoscaler adds nodes. When load drops, pods still hold their requests, so nodes stay. Clusters grow but rarely shrink.

3. No cost visibility per team/service. Without namespace-level cost allocation, nobody owns the cost of their workloads. Teams have no incentive to optimise because they do not see the bill.

The 7-Step Optimisation Playbook

Step 1: Measure Actual Resource Usage (Day 1)

Before changing anything, understand what your pods actually consume vs what they request.

# Current CPU and memory usage for all pods (requires metrics-server)
kubectl top pods --all-namespaces --sort-by=cpu

# List resource requests per pod in a namespace (first container only)
kubectl get pods -n production -o json | \
  jq -r '.items[] | .metadata.name + " CPU-req:" +
  (.spec.containers[0].resources.requests.cpu // "none") +
  " Mem-req:" + (.spec.containers[0].resources.requests.memory // "none")'

For proper analysis, install Prometheus and query historical utilisation:

# Average CPU usage vs requests over 7 days
avg_over_time(
  rate(container_cpu_usage_seconds_total{namespace="production"}[5m])[7d:1h]
)
/
avg_over_time(
  kube_pod_container_resource_requests{resource="cpu", namespace="production"}[7d:1h]
)

What we typically find:

  • 60-80% of pods use less than 30% of requested CPU
  • 50-70% of pods use less than 40% of requested memory
  • 10-20% of pods have no resource requests at all (unbounded, dangerous)
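This triage is easy to script once you have the numbers; a minimal sketch (the pod data below is hypothetical, units are millicores and MiB):

```python
def utilisation(used, requested):
    """Fraction of a requested resource actually consumed (0 if nothing requested)."""
    return used / requested if requested else 0.0

# pod -> (cpu_used_m, cpu_requested_m, mem_used_mi, mem_requested_mi)
pods = {
    "web-7f9c": (300, 2000, 700, 4096),
    "api-5d2a": (400, 500, 300, 1024),
}

for name, (cpu_u, cpu_r, mem_u, mem_r) in pods.items():
    if utilisation(cpu_u, cpu_r) < 0.30:
        print(f"{name}: CPU at {utilisation(cpu_u, cpu_r):.0%} of request — right-size candidate")
    if utilisation(mem_u, mem_r) < 0.40:
        print(f"{name}: memory at {utilisation(mem_u, mem_r):.0%} of request — right-size candidate")
```

The thresholds (30% CPU, 40% memory) match the typical findings above; tune them to your own tolerance.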

Step 2: Right-Size Pod Requests (20-40% savings)

This is the single biggest lever. Reducing resource requests frees node capacity, which allows the cluster autoscaler to remove nodes.

Before right-sizing:

resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    memory: 4Gi

After right-sizing (based on 14 days of metrics):

resources:
  requests:
    cpu: 400m        # was 2000m — pod averages 300m, peaks at 600m
    memory: 1Gi      # was 4Gi — pod averages 700Mi, peaks at 900Mi
  limits:
    memory: 1.5Gi    # headroom for spikes
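Deriving the new request from metrics can be mechanised; a rough helper (the percentile and headroom values are our assumptions, not a VPA algorithm):

```python
import math

def recommend_request(usage_samples_m, percentile=0.90, headroom=1.15):
    """Suggest a CPU request in millicores: take a high percentile of observed
    usage and add headroom. Set the limit separately, above observed peaks."""
    s = sorted(usage_samples_m)
    idx = min(len(s) - 1, math.ceil(percentile * len(s)) - 1)
    return round(s[idx] * headroom)
```

Tune percentile and headroom to your risk tolerance; memory usually warrants a higher percentile, because exceeding it means an OOM kill rather than throttling.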

Automate with VPA:

The Vertical Pod Autoscaler can recommend or automatically adjust resource requests based on actual usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

Set updateMode: "Off" initially to get recommendations without automatic changes. Review the suggestions, then switch to "Auto" once you trust the recommendations.

Real result: A production cluster with 120 pods had total CPU requests of 180 cores. After right-sizing based on 14-day metrics, total requests dropped to 75 cores — a 58% reduction. The cluster autoscaler removed 8 nodes, saving $3,200/month.

Step 3: Optimise Node Types (15-40% savings)

Not all workloads need the same instance type. Match node pools to workload characteristics:

| Workload Type | Recommended Instance | Why |
| --- | --- | --- |
| General web services | m7g.large (Graviton) | Best price-performance ratio |
| Memory-intensive (caches, JVM) | r7g.large (Graviton) | Optimised memory-to-CPU ratio |
| CPU-intensive (builds, processing) | c7g.large (Graviton) | Optimised CPU-to-memory ratio |
| Burstable (low-traffic services) | t3.medium | Cheap baseline with burst |
| CI/CD and batch | Spot instances | 60-90% cheaper, interruption-tolerant |

Use Karpenter to automatically select optimal instance types:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["arm64"]        # Graviton by default
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - m7g.medium
        - m7g.large
        - m7g.xlarge
        - c7g.large
        - r7g.large
  limits:
    cpu: 200
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Karpenter’s consolidation feature is critical — it actively moves pods to fewer nodes when utilisation drops, unlike Cluster Autoscaler which only removes fully empty nodes.

Step 4: Implement Namespace Cost Allocation (Governance)

Without cost visibility per team, nobody optimises. Set up namespace-level cost tracking:

Using Kubecost (free tier):

helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"

Kubecost provides per-namespace, per-deployment, and per-pod cost breakdowns. Share these reports with each team monthly.

Using Prometheus + custom dashboards:

# Monthly cost estimate per namespace (simplified)
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"} * 0.0425  # cost per CPU-hour
  +
  kube_pod_container_resource_requests{resource="memory"} / 1073741824 * 0.005  # cost per GB-hour
) * 730  # hours per month
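The same arithmetic in plain code, for sanity-checking dashboard numbers (the per-hour rates are placeholders — substitute your blended node prices):

```python
CPU_PER_HOUR = 0.0425    # assumed $ per requested vCPU-hour
MEM_GB_PER_HOUR = 0.005  # assumed $ per requested GB-hour
HOURS_PER_MONTH = 730

def namespace_monthly_cost(cpu_cores_requested, mem_bytes_requested):
    """Mirror of the PromQL estimate: requests priced per hour, times 730."""
    hourly = (cpu_cores_requested * CPU_PER_HOUR
              + mem_bytes_requested / 1024**3 * MEM_GB_PER_HOUR)
    return hourly * HOURS_PER_MONTH
```

Note this prices *requests*, not usage — which is the point: teams pay for what they reserve.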

Real result: After deploying Kubecost and sharing namespace costs with team leads, one client saw teams voluntarily reduce their resource requests by 25% within two months — no enforcement needed, just visibility.

Step 5: Scale Down Non-Production Clusters (40-70% savings)

Development and staging clusters run 24/7 but are typically used only during business hours. Options:

Option A: Scale nodes to zero outside hours

# CronJob to scale down at 7 PM
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down
spec:
  schedule: "0 19 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-scaler  # hypothetical SA; needs RBAC to patch deployments/scale
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment --all --replicas=0 -n dev
              kubectl scale deployment --all --replicas=0 -n staging
          restartPolicy: OnFailure

Option B: Use Karpenter with aggressive consolidation

Set consolidateAfter: 0s on non-production node pools so idle nodes are removed immediately.
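Concretely, a non-production NodePool reuses the Step 3 spec with a more aggressive disruption block (sketch):

```yaml
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s   # reclaim idle capacity immediately
```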

Option C: Use virtual clusters (vcluster)

Run multiple lightweight virtual clusters on a single physical cluster. Dev teams get their own “cluster” without the cost of dedicated nodes.

Step 6: Use HPA for Variable Workloads (10-30% savings)

Instead of running enough replicas for peak load 24/7, scale based on actual demand with HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60

Key settings:

  • averageUtilization: 70 — scale up when average CPU exceeds 70%
  • stabilizationWindowSeconds: 300 — wait 5 minutes before scaling down to avoid flapping
  • Scale down by max 25% per minute — gradual reduction prevents thrashing
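A toy simulation of the scale-down policy (simplified — real HPA behaviour also depends on the live metric and the stabilization window):

```python
import math

def scale_down_steps(replicas, min_replicas=2, pct=0.25):
    """Remove at most pct of current replicas per period until min is reached."""
    steps = [replicas]
    while replicas > min_replicas:
        replicas = max(min_replicas, replicas - max(1, math.floor(replicas * pct)))
        steps.append(replicas)
    return steps

print(scale_down_steps(20))  # a gradual descent rather than a cliff
```

With value: 25 and periodSeconds: 60, going from 20 replicas to 2 takes around nine minutes rather than one abrupt step.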

Step 7: Clean Up Abandoned Resources (Quick wins)

Every cluster has orphaned resources that cost money:

# Find PVCs not mounted by any pod (Bound but unreferenced)
comm -23 \
  <(kubectl get pvc --all-namespaces -o json | \
    jq -r '.items[] | .metadata.namespace + "/" + .metadata.name' | sort) \
  <(kubectl get pods --all-namespaces -o json | \
    jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]?
    | select(.persistentVolumeClaim) | $ns + "/" + .persistentVolumeClaim.claimName' | sort -u)

# Find deployments scaled to 0 (check your change history for how long)
kubectl get deployments --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.replicas==0) |
  .metadata.namespace + "/" + .metadata.name'

# Find services with no endpoints
kubectl get endpoints --all-namespaces -o json | \
  jq -r '.items[] | select(.subsets==null or .subsets==[]) |
  .metadata.namespace + "/" + .metadata.name'

Optimisation Results Summary

From a recent cluster audit (140-node EKS cluster, $42,000/month):

| Strategy | Before | After | Monthly Savings |
| --- | --- | --- | --- |
| Pod right-sizing | 240 cores requested | 95 cores | $6,200 |
| Graviton migration | x86 nodes | arm64 (Graviton3) | $4,800 |
| Spot for non-critical | All on-demand | 40% spot mix | $3,600 |
| Non-prod scheduling | 24/7 dev + staging | Business hours only | $2,800 |
| HPA for web tier | 15 replicas fixed | 3-15 dynamic | $1,400 |
| Orphan cleanup | 12 unused PVCs, idle LBs | Removed | $800 |
| Total | $42,000/mo | $22,400/mo | $19,600 (47%) |

Annualised savings: $235,200.

Tools Comparison

| Tool | Best For | Cost | K8s Native |
| --- | --- | --- | --- |
| Kubecost | Namespace cost allocation | Free tier | Yes |
| Prometheus + Grafana | Custom metrics and dashboards | Free | Yes |
| AWS Cost Explorer | Account-level analysis | Free | No |
| Cast AI | Automated optimisation | Paid | Yes |
| Finout | Multi-cloud cost tracking | Paid | Yes |
| VPA | Automated right-sizing | Free | Yes |
| Karpenter | Node optimisation | Free | Yes |

We start with free tools (Kubecost free tier, Prometheus, VPA, Karpenter) and only recommend paid platforms for multi-cluster enterprises where the cost of the tool is justified by the scale of savings.


Want a Kubernetes Cost Audit?

We audit Kubernetes clusters and typically find 40-60% waste. Every engagement includes a detailed savings report with prioritised recommendations and implementation support.

Our Kubernetes cost optimisation services include:

  • Cluster cost audit — identify waste across pods, nodes, storage, and networking
  • Right-sizing implementation — VPA setup and manual optimisation
  • Node optimisation — Graviton migration, Karpenter setup, spot integration
  • Cost governance — Kubecost deployment, namespace-level reporting, budget alerts
  • Ongoing management — monthly reviews and continuous optimisation

We also help teams set up EKS, AKS, and GKE clusters with cost optimisation built in from day one.

Get a free Kubernetes cost audit →
