Kubernetes clusters are expensive because they are easy to over-provision and hard to monitor at the resource level. The abstraction that makes Kubernetes powerful — pods, deployments, namespaces — also hides the cost of every decision.
We audit Kubernetes clusters as part of our cost optimisation engagements. The average cluster we review wastes 40-60% of its compute budget. This playbook covers exactly how we find and fix that waste.
## Why Kubernetes Costs Spiral
Three things drive Kubernetes cost overruns:
1. Over-requested resources. Developers set CPU and memory requests based on worst-case assumptions. A pod requesting 2 CPU cores and 4GB of memory while actually using 0.3 cores and 800MB wastes 85% of its CPU allocation (and roughly 80% of its memory). Those wasted resources still cost money because they reserve node capacity.
2. Cluster autoscaler adds nodes, rarely removes them. When pods request more resources, the autoscaler adds nodes. When load drops, pods still hold their requests, so nodes stay. Clusters grow but rarely shrink.
3. No cost visibility per team/service. Without namespace-level cost allocation, nobody owns the cost of their workloads. Teams have no incentive to optimise because they do not see the bill.
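The waste figure in the first driver is just (requested - used) / requested, which you can sanity-check with a one-liner:

```shell
# Waste for the example pod: 2 cores requested, 0.3 cores actually used
awk 'BEGIN { printf "CPU waste: %.0f%%\n", (2 - 0.3) / 2 * 100 }'
# prints: CPU waste: 85%
```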
## The 7-Step Optimisation Playbook
### Step 1: Measure Actual Resource Usage (Day 1)
Before changing anything, understand what your pods actually consume vs what they request.
```bash
# Current usage for all pods, sorted by CPU
kubectl top pods --all-namespaces --sort-by=cpu

# Requests per container in a specific namespace (compare against the usage above)
kubectl get pods -n production -o json | \
  jq -r '.items[] | .metadata.name as $pod | .spec.containers[] |
    $pod + "/" + .name + " CPU-req:" + (.resources.requests.cpu // "none") +
    " Mem-req:" + (.resources.requests.memory // "none")'
```
For proper analysis, install Prometheus and query historical utilisation:
```promql
# Average CPU usage vs requests over 7 days
avg_over_time(
  rate(container_cpu_usage_seconds_total{namespace="production"}[5m])[7d:1h]
)
/
avg_over_time(
  kube_pod_container_resource_requests{resource="cpu", namespace="production"}[7d:1h]
)
```
What we typically find:
- 60-80% of pods use less than 30% of requested CPU
- 50-70% of pods use less than 40% of requested memory
- 10-20% of pods have no resource requests at all (unbounded, dangerous)
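The unbounded pods in that last bullet are worth closing off first. A LimitRange gives any container that omits requests a sane default instead of running unbounded. A minimal sketch, with illustrative values you should replace with defaults derived from your own metrics:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:           # applied when a container sets no limits
      memory: 512Mi
```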
### Step 2: Right-Size Pod Requests (20-40% savings)
This is the single biggest lever. Reducing resource requests frees node capacity, which allows the cluster autoscaler to remove nodes.
Before right-sizing:
```yaml
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    memory: 4Gi
```
After right-sizing (based on 14 days of metrics):
```yaml
resources:
  requests:
    cpu: 400m      # was 2000m; pod averages 300m, peaks at 600m
    memory: 1Gi    # was 4Gi; pod averages 700Mi, peaks at 900Mi
  limits:
    memory: 1.5Gi  # headroom for spikes; no CPU limit, to avoid throttling
```
Automate with VPA:
The Vertical Pod Autoscaler can recommend or automatically adjust resource requests based on actual usage:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
```
Set updateMode: "Off" initially to get recommendations without automatic changes. Review the suggestions, then switch to "Auto" once you trust the recommendations.
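To review those recommendations, inspect the VPA's status (for example with `kubectl get vpa web-vpa -o yaml`). The relevant fields look roughly like this; the values below are illustrative:

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: web
      lowerBound:        # below this, the container is likely starved
        cpu: 150m
        memory: 300Mi
      target:            # the request VPA would set in Auto mode
        cpu: 350m
        memory: 820Mi
      upperBound:        # above this is considered wasteful
        cpu: 1200m
        memory: 2Gi
```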
Real result: A production cluster with 120 pods had total CPU requests of 180 cores. After right-sizing based on 14-day metrics, total requests dropped to 75 cores — a 58% reduction. The cluster autoscaler removed 8 nodes, saving $3,200/month.
### Step 3: Optimise Node Types (15-40% savings)
Not all workloads need the same instance type. Match node pools to workload characteristics:
| Workload Type | Recommended Instance | Why |
|---|---|---|
| General web services | m7g.large (Graviton) | Best price-performance ratio |
| Memory-intensive (caches, JVM) | r7g.large (Graviton) | Optimised memory-to-CPU ratio |
| CPU-intensive (builds, processing) | c7g.large (Graviton) | Optimised CPU-to-memory ratio |
| Burstable (low-traffic services) | t4g.medium (Graviton) | Cheap baseline with burst |
| CI/CD and batch | Spot instances | 60-90% cheaper, interruption-tolerant |
Use Karpenter to automatically select optimal instance types:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      # nodeClassRef to an EC2NodeClass omitted for brevity
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["arm64"]  # Graviton by default
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - m7g.medium
        - m7g.large
        - m7g.xlarge
        - c7g.large
        - r7g.large
  limits:
    cpu: 200
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```
Karpenter's consolidation feature is critical: it actively repacks pods onto fewer (or cheaper) nodes when utilisation drops, whereas the Cluster Autoscaler's scale-down is far more conservative and cannot replace nodes with cheaper instance types.
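Because consolidation works by evicting and rescheduling pods, pair it with PodDisruptionBudgets so voluntary disruptions never take a service below its safe replica count (Karpenter respects PDBs). A minimal example, assuming an `app: web` label on the pods:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2      # never evict below 2 ready replicas
  selector:
    matchLabels:
      app: web
```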
### Step 4: Implement Namespace Cost Allocation (Governance)
Without cost visibility per team, nobody optimises. Set up namespace-level cost tracking:
Using Kubecost (free tier):
```bash
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"
```
Kubecost provides per-namespace, per-deployment, and per-pod cost breakdowns, both in its UI and via its Allocation API. Share these reports with each team monthly.
Using Prometheus + custom dashboards:
```promql
# Monthly cost estimate per namespace (simplified, with example rates)
sum by (namespace) (
    kube_pod_container_resource_requests{resource="cpu"} * 0.0425                  # $ per CPU-hour
  +
    kube_pod_container_resource_requests{resource="memory"} / 1073741824 * 0.005   # $ per GiB-hour
) * 730  # hours per month
```
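The same arithmetic works as a back-of-the-envelope check for a single namespace. For example, 10 requested cores plus 24 GiB at the placeholder rates above:

```shell
# 10 cores + 24 GiB requested, at $0.0425/core-hour and $0.005/GiB-hour
awk 'BEGIN { printf "~$%.0f/month\n", (10 * 0.0425 + 24 * 0.005) * 730 }'
# prints: ~$398/month
```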
Real result: After deploying Kubecost and sharing namespace costs with team leads, one client saw teams voluntarily reduce their resource requests by 25% within two months — no enforcement needed, just visibility.
### Step 5: Scale Down Non-Production Clusters (40-70% savings)
Development and staging clusters run 24/7 but are typically used only during business hours. Options:
Option A: Scale workloads to zero outside hours (the autoscaler then removes the empty nodes)
```yaml
# CronJob to scale dev and staging workloads down at 19:00 on weekdays
# (schedules are UTC unless spec.timeZone is set)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down
spec:
  schedule: "0 19 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          # Needs a ServiceAccount bound to a Role that allows
          # scaling deployments in the target namespaces
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - |
              # replicas=0 discards the previous count; record it first
              # (e.g. in an annotation) so a morning job can restore it
              kubectl scale deployment --all --replicas=0 -n dev
              kubectl scale deployment --all --replicas=0 -n staging
          restartPolicy: OnFailure
```
Option B: Use Karpenter with aggressive consolidation
Set `consolidateAfter: 0s` on non-production node pools so idle nodes are removed immediately.
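On a Karpenter v1 NodePool, that is the `disruption` block (fragment only; the rest of the NodePool is as in Step 3):

```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 0s   # remove or replace underutilised nodes immediately
```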
Option C: Use virtual clusters (vcluster)
Run multiple lightweight virtual clusters on a single physical cluster (for example, `vcluster create dev-team-a --namespace team-dev-a`). Dev teams get their own “cluster” without the cost of dedicated nodes.
### Step 6: Use HPA for Variable Workloads (10-30% savings)
Instead of running enough replicas for peak load 24/7, scale based on actual demand with HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
```
Key settings:
- `averageUtilization: 70`: scale up when average CPU exceeds 70% of requests
- `stabilizationWindowSeconds: 300`: wait 5 minutes before scaling down to avoid flapping
- `value: 25` per `periodSeconds: 60`: remove at most 25% of replicas per minute, so reductions are gradual and do not thrash
### Step 7: Clean Up Abandoned Resources (Quick wins)
Every cluster has orphaned resources that cost money:
```bash
# Find PVCs not mounted by any pod: compare Bound PVC names against
# the claims referenced by pod volumes (refine per namespace if names collide)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[].spec.volumes[]? | .persistentVolumeClaim.claimName // empty' | \
  sort -u > /tmp/used-pvcs.txt
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.status.phase=="Bound") | .metadata.name' | \
  sort -u | comm -23 - /tmp/used-pvcs.txt

# Find deployments scaled to 0 (check how long against your change history)
kubectl get deployments --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.replicas==0) |
    .metadata.namespace + "/" + .metadata.name'

# Find services with no endpoints
kubectl get endpoints --all-namespaces -o json | \
  jq -r '.items[] | select(.subsets==null or .subsets==[]) |
    .metadata.namespace + "/" + .metadata.name'
```
## Optimisation Results Summary
From a recent cluster audit (140-node EKS cluster, $42,000/month):
| Strategy | Before | After | Monthly Savings |
|---|---|---|---|
| Pod right-sizing | 240 cores requested | 95 cores | $6,200 |
| Graviton migration | x86 nodes | arm64 (Graviton3) | $4,800 |
| Spot for non-critical | All on-demand | 40% spot mix | $3,600 |
| Non-prod scheduling | 24/7 dev + staging | Business hours only | $2,800 |
| HPA for web tier | 15 replicas fixed | 3-15 dynamic | $1,400 |
| Orphan cleanup | 12 unused PVCs, idle LBs | Removed | $800 |
| Total | $42,000/mo | $22,400/mo | $19,600 (47%) |
Annualised savings: $235,200.
## Tools Comparison
| Tool | Best For | Cost | K8s Native |
|---|---|---|---|
| Kubecost | Namespace cost allocation | Free tier | Yes |
| Prometheus + Grafana | Custom metrics and dashboards | Free | Yes |
| AWS Cost Explorer | Account-level analysis | Free | No |
| Cast AI | Automated optimisation | Paid | Yes |
| Finout | Multi-cloud cost tracking | Paid | Yes |
| VPA | Automated right-sizing | Free | Yes |
| Karpenter | Node optimisation | Free | Yes |
We start with free tools (Kubecost free tier, Prometheus, VPA, Karpenter) and only recommend paid platforms for multi-cluster enterprises where the cost of the tool is justified by the scale of savings.
## Want a Kubernetes Cost Audit?
We audit Kubernetes clusters and typically find 40-60% waste. Every engagement includes a detailed savings report with prioritised recommendations and implementation support.
Our Kubernetes cost optimisation services include:
- Cluster cost audit — identify waste across pods, nodes, storage, and networking
- Right-sizing implementation — VPA setup and manual optimisation
- Node optimisation — Graviton migration, Karpenter setup, spot integration
- Cost governance — Kubecost deployment, namespace-level reporting, budget alerts
- Ongoing management — monthly reviews and continuous optimisation
We also help teams set up EKS, AKS, and GKE clusters with cost optimisation built in from day one.