
Kubernetes Cost Optimization: Complete Guide to Reducing Cloud Spend in 2026

Tasrie IT Services

Kubernetes costs can spiral out of control without proper optimization. Over-provisioned clusters, inefficient resource allocation, and lack of visibility lead to wasted cloud spend. This comprehensive guide shares proven strategies that have helped our clients reduce Kubernetes costs by 40-60% while maintaining performance and reliability.

Table of Contents

  1. Understanding Kubernetes Cost Drivers
  2. Resource Right-Sizing Strategies
  3. Autoscaling for Cost Efficiency
  4. Spot and Preemptible Instances
  5. Storage Optimization
  6. Network Cost Reduction
  7. Multi-Tenancy and Resource Sharing
  8. Cost Visibility and Monitoring
  9. FinOps Best Practices
  10. Cost Optimization Tools

Understanding Kubernetes Cost Drivers

Primary Cost Components

Compute costs (60-75% of total):

  • Node instances (EC2, GCE, Azure VMs)
  • CPU and memory resources
  • Reserved vs on-demand vs spot pricing
  • Idle capacity and over-provisioning

Storage costs (15-25% of total):

  • Persistent volumes (EBS, Persistent Disks, Azure Disks)
  • Block storage vs object storage
  • Snapshot and backup storage
  • Storage I/O operations

Network costs (10-20% of total):

  • Data transfer between availability zones
  • Cross-region traffic
  • Internet egress charges
  • Load balancer costs

Additional costs:

  • Control plane fees (managed Kubernetes)
  • Container registries
  • Logging and monitoring infrastructure
  • Load balancers and ingress controllers

Real-World Cost Challenges

Our e-commerce Kubernetes migration project achieved 58% infrastructure cost reduction. Common cost issues we encountered:

Over-provisioning waste:

  • Pods requesting 2GB memory but using 300MB (85% waste)
  • CPU requests set to 1000m, actual usage 150m (85% waste)
  • Clusters running at 35% utilization paying for 100%
  • Development environments sized identically to production

Inefficient configurations:

  • Running expensive instance types without autoscaling
  • No use of spot instances for fault-tolerant workloads
  • Persistent volumes never resized or cleaned up
  • Multiple clusters when multi-tenancy would suffice

Lack of visibility:

  • No cost allocation by team, product, or environment
  • Unknown costs per application or customer
  • No budget alerts or governance
  • Developers unaware of resource costs

Resource Right-Sizing Strategies

Analyzing Current Resource Usage

Use Vertical Pod Autoscaler (VPA) in recommendation mode:

# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create VPA in recommendation mode
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Off"  # Recommendation only
EOF

# Get recommendations
kubectl describe vpa app-vpa

Use Prometheus metrics for historical analysis:

# CPU usage over time
avg(rate(container_cpu_usage_seconds_total{container="app"}[5m])) by (pod)

# Memory usage over time
avg(container_memory_working_set_bytes{container="app"}) by (pod)

# Request vs actual usage (identify over-provisioning)
(
  avg(container_memory_working_set_bytes{container="app"}) by (pod)
  /
  avg(kube_pod_container_resource_requests{resource="memory",container="app"}) by (pod)
) * 100
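The request-vs-usage ratio the last query reports can be sanity-checked by hand; a minimal sketch of the same arithmetic, using the over-provisioning figures from earlier in this guide:

```python
def waste_pct(requested_bytes: int, used_bytes: int) -> float:
    """Percentage of a resource request that goes unused."""
    return 100 * (1 - used_bytes / requested_bytes)

# 2 GiB requested, ~300 MiB actually used
print(round(waste_pct(2 * 1024**3, 300 * 1024**2)))  # ≈ 85% waste
```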

Implementing Right-Sizing

Before optimization (over-provisioned):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 10
  template:
    spec:
      containers:
      - name: api
        image: myapi:1.0
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"

After optimization (right-sized based on actual usage):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  template:
    spec:
      containers:
      - name: api
        image: myapi:1.0
        resources:
          requests:
            cpu: "200m"      # Reduced from 1000m
            memory: "512Mi"  # Reduced from 2Gi
          limits:
            cpu: "500m"      # Reduced from 2000m
            memory: "1Gi"    # Reduced from 4Gi

Impact:

  • CPU requests: 10,000m → 1,200m (88% reduction)
  • Memory requests: 20Gi → 3Gi (85% reduction)
  • Monthly cost: $2,400 → $360 (85% reduction)
  • Same performance maintained
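The impact numbers above follow directly from replicas times per-pod requests; a quick check of the arithmetic (figures are the ones from the example, not price quotes):

```python
def cluster_totals(replicas: int, cpu_m: int, mem_mi: int):
    """Total CPU (millicores) and memory (MiB) requested across all replicas."""
    return replicas * cpu_m, replicas * mem_mi

before_cpu, before_mem = cluster_totals(10, 1000, 2048)  # 10,000m, 20 Gi
after_cpu, after_mem = cluster_totals(6, 200, 512)       # 1,200m, 3 Gi

print(f"CPU cut: {1 - after_cpu / before_cpu:.0%}")      # 88%
print(f"Memory cut: {1 - after_mem / before_mem:.0%}")   # 85%
```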

Setting Appropriate Limits

Anti-pattern: No limits set

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  # No limits - can consume entire node resources

Anti-pattern: Limits == Requests

resources:
  requests:
    cpu: "1000m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
  # Prevents bursting, over-provisions

Best practice: Appropriate limit-to-request ratio

resources:
  requests:
    cpu: "200m"      # Guaranteed minimum
    memory: "256Mi"
  limits:
    cpu: "500m"      # Allow 2.5x burst
    memory: "512Mi"  # Allow 2x burst

LimitRanges and ResourceQuotas

Enforce default limits per namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:  # Default limits
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:  # Default requests
      cpu: "100m"
      memory: "128Mi"
    max:  # Maximum allowed
      cpu: "2000m"
      memory: "4Gi"
    min:  # Minimum required
      cpu: "50m"
      memory: "64Mi"
    type: Container

Enforce namespace-level quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    limits.cpu: "100"
    limits.memory: "200Gi"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"

Autoscaling for Cost Efficiency

Horizontal Pod Autoscaler (HPA)

CPU-based autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Target 70% CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scale down
      policies:
      - type: Percent
        value: 50  # Scale down max 50% of pods at once
        periodSeconds: 60

Multi-metric autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa-advanced
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"  # 1000 req/s per pod

Cluster Autoscaler

AWS EKS configuration:

# Cluster Autoscaler is configured via flags on its Deployment (excerpt):
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5  # Scale down when < 50% utilized
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --cordon-node-before-terminating=true

Node group configuration for optimal scaling:

# AWS Auto Scaling Group tags
--tags=k8s.io/cluster-autoscaler/enabled=true
--tags=k8s.io/cluster-autoscaler/my-cluster=owned
--tags=k8s.io/cluster-autoscaler/node-template/label/workload=general

# Multiple node groups for different workloads
Node Group 1 (general):
  Instance types: t3.medium, t3.large
  Min: 2, Max: 20

Node Group 2 (compute-intensive):
  Instance types: c6i.xlarge, c6i.2xlarge
  Min: 0, Max: 10

Node Group 3 (memory-intensive):
  Instance types: r6i.xlarge, r6i.2xlarge
  Min: 0, Max: 10

KEDA for Event-Driven Autoscaling

Scale based on external metrics:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0  # Scale to zero when idle
  maxReplicaCount: 50
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/my-queue
      queueLength: "5"  # Target 5 messages per pod
      awsRegion: "us-east-1"

Scale to zero for dev environments:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-api-scaler
  namespace: development
spec:
  scaleTargetRef:
    name: api
  minReplicaCount: 0  # Scale to zero outside business hours
  maxReplicaCount: 5
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 9 * * 1-5    # Scale up at 9 AM weekdays
      end: 0 18 * * 1-5     # Scale down at 6 PM weekdays
      desiredReplicas: "3"

Spot and Preemptible Instances

AWS Spot Instances for EKS

Cost savings: 60-90% vs on-demand

Node group configuration:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1

nodeGroups:
  - name: spot-general
    instancesDistribution:
      instanceTypes:
        - t3.large
        - t3a.large
        - t2.large
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
    minSize: 2
    maxSize: 20
    labels:
      workload: general
      capacity-type: spot
    tags:
      k8s.io/cluster-autoscaler/node-template/label/capacity-type: spot

Workload scheduling on spot nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  template:
    spec:
      nodeSelector:
        capacity-type: spot  # Schedule on spot instances
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: processor
        image: batch-processor:1.0

GKE Preemptible VMs

Cost savings: 70-80% vs standard VMs

Our travel platform scaling case study used preemptible VMs to achieve 42% cost reduction.

Node pool configuration (note: GCP now positions Spot VMs, created with --spot, as the successor to preemptible VMs):

gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --preemptible \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=20 \
  --machine-type=n1-standard-4 \
  --disk-size=100

Handling Spot/Preemptible Interruptions

Pod Disruption Budgets (PDB):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # Always keep 2 pods running
  selector:
    matchLabels:
      app: api

Node termination handlers:

# AWS Node Termination Handler
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node-termination-handler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: aws-node-termination-handler
        image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ENABLE_SPOT_INTERRUPTION_DRAINING
          value: "true"
        - name: ENABLE_SCHEDULED_EVENT_DRAINING
          value: "true"

Storage Optimization

PersistentVolume Right-Sizing

Identify unused volumes:

# Find PVCs with low utilization
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.status.phase == "Bound") |
  "\(.metadata.namespace)/\(.metadata.name) - \(.spec.resources.requests.storage)"'

# kubectl top does not cover volumes; check actual usage via kubelet metrics
# in Prometheus: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes

Right-size volumes (note: CSI volume expansion can only grow a PVC in place; to shrink an over-provisioned volume, create a smaller PVC and migrate the data):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi  # Sized to actual usage; data migrated off the old 500Gi volume

Storage Class Optimization

AWS EBS storage classes:

# Expensive io2 (high IOPS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iopsPerGB: "50"
  fsType: ext4
# Cost: ~$0.125/GB-month + $0.065/provisioned IOPS

---
# Cost-effective gp3 (balanced)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
# Cost: ~$0.08/GB-month (36% cheaper than io2's base rate)

Migrating gp2 volumes to gp3 cuts the per-GB rate by 20% ($0.10 to $0.08/GB-month) and decouples IOPS and throughput from volume size.

Backup Storage Optimization

Lifecycle policies for backups:

# Velero backup with TTL
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  ttl: 168h  # Retain for 7 days only
  includedNamespaces:
  - production
  storageLocation: aws-s3

Move old backups to cheaper storage:

# AWS S3 lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
  --bucket velero-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "Id": "MoveToGlacier",
      "Status": "Enabled",
      "Transitions": [{
        "Days": 30,
        "StorageClass": "GLACIER"
      }],
      "Expiration": {
        "Days": 365
      }
    }]
  }'

Network Cost Reduction

Cross-AZ Traffic Costs

Problem: Cross-AZ data transfer costs $0.01-0.02/GB

Solution: Topology-aware routing

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.kubernetes.io/topology-mode: Auto  # Prefer same-zone routing
spec:
  selector:
    app: api
  ports:
  - port: 8080
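To see why same-zone routing matters, a rough model of cross-AZ transfer cost: the ~$0.01/GB is billed on both the sending and receiving side, so traffic crossing zones effectively costs ~$0.02/GB (the monthly volume below is an illustrative assumption):

```python
def cross_az_monthly_cost(gb_per_month: float, rate_each_way: float = 0.01) -> float:
    """Cross-AZ transfer cost: billed in both directions."""
    return gb_per_month * rate_each_way * 2

print(cross_az_monthly_cost(50_000))  # 50 TB/month crossing zones -> $1000.0
```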

NAT Gateway Costs

Problem: NAT Gateway costs $0.045/hour + $0.045/GB processed

Solution: VPC endpoints for AWS services

# Create VPC endpoint for S3 (free data transfer)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-12345678

# Create VPC endpoint for ECR (avoid NAT for image pulls)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-12345678

Annual savings: $4,000-8,000 per cluster
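That savings estimate is easy to reproduce from the quoted NAT pricing (the traffic volume is an illustrative assumption, not a measurement):

```python
def nat_gateway_monthly(gb_processed: float, hourly: float = 0.045,
                        per_gb: float = 0.045, hours: float = 730) -> float:
    """Monthly NAT Gateway cost: hourly charge plus per-GB processing."""
    return hourly * hours + gb_processed * per_gb

monthly = nat_gateway_monthly(10_000)              # 10 TB/month through NAT
print(round(monthly, 2), round(monthly * 12))      # ~$482.85/month, ~$5794/year
```

Routing S3 and ECR traffic through VPC endpoints removes most of the per-GB component, which is where the bulk of that annual figure comes from.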

Load Balancer Optimization

Problem: Each LoadBalancer-type Service provisions its own cloud load balancer, each billed separately

Solution: Ingress controller (single load balancer for multiple services)

# Before: 10 LoadBalancer services = 10 load balancers = $180/month
apiVersion: v1
kind: Service
metadata:
  name: app1
spec:
  type: LoadBalancer  # Creates dedicated ALB/NLB

---
# After: 1 Ingress = 1 load balancer = $18/month (90% savings)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: apps-ingress
spec:
  rules:
  - host: app1.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app1
            port:
              number: 8080
  - host: app2.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app2
            port:
              number: 8080

Multi-Tenancy and Resource Sharing

Namespace-Based Multi-Tenancy

Share clusters across teams/environments:

# Instead of 10 separate clusters, use 1 cluster with 10 namespaces
# Savings: 9 control planes ($0.10/hour each) = $650/month

# team-alpha namespace with quota
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    persistentvolumeclaims: "10"
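The control-plane saving from consolidation is straightforward arithmetic, using the $0.10/hour managed control-plane fee and ~730 hours per month:

```python
clusters_removed = 9
control_plane_hourly = 0.10  # $/hr per managed control plane (e.g. EKS)

monthly_savings = clusters_removed * control_plane_hourly * 730
print(round(monthly_savings))  # ≈ $657/month
```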

Virtual Clusters (vcluster)

Cost-effective dev/test environments:

# Install vcluster
helm install vcluster vcluster/vcluster \
  --namespace team-dev \
  --set syncer.extraArgs={--out-kube-config-server=https://vcluster.example.com}

# Each vcluster runs inside pods (minimal cost)
# 10 virtual clusters ≈ cost of 2-3 small pods vs 10 full clusters

Cost Visibility and Monitoring

Kubecost for Cost Allocation

Install Kubecost:

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN"

Cost allocation by:

  • Namespace
  • Label (team, product, environment)
  • Controller (Deployment, StatefulSet)
  • Pod
  • Node

Set up cost alerts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-alerts
  namespace: kubecost
data:
  alerts.json: |
    [
      {
        "type": "budget",
        "threshold": 10000,
        "window": "month",
        "aggregation": "namespace",
        "filter": "namespace:production"
      }
    ]

Prometheus Queries for Cost Insights

# Memory footprint per namespace in GiB (multiply by your $/GiB rate for a rough cost)
sum by (namespace) (
  avg_over_time(container_memory_working_set_bytes[1h])
) / 1024 / 1024 / 1024

# Waste from over-provisioning: requested minus actually used, in GiB per namespace
(
  sum by (namespace) (kube_pod_container_resource_requests{resource="memory"})
  -
  sum by (namespace) (container_memory_working_set_bytes)
) / 1024 / 1024 / 1024

Grafana Dashboards

Import community dashboards from grafana.com for cluster cost and capacity views, or build panels from the queries above.

FinOps Best Practices

1. Showback and Chargeback

Implement cost allocation:

# Label all resources for cost tracking
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    team: platform
    product: checkout
    environment: production
    cost-center: "1234"

Monthly cost reports by team:

# Generate a cost report aggregated by the team label (kubectl-cost plugin)
kubectl cost label -l team --window 30d

2. Reserved Instances and Savings Plans

AWS Savings Plans:

  • Compute Savings Plans: Up to 66% discount
  • EC2 Instance Savings Plans: Up to 72% discount
  • Requires 1 or 3-year commitment

Recommendations:

  • Reserve 50-70% of baseline capacity
  • Use on-demand/spot for burst capacity
  • Review utilization quarterly
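A sketch of why partial coverage beats all-or-nothing: blended spend when only a fraction of baseline usage is covered by a plan (the coverage and discount figures below are illustrative assumptions, not AWS quotes):

```python
def blended_cost_fraction(covered_frac: float, discount: float) -> float:
    """Effective spend as a fraction of pure on-demand cost."""
    return covered_frac * (1 - discount) + (1 - covered_frac)

# Cover 60% of usage with a plan at a 40% discount
print(round(blended_cost_fraction(0.60, 0.40), 2))  # 0.76 -> 24% overall savings
```

Covering only the stable baseline keeps the committed spend fully utilized, while burst capacity stays on-demand or spot.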

3. Development Environment Policies

Automatic shutdown of dev environments:

# py-kube-downscaler: annotate non-production namespaces with an uptime window;
# run the downscaler itself with --exclude-namespaces production,staging
apiVersion: v1
kind: Namespace
metadata:
  name: development
  annotations:
    downscaler/uptime: Mon-Fri 08:00-19:00 America/New_York

Annual savings: $50,000-100,000 for medium-sized engineering teams
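The scale of those savings follows from the schedule itself: weekday business hours are only about a third of the week, so roughly two thirds of dev compute hours are avoidable:

```python
uptime_hours_per_week = 11 * 5   # Mon-Fri 08:00-19:00
hours_per_week = 24 * 7

idle_fraction = 1 - uptime_hours_per_week / hours_per_week
print(f"{idle_fraction:.0%} of dev compute hours avoidable")  # ~67%
```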

4. Budget Alerts and Governance

AWS Budget alerts:

aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "EKS-Monthly-Budget",
    "BudgetLimit": {
      "Amount": "10000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "team@example.com"
    }]
  }]'

Cost Optimization Tools

OpenCost

Open-source cost monitoring:

kubectl apply -f https://raw.githubusercontent.com/opencost/opencost/main/kubernetes/opencost.yaml

# Access UI
kubectl port-forward -n opencost service/opencost 9090:9090

Goldilocks

Right-sizing recommendations:

helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Enable for namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Kube Resource Report

Generate cost reports:

helm install kube-resource-report kube-resource-report/kube-resource-report

# View report
kubectl port-forward -n kube-resource-report service/kube-resource-report 8080:80

Real-World Cost Optimization Results

Case Study 1: E-Commerce Platform

Initial state:

  • 5 EKS clusters (dev, staging, prod, DR, testing)
  • All on-demand t3.xlarge instances
  • No autoscaling
  • 35% average cluster utilization
  • Monthly cost: $18,400

Optimizations:

  1. Consolidated to 2 clusters (prod, non-prod) with namespace isolation
  2. Implemented cluster autoscaler
  3. Right-sized resources (VPA recommendations)
  4. 60% spot instances for non-prod
  5. 40% spot instances for prod

Results:

  • Monthly cost: $7,700 (58% reduction)
  • Annual savings: $128,400
  • Better utilization: 72%


Case Study 2: SaaS Platform

Initial state:

  • Over-provisioned pod resources
  • No autoscaling to zero
  • Expensive storage classes
  • Monthly cost: $24,200

Optimizations:

  1. Right-sized based on actual usage
  2. Implemented KEDA for scale-to-zero
  3. Switched gp2 → gp3 storage
  4. Topology-aware routing

Results:

  • Monthly cost: $13,800 (43% reduction)
  • Annual savings: $124,800
  • Same performance and reliability

Implementation Roadmap

Week 1-2: Visibility and Baseline

  • Deploy Kubecost or OpenCost
  • Analyze current spending by namespace/team
  • Identify top cost drivers
  • Document baseline metrics

Week 3-4: Quick Wins

  • Delete unused PVCs and volumes
  • Consolidate load balancers to ingress
  • Implement VPC endpoints
  • Enable topology-aware routing

Week 5-6: Right-Sizing

  • Analyze VPA recommendations
  • Implement resource quotas
  • Right-size pod resources
  • Remove over-provisioning

Week 7-8: Autoscaling

  • Deploy Cluster Autoscaler
  • Implement HPA for applications
  • Configure KEDA for event-driven scaling
  • Test scale-down behavior

Week 9-10: Spot Instances

  • Create spot node groups
  • Migrate fault-tolerant workloads
  • Implement interruption handling
  • Gradually increase spot percentage

Week 11-12: Governance

  • Implement LimitRanges
  • Set up budget alerts
  • Create cost allocation reports
  • Establish FinOps processes

Conclusion

Kubernetes cost optimization is an ongoing process requiring visibility, governance, and continuous improvement. By implementing the strategies in this guide, you can achieve 40-60% cost reduction while maintaining performance and reliability.

Key takeaways:

  • Right-size resources based on actual usage, not guesses
  • Implement autoscaling at all levels (pod, node, cluster)
  • Use spot/preemptible instances for 60-90% savings
  • Consolidate infrastructure through multi-tenancy
  • Establish cost visibility and governance

Need help optimizing your Kubernetes costs? Tasrie IT Services specializes in Kubernetes cost optimization and FinOps. Our team has helped clients reduce cloud spending by $100K-500K annually while improving performance.

Schedule a free cost assessment to identify optimization opportunities in your Kubernetes infrastructure.
