
Kubernetes Cost Optimization: Complete Guide to Reducing Cloud Spend in 2026

Tasrie IT Services

Kubernetes costs can spiral out of control without proper optimization. Over-provisioned clusters, inefficient resource allocation, and lack of visibility lead to wasted cloud spend. This comprehensive guide shares proven strategies that have helped our clients reduce Kubernetes costs by 40-60% while maintaining performance and reliability.

Table of Contents

  1. Understanding Kubernetes Cost Drivers
  2. Resource Right-Sizing Strategies
  3. Autoscaling for Cost Efficiency
  4. Spot and Preemptible Instances
  5. Storage Optimization
  6. Network Cost Reduction
  7. Multi-Tenancy and Resource Sharing
  8. Cost Visibility and Monitoring
  9. FinOps Best Practices
  10. Cost Optimization Tools

Understanding Kubernetes Cost Drivers

Primary Cost Components

Compute costs (60-75% of total):

  • Node instances (EC2, GCE, Azure VMs)
  • CPU and memory resources
  • Reserved vs on-demand vs spot pricing
  • Idle capacity and over-provisioning

Storage costs (15-25% of total):

  • Persistent volumes (EBS, Persistent Disks, Azure Disks)
  • Block storage vs object storage
  • Snapshot and backup storage
  • Storage I/O operations

Network costs (10-20% of total):

  • Data transfer between availability zones
  • Cross-region traffic
  • Internet egress charges
  • Load balancer costs

Additional costs:

  • Control plane fees (managed Kubernetes)
  • Container registries
  • Logging and monitoring infrastructure
  • Load balancers and ingress controllers

Real-World Cost Challenges

Our e-commerce Kubernetes migration project achieved 58% infrastructure cost reduction. Common cost issues we encountered:

Over-provisioning waste:

  • Pods requesting 2GB memory but using 300MB (85% waste)
  • CPU requests set to 1000m, actual usage 150m (85% waste)
  • Clusters running at 35% utilization paying for 100%
  • Development environments sized identically to production

Inefficient configurations:

  • Running expensive instance types without autoscaling
  • No use of spot instances for fault-tolerant workloads
  • Persistent volumes never resized or cleaned up
  • Multiple clusters when multi-tenancy would suffice

Lack of visibility:

  • No cost allocation by team, product, or environment
  • Unknown costs per application or customer
  • No budget alerts or governance
  • Developers unaware of resource costs

Resource Right-Sizing Strategies

Analyzing Current Resource Usage

Use Vertical Pod Autoscaler (VPA) in recommendation mode:

# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create VPA in recommendation mode
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Off"  # Recommendation only
EOF

# Get recommendations
kubectl describe vpa app-vpa

Use Prometheus metrics for historical analysis:

# CPU usage over time
avg(rate(container_cpu_usage_seconds_total{container="app"}[5m])) by (pod)

# Memory usage over time
avg(container_memory_working_set_bytes{container="app"}) by (pod)

# Request vs actual usage (identify over-provisioning)
(
  avg(container_memory_working_set_bytes{container="app"}) by (pod)
  /
  avg(kube_pod_container_resource_requests{resource="memory",container="app"}) by (pod)
) * 100
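The request-vs-usage ratio the last query reports can be sanity-checked by hand; a minimal sketch of the same arithmetic, using the over-provisioning figures from earlier in this guide:

```python
def waste_pct(requested_bytes: int, used_bytes: int) -> float:
    """Percentage of a resource request that goes unused."""
    return 100 * (1 - used_bytes / requested_bytes)

# 2 GiB requested, ~300 MiB actually used
print(round(waste_pct(2 * 1024**3, 300 * 1024**2)))  # ≈ 85% waste
```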

Implementing Right-Sizing

Before optimization (over-provisioned):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 10
  template:
    spec:
      containers:
      - name: api
        image: myapi:1.0
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"

After optimization (right-sized based on actual usage):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  template:
    spec:
      containers:
      - name: api
        image: myapi:1.0
        resources:
          requests:
            cpu: "200m"      # Reduced from 1000m
            memory: "512Mi"  # Reduced from 2Gi
          limits:
            cpu: "500m"      # Reduced from 2000m
            memory: "1Gi"    # Reduced from 4Gi

Impact:

  • CPU requests: 10,000m → 1,200m (88% reduction)
  • Memory requests: 20Gi → 3Gi (85% reduction)
  • Monthly cost: $2,400 → $360 (85% reduction)
  • Same performance maintained
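The impact numbers above follow directly from replicas times per-pod requests; a quick check of the arithmetic (figures are the ones from the example, not price quotes):

```python
def cluster_totals(replicas: int, cpu_m: int, mem_mi: int):
    """Total CPU (millicores) and memory (MiB) requested across all replicas."""
    return replicas * cpu_m, replicas * mem_mi

before_cpu, before_mem = cluster_totals(10, 1000, 2048)  # 10,000m, 20 Gi
after_cpu, after_mem = cluster_totals(6, 200, 512)       # 1,200m, 3 Gi

print(f"CPU cut: {1 - after_cpu / before_cpu:.0%}")      # 88%
print(f"Memory cut: {1 - after_mem / before_mem:.0%}")   # 85%
```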

Setting Appropriate Limits

Anti-pattern: No limits set

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  # No limits - can consume entire node resources

Anti-pattern: Limits == Requests

resources:
  requests:
    cpu: "1000m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
  # Prevents bursting, over-provisions

Best practice: Appropriate limit-to-request ratio

resources:
  requests:
    cpu: "200m"      # Guaranteed minimum
    memory: "256Mi"
  limits:
    cpu: "500m"      # Allow 2.5x burst
    memory: "512Mi"  # Allow 2x burst

LimitRanges and ResourceQuotas

Enforce default limits per namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:  # Default limits
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:  # Default requests
      cpu: "100m"
      memory: "128Mi"
    max:  # Maximum allowed
      cpu: "2000m"
      memory: "4Gi"
    min:  # Minimum required
      cpu: "50m"
      memory: "64Mi"
    type: Container

Enforce namespace-level quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    limits.cpu: "100"
    limits.memory: "200Gi"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"

Autoscaling for Cost Efficiency

Horizontal Pod Autoscaler (HPA)

CPU-based autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Target 70% CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scale down
      policies:
      - type: Percent
        value: 50  # Scale down max 50% of pods at once
        periodSeconds: 60

Multi-metric autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa-advanced
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"  # 1000 req/s per pod

Cluster Autoscaler

AWS EKS configuration:

# Cluster Autoscaler is configured via flags on its Deployment (excerpt):
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5  # Scale down when < 50% utilized
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --cordon-node-before-terminating=true

Node group configuration for optimal scaling:

# AWS Auto Scaling Group tags
--tags=k8s.io/cluster-autoscaler/enabled=true
--tags=k8s.io/cluster-autoscaler/my-cluster=owned
--tags=k8s.io/cluster-autoscaler/node-template/label/workload=general

# Multiple node groups for different workloads
Node Group 1 (general):
  Instance types: t3.medium, t3.large
  Min: 2, Max: 20

Node Group 2 (compute-intensive):
  Instance types: c6i.xlarge, c6i.2xlarge
  Min: 0, Max: 10

Node Group 3 (memory-intensive):
  Instance types: r6i.xlarge, r6i.2xlarge
  Min: 0, Max: 10

KEDA for Event-Driven Autoscaling

Scale based on external metrics:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0  # Scale to zero when idle
  maxReplicaCount: 50
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/my-queue
      queueLength: "5"  # Target 5 messages per pod
      awsRegion: "us-east-1"

Scale to zero for dev environments:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-api-scaler
  namespace: development
spec:
  scaleTargetRef:
    name: api
  minReplicaCount: 0  # Scale to zero outside business hours
  maxReplicaCount: 5
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 9 * * 1-5    # Scale up at 9 AM weekdays
      end: 0 18 * * 1-5     # Scale down at 6 PM weekdays
      desiredReplicas: "3"

Spot and Preemptible Instances

AWS Spot Instances for EKS

Cost savings: 60-90% vs on-demand

Node group configuration:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1

nodeGroups:
  - name: spot-general
    instancesDistribution:
      instanceTypes:
        - t3.large
        - t3a.large
        - t2.large
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
    minSize: 2
    maxSize: 20
    labels:
      workload: general
      capacity-type: spot
    tags:
      k8s.io/cluster-autoscaler/node-template/label/capacity-type: spot

Workload scheduling on spot nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  template:
    spec:
      nodeSelector:
        capacity-type: spot  # Schedule on spot instances
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: processor
        image: batch-processor:1.0

GKE Preemptible VMs

Cost savings: 70-80% vs standard VMs

Our travel platform scaling case study used preemptible VMs to achieve 42% cost reduction.

Node pool configuration (note: GCP now positions Spot VMs, created with --spot, as the successor to preemptible VMs):

gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --preemptible \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=20 \
  --machine-type=n1-standard-4 \
  --disk-size=100

Handling Spot/Preemptible Interruptions

Pod Disruption Budgets (PDB):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # Always keep 2 pods running
  selector:
    matchLabels:
      app: api

Node termination handlers:

# AWS Node Termination Handler
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node-termination-handler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: aws-node-termination-handler
        image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ENABLE_SPOT_INTERRUPTION_DRAINING
          value: "true"
        - name: ENABLE_SCHEDULED_EVENT_DRAINING
          value: "true"

Storage Optimization

PersistentVolume Right-Sizing

Identify unused volumes:

# Find PVCs with low utilization
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.status.phase == "Bound") |
  "\(.metadata.namespace)/\(.metadata.name) - \(.spec.resources.requests.storage)"'

# kubectl top does not cover volumes; check actual usage via kubelet metrics
# in Prometheus: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes

Right-size volumes (note: CSI volume expansion can only grow a PVC in place; to shrink an over-provisioned volume, create a smaller PVC and migrate the data):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi  # Sized to actual usage; data migrated off the old 500Gi volume

Storage Class Optimization

AWS EBS storage classes:

# Expensive io2 (high IOPS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iopsPerGB: "50"
  fsType: ext4
# Cost: ~$0.125/GB-month + $0.065/provisioned IOPS

---
# Cost-effective gp3 (balanced)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
# Cost: ~$0.08/GB-month (36% cheaper than io2's base rate)

Migrating gp2 volumes to gp3 cuts the per-GB rate by 20% ($0.10 to $0.08/GB-month) and decouples IOPS and throughput from volume size.

Backup Storage Optimization

Lifecycle policies for backups:

# Velero backup with TTL
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  ttl: 168h  # Retain for 7 days only
  includedNamespaces:
  - production
  storageLocation: aws-s3

Move old backups to cheaper storage:

# AWS S3 lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
  --bucket velero-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "Id": "MoveToGlacier",
      "Status": "Enabled",
      "Transitions": [{
        "Days": 30,
        "StorageClass": "GLACIER"
      }],
      "Expiration": {
        "Days": 365
      }
    }]
  }'

Network Cost Reduction

Cross-AZ Traffic Costs

Problem: Cross-AZ data transfer costs $0.01-0.02/GB

Solution: Topology-aware routing

apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.kubernetes.io/topology-mode: Auto  # Prefer same-zone routing
spec:
  selector:
    app: api
  ports:
  - port: 8080
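To see why same-zone routing matters, a rough model of cross-AZ transfer cost: the ~$0.01/GB is billed on both the sending and receiving side, so traffic crossing zones effectively costs ~$0.02/GB (the monthly volume below is an illustrative assumption):

```python
def cross_az_monthly_cost(gb_per_month: float, rate_each_way: float = 0.01) -> float:
    """Cross-AZ transfer cost: billed in both directions."""
    return gb_per_month * rate_each_way * 2

print(cross_az_monthly_cost(50_000))  # 50 TB/month crossing zones -> $1000.0
```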

NAT Gateway Costs

Problem: NAT Gateway costs $0.045/hour + $0.045/GB processed

Solution: VPC endpoints for AWS services

# Create VPC endpoint for S3 (free data transfer)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-12345678

# Create VPC endpoint for ECR (avoid NAT for image pulls)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-12345678

Annual savings: $4,000-8,000 per cluster
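That savings estimate is easy to reproduce from the quoted NAT pricing (the traffic volume is an illustrative assumption, not a measurement):

```python
def nat_gateway_monthly(gb_processed: float, hourly: float = 0.045,
                        per_gb: float = 0.045, hours: float = 730) -> float:
    """Monthly NAT Gateway cost: hourly charge plus per-GB processing."""
    return hourly * hours + gb_processed * per_gb

monthly = nat_gateway_monthly(10_000)              # 10 TB/month through NAT
print(round(monthly, 2), round(monthly * 12))      # ~$482.85/month, ~$5794/year
```

Routing S3 and ECR traffic through VPC endpoints removes most of the per-GB component, which is where the bulk of that annual figure comes from.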

Load Balancer Optimization

Problem: Each LoadBalancer-type Service provisions its own cloud load balancer, each billed separately

Solution: Ingress controller (single load balancer for multiple services)

# Before: 10 LoadBalancer services = 10 load balancers = $180/month
apiVersion: v1
kind: Service
metadata:
  name: app1
spec:
  type: LoadBalancer  # Creates dedicated ALB/NLB

---
# After: 1 Ingress = 1 load balancer = $18/month (90% savings)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: apps-ingress
spec:
  rules:
  - host: app1.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app1
            port:
              number: 8080
  - host: app2.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app2
            port:
              number: 8080

Multi-Tenancy and Resource Sharing

Namespace-Based Multi-Tenancy

Share clusters across teams/environments:

# Instead of 10 separate clusters, use 1 cluster with 10 namespaces
# Savings: 9 control planes ($0.10/hour each) = $650/month

# team-alpha namespace with quota
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    persistentvolumeclaims: "10"
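The control-plane saving from consolidation is straightforward arithmetic, using the $0.10/hour managed control-plane fee and ~730 hours per month:

```python
clusters_removed = 9
control_plane_hourly = 0.10  # $/hr per managed control plane (e.g. EKS)

monthly_savings = clusters_removed * control_plane_hourly * 730
print(round(monthly_savings))  # ≈ $657/month
```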

Virtual Clusters (vcluster)

Cost-effective dev/test environments:

# Install vcluster
helm install vcluster vcluster/vcluster \
  --namespace team-dev \
  --set syncer.extraArgs={--out-kube-config-server=https://vcluster.example.com}

# Each vcluster runs inside pods (minimal cost)
# 10 virtual clusters ≈ cost of 2-3 small pods vs 10 full clusters

Cost Visibility and Monitoring

Kubecost for Cost Allocation

Install Kubecost:

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN"

Cost allocation by:

  • Namespace
  • Label (team, product, environment)
  • Controller (Deployment, StatefulSet)
  • Pod
  • Node

Set up cost alerts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubecost-alerts
  namespace: kubecost
data:
  alerts.json: |
    [
      {
        "type": "budget",
        "threshold": 10000,
        "window": "month",
        "aggregation": "namespace",
        "filter": "namespace:production"
      }
    ]

Prometheus Queries for Cost Insights

# Memory footprint per namespace in GiB (multiply by your $/GiB rate for a rough cost)
sum by (namespace) (
  avg_over_time(container_memory_working_set_bytes[1h])
) / 1024 / 1024 / 1024

# Waste from over-provisioning: requested minus actually used, in GiB per namespace
(
  sum by (namespace) (kube_pod_container_resource_requests{resource="memory"})
  -
  sum by (namespace) (container_memory_working_set_bytes)
) / 1024 / 1024 / 1024

Grafana Dashboards

Import community dashboards from grafana.com for cluster cost and capacity views, or build panels from the queries above.

FinOps Best Practices

1. Showback and Chargeback

Implement cost allocation:

# Label all resources for cost tracking
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    team: platform
    product: checkout
    environment: production
    cost-center: "1234"

Monthly cost reports by team:

# Generate a cost report aggregated by the team label (kubectl-cost plugin)
kubectl cost label -l team --window 30d

2. Reserved Instances and Savings Plans

AWS Savings Plans:

  • Compute Savings Plans: Up to 66% discount
  • EC2 Instance Savings Plans: Up to 72% discount
  • Requires 1 or 3-year commitment

Recommendations:

  • Reserve 50-70% of baseline capacity
  • Use on-demand/spot for burst capacity
  • Review utilization quarterly
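A sketch of why partial coverage beats all-or-nothing: blended spend when only a fraction of baseline usage is covered by a plan (the coverage and discount figures below are illustrative assumptions, not AWS quotes):

```python
def blended_cost_fraction(covered_frac: float, discount: float) -> float:
    """Effective spend as a fraction of pure on-demand cost."""
    return covered_frac * (1 - discount) + (1 - covered_frac)

# Cover 60% of usage with a plan at a 40% discount
print(round(blended_cost_fraction(0.60, 0.40), 2))  # 0.76 -> 24% overall savings
```

Covering only the stable baseline keeps the committed spend fully utilized, while burst capacity stays on-demand or spot.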

3. Development Environment Policies

Automatic shutdown of dev environments:

# py-kube-downscaler: annotate non-production namespaces with an uptime window;
# run the downscaler itself with --exclude-namespaces production,staging
apiVersion: v1
kind: Namespace
metadata:
  name: development
  annotations:
    downscaler/uptime: Mon-Fri 08:00-19:00 America/New_York

Annual savings: $50,000-100,000 for medium-sized engineering teams
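The scale of those savings follows from the schedule itself: weekday business hours are only about a third of the week, so roughly two thirds of dev compute hours are avoidable:

```python
uptime_hours_per_week = 11 * 5   # Mon-Fri 08:00-19:00
hours_per_week = 24 * 7

idle_fraction = 1 - uptime_hours_per_week / hours_per_week
print(f"{idle_fraction:.0%} of dev compute hours avoidable")  # ~67%
```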

4. Budget Alerts and Governance

AWS Budget alerts:

aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "EKS-Monthly-Budget",
    "BudgetLimit": {
      "Amount": "10000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "team@example.com"
    }]
  }]'

Cost Optimization Tools

OpenCost

Open-source cost monitoring:

kubectl apply -f https://raw.githubusercontent.com/opencost/opencost/main/kubernetes/opencost.yaml

# Access UI
kubectl port-forward -n opencost service/opencost 9090:9090

Goldilocks

Right-sizing recommendations:

helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Enable for namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Kube Resource Report

Generate cost reports:

helm install kube-resource-report kube-resource-report/kube-resource-report

# View report
kubectl port-forward -n kube-resource-report service/kube-resource-report 8080:80

Real-World Cost Optimization Results

Case Study 1: E-Commerce Platform

Initial state:

  • 5 EKS clusters (dev, staging, prod, DR, testing)
  • All on-demand t3.xlarge instances
  • No autoscaling
  • 35% average cluster utilization
  • Monthly cost: $18,400

Optimizations:

  1. Consolidated to 2 clusters (prod, non-prod) with namespace isolation
  2. Implemented cluster autoscaler
  3. Right-sized resources (VPA recommendations)
  4. 60% spot instances for non-prod
  5. 40% spot instances for prod

Results:

  • Monthly cost: $7,700 (58% reduction)
  • Annual savings: $128,400
  • Better utilization: 72%


Case Study 2: SaaS Platform

Initial state:

  • Over-provisioned pod resources
  • No autoscaling to zero
  • Expensive storage classes
  • Monthly cost: $24,200

Optimizations:

  1. Right-sized based on actual usage
  2. Implemented KEDA for scale-to-zero
  3. Switched gp2 → gp3 storage
  4. Topology-aware routing

Results:

  • Monthly cost: $13,800 (43% reduction)
  • Annual savings: $124,800
  • Same performance and reliability

Implementation Roadmap

Week 1-2: Visibility and Baseline

  • Deploy Kubecost or OpenCost
  • Analyze current spending by namespace/team
  • Identify top cost drivers
  • Document baseline metrics

Week 3-4: Quick Wins

  • Delete unused PVCs and volumes
  • Consolidate load balancers to ingress
  • Implement VPC endpoints
  • Enable topology-aware routing

Week 5-6: Right-Sizing

  • Analyze VPA recommendations
  • Implement resource quotas
  • Right-size pod resources
  • Remove over-provisioning

Week 7-8: Autoscaling

  • Deploy Cluster Autoscaler
  • Implement HPA for applications
  • Configure KEDA for event-driven scaling
  • Test scale-down behavior

Week 9-10: Spot Instances

  • Create spot node groups
  • Migrate fault-tolerant workloads
  • Implement interruption handling
  • Gradually increase spot percentage

Week 11-12: Governance

  • Implement LimitRanges
  • Set up budget alerts
  • Create cost allocation reports
  • Establish FinOps processes

Conclusion

Kubernetes cost optimization is an ongoing process requiring visibility, governance, and continuous improvement. By implementing the strategies in this guide, you can achieve 40-60% cost reduction while maintaining performance and reliability.

Key takeaways:

  • Right-size resources based on actual usage, not guesses
  • Implement autoscaling at all levels (pod, node, cluster)
  • Use spot/preemptible instances for 60-90% savings
  • Consolidate infrastructure through multi-tenancy
  • Establish cost visibility and governance

Need help optimizing your Kubernetes costs? Tasrie IT Services specializes in Kubernetes cost optimization and FinOps. Our team has helped clients reduce cloud spending by $100K-500K annually while improving performance.

Schedule a free cost assessment to identify optimization opportunities in your Kubernetes infrastructure.
