Kubernetes costs can spiral out of control without proper optimization. Over-provisioned clusters, inefficient resource allocation, and lack of visibility lead to wasted cloud spend. This comprehensive guide shares proven strategies that have helped our clients reduce Kubernetes costs by 40-60% while maintaining performance and reliability.
Table of Contents
- Understanding Kubernetes Cost Drivers
- Resource Right-Sizing Strategies
- Autoscaling for Cost Efficiency
- Spot and Preemptible Instances
- Storage Optimization
- Network Cost Reduction
- Multi-Tenancy and Resource Sharing
- Cost Visibility and Monitoring
- FinOps Best Practices
- Cost Optimization Tools
Understanding Kubernetes Cost Drivers
Primary Cost Components
Compute costs (60-75% of total):
- Node instances (EC2, GCE, Azure VMs)
- CPU and memory resources
- Reserved vs on-demand vs spot pricing
- Idle capacity and over-provisioning
Storage costs (15-25% of total):
- Persistent volumes (EBS, Persistent Disks, Azure Disks)
- Block storage vs object storage
- Snapshot and backup storage
- Storage I/O operations
Network costs (10-20% of total):
- Data transfer between availability zones
- Cross-region traffic
- Internet egress charges
- Load balancer costs
Additional costs:
- Control plane fees (managed Kubernetes)
- Container registries
- Logging and monitoring infrastructure
- Load balancers and ingress controllers
Real-World Cost Challenges
Our e-commerce Kubernetes migration project achieved 58% infrastructure cost reduction. Common cost issues we encountered:
Over-provisioning waste:
- Pods requesting 2GB memory but using 300MB (85% waste)
- CPU requests set to 1000m, actual usage 150m (85% waste)
- Clusters running at 35% utilization paying for 100%
- Development environments sized identically to production
Inefficient configurations:
- Running expensive instance types without autoscaling
- No use of spot instances for fault-tolerant workloads
- Persistent volumes never resized or cleaned up
- Multiple clusters when multi-tenancy would suffice
Lack of visibility:
- No cost allocation by team, product, or environment
- Unknown costs per application or customer
- No budget alerts or governance
- Developers unaware of resource costs
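These gaps are easy to confirm before committing to a full optimization effort: compare what the scheduler has reserved with what the nodes actually use. A minimal PromQL sketch, assuming kube-state-metrics and node-exporter are installed:
# Fraction of allocatable CPU reserved by pod requests (how "full" the cluster looks to the scheduler)
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})
# Fraction of allocatable CPU actually in use (how busy the nodes really are)
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))
  / sum(kube_node_status_allocatable{resource="cpu"})
A large gap between the two ratios is exactly the over-provisioning the rest of this guide targets.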
Resource Right-Sizing Strategies
Analyzing Current Resource Usage
Use Vertical Pod Autoscaler (VPA) in recommendation mode:
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create VPA in recommendation mode
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Off" # Recommendation only
EOF
# Get recommendations
kubectl describe vpa app-vpa
Use Prometheus metrics for historical analysis:
# CPU usage over time
avg(rate(container_cpu_usage_seconds_total{container="app"}[5m])) by (pod)
# Memory usage over time
avg(container_memory_working_set_bytes{container="app"}) by (pod)
# Request vs actual usage (identify over-provisioning)
(
avg(container_memory_working_set_bytes{container="app"}) by (pod)
/
avg(kube_pod_container_resource_requests{resource="memory",container="app"}) by (pod)
) * 100
Implementing Right-Sizing
Before optimization (over-provisioned):
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
spec:
replicas: 10
template:
spec:
containers:
- name: api
image: myapi:1.0
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "2000m"
memory: "4Gi"
After optimization (right-sized based on actual usage):
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
spec:
replicas: 6
template:
spec:
containers:
- name: api
image: myapi:1.0
resources:
requests:
cpu: "200m" # Reduced from 1000m
memory: "512Mi" # Reduced from 2Gi
limits:
cpu: "500m" # Reduced from 2000m
memory: "1Gi" # Reduced from 4Gi
Impact:
- CPU requests: 10,000m → 1,200m (88% reduction)
- Memory requests: 20Gi → 3Gi (85% reduction)
- Monthly cost: $2,400 → $360 (85% reduction)
- Same performance maintained
Setting Appropriate Limits
Anti-pattern: No limits set
resources:
requests:
cpu: "100m"
memory: "128Mi"
# No limits - can consume entire node resources
Anti-pattern: Limits == Requests
resources:
requests:
cpu: "1000m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "1Gi"
# Prevents bursting, over-provisions
Best practice: Appropriate limit-to-request ratio
resources:
requests:
cpu: "200m" # Guaranteed minimum
memory: "256Mi"
limits:
cpu: "500m" # Allow 2.5x burst
memory: "512Mi" # Allow 2x burst
LimitRanges and ResourceQuotas
Enforce default limits per namespace:
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: development
spec:
limits:
- default: # Default limits
cpu: "500m"
memory: "512Mi"
defaultRequest: # Default requests
cpu: "100m"
memory: "128Mi"
max: # Maximum allowed
cpu: "2000m"
memory: "4Gi"
min: # Minimum required
cpu: "50m"
memory: "64Mi"
type: Container
Enforce namespace-level quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-quota
namespace: team-alpha
spec:
hard:
requests.cpu: "50"
requests.memory: "100Gi"
limits.cpu: "100"
limits.memory: "200Gi"
persistentvolumeclaims: "20"
services.loadbalancers: "2"
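Quotas and limit ranges only help if teams can see where they stand against them; both objects report current usage directly, for example:
# Current usage versus the quota ceiling for a team namespace
kubectl describe resourcequota team-quota -n team-alpha
# Defaults and bounds applied to new containers in a namespace
kubectl describe limitrange default-limits -n development
# Quick cluster-wide overview
kubectl get resourcequota --all-namespaces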
Autoscaling for Cost Efficiency
Horizontal Pod Autoscaler (HPA)
CPU-based autoscaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Target 70% CPU
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5min before scale down
policies:
- type: Percent
value: 50 # Scale down max 50% of pods at once
periodSeconds: 60
Multi-metric autoscaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa-advanced
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000" # 1000 req/s per pod
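The http_requests_per_second metric above is not built in; it has to be exposed through a custom metrics adapter. A hedged sketch of a prometheus-adapter rule that would derive it from a hypothetical http_requests_total counter exported by the application:
# prometheus-adapter config fragment (assumes the app exports http_requests_total)
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'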
Cluster Autoscaler
AWS EKS configuration (the Cluster Autoscaler is configured through command-line flags on its Deployment in kube-system, not a separate config file):
# Relevant flags on the cluster-autoscaler container
spec:
  containers:
  - name: cluster-autoscaler
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
    - --cordon-node-before-terminating=true
    - --scale-down-enabled=true
    - --scale-down-delay-after-add=10m
    - --scale-down-unneeded-time=10m
    - --scale-down-utilization-threshold=0.5 # Scale down nodes below 50% utilization
    - --skip-nodes-with-local-storage=false
    - --skip-nodes-with-system-pods=false
Node group configuration for optimal scaling:
# Tags on the AWS Auto Scaling Groups (required for autodiscovery by the flag above)
k8s.io/cluster-autoscaler/enabled = true
k8s.io/cluster-autoscaler/my-cluster = owned
k8s.io/cluster-autoscaler/node-template/label/workload = general
# Multiple node groups for different workloads
Node Group 1 (general):
Instance types: t3.medium, t3.large
Min: 2, Max: 20
Node Group 2 (compute-intensive):
Instance types: c6i.xlarge, c6i.2xlarge
Min: 0, Max: 10
Node Group 3 (memory-intensive):
Instance types: r6i.xlarge, r6i.2xlarge
Min: 0, Max: 10
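With the specialized groups allowed to scale to zero, the Cluster Autoscaler only provisions (and bills) those nodes while matching pods are pending. A minimal sketch of pinning a workload to the memory-optimized group, assuming its nodes carry a workload=memory-intensive label (the image name is hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: analytics-worker
  template:
    metadata:
      labels:
        app: analytics-worker
    spec:
      nodeSelector:
        workload: memory-intensive # Only schedules onto the memory-optimized node group
      containers:
      - name: worker
        image: analytics-worker:1.0 # hypothetical image
        resources:
          requests:
            cpu: "500m"
            memory: "8Gi"
          limits:
            memory: "10Gi"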
KEDA for Event-Driven Autoscaling
Scale based on external metrics:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: queue-processor-scaler
spec:
scaleTargetRef:
name: queue-processor
minReplicaCount: 0 # Scale to zero when idle
maxReplicaCount: 50
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-east-1.amazonaws.com/123456/my-queue
queueLength: "5" # Target 5 messages per pod
awsRegion: "us-east-1"
Scale to zero for dev environments:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: dev-api-scaler
namespace: development
spec:
scaleTargetRef:
name: api
minReplicaCount: 0 # Scale to zero outside business hours
maxReplicaCount: 5
triggers:
- type: cron
metadata:
timezone: America/New_York
start: 0 9 * * 1-5 # Scale up at 9 AM weekdays
end: 0 18 * * 1-5 # Scale down at 6 PM weekdays
desiredReplicas: "3"
Spot and Preemptible Instances
AWS Spot Instances for EKS
Cost savings: 60-90% vs on-demand
Node group configuration:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: my-cluster
region: us-east-1
nodeGroups:
- name: spot-general
instancesDistribution:
instanceTypes:
- t3.large
- t3a.large
- t2.large
onDemandBaseCapacity: 0
onDemandPercentageAboveBaseCapacity: 0
spotInstancePools: 3
minSize: 2
maxSize: 20
labels:
workload: general
capacity-type: spot
tags:
k8s.io/cluster-autoscaler/node-template/label/capacity-type: spot
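Labels let workloads opt in to spot capacity, but tainting the spot nodes also keeps everything else off them, which is what the toleration in the Deployment below assumes. A hedged sketch (the taint key and value are arbitrary choices):
# Taint every node labeled as spot capacity so only pods that explicitly
# tolerate interruption are scheduled there
kubectl taint nodes -l capacity-type=spot spot=true:NoSchedule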
Workload scheduling on spot nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
name: batch-processor
spec:
replicas: 10
template:
spec:
nodeSelector:
capacity-type: spot # Schedule on spot instances
tolerations:
- key: "spot"
operator: "Equal"
value: "true"
effect: "NoSchedule"
containers:
- name: processor
image: batch-processor:1.0
GKE Preemptible VMs
Cost savings: 70-80% vs standard VMs
Our travel platform scaling case study used preemptible VMs to achieve 42% cost reduction.
Node pool configuration:
gcloud container node-pools create spot-pool \
--cluster=my-cluster \
--preemptible \
--num-nodes=3 \
--enable-autoscaling \
--min-nodes=0 \
--max-nodes=20 \
--machine-type=n1-standard-4 \
--disk-size=100
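GKE automatically labels these nodes with cloud.google.com/gke-preemptible=true, so steering fault-tolerant workloads onto them mirrors the spot example above; a minimal pod-spec fragment:
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-preemptible: "true" # Label added by GKE to preemptible nodes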
Handling Spot/Preemptible Interruptions
Pod Disruption Budgets (PDB):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
spec:
minAvailable: 2 # Always keep 2 pods running
selector:
matchLabels:
app: api
Node termination handlers:
# AWS Node Termination Handler
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: aws-node-termination-handler
namespace: kube-system
spec:
template:
spec:
containers:
- name: aws-node-termination-handler
image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ENABLE_SPOT_INTERRUPTION_DRAINING
value: "true"
- name: ENABLE_SCHEDULED_EVENT_DRAINING
value: "true"
Storage Optimization
PersistentVolume Right-Sizing
Identify unused volumes:
# Find PVCs with low utilization
kubectl get pvc --all-namespaces -o json | \
jq -r '.items[] | select(.status.phase == "Bound") |
"\(.metadata.namespace)/\(.metadata.name) - \(.spec.resources.requests.storage)"'
# Check actual usage per volume (kubectl has no "top pv"; use the kubelet volume
# metrics scraped by Prometheus instead)
#   kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes
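Released PersistentVolumes keep accruing storage charges until they are deleted; a quick check for cleanup candidates:
# List PVs whose claims were deleted but whose storage still exists
kubectl get pv -o json | \
  jq -r '.items[] | select(.status.phase == "Released") |
  "\(.metadata.name) \(.spec.capacity.storage)"'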
Right-size volumes (note: volumes can only be expanded in place; shrinking means provisioning a smaller volume and migrating the data):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: database-pvc
spec:
storageClassName: gp3
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi # Reduced from 500Gi based on actual usage
Storage Class Optimization
AWS EBS storage classes:
# Expensive io2 (high IOPS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
type: io2
iopsPerGB: "50"
fsType: ext4
# Cost: ~$0.125/GB-month + $0.065/provisioned IOPS
---
# Cost-effective gp3 (balanced)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: general-ssd
provisioner: ebs.csi.aws.com
parameters:
type: gp3
fsType: ext4
# Cost: ~$0.08/GB-month (37% cheaper)
Use gp3 volumes for roughly 20% savings over gp2 ($0.08 vs $0.10 per GB-month), with IOPS and throughput configurable independently of volume size.
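New PVCs pick up gp3 through the StorageClass, and existing EBS volumes can be switched in place without downtime; a hedged sketch with a placeholder volume ID:
# Convert an existing gp2 volume to gp3 (no detach or downtime required)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3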
Backup Storage Optimization
Lifecycle policies for backups:
# Velero Schedule with TTL
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    ttl: 168h # Retain for 7 days only
    includedNamespaces:
    - production
    storageLocation: aws-s3
Move old backups to cheaper storage:
# AWS S3 lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket velero-backups \
--lifecycle-configuration '{
"Rules": [{
"Id": "MoveToGlacier",
"Status": "Enabled",
"Transitions": [{
"Days": 30,
"StorageClass": "GLACIER"
}],
"Expiration": {
"Days": 365
}
}]
}'
Network Cost Reduction
Cross-AZ Traffic Costs
Problem: Cross-AZ data transfer costs $0.01-0.02/GB
Solution: Topology-aware routing
apiVersion: v1
kind: Service
metadata:
name: api
annotations:
service.kubernetes.io/topology-mode: Auto # Prefer same-zone routing
spec:
selector:
app: api
ports:
- port: 8080
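On Kubernetes v1.31 and newer, the same preference can be expressed directly in the Service spec instead of an annotation; a hedged sketch:
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  trafficDistribution: PreferClose # Prefer endpoints in the client's zone
  selector:
    app: api
  ports:
  - port: 8080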
NAT Gateway Costs
Problem: NAT Gateway costs $0.045/hour + $0.045/GB processed
Solution: VPC endpoints for AWS services
# Create VPC endpoint for S3 (free data transfer)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345678 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-12345678
# Create VPC endpoint for ECR (avoid NAT for image pulls)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345678 \
--service-name com.amazonaws.us-east-1.ecr.api \
--vpc-endpoint-type Interface \
--subnet-ids subnet-12345678
Annual savings: $4,000-8,000 per cluster
Load Balancer Optimization
Problem: Each LoadBalancer Service creates its own billed cloud load balancer
Solution: Ingress controller (single load balancer for multiple services)
# Before: 10 LoadBalancer services = 10 load balancers = $180/month
apiVersion: v1
kind: Service
metadata:
name: app1
spec:
type: LoadBalancer # Creates dedicated ALB/NLB
---
# After: 1 Ingress = 1 load balancer = $18/month (90% savings)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: apps-ingress
spec:
rules:
- host: app1.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app1
port:
number: 8080
- host: app2.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app2
port:
number: 8080
Multi-Tenancy and Resource Sharing
Namespace-Based Multi-Tenancy
Share clusters across teams/environments:
# Instead of 10 separate clusters, use 1 cluster with 10 namespaces
# Savings: 9 control planes ($0.10/hour each) = $650/month
# team-alpha namespace with quota
apiVersion: v1
kind: Namespace
metadata:
name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-alpha-quota
namespace: team-alpha
spec:
hard:
requests.cpu: "20"
requests.memory: "40Gi"
persistentvolumeclaims: "10"
Virtual Clusters (vcluster)
Cost-effective dev/test environments:
# Install vcluster
helm install vcluster vcluster/vcluster \
--namespace team-dev \
--set syncer.extraArgs={--out-kube-config-server=https://vcluster.example.com}
# Each vcluster runs as a few pods inside the host cluster, so ten virtual
# clusters cost roughly what a handful of small workloads do, versus ten full
# control planes and node pools for ten real clusters.
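The vcluster CLI achieves the same result with less Helm plumbing; a hedged sketch (the names are illustrative):
# Create a virtual cluster inside the shared host cluster and connect to it
vcluster create team-a-dev --namespace team-dev
vcluster connect team-a-dev --namespace team-dev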
Cost Visibility and Monitoring
Kubecost for Cost Allocation
Install Kubecost:
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="YOUR_TOKEN"
Cost allocation by:
- Namespace
- Label (team, product, environment)
- Controller (Deployment, StatefulSet)
- Pod
- Node
Set up cost alerts:
apiVersion: v1
kind: ConfigMap
metadata:
name: kubecost-alerts
namespace: kubecost
data:
alerts.json: |
[
{
"type": "budget",
"threshold": 10000,
"window": "month",
"aggregation": "namespace",
"filter": "namespace:production"
}
]
Prometheus Queries for Cost Insights
# Memory footprint per namespace in GiB (hourly average), a simple cost proxy
sum(
  avg_over_time(container_memory_working_set_bytes{container!=""}[1h])
) by (namespace) / 1024 / 1024 / 1024
# Waste from over-provisioning: requested minus actually used memory, in GiB
sum(
  kube_pod_container_resource_requests{resource="memory"}
    - on(namespace, pod, container) group_left
      max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
) by (namespace) / 1024 / 1024 / 1024
Grafana Dashboards
Import community Kubernetes cost and resource-utilization dashboards from grafana.com, or build panels from the Prometheus queries above, so every team shares the same view of spend and waste.
FinOps Best Practices
1. Showback and Chargeback
Implement cost allocation:
# Label all resources for cost tracking
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
labels:
team: platform
product: checkout
environment: production
cost-center: "1234"
Monthly cost reports by team:
# Generate cost report
kubectl cost --window 30d \
--aggregate team \
--output csv > team-costs.csv
2. Reserved Instances and Savings Plans
AWS Savings Plans:
- Compute Savings Plans: Up to 66% discount
- EC2 Instance Savings Plans: Up to 72% discount
- Requires 1 or 3-year commitment
Recommendations:
- Reserve 50-70% of baseline capacity
- Use on-demand/spot for burst capacity
- Review utilization quarterly
3. Development Environment Policies
Automatic shutdown of dev environments:
# kube-downscaler scales non-production workloads down outside business hours.
# Configure the default schedule via environment variables on its Deployment:
env:
- name: DEFAULT_UPTIME
  value: "Mon-Fri 08:00-19:00 America/New_York" # Running only during business hours
- name: EXCLUDE_NAMESPACES
  value: "production,staging"
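Individual workloads that must stay up (long-running test jobs, demo environments) can opt out with an annotation; the deployment name here is illustrative:
# Exempt a single Deployment from the downscaling schedule
kubectl annotate deployment demo-api -n development downscaler/exclude=true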
Annual savings: $50,000-100,000 for medium-sized engineering teams
4. Budget Alerts and Governance
AWS Budget alerts:
aws budgets create-budget \
--account-id 123456789012 \
--budget '{
"BudgetName": "EKS-Monthly-Budget",
"BudgetLimit": {
"Amount": "10000",
"Unit": "USD"
},
"TimeUnit": "MONTHLY",
"BudgetType": "COST"
}' \
--notifications-with-subscribers '[{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 80
},
"Subscribers": [{
"SubscriptionType": "EMAIL",
"Address": "team@example.com"
}]
}]'
Cost Optimization Tools
OpenCost
Open-source cost monitoring:
kubectl apply -f https://raw.githubusercontent.com/opencost/opencost/main/kubernetes/opencost.yaml
# Access UI
kubectl port-forward -n opencost service/opencost 9090:9090
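Costs can also be pulled programmatically from the OpenCost API, which is handy for CI checks or spreadsheets. A hedged sketch, assuming the default API port 9003 and the /allocation/compute endpoint (the path can vary between versions):
# Forward the OpenCost API port and query 7-day costs aggregated by namespace
kubectl port-forward -n opencost service/opencost 9003:9003 &
curl -s "http://localhost:9003/allocation/compute?window=7d&aggregate=namespace"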
Goldilocks
Right-sizing recommendations:
helm install goldilocks fairwinds-stable/goldilocks \
--namespace goldilocks \
--create-namespace
# Enable for namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
Kube Resource Report
Generate cost reports:
helm install kube-resource-report kube-resource-report/kube-resource-report
# View report
kubectl port-forward -n kube-resource-report service/kube-resource-report 8080:80
Real-World Cost Optimization Results
Case Study 1: E-Commerce Platform
Initial state:
- 5 EKS clusters (dev, staging, prod, DR, testing)
- All on-demand t3.xlarge instances
- No autoscaling
- 35% average cluster utilization
- Monthly cost: $18,400
Optimizations:
- Consolidated to 2 clusters (prod, non-prod) with namespace isolation
- Implemented cluster autoscaler
- Right-sized resources (VPA recommendations)
- 60% spot instances for non-prod
- 40% spot instances for prod
Results:
- Monthly cost: $7,700 (58% reduction)
- Annual savings: $128,400
- Better utilization: 72%
Case Study 2: SaaS Platform
Initial state:
- Over-provisioned pod resources
- No autoscaling to zero
- Expensive storage classes
- Monthly cost: $24,200
Optimizations:
- Right-sized based on actual usage
- Implemented KEDA for scale-to-zero
- Switched gp2 → gp3 storage
- Topology-aware routing
Results:
- Monthly cost: $13,800 (43% reduction)
- Annual savings: $124,800
- Same performance and reliability
Implementation Roadmap
Week 1-2: Visibility and Baseline
- Deploy Kubecost or OpenCost
- Analyze current spending by namespace/team
- Identify top cost drivers
- Document baseline metrics
Week 3-4: Quick Wins
- Delete unused PVCs and volumes
- Consolidate load balancers to ingress
- Implement VPC endpoints
- Enable topology-aware routing
Week 5-6: Right-Sizing
- Analyze VPA recommendations
- Implement resource quotas
- Right-size pod resources
- Remove over-provisioning
Week 7-8: Autoscaling
- Deploy Cluster Autoscaler
- Implement HPA for applications
- Configure KEDA for event-driven scaling
- Test scale-down behavior
Week 9-10: Spot Instances
- Create spot node groups
- Migrate fault-tolerant workloads
- Implement interruption handling
- Gradually increase spot percentage
Week 11-12: Governance
- Implement LimitRanges
- Set up budget alerts
- Create cost allocation reports
- Establish FinOps processes
Conclusion
Kubernetes cost optimization is an ongoing process requiring visibility, governance, and continuous improvement. By implementing the strategies in this guide, you can achieve 40-60% cost reduction while maintaining performance and reliability.
Key takeaways:
- Right-size resources based on actual usage, not guesses
- Implement autoscaling at all levels (pod, node, cluster)
- Use spot/preemptible instances for 60-90% savings
- Consolidate infrastructure through multi-tenancy
- Establish cost visibility and governance
Need help optimizing your Kubernetes costs? Tasrie IT Services specializes in Kubernetes cost optimization and FinOps. Our team has helped clients reduce cloud spending by $100K-500K annually while improving performance.
Schedule a free cost assessment to identify optimization opportunities in your Kubernetes infrastructure.
Related Resources
- Kubernetes Consulting Services
- AWS EKS Optimization
- Google GKE Cost Management
- Azure AKS Cost Optimization
- Cloud Migration with Cost Efficiency
External resources: