Kubernetes autoscaling continues to evolve rapidly in 2026, with significant updates to HPA, VPA, KEDA, and Karpenter. This comprehensive guide covers the latest developments, best practices, and emerging trends in workload and cluster autoscaling.
## Quick Overview: Kubernetes Autoscaling in 2026
| Tool | Type | Scales | Best For |
|---|---|---|---|
| HPA | Horizontal Pod | Pod count | Stateless workloads |
| VPA | Vertical Pod | CPU/Memory requests | Stateful, batch jobs |
| KEDA | Event-Driven | Pod count (0-N) | Event-driven, queue processing |
| Karpenter | Cluster | Nodes | Dynamic node provisioning |
| Cluster Autoscaler | Cluster | Nodes | Traditional node scaling |
## Horizontal Pod Autoscaler (HPA) Updates 2026

### What’s New in HPA
The Horizontal Pod Autoscaler has received significant improvements:
- Container-level metrics - Scale on individual container resources
- Better scale-to-zero integration with KEDA
- Improved stabilization windows - Reduce thrashing
- Enhanced custom metrics support
### Basic HPA Configuration

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Container-Level Metrics (New in 2026)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: api  # specific container
        target:
          type: Utilization
          averageUtilization: 70
```
### Custom Metrics with Prometheus

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: 30
```
### HPA Best Practices 2026

- Set appropriate stabilization windows to prevent thrashing
- Use `behavior` policies to control scale-up/down speed
- Combine with VPA for right-sized pods
- Monitor HPA decisions with `kubectl describe hpa`, as shown below
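For example, to watch a live HPA and see why it last scaled (using the `api-hpa` object from above):

```bash
# Current targets vs. actual utilization, updated live
kubectl get hpa api-hpa -n production -w

# Events and conditions explaining recent scaling decisions
kubectl describe hpa api-hpa -n production
```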
## Vertical Pod Autoscaler (VPA) Updates 2026

### What’s New in VPA
- Improved recommendation accuracy using ML models
- Better integration with HPA
- Reduced pod disruption for in-place updates
- Multidimensional recommendations
### VPA Modes

| Mode | Description | Use Case |
|---|---|---|
| `Off` | Recommendations only | View suggestions |
| `Initial` | Set at pod creation | New pods only |
| `Auto` | Update running pods | Full automation |
### VPA Configuration

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
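Newer VPA releases also ship an alpha `InPlaceOrRecreate` update mode that uses the Kubernetes in-place pod resize feature to cut evictions; a fragment, assuming a VPA version and cluster with that feature enabled:

```yaml
updatePolicy:
  updateMode: "InPlaceOrRecreate"  # alpha: resize in place, evict only if in-place resize fails
```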
### View VPA Recommendations

```bash
kubectl describe vpa api-vpa
```

Output:

```
Recommendation:
  Container Recommendations:
    Container Name: api
    Lower Bound:
      Cpu:    100m
      Memory: 256Mi
    Target:
      Cpu:    500m
      Memory: 512Mi
    Upper Bound:
      Cpu:    2
      Memory: 2Gi
```
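The `Target` values are what the updater will apply as requests. To compare against the deployment's current requests (using the example names above):

```bash
kubectl get deployment api -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
```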
### VPA + HPA Together

```yaml
# HPA scales on custom metrics (not CPU/memory)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
---
# VPA controls CPU/memory
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
```
## KEDA (Kubernetes Event-Driven Autoscaling) 2026

### What’s New in KEDA

KEDA is a graduated CNCF project and now includes:
- 80+ scalers for various event sources
- Scale-to-zero capability
- Improved ScaledJob for batch workloads
- Better Prometheus integration
- Paused scaling for maintenance windows
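Paused scaling works through annotations on the ScaledObject; for example, to pin a workload at zero replicas during a maintenance window (remove the annotation to resume autoscaling):

```bash
kubectl annotate scaledobject kafka-consumer -n production \
  autoscaling.keda.sh/paused-replicas="0"
```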
### KEDA Architecture

```
Event Source → KEDA Operator → HPA → Deployment
                     ↓
               Metrics Server
```
### Install KEDA

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda -n keda --create-namespace
```
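A quick check that the operator and the external metrics API registered correctly:

```bash
kubectl get pods -n keda
kubectl get apiservice v1beta1.external.metrics.k8s.io  # served by KEDA's metrics server
```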
### Scale Based on Kafka Lag

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
    kind: Deployment
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0  # scale to zero!
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: orders
        lagThreshold: "100"
```
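Once applied, the ScaledObject's status shows whether the trigger is healthy (`READY`) and currently firing (`ACTIVE`):

```bash
kubectl get scaledobject kafka-consumer -n production
```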
### Scale Based on Prometheus Metrics

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
spec:
  scaleTargetRef:
    name: api
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{app="api"}[2m]))
        threshold: "100"
```
### Scale Based on AWS SQS

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer
spec:
  scaleTargetRef:
    name: sqs-consumer
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials
```
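The `authenticationRef` above points at a KEDA TriggerAuthentication in the same namespace. A minimal sketch backed by a Secret; the `aws-secrets` name and its keys are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-secrets  # hypothetical Secret holding the credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-secrets
      key: AWS_SECRET_ACCESS_KEY
```

On EKS, KEDA's pod identity providers can replace long-lived keys entirely.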
### ScaledJob for Batch Processing

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: batch-processor:latest
        restartPolicy: Never  # required for Jobs
  pollingInterval: 30
  maxReplicaCount: 10
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/batch
        queueLength: "1"
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials
```
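How pending messages map to new Jobs is tunable via `scalingStrategy`. For queues like SQS, whose visible backlog already excludes messages being processed, the `accurate` strategy usually fits better (a fragment that slots into the spec above):

```yaml
scalingStrategy:
  strategy: "accurate"  # size on the visible backlog; in-flight SQS messages are already hidden
```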
## Karpenter 2026 Updates

### What’s New in Karpenter
- Multi-cloud support (AWS, Azure)
- Consolidation improvements - Better bin-packing
- Drift detection - Replace outdated nodes
- Spot interruption handling improvements
### Karpenter vs Cluster Autoscaler
| Feature | Karpenter | Cluster Autoscaler |
|---|---|---|
| Speed | Seconds | Minutes |
| Node selection | Just-in-time | Pre-defined groups |
| Bin-packing | Intelligent | Basic |
| Spot handling | Native | Limited |
| Multi-cloud | AWS, Azure | All major clouds |
### NodePool Configuration

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # renamed from WhenUnderutilized in the v1 API
    consolidateAfter: 1m
```
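After applying a NodePool, you can watch provisioning decisions through Karpenter's own CRDs (v1 creates one NodeClaim per launched node):

```bash
kubectl get nodepools
kubectl get nodeclaims -o wide  # one NodeClaim per node Karpenter provisions
```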
### EC2NodeClass

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest  # v1 requires amiSelectorTerms; AL2023 replaces the deprecated AL2
  role: KarpenterNodeRole-my-cluster  # IAM role for nodes; substitute your cluster's role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
```
## Autoscaling Patterns 2026

### Pattern 1: Predictive Scaling

HPA is reactive by design, so predictive scaling in practice pairs aggressive scale-up behavior with scheduled pre-scaling based on historical patterns (see the CronJob sketch after the config):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 900  # scale up fast
          periodSeconds: 15
```
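Since the HPA above only reacts, a common way to add the predictive half is a scheduled job that raises `minReplicas` ahead of a known peak and lowers it afterwards. A minimal sketch; the `hpa-prescaler` ServiceAccount and its RBAC grant to patch HPAs are assumptions, not shown:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prescale-api
spec:
  schedule: "0 8 * * 1-5"  # just before the weekday morning peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-prescaler  # hypothetical SA allowed to patch HPAs
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - api-hpa
                - --patch
                - '{"spec":{"minReplicas":10}}'
```

A mirror CronJob after the peak patches `minReplicas` back down so the cluster can consolidate.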
### Pattern 2: Cost-Aware Scaling

Prioritize spot instances:
```yaml
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]  # spot only; suits fault-tolerant workloads
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```
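If you want spot preferred rather than required, one option is two NodePools where the spot pool carries a higher `weight`, so Karpenter evaluates it first and falls back to on-demand capacity when spot is unavailable. A sketch of the spot side; the fallback pool looks the same with `values: ["on-demand"]` and a lower weight:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  weight: 100  # higher-weight pools are considered first
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```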
### Pattern 3: Multi-Metric Scaling

Scale on multiple signals; the HPA computes a desired replica count per metric and acts on the highest:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: 10
```
## Emerging Trends

### 1. AI-Powered Autoscaling
ML models predict scaling needs before demand spikes.
### 2. Carbon-Aware Scaling

Scale workloads to regions with clean energy:

```yaml
# Future: carbon-aware scheduling (illustrative; no such API ships today)
spec:
  carbonAware:
    preferLowCarbonRegions: true
    maxCarbonIntensity: 200
```
### 3. FinOps Integration

Autoscaling tied to budget constraints:

```yaml
# Future: cost-aware limits (illustrative; no such API ships today)
limits:
  cpu: 1000
  memory: 1000Gi
  monthly-cost: 10000
```
### 4. Serverless Kubernetes
More workloads using scale-to-zero with KEDA and Knative.
## Best Practices Summary

### HPA Best Practices

- Set `stabilizationWindowSeconds` to prevent thrashing
- Use `behavior` policies to control scaling speed
- Prefer custom metrics over CPU for business-logic scaling
- Set appropriate `minReplicas` for availability
### VPA Best Practices

- Start with `Off` mode to observe recommendations
- Set `minAllowed` and `maxAllowed` constraints
- Use `Initial` mode for stable workloads
- Combine with HPA on different metrics
### KEDA Best Practices

- Set an appropriate `cooldownPeriod` to prevent thrashing
- Use `ScaledJob` for batch workloads
- Balance `pollingInterval` between responsiveness and API load
- Configure `idleReplicaCount` for warm standby
### Karpenter Best Practices

- Use `consolidationPolicy: WhenEmptyOrUnderutilized` for cost savings
- Prefer spot instances for fault-tolerant workloads
- Set `limits` to control cluster size
- Use diverse instance types for availability
## Conclusion
Kubernetes autoscaling in 2026 offers powerful options for right-sizing workloads and infrastructure. Key takeaways:
- HPA - Use for horizontal scaling with custom metrics
- VPA - Use for right-sizing individual pods
- KEDA - Use for event-driven and scale-to-zero
- Karpenter - Use for fast, intelligent node provisioning
- Combine multiple autoscalers for comprehensive scaling
- Monitor and tune for your specific workloads
Master these tools to build efficient, cost-effective Kubernetes clusters.
## Related Resources
- Kubernetes News Today 2026
- Kubernetes Cost Optimization Guide 2026
- EKS Architecture Best Practices 2026
- Kubernetes Consulting Services
Need help with Kubernetes autoscaling? Book a free 30-minute consultation with our experts.