Kubernetes autoscaling continues to evolve rapidly in 2026, with significant updates to HPA, VPA, KEDA, and Karpenter. This comprehensive guide covers the latest developments, best practices, and emerging trends in workload and cluster autoscaling.
## Quick Overview: Kubernetes Autoscaling in 2026
| Tool | Type | Scales | Best For |
|---|---|---|---|
| HPA | Horizontal Pod | Pod count | Stateless workloads |
| VPA | Vertical Pod | CPU/Memory requests | Stateful, batch jobs |
| KEDA | Event-Driven | Pod count (0-N) | Event-driven, queue processing |
| Karpenter | Cluster | Nodes | Dynamic node provisioning |
| Cluster Autoscaler | Cluster | Nodes | Traditional node scaling |
## Horizontal Pod Autoscaler (HPA) Updates 2026

### What’s New in HPA
The Horizontal Pod Autoscaler has received significant improvements:
- Container-level metrics - Scale on individual container resources
- Better scale-to-zero integration with KEDA
- Improved stabilization windows - Reduce thrashing
- Enhanced custom metrics support
### Basic HPA Configuration

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Container-Level Metrics (New in 2026)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: api  # specific container
        target:
          type: Utilization
          averageUtilization: 70
```
### Custom Metrics with Prometheus

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: 30
```
### HPA Best Practices 2026

- Set appropriate stabilization windows to prevent thrashing
- Use `behavior` policies to control scale-up/down speed
- Combine with VPA for right-sized pods
- Monitor HPA decisions with `kubectl describe hpa`, as shown below
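For example, to watch a live HPA and see why it last scaled (using the `api-hpa` object from above):

```bash
# Current targets vs. actual utilization, updated live
kubectl get hpa api-hpa -n production -w

# Events and conditions explaining recent scaling decisions
kubectl describe hpa api-hpa -n production
```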
## Vertical Pod Autoscaler (VPA) Updates 2026

### What’s New in VPA
- Improved recommendation accuracy using ML models
- Better integration with HPA
- Reduced pod disruption for in-place updates
- Multidimensional recommendations
### VPA Modes

| Mode | Description | Use Case |
|---|---|---|
| `Off` | Recommendations only | View suggestions |
| `Initial` | Set at pod creation | New pods only |
| `Auto` | Update running pods | Full automation |
### VPA Configuration

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
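Newer VPA releases also ship an alpha `InPlaceOrRecreate` update mode that uses the Kubernetes in-place pod resize feature to cut evictions; a fragment, assuming a VPA version and cluster with that feature enabled:

```yaml
updatePolicy:
  updateMode: "InPlaceOrRecreate"  # alpha: resize in place, evict only if in-place resize fails
```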
### View VPA Recommendations

```bash
kubectl describe vpa api-vpa
```

Output:

```
Recommendation:
  Container Recommendations:
    Container Name: api
    Lower Bound:
      Cpu:    100m
      Memory: 256Mi
    Target:
      Cpu:    500m
      Memory: 512Mi
    Upper Bound:
      Cpu:    2
      Memory: 2Gi
```
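The `Target` values are what the updater will apply as requests. To compare against the deployment's current requests (using the example names above):

```bash
kubectl get deployment api -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
```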
### VPA + HPA Together

```yaml
# HPA scales on custom metrics (not CPU/memory)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
---
# VPA controls CPU/memory
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
```
## KEDA (Kubernetes Event-Driven Autoscaling) 2026

### What’s New in KEDA

KEDA is a graduated CNCF project and now includes:
- 80+ scalers for various event sources
- Scale-to-zero capability
- Improved ScaledJob for batch workloads
- Better Prometheus integration
- Paused scaling for maintenance windows
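Paused scaling works through annotations on the ScaledObject; for example, to pin a workload at zero replicas during a maintenance window (remove the annotation to resume autoscaling):

```bash
kubectl annotate scaledobject kafka-consumer -n production \
  autoscaling.keda.sh/paused-replicas="0"
```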
### KEDA Architecture

```
Event Source → KEDA Operator → HPA → Deployment
                     ↓
               Metrics Server
```
### Install KEDA

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda -n keda --create-namespace
```
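A quick check that the operator and the external metrics API registered correctly:

```bash
kubectl get pods -n keda
kubectl get apiservice v1beta1.external.metrics.k8s.io  # served by KEDA's metrics server
```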
### Scale Based on Kafka Lag

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
    kind: Deployment
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0  # scale to zero!
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: orders
        lagThreshold: "100"
```
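Once applied, the ScaledObject's status shows whether the trigger is healthy (`READY`) and currently firing (`ACTIVE`):

```bash
kubectl get scaledobject kafka-consumer -n production
```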
### Scale Based on Prometheus Metrics

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
spec:
  scaleTargetRef:
    name: api
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{app="api"}[2m]))
        threshold: "100"
```
### Scale Based on AWS SQS

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer
spec:
  scaleTargetRef:
    name: sqs-consumer
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials
```
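The `authenticationRef` above points at a KEDA TriggerAuthentication in the same namespace. A minimal sketch backed by a Secret; the `aws-secrets` name and its keys are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-secrets  # hypothetical Secret holding the credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-secrets
      key: AWS_SECRET_ACCESS_KEY
```

On EKS, KEDA's pod identity providers can replace long-lived keys entirely.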
### ScaledJob for Batch Processing

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: batch-processor:latest
        restartPolicy: Never  # required for Jobs
  pollingInterval: 30
  maxReplicaCount: 10
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/batch
        queueLength: "1"
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials
```
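How pending messages map to new Jobs is tunable via `scalingStrategy`. For queues like SQS, whose visible backlog already excludes messages being processed, the `accurate` strategy usually fits better (a fragment that slots into the spec above):

```yaml
scalingStrategy:
  strategy: "accurate"  # size on the visible backlog; in-flight SQS messages are already hidden
```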
## Karpenter 2026 Updates

### What’s New in Karpenter
- Multi-cloud support (AWS, Azure)
- Consolidation improvements - Better bin-packing
- Drift detection - Replace outdated nodes
- Spot interruption handling improvements
### Karpenter vs Cluster Autoscaler
| Feature | Karpenter | Cluster Autoscaler |
|---|---|---|
| Speed | Seconds | Minutes |
| Node selection | Just-in-time | Pre-defined groups |
| Bin-packing | Intelligent | Basic |
| Spot handling | Native | Limited |
| Multi-cloud | AWS, Azure | All major clouds |
### NodePool Configuration

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # renamed from WhenUnderutilized in the v1 API
    consolidateAfter: 1m
```
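After applying a NodePool, you can watch provisioning decisions through Karpenter's own CRDs (v1 creates one NodeClaim per launched node):

```bash
kubectl get nodepools
kubectl get nodeclaims -o wide  # one NodeClaim per node Karpenter provisions
```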
### EC2NodeClass

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest  # v1 requires amiSelectorTerms; AL2023 replaces the deprecated AL2
  role: KarpenterNodeRole-my-cluster  # IAM role for nodes; substitute your cluster's role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
```
## Autoscaling Patterns 2026

### Pattern 1: Predictive Scaling

HPA is reactive by design, so predictive scaling in practice pairs aggressive scale-up behavior with scheduled pre-scaling based on historical patterns (see the CronJob sketch after the config):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 900  # scale up fast
          periodSeconds: 15
```
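Since the HPA above only reacts, a common way to add the predictive half is a scheduled job that raises `minReplicas` ahead of a known peak and lowers it afterwards. A minimal sketch; the `hpa-prescaler` ServiceAccount and its RBAC grant to patch HPAs are assumptions, not shown:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prescale-api
spec:
  schedule: "0 8 * * 1-5"  # just before the weekday morning peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-prescaler  # hypothetical SA allowed to patch HPAs
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - api-hpa
                - --patch
                - '{"spec":{"minReplicas":10}}'
```

A mirror CronJob after the peak patches `minReplicas` back down so the cluster can consolidate.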
### Pattern 2: Cost-Aware Scaling

Prioritize spot instances:
```yaml
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]  # spot only; suits fault-tolerant workloads
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```
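If you want spot preferred rather than required, one option is two NodePools where the spot pool carries a higher `weight`, so Karpenter evaluates it first and falls back to on-demand capacity when spot is unavailable. A sketch of the spot side; the fallback pool looks the same with `values: ["on-demand"]` and a lower weight:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  weight: 100  # higher-weight pools are considered first
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```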
### Pattern 3: Multi-Metric Scaling

Scale on multiple signals; the HPA computes a desired replica count per metric and acts on the highest:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: 10
```
## Emerging Trends

### 1. AI-Powered Autoscaling
ML models predict scaling needs before demand spikes.
### 2. Carbon-Aware Scaling

Scale workloads to regions with clean energy:

```yaml
# Future: carbon-aware scheduling (illustrative; no such API ships today)
spec:
  carbonAware:
    preferLowCarbonRegions: true
    maxCarbonIntensity: 200
```
### 3. FinOps Integration

Autoscaling tied to budget constraints:

```yaml
# Future: cost-aware limits (illustrative; no such API ships today)
limits:
  cpu: 1000
  memory: 1000Gi
  monthly-cost: 10000
```
### 4. Serverless Kubernetes
More workloads using scale-to-zero with KEDA and Knative.
## Best Practices Summary

### HPA Best Practices

- Set `stabilizationWindowSeconds` to prevent thrashing
- Use `behavior` policies to control scaling speed
- Prefer custom metrics over CPU for business-logic scaling
- Set appropriate `minReplicas` for availability
### VPA Best Practices

- Start with `Off` mode to observe recommendations
- Set `minAllowed` and `maxAllowed` constraints
- Use `Initial` mode for stable workloads
- Combine with HPA on different metrics
### KEDA Best Practices

- Set an appropriate `cooldownPeriod` to prevent thrashing
- Use `ScaledJob` for batch workloads
- Balance `pollingInterval` between responsiveness and API load
- Configure `idleReplicaCount` for warm standby
### Karpenter Best Practices

- Use `consolidationPolicy: WhenEmptyOrUnderutilized` for cost savings
- Prefer spot instances for fault-tolerant workloads
- Set `limits` to control cluster size
- Use diverse instance types for availability
## Conclusion
Kubernetes autoscaling in 2026 offers powerful options for right-sizing workloads and infrastructure. Key takeaways:
- HPA - Use for horizontal scaling with custom metrics
- VPA - Use for right-sizing individual pods
- KEDA - Use for event-driven and scale-to-zero
- Karpenter - Use for fast, intelligent node provisioning
- Combine multiple autoscalers for comprehensive scaling
- Monitor and tune for your specific workloads
Master these tools to build efficient, cost-effective Kubernetes clusters.
## Related Resources
- Kubernetes News Today 2026
- Kubernetes Cost Optimization Guide 2026
- EKS Architecture Best Practices 2026
- Kubernetes Consulting Services
Need help with Kubernetes autoscaling? Book a free 30-minute consultation with our experts.