Engineering

Kubernetes Autoscaling News 2026: HPA, VPA, KEDA, and Karpenter Updates

Engineering Team

Kubernetes autoscaling continues to evolve rapidly in 2026, with significant updates to HPA, VPA, KEDA, and Karpenter. This comprehensive guide covers the latest developments, best practices, and emerging trends in workload and cluster autoscaling.

Quick Overview: Kubernetes Autoscaling in 2026

| Tool | Type | Scales | Best For |
|---|---|---|---|
| HPA | Horizontal Pod | Pod count | Stateless workloads |
| VPA | Vertical Pod | CPU/memory requests | Stateful, batch jobs |
| KEDA | Event-driven | Pod count (0-N) | Event-driven, queue processing |
| Karpenter | Cluster | Nodes | Dynamic node provisioning |
| Cluster Autoscaler | Cluster | Nodes | Traditional node scaling |

Horizontal Pod Autoscaler (HPA) Updates 2026

What’s New in HPA

The Horizontal Pod Autoscaler has received significant improvements:

  • Container-level metrics - Scale on individual container resources
  • Better scale-to-zero integration with KEDA
  • Improved stabilization windows - Reduce thrashing
  • Enhanced custom metrics support

Basic HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Container-Level Metrics (New in 2026)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: api  # Specific container
        target:
          type: Utilization
          averageUtilization: 70

Custom Metrics with Prometheus

Pods and External metrics are served through the custom and external metrics APIs, so a metrics adapter (such as prometheus-adapter or the KEDA metrics server) must be installed for this to work:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: 30

HPA Best Practices 2026

  1. Set appropriate stabilization windows to prevent thrashing
  2. Use behavior policies to control scale-up/down speed
  3. Combine with VPA for right-sized pods
  4. Monitor HPA decisions with kubectl describe hpa (see the commands below)

Vertical Pod Autoscaler (VPA) Updates 2026

What’s New in VPA

  • Improved recommendation accuracy using ML models
  • Better integration with HPA
  • Reduced pod disruption for in-place updates
  • Multidimensional recommendations

VPA Modes

| Mode | Description | Use Case |
|---|---|---|
| Off | Recommendations only | View suggestions |
| Initial | Set at pod creation | New pods only |
| Auto | Update running pods | Full automation |

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits

View VPA Recommendations

kubectl describe vpa api-vpa

Output:

Recommendation:
  Container Recommendations:
    Container Name: api
    Lower Bound:
      Cpu:     100m
      Memory:  256Mi
    Target:
      Cpu:     500m
      Memory:  512Mi
    Upper Bound:
      Cpu:     2
      Memory:  2Gi

VPA + HPA Together

# HPA scales on custom metrics (not CPU/memory)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
---
# VPA controls CPU/memory
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"

KEDA (Kubernetes Event-Driven Autoscaling) 2026

What’s New in KEDA

KEDA graduated as a CNCF project and now includes:

  • 80+ scalers for various event sources
  • Scale-to-zero capability
  • Improved ScaledJob for batch workloads
  • Better Prometheus integration
  • Paused scaling for maintenance windows (see the annotation example after this list)

KEDA Architecture

Event Source → KEDA (Operator + Metrics Server) → HPA → Deployment

The KEDA operator handles activation and deactivation of the workload (0 ↔ 1 replicas), while the KEDA metrics server exposes event-source metrics to an HPA that KEDA creates and manages for 1 ↔ N scaling.

Install KEDA

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda -n keda --create-namespace

Scale Based on Kafka Lag

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
    kind: Deployment
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 100
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: orders
        lagThreshold: "100"

Scale Based on Prometheus Metrics

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
spec:
  scaleTargetRef:
    name: api
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{app="api"}[2m]))
        threshold: "100"

Scale Based on AWS SQS

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer
spec:
  scaleTargetRef:
    name: sqs-consumer
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
        queueLength: "5"
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials

ScaledJob for Batch Processing

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: batch-processor:latest
  pollingInterval: 30
  maxReplicaCount: 10
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/batch
        queueLength: "1"

Karpenter 2026 Updates

What’s New in Karpenter

  • Multi-cloud support (AWS, Azure)
  • Consolidation improvements - Better bin-packing
  • Drift detection - Replace outdated nodes
  • Spot interruption handling improvements

Karpenter vs Cluster Autoscaler

| Feature | Karpenter | Cluster Autoscaler |
|---|---|---|
| Speed | Seconds | Minutes |
| Node selection | Just-in-time | Pre-defined groups |
| Bin-packing | Intelligent | Basic |
| Spot handling | Native | Limited |
| Multi-cloud | AWS, Azure | All major clouds |

NodePool Configuration

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # renamed from WhenUnderutilized in karpenter.sh/v1
    consolidateAfter: 1m

EC2NodeClass

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest  # karpenter.k8s.aws/v1 requires amiSelectorTerms; AL2 is end-of-life
  role: KarpenterNodeRole-my-cluster  # substitute the IAM role your nodes use
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  instanceStorePolicy: RAID0
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true

Autoscaling Patterns 2026

Pattern 1: Predictive Scaling

Pre-scale based on historical patterns:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 900  # Scale up fast
          periodSeconds: 15

Pattern 2: Cost-Aware Scaling

Prioritize spot instances:

spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]  # Spot only
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized

Pattern 3: Multi-Metric Scaling

Scale on multiple signals. The HPA computes a desired replica count for each metric and applies the highest:

metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: 10

Emerging Trends

1. AI-Powered Autoscaling

ML models predict scaling needs before demand spikes.

2. Carbon-Aware Scaling

Scale workloads to regions with clean energy:

# Future: carbon-aware scheduling
spec:
  carbonAware:
    preferLowCarbonRegions: true
    maxCarbonIntensity: 200

3. FinOps Integration

Autoscaling tied to budget constraints:

limits:
  cpu: 1000
  memory: 1000Gi
  monthly-cost: 10000  # Future: cost limits

4. Serverless Kubernetes

More workloads using scale-to-zero with KEDA and Knative.


Best Practices Summary

HPA Best Practices

  1. Set stabilizationWindowSeconds to prevent thrashing
  2. Use behavior policies to control scaling speed
  3. Prefer custom metrics over CPU for business-logic scaling
  4. Set appropriate minReplicas for availability

VPA Best Practices

  1. Start with Off mode to observe recommendations
  2. Set minAllowed and maxAllowed constraints
  3. Use Initial mode for stable workloads
  4. Combine with HPA on different metrics

KEDA Best Practices

  1. Set appropriate cooldownPeriod to prevent thrashing
  2. Use ScaledJob for batch workloads
  3. Monitor pollingInterval for responsiveness vs API load
  4. Configure idleReplicaCount (with a higher minReplicaCount) so workloads idle at zero but keep warm capacity while active, as sketched below

Karpenter Best Practices

  1. Use consolidationPolicy: WhenEmptyOrUnderutilized for cost savings
  2. Prefer spot instances for fault-tolerant workloads
  3. Set limits to control cluster size
  4. Use diverse instance types for availability

Conclusion

Kubernetes autoscaling in 2026 offers powerful options for right-sizing workloads and infrastructure. Key takeaways:

  • HPA - Use for horizontal scaling with custom metrics
  • VPA - Use for right-sizing individual pods
  • KEDA - Use for event-driven and scale-to-zero
  • Karpenter - Use for fast, intelligent node provisioning
  • Combine multiple autoscalers for comprehensive scaling
  • Monitor and tune for your specific workloads

Master these tools to build efficient, cost-effective Kubernetes clusters.


Need help with Kubernetes autoscaling? Book a free 30-minute consultation with our experts.
