Kubernetes cluster autoscaling is no longer optional for production workloads — it is foundational. If your nodes cannot keep pace with pod demand, you face scheduling delays, degraded performance, and unhappy users. If your nodes sit idle, you are haemorrhaging money on compute you never use.
For years, Cluster Autoscaler was the default answer. It works, it is battle-tested, and every major cloud provider supports it. But Karpenter, originally developed by AWS and now part of the Kubernetes SIG Autoscaling project, has changed the conversation. Its groupless architecture, sub-minute scaling, and intelligent consolidation have made it the preferred choice for teams running EKS at scale.
At Tasrie IT Services, we have migrated more than 50 production clusters from Cluster Autoscaler to Karpenter over the past 18 months. This post distils what we learned: the architecture differences that matter, the benchmarks we measured, the cost savings we achieved, and a practical migration playbook you can follow.
Architecture Comparison: ASG-Based vs Groupless
The fundamental difference between the two tools lies in how they provision compute.
Cluster Autoscaler: The Node Group Model
Cluster Autoscaler operates through AWS Auto Scaling Groups (ASGs). Each ASG defines a fixed set of instance types, sizes, and configurations. When CA detects pending pods that cannot be scheduled, it evaluates which existing node group can satisfy the request and instructs the ASG to scale up.
The flow looks like this:
- Pending pods detected during periodic scan (every 10+ seconds)
- CA evaluates available node groups against pod requirements
- Scale-up request sent to the matching ASG
- ASG launches an EC2 instance from its pre-defined configuration
- Node registers with the cluster and pods are scheduled
This model is well understood and reliable, but it introduces constraints. You must pre-define your node groups, anticipate which instance types you will need, and manage separate ASGs for different workload profiles. Scaling decisions are limited to what the existing node group configurations permit.
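To make the node group model concrete, here is a hypothetical eksctl managed node group of the kind Cluster Autoscaler scales against; the cluster name, region, and sizes are assumptions for illustration, while the `k8s.io/cluster-autoscaler/...` tags are the standard auto-discovery tags CA looks for:

```yaml
# Hypothetical eksctl managed node group. Instance type, min/max size,
# and AMI are all fixed up front; CA can only scale this group up or down.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # assumed cluster name
  region: us-east-1         # assumed region
managedNodeGroups:
  - name: general-m5
    instanceType: m5.xlarge # fixed for the whole group
    minSize: 2
    maxSize: 20
    tags:
      # Auto-discovery tags Cluster Autoscaler matches on
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
```

A cluster serving several workload profiles repeats this block per profile, which is where the ASG sprawl described below comes from.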
Karpenter: The Groupless Model
Karpenter bypasses ASGs entirely. Instead, it calls the EC2 Fleet API directly, evaluating up to 60 compatible instance types per provisioning decision. There are no pre-defined node groups — Karpenter selects the right-sized instance for the current workload in real time.
The flow is fundamentally different:
- Pending pod triggers an immediate event (no scan interval)
- Karpenter evaluates workload requirements against available instance types
- EC2 Fleet API called with Price-Capacity-Optimised allocation strategy
- Right-sized instance launched and registered
- Pod scheduled within seconds
This groupless approach means Karpenter can mix Spot and On-Demand instances in a single NodePool, select from a broad range of instance families without manual configuration, and respond to demand instantly rather than waiting for a periodic scan cycle. As the AWS EKS Best Practices guide describes it, Karpenter “redraws the picture every scheduling cycle” rather than optimising within the constraints of pre-defined groups.
Scaling Speed Benchmarks: 55 Seconds vs 3-4 Minutes
Speed is where Karpenter pulls decisively ahead. In our migrations, we consistently measured the following:
| Metric | Karpenter | Cluster Autoscaler |
|---|---|---|
| Node provisioning (cold start) | 30-60 seconds | 3-5 minutes |
| CPU-bound pod scheduling (production) | ~55 seconds | 3-4 minutes |
| Scaling trigger | Event-driven (immediate) | Time-driven scan (every 10+ seconds) |
| Spot interruption replacement | Within 2-minute notice window | Slower (ASG-dependent) |
These figures align with production benchmarks published by ScaleOps and CloudPilot AI, which report Karpenter bringing CPU-bound pods online in approximately 55 seconds while Cluster Autoscaler needed 3-4 minutes for the same workload.
The speed advantage comes from two architectural choices. First, Karpenter is event-driven: each pending pod immediately triggers a provisioning action rather than waiting for a scan cycle. Second, the direct EC2 Fleet API integration removes the ASG intermediary, eliminating an entire layer of orchestration latency.
For teams running customer-facing workloads with spiky traffic patterns, this difference translates directly into fewer 5xx errors during scaling events. We documented this for one e-commerce client, where Karpenter reduced scale-up-related error rates by over 70% during flash-sale events.
Cost Savings: Real Case Studies
Cost optimisation is not theoretical with Karpenter — the savings are well documented across organisations of all sizes. Here are the case studies we reference most often with clients.
Salesforce: 1,000+ EKS Clusters
Salesforce migrated from Cluster Autoscaler to Karpenter across their entire fleet of over 1,000 EKS clusters. The results were significant:
- Scaling latency dropped from minutes to seconds
- 80% reduction in operational overhead as automated processes replaced manual node group management
- ~5% cost savings in FY2026, with a projected further 5-10% reduction in FY2027 as bin-packing and Spot utilisation continue to mature
- Node utilisation improved through smarter bin-packing
Salesforce developed custom internal tooling to orchestrate the migration safely, cordoning and draining legacy nodes with full respect for pod disruption budgets (PDBs).
Grover: 80% Spot in Production
Grover increased their Spot instance usage to 80% of production workloads after migrating to Karpenter. The key enabler was Karpenter’s ability to mix Spot and On-Demand instances within a single NodePool — something that was impractical with Cluster Autoscaler’s rigid ASG-based model. Grover reported smoother scale-up operations and better performance during seasonal demand spikes such as Black Friday.
Tinybird: 20% AWS Bill Reduction
Tinybird reduced their AWS bill by 20% (and up to 90% on CI/CD workloads) by combining Karpenter with EC2 Spot instances. Their approach leveraged Karpenter’s intelligent instance selection and consolidation to eliminate the over-provisioning that was endemic with their previous Cluster Autoscaler setup.
Our Own Observations
Across our 50+ migrations, we typically see 20-40% compute cost reduction in the first quarter after migration, driven by three factors:
- Better bin-packing — Karpenter selects right-sized instances rather than fitting pods into pre-defined node sizes
- Aggressive consolidation — underutilised nodes are replaced with smaller, more efficient instances
- Higher Spot adoption — the flexibility to diversify across many instance types reduces Spot interruption risk, enabling higher Spot percentages
For a deeper look at Kubernetes cost strategies, see our guide to Kubernetes cost optimisation and the best tools for managing Kubernetes costs.
Configuration Comparison
One of the most practical differences is how each tool is configured. Karpenter’s configuration model is simpler and more expressive.
Karpenter: NodePool + EC2NodeClass
Karpenter uses two primary resources. The NodePool defines scheduling rules, instance requirements, resource limits, and disruption policies. The EC2NodeClass handles AWS-specific settings such as AMIs, subnets, security groups, and block device mappings.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-workloads
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  limits:
    cpu: "1000"
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-my-cluster"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
```
This single configuration replaces what would require multiple ASGs, launch templates, and scaling policies with Cluster Autoscaler. You can learn more about NodePool and EC2NodeClass configuration from the official Karpenter documentation.
Cluster Autoscaler: ASGs + Annotations
Cluster Autoscaler relies on pre-defined Auto Scaling Groups, each with a fixed launch template specifying instance types, AMIs, and network configuration. Tuning behaviour requires command-line flags on the CA deployment:
```shell
# Cluster Autoscaler deployment flags
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scale-down-utilization-threshold=0.5
--skip-nodes-with-local-storage=false
--expander=least-waste
```
Each unique instance type or capacity configuration requires its own ASG. For a cluster supporting web workloads, batch processing, and GPU jobs, you might end up managing six or more ASGs with their own scaling policies. Karpenter achieves the same with two or three NodePool definitions.
Advanced Karpenter Features
Beyond the core architectural advantages, Karpenter provides several advanced capabilities that have no direct equivalent in Cluster Autoscaler.
Drift Detection and Automated Node Upgrades
When you update your EKS control plane version or change the AMI in your EC2NodeClass, Karpenter automatically detects the drift and marks affected nodes for replacement. Drifted nodes are gradually replaced through a rolling deployment, without manual intervention.
This eliminates one of the most tedious operational tasks in Kubernetes: upgrading worker nodes. With Cluster Autoscaler, node upgrades typically require manual intervention or custom automation to cordon, drain, and replace nodes across multiple ASGs.
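To observe drift handling in practice, you can watch the NodeClaim resources Karpenter manages; the commands below are a sketch (the placeholder NodeClaim name and exact output shape will vary by Karpenter version):

```shell
# After changing the AMI alias in the EC2NodeClass, list NodeClaims and
# inspect one for the Drifted status condition.
kubectl get nodeclaims
kubectl describe nodeclaim <nodeclaim-name> | grep -A 2 Drifted

# Watch Karpenter roll drifted nodes; Karpenter-managed nodes carry
# the karpenter.sh/nodepool label.
kubectl get nodes -l karpenter.sh/nodepool --watch
```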
Disruption Budgets
Karpenter’s disruption budgets give you fine-grained control over when and how nodes can be disrupted. You can limit disruptions by percentage or absolute number, restrict disruptions to specific time windows using cron schedules, and scope budgets to specific disruption reasons such as Drifted, Underutilized, or Empty.
```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m
  budgets:
    - nodes: "10%"
    - nodes: "0"
      schedule: "0 9 * * 1-5"
      duration: 8h
      reasons:
        - Underutilized
```
This configuration allows consolidation of empty and underutilised nodes at all times, but restricts underutilisation-based disruptions during business hours (Monday to Friday, 9am to 5pm). This level of control is essential for production workloads where availability SLOs must be maintained.
Spot-to-Spot Consolidation
One of Karpenter’s most impactful cost features is Spot-to-Spot consolidation. When enabled via the SpotToSpotConsolidation feature flag, Karpenter can replace existing Spot instances with cheaper or more appropriately sized Spot alternatives, continuously optimising for cost without sacrificing capacity. Combined with the Price-Capacity-Optimised allocation strategy, this ensures you are always running on the most cost-effective Spot capacity available.
This is particularly valuable because EC2 Spot pricing fluctuates throughout the day. An m5.xlarge Spot instance that was the cheapest option at 2am may no longer be optimal at 2pm. Karpenter detects these pricing shifts and proactively consolidates onto cheaper capacity, compounding savings over time. Cluster Autoscaler has no equivalent capability — once an ASG launches a Spot instance, it remains until the workload scales down or the instance is interrupted.
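Enabling the feature is a one-line change to the controller settings. This sketch assumes the Helm-based install shown later in this post; check the feature-gate name against your Karpenter version before applying:

```shell
# Enable the SpotToSpotConsolidation feature gate on an existing
# Helm-managed Karpenter install.
helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --reuse-values \
  --set settings.featureGates.spotToSpotConsolidation=true
```

Note that Karpenter only performs Spot-to-Spot replacement when the NodePool is flexible to a sufficiently broad set of instance types, so keep your `requirements` permissive if you want this feature to have room to work.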
Karpenter’s CNCF Journey and Current Status
Understanding Karpenter’s project maturity is important context for adoption decisions. AWS originally created Karpenter in 2021 as an open-source project. In 2023, AWS donated the vendor-neutral core to the CNCF through the Kubernetes SIG Autoscaling group, separating the core scheduling logic from the AWS-specific provider.
The project now lives in two repositories: kubernetes-sigs/karpenter for the vendor-neutral core, and aws/karpenter-provider-aws for the AWS-specific implementation. Karpenter v1.0.0 reached general availability in August 2024, with stable APIs that guarantee backward compatibility across 1.x releases.
Key milestones in the v1.0 release include the ability to specify disruption reasons for budgets, a forceful disruption mode for balancing availability against security patching, and expanded consolidateAfter configuration. The kubelet configuration was moved to the EC2NodeClass API, and NodeClaims became immutable after initial launch.
As of February 2026, Karpenter sits within the Kubernetes SIG Autoscaling umbrella but has not yet achieved formal CNCF graduated status. It continues to receive frequent releases, with v1.5 (July 2025) introducing faster bin-packing, new disruption metrics, and “emptiness-first” consolidation for more aggressive idle node recycling.
Multi-Cloud Support: Beyond AWS
Karpenter is no longer AWS-only. The vendor-neutral core, maintained under kubernetes-sigs/karpenter, enables cloud-specific providers to implement the same provisioning model on their own infrastructure.
| Cloud Provider | Implementation | Status | NodeClass Resource |
|---|---|---|---|
| AWS (EKS) | karpenter-provider-aws | Production-ready (v1.0+) | EC2NodeClass |
| Azure (AKS) | Node Auto Provisioning (NAP) | GA as managed addon | AKSNodeClass |
| GCP (GKE) | karpenter-provider-gcp | Alpha/Preview | GCPNodeClass |
EKS Auto Mode
EKS Auto Mode, generally available since December 2024, runs Karpenter as a managed, off-cluster component. You do not need to install, scale, or upgrade Karpenter yourself. It includes built-in General Purpose and GPU-Optimised NodePools, along with managed AWS Load Balancer Controller and EBS CSI Driver. For teams that want Karpenter’s benefits without the operational overhead of managing the controller, EKS Auto Mode is the simplest path.
AKS Node Auto Provisioning
Azure’s Node Auto Provisioning (NAP) uses Karpenter as a managed addon within AKS. It is the recommended approach for most AKS users, offering improved scale-up speed, automatic maintenance window integration, and Azure-specific configuration through the AKSNodeClass resource.
For a broader comparison of managed Kubernetes services, see our EKS vs AKS vs GKE comparison guide.
When Cluster Autoscaler Still Wins
Karpenter is not the right choice for every scenario. There are legitimate cases where Cluster Autoscaler remains the better option.
GPU and Machine Learning Workloads
For long-running GPU-based ML training jobs, Cluster Autoscaler’s explicit node group reservation can keep a small pool of GPU nodes (p4d, g5 families) running continuously. Karpenter, by default, will spin down idle GPU instances through consolidation, which can cause unnecessary pod rescheduling and resource churn for workloads that take hours or days to complete. While you can configure Karpenter to avoid this with do-not-disrupt annotations, Cluster Autoscaler’s dedicated GPU node groups provide a more straightforward solution for this specific use case.
Multi-Cloud and Hybrid Environments
If you are running Kubernetes across AWS, Azure, GCP, and on-premises infrastructure with a single unified autoscaling strategy, Cluster Autoscaler remains the more practical choice. It supports all major cloud providers and on-premises environments through a consistent interface. While Karpenter’s multi-cloud support is expanding, each cloud provider’s implementation is at a different maturity level, and there is no interoperability between them.
Simple, Stable Clusters
For small clusters with predictable, steady workloads that rarely scale, the additional complexity of migrating to Karpenter may not justify the benefits. Cluster Autoscaler is simple, well-documented, and continues to receive active maintenance from SIG Autoscaling with regular releases well into 2026.
Regulated Environments with Strict Change Controls
Some organisations in healthcare, finance, and government operate under strict change management policies that require every infrastructure component to be individually audited and approved. Cluster Autoscaler’s explicit, declarative ASG model can be easier to audit than Karpenter’s dynamic instance selection. That said, Karpenter’s NodePool constraints and disruption budgets do provide sufficient controls for most compliance frameworks — it simply requires a different documentation approach.
Migration Guide: Cluster Autoscaler to Karpenter
Based on our experience and the patterns established by Salesforce’s enterprise-scale migration, we recommend a phased approach. The official Karpenter migration guide provides the canonical reference; what follows is our operational playbook.
Phase 1: Preparation (1-2 Weeks)
Before touching your autoscaler, prepare your workloads:
- Add health checks (liveness and readiness probes) to all deployments
- Ensure multiple replicas with appropriate HPA configuration — see our Kubernetes autoscaling overview for current best practices
- Apply resource right-sizing based on actual usage data
- Configure PodDisruptionBudgets to protect critical workloads during node transitions
Phase 2: IAM and Infrastructure Setup (1-2 Days)
Create the required IAM roles:
- Karpenter node role: attach `AmazonEKSWorkerNodePolicy`, `AmazonEKS_CNI_Policy`, `AmazonEC2ContainerRegistryPullOnly`, and `AmazonSSMManagedInstanceCore`
- Karpenter controller role: configure using IAM Roles for Service Accounts (IRSA) with an OIDC endpoint
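The node-role policy attachments can be scripted; this sketch reuses the role name from the EC2NodeClass example earlier (adjust `ROLE` for your cluster):

```shell
# Attach the four managed policies to the Karpenter node role.
ROLE=KarpenterNodeRole-my-cluster   # assumed role name
for POLICY in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy \
              AmazonEC2ContainerRegistryPullOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy \
    --role-name "$ROLE" \
    --policy-arn "arn:aws:iam::aws:policy/$POLICY"
done
```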
If you are managing your EKS infrastructure with Terraform, our Terraform EKS module guide covers the IAM setup in detail.
Phase 3: Install and Configure Karpenter (1 Day)
Deploy Karpenter using the official Helm chart:
```shell
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.1.0" \
  --namespace kube-system \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi
```
Then apply your NodePool and EC2NodeClass configurations. Start conservatively — mirror your existing node group configuration before optimising.
Phase 4: Parallel Running and Validation (1-2 Weeks)
Run both autoscalers simultaneously:
- Deploy Karpenter alongside the existing Cluster Autoscaler
- Taint Karpenter-provisioned nodes initially to restrict which workloads land on them
- Gradually migrate workloads by removing taints and cordoning CA-managed nodes
- Monitor scheduling latency, pod availability, and node utilisation throughout
Phase 5: Cutover and Decommission (1-2 Days)
Once validation is complete:
- Scale the Cluster Autoscaler deployment to zero replicas
- Cordon and drain remaining CA-managed nodes (Karpenter will provision replacements)
- Scale ASG node groups to the minimum required for cluster-critical workloads
- Remove the Cluster Autoscaler deployment and associated ASG configurations
Phase 6: Optimise (Ongoing)
After migration, tune your configuration:
- Enable the `WhenEmptyOrUnderutilized` consolidation policy
- Configure disruption budgets for your availability requirements
- Experiment with Spot instance percentages, starting conservative and increasing
- Monitor with Prometheus and Grafana using Karpenter’s built-in metrics
Decision Framework: Which Tool Should You Choose?
Use this comparison to guide your decision:
| Criterion | Karpenter | Cluster Autoscaler | Winner |
|---|---|---|---|
| Scaling speed | 30-60 seconds | 3-5 minutes | Karpenter |
| Instance type flexibility | Up to 60 types per decision | Fixed per ASG | Karpenter |
| Spot instance handling | Native, multi-type diversification | Per-ASG configuration | Karpenter |
| Consolidation | Holistic, cluster-wide | Node-by-node, limited | Karpenter |
| Drift detection | Built-in, automated | Not available | Karpenter |
| Configuration complexity | 2 CRDs (NodePool + NodeClass) | Multiple ASGs + flags | Karpenter |
| Multi-cloud support | AWS (GA), Azure (GA), GCP (alpha) | All providers + on-prem | Cluster Autoscaler |
| GPU workload stability | Requires tuning | Native node group reservation | Cluster Autoscaler |
| Maturity | v1.0+ (GA since August 2024) | 10+ years in production | Cluster Autoscaler |
| Managed options | EKS Auto Mode, AKS NAP | Built into all managed K8s | Tie |
| Cost savings potential | 20-40% typical | Baseline | Karpenter |
Choose Karpenter if you run EKS (or AKS with NAP) with variable workloads, want sub-minute scaling, need cost optimisation through Spot and consolidation, and are willing to invest in a short migration effort.
Choose Cluster Autoscaler if you run multi-cloud or hybrid Kubernetes, have stable GPU/ML workloads requiring persistent node pools, or need a single autoscaling tool across diverse environments.
For most EKS teams in 2026, Karpenter is the clear choice. The architecture is more efficient, the scaling is faster, the cost savings are real, and the operational overhead is lower once you are past the initial migration.
Ready to Migrate Your EKS Clusters to Karpenter?
Migrating from Cluster Autoscaler to Karpenter is one of the highest-impact improvements you can make to your EKS infrastructure. But the migration requires careful planning, IAM configuration, workload preparation, and validation to avoid disruption.
Our team provides comprehensive Amazon EKS consulting services to help you:
- Assess your current autoscaling setup and identify cost savings opportunities
- Design and implement Karpenter configurations tailored to your workload profiles
- Execute zero-downtime migrations using our proven phased playbook
- Optimise Spot instance strategies to maximise savings while maintaining availability
We have migrated clusters ranging from 10-node development environments to 500-node production platforms, consistently delivering 20-40% compute cost reductions.
For broader Kubernetes architecture guidance beyond EKS, explore our Kubernetes consulting services.