If you have ever created an EKS node group and assumed your cluster would automatically scale based on pod demand, you are not alone. It is one of the most common misconceptions in the Kubernetes-on-AWS ecosystem. The reality is that Auto Scaling Groups (ASGs) and the Kubernetes Cluster Autoscaler (CA) serve fundamentally different roles, and understanding the distinction is critical to building a production-ready EKS cluster.
We have configured both Cluster Autoscaler and Auto Scaling Groups across 100+ EKS clusters for clients in healthcare, fintech, and e-commerce. In this post, we break down exactly how they work, how they work together, and when to use each approach—including the modern alternative, Karpenter, that bypasses ASGs entirely.
The #1 Misconception: ASGs Do NOT Auto-Scale Based on Pod Demand
Let us clear this up immediately: an Auto Scaling Group on its own will not scale your EKS nodes in response to Kubernetes pod scheduling needs. When you create an EKS managed node group, AWS creates an associated ASG behind the scenes. That ASG handles instance lifecycle—launching and terminating EC2 instances. But it has zero awareness of Kubernetes pods, resource requests, or scheduling constraints.
Without a Kubernetes-aware autoscaler like the Cluster Autoscaler or Karpenter, your ASG sits at whatever desired count you set and stays there. Pods will go into a Pending state with FailedScheduling events, and nothing in the AWS infrastructure layer will respond to that signal.
This is the key insight: the ASG is the muscle, but it needs a brain that understands Kubernetes. That brain is the Cluster Autoscaler.
Quick Comparison: Cluster Autoscaler vs Auto Scaling Group
Before going deeper, here is a side-by-side comparison:
| Feature | Cluster Autoscaler (CA) | Auto Scaling Group (ASG) |
|---|---|---|
| What it is | Kubernetes controller (runs as a pod) | AWS infrastructure resource |
| Awareness | Kubernetes-aware (pods, resource requests, taints, affinity) | Infrastructure-aware (CPU/memory metrics, instance health) |
| Scaling trigger | Unschedulable pods or underutilized nodes | CloudWatch alarms, target tracking policies, or manual adjustment |
| Scale-up mechanism | Modifies ASG DesiredCapacity | Launches EC2 instances to meet desired count |
| Scale-down mechanism | Cordons, drains nodes, then reduces ASG DesiredCapacity | Terminates instances based on termination policies |
| Pod scheduling | Understands pod affinity, taints, tolerations, resource requests | No concept of pods or Kubernetes scheduling |
| Configuration | Kubernetes Deployment with CLI flags | AWS Console, CLI, CloudFormation, or Terraform |
| Scope | Cluster-level (across multiple ASGs/node groups) | Single ASG |
The critical takeaway: CA and ASG are not alternatives to each other—they are complementary layers. CA makes the scaling decisions; ASG executes them at the infrastructure level.
How Cluster Autoscaler and ASGs Work Together
Understanding the end-to-end scaling workflow removes a lot of confusion. Here is exactly what happens when your cluster needs more capacity:
Scale-Up Flow
- Pod scheduling fails — A new pod is created (via Deployment, Job, etc.) but the Kubernetes scheduler cannot find a node with sufficient CPU, memory, or matching taints/labels.
- CA detects pending pods — The Cluster Autoscaler runs a scan loop (every 10 seconds by default) and identifies pods in `Pending` state with `FailedScheduling` events.
- CA simulates scheduling — CA evaluates each node group to determine which one could accommodate the pending pods. It considers instance types, labels, taints, and resource capacity.
- CA adjusts ASG DesiredCapacity — CA calls the AWS Auto Scaling API to increase the `DesiredCapacity` of the selected ASG.
- ASG launches instances — The ASG provisions new EC2 instances based on its launch template or launch configuration.
- Nodes join the cluster — The new instances run the EKS bootstrap script, register with the Kubernetes API server, and become `Ready` nodes.
- Pods are scheduled — The Kubernetes scheduler places the pending pods onto the newly available nodes.
This entire process typically takes 2-5 minutes, depending on instance type and AMI caching.
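You can watch each step of this flow in a live cluster. A quick diagnostic sketch, assuming CA is deployed as `cluster-autoscaler` in `kube-system` and your AWS CLI has read access (resource names may differ in your setup):

```shell
# Pods stuck waiting for capacity
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Why they are stuck (look for FailedScheduling)
kubectl get events --all-namespaces --field-selector=reason=FailedScheduling

# CA's scale-up decisions in its logs
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50

# Confirm the ASG's DesiredCapacity was bumped
aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[].{Name:AutoScalingGroupName,Desired:DesiredCapacity,Min:MinSize,Max:MaxSize}' \
  --output table
```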
Scale-Down Flow
- CA identifies underutilized nodes — During its scan loop, CA checks if node utilization (based on resource requests, not actual usage) falls below the `scale-down-utilization-threshold` (default: 50%).
- Cool-down period — The node must remain underutilized for `scale-down-unneeded-time` (default: 10 minutes).
- CA verifies safe eviction — CA checks for pods with PodDisruptionBudgets, local storage, or the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation.
- CA cordons and drains — The node is cordoned (no new pods scheduled) and existing pods are gracefully evicted.
- CA reduces ASG DesiredCapacity — CA calls the ASG API to decrement the desired count.
- ASG terminates the instance — The EC2 instance is terminated based on the ASG’s termination policy.
Why ASG Scaling Policies Conflict with Cluster Autoscaler
Here is a mistake we see regularly: teams configure both Cluster Autoscaler and ASG scaling policies (target tracking or step scaling) on the same node group. This creates a tug-of-war.
The Conflict
ASG target tracking policies scale based on CloudWatch metrics like average CPU utilization. If you set a target of 60% CPU utilization on your ASG, the ASG will try to add or remove instances to maintain that target—completely independent of what the Cluster Autoscaler is doing.
The result:
- CA scales up the ASG to accommodate pending pods, increasing `DesiredCapacity` to 10.
- ASG target tracking sees that average CPU across those 10 nodes is only 40%, decides the group is over-provisioned, and scales it back down to 7.
- Pods go Pending again, CA scales back up, and the cycle repeats.
The Fix
Remove all ASG scaling policies when using Cluster Autoscaler. The CA should be the sole entity managing `DesiredCapacity`. The ASG’s role is reduced to instance provisioning—it should only define `MinSize` and `MaxSize`, and let CA control everything in between.
If you need metric-based scaling for non-Kubernetes workloads on the same ASG (which we do not recommend), use separate ASGs for Kubernetes and non-Kubernetes instances.
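To audit an existing node group for conflicting policies, here is a sketch using the AWS CLI. The ASG and policy names are hypothetical; run `describe-policies` first so you only delete what you intend to:

```shell
ASG_NAME="eks-my-nodegroup-asg"   # hypothetical name; substitute your own

# List any scaling policies attached to the ASG
aws autoscaling describe-policies \
  --auto-scaling-group-name "$ASG_NAME" \
  --query 'ScalingPolicies[].{Name:PolicyName,Type:PolicyType}' \
  --output table

# Remove a conflicting policy so CA alone controls DesiredCapacity
aws autoscaling delete-policy \
  --auto-scaling-group-name "$ASG_NAME" \
  --policy-name cpu-target-tracking   # hypothetical policy name
```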
Deploying Cluster Autoscaler on EKS
Here is a production-ready Cluster Autoscaler deployment manifest with the key configuration flags we recommend:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: system-cluster-critical
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --scale-down-utilization-threshold=0.5
            - --scale-down-unneeded-time=10m
            - --scale-down-delay-after-add=10m
            - --scan-interval=10s
            - --max-node-provision-time=15m
          resources:
            requests:
              cpu: 100m
              memory: 600Mi
            limits:
              cpu: 100m
              memory: 600Mi
          env:
            - name: AWS_REGION
              value: eu-west-1
```
Key Flags Explained
- `--node-group-auto-discovery`: Discovers ASGs automatically by tag, so you do not need to hardcode ASG names. Tag your ASGs with `k8s.io/cluster-autoscaler/enabled=true` and `k8s.io/cluster-autoscaler/<cluster-name>=owned`.
- `--expander=least-waste`: When multiple node groups can accommodate pending pods, choose the one that results in the least wasted resources. The Cluster Autoscaler GitHub repository documents all available expanders.
- `--balance-similar-node-groups`: Keeps node counts balanced across node groups with identical scheduling properties—essential for multi-AZ deployments.
- `--scan-interval=10s`: How frequently CA checks for pending pods. The default is 10 seconds, but you can increase this to reduce API load (see tuning section below).
- `--scale-down-unneeded-time=10m`: A node must be underutilized for 10 minutes before CA considers it for removal.
Required ASG Tags
Your Auto Scaling Groups must have these tags for auto-discovery to work:
```json
{
  "Tags": [
    {
      "Key": "k8s.io/cluster-autoscaler/enabled",
      "Value": "true"
    },
    {
      "Key": "k8s.io/cluster-autoscaler/my-cluster",
      "Value": "owned"
    },
    {
      "Key": "kubernetes.io/cluster/my-cluster",
      "Value": "owned"
    }
  ]
}
```
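If your ASGs were created outside of EKS managed node groups, you can attach these tags with the AWS CLI. A sketch (the ASG name `my-asg` is a placeholder; `PropagateAtLaunch` can stay `false` because CA reads the tags from the ASG itself, not from the instances):

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=false"
```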
For scaling from zero (where no running nodes exist in the ASG), you also need template tags so CA knows what instance types the ASG provides:
```json
{
  "Tags": [
    {
      "Key": "k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type",
      "Value": "m5.xlarge"
    },
    {
      "Key": "k8s.io/cluster-autoscaler/node-template/resources/cpu",
      "Value": "4"
    },
    {
      "Key": "k8s.io/cluster-autoscaler/node-template/resources/memory",
      "Value": "16Gi"
    }
  ]
}
```
EKS Managed Node Groups: AWS-Managed ASGs
EKS Managed Node Groups simplify the ASG layer significantly. When you create a managed node group, AWS:
- Creates and manages the underlying ASG automatically
- Configures the launch template with the correct EKS-optimized AMI
- Handles node draining during updates (rolling updates with configurable surge)
- Tags the ASG for Cluster Autoscaler auto-discovery
- Supports graceful node termination via the node termination handler
This means you get the ASG lifecycle management without having to configure ASGs directly. However, you still need Cluster Autoscaler or Karpenter for pod-aware scaling. The managed node group does not change this fundamental requirement.
For a deeper dive into node group architecture, see our guide on EKS architecture best practices.
Tuning Cluster Autoscaler for Production
Based on the AWS EKS Cluster Autoscaler best practices documentation, here are the tuning parameters that matter most.
Scan Interval Tradeoffs
The `--scan-interval` flag controls how often CA checks for pending pods. The default is 10 seconds, but since launching a new EC2 instance takes 2+ minutes anyway, a more relaxed interval may be appropriate:
| Scan Interval | API Call Volume | Scale-Up Delay |
|---|---|---|
| 10s (default) | baseline | baseline |
| 30s | ~3× fewer calls | ~19% slower |
| 60s | ~6× fewer calls | ~38% slower |
For clusters with 500+ nodes, increasing the scan interval to 30-60 seconds significantly reduces AWS API throttling risk with minimal impact on scaling responsiveness.
Node Group Design
The AWS best practices are clear on this:
- Prefer fewer node groups with many nodes over many node groups with few nodes. This has the single biggest impact on CA scalability.
- All nodes in a group must have identical scheduling properties: same labels, taints, and resource profiles (CPU, memory, GPU).
- Use Namespaces for workload isolation instead of separate node groups.
- Define a single ASG spanning multiple Availability Zones rather than one ASG per AZ (unless you need EBS volume affinity).
MixedInstancePolicies
When using MixedInstancePolicies for cost optimization, ensure all instance types have similar CPU, memory, and GPU capacity. CA uses the first instance type in the policy for scheduling simulation. If subsequent types are smaller, pods may fail to schedule after scale-up because the actual instance has less capacity than expected.
Use the EC2 Instance Selector to find compatible instance types:
```shell
ec2-instance-selector --memory 16 --vcpus 4 --cpu-architecture x86_64 --gpus 0 -r eu-west-1
```
This returns instance types with matching resource profiles (e.g., m5.xlarge, m5a.xlarge, m5n.xlarge, m4.xlarge).
Overprovisioning Strategy
Overprovisioning ensures there is always spare capacity to schedule pods immediately, avoiding the 2-5 minute wait for new nodes. The formula from AWS documentation:
```
overprovisioned_nodes = (average_scale_up_frequency × node_launch_time) + number_of_AZs
```
Example: If you need a new node every 30 seconds and node launch takes 30 seconds, you need 1 overprovisioned node. Add 3 more if you run across 3 AZs for optimal zone selection with pod anti-affinity.
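The arithmetic is easy to get wrong when frequency and launch time use different units. A small sketch of the formula in Python, with units normalized to minutes (the function name is ours, not from any AWS tooling):

```python
import math

def overprovisioned_nodes(scale_ups_per_minute: float,
                          node_launch_minutes: float,
                          num_azs: int) -> int:
    """Spare nodes to keep warm: nodes consumed while replacements
    launch, rounded up, plus one per AZ for zone-aware scheduling."""
    in_flight = scale_ups_per_minute * node_launch_minutes
    return math.ceil(in_flight) + num_azs

# One new node every 30s (2/min), 30s launch time (0.5 min), 3 AZs:
print(overprovisioned_nodes(2, 0.5, 3))  # -> 4 (1 in-flight + 3 for AZs)
```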
Overprovisioning is implemented using low-priority “pause” pods that occupy space and get preempted when real workloads arrive:
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class for overprovisioning pods"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```
When a real pod needs scheduling, Kubernetes preempts the pause pod, which immediately frees up capacity. The pause pod then goes Pending, triggering CA to scale up a new node—but the real pod is already running.
Spot Instance Strategy with ASGs
Spot instances can reduce compute costs by up to 90%, but they require careful ASG configuration. Based on our experience and AWS best practices, follow these rules:
Separate On-Demand and Spot ASGs
Never mix On-Demand and Spot instances in the same ASG. They have fundamentally different scheduling properties:
- Spot nodes should carry a taint (e.g., `spotInstance=true:PreferNoSchedule`) so workloads must explicitly tolerate interruption risk.
- On-Demand nodes should run critical, interruption-sensitive workloads.
Use the Priority Expander for Spot Preference
Configure CA to prefer Spot node groups, falling back to On-Demand when Spot capacity is unavailable:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*on-demand.*
    50:
      - .*spot.*
```
CA will attempt to scale the Spot node group first—higher numbers mean higher priority, and the priority expander only reads this ConfigMap when CA runs with `--expander=priority`. If no Spot capacity is available within `--max-node-provision-time` (default 15 minutes), it falls back to the On-Demand group.
Maximize Instance Diversity
Use MixedInstancePolicies with 10+ instance types across multiple families to maximize Spot pool access. More instance type diversity means lower interruption rates and better capacity availability.
Karpenter: The Modern Alternative That Bypasses ASGs
Karpenter takes a fundamentally different approach. Instead of managing ASGs, Karpenter provisions EC2 instances directly using the EC2 Fleet API. This eliminates the ASG layer entirely and brings several advantages:
- Faster provisioning — Karpenter launches instances in under 60 seconds (vs. 2-5 minutes with CA + ASG).
- No node group management — You define `NodePool` and `EC2NodeClass` resources instead of pre-configured ASGs.
- Workload-aware instance selection — Karpenter selects the optimal instance type per pod, not per node group.
- Automatic consolidation — Karpenter continuously right-sizes the cluster by replacing underutilized nodes with smaller, better-fitting instances.
Karpenter NodePool Example
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  tags:
    Environment: production
```
With this configuration, Karpenter handles everything: instance type selection, AZ placement, Spot vs. On-Demand decisions, and node lifecycle. No ASGs, no node groups, no launch templates.
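Once Karpenter is running, you can inspect what it has provisioned directly with kubectl. A sketch assuming the Karpenter v1 CRDs are installed (the controller namespace may be `karpenter` rather than `kube-system`, depending on how you installed it):

```shell
# NodePools and their current resource usage vs. limits
kubectl get nodepools

# Individual machines Karpenter has launched (instance type, zone, capacity type)
kubectl get nodeclaims -o wide

# Karpenter's provisioning and consolidation decisions
kubectl -n kube-system logs deployment/karpenter --tail=50
```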
When to Use Each Approach
Not every scenario calls for the same solution. Here is our recommendation based on the workloads we have seen:
ASG-Only Scaling (No CA, No Karpenter)
Use this when:
- You are running non-Kubernetes workloads (traditional EC2 applications)
- You have a fixed, predictable number of nodes
- You use ASG scaling policies tied to application-level CloudWatch metrics (e.g., SQS queue depth)
- Kubernetes pod scheduling is not a factor
Cluster Autoscaler + ASG
Use this when:
- You have an existing EKS cluster with well-defined node groups
- Your workloads have predictable instance type requirements
- You need fine-grained control over node group composition
- Your organization has existing Terraform/CloudFormation managing ASGs
- You are running EKS managed node groups and want minimal migration effort
Karpenter (Recommended for New Clusters)
Use this when:
- You are building a new EKS cluster or ready to migrate
- You have diverse workloads with varying resource requirements
- You want faster scaling (sub-60-second node provisioning)
- You want automatic instance type selection and consolidation
- You are comfortable with Karpenter’s NodePool/EC2NodeClass model
For organizations evaluating these options, our guide on Kubernetes cost optimization strategies covers the financial implications of each approach in detail.
Protecting Critical Workloads During Scale-Down
Regardless of which autoscaler you use, protect critical workloads from eviction during scale-down:
```yaml
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```
This annotation tells Cluster Autoscaler never to evict the pod, which in turn prevents scale-down of whichever node it runs on. Use it for long-running batch jobs, ML training workloads, or stateful applications where interruption is costly.
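Note that the annotation belongs on the pod, not on the workload object. In a Deployment, that means `spec.template.metadata.annotations`. A minimal sketch with illustrative names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker        # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sleep", "infinity"]
```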
Also configure PodDisruptionBudgets to ensure minimum availability during voluntary disruptions:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app
```
Common Mistakes to Avoid
Based on our experience across hundreds of EKS deployments, these are the most frequent errors we encounter:
- Assuming ASGs auto-scale based on pod demand. They do not. You need CA or Karpenter.
- Running ASG scaling policies alongside Cluster Autoscaler. This creates scaling conflicts. Remove ASG policies and let CA control `DesiredCapacity`.
- Creating too many small node groups. Each additional node group increases CA scan time and complexity. Consolidate where possible.
- Mismatched instance types in MixedInstancePolicies. All instance types must have similar CPU, memory, and GPU. Use the EC2 Instance Selector to verify compatibility.
- Mixing Spot and On-Demand in the same ASG. Use separate ASGs with taints on Spot nodes.
- Not setting resource requests on pods. CA makes scaling decisions based on resource requests, not actual usage. Without requests, CA cannot determine if a node is underutilized.
- Using a CA version that does not match the Kubernetes version. Cross-version compatibility is not tested. Always match the CA minor version to your cluster version.
Teams working with AWS managed services often benefit from our guidance on avoiding these pitfalls early in their EKS journey.
Summary
The relationship between Cluster Autoscaler and Auto Scaling Groups is not either/or—it is a layered architecture where each component has a specific role. The ASG manages EC2 instance lifecycle. The Cluster Autoscaler provides the Kubernetes-aware intelligence that tells the ASG when and how to scale. Without CA (or Karpenter), your ASG is just a static pool of instances that has no idea your pods are stuck in Pending.
For teams building new clusters, Karpenter offers a simpler, faster alternative that eliminates the ASG layer entirely. For existing clusters with well-established ASG infrastructure, Cluster Autoscaler remains a proven, reliable choice—as long as you configure it correctly.
If you are evaluating your EKS autoscaling strategy, our Kubernetes consulting team can help you design the right approach for your workloads, whether that means tuning your existing CA configuration or migrating to Karpenter.
Get Expert Help with EKS Autoscaling
Configuring Cluster Autoscaler and Auto Scaling Groups correctly on EKS requires deep understanding of both Kubernetes scheduling and AWS infrastructure.
Our team provides expert EKS consulting services to help you:
- Configure Cluster Autoscaler and ASGs for optimal scaling on EKS
- Migrate from CA+ASG to Karpenter for faster, more cost-efficient scaling
- Implement Spot instance strategies with proper ASG configuration for up to 90% savings
We have architected and optimized EKS autoscaling for 100+ production clusters.