
EKS Migration: We Moved 100+ Workloads - Here's What Broke

We've run 100+ EKS migrations - ECS, on-prem, self-managed K8s, and version upgrades. Here's the real playbook, the gotchas, and what actually breaks in production.

Tasrie IT Services

Most “EKS migration guide” posts you’ll find are written by someone who did it once, on a side project, with three pods. We’ve run more than 100 EKS migrations in production - some clean, some painful, a few we’d do very differently if we could go back. This is the post we wish existed when we started.

We’ll cover the four migration scenarios people actually mean when they say “EKS migration”, the order you should do things in, and the specific places we’ve seen teams burn weeks of engineering time. There’s also a real cost note at the end, because every other guide skips it.

What “EKS migration” actually means

When someone says “we’re doing an EKS migration”, they usually mean one of four very different projects:

  1. ECS to EKS - moving container workloads from Amazon ECS (Fargate or EC2) onto Kubernetes
  2. Self-managed or on-prem Kubernetes to EKS - moving from kubeadm, kops, OpenShift, or bare-metal K8s
  3. Another cloud’s Kubernetes to EKS - GKE, AKS, or DigitalOcean clusters moving to AWS
  4. An existing EKS cluster, modernized - upgrading versions, switching to AL2023 nodes, adopting Karpenter or EKS Auto Mode

The tools, risks, and timeline are different for each. Before you write a single Terraform file, agree internally which one you’re actually doing. We’ve watched teams plan for scenario 1 and discover halfway through that they were really doing scenario 4 in disguise.

A quick decision table:

If you’re moving from… | Your real project is…               | Hardest part
ECS Fargate / EC2      | Workload migration                  | Translating task definitions to manifests, IAM rework
On-prem K8s            | Workload + control plane migration  | Networking (VPC, DNS), storage classes, secrets
GKE / AKS              | Workload migration + cloud rewiring | IAM model, ingress, observability stack
Older EKS              | Version + data plane modernization  | API deprecations, Karpenter cutover, AMI changes

The rest of this post is structured the same way. Skip to the section that matches you.


Scenario 1: ECS to EKS

This is the most common one we see. Teams hit ECS limits around autoscaling flexibility, multi-tenant workloads, GitOps tooling, or hiring (Kubernetes engineers are easier to find than ECS ones). Before you commit, it’s worth reading our ECS vs Kubernetes comparison - sometimes the answer is “stay on ECS and fix the actual problem”.

If you’ve decided to move, here’s the order that works.

Step 1: Set up the target EKS cluster in the same VPC

This is non-negotiable. Don’t put EKS in a fresh VPC with the idea of doing a “clean migration”. You will want service-to-service traffic between ECS and EKS during the cutover, and you’ll want the same security groups, route tables, and private endpoints. Same VPC, new subnets if you need them.

The architecture choices you make here matter for years. We’ve written up the patterns we use in EKS architecture best practices - read that before you terraform apply.

Step 2: Translate ECS task definitions to Kubernetes manifests

ECS task definitions and Kubernetes manifests look superficially similar. They are not. Things that bite people:

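  • Containers marked essential: true have no direct Kubernetes equivalent. In ECS, an essential container exiting stops the whole task; in Kubernetes, restartPolicy is set at the pod level and the kubelet restarts a failed container in place while the other containers keep running. If your task had a sidecar that should kill the whole task on exit, you need a different pattern (init containers, shared termination, or just rethink the dependency).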
  • ECS service discovery via Cloud Map is not the same as Kubernetes Services. Don’t try to keep Cloud Map running for EKS pods. Use Kubernetes Services with an internal ALB or use Envoy Gateway for ingress.
  • ECS environment variables from SSM Parameter Store need to be re-architected. We strongly recommend moving secrets to AWS Secrets Manager and pulling them into pods with the External Secrets Operator. See Kubernetes secrets in 2026 for the pattern.
  • Task IAM roles become IRSA (IAM Roles for Service Accounts) or, on newer clusters, EKS Pod Identity. Pod Identity is the better choice in 2026 - it doesn’t require an OIDC provider per cluster and the audit story is cleaner.
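To make the translation concrete, here is a minimal sketch that maps a trimmed-down ECS task definition onto a Kubernetes Deployment, expressed as a Python dict. The function name task_to_deployment, the sample task, and the service account name are hypothetical. The ECS keys it reads (family, containerDefinitions, portMappings, environment, cpu, memory) are real task-definition fields. It deliberately ignores essential, secrets, and volumes, which need case-by-case decisions.

```python
import json


def task_to_deployment(task_def: dict, replicas: int, service_account: str) -> dict:
    """Map a simplified ECS task definition onto a Kubernetes Deployment dict."""
    name = task_def["family"]
    containers = []
    for c in task_def["containerDefinitions"]:
        container = {
            "name": c["name"],
            "image": c["image"],
            "ports": [{"containerPort": p["containerPort"]} for p in c.get("portMappings", [])],
            "env": [{"name": e["name"], "value": e["value"]} for e in c.get("environment", [])],
        }
        requests = {}
        if "cpu" in c:
            # ECS counts 1024 CPU units per vCPU; Kubernetes counts 1000 millicores per CPU.
            requests["cpu"] = f"{c['cpu'] * 1000 // 1024}m"
        if "memory" in c:
            # The ECS container-level "memory" value is a hard limit in MiB.
            requests["memory"] = f"{c['memory']}Mi"
        if requests:
            container["resources"] = {"requests": requests}
            if "memory" in requests:
                container["resources"]["limits"] = {"memory": requests["memory"]}
        containers.append(container)

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                # The service account is what IRSA or Pod Identity binds the IAM role to.
                "spec": {"serviceAccountName": service_account, "containers": containers},
            },
        },
    }


# Hypothetical task definition, trimmed to the fields the sketch understands.
example_task = {
    "family": "orders",
    "containerDefinitions": [
        {
            "name": "orders",
            "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/orders:1.4.2",
            "cpu": 256,
            "memory": 512,
            "portMappings": [{"containerPort": 8080}],
            "environment": [{"name": "LOG_LEVEL", "value": "info"}],
        }
    ],
}

print(json.dumps(task_to_deployment(example_task, replicas=2, service_account="orders"), indent=2))
```

Treat the output as a starting point for review, not a finished manifest: probes, secrets, and anything that depended on essential still have to be designed by hand.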

If you have Docker Compose files lying around from local development, our Docker Compose to Kubernetes guide walks through the manifest translation more thoroughly.

Step 3: Install the controllers and CSI drivers you’ll need

Before you deploy a single app pod, install:

  • AWS Load Balancer Controller - for ALB and NLB ingresses
  • EBS CSI driver (managed add-on) - for persistent volumes
  • EFS CSI driver if you have shared filesystem needs
  • External Secrets Operator - to pull secrets from Secrets Manager
  • External DNS - to manage Route 53 records from Kubernetes objects
  • Karpenter or stick with Managed Node Groups (more on this below)

We deploy all of these through Helm and pin versions. Do not run them off latest. We’ve been bitten by Karpenter and AWS Load Balancer Controller minor releases that changed CRD schemas.
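One cheap way to enforce pinning is a check in CI that fails when any release lacks an exact version. The sketch below assumes you can export your releases as a simple name-to-version mapping; the function name and the version numbers in the sample are made up for illustration.

```python
import re

# Matches an exact version such as "1.8.1" or "v1.8.1".
# Ranges ("^1.14.0"), wildcards, and "latest" do not match.
PINNED = re.compile(r"^v?\d+\.\d+\.\d+$")


def unpinned_releases(releases: dict[str, str]) -> list[str]:
    """Return the names of releases whose chart version is not an exact version."""
    return sorted(name for name, version in releases.items() if not PINNED.match(version))


# Hypothetical release list; the version numbers are placeholders.
releases = {
    "aws-load-balancer-controller": "1.8.1",
    "karpenter": "latest",
    "external-dns": "^1.14.0",
    "external-secrets": "0.10.4",
}

print(unpinned_releases(releases))
```

In this sample, karpenter and external-dns are reported because neither is pinned to an exact version.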

Step 4: Migrate state before traffic

Stateless apps are easy. State is what makes ECS-to-EKS hard. Some patterns we use:

  • RDS, ElastiCache, S3 - no migration needed. Same VPC, same endpoints, EKS pods connect the same way ECS tasks did. Just make sure your security groups allow EKS pod CIDR ranges.
  • EBS volumes attached to ECS tasks - these don’t move directly. If the data lives on the volume, snapshot it and restore into a new EBS volume that the EBS CSI driver provisions for the pod. The PVC has to match the AZ of the volume.
  • EFS - this is the easy case. Same EFS file system, mount it from EKS via the EFS CSI driver. Done.
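For the security group point in the first bullet, a quick sanity check is to compare the CIDR ranges your pods draw from against the ranges your data-store security groups allow. This is a minimal sketch using only the standard library. It assumes CIDR-based ingress rules (rules that reference another security group are not covered), and the function name and sample ranges are hypothetical.

```python
import ipaddress


def uncovered_pod_cidrs(pod_cidrs: list[str], allowed_cidrs: list[str]) -> list[str]:
    """Return pod CIDRs that are not fully contained in any allowed CIDR."""
    allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
    missing = []
    for cidr in pod_cidrs:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed, so compare versions first.
        covered = any(net.version == a.version and net.subnet_of(a) for a in allowed)
        if not covered:
            missing.append(cidr)
    return missing


# Hypothetical ranges: one pod subnet inside the VPC CIDR, one secondary CIDR for pods.
pod_cidrs = ["10.0.64.0/20", "100.64.0.0/18"]
allowed_by_rds_sg = ["10.0.0.0/16"]

print(uncovered_pod_cidrs(pod_cidrs, allowed_by_rds_sg))
```

Here the secondary range 100.64.0.0/18 is flagged, which is the typical miss when custom networking puts pods on a CIDR the original ECS-era rules never knew about.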

Step 5: Cutover with weighted DNS

Don’t do a big-bang cutover. Use Route 53 weighted records or ALB weighted target groups to send 5%, then 25%, then 50%, then 100% of traffic to EKS. We typically run this over 3-7 days for production services. Keep the ECS service warm and scaled the whole time, in case you need to roll back.

A rollback isn’t just “shift traffic back” - it’s “shift traffic back AND make sure state changes that happened on EKS don’t break things on ECS”. Plan that scenario explicitly.
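As an illustration of the weighted shift, the sketch below builds the change batch for two weighted records, one pointing at ECS and one at EKS. The dict follows the ChangeBatch shape that Route 53's ChangeResourceRecordSets API expects. The record names, targets, and the function name are hypothetical, and it uses CNAME records for simplicity; for ALBs you would more likely use alias A records.

```python
def weighted_change_batch(record_name: str, ecs_target: str, eks_target: str,
                          eks_percent: int, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that splits traffic between ECS and EKS."""
    if not 0 <= eks_percent <= 100:
        raise ValueError("eks_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes the two weighted records
                "Weight": weight,         # relative weight; the two here sum to 100
                "TTL": ttl,               # short TTL so a rollback takes effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"Shift {eks_percent}% of traffic to EKS",
        "Changes": [
            record("ecs", ecs_target, 100 - eks_percent),
            record("eks", eks_target, eks_percent),
        ],
    }


# The stages described above; each one would be applied and observed before the next.
STAGES = [5, 25, 50, 100]

for percent in STAGES:
    batch = weighted_change_batch("api.example.com", "ecs-alb.example.com",
                                  "eks-alb.example.com", percent)
    weights = [c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]]
    print(percent, weights)
```

Rolling back is the same call with a lower eks_percent, which is why keeping the ECS record in place (even at weight 0) is worth the clutter.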

What we’ve seen go wrong

  • IRSA setup forgotten until pods crash. Teams build the cluster, deploy apps, and then realize the apps can’t talk to S3 because there’s no service account binding. Set up IRSA or Pod Identity in the bootstrap, not after.
  • Security group sprawl. ECS task ENIs and EKS pod ENIs both end up in the VPC. We’ve seen security group rule counts hit the AWS default quotas (60 inbound and 60 outbound rules per security group, 5 security groups per network interface) because both worlds were running in parallel.
  • CloudWatch Container Insights bills. If you turn on Container Insights for EKS without changing the log retention, your CloudWatch bill can grow 5-10x compared to ECS. Set retention to 7-14 days and ship logs out if you need longer.
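On the first point, the pieces you need at bootstrap are small. With EKS Pod Identity, the IAM role trusts the pods.eks.amazonaws.com service principal, and an association ties that role to a namespace and service account. The sketch below builds both as plain dicts. The cluster, namespace, service account, and role ARN are placeholders, and the association dict is shaped for boto3's create_pod_identity_association call, which we assume is how you would apply it.

```python
import json


def pod_identity_trust_policy() -> dict:
    """Trust policy that lets EKS Pod Identity assume the role on behalf of pods."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def association_request(cluster: str, namespace: str, service_account: str, role_arn: str) -> dict:
    """Keyword arguments for an EKS Pod Identity association (one per service account)."""
    return {
        "clusterName": cluster,
        "namespace": namespace,
        "serviceAccount": service_account,
        "roleArn": role_arn,
    }


# Placeholder values for illustration only.
request = association_request(
    cluster="prod-eks",
    namespace="orders",
    service_account="orders",
    role_arn="arn:aws:iam::123456789012:role/orders-s3-reader",
)

print(json.dumps(pod_identity_trust_policy(), indent=2))
print(json.dumps(request, indent=2))
```

Generating one association per workload from the same inventory you use for the IAM mapping keeps the "which pod talks to which AWS service" question answered in one place.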

Scenario 2: Self-managed or on-prem Kubernetes to EKS

This is the scenario AWS marketing materials make sound easy (“EKS is upstream Kubernetes!”). It’s true, but the migration is mostly about the environment around Kubernetes, not Kubernetes itself.

What actually changes

  • IAM model. On-prem you probably had service accounts and maybe Vault. On AWS, you’ll wire up IRSA or Pod Identity. Every workload that talks to AWS resources needs new bindings.
  • Storage classes. Your CSI drivers change. Anything that was on Ceph, vSphere, or local-path needs to move to EBS, EFS, or FSx.
  • Networking. Calico, Cilium, or whatever CNI you ran is replaced by AWS VPC CNI by default (you can run Cilium on top, and we often do for network policy). Pod IPs become VPC IPs - this matters for IP exhaustion planning. Use prefix delegation or custom networking if you’re tight on IP space.
  • Ingress. Nginx Ingress or HAProxy will still work, but the right answer on AWS is usually AWS Load Balancer Controller with ALB or NLB. Cheaper, integrated with WAF and ACM, less to operate.
  • Observability. If you were running Prometheus, Grafana, Loki, and Jaeger on-prem, that all still works on EKS. Don’t feel pressured to move to CloudWatch + X-Ray. We typically keep the open-source stack and just put it on EKS.

The migration pattern

For self-managed to EKS, we use GitOps-driven parallel deployment rather than velero-style backup-restore. The flow:

  1. Stand up the EKS cluster, install controllers and CSI drivers
  2. Get your GitOps tool (ArgoCD or Flux) deploying to both clusters simultaneously
  3. Re-create namespaces, RBAC, and workload manifests on EKS
  4. Migrate persistent data (this is the hard part - snapshot, restore, or replication depending on the database)
  5. Shift traffic via DNS or load balancer
  6. Decommission the old cluster

velero is great for disaster recovery within the same environment. We don’t recommend it for cross-environment migration because PVs, storage classes, IAM, and networking all change. You’ll spend more time fixing the restored manifests than you would re-applying them through GitOps.

For more on the broader cloud migration playbook, our 6Rs cloud migration framework covers the strategic decision of replatform vs rehost vs refactor.


Scenario 3: GKE or AKS to EKS

Less common, but we see it when companies consolidate clouds or when AWS commits are driving the conversation. If you’re still evaluating which managed Kubernetes to land on, our EKS vs AKS vs GKE comparison covers the operational tradeoffs.

The pattern looks like scenario 2 (self-managed to EKS) but with three additional headaches:

  1. Identity rewrite. Workload Identity (GKE) and Azure AD Pod Identity / Workload Identity (AKS) both translate to IRSA or Pod Identity on EKS, but the per-workload IAM permissions usually need to be re-derived from scratch. Budget a week for an engineer to map each service account.
  2. Network egress patterns. If your apps talk to other GCP or Azure services (Cloud SQL, Cosmos DB, Pub/Sub), the cross-cloud egress charges add up fast. Either move those services to AWS too, or connect the clouds over a site-to-site VPN or a dedicated interconnect (AWS Direct Connect paired with Cloud Interconnect or ExpressRoute) and accept the egress bill.
  3. Ingress and DNS. GKE Ingress and AKS Application Gateway behaviors don’t map 1:1 to AWS ALB. Path-based routing rules, header manipulation, and TLS termination all need re-testing.

Cross-cloud migrations also have a hidden cost: the team has to learn AWS while doing the work. We typically embed at least one AWS-experienced engineer for the duration, even if it’s just for office hours.


Scenario 4: Modernizing an existing EKS cluster

If you already have EKS but you’re on an old version, running AL2 nodes, or still using Cluster Autoscaler, this is the migration you actually need.

Version upgrades

EKS keeps each Kubernetes minor version in standard support for 14 months after it becomes available on EKS; extended support adds another 12 months, but at significant cost. At the time of writing in 2026, anything older than v1.30 is in extended support territory and you’re paying a premium for the privilege.

The upgrade order matters:

  1. Run kubent or Pluto against your cluster to find deprecated APIs before you upgrade
  2. Upgrade managed add-ons (VPC CNI, CoreDNS, kube-proxy) - one minor version at a time
  3. Upgrade the control plane - EKS only lets you go one minor version at a time
  4. Drain and replace worker nodes with the new version - this is where you actually catch problems

Never skip minor versions. EKS forces you to do them sequentially. We’ve seen people try to script their way around it and the only thing they create is an outage.
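Because every hop is a separate upgrade, it helps to write the path down before scheduling anything. A minimal sketch, assuming versions are given as "major.minor" strings (the function name is ours):

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """List every minor version that has to be stepped through, in order."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # One control plane upgrade (plus add-on and node rollout) per entry.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.29", "1.32"))
```

For a cluster on 1.29 targeting 1.32, that is three full upgrade cycles (1.30, 1.31, 1.32), each with its own deprecated-API scan, add-on bump, and node replacement.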

Migrating to AL2023 nodes

Amazon Linux 2 reaches end of life on June 30, 2026, and AWS stopped publishing EKS-optimized AL2 AMIs in November 2025. If you’re still on AL2 AMIs, move to AL2023 now. The migration is mostly painless if you use Managed Node Groups - just change the AMI type and let EKS roll the nodes. The places we’ve seen it break:

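  • Userdata scripts that call the AL2 bootstrap script (/etc/eks/bootstrap.sh) or assume AL2 cloud-init paths won’t work the same way on AL2023
  • systemd unit names changed for some services
  • Custom kubelet config that was baked into AL2 AMIs needs to be re-applied through nodeadm’s NodeConfig, the declarative YAML document that AL2023 reads from user data in place of bootstrap.sh arguments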

If you’re running Bottlerocket already, you’re in a good spot - just keep up with version pinning.
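On AL2023, kubelet settings are supplied through nodeadm's NodeConfig document rather than bootstrap.sh arguments. The sketch below renders a minimal NodeConfig as YAML text so it can be templated into a launch template. The maxPods value and labels are placeholders, and the function name is ours. It covers only the kubelet section: we assume a managed node group where EKS supplies the cluster details, while self-managed nodes also need a spec.cluster block. The document still has to be wrapped in MIME multipart user data with the application/node.eks.aws content type.

```python
def node_config_yaml(max_pods: int, node_labels: dict[str, str]) -> str:
    """Render a minimal nodeadm NodeConfig (kubelet section only) as YAML text."""
    lines = [
        "apiVersion: node.eks.aws/v1alpha1",
        "kind: NodeConfig",
        "spec:",
        "  kubelet:",
        "    config:",
        f"      maxPods: {max_pods}",
    ]
    if node_labels:
        # Labels that used to travel in --kubelet-extra-args on AL2 become a kubelet flag.
        labels = ",".join(f"{key}={value}" for key, value in sorted(node_labels.items()))
        lines += ["    flags:", f"      - --node-labels={labels}"]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration.
print(node_config_yaml(max_pods=58, node_labels={"team": "payments", "tier": "backend"}))
```

Diffing the rendered document against what your AL2 user data used to pass is a quick way to spot settings that would otherwise silently disappear in the move.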

Switching from Cluster Autoscaler to Karpenter

If you’re still on Cluster Autoscaler, our Cluster Autoscaler vs Karpenter comparison is worth reading first. The short version: Karpenter is faster, simpler, and gives you better bin packing. The migration is:

  1. Install Karpenter alongside Cluster Autoscaler (they don’t conflict if configured carefully)
  2. Create NodePools that match your existing taints and labels
  3. Cordon and drain Cluster Autoscaler-managed nodes one at a time
  4. Let Karpenter provision replacements
  5. Remove Cluster Autoscaler

We typically run this over a week for a production cluster. Karpenter’s instance selection is broader than Cluster Autoscaler’s, which is part of why it saves money - but it also means you’ll see instance types in your fleet you didn’t expect. Make sure your monitoring and security tools handle that.
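Step 2 is the one people rush. A small generator that copies labels and taints from each existing node group into a NodePool keeps scheduling behavior identical during the drain. This sketch assumes Karpenter's v1 API (karpenter.sh/v1) and an existing EC2NodeClass named "default". The function name, node group values, and the single on-demand requirement are illustrative.

```python
def nodepool_from_nodegroup(name: str, labels: dict[str, str], taints: list[dict],
                            node_class: str = "default") -> dict:
    """Build a Karpenter NodePool that carries over a node group's labels and taints."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                # Same labels, so existing nodeSelectors and affinities keep matching.
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    # Same taints, so only pods with matching tolerations land here.
                    "taints": [
                        {"key": t["key"], "value": t["value"], "effect": t["effect"]}
                        for t in taints
                    ],
                    "requirements": [
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["on-demand"]},
                    ],
                },
            },
        },
    }


# Hypothetical node group being replaced.
nodepool = nodepool_from_nodegroup(
    name="batch",
    labels={"team": "data"},
    taints=[{"key": "dedicated", "value": "batch", "effect": "NoSchedule"}],
)
print(nodepool["spec"]["template"]["spec"]["taints"])
```

Start with narrow requirements that mirror the old node group, then widen instance selection once the cutover is stable.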

EKS Auto Mode

EKS Auto Mode (announced in late 2024, mature by 2026) is AWS’s response to GKE Autopilot. It manages nodes, Karpenter, networking, and storage controllers for you. If you’re starting fresh, it’s worth considering. If you have an existing cluster with custom CNI config, custom networking, or specific compliance requirements, Auto Mode is harder to retrofit and you may not want it. We’ve seen the best results on greenfield clusters where the team wants to focus on apps, not infrastructure.


The pre-migration checklist nobody writes

Regardless of which scenario you’re in, these are the things we make every team check before they touch production:

  • IAM and IRSA / Pod Identity strategy documented. Which workload talks to which AWS service, with which permissions?
  • VPC IP capacity confirmed. Pod IPs come from the VPC. Do the math: nodes x pods-per-node + headroom. Plan for prefix delegation if you’re tight.
  • Storage class mapping done. Every existing PVC needs a target storage class on EKS, and every storage class needs an AZ strategy.
  • Ingress / load balancer strategy decided. ALB vs NLB vs custom, with WAF, ACM, and Route 53 plans.
  • Observability stack decided. Prometheus + Grafana, CloudWatch Container Insights, Datadog, or something else. Pick before migration, not after.
  • Cost baseline measured. How much are you spending today? What’s the projected EKS cost? (More on this below.)
  • Rollback plan written down. What does a rollback look like 24 hours into the migration? 7 days in? After DNS is fully cut over?
  • Communication plan. Who gets paged on the day of cutover? Who approves the go/no-go?
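The VPC IP capacity check is simple arithmetic, and worth scripting so nobody does it from memory. The sketch below applies the nodes x pods-per-node + headroom formula and compares it with the usable addresses in the pod subnets. AWS reserves 5 addresses in every subnet. The node counts, headroom, and subnet CIDRs are made-up inputs, and the calculation ignores addresses used by nodes themselves, load balancers, and other ENIs.

```python
import ipaddress
import math

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def pod_ip_demand(nodes: int, pods_per_node: int, headroom_percent: int = 20) -> int:
    """nodes x pods-per-node, plus headroom, rounded up."""
    base = nodes * pods_per_node
    return math.ceil(base * (100 + headroom_percent) / 100)


def usable_ips(subnet_cidrs: list[str]) -> int:
    """Total assignable addresses across the given subnets."""
    return sum(
        ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
        for cidr in subnet_cidrs
    )


# Hypothetical cluster: 50 nodes, 30 pods each, three /23 pod subnets.
demand = pod_ip_demand(nodes=50, pods_per_node=30, headroom_percent=20)
supply = usable_ips(["10.0.0.0/23", "10.0.2.0/23", "10.0.4.0/23"])

print(f"demand={demand} supply={supply} enough={supply >= demand}")
```

In this example the subnets fall short (1,521 usable addresses against a demand of 1,800), which is the signal to add subnets, a secondary CIDR, or plan address allocation differently before cutover.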

If you’ve got broader cloud migration planning to do, our enterprise cloud migration checklist has the wider scope.


The cost reality

Other guides don’t talk about this, so we will.

EKS has a control plane fee: $0.10 per hour per cluster, or about $73 per month. Extended support adds $0.50/hour. This is small if you have one cluster and a lot of workloads. It hurts if you’ve sprawled into 30 clusters because every team wanted their own.
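The arithmetic behind those numbers, as a small sketch. The rates are the ones quoted above; 730 is the average number of hours in a month, and the cluster counts in the example are hypothetical.

```python
HOURS_PER_MONTH = 730        # 8,760 hours per year / 12
STANDARD_RATE = 0.10         # USD per cluster-hour
EXTENDED_SURCHARGE = 0.50    # additional USD per cluster-hour in extended support


def monthly_control_plane_cost(clusters: int, extended_support_clusters: int = 0) -> float:
    """Monthly control plane fee for a fleet, in USD."""
    standard = clusters * STANDARD_RATE * HOURS_PER_MONTH
    extended = extended_support_clusters * EXTENDED_SURCHARGE * HOURS_PER_MONTH
    return round(standard + extended, 2)


print(monthly_control_plane_cost(1))                                 # one cluster
print(monthly_control_plane_cost(30))                                # 30-cluster sprawl
print(monthly_control_plane_cost(30, extended_support_clusters=10))  # 10 of them on old versions
```

One cluster is about $73 a month; 30 clusters is $2,190; and if 10 of those 30 have drifted into extended support, the control plane bill alone reaches $5,840.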

The bigger cost drivers we see post-migration:

  1. CloudWatch Container Insights. Can easily double your CloudWatch bill if logs and metrics are unbounded. Set retention. Consider shipping to S3 + Athena for cheap long-term storage.
  2. NAT Gateway data processing. Pods pulling images from public registries through a NAT Gateway is a sneaky bill. Use ECR pull-through cache or a registry proxy.
  3. Cross-AZ traffic. EKS pods talking across AZs are billed at $0.01/GB each way. Topology-aware routing helps. So does keeping chatty services in the same zone.
  4. Karpenter savings. Properly tuned Karpenter typically saves 20-40% on compute vs MNG with Cluster Autoscaler. We’ve seen 60% in extreme cases (lots of bursty workloads, mixed instance types).
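To put a number on the cross-AZ item, a rough estimate only needs the traffic volume. The sketch uses the $0.01/GB each-way rate quoted above; the daily volume is a hypothetical input you would take from VPC flow logs or your service mesh metrics.

```python
CROSS_AZ_RATE_PER_GB = 0.01  # USD per GB, charged in each direction


def monthly_cross_az_cost(gb_per_day: float, days: int = 30) -> float:
    """Estimated monthly charge for pod-to-pod traffic that crosses AZ boundaries."""
    # The same byte is billed once leaving the source AZ and once entering the destination AZ.
    return round(gb_per_day * days * CROSS_AZ_RATE_PER_GB * 2, 2)


print(monthly_cross_az_cost(500))
```

At 500 GB a day of cross-AZ chatter, that is roughly $300 a month for traffic that topology-aware routing could largely keep in-zone.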

A typical migration we run pays back its infrastructure cost in 3-6 months through Karpenter and right-sizing, assuming the team was on oversized EC2 or ECS Fargate before.


Timeline expectations

These are real numbers from real engagements:

Migration type                          | Realistic timeline | Where the time goes
ECS to EKS, <20 services                | 6-10 weeks         | Manifest translation, IAM rewrite, cutover
ECS to EKS, 50+ services                | 4-6 months         | Per-service migration, testing, parallel running
On-prem to EKS, single cluster          | 3-5 months         | Networking, storage, GitOps setup
GKE/AKS to EKS                          | 4-8 months         | Identity rewrite, cross-cloud egress, observability
EKS version upgrade (one minor version) | 2-3 weeks          | Add-on prep, controlled rollout, validation
Cluster Autoscaler to Karpenter         | 1-3 weeks          | NodePool setup, gradual cutover

Beware anyone who tells you their EKS migration took two weeks. Either it was tiny, or they’re about to find a problem in production.

For more on what tends to go wrong across cloud migrations broadly, common cloud migration challenges covers the patterns.


FAQ

What does EKS stand for?

EKS stands for Amazon Elastic Kubernetes Service. It’s AWS’s managed Kubernetes offering, where AWS runs the control plane and you manage (or let them manage, with Auto Mode) the worker nodes.

Why migrate from ECS to EKS?

The most common reasons we hear: more flexible autoscaling, a richer ecosystem (Helm charts, operators, GitOps tools), easier hiring (Kubernetes skills are more portable than ECS), and multi-cloud or hybrid strategy where Kubernetes is the common runtime. If your team is happy on ECS and you’re not hitting walls, you don’t have to migrate. ECS is simpler to operate for small footprints.

What is Kubernetes migration?

Kubernetes migration usually means one of: moving workloads onto Kubernetes for the first time (from VMs, ECS, or PaaS), moving between Kubernetes distributions (e.g., OpenShift to EKS), or upgrading a Kubernetes cluster to a newer version. The tooling and risk are different for each.

Why move from EC2 to EKS?

EKS gives you declarative deployments, automated rollouts and rollbacks, built-in service discovery, autoscaling that responds to pod demand (not just CPU), and a huge ecosystem of operators and controllers. The tradeoff is operational complexity - you’re now running a control plane (managed) plus all the controllers and add-ons. For a small fleet, plain EC2 with Auto Scaling Groups is still simpler.

Can I migrate from EKS to AKS or GKE later?

Yes. Workloads written as standard Kubernetes manifests are portable. The non-portable parts are usually: cloud-specific IAM, ingress controllers, storage classes, and observability integrations. If portability matters to you, keep those layers thin (e.g., use the standard Ingress and Gateway APIs, not ALB-specific annotations everywhere).

Should I use EKS Auto Mode?

If you’re starting a new cluster and don’t have strict compliance or custom networking needs, yes. If you have an existing cluster with custom CNI, Calico/Cilium policies, or specific node-level controls, Auto Mode is harder to retrofit and the operational savings are smaller. Pilot it on a non-production cluster first.

How long does an EKS migration take?

For a typical ECS-to-EKS migration of fewer than 20 services, plan 6-10 weeks; at 50+ services, plan 4-6 months. For self-managed Kubernetes to EKS on a single cluster, 3-5 months. Version upgrades within EKS are 2-3 weeks per minor version. The biggest variable is how much state you’re moving and how many teams need to be coordinated.

Can I run EKS and ECS at the same time during migration?

Yes, and you should. Run them in the same VPC, share data stores (RDS, ElastiCache, S3), and shift traffic gradually with weighted DNS or ALB target groups. This gives you a real rollback path.


Ready to plan your EKS migration?

EKS migrations look straightforward on paper and get messy in practice. The teams that succeed are the ones that pick the right migration scenario, plan IAM and networking first, and move state carefully before they move traffic.

Our team provides hands-on EKS consulting services to help you:

  • Plan the right migration path based on your source environment, workload state, and team capacity
  • Build the EKS landing zone with IRSA / Pod Identity, VPC CNI tuning, and the controllers you’ll need on day one
  • Execute the cutover with weighted traffic shifts, rollback rehearsals, and zero-downtime patterns for stateful workloads
  • Modernize after migration with Karpenter, EKS Auto Mode, and cost-aware right-sizing

We’ve run more than 100 EKS migrations across ECS, on-prem, and other clouds. We know which problems happen at week two and which ones don’t show up until month six.

Talk to our EKS migration team →


Published on June 3, 2026
