VMware to AWS EC2 Migration: The 5-Phase Playbook (2026)

How to migrate VMware VMs to AWS EC2 in 5 phases: networking, app discovery, snapshot to AMI, choosing MGN vs DMS vs VM Import, and go-live validation.

Tasrie IT Services

Most “VMware to AWS” guides jump straight to tooling. They tell you to run MGN or VM Import without explaining why, in what order, or what has to be in place before you press the button. This post walks through the five phases of an actual VMware-to-EC2 migration in the order they happen, with the choices that come up at each phase and the parts that break if you skip them.

If you’re at the strategic decision stage and not sure whether EC2 is even the right target, the VMware migration off Broadcom playbook covers the four real paths (AVS, EVS, native AWS, native Azure) and when each one wins. This post assumes you’ve already picked native AWS as the destination and now need to execute.

The five phases

| Phase | What happens | Typical duration |
| --- | --- | --- |
| 1. Landing zone | VPCs, subnets, connectivity, identity, baseline security | 4-8 weeks |
| 2. Discovery and dependency mapping | Inventory, app dependencies, wave planning | 2-4 weeks |
| 3. Migration execution | VMs moved to EC2 via MGN, VM Import, or per-workload tools | 3-12 months (wave-based) |
| 4. Database and data migration | DBs moved via DMS or native replication; file shares via DataSync | Runs alongside phase 3 |
| 5. Validation and go-live | QA, business sign-off, cutover, post-cutover monitoring | 2-4 weeks per wave |

These phases overlap. Phase 1 starts first and never fully ends. Phases 3, 4, and 5 run in parallel for different application waves.


Phase 1: Set up the AWS landing zone

The biggest mistake at this phase is treating it as “make a VPC and start moving VMs”. The landing zone is the part that has to outlive every workload and every team rotation. Time spent here pays back during every later phase.

Account structure

For anything beyond a small migration, use AWS Organizations with at least these accounts:

  • Management account - no workloads, only Organizations, IAM Identity Center, billing
  • Shared services account - AD Connector, DNS resolvers, golden AMIs, monitoring
  • Production account - production workloads only
  • Non-production account - dev, test, staging
  • Security and audit account - centralized CloudTrail, Security Hub, GuardDuty findings, Config

For larger estates, split production further (per business unit or per criticality tier). Get the boundaries right early. Re-accounting workloads later is painful.

VPC and subnet design

The pattern that works for VMware migrations:

  • One VPC per environment (prod, non-prod) per region
  • Three subnet tiers per AZ: public (internet-facing load balancers, NAT), private (application VMs, databases), isolated (no internet, for high-sensitivity workloads)
  • At least 3 availability zones for production
  • Generous CIDR blocks. Pod IPs in any future Kubernetes work, ENIs for replication tools, EFS mount targets - they all eat IPs. Start with /16 if you can.

Plan the CIDR space against the on-prem network. Overlapping ranges break VPN routing and force re-IP work later. If you’re using Transit Gateway as the hub, document the routing table design before you create the TGW.
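
A quick way to catch overlaps before any VPC exists is to run the plan through Python’s standard ipaddress module. This is a minimal sketch: the ranges are made-up placeholders, and find_overlaps is a helper name invented for this example.

```python
import ipaddress


def find_overlaps(on_prem_cidrs, vpc_cidrs):
    """Return (on_prem, vpc) pairs whose address ranges overlap."""
    overlaps = []
    for on_prem in on_prem_cidrs:
        on_prem_net = ipaddress.ip_network(on_prem)
        for vpc in vpc_cidrs:
            if on_prem_net.overlaps(ipaddress.ip_network(vpc)):
                overlaps.append((on_prem, vpc))
    return overlaps


# Hypothetical ranges for illustration only.
ON_PREM = ["10.0.0.0/16", "10.20.0.0/16", "192.168.10.0/24"]
PLANNED_VPCS = ["10.20.128.0/17", "10.64.0.0/16"]

for on_prem, vpc in find_overlaps(ON_PREM, PLANNED_VPCS):
    print(f"Overlap: on-prem {on_prem} collides with planned VPC {vpc}")
```

Running a check like this against the real on-prem route table, before the first VPC is created, is much cheaper than discovering the collision when the VPN comes up.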

Connectivity to on-prem

You have to decide this before phase 3 starts, because the migration data has to flow across it:

  • AWS Site-to-Site VPN. Fast to set up, low cost, throughput limited to ~1.25 Gbps per tunnel. Fine for small estates, painful for large data moves.
  • AWS Direct Connect. Dedicated link, 1, 10, or 100 Gbps. Lead time of 4-12 weeks to provision. Order it on day one of the project, not when phase 3 starts.
  • Hybrid pattern. Direct Connect for migration data, VPN as backup. Most large migrations use this.

For replication during MGN-based migrations, plan on sustained 50-200 Mbps per replicating server depending on change rate. Aggregate that across the wave. At roughly 100 Mbps per server, a 10 Gbps Direct Connect link carries around 100 simultaneously replicating VMs before it congests; at the 200 Mbps end of the range, closer to 50. By the same arithmetic, a 1 Gbps link tops out at about 5-20 servers.
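
The bandwidth arithmetic is worth scripting so it can be rerun per wave. The sketch below uses the 50-200 Mbps per-server range from above; the 80% usable share is an assumed planning margin for other traffic, not an AWS figure, and both function names are invented for this example.

```python
def wave_demand_mbps(server_count, per_server_mbps):
    """Aggregate sustained replication demand for a wave, in Mbps."""
    return server_count * per_server_mbps


def max_concurrent_servers(link_mbps, per_server_mbps, usable_percent=80):
    """Servers that can replicate at once within a share of the link.

    usable_percent is an assumed planning margin, not a measured value.
    """
    usable_mbps = link_mbps * usable_percent // 100
    return usable_mbps // per_server_mbps


for link_mbps in (1_000, 10_000):
    for per_server in (50, 100, 200):
        servers = max_concurrent_servers(link_mbps, per_server)
        print(f"{link_mbps} Mbps link at {per_server} Mbps/server: {servers} servers")
```

Replace the per-server figure with measured change rates from discovery once you have them; the planning range is only a starting point.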

Identity

Most VMware estates lean heavily on Active Directory. The decisions:

  • AD Connector. Proxies AD requests to your on-prem domain controllers. Cheap, works for most read paths, requires reliable connectivity. Good interim pattern during migration.
  • AWS Managed Microsoft AD. Full DCs in AWS, trust to on-prem domain. More expensive, more flexible, the right answer for long-term AWS-resident AD.
  • Extend self-managed AD into AWS. Run your own EC2 DCs in a shared services account. Most control, most operational responsibility.

For access, use IAM Identity Center for humans, IAM instance profiles for the AWS permissions workloads need, and Systems Manager Session Manager (or EC2 Instance Connect) for shell access to instances. Stop using long-lived SSH keys early.

Other foundations to land before any workload moves

  • DNS strategy. Route 53 private hosted zones, conditional forwarding to on-prem DNS, hybrid resolver endpoints if needed
  • Centralized logging. CloudWatch Logs with retention configured (default is forever - expensive). For long-term retention, ship to S3 with Athena
  • Backup baseline. AWS Backup with policies before workloads land
  • Security baseline. GuardDuty, Security Hub, Inspector enabled. Tag policy enforced. Encryption defaults enforced via SCP
  • Cost visibility. Cost allocation tags, AWS Budgets, anomaly detection
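
The CloudWatch Logs retention default is easy to audit. A minimal sketch, assuming input dicts shaped like the entries in the logGroups list returned by the CloudWatch Logs DescribeLogGroups API, where a group that never expires simply has no retentionInDays key. The sample data and the helper name are hypothetical.

```python
def groups_without_retention(log_groups):
    """Return names of log groups that have no retention policy set."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


# Hypothetical sample shaped like DescribeLogGroups output.
SAMPLE_GROUPS = [
    {"logGroupName": "/app/billing", "retentionInDays": 30},
    {"logGroupName": "/app/crm"},
    {"logGroupName": "/aws/lambda/cleanup"},
]

for name in groups_without_retention(SAMPLE_GROUPS):
    print(f"No retention set: {name}")
```

In a real account the input would come from paging through describe_log_groups with boto3, and the fix for each flagged group is a put_retention_policy call; neither is shown here so the sketch stays runnable without credentials.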

The enterprise cloud migration checklist covers the full landing zone scope if you want a more exhaustive version.


Phase 2: Identify applications and their dependencies

You cannot move what you have not mapped. Phase 2 produces two artifacts: a complete application inventory and a wave plan.

Build the inventory

Tools that help:

  • AWS Application Discovery Service. Agent-based or agentless (via vCenter). Collects VM specs, network connections, and resource usage. Feeds into Migration Hub
  • RVTools. Free, runs against vCenter, produces a CSV of every VM with vCPU, RAM, disk, and OS. Use this even if you also use Discovery Service - it’s faster for initial sizing
  • VMware Aria Operations (formerly vRealize Operations). If you already have it, use it for utilization data

For each VM, capture at minimum: hostname, OS, vCPU, RAM, disk size and type, IP, criticality, owner, and the application it supports.

Map the dependencies

This is the part that takes time and where mistakes show up later. For each application, identify:

  • Databases. Hostnames, ports, versions. Common: SQL Server, Oracle, MySQL, PostgreSQL, MongoDB
  • Message queues and event buses. RabbitMQ, ActiveMQ, IBM MQ, Kafka, NATS
  • Caches. Redis, Memcached, Hazelcast
  • File shares. SMB / CIFS for Windows, NFS for Linux, sometimes both
  • Internal APIs. Other applications this one calls, and what calls it
  • Authentication. Active Directory, LDAP, SAML / OIDC IdP
  • Monitoring and logging. SolarWinds, Splunk, Dynatrace - they all need either re-pointing or re-architecture
  • Load balancers and ingress. F5, NetScaler, internal Nginx, plus DNS dependencies
  • Scheduled jobs. Cron, Windows Task Scheduler, Control-M, AutoSys
  • License servers. FlexNet, RLM, vendor-specific dongles
  • Certificate stores. Where TLS certs come from, how they renew

Manual interviews with app owners are slow but irreplaceable. Discovery tools give you network-level connections but they miss intent. A flow from app A to app B might be a real dependency or might be a forgotten healthcheck from a decommissioned monitor.
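
One way to make the interviews shorter is to pre-sort observed flows into “probably real” and “ask the owner”. The sketch below assumes a simple list of flow records exported from a discovery tool; the field names, the 30-day staleness cutoff, and the 10-connection minimum are all hypothetical choices, not defaults of any AWS service.

```python
from collections import defaultdict
from datetime import date


def triage_flows(flows, today, stale_after_days=30, min_connections=10):
    """Split observed flows into likely dependencies and ones to ask about."""
    totals = defaultdict(lambda: {"connections": 0, "last_seen": date.min})
    for flow in flows:
        key = (flow["src"], flow["dst"], flow["port"])
        totals[key]["connections"] += flow["connections"]
        totals[key]["last_seen"] = max(totals[key]["last_seen"], flow["seen"])

    likely, review = [], []
    for key, agg in sorted(totals.items()):
        stale = (today - agg["last_seen"]).days > stale_after_days
        rare = agg["connections"] < min_connections
        (review if stale or rare else likely).append(key)
    return likely, review


# Hypothetical flow records for illustration.
FLOWS = [
    {"src": "app-a", "dst": "db-1", "port": 1433, "connections": 5000, "seen": date(2026, 5, 30)},
    {"src": "app-a", "dst": "db-1", "port": 1433, "connections": 4200, "seen": date(2026, 6, 1)},
    {"src": "mon-old", "dst": "app-a", "port": 443, "connections": 3, "seen": date(2026, 3, 2)},
]

likely, review = triage_flows(FLOWS, today=date(2026, 6, 3))
print("Likely dependencies:", likely)
print("Ask the app owner:", review)
```

The output is a prioritized question list, not an answer. A rare flow can still be a critical month-end job, which is exactly what the interview is for.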

Decide the migration pattern per application

For each app, pick one of the standard patterns:

| Pattern | When |
| --- | --- |
| Rehost (lift-and-shift) | Vendor-supplied app, no code access, legacy stack, no engineering capacity |
| Replatform | Same app, but DB moves to RDS, MQ to SQS or MSK, file shares to FSx or EFS |
| Refactor | Active product team, containerizable, value in modernization |
| Repurchase | Off-the-shelf replacement available (e.g. email, ticketing, generic SaaS) |
| Retire | More common than you think - run a “who uses this?” audit and decommission the unused ones |

Most VMware-to-EC2 migrations are dominated by rehost and replatform. Refactor happens after, on a separate timeline.

Plan the waves

Group applications into waves based on dependencies. The rule of thumb:

  • Wave 0: foundational shared services. AD Connector or DCs, DNS, monitoring agents, file shares, license servers
  • Wave 1: stateless or self-contained apps. Lowest risk, validates the migration tooling
  • Wave 2-N: business applications grouped by dependency cluster. Apps that share a database or queue migrate together
  • Last wave: the apps that everything talks to. Sometimes the same as wave 0 (in which case wave 0 is “set up the cloud-resident version” and the last wave is “cut over to it”)

For wave sizing, 10-30 VMs per wave is typical. Small enough to validate, large enough to make progress. Database migrations sit alongside the wave that depends on them.
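
The “apps that share a database or queue migrate together” rule can be sketched as a grouping problem: treat each shared dependency as a link, collect linked apps into clusters, then pack whole clusters into waves under a VM cap. This is one possible approach, not a feature of any AWS tool; the app names, VM counts, and function names are invented, and the sketch only covers the business-application waves, not wave 0 or the last wave.

```python
from collections import defaultdict


def dependency_clusters(vm_counts, shared_links):
    """Group apps that are linked (shared DB, queue, API) into clusters."""
    parent = {app: app for app in vm_counts}

    def find(app):
        while parent[app] != app:
            parent[app] = parent[parent[app]]  # path compression
            app = parent[app]
        return app

    for a, b in shared_links:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for app in sorted(vm_counts):
        clusters[find(app)].append(app)
    return sorted(clusters.values())


def pack_waves(vm_counts, shared_links, max_vms_per_wave=30):
    """Greedily pack whole clusters into waves; a cluster is never split."""
    waves, current, current_vms = [], [], 0
    for cluster in dependency_clusters(vm_counts, shared_links):
        size = sum(vm_counts[app] for app in cluster)
        if current and current_vms + size > max_vms_per_wave:
            waves.append(current)
            current, current_vms = [], 0
        current.extend(cluster)
        current_vms += size
    if current:
        waves.append(current)
    return waves


# Hypothetical estate: app name -> number of VMs.
VM_COUNTS = {"billing": 8, "crm": 12, "intranet": 4, "reporting": 6, "wiki": 2}
SHARED_LINKS = [("billing", "reporting"), ("crm", "billing")]

for number, wave in enumerate(pack_waves(VM_COUNTS, SHARED_LINKS), start=1):
    print(f"Wave {number}: {wave}")
```

A cluster bigger than the cap still lands in a single wave on its own, which is usually the signal to look for a dependency you can decouple rather than a reason to split the cluster.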


Phase 3: Move VMs to EC2

The question usually arrives phrased as “take a VMware snapshot, upload to S3, restore as AMI”. That works, and it’s worth understanding the manual path, but it’s rarely the right tool for a migration above ~10 VMs. Let’s cover all the options and when each one wins.

Option A: VM Import/Export (the manual path)

This is what most people picture when they imagine “move VM to AWS”. The steps:

  1. Power off the VM (or take a clean snapshot; live VMs can be imported but block-level consistency is at risk)
  2. Export the VM from vCenter as an OVA or VMDK file
  3. Create an S3 bucket in the target AWS account (with the right region, encryption, and access controls)
  4. Upload the OVA/VMDK to S3. Use the AWS CLI multipart upload or s5cmd for large files
  5. Create the vmimport IAM role with permissions to read from S3 and create snapshots
  6. Call import-image via the AWS CLI:
    aws ec2 import-image \
      --description "Migrated VM: app-server-01" \
      --disk-containers Format=ova,UserBucket="{S3Bucket=my-vmimport-bucket,S3Key=app-server-01.ova}"
  7. Poll describe-import-image-tasks until it completes. Large VMs (100+ GB) can take hours
  8. The result is an AMI. Launch an EC2 instance from it in the target VPC, subnet, and security group
  9. Reconfigure the EC2. New IP, SSM agent, CloudWatch agent, AD rejoin if needed
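
Two pieces of JSON trip people up in the steps above: the trust policy on the vmimport role and the disk container description. The sketch below generates both. The trust policy follows the shape given in the VM Import/Export documentation (service principal vmie.amazonaws.com, external ID vmimport); check it against the current docs before use. The role also needs a permissions policy for S3 reads and snapshot creation, which is not shown. The function names are invented, and the bucket and key reuse the placeholder values from step 6.

```python
import json


def vmimport_trust_policy():
    """Trust policy that lets the VM Import/Export service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "vmie.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:Externalid": "vmimport"}},
            }
        ],
    }


def disk_containers(bucket, key, description):
    """Content for a JSON file passed to import-image via --disk-containers."""
    return [
        {
            "Description": description,
            "Format": "ova",
            "UserBucket": {"S3Bucket": bucket, "S3Key": key},
        }
    ]


print(json.dumps(vmimport_trust_policy(), indent=2))
print(json.dumps(
    disk_containers("my-vmimport-bucket", "app-server-01.ova", "Migrated VM: app-server-01"),
    indent=2,
))
```

Writing the container description to a file and passing it as file://containers.json avoids the shell-quoting problems that the inline shorthand form tends to cause.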

When this path makes sense:

  • Small batches (under 10 VMs). The manual overhead is acceptable
  • Specific VMs that cannot run agents. Some appliances or hardened images forbid running the MGN replication agent
  • One-off restores or template AMIs. Building golden images from on-prem templates

When it does not make sense:

  • Bulk migration. The manual steps multiply. Replication is offline, so cutover requires VM downtime equal to the export-upload-import cycle (often 4-12 hours per VM)
  • Frequent change. If the VM is still active, the imported AMI is stale by the time it’s done
  • Tight downtime windows. VM Import is not continuous replication

Option B: AWS Application Migration Service (MGN) - the default

MGN is the modern replacement for the deprecated CloudEndure Migration. It’s the right default for most VMware-to-EC2 migrations of any scale.

What it does:

  • Installs a lightweight replication agent on each source VM
  • Continuously replicates block-level changes to a staging area in AWS
  • Lets you launch a test instance at any time without affecting replication
  • Lets you launch a cutover instance with the latest data when ready, with minimal downtime (usually under 10 minutes per VM)

Workflow:

  1. Set up MGN in the target AWS account. Replication subnet, IAM roles, replication settings (server class, encryption, dedicated IP)
  2. Install the AWS Replication Agent on each source VM. This kicks off the initial sync (which can take hours per VM depending on data and bandwidth)
  3. Wait for “Healthy” status. Each VM shows up in the MGN console as continuously replicating
  4. Launch test instances to validate boot, app health, connectivity. Test instances run from a point-in-time snapshot and do not affect ongoing replication
  5. Plan the cutover. App teams confirm readiness, downtime windows agreed
  6. Cutover. MGN takes a final snapshot, launches the production EC2 instance, marks the source as “Cutover complete”. Re-IP and DNS update happen at this point
  7. Decommission the source VM after a soak period

MGN’s strengths:

  • Continuous replication. Source VMs can keep running until cutover. Test instances can be launched repeatedly
  • Minimal downtime. Final sync at cutover is delta-only (minutes, not hours)
  • Per-VM control. Each VM has its own status, settings, and cutover schedule
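  • The MGN service is free for the first 90 days (2,160 hours) of replication per source server, then billed per hour. Staging-area resources such as replication servers and EBS volumes are billed from the start. Plan migrations to fit in the free window where you can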

MGN’s limitations:

  • Normally requires installing an agent. Hardened or appliance-style VMs may refuse, and some compliance teams require approval. MGN does have an agentless, snapshot-based mode for vCenter, but it gives up continuous replication
  • The source OS must be on MGN’s supported list. Most mainstream Windows and Linux versions are covered, but check for older or niche distros
  • Network bandwidth matters. Initial sync can take days for VMs with large disks over a 1 Gbps link

For an estate above 20 VMs, MGN is almost always the right tool.
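
The rules of thumb for Option A versus Option B can be written down as a small decision helper, which is handy when triaging an inventory spreadsheet. This is a sketch of this section’s heuristics only: the field names are invented, and the 12-hour threshold is taken from the upper end of the export-upload-import cycle quoted above.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    can_run_agent: bool
    still_changing: bool       # data keeps changing until cutover
    max_downtime_hours: float  # downtime the business will accept


def choose_tool(vm, estate_size):
    """Apply this section's rules of thumb to one VM."""
    if not vm.can_run_agent:
        return "vm-import"
    small_batch = estate_size < 10
    tolerates_offline_copy = not vm.still_changing and vm.max_downtime_hours >= 12
    if small_batch and tolerates_offline_copy:
        return "vm-import"
    return "mgn"


# Hypothetical VMs for illustration.
appliance = VM("fw-appliance", can_run_agent=False, still_changing=False, max_downtime_hours=24)
template = VM("golden-template", can_run_agent=True, still_changing=False, max_downtime_hours=48)
erp = VM("erp-app-01", can_run_agent=True, still_changing=True, max_downtime_hours=0.5)

print(choose_tool(appliance, estate_size=200))
print(choose_tool(template, estate_size=6))
print(choose_tool(erp, estate_size=200))
```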

Option C: VMware HCX for cross-platform vMotion

HCX is VMware’s tool, not AWS’s. It’s relevant when migrating through Azure VMware Solution or Amazon EVS as an intermediate step, but it doesn’t move VMs directly to native EC2.

If you’re going through AVS or EVS first, HCX moves the VMs there with L2 extension and warm vMotion. Then a second migration step (often MGN again) moves the VMs from AVS / EVS into native EC2.

For most direct VMware-to-EC2 projects, you can skip HCX entirely.

Option D: Snowball Edge for offline bulk moves

When initial replication would saturate the link for weeks, AWS Snowball Edge ships you a physical appliance. You copy data to it on-prem, ship it back, and AWS loads it into S3. From there, you import as AMIs or restore to EBS volumes.

When this matters:

  • Total data set above 10 TB and bandwidth is constrained
  • Sites with no Direct Connect and slow internet
  • Initial bulk load before MGN takes over for incremental sync

For most VMware-to-EC2 migrations, Snowball is only relevant for outlier sites or initial data lake / NAS moves.
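
Whether a site is an outlier comes down to simple arithmetic: how many days would the data take over the wire compared with shipping an appliance? In the sketch below, the 80% usable share of the link and the 10-day appliance turnaround are assumptions to replace with your own numbers, and the function names are invented for this example.

```python
def transfer_days(data_tb, link_mbps, usable_percent=80):
    """Days to push data_tb (decimal TB) over a link at a sustained share."""
    usable_mbps = link_mbps * usable_percent / 100
    megabits = data_tb * 8 * 1_000_000
    return megabits / usable_mbps / 86_400


def prefer_snowball(data_tb, link_mbps, snowball_turnaround_days=10):
    """True when the network copy would take longer than the assumed
    order-load-ship-ingest turnaround for an appliance."""
    return transfer_days(data_tb, link_mbps) > snowball_turnaround_days


for link_mbps in (100, 1_000):
    days = transfer_days(10, link_mbps)
    print(f"10 TB over {link_mbps} Mbps: {days:.1f} days, "
          f"Snowball preferred: {prefer_snowball(10, link_mbps)}")
```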


Phase 4: Move databases and shared data

VMs are easy to move with MGN. Databases need a separate strategy because:

  • They often outlive the VM that hosts them (and outlast multiple application generations)
  • They benefit massively from moving to a managed service (RDS, Aurora) rather than self-hosting on EC2
  • Live cutover with minimal data loss is achievable with the right replication tools

AWS Database Migration Service (DMS)

DMS does two things:

  • Initial bulk load of the source database into the target
  • Change Data Capture (CDC) to keep the target in sync with ongoing source changes

Supported source-target pairs include same-engine moves (SQL Server to SQL Server on RDS) and heterogeneous moves (Oracle to PostgreSQL on Aurora, SQL Server to Aurora MySQL). For heterogeneous moves, pair DMS with the AWS Schema Conversion Tool (SCT) to convert schemas and stored procedures.

DMS workflow:

  1. Create replication instance (DMS managed EC2 in your VPC)
  2. Create source and target endpoints (connection details for each side)
  3. Run a pre-migration assessment to flag incompatibilities
  4. Run SCT if the engine is changing - it produces converted DDL and a remediation report for anything that needs manual rework
  5. Start the replication task in “Full load + CDC” mode
  6. Monitor CDC lag. This is the number that tells you when you can cut over
  7. Cut over by stopping writes on the source, waiting for CDC to drain, then pointing the application at the new database
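
Steps 6 and 7 amount to a gate: cut over only once CDC lag has stayed low for several consecutive readings. A minimal sketch of that gate, assuming you sample the DMS CloudWatch metrics CDCLatencySource and CDCLatencyTarget (both in seconds) at a regular interval. The 5-second threshold and the three-sample requirement are illustrative choices, not DMS defaults.

```python
def ready_to_cut_over(samples, max_lag_seconds=5, required_consecutive=3):
    """True when the most recent samples all show low CDC lag on both sides.

    Each sample is a (source_lag, target_lag) pair in seconds.
    """
    if len(samples) < required_consecutive:
        return False
    recent = samples[-required_consecutive:]
    return all(
        source_lag <= max_lag_seconds and target_lag <= max_lag_seconds
        for source_lag, target_lag in recent
    )


# Hypothetical lag readings, oldest first.
draining = [(40, 60), (3, 4), (2, 2), (1, 3)]
spiking = [(2, 2), (1, 3), (9, 2)]

print(ready_to_cut_over(draining))
print(ready_to_cut_over(spiking))
```

Requiring several consecutive low readings, rather than a single one, guards against cutting over during a brief lull in a bursty write workload.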

When DMS is the right tool:

  • Cross-engine migrations (Oracle to Postgres, SQL Server to Aurora MySQL)
  • Minimal downtime tolerable but not zero (a few minutes is typical at cutover)
  • Moving to managed databases (RDS, Aurora)

When DMS is not the right tool:

  • Same-engine, large databases where native tools are better. Native log shipping or Always On Availability Groups for SQL Server often outperform DMS for large same-engine moves
  • Real-time replication with sub-second lag requirements. Use native replication or third-party CDC tools
  • NoSQL migrations. DMS supports MongoDB and a few others, but for production-scale NoSQL, native tools or per-database approaches are usually better

Native database replication

For some scenarios, the database engine’s native replication is faster, cheaper, and lower risk:

  • SQL Server: Always On Availability Groups, log shipping, or backup-and-restore with differential catch-up
  • Oracle: Data Guard, GoldenGate (if licensed)
  • PostgreSQL: Native logical replication
  • MySQL: Binary log replication

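For moves into managed services, the native options narrow because RDS does not give you host-level access to the engine. Some still work (for example, MySQL binary log replication into RDS for MySQL, or SQL Server native backup and restore through S3), but anything that needs control of the target host, such as joining an RDS instance to your own Always On availability group, does not. Check what the target engine supports; where native replication is not available, DMS is usually the cleanest path.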

File shares and shared data

VMware estates typically have SMB or NFS file shares hosting application data, user profiles, or shared configuration.

Options for moving these:

  • AWS DataSync. Transfers SMB and NFS shares to EFS, FSx for Windows, FSx for ONTAP, or S3. On-prem sources need a DataSync agent; transfers between AWS storage services run without one. Handles incremental sync and validation
  • FSx File Gateway. Caching gateway that lets on-prem apps see FSx volumes as local SMB shares during transition. AWS closed it to new customers in late 2024, so it is mainly an option if you already run it
  • Direct copy via robocopy or rsync. Works for one-off moves of small data sets
  • Storage Gateway (File mode). For hybrid scenarios where some data stays on-prem during the migration

For Windows file servers being decommissioned, the typical pattern is: stand up FSx for Windows File Server in AWS, use DataSync to seed the data, run incremental syncs on a schedule, then cut over by repointing the SMB clients to the FSx endpoint.

For more on the broader tooling landscape, the cloud migration tools overview covers the wider set.


Phase 5: Validate, cutover, and go-live

The migration tooling moves the data. Phase 5 is everything that has to happen for the business to accept the migration as complete.

Pre-cutover validation

For each wave, the pre-cutover checklist:

  • Functional QA signed off against the test instance
  • Performance test at expected production load
  • Integration tests with upstream and downstream systems
  • Security scan of the target EC2 (Inspector, third-party scanners)
  • Backup verified - target EC2 is in AWS Backup, RDS snapshots enabled, recovery tested
  • Monitoring verified - CloudWatch agent reporting, application logs flowing, alerts configured
  • Runbooks updated for the AWS-resident version (restart procedures, scaling, incident response)
  • Business stakeholder sign-off documented

Business stakeholder validation is often the slowest step. Build it into the wave timeline rather than treating it as a 24-hour buffer.

Cutover plan

For each application, the cutover plan documents:

  • Cutover window (date, time, duration)
  • Who is on the cutover call (cloud team, application owner, DBA, network, security, comms)
  • Pre-cutover steps (final data sync, freeze writes, snapshot for rollback)
  • Cutover steps in order (MGN cutover, DNS update, traffic shift, smoke test)
  • Validation criteria (specific tests that confirm the cutover is healthy)
  • Rollback criteria (what triggers a rollback decision, who makes it)
  • Rollback steps (DNS rollback, point clients back at on-prem, restart on-prem if powered off)
  • Communications plan (who notifies users, business, support teams)

Run the cutover as a rehearsal once before the live event. Even a partial dry run surfaces missing steps that no plan document catches.
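
A rehearsal is easier to repeat when the plan is executable. The sketch below models a cutover as ordered steps, each paired with its rollback, and unwinds the completed steps in reverse if one fails. The step names and stub actions are hypothetical; in a real runbook each action would call your own tooling.

```python
def run_cutover(steps):
    """Run (name, action, rollback) steps in order.

    If a step raises, roll back the steps that already completed, newest
    first. The failed step's own rollback is not run.
    """
    completed, log = [], []
    for name, action, rollback in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"FAILED {name}: {exc}")
            for done_name, done_rollback in reversed(completed):
                done_rollback()
                log.append(f"rolled back {done_name}")
            return False, log
        completed.append((name, rollback))
        log.append(f"ok {name}")
    return True, log


def smoke_test():
    # Stub that simulates a failing health check during the rehearsal.
    raise RuntimeError("health check returned 503")


REHEARSAL = [
    ("freeze writes", lambda: None, lambda: None),
    ("update DNS", lambda: None, lambda: None),
    ("smoke test", smoke_test, lambda: None),
]

succeeded, log = run_cutover(REHEARSAL)
print("Cutover succeeded:", succeeded)
for line in log:
    print(line)
```

Pairing every step with its rollback at planning time forces the “how do we undo this?” conversation to happen before the cutover call, not during it.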

Traffic cutover patterns

For applications with traffic to manage:

  • DNS-based. Update Route 53 records with low TTL ahead of cutover. Switch the record at cutover. Simple, works for most apps
  • Load balancer weighted target groups. Run on-prem and AWS in parallel, shift traffic from 5% to 100% in stages
  • Application-controlled. Internal feature flag, gradual cohort migration. Used for stateful or sticky-session apps

The first option works for most apps. The second works well for HTTP-based applications. The third is the right answer for highly stateful systems.
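
For the weighted target group pattern, each stage is just a new pair of weights on the listener’s forward action. The sketch below builds that action in the shape the Application Load Balancer API uses (a ForwardConfig with weighted TargetGroups). The target group identifiers are placeholders, the stage percentages are one possible schedule, and applying the action with modify-listener or modify-rule is left out.

```python
def forward_action(on_prem_tg_arn, aws_tg_arn, aws_percent):
    """Build an ALB forward action splitting traffic between two target groups."""
    if not 0 <= aws_percent <= 100:
        raise ValueError("aws_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": on_prem_tg_arn, "Weight": 100 - aws_percent},
                {"TargetGroupArn": aws_tg_arn, "Weight": aws_percent},
            ]
        },
    }


# Placeholder identifiers and a hypothetical stage schedule.
ON_PREM_TG = "arn:placeholder:on-prem-target-group"
AWS_TG = "arn:placeholder:aws-target-group"
STAGES = [5, 25, 50, 100]

for percent in STAGES:
    groups = forward_action(ON_PREM_TG, AWS_TG, percent)["ForwardConfig"]["TargetGroups"]
    print(f"Stage {percent}%: weights {[g['Weight'] for g in groups]}")
```

Hold each stage long enough to see a full traffic cycle in the monitoring before moving to the next one.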

Post-cutover monitoring window

For 7-14 days after cutover, keep the source VM running but disconnected from production. The reasons:

  • Faster rollback if something surfaces in the first week
  • Comparison baseline if performance regresses
  • Audit and compliance evidence

After the monitoring window expires, decommission the source VM, document the decommissioning, and update the asset register.

What success looks like

A wave is “done” when:

  • All target EC2 instances are healthy and stable
  • Application owners confirm performance matches or beats the source environment
  • Monitoring shows no recurring incidents related to the migration
  • Source VMs have been decommissioned
  • Documentation, runbooks, and architecture diagrams are updated

If any of these are open, the wave isn’t done. Don’t move to the next wave with prior waves trailing.


Common pitfalls

The mistakes that keep showing up across VMware-to-EC2 projects:

  1. Underestimating bandwidth. Replication for 50 VMs over a 1 Gbps link saturates faster than people expect. Order Direct Connect early
  2. Skipping the dependency map. Wave plans built on assumption fall apart when a “self-contained” app turns out to depend on an internal API in a wave 6 weeks later
  3. Same-IP migration without re-IP planning. Layer 2 extension via HCX is fragile and expensive. Re-IP and update DNS instead, with low-TTL records pre-staged
  4. Treating DBs as VMs. Migrating a SQL Server VM with MGN works, but losing the option to move to RDS later is an expensive shortcut
  5. AD migration not owned. AD Connector is usually the right interim answer, but someone has to own the long-term plan. Don’t let AD become a single point of failure that nobody manages
  6. No real rollback plan. “We’ll just turn the old VM back on” is not a rollback plan if it’s been off for a week and data has changed in AWS
  7. CloudWatch Logs without retention. Defaults to “never expire”. A few months in, the bill is meaningful. Set retention before launching the first workload
  8. Skipping the security baseline. GuardDuty, Inspector, encryption defaults via SCP - all of these are cheaper to set up before workloads land than to retrofit

For the wider risk surface, common cloud migration challenges covers the patterns across migration types.


FAQ

What’s the difference between VM Import/Export and AWS MGN?

VM Import/Export is an offline, snapshot-based import. It’s good for one-off VMs or small batches. AWS MGN (Application Migration Service) is continuous block-level replication with test-instance support and minimal-downtime cutover. For any migration above ~10 VMs, MGN is the right default. VM Import is the fallback for VMs that cannot run agents.

Can I migrate a VMware VM without downtime?

Practically, no - but you can get close. MGN cutover is typically under 10 minutes per VM and DMS database cutover with CDC is typically under 5 minutes if you stop writes briefly. True zero-downtime migration requires application-layer replication or active-active architecture, both of which are big engineering investments.

Do I need to install agents on every VM?

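For MGN, the AWS Replication Agent is the standard path. MGN also offers agentless, snapshot-based replication through vCenter, but AWS recommends the agent wherever it can be installed because it replicates continuously and keeps cutover windows short. For VM Import/Export, no agent is needed, but you give up continuous replication and minimal-downtime cutover in exchange. For databases via DMS, agents are not required on the source database, only credentials.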

How long does VMware-to-EC2 migration take?

For a single VM with MGN: initial replication 4-48 hours depending on size and bandwidth, cutover under 10 minutes. For an estate of 100-200 VMs: 4-8 months end-to-end including landing zone, discovery, and wave-based migration. For 500-1000 VMs: 9-18 months.

What does VMware-to-EC2 migration cost?

Two cost buckets. Project cost (landing zone, discovery, migration tooling, application work, partner support if used): typically low to mid six figures for a mid-sized estate, into seven figures for large enterprise. Cloud spend post-migration: depends on rightsizing, reserved instances, and what you do after lift-and-shift. For directional numbers, the VMware migration cost calculator compares 3-year TCO.

Should I move databases to RDS or run them on EC2?

For most production databases, RDS or Aurora is the better answer. You give up some configuration flexibility but you gain managed backups, patching, multi-AZ failover, and read replicas. For databases with hard requirements that RDS doesn’t support (specific SQL Server features, custom patches, third-party extensions), EC2 is the fallback.

Can I migrate VMware ESXi clusters wholesale?

Not directly to native EC2 - VMs migrate, the cluster does not. The “lift the cluster” pattern is what AVS and EVS exist for. For native EC2, you’re migrating individual VMs (one at a time or in waves) into EC2 instances, not preserving the cluster abstraction.

What about VMware on AWS (VMC on AWS)?

AWS stopped reselling VMware Cloud on AWS in 2024; the service is now sold through Broadcom and its partners. Amazon Elastic VMware Service (EVS) is the AWS-operated alternative, running VMware Cloud Foundation in your own VPC with Broadcom licenses. For new projects moving off on-prem VMware in 2026, the choices covered here are AVS (Azure), EVS (AWS with Broadcom licenses), or native EC2 (this post’s scope).

Can I run Windows Server on EC2 with my existing licenses?

Partly. SQL Server licenses with active Software Assurance can move to shared-tenancy EC2 through License Mobility. Windows Server is not covered by License Mobility, so bringing your own Windows Server licenses means running on Dedicated Hosts, and Microsoft’s licensing terms restrict which licenses qualify. For most workloads, AWS-provided Windows AMIs (with licensing in the EC2 price) are simpler unless you have a large existing license investment to protect.


Ready to migrate VMware to AWS?

VMware-to-EC2 migration is one of the more predictable migration paths if the phases are run in order: landing zone, then discovery, then VM moves with the right tooling, then databases, then validation and go-live. The mistakes that hurt are mostly upstream of the migration tooling, not in it.

Tasrie IT Services provides hands-on cloud migration services to deliver each of these phases:

  • Landing zone design and build - AWS Organizations, multi-account, Direct Connect, identity, baseline security
  • Discovery and wave planning - Application Discovery Service deployment, dependency mapping, wave design
  • Migration execution - MGN at scale, DMS for databases, VM Import for the special cases, plus the runbooks for each wave
  • Validation and go-live - QA frameworks, cutover playbooks, rollback rehearsals, post-cutover stabilization

The teams that finish their migrations cleanly tend to be the ones that invested in phases 1 and 2 before touching phase 3. The rest is execution.

Talk to our AWS migration team →

Tasrie IT Services

Published on June 3, 2026
