
Migrating to Kubernetes: From On-Premise VMs to Cloud-Native Infrastructure

Engineering Team

Migrating from traditional virtual machines to Kubernetes represents a significant architectural shift. Done well, it delivers improved scalability, deployment velocity, and operational efficiency. Done poorly, it creates complexity without benefits.

This guide covers the practical steps for migrating on-premise applications to managed Kubernetes services like Amazon EKS, Azure AKS, or Google GKE.

Why Migrate to Kubernetes?

Before starting migration, ensure Kubernetes solves real problems for your organization.

Valid Reasons to Migrate

Scalability requirements:

  • Applications need to scale rapidly based on demand
  • Current infrastructure cannot handle traffic spikes
  • Manual scaling is too slow for business needs

Deployment velocity:

  • Release cycles are too slow
  • Deployments are risky and require downtime
  • Rollbacks are difficult or impossible

Resource efficiency:

  • VMs are underutilized
  • Cannot bin-pack workloads efficiently
  • Over-provisioning to handle peak loads

Developer productivity:

  • Environment inconsistencies cause issues
  • Developers wait for infrastructure provisioning
  • Local development differs from production

Invalid Reasons to Migrate

Resume-driven development: “Kubernetes is popular” is not a migration justification.

Solving organizational problems: Kubernetes does not fix poor communication or unclear ownership.

Following competitors: What works for others may not fit your situation.

Assessment Phase

Application Portfolio Analysis

Evaluate each application for Kubernetes readiness:

| Application | Stateless | 12-Factor | Dependencies | Complexity | Priority |
|-------------|-----------|-----------|--------------|------------|----------|
| API Gateway | Yes | Yes | Redis | Low | High |
| User Service | Yes | Partial | PostgreSQL | Medium | High |
| Legacy CRM | No | No | Oracle, LDAP | High | Low |
| Batch Jobs | Yes | Yes | S3 | Low | Medium |

12-Factor App checklist:

  • Configuration via environment variables
  • Stateless processes
  • Port binding
  • Disposable processes (fast startup/shutdown)
  • Dev/prod parity
  • Logs as event streams

Applications meeting most criteria are good migration candidates.

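For instance, the first item — configuration via environment variables — maps directly onto Kubernetes. Below is a minimal sketch (the ConfigMap name and keys are illustrative) of moving environment-specific settings out of the image and into a ConfigMap:

# Illustrative ConfigMap holding environment-specific, non-secret settings
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-service-config
data:
  LOG_LEVEL: "info"
  PAYMENT_GATEWAY_URL: "https://api.stripe.com"

The Deployment examples later in this guide can then inject these keys as environment variables (for example via envFrom), so the same image runs unchanged across dev, staging, and production.
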
Infrastructure Requirements

Document current infrastructure for capacity planning:

#!/bin/bash
# Collect VM metrics for capacity planning.
# get_vm_list, get_avg_cpu, get_avg_memory, and get_peak_cpu are placeholders
# for your own inventory and monitoring tooling.

for vm in $(get_vm_list); do
  echo "=== $vm ==="
  echo "CPU Cores: $(ssh $vm nproc)"
  echo "Memory: $(ssh $vm free -h | grep Mem | awk '{print $2}')"
  echo "Disk: $(ssh $vm df -h / | tail -1 | awk '{print $2}')"
  echo "Avg CPU (7d): $(get_avg_cpu $vm 7d)"
  echo "Avg Memory (7d): $(get_avg_memory $vm 7d)"
  echo "Peak CPU (7d): $(get_peak_cpu $vm 7d)"
  echo ""
done

Use this data to right-size Kubernetes resource requests and limits.

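As a hypothetical illustration of that mapping: a VM that averages 0.5 cores and 1.5 GiB over seven days but peaks near 2 cores and 3 GiB might start with requests near the averages and limits near the peaks, then be refined after observing real usage:

# Hypothetical translation of 7-day VM metrics into container resources
resources:
  requests:
    cpu: "500m"      # ~7-day average CPU
    memory: "1536Mi" # ~7-day average memory
  limits:
    cpu: "2"         # ~7-day peak CPU
    memory: "3Gi"    # ~7-day peak memory
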
Dependency Mapping

Map all application dependencies:

# dependency-map.yaml
applications:
  user-service:
    type: api
    language: java
    dependencies:
      databases:
        - name: user-db
          type: postgresql
          version: "14"
      caches:
        - name: session-cache
          type: redis
          version: "7"
      services:
        - name: auth-service
          protocol: grpc
          port: 50051
      external:
        - name: stripe-api
          url: https://api.stripe.com

  order-service:
    type: api
    language: nodejs
    dependencies:
      databases:
        - name: order-db
          type: postgresql
          version: "14"
      queues:
        - name: order-events
          type: rabbitmq
      services:
        - name: user-service
          protocol: http
          port: 8080
        - name: inventory-service
          protocol: http
          port: 8080

Containerization

Dockerfile Best Practices

Create efficient, secure container images:

# Example: Java application
# Use multi-stage builds for smaller images
FROM maven:3.9-eclipse-temurin-21 AS builder

WORKDIR /app
COPY pom.xml .
# Cache dependencies
RUN mvn dependency:go-offline

COPY src ./src
RUN mvn package -DskipTests

# Production image
FROM eclipse-temurin:21-jre-alpine

# Security: Run as non-root
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -D appuser

WORKDIR /app

# Copy only the built artifact
COPY --from=builder /app/target/*.jar app.jar

# Set ownership
RUN chown -R appuser:appgroup /app
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s \
  CMD wget -q --spider http://localhost:8080/health || exit 1

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "app.jar"]
# Example: Node.js application
FROM node:20-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build
# Drop dev dependencies so only runtime packages are copied into the final image
RUN npm prune --omit=dev

# Production image
FROM node:20-alpine

RUN addgroup -g 1001 nodejs && \
    adduser -u 1001 -G nodejs -D nodejs

WORKDIR /app

COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

USER nodejs

EXPOSE 3000

CMD ["node", "dist/server.js"]

Container Registry Setup

Set up a container registry for your images:

AWS ECR:

# Create repository
aws ecr create-repository \
  --repository-name my-app/user-service \
  --image-scanning-configuration scanOnPush=true

# Login and push
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com

docker tag user-service:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app/user-service:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-app/user-service:latest

Azure ACR:

# Create registry
az acr create --resource-group mygroup --name myregistry --sku Standard

# Login and push
az acr login --name myregistry
docker tag user-service:latest myregistry.azurecr.io/user-service:latest
docker push myregistry.azurecr.io/user-service:latest

GCP Artifact Registry:

# Create repository
gcloud artifacts repositories create my-repo \
  --repository-format=docker \
  --location=us-central1

# Configure Docker and push
gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag user-service:latest us-central1-docker.pkg.dev/my-project/my-repo/user-service:latest
docker push us-central1-docker.pkg.dev/my-project/my-repo/user-service:latest

Kubernetes Cluster Setup

Managed Kubernetes Selection

Choose based on your cloud provider and requirements:

| Factor | EKS | AKS | GKE |
|--------|-----|-----|-----|
| Control plane cost | $0.10/hour | Free (Free tier) | $0.10/hour (one zonal cluster free) |
| AWS integration | Native | Limited | Limited |
| Azure integration | Limited | Native | Limited |
| GCP integration | Limited | Limited | Native |
| Autopilot mode | No | No | Yes |

Cluster Architecture

Design your cluster for production:

# Terraform example: EKS cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "production"
  cluster_version = "1.29"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Enable cluster logging
  cluster_enabled_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]

  # Node groups
  eks_managed_node_groups = {
    # General workloads
    general = {
      min_size     = 3
      max_size     = 10
      desired_size = 3

      instance_types = ["m6i.large"]
      capacity_type  = "ON_DEMAND"

      labels = {
        workload-type = "general"
      }
    }

    # Spot instances for non-critical workloads
    spot = {
      min_size     = 0
      max_size     = 20
      desired_size = 2

      instance_types = ["m6i.large", "m5.large", "m5a.large"]
      capacity_type  = "SPOT"

      labels = {
        workload-type = "spot"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

For detailed cluster setup guidance, see our Kubernetes consulting services.

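Once Terraform finishes, point kubectl at the new cluster and confirm the node groups registered (assuming the cluster name above and the us-east-1 region used elsewhere in this guide):

# Fetch credentials for the new EKS cluster and verify nodes joined
aws eks update-kubeconfig --region us-east-1 --name production
kubectl get nodes -L workload-type
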
Essential Add-ons

Install necessary cluster components:

# Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production

# Metrics Server for HPA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port

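A quick sanity check that Metrics Server is serving data — the HorizontalPodAutoscaler shown later depends on it:

# Metrics Server should report node and pod usage within a minute or two
kubectl top nodes
kubectl top pods -n kube-system
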
Application Migration

Kubernetes Manifests

Convert VM-based applications to Kubernetes resources:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: myregistry/user-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: user-service-secrets
              key: database-url
        - name: REDIS_HOST
          value: "redis-master.default.svc.cluster.local"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: user-service
              topologyKey: kubernetes.io/hostname
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - name: http
    port: 80
    targetPort: 8080
  type: ClusterIP
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

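Apply the manifests and confirm the autoscaler can actually read metrics (file names follow the comments in the example above):

kubectl apply -f deployment.yaml -f service.yaml -f hpa.yaml
kubectl rollout status deployment/user-service

# TARGETS should show real percentages, not <unknown>, once Metrics Server is working
kubectl get hpa user-service
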
Database Migration

Migrate databases to managed services or Kubernetes:

Option 1: Managed database (recommended)

# Terraform: RDS PostgreSQL
resource "aws_db_instance" "user_db" {
  identifier           = "user-service-db"
  engine               = "postgres"
  engine_version       = "14"
  instance_class       = "db.r6g.large"
  allocated_storage    = 100
  storage_encrypted    = true

  db_name  = "users"
  username = "admin"
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  multi_az               = true
  skip_final_snapshot    = false
}

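To hand the new endpoint to the application, one option is to store the connection string in AWS Secrets Manager under the same path that the Secrets Management section below syncs from; the endpoint and credentials here are placeholders:

# Store the connection string where External Secrets (see below) can sync it
aws secretsmanager create-secret \
  --name production/user-service \
  --secret-string '{"database_url":"postgresql://admin:<password>@user-service-db.<id>.us-east-1.rds.amazonaws.com:5432/users","api_key":"<api-key>"}'
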
Option 2: Database in Kubernetes (for dev/test)

# PostgreSQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
spec:
  serviceName: postgresql
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - name: postgresql
        image: postgres:14
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: users
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgresql-secrets
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgresql-secrets
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3
      resources:
        requests:
          storage: 100Gi

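The StatefulSet references serviceName: postgresql, so it also needs a headless Service to give the pod a stable DNS name — a minimal sketch:

# Headless Service backing the StatefulSet's stable network identity
apiVersion: v1
kind: Service
metadata:
  name: postgresql
spec:
  clusterIP: None
  selector:
    app: postgresql
  ports:
  - name: postgres
    port: 5432
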
Secrets Management

Migrate secrets from on-premise vaults to Kubernetes:

# External Secrets Operator with AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: user-service-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: user-service-secrets
  data:
  - secretKey: database-url
    remoteRef:
      key: production/user-service
      property: database_url
  - secretKey: api-key
    remoteRef:
      key: production/user-service
      property: api_key

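After applying both resources, verify the operator has synced the secret before rolling out workloads that consume it:

# The ExternalSecret should report Ready; the target Secret should exist
kubectl get externalsecret user-service-secrets
kubectl get secret user-service-secrets
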
CI/CD Pipeline Migration

Set up automated deployment pipelines:

# GitHub Actions for Kubernetes deployment
name: Deploy to Kubernetes

on:
  push:
    branches: [main]

env:
  REGISTRY: 123456789.dkr.ecr.us-east-1.amazonaws.com
  IMAGE_NAME: user-service

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1

    - name: Login to ECR
      uses: aws-actions/amazon-ecr-login@v2

    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1

    - name: Update kubeconfig
      run: aws eks update-kubeconfig --name production

    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/user-service \
          user-service=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
        kubectl rollout status deployment/user-service

For GitOps-based deployments, consider implementing ArgoCD for declarative continuous delivery.

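As a sketch of what that looks like, a minimal ArgoCD Application pointing at a manifests repository (the repository URL and path are placeholders for your own Git layout):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests.git  # placeholder repository
    targetRevision: main
    path: apps/user-service
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
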
Observability Setup

Monitoring with Prometheus

Deploy comprehensive monitoring:

# Prometheus configuration
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: platform
  podMonitorSelector:
    matchLabels:
      team: platform
  resources:
    requests:
      memory: 2Gi
      cpu: 1
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
---
# ServiceMonitor for application
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: user-service
  labels:
    team: platform
spec:
  selector:
    matchLabels:
      app: user-service
  endpoints:
  - port: http
    path: /metrics
    interval: 30s

For production monitoring setup, see our Prometheus consulting services.

Logging with Fluentd

Collect and forward logs:

# Fluentd DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-cloudwatch
        env:
        - name: AWS_REGION
          value: "us-east-1"
        - name: LOG_GROUP_NAME
          value: "/kubernetes/production"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers

Migration Execution

Phased Migration Approach

Migrate in waves to reduce risk:

Wave 1: Stateless, non-critical applications

  • Internal tools
  • Development environments
  • Non-production workloads

Wave 2: Stateless production applications

  • APIs without state
  • Frontend applications
  • Microservices

Wave 3: Stateful applications

  • Applications with databases
  • Message queue consumers
  • Session-dependent services

Wave 4: Critical infrastructure

  • Core business applications
  • High-traffic services
  • Compliance-sensitive workloads

Traffic Migration

Use gradual traffic shifting:

# Nginx Ingress for traffic splitting
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: user-service
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # start with 10% to Kubernetes
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80

Increase canary weight gradually:

  • 10% → Monitor for 1 hour
  • 25% → Monitor for 4 hours
  • 50% → Monitor for 24 hours
  • 100% → Complete migration

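Each step is a single annotation change on the canary Ingress; for example, moving to the second step:

# Bump the canary weight between monitoring windows
kubectl annotate ingress user-service \
  nginx.ingress.kubernetes.io/canary-weight="25" --overwrite
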
Rollback Plan

Document and test rollback procedures:

#!/bin/bash
# rollback.sh - Revert to VM-based deployment

# 1. Update DNS to point back to load balancer
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123456 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z789",
          "DNSName": "vm-load-balancer.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

# 2. Scale down Kubernetes deployment
kubectl scale deployment user-service --replicas=0

# 3. Verify traffic is back on VMs
curl -I https://api.example.com/health

echo "Rollback complete. Monitor VM metrics."

Post-Migration Optimization

Resource Right-Sizing

Analyze actual usage and adjust:

# Check resource usage
kubectl top pods -n production

# Analyze with metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/production/pods | jq '.items[] | {name: .metadata.name, cpu: .containers[].usage.cpu, memory: .containers[].usage.memory}'

Use Vertical Pod Autoscaler for recommendations:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: user-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  updatePolicy:
    updateMode: "Off"  # Recommendation only

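With updateMode set to "Off", the VPA only publishes recommendations; read them from its status and fold them back into your requests and limits:

# Target / Lower Bound / Upper Bound appear under Status > Recommendation
kubectl describe vpa user-service-vpa
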
Cost Optimization

Implement Kubernetes cost management:

  • Use Spot/Preemptible instances for non-critical workloads
  • Right-size node groups based on actual usage
  • Implement pod priority and preemption
  • Use cluster autoscaler to scale down unused capacity

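For example, non-critical workloads can opt into the spot node group created earlier by tolerating its taint and selecting its label — a sketch of the relevant pod spec fragment:

# Fragment for spec.template.spec of a non-critical Deployment
      nodeSelector:
        workload-type: spot
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
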
For ongoing cloud cost optimization, see our AWS cost management services.

Summary

Migrating from on-premise VMs to Kubernetes requires:

  1. Thorough assessment - Evaluate applications for Kubernetes readiness
  2. Proper containerization - Build efficient, secure container images
  3. Production-ready clusters - Set up managed Kubernetes with proper configuration
  4. Comprehensive observability - Implement monitoring, logging, and alerting
  5. Phased migration - Reduce risk with gradual traffic shifting
  6. Continuous optimization - Right-size resources and optimize costs

The effort is significant, but organizations that complete the migration successfully gain improved scalability, faster deployments, and better resource efficiency.


Need Help with Kubernetes Migration?

We guide organizations through Kubernetes migrations from on-premise infrastructure to cloud-native platforms. Our Kubernetes consulting services cover assessment, architecture design, migration execution, and ongoing optimization for EKS, AKS, and GKE.

Book a free 30-minute consultation to discuss your Kubernetes migration project.
