KUBERNETES

PostgreSQL on Kubernetes: Production Setup We Actually Run

Engineering Team 2026-03-19

Running databases on Kubernetes used to be a bad idea. In 2026, with mature operators like CloudNativePG, it is a viable production option — if you do it right.

We run PostgreSQL on Kubernetes for several clients. We also recommend managed databases (RDS, Cloud SQL) for others. This guide covers when each approach makes sense, and how to set up PostgreSQL on Kubernetes for production.

When to Run PostgreSQL on Kubernetes

Use Kubernetes-native PostgreSQL when:

  • You want a single platform for everything (apps + databases)
  • You need to run across multiple clouds or on-premise
  • You want database-per-tenant isolation in a multi-tenant SaaS
  • Your team already manages Kubernetes and wants to reduce external dependencies
  • Cost matters — self-managed Postgres on Kubernetes can be 40-60% cheaper than RDS

Use managed PostgreSQL (RDS, Cloud SQL, Azure Database) when:

  • Your team does not have deep Kubernetes and PostgreSQL expertise
  • You want zero operational overhead for database management
  • You need cross-region replication with minimal setup
  • You are running on a single cloud provider with no portability requirements
  • Your database is business-critical and you want vendor-backed SLAs

The honest answer: If in doubt, use managed. Running databases on Kubernetes adds operational complexity that is only justified when the benefits above outweigh the cost of managing it yourself.

Why CloudNativePG

There are several PostgreSQL operators for Kubernetes. We use CloudNativePG because:

  • CNCF Sandbox project — backed by the Cloud Native Computing Foundation
  • Does not use StatefulSets — manages pods and PVCs directly for better control over failover
  • Native streaming replication — built on PostgreSQL’s native replication, not custom solutions
  • Automated failover — promotes replicas to primary in seconds
  • Backup to object storage — continuous WAL archiving to S3/GCS/Azure Blob
  • Point-in-time recovery — restore to any second using WAL replay
  • Built-in Prometheus metrics — no additional exporters needed

Other viable options: Zalando Postgres Operator (more mature, uses Patroni), CrunchyData PGO (enterprise-focused). CloudNativePG is our default for new deployments.

Production Setup

Step 1: Install the Operator

# Install CloudNativePG operator
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.0.yaml

# Verify installation
kubectl get deployment -n cnpg-system cnpg-controller-manager

Or with Helm:

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm install cnpg cnpg/cloudnative-pg -n cnpg-system --create-namespace

Step 2: Create a Production Cluster

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3    # 1 primary + 2 replicas

  # PostgreSQL version
  imageName: ghcr.io/cloudnative-pg/postgresql:16.6

  # Storage
  storage:
    size: 100Gi
    storageClass: gp3-encrypted    # AWS EBS gp3 with encryption

  # Resource allocation
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      memory: 8Gi

  # PostgreSQL configuration
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "64MB"
      maintenance_work_mem: "256MB"
      max_connections: "200"
      max_wal_size: "2GB"
      min_wal_size: "512MB"
      wal_level: "replica"
      max_parallel_workers_per_gather: "4"
      random_page_cost: "1.1"        # SSD storage

  # High availability
  minSyncReplicas: 1
  maxSyncReplicas: 1

  # Backup to S3
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-pg-backups/app-db/"
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"

  # Anti-affinity — spread replicas across nodes
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname

  # Monitoring
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
    - name: custom-pg-metrics
      key: queries

Key configuration decisions:

  • 3 instances — 1 primary, 2 replicas. Minimum for production HA. The primary handles writes, replicas handle reads and serve as failover candidates.
  • Synchronous replication (minSyncReplicas: 1) — at least one replica confirms every write. This prevents data loss during failover at the cost of slightly higher write latency.
  • Anti-affinity — ensures primary and replicas run on different nodes. If a node fails, the database survives.
  • gp3 storage — AWS EBS gp3 provides consistent IOPS. Use gp3-encrypted for encryption at rest.

Step 3: Configure Backups

CloudNativePG does continuous backup using PostgreSQL’s WAL (Write-Ahead Log) archiving. Every transaction is streamed to S3 in near real-time.

Schedule regular base backups:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
  namespace: production
spec:
  schedule: "0 2 * * *"    # 2 AM daily
  backupOwnerReference: self
  cluster:
    name: app-db
  immediate: true
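Alongside the schedule, you can trigger a one-off base backup — for example, before a risky migration — with a Backup resource. The name below is hypothetical:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: pre-migration-backup    # hypothetical name
  namespace: production
spec:
  cluster:
    name: app-db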

Test your backups. A backup that has never been restored is not a backup. Schedule monthly restore tests:

# Restore to a point in time (for testing)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restore-test
  namespace: testing
spec:
  instances: 1
  storage:
    size: 100Gi
    storageClass: gp3-encrypted

  bootstrap:
    recovery:
      source: app-db
      recoveryTarget:
        targetTime: "2026-03-18T14:00:00Z"    # Restore to this timestamp

  externalClusters:
  - name: app-db
    barmanObjectStore:
      destinationPath: "s3://my-pg-backups/app-db/"
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
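Once the restore-test cluster reports healthy, spot-check the data. A sketch — the table name here is a placeholder; substitute checks meaningful for your schema:

```shell
# Confirm the restored cluster reached a healthy state
kubectl get cluster restore-test -n testing

# Spot-check the restored data ("orders" is an example table —
# use queries that validate your own schema)
kubectl exec -n testing restore-test-1 -- \
  psql -U postgres -d myapp -c "SELECT count(*) FROM orders;"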

Step 4: Connect Your Application

CloudNativePG creates Kubernetes Services automatically:

Service      Purpose
app-db-rw    Read-write (primary only)
app-db-ro    Read-only (replicas only)
app-db-r     Read (any instance)

Connect your application using the read-write service for writes and read-only service for read replicas:

# Application deployment
env:
- name: DATABASE_URL
  value: "postgresql://app:$(DB_PASSWORD)@app-db-rw:5432/myapp?sslmode=require"
- name: DATABASE_URL_READONLY
  value: "postgresql://app:$(DB_PASSWORD)@app-db-ro:5432/myapp?sslmode=require"
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: app-db-app
      key: password

CloudNativePG automatically generates database credentials and stores them in Kubernetes Secrets. The secret app-db-app is created automatically.
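For local debugging you can inspect the generated secret directly — a sketch, assuming jq is available:

```shell
# List the keys CloudNativePG stores in the generated secret
kubectl get secret app-db-app -n production -o jsonpath='{.data}' | jq 'keys'

# Decode the password (avoid echoing this in shared terminals)
kubectl get secret app-db-app -n production \
  -o jsonpath='{.data.password}' | base64 -d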

Step 5: Monitoring

CloudNativePG exposes Prometheus metrics natively. If you have Prometheus installed, add a PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cnpg-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: app-db
  podMetricsEndpoints:
  - port: metrics

CloudNativePG provides a pre-built Grafana dashboard that shows:

  • Replication lag between primary and replicas
  • Transaction throughput (TPS)
  • Active connections vs max connections
  • Buffer cache hit ratio
  • WAL generation rate
  • Disk usage and growth rate

Critical alerts to set:

# PrometheusRule for PostgreSQL alerts
groups:
- name: postgresql
  rules:
  - alert: PostgreSQLReplicationLagHigh
    expr: cnpg_pg_replication_streaming_replicas_wal_lag_bytes > 100000000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL replication lag > 100MB"

  - alert: PostgreSQLConnectionsHigh
    expr: cnpg_pg_stat_activity_count / cnpg_pg_settings_setting{name="max_connections"} > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL connections > 80% of max"

  - alert: PostgreSQLDiskUsageHigh
    expr: cnpg_pg_database_size_bytes / cnpg_pg_volume_size_bytes > 0.85
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL disk usage > 85%"

Failover: What Happens When the Primary Dies

CloudNativePG handles automatic failover:

  1. The operator detects the primary pod is unhealthy (liveness probe fails)
  2. The most up-to-date replica is promoted to primary (typically < 10 seconds)
  3. The app-db-rw Service automatically points to the new primary
  4. Applications reconnect transparently (ensure your connection pool handles reconnects)
  5. A new replica is created to maintain the desired instance count
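You can rehearse this in a staging environment before trusting it in production. A sketch of a failover drill — the pod name app-db-1 and the label names are assumptions based on typical CloudNativePG labelling; check what your operator version applies:

```shell
# Find the current primary (the operator labels instance roles)
kubectl get pods -n production -l cnpg.io/cluster=app-db \
  -L cnpg.io/instanceRole

# Simulate a primary failure, then watch a replica get promoted
kubectl delete pod -n production app-db-1
kubectl get cluster app-db -n production -w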

Important: Your application must handle database reconnection gracefully. Use a connection pooler (CloudNativePG ships a PgBouncer-based Pooler resource) and configure retry logic in your application.

# Enable PgBouncer pooler
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-rw
  namespace: production
spec:
  cluster:
    name: app-db
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "50"
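With the pooler in place, point applications at the pooler's Service instead of the direct read-write Service — the Service takes the Pooler's name:

```yaml
# Application deployment — route writes through PgBouncer
env:
- name: DATABASE_URL
  value: "postgresql://app:$(DB_PASSWORD)@app-db-pooler-rw:5432/myapp?sslmode=require"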

Cost Comparison: CloudNativePG vs RDS

For a production setup (primary + 2 replicas, 100GB storage, 2 vCPU, 4GB RAM):

Component             CloudNativePG on EKS           RDS Multi-AZ
Compute               ~$150/mo (shared EKS nodes)    ~$350/mo (db.r6g.large)
Storage               ~$30/mo (gp3 100GB x 3)        ~$35/mo (gp3 100GB)
Backup storage        ~$5/mo (S3)                    ~$10/mo (RDS backup)
Operator/management   $0 (open source)               Included
Total                 ~$185/mo                       ~$395/mo
Operational effort    Medium (your team manages)     Low (AWS manages)

CloudNativePG is ~53% cheaper but requires your team to manage upgrades, troubleshoot replication issues, and handle edge cases. The cost savings are real, but so is the operational overhead.

Common Mistakes

1. Skipping anti-affinity. Without pod anti-affinity, Kubernetes might schedule all 3 database pods on the same node. If that node fails, you lose all replicas simultaneously. Always set enablePodAntiAffinity: true.

2. Using Deployment instead of an operator. A Deployment with a PostgreSQL container is not a database cluster. It has no replication, no failover, and no backup management. Use an operator or use managed databases.

3. Not testing backups. Backups streaming to S3 are useless if you have never restored one. Schedule monthly restore tests to verify your recovery process works.

4. Undersizing storage. Growing a PVC is possible but risky. Start with 2x your expected data size and monitor growth rate. Running out of disk space is the most common Kubernetes database incident we see.

5. Missing connection pooling. PostgreSQL’s per-connection memory overhead is significant. Without PgBouncer, 500 application connections can overwhelm a database that could handle the query load easily. Always use connection pooling.
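If you do hit mistake 4 and your StorageClass supports volume expansion, CloudNativePG can grow the PVCs in place by editing the Cluster spec — a sketch, assuming allowVolumeExpansion is enabled on the StorageClass:

```yaml
# Cluster spec excerpt — increasing size triggers PVC expansion
# (requires allowVolumeExpansion: true on the StorageClass)
storage:
  size: 200Gi                   # was 100Gi
  storageClass: gp3-encrypted
  resizeInUseVolumes: true      # expand online, without restarting pods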


Need Help Running Databases on Kubernetes?

We set up and manage PostgreSQL on Kubernetes using CloudNativePG — from initial deployment to ongoing operations and performance tuning.

Our PostgreSQL consulting services cover:

  • Architecture design — choose between CloudNativePG, managed RDS, or hybrid approaches
  • Production setup — HA clusters with automated backup, failover, and monitoring
  • Migration — move from RDS, standalone PostgreSQL, or other databases to Kubernetes-native PostgreSQL
  • Performance tuning — query optimisation, connection pooling, and resource right-sizing
  • Kubernetes cluster setup — EKS, AKS, or GKE optimised for stateful workloads

Talk to our database team →
