Running databases on Kubernetes used to be a bad idea. In 2026, with mature operators like CloudNativePG, it is a viable production option — if you do it right.
We run PostgreSQL on Kubernetes for several clients. We also recommend managed databases (RDS, Cloud SQL) for others. This guide covers when each approach makes sense, and how to set up PostgreSQL on Kubernetes for production.
When to Run PostgreSQL on Kubernetes
Use Kubernetes-native PostgreSQL when:
- You want a single platform for everything (apps + databases)
- You need to run across multiple clouds or on-premise
- You want database-per-tenant isolation in a multi-tenant SaaS
- Your team already manages Kubernetes and wants to reduce external dependencies
- Cost matters — self-managed Postgres on Kubernetes can be 40-60% cheaper than RDS
Use managed PostgreSQL (RDS, Cloud SQL, Azure Database) when:
- Your team does not have deep Kubernetes and PostgreSQL expertise
- You want zero operational overhead for database management
- You need cross-region replication with minimal setup
- You are running on a single cloud provider with no portability requirements
- Your database is business-critical and you want vendor-backed SLAs
The honest answer: If in doubt, use managed. Running databases on Kubernetes adds operational complexity that is only justified when the benefits above outweigh the cost of managing it yourself.
Why CloudNativePG
There are several PostgreSQL operators for Kubernetes. We use CloudNativePG because:
- CNCF Sandbox project — backed by the Cloud Native Computing Foundation
- Does not use StatefulSets — manages pods and PVCs directly for better control over failover
- Native streaming replication — built on PostgreSQL’s native replication, not custom solutions
- Automated failover — promotes replicas to primary in seconds
- Backup to object storage — continuous WAL archiving to S3/GCS/Azure Blob
- Point-in-time recovery — restore to any second using WAL replay
- Built-in Prometheus metrics — no additional exporters needed
Other viable options: Zalando Postgres Operator (more mature, uses Patroni), CrunchyData PGO (enterprise-focused). CloudNativePG is our default for new deployments.
Production Setup
Step 1: Install the Operator
```bash
# Install the CloudNativePG operator
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.25/releases/cnpg-1.25.0.yaml

# Verify the installation
kubectl get deployment -n cnpg-system cnpg-controller-manager
```
Or with Helm:
```bash
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm install cnpg cnpg/cloudnative-pg -n cnpg-system --create-namespace
```
Step 2: Create a Production Cluster
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
  namespace: production
spec:
  instances: 3  # 1 primary + 2 replicas

  # PostgreSQL version
  imageName: ghcr.io/cloudnative-pg/postgresql:16.6

  # Storage
  storage:
    size: 100Gi
    storageClass: gp3-encrypted  # AWS EBS gp3 with encryption

  # Resource allocation
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      memory: 8Gi

  # PostgreSQL configuration
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "64MB"
      maintenance_work_mem: "256MB"
      max_connections: "200"
      max_wal_size: "2GB"
      min_wal_size: "512MB"
      wal_level: "replica"
      max_parallel_workers_per_gather: "4"
      random_page_cost: "1.1"  # SSD storage

  # High availability
  minSyncReplicas: 1
  maxSyncReplicas: 1

  # Backup to S3
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-pg-backups/app-db/"
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
    retentionPolicy: "30d"

  # Anti-affinity — spread replicas across nodes
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname

  # Monitoring
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: custom-pg-metrics
        key: queries
```
Key configuration decisions:
- 3 instances — 1 primary, 2 replicas. The minimum for production HA. The primary handles writes; replicas handle reads and serve as failover candidates.
- Synchronous replication (`minSyncReplicas: 1`) — at least one replica confirms every write. This prevents data loss during failover at the cost of slightly higher write latency.
- Anti-affinity — ensures the primary and replicas run on different nodes. If a node fails, the database survives.
- gp3 storage — AWS EBS gp3 provides consistent IOPS. Use `gp3-encrypted` for encryption at rest.
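The memory parameters in the manifest follow a common heuristic: shared_buffers at roughly 25% of available RAM, effective_cache_size at roughly 75%. A quick sketch of that arithmetic (a rule of thumb, not a CloudNativePG requirement):

```python
def pg_memory_params(ram_gb: int) -> dict:
    """Heuristic memory settings for a dedicated PostgreSQL instance.

    Rule of thumb: shared_buffers ~25% of RAM, effective_cache_size ~75%.
    Treat these as starting points and tune against real workloads.
    """
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
    }

# For the 4Gi memory request in the Cluster manifest:
print(pg_memory_params(4))  # {'shared_buffers': '1GB', 'effective_cache_size': '3GB'}
```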
Step 3: Configure Backups
CloudNativePG performs continuous backup using PostgreSQL’s WAL (Write-Ahead Log) archiving: every transaction is streamed to S3 in near real time.
Schedule regular base backups:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
  namespace: production
spec:
  schedule: "0 0 2 * * *"  # 2 AM daily (CNPG cron includes a leading seconds field)
  backupOwnerReference: self
  cluster:
    name: app-db
  immediate: true
```
Test your backups. A backup that has never been restored is not a backup. Schedule monthly restore tests:
```yaml
# Restore to a point in time (for testing)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restore-test
  namespace: testing
spec:
  instances: 1
  storage:
    size: 100Gi
    storageClass: gp3-encrypted
  bootstrap:
    recovery:
      source: app-db
      recoveryTarget:
        targetTime: "2026-03-18T14:00:00Z"  # Restore to this timestamp
  externalClusters:
    - name: app-db
      barmanObjectStore:
        destinationPath: "s3://my-pg-backups/app-db/"
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY
```
Step 4: Connect Your Application
CloudNativePG creates Kubernetes Services automatically:
| Service | Purpose |
|---|---|
| `app-db-rw` | Read-write (primary only) |
| `app-db-ro` | Read-only (replicas only) |
| `app-db-r` | Read (any instance) |
Connect your application using the read-write service for writes and read-only service for read replicas:
```yaml
# Application deployment (container env).
# Note: DB_PASSWORD must be defined before the variables that
# reference it — Kubernetes only expands $(VAR) for variables
# declared earlier in the list.
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: app-db-app
        key: password
  - name: DATABASE_URL
    value: "postgresql://app:$(DB_PASSWORD)@app-db-rw:5432/myapp?sslmode=require"
  - name: DATABASE_URL_READONLY
    value: "postgresql://app:$(DB_PASSWORD)@app-db-ro:5432/myapp?sslmode=require"
```
CloudNativePG automatically generates database credentials and stores them in Kubernetes Secrets. The secret `app-db-app` is created automatically.
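In application code, routing between the two URLs can be a one-liner. A minimal sketch in Python (the env var names are this guide's convention from the deployment snippet above, not something CloudNativePG mandates):

```python
import os

def dsn_for(readonly: bool) -> str:
    """Route read-only work to the -ro Service, everything else to -rw.

    DATABASE_URL / DATABASE_URL_READONLY are the names used in the
    deployment snippet in this guide.
    """
    key = "DATABASE_URL_READONLY" if readonly else "DATABASE_URL"
    return os.environ[key]
```

Analytics queries and report generation go through `dsn_for(True)`; anything that writes uses `dsn_for(False)`.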
Step 5: Monitoring
CloudNativePG exposes Prometheus metrics natively. With `enablePodMonitor: true` in the Cluster spec, the operator creates a PodMonitor for you; if you manage monitors yourself, the equivalent definition is:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cnpg-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: app-db
  podMetricsEndpoints:
    - port: metrics
```
CloudNativePG provides a pre-built Grafana dashboard that shows:
- Replication lag between primary and replicas
- Transaction throughput (TPS)
- Active connections vs max connections
- Buffer cache hit ratio
- WAL generation rate
- Disk usage and growth rate
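The buffer cache hit ratio on that dashboard is derived from `pg_stat_database` counters: block reads satisfied from shared_buffers divided by all block reads. As a quick illustration:

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Buffer cache hit ratio from pg_stat_database counters.

    blks_hit:  reads satisfied from shared_buffers
    blks_read: reads that went to disk (or the OS page cache)
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

# A healthy OLTP workload typically sits above 0.99:
print(cache_hit_ratio(995_000, 5_000))  # 0.995
```

A ratio that drifts downward usually means the working set has outgrown shared_buffers.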
Critical alerts to set:
```yaml
# PrometheusRule for PostgreSQL alerts (rule groups)
groups:
  - name: postgresql
    rules:
      - alert: PostgreSQLReplicationLagHigh
        expr: cnpg_pg_replication_streaming_replicas_wal_lag_bytes > 100000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL replication lag > 100MB"
      - alert: PostgreSQLConnectionsHigh
        expr: cnpg_pg_stat_activity_count / cnpg_pg_settings_setting{name="max_connections"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL connections > 80% of max"
      - alert: PostgreSQLDiskUsageHigh
        expr: cnpg_pg_database_size_bytes / cnpg_pg_volume_size_bytes > 0.85
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL disk usage > 85%"
```
Failover: What Happens When the Primary Dies
CloudNativePG handles automatic failover:
- The operator detects the primary pod is unhealthy (liveness probe fails)
- The most up-to-date replica is promoted to primary (typically < 10 seconds)
- The `app-db-rw` Service automatically points to the new primary
- Applications reconnect transparently (ensure your connection pool handles reconnects)
- A new replica is created to maintain the desired instance count
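The "most up-to-date replica" is determined by comparing WAL positions (LSNs). The operator's real election logic lives inside CloudNativePG; the sketch below only illustrates how LSNs order candidates:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/3000148' to a comparable integer.

    An LSN is two hex numbers: the high 32 bits and the byte offset.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def pick_failover_candidate(replay_lsns: dict) -> str:
    """Promote the replica that has replayed the most WAL (illustrative only)."""
    return max(replay_lsns, key=lambda name: lsn_to_int(replay_lsns[name]))

# Hypothetical replay positions reported by two replicas:
replicas = {"app-db-2": "0/3000148", "app-db-3": "0/3000060"}
print(pick_failover_candidate(replicas))  # app-db-2
```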
Important: Your application must handle database reconnection gracefully. Use a connection pooler (PgBouncer, built into CloudNativePG) and configure retry logic in your application.
```yaml
# Enable a PgBouncer pooler in front of the primary
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: app-db-pooler-rw
  namespace: production
spec:
  cluster:
    name: app-db
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "50"
```
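On the application side, retry logic can be a bounded loop with exponential backoff around the connect call. A library-agnostic sketch (`connect` is a placeholder for whatever your driver's connection function is):

```python
import time

def with_retries(connect, attempts: int = 5, base_delay: float = 0.5):
    """Call `connect` until it succeeds, backing off exponentially.

    During a failover the -rw Service briefly has no healthy backend;
    a few retries bridge the gap. `connect` is a placeholder for your
    driver's connection call, and `ConnectionError` stands in for
    whatever exception it raises.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the failure
            time.sleep(base_delay * 2 ** attempt)
```

Combine this with the pooler: the application retries briefly, while PgBouncer shields the database from the reconnect storm.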
Cost Comparison: CloudNativePG vs RDS
For a production setup (primary + 2 replicas, 100GB storage, 2 vCPU, 4GB RAM):
| Component | CloudNativePG on EKS | RDS Multi-AZ |
|---|---|---|
| Compute | ~$150/mo (shared EKS nodes) | ~$350/mo (db.r6g.large) |
| Storage | ~$30/mo (gp3 100GB x 3) | ~$35/mo (gp3 100GB) |
| Backup storage | ~$5/mo (S3) | ~$10/mo (RDS backup) |
| Operator/management | $0 (open source) | Included |
| Total | ~$185/mo | ~$395/mo |
| Operational effort | Medium (your team manages) | Low (AWS manages) |
CloudNativePG is ~53% cheaper but requires your team to manage upgrades, troubleshoot replication issues, and handle edge cases. The cost savings are real, but so is the operational overhead.
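The savings figure is easy to verify from the table:

```python
# Monthly costs from the comparison table above
cnpg_monthly = 150 + 30 + 5   # compute + storage + backup storage
rds_monthly = 350 + 35 + 10

savings = (rds_monthly - cnpg_monthly) / rds_monthly
print(f"${cnpg_monthly}/mo vs ${rds_monthly}/mo -> {savings:.0%} cheaper")
# $185/mo vs $395/mo -> 53% cheaper
```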
Common Mistakes
1. Skipping anti-affinity. Without pod anti-affinity, Kubernetes might schedule all 3 database pods on the same node. If that node fails, you lose all replicas simultaneously. Always set enablePodAntiAffinity: true.
2. Using Deployment instead of an operator. A Deployment with a PostgreSQL container is not a database cluster. It has no replication, no failover, and no backup management. Use an operator or use managed databases.
3. Not testing backups. Backups streaming to S3 are useless if you have never restored one. Schedule monthly restore tests to verify your recovery process works.
4. Undersizing storage. Growing a PVC is possible but risky. Start with 2x your expected data size and monitor growth rate. Running out of disk space is the most common Kubernetes database incident we see.
5. Missing connection pooling. PostgreSQL’s per-connection memory overhead is significant. Without PgBouncer, 500 application connections can overwhelm a database that could handle the query load easily. Always use connection pooling.
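On mistake 5: the Pooler shown earlier fans 1,000 client connections down to a much smaller number of server connections. A quick sanity check that those numbers fit inside the cluster's max_connections (the 80% headroom threshold is our convention, not a PgBouncer rule):

```python
# Values from the Pooler and Cluster manifests in this guide
max_client_conn = 1000
default_pool_size = 50
pooler_instances = 2
max_connections = 200

# Worst case: each PgBouncer instance maintains its own server-side pool.
server_conns = default_pool_size * pooler_instances
assert server_conns <= max_connections * 0.8, "leave headroom for superuser and replication"
print(f"{max_client_conn} clients -> at most {server_conns} server connections")
# 1000 clients -> at most 100 server connections
```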
Need Help Running Databases on Kubernetes?
We set up and manage PostgreSQL on Kubernetes using CloudNativePG — from initial deployment to ongoing operations and performance tuning.
Our PostgreSQL consulting services cover:
- Architecture design — choose between CloudNativePG, managed RDS, or hybrid approaches
- Production setup — HA clusters with automated backup, failover, and monitoring
- Migration — move from RDS, standalone PostgreSQL, or other databases to Kubernetes-native PostgreSQL
- Performance tuning — query optimisation, connection pooling, and resource right-sizing
- Kubernetes cluster setup — EKS, AKS, or GKE optimised for stateful workloads