The cloud native database landscape has matured significantly. Organisations running containerised workloads on Kubernetes require databases that match the elasticity, resilience, and operational model of their application infrastructure. A cloud native database in 2026 must support horizontal scaling, automated failover, declarative management through Kubernetes operators, and seamless integration with modern observability stacks.
This guide examines the essential characteristics of cloud native databases, evaluates leading solutions across different categories, and provides practical guidance for selecting and operating databases in containerised environments.
What Defines a Cloud Native Database
A cloud native database is purpose-built or adapted to run effectively in containerised, orchestrated environments. Unlike traditional databases designed for static server deployments, cloud native databases embrace the ephemeral, distributed nature of modern infrastructure.
Core Characteristics
Horizontal scalability: Cloud native databases scale by adding nodes rather than upgrading hardware. They distribute data across multiple nodes using sharding, partitioning, or replication strategies that allow capacity to grow linearly with demand.
Self-healing and automated operations: When nodes fail, cloud native databases automatically redistribute data, promote replicas, and rebalance workloads without manual intervention. This aligns with the declarative reconciliation model that defines cloud native systems.
Kubernetes-native management: Modern databases provide Custom Resource Definitions (CRDs) and operators that enable management through familiar Kubernetes constructs. Database clusters become declarative resources managed alongside application workloads.
API-first architecture: Everything is programmable. Provisioning, scaling, backup, and monitoring are accessible through APIs, enabling GitOps workflows and infrastructure as code practices.
Observability integration: Cloud native databases expose metrics in Prometheus format, support distributed tracing, and generate structured logs compatible with centralised logging platforms. This integration is essential for maintaining visibility in production Kubernetes environments.
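The horizontal-scalability characteristic above rests on deterministic key placement: every node must agree on which node owns which row. A toy Python sketch of hash-based placement (illustrative names, not any particular database's algorithm):

```python
import hashlib

NODES = ["db-node-0", "db-node-1", "db-node-2"]

def route(key: str, nodes=NODES) -> str:
    """Pick the node that owns a key by hashing it (naive modulo placement).

    Production systems use range-based or consistent hashing instead, so that
    adding a node moves only a fraction of the keys rather than rehashing all.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# The same key always lands on the same node; distinct keys spread out.
placement = {k: route(k) for k in ("user:1", "user:2", "order:99")}
```

Because placement is a pure function of the key, any node (or client) can compute it locally, with no central lookup service on the hot path.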
The Database-per-Service Pattern
Microservices architectures favour independent data stores for each service. This pattern provides:
- Autonomy - Teams choose the database type best suited to their service’s data model
- Isolation - Schema changes and performance issues remain contained
- Scalability - Each database scales according to its service’s demands
- Resilience - Database failures affect only the owning service
However, this pattern introduces complexity around data consistency, cross-service queries, and operational overhead. Understanding CAP theorem trade-offs becomes essential when designing distributed data architectures.
Categories of Cloud Native Databases
Distributed SQL (NewSQL)
Distributed SQL databases provide ACID transactions and SQL compatibility while scaling horizontally across multiple nodes. They combine the familiarity of relational databases with cloud native scalability.
Leading solutions:
| Database | Architecture | Consistency Model | Best For |
|---|---|---|---|
| CockroachDB | Shared-nothing, Raft consensus | Serializable | Global applications, multi-region |
| YugabyteDB | Distributed, PostgreSQL-compatible | Strong consistency | PostgreSQL migrations, hybrid cloud |
| TiDB | MySQL-compatible, TiKV storage | Strong consistency | MySQL scale-out, HTAP workloads |
| PlanetScale | MySQL-compatible, Vitess-based | Eventual/strong | Serverless MySQL, developer experience |
| Spanner | Google proprietary | External consistency | Global scale, strict consistency |
CockroachDB has emerged as the leading open source distributed SQL database. Its architecture provides automatic sharding, rebalancing, and multi-region deployment with serializable isolation. CockroachDB is PostgreSQL wire-compatible, enabling existing applications to migrate with minimal code changes.
YugabyteDB offers high PostgreSQL compatibility, supporting advanced features like stored procedures, triggers, and extensions. For organisations with significant PostgreSQL investments, YugabyteDB provides a natural scale-out path while maintaining familiar tooling.
TiDB targets MySQL workloads requiring horizontal scale. Its hybrid transactional and analytical processing (HTAP) capability enables real-time analytics on operational data without separate data warehouses.
Cloud Native NoSQL
NoSQL databases designed for cloud native environments provide schema flexibility and specialised data models for specific use cases.
Document databases:
- MongoDB Atlas - Managed MongoDB with serverless and dedicated options
- Amazon DocumentDB - MongoDB-compatible managed service
- FerretDB - Open source MongoDB alternative using PostgreSQL
Key-value stores:
- Redis Enterprise - Distributed Redis with persistence and clustering
- Amazon DynamoDB - Serverless key-value with global tables
- ScyllaDB - High-performance Cassandra-compatible database
Time-series databases:
- TimescaleDB - PostgreSQL extension for time-series data
- InfluxDB - Purpose-built time-series database with the Flux query language
- VictoriaMetrics - High-performance metrics storage
For comprehensive coverage of NoSQL options, see our guide to top NoSQL databases in 2026.
Vector Databases for AI Workloads
The AI revolution has created demand for specialised databases storing and querying high-dimensional embeddings. Vector databases enable similarity search, recommendation systems, and retrieval-augmented generation (RAG) for large language models.
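Under the hood, the core operation is nearest-neighbour search over embedding vectors. A toy, exact (brute-force) version in Python, with illustrative names throughout, shows the question vector databases are built to answer at scale:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Exact nearest-neighbour scan: O(n) per query.

    Vector databases replace this linear scan with approximate indexes
    so the same question can be answered over billions of vectors.
    """
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
result = top_k([1.0, 0.05, 0.0], docs)  # the two docs closest in direction
```

Real systems trade the exact scan for approximate indexes such as HNSW graphs, giving large speedups at a small cost in recall.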
Leading vector databases:
- Pinecone - Managed vector search with serverless scaling
- Weaviate - Open source with hybrid keyword and vector search
- Milvus - Distributed vector database for large-scale AI
- Chroma - Embedded database optimised for RAG applications
- Qdrant - High-performance open source vector search
Vector databases integrate with embedding models from OpenAI, Cohere, and open source alternatives to power semantic search and AI applications. Our guide to vector databases provides detailed comparisons.
Analytical and OLAP Databases
Cloud native analytical databases process large-scale queries across distributed data sets, powering business intelligence, reporting, and data science workloads.
Column-oriented databases:
- ClickHouse - High-performance analytical database with real-time capabilities
- Apache Druid - Real-time analytics for event-driven data
- DuckDB - Embedded analytical database for local processing
Data warehouses:
- Snowflake - Multi-cloud data warehouse with separation of storage and compute
- Databricks - Unified analytics platform with lakehouse architecture
- BigQuery - Serverless data warehouse with ML integration
We use ClickHouse for analytical workloads due to its exceptional query performance and cost efficiency for time-series and event data.
Streaming and Event Databases
Modern applications increasingly rely on event-driven architectures. Streaming databases combine message queue capabilities with database-like query interfaces.
Event streaming platforms:
- Apache Kafka - Distributed event streaming with exactly-once semantics
- Amazon MSK - Managed Kafka with AWS integration
- Redpanda - Kafka-compatible with simplified operations
- Apache Pulsar - Multi-tenant streaming with tiered storage
Stream processing:
- ksqlDB - SQL interface for Kafka stream processing
- Apache Flink - Stateful stream processing at scale
- Materialize - SQL on streaming data with incremental updates
For Kafka implementation guidance, see our Amazon MSK primer.
Kubernetes Database Operators
Kubernetes operators automate database lifecycle management, encoding operational knowledge into software. Operators handle provisioning, scaling, backup, failover, and upgrades through declarative custom resources.
Operator Architecture
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
  namespace: databases
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  storage:
    size: 100Gi
    storageClass: fast-ssd
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production-db
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
    retentionPolicy: "30d"
```
This declarative specification manages a three-node PostgreSQL cluster with automated backup to S3. The operator handles leader election, replica configuration, connection pooling, and point-in-time recovery.
Leading Database Operators
PostgreSQL operators:
- CloudNativePG - CNCF sandbox project, comprehensive PostgreSQL management
- Crunchy PGO - Enterprise PostgreSQL with monitoring integration
- Zalando Postgres Operator - Battle-tested at scale
MySQL operators:
- Oracle MySQL Operator - Official MySQL operator for Kubernetes
- Percona Operator for MySQL - Enterprise features with Percona distribution
- Vitess - Horizontal sharding for MySQL at scale
Other databases:
- Strimzi - Apache Kafka operator with extensive customisation
- MongoDB Community Operator - MongoDB deployment on Kubernetes
- Redis Operator - Redis cluster management
Operator Selection Criteria
When evaluating operators, consider:
- Maturity and community - Production deployments and active maintenance
- Backup and recovery - Automated backup, point-in-time recovery support
- Scaling capabilities - Horizontal and vertical scaling automation
- High availability - Automatic failover and replica management
- Monitoring integration - Prometheus metrics, Grafana dashboards
- Security features - TLS, authentication, network policies
Data Persistence on Kubernetes
Stateful workloads on Kubernetes require careful attention to storage configuration. Unlike stateless applications that can be freely rescheduled, databases need persistent storage that survives pod restarts and node failures.
Storage Classes and CSI Drivers
Kubernetes abstracts storage through StorageClasses and Container Storage Interface (CSI) drivers:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "10000"
  throughput: "500"
  encrypted: "true"
  kmsKeyId: arn:aws:kms:region:account:key/key-id
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
```
Key considerations:
- Performance tier - Match storage performance to workload requirements
- Encryption - Enable encryption at rest for sensitive data
- Volume binding - Use WaitForFirstConsumer for topology-aware provisioning
- Reclaim policy - Use Retain for databases to prevent accidental data loss
- Expansion - Enable volume expansion for growing data sets
StatefulSets for Database Workloads
StatefulSets provide stable network identities and persistent storage essential for databases:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```
StatefulSets ensure:
- Pods receive stable DNS names (postgres-0, postgres-1, postgres-2)
- Each pod gets its own PersistentVolumeClaim
- Pods are created and deleted in order
- Storage persists across pod restarts
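Note that the stable DNS names depend on a headless Service whose name matches the StatefulSet's serviceName field. A minimal sketch to pair with the StatefulSet above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None  # headless: each pod gets its own stable DNS record
  selector:
    app: postgres
  ports:
    - port: 5432
```

With this in place, replicas are reachable individually at names like postgres-0.postgres.default.svc.cluster.local, which is what database replication and client topology discovery rely on.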
Local vs Network Storage
Network-attached storage (EBS, Azure Disk, GCP PD):
- Survives node failures
- Can be moved between nodes (with downtime)
- Higher latency than local storage
- Suitable for most production workloads
Local storage (NVMe, local SSD):
- Lowest latency and highest throughput
- Data lost if node fails
- Requires application-level replication
- Best for databases with built-in replication (CockroachDB, Cassandra)
For latency-sensitive workloads, consider databases designed for local storage with built-in replication. The database handles data durability, eliminating the need for network storage overhead.
Multi-Region and Global Distribution
Global applications require databases that operate across geographic regions, providing low-latency access for users worldwide while maintaining consistency guarantees.
Global Database Architectures
Active-active multi-region: All regions accept writes, with conflict resolution handling concurrent updates. CockroachDB and YugabyteDB support this pattern with configurable consistency levels.
```sql
-- CockroachDB: configure regional placement
ALTER DATABASE app SET PRIMARY REGION = "us-east1";
ALTER DATABASE app ADD REGION "eu-west1";
ALTER DATABASE app ADD REGION "ap-southeast1";

-- Pin table data to specific regions
ALTER TABLE users SET LOCALITY REGIONAL BY ROW;
```
Active-passive with read replicas: One region handles writes, with read replicas in other regions serving read traffic. This pattern works with traditional databases like PostgreSQL and MySQL.
Geo-partitioned data: Data is partitioned by geography, with each region owning data for local users. This reduces cross-region latency for most operations while maintaining global accessibility when needed.
Consistency vs Latency Trade-offs
Global distribution forces trade-offs between consistency and latency:
| Consistency Level | Cross-Region Latency | Use Case |
|---|---|---|
| Strong/Serializable | High (100-300ms) | Financial transactions, inventory |
| Bounded staleness | Medium (50-100ms) | User profiles, content |
| Eventual | Low (<50ms) | Analytics, caching, activity feeds |
Choose consistency levels based on business requirements. Not all data requires strong consistency, and mixing consistency levels within an application optimises both correctness and performance.
Disaster Recovery Considerations
Multi-region deployment addresses disaster recovery requirements:
- RTO (Recovery Time Objective) - Automatic failover enables near-zero RTO
- RPO (Recovery Point Objective) - Synchronous replication provides zero RPO
- Compliance - Data residency requirements may mandate regional data placement
Design for regional isolation while maintaining the ability to serve traffic from any region during outages. Test failover procedures regularly to ensure recovery processes work under pressure.
Managed vs Self-Managed Databases
The build-vs-buy decision significantly impacts operational overhead, cost, and flexibility.
Managed Database Services
Cloud providers offer fully managed database services that eliminate operational burden:
AWS:
- RDS (PostgreSQL, MySQL, MariaDB, Oracle, SQL Server)
- Aurora (PostgreSQL, MySQL with enhanced performance)
- DynamoDB (Serverless key-value)
- DocumentDB (MongoDB-compatible)
- ElastiCache (Redis, Memcached)
- Neptune (Graph database)
- Timestream (Time-series)
Azure:
- Azure SQL Database
- Cosmos DB (Multi-model)
- Azure Database for PostgreSQL/MySQL
- Azure Cache for Redis
GCP:
- Cloud SQL (PostgreSQL, MySQL, SQL Server)
- Cloud Spanner (Global distributed SQL)
- Firestore (Document database)
- Bigtable (Wide-column)
- Memorystore (Redis)
For RDS implementation guidance, see our Terraform RDS tutorial.
Self-Managed on Kubernetes
Running databases on Kubernetes provides:
- Consistency - Same operational model for applications and data
- Portability - Avoid cloud provider lock-in
- Cost control - Potential savings at scale
- Customisation - Full control over configuration
However, self-management requires:
- Expertise - Deep knowledge of database operations
- Tooling - Backup, monitoring, alerting infrastructure
- On-call - 24/7 support for database issues
- Capacity planning - Proactive scaling and resource management
Decision Framework
Choose managed services when:
- Team lacks database operational expertise
- Rapid time-to-market is critical
- Compliance requires vendor-supported solutions
- Workload fits managed service constraints
Choose self-managed when:
- Cost optimisation is paramount at scale
- Specific configuration requirements exceed managed options
- Multi-cloud portability is required
- Team has strong database operations skills
Many organisations adopt a hybrid approach: managed services for critical production workloads with self-managed databases for development, testing, and cost-sensitive workloads.
Performance Optimisation
Database performance in containerised environments requires attention to resource allocation, query optimisation, and infrastructure configuration.
Resource Allocation
Memory: Databases are memory-intensive. Allocate sufficient memory for buffer pools, caches, and working memory. Monitor memory pressure and adjust limits accordingly.
```yaml
resources:
  requests:
    memory: "8Gi"
    cpu: "2"
  limits:
    # Memory limit only; a CPU limit is omitted deliberately (see below)
    memory: "16Gi"
```
CPU: Set CPU requests to ensure consistent performance. Avoid CPU limits for databases, as they can cause latency spikes during garbage collection or background operations.
Storage IOPS: Provision storage with adequate IOPS for your workload. Monitor I/O wait times and upgrade storage tier if database performance is I/O-bound.
Connection Pooling
Database connections are expensive. Implement connection pooling to manage connections efficiently:
PgBouncer for PostgreSQL:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgbouncer-config
data:
  pgbouncer.ini: |
    [databases]
    app = host=postgres port=5432 dbname=app

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 20
```
Connection pooling is especially important in Kubernetes, where application pods scale dynamically and can exhaust database connections during scale-up events.
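The pooling idea itself is simple: open a bounded set of connections up front and make callers borrow and return them. A stdlib-only sketch of that cycle (illustrative, not PgBouncer's implementation; a real application would pass its driver's connect function):

```python
import queue

class ConnectionPool:
    """Bounded pool: at most `size` connections exist; callers block for a free one."""

    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # eagerly open `size` connections

    def acquire(self, timeout=None):
        # Blocks when all connections are checked out, which back-pressures
        # the application instead of exhausting the database.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Demo with a fake connect function that counts how many connections open.
counter = {"opened": 0}
def fake_connect():
    counter["opened"] += 1
    return object()

pool = ConnectionPool(fake_connect, size=3)
conns = [pool.acquire() for _ in range(3)]
for c in conns:
    pool.release(c)
# Only three "connections" were opened, regardless of acquire/release cycles.
```

The bound is the point: however many application pods scale up, total connections to the database stay fixed at the pool size.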
Query Optimisation
Monitor slow queries and optimise based on execution plans:
```sql
-- PostgreSQL: log statements that run longer than one second
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();  -- apply the change without a restart

-- Analyse query execution
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;
```
Index optimisation, query rewriting, and schema design remain fundamental regardless of whether databases run on Kubernetes or traditional infrastructure.
Caching Strategies
Implement caching to reduce database load:
Application-level caching: Cache frequently accessed data in application memory or distributed caches like Redis.
Read replicas: Route read traffic to replicas, reserving the primary for writes.
Materialised views: Pre-compute expensive aggregations for analytical queries.
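Application-level caching usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. A minimal sketch with plain dictionaries standing in for Redis and the primary database:

```python
cache = {}                               # stands in for Redis
database = {"user:1": {"name": "Ada"}}   # stands in for the primary database
db_reads = {"count": 0}                  # instrumentation for the demo

def get_user(key):
    """Cache-aside read path: cache hit, or database read plus cache fill."""
    if key in cache:
        return cache[key]
    db_reads["count"] += 1
    value = database[key]   # cache miss: hit the primary
    cache[key] = value      # populate so the next read is served from cache
    return value

first = get_user("user:1")   # miss: reads the database
second = get_user("user:1")  # hit: served entirely from cache
```

A production version also needs expiry (TTLs) and invalidation on writes, which is where most caching bugs live.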
For caching implementation guidance, see our comparison of Redis vs Memcached.
Security Best Practices
Database security in cloud native environments requires defence in depth across network, authentication, and data protection layers.
Network Security
Network policies: Restrict database access to authorised pods:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-access
  namespace: databases
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Both selectors in one entry: only api pods in the production
        # namespace (separate list items would mean either/or)
        - namespaceSelector:
            matchLabels:
              name: production
          podSelector:
            matchLabels:
              role: api
      ports:
        - protocol: TCP
          port: 5432
```
Service mesh mTLS: Encrypt traffic between applications and databases using service mesh mutual TLS.
Authentication and Authorisation
IAM integration: Use cloud provider IAM for database authentication where supported (RDS IAM, GCP Cloud SQL IAM).
Certificate authentication: Configure TLS client certificates for service-to-database authentication.
Role-based access: Implement least-privilege database roles for each application:
```sql
-- Create application role with minimal permissions
CREATE ROLE app_service WITH LOGIN PASSWORD 'secure_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_service;
GRANT USAGE, SELECT ON SEQUENCE orders_id_seq TO app_service;
```
Encryption
At rest: Enable storage encryption using cloud provider KMS or database-native encryption.
In transit: Require TLS for all database connections:
```yaml
# PostgreSQL TLS configuration
ssl: "on"
ssl_cert_file: "/certs/server.crt"
ssl_key_file: "/certs/server.key"
ssl_ca_file: "/certs/ca.crt"
```
Application-level: Consider column-level encryption for highly sensitive data using application-managed keys.
Secrets Management
Store database credentials in secrets management systems:
- Kubernetes Secrets with encryption at rest
- HashiCorp Vault with dynamic credentials
- AWS Secrets Manager with automatic rotation
- External Secrets Operator for cloud provider integration
For comprehensive security guidance, see our Kubernetes security best practices.
Backup and Disaster Recovery
Database backup strategies must account for the dynamic nature of containerised environments.
Backup Approaches
Logical backups: Export data in portable formats (pg_dump, mysqldump). Useful for cross-version migrations and selective restoration.
Physical backups: Copy data files directly. Faster for large databases but tied to specific versions.
Continuous archiving: Stream write-ahead logs (WAL) for point-in-time recovery. Essential for minimising data loss.
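Conceptually, point-in-time recovery combines a base backup with log replay up to a target position. A toy model of the idea (not PostgreSQL's WAL format):

```python
# Toy point-in-time recovery: restore a base snapshot, then replay logged
# writes up to (and including) a target timestamp.
base_snapshot = {"balance": 100}  # state when the base backup was taken
wal = [                           # ordered write-ahead log entries
    (1, ("balance", 150)),
    (2, ("balance", 90)),
    (3, ("balance", 500)),        # suppose this write was a mistake
]

def restore(snapshot, log, target_ts):
    """Rebuild state as of target_ts by replaying log entries onto the snapshot."""
    state = dict(snapshot)
    for ts, (key, value) in log:
        if ts > target_ts:
            break                 # stop before the unwanted change
        state[key] = value
    return state

state_before_mistake = restore(base_snapshot, wal, target_ts=2)
```

This is why continuous archiving minimises data loss: the recovery point is any logged position, not just the time of the last full backup.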
Backup Automation
Database operators automate backup management:
```yaml
# CloudNativePG backup configuration (on the Cluster resource)
backup:
  barmanObjectStore:
    destinationPath: s3://backups/production
    s3Credentials:
      accessKeyId:
        name: backup-creds
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-creds
        key: SECRET_ACCESS_KEY
    wal:
      compression: gzip
      maxParallel: 4
  retentionPolicy: "30d"
---
# Scheduled backups are a separate ScheduledBackup resource
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: production-daily
spec:
  schedule: "0 0 0 * * *"  # daily at midnight (six-field cron, seconds first)
  immediate: true
  backupOwnerReference: self
  cluster:
    name: production
```
Testing Recovery
Backup strategies are worthless without tested recovery procedures:
- Schedule regular recovery tests
- Document recovery runbooks
- Measure actual RTO and RPO
- Validate backup integrity automatically
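Automated integrity validation can start as simply as checksumming each backup artifact against a recorded manifest. A stdlib sketch with illustrative file names:

```python
import hashlib
import json
import pathlib
import tempfile

def sha256(path: pathlib.Path) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path: pathlib.Path) -> list:
    """Return the backup files whose current checksum no longer matches the manifest."""
    manifest = json.loads(manifest_path.read_text())
    return [name for name, digest in manifest.items()
            if sha256(manifest_path.parent / name) != digest]

# Demo: write a fake backup artifact, record its checksum, then verify.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "base.tar").write_bytes(b"fake backup contents")
(tmp / "manifest.json").write_text(json.dumps({"base.tar": sha256(tmp / "base.tar")}))
corrupted = verify(tmp / "manifest.json")  # empty list while the file is intact
```

Run on a schedule against the backup bucket, this catches silent corruption long before a restore is attempted; a full test restore remains the only complete validation.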
Observability and Monitoring
Database observability integrates with platform monitoring to provide unified visibility.
Metrics Collection
Export database metrics to Prometheus:
PostgreSQL metrics:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-exporter-queries
data:
  queries.yaml: |
    pg_replication:
      query: "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag"
      metrics:
        - lag:
            usage: "GAUGE"
            description: "Replication lag in seconds"
```
Key metrics to monitor:
- Connection pool utilisation
- Query latency percentiles
- Replication lag
- Buffer cache hit ratio
- Lock contention
- Transaction throughput
Alerting
Configure alerts for database health:
```yaml
groups:
  - name: database-alerts
    rules:
      - alert: HighReplicationLag
        expr: pg_replication_lag > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL replication lag is high"
      - alert: ConnectionPoolExhausted
        expr: pgbouncer_pools_cl_active / pgbouncer_pools_maxclient > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Connection pool nearly exhausted"
```
Distributed Tracing
Trace database queries through the application stack using OpenTelemetry instrumentation. This visibility helps identify slow queries and their impact on user-facing latency.
Implementation Roadmap
Phase 1: Assessment (Week 1)
Evaluate current state:
- Document existing databases and their characteristics
- Identify workload patterns (OLTP, OLAP, mixed)
- Assess data volumes and growth projections
- Review consistency and availability requirements
Define requirements:
- Performance SLAs (latency, throughput)
- Availability targets (RTO, RPO)
- Compliance constraints (data residency, encryption)
- Budget parameters
Phase 2: Selection (Weeks 2-3)
Evaluate candidates:
- Match database categories to workload requirements
- Assess Kubernetes operator maturity
- Compare managed vs self-managed options
- Consider vendor lock-in implications
Proof of concept:
- Deploy candidates in test environment
- Benchmark with representative workloads
- Validate failover and recovery procedures
- Test integration with existing tooling
Phase 3: Implementation (Weeks 4-6)
Infrastructure setup:
- Configure storage classes and provisioners
- Deploy database operators
- Establish backup infrastructure
- Integrate monitoring and alerting
Migration planning:
- Design migration strategy (lift-and-shift, refactor)
- Plan data migration approach
- Schedule maintenance windows
- Prepare rollback procedures
Phase 4: Migration (Weeks 7-10)
Execute migration:
- Migrate non-production environments first
- Validate application functionality
- Monitor performance and resource utilisation
- Execute production migration with minimal downtime
Documentation and training:
- Document operational procedures
- Train teams on new tooling
- Establish on-call procedures
- Create runbooks for common issues
Phase 5: Optimisation (Ongoing)
Continuous improvement:
- Monitor performance trends
- Optimise resource allocation
- Refine backup and recovery procedures
- Update capacity planning
Conclusion
Cloud native databases have matured to support the most demanding production workloads. Distributed SQL databases like CockroachDB and YugabyteDB provide horizontal scale with familiar SQL interfaces. Kubernetes operators automate complex operational tasks, enabling teams to manage databases alongside application workloads using consistent GitOps practices.
Key recommendations:
- Match database to workload - Choose databases based on data model, consistency requirements, and access patterns rather than familiarity alone
- Leverage operators - Use mature Kubernetes operators to automate provisioning, scaling, backup, and failover
- Plan for failure - Design for node and zone failures with appropriate replication and backup strategies
- Integrate observability - Ensure database metrics, logs, and traces flow into platform monitoring systems
- Consider managed services - Evaluate the total cost of ownership including operational overhead, not just licensing
The database landscape continues to evolve with AI-native vector databases, real-time analytical systems, and serverless architectures pushing boundaries. Organisations that invest in cloud native database infrastructure position themselves to adopt these innovations while maintaining operational excellence.
Need help modernising your data infrastructure? Tasrie IT Services specialises in cloud native database implementations, Kubernetes consulting, and cloud migration. Our team has deployed distributed databases for organisations processing billions of transactions across global infrastructure.
Schedule a consultation to evaluate your database architecture and develop a modernisation roadmap tailored to your requirements.
Related Resources
- Top 10 NoSQL Databases in 2026
- Top 5 Vector Databases in 2025
- Understanding the CAP Theorem
- NoSQL vs MySQL: When to Choose Each
- Redis vs Memcached Comparison
- Cloud Native Fundamentals
External Resources: