
Prometheus Monitoring Kubernetes: The Complete Production Guide

Amjad Syed - Founder & CEO

Setting up Prometheus monitoring for Kubernetes is one of those tasks that looks simple until you try to do it properly. The basic installation takes 10 minutes. Getting it production-ready takes weeks of tuning.

After implementing monitoring solutions for dozens of clients running Kubernetes, I have learned what separates a monitoring setup that merely runs from one that actually helps you sleep at night.

This guide covers everything you need to know about running Prometheus monitoring for Kubernetes in production. Not just installation, but the configuration decisions that matter.

Why Prometheus for Kubernetes Monitoring

Prometheus became the standard for Kubernetes monitoring for good reasons. It was designed for the same environment Kubernetes operates in: dynamic, distributed, and constantly changing.

Three things make Prometheus particularly suited for Kubernetes:

Pull-based collection means Prometheus discovers and scrapes targets automatically. When pods scale up or down, Prometheus adapts without manual configuration. This is critical in Kubernetes where IP addresses change constantly.

Dimensional data model lets you slice metrics by any label. Want to see CPU usage by namespace, deployment, and pod? That is a single query. Traditional monitoring tools struggle with this level of flexibility.

Native Kubernetes integration through service discovery means Prometheus understands Kubernetes primitives. It can automatically find and scrape pods, services, nodes, and endpoints.

The Cloud Native Computing Foundation graduated Prometheus in 2018, cementing its position as the monitoring standard for cloud-native applications.

Architecture Overview

Before diving into configuration, understanding the architecture helps you make better decisions.

A production Prometheus setup for Kubernetes typically includes:

Prometheus Server scrapes and stores metrics. In Kubernetes, this usually runs as a StatefulSet with persistent storage. You need persistent storage because losing your metrics history hurts during incident investigation.

Alertmanager handles alert routing, grouping, and silencing. It receives alerts from Prometheus and sends notifications to Slack, PagerDuty, or email. Running it separately from Prometheus keeps your alerting working even when Prometheus has issues.

Node Exporter runs on every node as a DaemonSet. It exposes hardware and OS metrics like CPU, memory, disk, and network. Without it, you only see container metrics, not the underlying infrastructure.

kube-state-metrics exposes Kubernetes object state as metrics. This includes deployment replicas, pod phases, node conditions, and resource requests. Prometheus cannot see this information directly from the Kubernetes API.

Grafana provides visualization. While Prometheus has a basic UI, Grafana is where you build dashboards and explore metrics interactively.

Installation Options

You have three main options for installing Prometheus on Kubernetes. Each has trade-offs.

Option 1: Prometheus Operator

The Prometheus Operator uses Kubernetes custom resources to manage Prometheus deployments. It is the most Kubernetes-native approach.

# Install using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

The kube-prometheus-stack includes Prometheus, Alertmanager, Grafana, and common exporters. It also includes pre-configured dashboards and alerting rules.

When to use: Most production deployments. The operator handles upgrades, scaling, and configuration changes gracefully.
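
Once the chart is installed, a quick sanity check is worth the minute it takes. The service name below is an assumption based on the release name used above; list the services in the monitoring namespace if yours differs.

# Confirm all components came up
kubectl get pods -n monitoring

# Open the Prometheus UI locally (the service name depends on the release name;
# check with: kubectl get svc -n monitoring)
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090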

Option 2: Helm Chart (Standalone)

If you want more control or do not need the operator pattern:

helm install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --create-namespace \
  --set server.retention=15d \
  --set server.persistentVolume.size=50Gi

When to use: Simpler setups or when you need fine-grained control over each component.

Option 3: Manual Deployment

Deploying each component manually gives you maximum control. We covered the detailed steps for installing Prometheus on Kubernetes in a separate guide.

When to use: Learning environments or highly customized setups.

Essential Metrics to Collect

Not all metrics matter equally. Focus on these categories for Kubernetes monitoring.

Node Metrics

Node Exporter provides infrastructure visibility:

# CPU usage by node
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage by node
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage by node
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100

Kubernetes State Metrics

kube-state-metrics exposes object states:

# Pods not running
kube_pod_status_phase{phase!="Running",phase!="Succeeded"} == 1

# Deployments with unavailable replicas
kube_deployment_status_replicas_unavailable > 0

# Nodes not ready
kube_node_status_condition{condition="Ready",status="true"} == 0

Container Metrics

cAdvisor (built into kubelet) provides container metrics:

# Container CPU usage
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)

# Container memory usage
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)

# Container restarts (this counter comes from kube-state-metrics rather than cAdvisor)
sum(kube_pod_container_status_restarts_total) by (namespace, pod)

These metrics form the foundation of our 10-layer monitoring framework that we implement for production Kubernetes clusters.

Service Discovery Configuration

Prometheus discovers targets automatically through Kubernetes service discovery. Here is a production-ready configuration:

scrape_configs:
  # Scrape pods with prometheus.io annotations
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use custom port if specified (keep the pod IP, swap in the annotated port)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: ${1}:${2}
      # Use custom path if specified
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Add pod labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Add namespace label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      # Add pod name label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod

With this configuration, any pod with these annotations gets scraped automatically:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

This pattern scales well. Teams can instrument their applications without touching the central Prometheus configuration.
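
For reference, here is roughly where those annotations sit in a workload manifest. The application name, image, and port are placeholders rather than anything standard.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api                     # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
      annotations:
        prometheus.io/scrape: "true"    # picked up by the relabel rules above
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: api
          image: example/api:1.0.0      # placeholder image
          ports:
            - containerPort: 8080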

Alerting Rules That Actually Help

Most alerting setups suffer from two problems: too many alerts or alerts that fire too late. Here are rules that balance signal and noise.

Infrastructure Alerts

groups:
  - name: kubernetes-infrastructure
    rules:
      # Node is down
      - alert: KubernetesNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
          description: "Node has been not ready for more than 5 minutes."

      # Node disk pressure
      - alert: KubernetesNodeDiskPressure
        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} has disk pressure"

      # Node memory pressure
      - alert: KubernetesNodeMemoryPressure
        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} has memory pressure"

Workload Alerts

groups:
  - name: kubernetes-workloads
    rules:
      # Pod crash looping
      - alert: KubernetesPodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
          description: "Pod has restarted {{ $value }} times in the last 15 minutes."

      # Pod stuck pending
      - alert: KubernetesPodPending
        expr: kube_pod_status_phase{phase="Pending"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is stuck pending"

      # Deployment replicas mismatch
      - alert: KubernetesDeploymentReplicasMismatch
        expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has replica mismatch"

Resource Alerts

groups:
  - name: kubernetes-resources
    rules:
      # Container CPU throttling
      - alert: ContainerCPUThrottling
        expr: sum(increase(container_cpu_cfs_throttled_periods_total[5m])) by (namespace, pod, container) / sum(increase(container_cpu_cfs_periods_total[5m])) by (namespace, pod, container) > 0.25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is being throttled"
          description: "Container is being CPU throttled {{ $value | humanizePercentage }} of the time."

      # Container approaching memory limit
      - alert: ContainerMemoryNearLimit
        expr: (container_memory_working_set_bytes{container!=""} / container_spec_memory_limit_bytes{container!=""} > 0.9) and (container_spec_memory_limit_bytes{container!=""} > 0)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is near memory limit"

Grafana Dashboards

Prometheus stores the data. Grafana makes it useful. Here are the dashboards we deploy for every Kubernetes cluster.

Cluster Overview shows high-level health: node count, pod count, resource utilization, and top consumers. This is where you start during an incident.

Node Details drills into individual nodes: CPU, memory, disk, network, and running pods. Use this when the cluster overview shows a problem node.

Namespace Overview shows resource usage by namespace. Helpful for capacity planning and identifying resource hogs.

Pod Details shows individual pod metrics: restarts, resource usage, and network traffic.

The kube-prometheus-stack includes these dashboards pre-configured. You can also import community dashboards from Grafana’s dashboard library.
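
If Grafana came from the kube-prometheus-stack, its dashboard sidecar can also load dashboards from labeled ConfigMaps, which keeps them in version control. A rough sketch, assuming the chart's default sidecar label:

apiVersion: v1
kind: ConfigMap
metadata:
  name: team-api-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"    # label the sidecar watches by default; confirm in your chart values
data:
  # Paste the exported dashboard JSON model here (trimmed placeholder below)
  team-api.json: |
    {"title": "Team API Overview", "panels": []}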

We cover Grafana setup and best practices in our Grafana support services.

Scaling Prometheus

Prometheus works well for small to medium clusters. As you scale, you hit limitations.

Single Prometheus Limitations

  • Memory usage grows with active time series
  • Query performance degrades with data volume
  • Single point of failure
  • Retention limited by local storage

Scaling Strategies

Vertical scaling works up to a point. Give Prometheus more memory and faster disks. This gets expensive quickly.

Sharding by namespace runs multiple Prometheus instances, each scraping a subset of namespaces. Works but requires careful query routing.

Thanos or Cortex adds long-term storage and global querying. With Thanos, a sidecar ships blocks from each Prometheus to object storage; with Cortex, Prometheus remote-writes samples to a central cluster. We cover Thanos support in detail for clients needing this level of scale.
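
As a rough sketch of what the Thanos route looks like with the operator from Option 1, the sidecar is enabled on the Prometheus custom resource and pointed at a Secret holding the object storage configuration. The Secret name and key below are placeholders.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  thanos:
    objectStorageConfig:
      name: thanos-objstore    # placeholder Secret containing the bucket config
      key: objstore.yml        # key inside that Secret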

VictoriaMetrics offers a drop-in Prometheus replacement with better resource efficiency. Worth considering for large clusters.

For most teams, a single well-tuned Prometheus handles clusters up to a few hundred nodes. Beyond that, consider the architectural options above.

Common Problems and Solutions

Problem: High Memory Usage

Prometheus memory usage correlates with active time series. Each unique combination of metric name and labels is a time series.

Solution: Reduce cardinality. Avoid labels with high cardinality like user IDs, request IDs, or timestamps. Use relabeling to drop unnecessary labels before storage.

metric_relabel_configs:
  # Drop Go runtime metrics (go_*) entirely
  - source_labels: [__name__]
    regex: 'go_.*'
    action: drop
  # Strip any label matching kubernetes_io_*
  - regex: 'kubernetes_io_.*'
    action: labeldrop
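
Before deciding what to drop, it helps to see which metrics are driving series growth. A query along these lines lists the biggest offenders; it touches every series, so run it sparingly:

# Top 10 metric names by number of series
topk(10, count by (__name__) ({__name__=~".+"}))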

Problem: Scrape Failures

Targets showing as down in Prometheus.

Solution: Check network policies, service account permissions, and target annotations. Common causes (a few quick checks are sketched after this list):

  • Pod not exposing metrics endpoint
  • Wrong port in annotation
  • Network policy blocking Prometheus
  • ServiceMonitor selector not matching
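
A few quick checks, using placeholder names for the namespace, pod, and port:

# 1. Confirm the pod actually serves metrics on the annotated port
kubectl port-forward -n my-namespace pod/my-app-pod 8080:8080
curl -s http://localhost:8080/metrics | head

# 2. Look for network policies in the target namespace that might block Prometheus
kubectl get networkpolicy -n my-namespace

# 3. Check the error Prometheus reports for the target on its
#    Status -> Targets page (reachable via port-forward, as shown earlier)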

Problem: Missing Metrics

Prometheus is running but expected metrics are not there.

Solution: Verify the scrape config matches your targets. Check the Prometheus UI targets page. Look for relabel rules that might be dropping metrics.

Problem: Alert Fatigue

Too many alerts firing, team ignores them.

Solution: Review and tune alert thresholds. Add a "for" duration to avoid flapping. Group related alerts. Delete alerts that never result in action.

Production Checklist

Before going to production, verify:

  • Persistent storage configured for Prometheus
  • Resource requests and limits set appropriately
  • Pod disruption budget configured (a minimal example follows this checklist)
  • Alertmanager configured with notification channels
  • Critical alerts tested and verified
  • Grafana dashboards reviewed and customized
  • Retention period matches your needs
  • Backup strategy for Prometheus data
  • Network policies allow scraping
  • ServiceAccount has necessary RBAC permissions
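
On the pod disruption budget item, a minimal sketch looks like this. The label selector is an assumption; check the labels the operator puts on your Prometheus pods before applying it.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-pdb
  namespace: monitoring
spec:
  minAvailable: 1                        # keep at least one Prometheus pod during voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus # assumed label; verify with kubectl get pods --show-labels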

Next Steps

Prometheus monitoring for Kubernetes is foundational. Once it is running well, you can build on that foundation.

Need Help With Prometheus Monitoring?

We implement Prometheus monitoring for Kubernetes clusters ranging from startups to enterprises. If you want help getting this right the first time:

Book a free 30-minute consultation to discuss your monitoring needs.

Or explore our Prometheus consulting services for ongoing support.
