Kubernetes VPA (Vertical Pod Autoscaler) automatically adjusts CPU and memory requests for your pods based on actual usage patterns. According to the CNCF Annual Survey 2023, 71% of organizations struggle with Kubernetes resource management, making VPA a critical tool for optimization.
What Is Kubernetes VPA?
The Vertical Pod Autoscaler is a Kubernetes component that automatically sets resource requests and limits for containers based on historical and current resource utilization. Unlike the Horizontal Pod Autoscaler (HPA) that scales the number of pods, VPA adjusts the resources allocated to existing pods.
VPA consists of three main components:
- Recommender: Monitors resource usage and provides recommendations
- Updater: Evicts pods that need resource adjustments
- Admission Controller: Sets correct resource requests on new pods
For organizations looking to optimize their Kubernetes infrastructure, our Kubernetes cost optimization services can help implement VPA alongside other efficiency strategies.
Why Use Kubernetes VPA?
Manually setting resource requests is challenging and often leads to either over-provisioning (wasting money) or under-provisioning (causing performance issues). VPA solves this by:
- Reducing resource waste: Automatically right-sizes pods based on actual usage
- Preventing OOM kills: Ensures pods have sufficient memory
- Improving cluster utilization: Frees up resources for other workloads
- Saving costs: According to Kubernetes cost management research, VPA can reduce resource waste by 30-50%
How Does Kubernetes VPA Work?
VPA operates in different modes to suit various use cases:
VPA Update Modes
- Off: Only provides recommendations without applying changes
- Initial: Sets resources only when pods are created
- Recreate: Evicts and recreates pods with new resource settings
- Auto: Automatically updates resources (currently same as Recreate)
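In a VPA manifest, the mode is set in the updatePolicy stanza. A minimal fragment (the chosen mode is illustrative):

```yaml
updatePolicy:
  # One of: "Off", "Initial", "Recreate", "Auto"
  updateMode: "Initial"
```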
The VPA workflow follows these steps:
- VPA Recommender analyzes pod metrics from the Metrics Server
- Calculates optimal CPU and memory requests based on historical data
- Updater evicts pods that deviate significantly from recommendations
- Admission Controller applies new resource requests when pods restart
Installing Kubernetes VPA
Before implementing VPA, ensure you have the Metrics Server installed for resource monitoring. Here’s how to deploy VPA:
Prerequisites
- Kubernetes cluster version 1.11 or higher
- Metrics Server running
- kubectl access with cluster-admin privileges
Installation Steps
```bash
# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Deploy VPA components
./hack/vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa
```
You should see three pods running: vpa-recommender, vpa-updater, and vpa-admission-controller.
Creating Your First VPA Configuration
Let’s create a basic VPA for a sample application. Here’s a complete example:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
```
Apply this configuration:
```bash
kubectl apply -f vpa-config.yaml

# Check VPA status
kubectl describe vpa my-app-vpa
```
VPA Best Practices and Limitations
When to Use VPA
VPA works best for:
- Stateful applications: Databases, caches, message queues
- Long-running workloads: Applications with predictable usage patterns
- Development environments: Where pod restarts are acceptable
- Cost optimization initiatives: As part of a broader Kubernetes FinOps strategy
Critical Limitations
- Cannot use with HPA on CPU/memory: VPA and HPA conflict when scaling on the same metrics
- Requires pod restarts: In Recreate mode, VPA evicts pods to apply changes
- Less suited to stateless apps: HPA is usually the better fit for scaling stateless workloads
- Initial learning period: VPA needs time to gather metrics before making recommendations
For production deployments, consider our Kubernetes consulting services to design an optimal autoscaling strategy.
VPA vs HPA: Choosing the Right Autoscaler
Understanding when to use each autoscaler is crucial:
| Feature | VPA | HPA |
|---|---|---|
| Scaling Direction | Vertical (resources) | Horizontal (replicas) |
| Pod Restarts | Required (Recreate mode) | Not required |
| Best For | Stateful apps | Stateless apps |
| Metrics | CPU, Memory | CPU, Memory, Custom |
| Cluster Impact | Resource optimization | Capacity planning |
According to the Kubernetes documentation, you can use VPA and HPA together if HPA scales on custom metrics while VPA handles CPU/memory.
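That split can be sketched as follows (the Deployment name my-app and the http_requests_per_second metric are illustrative; the custom metric must be exposed through a custom-metrics adapter):

```yaml
# HPA scales replicas on a custom metric (not CPU/memory)...
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
---
# ...while VPA right-sizes CPU/memory requests on the same Deployment
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```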
Monitoring VPA Recommendations
To view VPA recommendations without applying them, use “Off” mode:
```yaml
updatePolicy:
  updateMode: "Off"
```
Then check recommendations:
```bash
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation}' | jq
```
This shows:
- Target: Recommended resource requests
- LowerBound: Minimum viable resources
- UpperBound: Maximum recommended resources
- UncappedTarget: Recommendation without policy constraints
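In the VPA status, these fields appear per container; a recommendation block typically looks like this (the values are illustrative):

```yaml
status:
  recommendation:
    containerRecommendations:
      - containerName: my-app
        lowerBound:
          cpu: 150m
          memory: 200Mi
        target:
          cpu: 250m
          memory: 300Mi
        uncappedTarget:
          cpu: 250m
          memory: 300Mi
        upperBound:
          cpu: 1200m
          memory: 1500Mi
```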
Integrate VPA metrics with Prometheus monitoring for comprehensive observability.
Advanced VPA Configurations
Controlling Resource Policies
Fine-tune VPA behavior with resource policies:
```yaml
resourcePolicy:
  containerPolicies:
    - containerName: "my-container"
      mode: "Auto"
      minAllowed:
        cpu: "250m"
        memory: "256Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
```
The controlledValues field determines what VPA adjusts:
- RequestsAndLimits: Updates both requests and limits, preserving the request-to-limit ratio (the default)
- RequestsOnly: Adjusts only requests, leaving limits unchanged
VPA for Multiple Containers
For pods with multiple containers, specify policies per container:
```yaml
containerPolicies:
  - containerName: "app"
    minAllowed:
      memory: "512Mi"
  - containerName: "sidecar"
    mode: "Off"
    minAllowed:
      memory: "128Mi"
```
This allows you to exclude sidecars or set different policies for each container.
Troubleshooting Common VPA Issues
VPA Not Making Recommendations
If VPA shows no recommendations:
- Check Metrics Server: Ensure it’s collecting data:

```bash
kubectl top nodes
kubectl top pods
```

- Verify VPA components: All three pods should be running
- Wait for data: VPA needs 24-48 hours of metrics for accurate recommendations
- Check logs: Review the recommender logs for errors:

```bash
kubectl logs -n kube-system deployment/vpa-recommender
```
Pods Not Being Updated
If recommendations exist but pods aren’t updated:
- Verify updateMode is set to “Auto” or “Recreate”
- Check that pods are managed by a controller (Deployment, StatefulSet)
- Review updater logs for eviction issues
- Ensure PodDisruptionBudgets aren’t blocking evictions
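For example, a PodDisruptionBudget like the following (the app label is illustrative) will block VPA’s evictions entirely:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # maxUnavailable: 0 forbids all voluntary evictions,
  # including those issued by the VPA updater
  maxUnavailable: 0
  selector:
    matchLabels:
      app: my-app
```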
For complex troubleshooting scenarios, our Kubernetes production support team provides 24/7 assistance.
VPA in Production: Real-World Considerations
Implementing VPA in production requires careful planning:
Gradual Rollout Strategy
- Start with “Off” mode: Monitor recommendations without changes
- Test in development: Validate VPA behavior with non-critical workloads
- Use “Initial” mode: Apply resources only to new pods
- Enable “Auto” selectively: Start with stateful apps that tolerate restarts
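The “Initial” mode step of this rollout can be expressed as a small manifest (the Deployment name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # Resources are set only at pod creation; running pods are never evicted
    updateMode: "Initial"
```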
Combining VPA with Cluster Autoscaler
VPA works well with Cluster Autoscaler for comprehensive optimization:
- VPA right-sizes individual pods
- Cluster Autoscaler adjusts node count based on total resource needs
- Together, they minimize both waste and costs
This combination is particularly effective in cloud environments like AWS EKS, Azure AKS, or Google GKE.
Measuring VPA Impact
Track these metrics to quantify VPA benefits:
- Resource utilization: Compare actual vs. requested resources
- Cost savings: Calculate reduced resource waste
- OOM kill rate: Monitor memory-related pod failures
- Pod restart frequency: Track evictions and restarts
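If you run Prometheus with kube-state-metrics and cAdvisor, a query along these lines can approximate the first metric (metric names assume those standard exporters):

```promql
# Memory actually used vs. requested, per pod;
# a ratio well below 1 suggests over-provisioning
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)
  /
sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace, pod)
```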
According to case studies from our Kubernetes cost optimization work, organizations typically see:
- 30-50% reduction in over-provisioned resources
- 20-40% decrease in cluster costs
- 60% fewer OOM-related incidents
Frequently Asked Questions
What is the difference between VPA and HPA in Kubernetes?
VPA (Vertical Pod Autoscaler) adjusts the CPU and memory resources allocated to existing pods, while HPA (Horizontal Pod Autoscaler) changes the number of pod replicas. VPA is best for stateful applications, whereas HPA suits stateless workloads that can scale horizontally.
Does Kubernetes VPA require pod restarts?
Yes, in “Auto” and “Recreate” modes, VPA must restart pods to apply new resource requests. The “Initial” mode only sets resources when pods are first created, avoiding restarts. The “Off” mode provides recommendations without any changes.
Can I use VPA and HPA together?
You cannot use VPA and HPA together if both scale on CPU or memory metrics, as they will conflict. However, you can combine them if HPA scales on custom metrics (like request rate) while VPA manages CPU/memory resources.
How long does VPA need to collect data before making recommendations?
VPA typically needs 24-48 hours of metrics to generate accurate recommendations. For new applications, it starts with conservative estimates and refines them over time as more usage data becomes available.
Ready to optimize your Kubernetes resource usage? Our Kubernetes consulting team can implement VPA and other autoscaling strategies tailored to your workloads. We’ve helped organizations reduce cluster costs by up to 50% through intelligent resource management.