Understanding Kubernetes taints and tolerations is fundamental to controlling where pods run in your cluster. These mechanisms work together to ensure pods land on the right nodes, prevent resource contention, and maintain workload isolation. According to the CNCF Annual Survey 2023, 96% of organizations are now using or evaluating Kubernetes, making these scheduling primitives essential knowledge for platform teams.
Taints and tolerations act as a repel-and-attract system. Nodes can be “tainted” to repel pods, while pods can declare “tolerations” to override those taints and schedule anyway. This gives you fine-grained control over pod placement without manually assigning nodes.
What Are Kubernetes Taints?
Taints are properties applied to nodes that repel pods unless those pods explicitly tolerate the taint. Think of a taint as a “keep out” sign that only certain pods can ignore.
A taint consists of three components:
- Key: An identifier for the taint (e.g., gpu, dedicated, environment)
- Value: Optional data associated with the key (e.g., true, production, team-a)
- Effect: What happens to pods that don’t tolerate the taint
The three taint effects determine scheduling behavior:
- NoSchedule: Prevents new pods from scheduling on the node unless they tolerate the taint. Existing pods remain unaffected.
- PreferNoSchedule: Soft version of NoSchedule. Kubernetes tries to avoid scheduling pods here but will do so if no other options exist.
- NoExecute: Prevents new pods AND evicts existing pods that don’t tolerate the taint. This is the most aggressive effect.
Here’s how to apply a taint to a node:
kubectl taint nodes node1 gpu=true:NoSchedule
This command taints node1 with key gpu, value true, and effect NoSchedule. Now only pods with a matching toleration can schedule on this node.
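To confirm the taint is in place, inspect the node:
kubectl describe node node1 | grep Taints
The Taints line in the output should list gpu=true:NoSchedule.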
Common use cases for taints include:
- Dedicating nodes to specific workloads (GPU nodes, high-memory nodes)
- Isolating production from non-production workloads
- Preventing pods from scheduling on nodes undergoing maintenance
- Creating multi-tenant clusters with team-specific nodes
What Are Kubernetes Tolerations?
Tolerations are properties added to pod specifications that allow (but don’t require) pods to schedule on nodes with matching taints. A toleration says “I can tolerate this taint and schedule on nodes that have it.”
A toleration must match the taint’s key, value, and effect to work. Here’s a pod with a toleration:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
This pod tolerates the gpu=true:NoSchedule taint, allowing it to schedule on nodes with that taint.
Tolerations support two operators:
- Equal: The toleration’s key and value must exactly match the taint
- Exists: Only the key must match; value is ignored
The Exists operator is useful for tolerating any value:
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
This tolerates any taint with key dedicated, regardless of value.
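Going one step further, a toleration with an empty key and the Exists operator matches every taint; this is the pattern some cluster-wide DaemonSets use to run on all nodes. A minimal sketch:
tolerations:
- operator: "Exists"
Use this sparingly, since it bypasses every taint in the cluster.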
How Do Taints and Tolerations Work Together?
The taint-toleration system follows these rules:
- No taint on node: Any pod can schedule there
- Taint on node, no toleration on pod: Pod cannot schedule (NoSchedule), is scheduled there only as a last resort (PreferNoSchedule), or gets evicted (NoExecute)
- Taint on node, matching toleration on pod: Pod can schedule
- Multiple taints on node: Pod must tolerate ALL taints to schedule
Let’s walk through a practical example. Suppose you have GPU nodes that should only run machine learning workloads:
Step 1: Taint the GPU nodes
kubectl taint nodes gpu-node-1 workload=ml:NoSchedule
kubectl taint nodes gpu-node-2 workload=ml:NoSchedule
Step 2: Add tolerations to ML pods
apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-training
spec:
  containers:
  - name: tf-container
    image: tensorflow/tensorflow:latest-gpu
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "ml"
    effect: "NoSchedule"
Step 3: Regular pods without tolerations won’t schedule on GPU nodes, keeping resources available for ML workloads.
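You can confirm the placement by listing pods with their assigned nodes; the ML pod should appear on one of the GPU nodes, while untolerated pods stay on other nodes or remain Pending if no other capacity exists:
kubectl get pods -o wide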
For advanced scheduling strategies, our Kubernetes consulting services can help design node pools and taints that match your workload requirements.
When Should You Use Taints and Tolerations?
Taints and tolerations solve specific scheduling challenges:
Dedicated Hardware Nodes
When you have specialized hardware (GPUs, FPGAs, high-memory nodes), taints prevent general workloads from consuming these expensive resources:
kubectl taint nodes gpu-node hardware=gpu:NoSchedule
kubectl taint nodes highmem-node hardware=highmem:NoSchedule
Multi-Tenant Isolation
In shared clusters, taints create logical boundaries between teams or customers:
kubectl taint nodes team-a-node tenant=team-a:NoExecute
kubectl taint nodes team-b-node tenant=team-b:NoExecute
Each team’s pods tolerate only their own taint, ensuring workload isolation.
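On the pod side, each team's workloads carry only their own toleration. A sketch for team-a, matching the taint above:
tolerations:
- key: "tenant"
  operator: "Equal"
  value: "team-a"
  effect: "NoExecute"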
Environment Separation
Separate production from staging workloads within the same cluster:
kubectl taint nodes prod-node-1 environment=production:NoSchedule
kubectl taint nodes staging-node-1 environment=staging:NoSchedule
Node Maintenance and Draining
Before maintenance, taint nodes with NoExecute to gracefully evict pods:
kubectl taint nodes node1 maintenance=true:NoExecute
Pods without a matching toleration are evicted and rescheduled elsewhere. Note that kubectl cordon and kubectl drain remain the standard workflow for planned maintenance; tainting with NoExecute is useful when you want toleration-aware eviction, so that selected pods can stay behind.
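For example, if a monitoring or logging agent should keep running during the maintenance window, give it a toleration for the taint above (a minimal sketch; adjust the key and value to match your own taint):
tolerations:
- key: "maintenance"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"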
Common Taint and Toleration Patterns
Kubernetes itself uses taints for critical system functions:
Node Not Ready Taint
When a node becomes unhealthy, Kubernetes automatically taints it:
node.kubernetes.io/not-ready:NoExecute
This evicts pods from failing nodes. Most pods have a default toleration for this taint with a tolerationSeconds value, allowing temporary network hiccups without immediate eviction. Learn more about handling these scenarios in our guide on Kubernetes node not ready troubleshooting.
Master Node Taint
Control plane nodes have a built-in taint to prevent user workloads:
node-role.kubernetes.io/control-plane:NoSchedule
On clusters created before Kubernetes 1.24, you may still see the older node-role.kubernetes.io/master:NoSchedule taint. Only system pods (kube-proxy, CoreDNS, and similar add-ons) tolerate this taint by default.
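For reference, the matching toleration follows the same pattern as any other; the exact tolerations shipped with system components vary by distribution, but a sketch looks like this:
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"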
Custom Application Taints
You can create application-specific taints for canary deployments:
apiVersion: v1
kind: Pod
metadata:
  name: canary-pod
spec:
  containers:
  - name: app
    image: myapp:canary
  tolerations:
  - key: "deployment"
    value: "canary"
    effect: "NoSchedule"
Taint canary nodes with deployment=canary:NoSchedule so that only canary pods schedule there.
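The matching command on the node side would look like this (canary-node-1 is a placeholder name):
kubectl taint nodes canary-node-1 deployment=canary:NoSchedule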
Taints vs Node Affinity: What’s the Difference?
Both taints/tolerations and node affinity control pod placement, but they work differently:
Taints and Tolerations:
- Push model: Nodes repel pods by default
- Negative selection: “Don’t schedule here unless…”
- Node-centric: Nodes control what can run on them
- Best for: Keeping pods OFF nodes
Node Affinity:
- Pull model: Pods attract themselves to nodes
- Positive selection: “I want to run here”
- Pod-centric: Pods choose where to run
- Best for: Putting pods ON specific nodes
Often you’ll use both together. Taint a GPU node to keep regular pods off, then use node affinity in ML pods to prefer that GPU node over others.
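A sketch of that combination, reusing the workload=ml:NoSchedule taint from earlier and assuming the GPU nodes carry an illustrative gpu=true label:
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "ml"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - "true"
The toleration gets the pod past the taint, and the preferred node affinity nudges the scheduler toward the labeled GPU nodes without making them a hard requirement.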
For comprehensive cluster design patterns, explore our Kubernetes migration strategy guide.
How to Remove Taints from Nodes
Removing a taint uses the same command with a minus sign:
kubectl taint nodes node1 gpu=true:NoSchedule-
The trailing - removes the taint. To remove all taints with a specific key:
kubectl taint nodes node1 gpu-
This removes all taints with key gpu, regardless of value or effect.
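To double-check that the node is clean, print its taints field directly; the command returns nothing when no taints remain:
kubectl get node node1 -o jsonpath='{.spec.taints}'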
Toleration Seconds: Time-Based Evictions
For NoExecute taints, you can add tolerationSeconds to delay eviction:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
This pod tolerates the unreachable-node taint for 300 seconds (5 minutes) before being evicted, which prevents unnecessary rescheduling during brief network issues.
Kubernetes adds default tolerations for common node conditions:
- node.kubernetes.io/not-ready:NoExecute with a 300-second tolerationSeconds
- node.kubernetes.io/unreachable:NoExecute with a 300-second tolerationSeconds
You can override these defaults in your pod specs.
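For example, a latency-sensitive service might fail over faster by shortening the window to 60 seconds (a sketch; the trade-off is more eviction churn during brief network blips):
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60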
Best Practices for Taints and Tolerations
Follow these guidelines for maintainable clusters:
Use Descriptive Taint Keys
Choose clear, consistent naming:
- Good: workload=database, hardware=gpu, environment=production
- Bad: special=true, x=y, node1=abc
Document Your Taints
Maintain a registry of taints used in your cluster. Include:
- Taint key/value/effect
- Purpose and use case
- Which teams or applications use it
- When it was added and by whom
Start with PreferNoSchedule
For new taints, start with PreferNoSchedule to test behavior before enforcing with NoSchedule. This prevents accidental pod scheduling failures.
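For example, rolling out a new database taint in two stages might look like this (db-node-1 and the workload=database key are illustrative):
# Stage 1: soft enforcement while tolerations are added to the intended pods
kubectl taint nodes db-node-1 workload=database:PreferNoSchedule
# Stage 2: enforce strictly, then drop the soft taint
kubectl taint nodes db-node-1 workload=database:NoSchedule
kubectl taint nodes db-node-1 workload=database:PreferNoSchedule-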
Combine with Resource Requests
Taints prevent scheduling on the wrong nodes, but they don’t guarantee resources. Always set resource requests, and note that extended resources such as nvidia.com/gpu must be specified in limits (Kubernetes uses the limit as the request for them):
resources:
  requests:
    memory: "16Gi"
  limits:
    nvidia.com/gpu: 1
Audit Tolerations Regularly
Periodically review which pods tolerate which taints. Orphaned tolerations can lead to unexpected scheduling.
Test Eviction Behavior
Before using NoExecute in production, test eviction timing and pod disruption budgets in staging. According to a Datadog report on Kubernetes adoption, misconfigured eviction policies are a leading cause of service disruptions.
For production-ready Kubernetes configurations, our Kubernetes production support team can audit your taints and tolerations.
Troubleshooting Taint and Toleration Issues
Common problems and solutions:
Pod Stuck in Pending State
Symptom: Pod shows Pending status indefinitely.
Diagnosis:
kubectl describe pod <pod-name>
Look for events like “0/5 nodes are available: 5 node(s) had taint {key: value}, that the pod didn’t tolerate.”
Solution: Add matching toleration to pod spec or remove taint from nodes.
Pods Evicted Unexpectedly
Symptom: Running pods suddenly terminate.
Diagnosis: Check if a NoExecute taint was added:
kubectl describe node <node-name> | grep Taints
Solution: Either add tolerations with tolerationSeconds or remove the taint.
Toleration Not Working
Symptom: Pod has toleration but still won’t schedule.
Common causes:
- Typo in key, value, or effect
- Wrong operator (Equal vs Exists)
- Other scheduling constraints (affinity, resource limits)
Solution: Verify exact taint-toleration match:
kubectl get nodes -o json | jq '.items[].spec.taints'
kubectl get pod <pod-name> -o json | jq '.spec.tolerations'
Frequently Asked Questions
What is the difference between taints and tolerations in Kubernetes?
Taints are applied to nodes to repel pods, while tolerations are added to pods to allow them to schedule on tainted nodes. Taints push pods away; tolerations let specific pods ignore that push.
Can a pod have multiple tolerations?
Yes, pods can have unlimited tolerations. If a node has multiple taints, the pod must tolerate all of them to schedule there.
Do tolerations guarantee pod placement?
No. Tolerations only allow scheduling on tainted nodes. They don’t force it. Use node affinity or node selectors to prefer specific nodes.
What happens if I taint a node with NoExecute and pods are already running?
Pods without matching tolerations are immediately evicted. Pods with tolerations and tolerationSeconds are evicted after that duration expires.
Can I use taints for autoscaling?
Yes. Taint new nodes during scale-up to control which pods land there first. Cluster autoscaler respects taints when determining if a node can satisfy pending pods. For cost optimization strategies, see our Kubernetes cost optimization guide.
Conclusion
Understanding Kubernetes taints and tolerations empowers you to build sophisticated scheduling policies that match your operational needs. Taints let nodes declare what they won’t run, while tolerations let pods override those restrictions. Together, they enable dedicated hardware pools, multi-tenant isolation, environment separation, and graceful maintenance workflows.
Start with simple use cases—dedicating GPU nodes or separating environments—then expand to more complex patterns as your cluster matures. Always document your taints, test eviction behavior, and combine with other scheduling primitives like affinity and resource requests for robust workload placement.
Need expert help designing Kubernetes scheduling strategies for your production clusters? Our Kubernetes consulting team specializes in cluster optimization, cost reduction, and operational excellence. We’ve helped organizations save over $250K through intelligent resource allocation and scheduling policies.