Preparing for a Kubernetes interview requires more than memorizing kubectl commands. Employers want engineers who understand the “why” behind the YAML—people who can debug a failing pod at 2 AM and explain their reasoning clearly.
We’ve compiled 50 Kubernetes interview questions that we actually use when hiring DevOps engineers, SREs, and platform engineers. These range from fundamental concepts to advanced troubleshooting scenarios that separate senior candidates from everyone else.
How to Use This Guide
Questions are organized by difficulty:
- Basic (1-15): Core concepts every Kubernetes user should know
- Intermediate (16-30): Day-to-day operational knowledge
- Advanced (31-40): Architecture, security, and design decisions
- Scenario-Based (41-50): Real-world troubleshooting and problem-solving
For hands-on practice, spin up a local cluster with Minikube or Kind and recreate common issues.
Basic Kubernetes Interview Questions (1-15)
1. What is Kubernetes and why do organizations use it?
Kubernetes is an open-source container orchestration platform that automates deploying, scaling, and managing containerized applications. Originally developed by Google based on 15 years of running containerized workloads, it’s now maintained by the Cloud Native Computing Foundation (CNCF).
Organizations use Kubernetes because it:
- Automates container deployment across multiple hosts
- Provides self-healing (restarts failed containers automatically)
- Enables horizontal scaling based on demand
- Manages service discovery and load balancing
- Supports rolling updates with zero downtime
2. Explain Kubernetes architecture and its main components.
Kubernetes follows a control plane/worker-node architecture with two main layers:
Control Plane (Master):
- API Server: Front-end for the cluster; all REST commands go through it
- etcd: Distributed key-value store holding all cluster state
- Scheduler: Assigns Pods to Nodes based on resource requirements
- Controller Manager: Runs controllers that maintain desired state (ReplicaSet, Node, etc.)
Data Plane (Worker Nodes):
- Kubelet: Agent ensuring containers run in Pods as specified
- Kube-proxy: Maintains network rules for Pod communication
- Container Runtime: Runs containers (containerd, CRI-O)
For production clusters, understanding this architecture helps with Kubernetes security best practices.
3. What is a Pod and how does it differ from a container?
A Pod is the smallest deployable unit in Kubernetes. It represents one or more containers that:
- Share the same network namespace (same IP address)
- Can share storage volumes
- Are always co-located and co-scheduled
Container vs Pod:
- A container is a single isolated process
- A Pod is a logical host for tightly-coupled containers
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: nginx
    image: nginx:1.25
  - name: log-shipper   # Sidecar container
    image: fluent-bit:latest
```
4. What is a Namespace and when would you use one?
A Namespace is a logical partition within a cluster that provides:
- Resource isolation between teams or environments
- Scope for names (objects must be unique within a namespace)
- Ability to apply resource quotas and RBAC policies
Common use cases:
```bash
kubectl get namespaces
# NAME          STATUS   AGE
# default       Active   30d
# kube-system   Active   30d
# production    Active   20d
# staging       Active   20d
```
Use namespaces to separate development, staging, and production workloads or to isolate different teams.
5. What’s the difference between a Deployment and a StatefulSet?
| Aspect | Deployment | StatefulSet |
|---|---|---|
| Use case | Stateless applications | Stateful applications (databases) |
| Pod identity | Interchangeable replicas | Stable, unique network identities |
| Storage | Shared or no persistent storage | Each Pod gets its own PersistentVolume |
| Scaling | Parallel creation/deletion | Ordered, sequential operations |
| Pod names | Random suffix (web-abc123) | Predictable (web-0, web-1, web-2) |
Use Deployment for: Web servers, APIs, microservices.
Use StatefulSet for: Databases, message queues, distributed systems requiring stable identities.
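To make the contrast concrete, here is a minimal StatefulSet sketch (the `web` names and 1Gi size are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web        # headless Service giving Pods stable DNS names (web-0.web, ...)
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
  volumeClaimTemplates:   # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```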
6. What is a Service and what types exist?
A Service provides stable network access to a set of Pods. Since Pods are ephemeral and get new IPs when recreated, Services provide a consistent endpoint.
Service Types:
| Type | Description | Use Case |
|---|---|---|
| ClusterIP | Internal IP only (default) | Service-to-service communication |
| NodePort | Exposes on static port (30000-32767) | Development, simple external access |
| LoadBalancer | Provisions cloud load balancer | Production external access |
| ExternalName | DNS CNAME to external service | Accessing external databases |
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```
7. How do labels and selectors work?
Labels are key-value pairs attached to objects for identification. Selectors query objects based on their labels.
```yaml
# Pod with labels
metadata:
  labels:
    app: frontend
    environment: production
    version: v2.1.0
```

```yaml
# Service selecting Pods
spec:
  selector:
    app: frontend
    environment: production
```
This loose coupling allows Services to route traffic to any Pod matching the selector, enabling rolling updates without downtime.
8. What is kubectl and what are the most important commands?
kubectl is the command-line tool for interacting with Kubernetes clusters.
Essential commands:
```bash
# View resources
kubectl get pods -n <namespace>
kubectl get deployments
kubectl get services

# Detailed information
kubectl describe pod <pod-name>
kubectl logs <pod-name> --follow

# Apply configurations
kubectl apply -f deployment.yaml

# Debugging
kubectl exec -it <pod-name> -- /bin/sh
kubectl port-forward <pod-name> 8080:80

# Context management
kubectl config get-contexts
kubectl config use-context <context-name>
```
9. What is a ConfigMap and how is it used?
A ConfigMap stores non-confidential configuration data as key-value pairs, decoupling configuration from container images.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "postgres.default.svc"
  LOG_LEVEL: "info"
  config.json: |
    {
      "feature_flags": {
        "new_ui": true
      }
    }
```
Consuming ConfigMaps:
```yaml
spec:
  containers:
  - name: app
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: DATABASE_HOST
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config
```
For a deeper dive, see our guide on Kubernetes ConfigMaps.
10. What is a Secret and how does it differ from a ConfigMap?
Secrets store sensitive data like passwords, tokens, and SSH keys. Unlike ConfigMaps:
- Data is base64-encoded (not encrypted by default)
- Can be encrypted at rest in etcd
- Access can be restricted via RBAC
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=           # base64 encoded
  password: cGFzc3dvcmQxMjM=
```
Best practice: Use external secrets managers like HashiCorp Vault or cloud-native solutions (AWS Secrets Manager, Azure Key Vault) with the External Secrets Operator.
Learn more in our Kubernetes Secrets guide.
11. What is a ReplicaSet and how does it relate to Deployments?
A ReplicaSet ensures a specified number of Pod replicas are running at any time. The relationship:
Deployment → ReplicaSet → Pods
- Deployments manage ReplicaSets
- ReplicaSets manage Pods
- During rolling updates, Deployments create new ReplicaSets while scaling down old ones
- Old ReplicaSets are retained for rollback capability
You rarely create ReplicaSets directly—use Deployments instead, which provide declarative updates and rollback features.
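To see the relationship on a live cluster, you can list the ReplicaSets a Deployment owns (a quick sketch; the `app=web-app` label is illustrative):

```bash
# ReplicaSets created by the Deployment (one per revision)
kubectl get replicasets -l app=web-app

# ownerReferences shows which Deployment manages a given ReplicaSet
kubectl get rs <replicaset-name> -o jsonpath='{.metadata.ownerReferences[0].name}'
```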
12. How do you roll back a failed Deployment?
```bash
# View rollout history
kubectl rollout history deployment/web-app

# Roll back to previous version
kubectl rollout undo deployment/web-app

# Roll back to specific revision
kubectl rollout undo deployment/web-app --to-revision=2

# Check rollout status
kubectl rollout status deployment/web-app
```
Kubernetes keeps a history of ReplicaSets, enabling quick rollbacks without redeploying old images.
13. What is a DaemonSet?
A DaemonSet ensures a copy of a Pod runs on all (or selected) Nodes. When Nodes are added, Pods are automatically added; when Nodes are removed, Pods are garbage collected.
Use cases:
- Log collectors (Fluentd, Fluent Bit)
- Monitoring agents (Prometheus Node Exporter)
- Network plugins (Calico, Cilium)
- Storage daemons
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
```
14. What is an Ingress and how does it differ from a Service?
An Ingress manages external HTTP/HTTPS access to Services, providing:
- Path-based routing
- Host-based routing
- TLS termination
- Load balancing
Service vs Ingress:
- Service: Layer 4 (TCP/UDP) load balancing
- Ingress: Layer 7 (HTTP/HTTPS) routing with URL rules
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
```
Note: Ingress requires an Ingress Controller (NGINX, Traefik, AWS ALB) to function.
15. What are resource requests and limits?
Resource requests and limits control CPU and memory allocation for containers:
- Requests: Minimum guaranteed resources; used for scheduling decisions
- Limits: Maximum resources a container can use; exceeding memory limits causes OOMKill
```yaml
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
```
Best practice: Always set requests (for predictable scheduling) and limits (to prevent runaway containers).
Intermediate Kubernetes Interview Questions (16-30)
16. Explain the three types of probes in Kubernetes.
| Probe | Purpose | Failure Action |
|---|---|---|
| Liveness | Is the container running? | Restart container |
| Readiness | Can the container accept traffic? | Remove from Service endpoints |
| Startup | Has the app finished starting? | Delay other probes |
```yaml
spec:
  containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 3
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
```
17. What is a Horizontal Pod Autoscaler (HPA)?
HPA automatically scales Pod replicas based on observed metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
HPA queries the Metrics Server to get current resource usage and adjusts replicas accordingly.
For advanced autoscaling patterns, see our Kubernetes autoscaling guide.
18. What is Vertical Pod Autoscaler (VPA)?
VPA automatically adjusts CPU and memory requests/limits for containers based on historical usage:
- Analyzes resource consumption over time
- Recommends or applies optimal resource settings
- Helps right-size containers
Modes:
- Off: Only provides recommendations
- Auto: Applies recommendations (may restart Pods)
- Initial: Sets resources only at Pod creation
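A minimal VPA manifest sketch, assuming the VPA components and CRDs are installed in the cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"   # recommendations only; "Auto" would apply them (restarting Pods)
```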
Learn more in our Kubernetes VPA guide.
19. How do PersistentVolumes (PV) and PersistentVolumeClaims (PVC) work?
PersistentVolume (PV): Cluster-level storage resource provisioned by an admin or dynamically via StorageClass.
PersistentVolumeClaim (PVC): User request for storage that binds to an available PV.
```yaml
# PVC requesting storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
```

```yaml
# Pod using the PVC
spec:
  containers:
  - name: app
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```
20. What are taints and tolerations?
Taints are applied to Nodes to repel Pods; tolerations allow Pods to schedule on tainted Nodes.
```bash
# Taint a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule
```

```yaml
# Pod with toleration
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```
Effects:
- NoSchedule: Don’t schedule new Pods
- PreferNoSchedule: Avoid scheduling if possible
- NoExecute: Evict existing Pods and don’t schedule new ones
21. Explain Node affinity and Pod affinity/anti-affinity.
Node Affinity: Attracts Pods to specific Nodes based on labels.
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east-1a
            - us-east-1b
```
Pod Affinity/Anti-Affinity: Co-locate or separate Pods based on labels.
```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
```
This ensures web Pods spread across different Nodes for high availability.
22. What are init containers?
Init containers run before app containers start, completing initialization tasks:
```yaml
spec:
  initContainers:
  - name: init-db
    image: busybox
    command: ['sh', '-c', 'until nc -z postgres 5432; do sleep 2; done']
  - name: init-migrations
    image: myapp:latest
    command: ['./migrate', 'up']
  containers:
  - name: app
    image: myapp:latest
```
Use cases:
- Wait for dependencies (databases, services)
- Run database migrations
- Download configuration files
- Set up permissions
23. What is a Network Policy?
Network Policies control traffic flow between Pods at the IP/port level, implementing a zero-trust model:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
```
Note: Requires a CNI plugin that supports Network Policies (Calico, Cilium, Weave).
24. How does DNS work in Kubernetes?
CoreDNS (default DNS provider) enables service discovery using internal DNS names:
```
<service-name>.<namespace>.svc.cluster.local
```

Examples:

- `postgres.default.svc.cluster.local` → PostgreSQL in the default namespace
- `api.production.svc.cluster.local` → API service in the production namespace
Pods automatically get DNS configuration to resolve these names.
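You can verify resolution from inside the cluster with a throwaway Pod (a quick sketch):

```bash
kubectl run dns-test --image=busybox --rm -it --restart=Never -- \
  nslookup postgres.default.svc.cluster.local
```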
25. What is Helm and why is it useful?
Helm is the package manager for Kubernetes, using “charts” to define, install, and upgrade applications.
Benefits:
- Templating for environment-specific values
- Versioned releases with rollback support
- Dependency management
- Reusable, shareable packages
```bash
# Install a chart
helm install my-release bitnami/postgresql

# Upgrade with new values
helm upgrade my-release bitnami/postgresql -f values-prod.yaml

# Rollback
helm rollback my-release 1
```
26. What is the Kubernetes Scheduler and how does it work?
The Scheduler assigns Pods to Nodes through:
- Filtering: Removes Nodes that can’t run the Pod (insufficient resources, taints, affinity rules)
- Scoring: Ranks remaining Nodes using priority functions
- Binding: Assigns Pod to highest-scoring Node
Scheduling factors:
- Resource requests/limits
- Node selectors and affinity
- Taints and tolerations
- Pod topology spread constraints
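For example, topology spread constraints tell the scheduler to balance replicas across failure domains (a minimal sketch; the `app: web` label is illustrative):

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                # per-zone replica counts may differ by at most 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
```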
27. What is etcd and why is it critical?
etcd is a distributed, consistent key-value store that holds all cluster state:
- Desired state (what you want)
- Current state (what exists)
- Configuration and secrets
Best practices:
- Run in a cluster (minimum 3 nodes) for high availability
- Take regular backups: `etcdctl snapshot save backup.db`
- Encrypt secrets at rest
- Limit direct access; use the API server
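In practice the snapshot command also needs the etcd endpoint and TLS credentials; a sketch assuming kubeadm's default certificate paths:

```bash
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```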
28. Explain rolling update strategy.
Rolling updates replace old Pods with new ones incrementally:
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # Max Pods above desired count
      maxUnavailable: 25%    # Max Pods unavailable during update
```
Process:
- Create new ReplicaSet with updated spec
- Scale up new ReplicaSet, scale down old
- Repeat until all Pods are updated
- Old ReplicaSet retained (scaled down to 0 replicas) for rollback
29. What are Pod Disruption Budgets (PDBs)?
PDBs limit voluntary disruptions to maintain application availability:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2   # or use maxUnavailable
  selector:
    matchLabels:
      app: web
```
PDBs are respected during:
- Node drains (`kubectl drain`)
- Cluster upgrades
- Voluntary evictions
They’re not respected during involuntary disruptions (node crashes, OOM kills).
30. What is kube-proxy and how does it work?
kube-proxy runs on every Node, implementing Service networking:
Modes:
- iptables (default): Creates iptables rules for each Service
- IPVS: Uses kernel IPVS for better performance at scale
- userspace: Legacy mode, rarely used
kube-proxy:
- Watches API server for Service/Endpoint changes
- Updates network rules accordingly
- Enables Pods to reach Services via ClusterIP
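To check which mode a cluster runs (a quick sketch; assumes a kubeadm-style setup where kube-proxy reads its config from a ConfigMap named `kube-proxy`):

```bash
# Look for the "mode" field in the kube-proxy configuration
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode

# In iptables mode, Service rules appear as KUBE-SERVICES chains on each node
sudo iptables -t nat -L KUBE-SERVICES | head
```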
Advanced Kubernetes Interview Questions (31-40)
31. What are Custom Resource Definitions (CRDs)?
CRDs extend the Kubernetes API with custom resource types:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
              size:
                type: string
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
```
After creating the CRD, you can create instances:
```yaml
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-postgres
spec:
  engine: postgresql
  size: large
```
32. What is a Kubernetes Operator?
An Operator combines CRDs with custom controllers to automate application management. It encodes operational knowledge (how to deploy, scale, backup, upgrade) into software.
Examples:
- Prometheus Operator
- PostgreSQL Operator
- Elasticsearch Operator
Operators handle tasks like:
- Automated backups and restores
- Scaling decisions
- Version upgrades
- Failure recovery
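Instead of hand-writing StatefulSets and Services, you declare intent through the operator's custom resource and its controller reconciles everything else. A hypothetical sketch modeled on typical database operators (the `PostgresCluster` kind and its fields are illustrative, not a real API):

```yaml
apiVersion: example.com/v1
kind: PostgresCluster      # hypothetical CRD installed by the operator
metadata:
  name: orders-db
spec:
  replicas: 3              # the controller creates the Pods, Services, and PVCs
  version: "16"
  backup:
    schedule: "0 2 * * *"  # the operator runs backups on this cron schedule
```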
33. Explain RBAC in Kubernetes.
Role-Based Access Control uses four resources:
| Resource | Scope | Purpose |
|---|---|---|
| Role | Namespace | Defines permissions within a namespace |
| ClusterRole | Cluster-wide | Defines permissions cluster-wide |
| RoleBinding | Namespace | Grants Role to users/groups/service accounts |
| ClusterRoleBinding | Cluster-wide | Grants ClusterRole cluster-wide |
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: User
  name: developer@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
34. What is GitOps and how does it apply to Kubernetes?
GitOps uses Git as the single source of truth for declarative infrastructure. Tools like Argo CD or Flux continuously reconcile cluster state with Git repositories.
Benefits:
- Version-controlled infrastructure changes
- Audit trail through Git history
- Pull request-based change management
- Automatic drift detection and correction
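For instance, Argo CD expresses the reconciliation target as an Application resource (a minimal sketch; the repo URL and paths are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/infra.git  # Git as the source of truth
    targetRevision: main
    path: apps/web
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # remove resources deleted from Git
      selfHeal: true  # revert manual drift in the cluster
```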
Learn more in our DevOps Kubernetes playbook.
35. How do you implement canary deployments in Kubernetes?
Native Kubernetes doesn’t support canary deployments directly. Options:
1. Two Deployments behind one Service (traffic splits by replica ratio):

```yaml
# Stable: 90% traffic
replicas: 9
# Canary: 10% traffic
replicas: 1
```
2. Argo Rollouts:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 1h}
      - setWeight: 50
      - pause: {duration: 1h}
```
3. Service Mesh (Istio): Use VirtualService to split traffic by percentage.
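A minimal VirtualService sketch for the Istio option (assumes Istio is installed and that `web-stable`/`web-canary` Services exist):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
  - web
  http:
  - route:
    - destination:
        host: web-stable
      weight: 90
    - destination:
        host: web-canary
      weight: 10
```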
36. What is a Service Mesh and when would you use one?
A Service Mesh (Istio, Linkerd, Cilium) provides:
- mTLS: Encrypted service-to-service communication
- Traffic management: Canary, A/B testing, retries, timeouts
- Observability: Distributed tracing, metrics, logging
- Policy enforcement: Rate limiting, access control
Use when:
- You need zero-trust security between services
- Complex traffic routing requirements
- Detailed observability across microservices
- Multiple teams deploying independently
Avoid when:
- Simple architectures (< 10 services)
- Team lacks service mesh expertise
- Overhead isn’t justified
37. How do you secure a Kubernetes cluster?
Control Plane:
- Enable RBAC with least-privilege access
- Encrypt etcd at rest
- Enable audit logging
- Restrict API server access
Workloads:
- Use Pod Security Standards (restricted mode)
- Run containers as non-root
- Set read-only root filesystem
- Define resource limits
Network:
- Implement Network Policies (default deny)
- Use service mesh for mTLS
- Isolate namespaces
Supply Chain:
- Scan images for vulnerabilities
- Sign images with Cosign
- Use admission controllers (OPA Gatekeeper, Kyverno)
```yaml
# Pod Security Standard: restricted
apiVersion: v1
kind: Namespace
metadata:
  name: secure-ns
  labels:
    pod-security.kubernetes.io/enforce: restricted
```
38. What are admission controllers?
Admission controllers intercept API requests after authentication/authorization but before persistence. They can validate or mutate requests.
Built-in controllers:
- NamespaceLifecycle: Prevents operations in terminating namespaces
- LimitRanger: Enforces default resource constraints
- PodSecurity: Enforces Pod Security Standards
Custom controllers:
- OPA Gatekeeper: Policy enforcement using Rego
- Kyverno: Kubernetes-native policies
```yaml
# Kyverno policy requiring labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  rules:
  - name: check-team-label
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "label 'team' is required"
      pattern:
        metadata:
          labels:
            team: "?*"
```
39. How do you handle multi-cluster Kubernetes?
Approaches:
| Approach | Use Case |
|---|---|
| Federation | Sync resources across clusters |
| Service Mesh | Cross-cluster service discovery (Istio multi-cluster) |
| GitOps | Deploy same config to multiple clusters via Git |
| Cluster API | Provision and manage cluster lifecycle |
Tools:
- Rancher: Multi-cluster management UI
- Argo CD ApplicationSets: Deploy to multiple clusters
- Cilium Cluster Mesh: Cross-cluster networking
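As an example of the GitOps approach at multi-cluster scale, an Argo CD ApplicationSet can stamp the same application onto every registered cluster (a minimal sketch; assumes clusters are already registered in Argo CD):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-app
  namespace: argocd
spec:
  generators:
  - clusters: {}                # one Application per cluster registered in Argo CD
  template:
    metadata:
      name: 'web-app-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/infra.git
        targetRevision: main
        path: apps/web
      destination:
        server: '{{server}}'    # filled in per cluster by the generator
        namespace: production
```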
40. What is the difference between imperative and declarative configuration?
Imperative: Tell Kubernetes what to do step-by-step
```bash
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
kubectl expose deployment nginx --port=80
```
Declarative: Define desired state; let Kubernetes figure out how
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  ...
```

```bash
kubectl apply -f deployment.yaml
```
Best practice: Use declarative configuration for production. It’s:
- Version-controllable
- Repeatable
- Self-documenting
- GitOps-friendly
Scenario-Based Interview Questions (41-50)
41. A Pod is stuck in CrashLoopBackOff. How do you debug it?
```bash
# 1. Check Pod status and events
kubectl describe pod <pod-name>

# 2. Check container logs (including the previous crashed container)
kubectl logs <pod-name> --previous

# 3. Look for OOMKilled in status
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState}'

# 4. Check if it's a probe issue
kubectl get pod <pod-name> -o yaml | grep -A 10 livenessProbe

# 5. Try running interactively
kubectl run debug --image=<image> --rm -it -- /bin/sh
```
Common causes:
- Application error (check logs)
- Missing dependencies or config
- Failed liveness probe
- OOM killed (increase memory limit)
- Image pull issues
42. A Service isn’t routing traffic to Pods. What do you check?
```bash
# 1. Verify Service selector matches Pod labels
kubectl get svc <service-name> -o wide
kubectl get pods --show-labels

# 2. Check Endpoints (should list Pod IPs)
kubectl get endpoints <service-name>
# Empty endpoints = selector doesn't match any Pods

# 3. Verify Pods are Ready
kubectl get pods
# Not Ready = won't receive traffic

# 4. Test from within the cluster
kubectl run test --image=busybox --rm -it -- wget -qO- <service-name>:80

# 5. Check Network Policies blocking traffic
kubectl get networkpolicies
```
43. How do you investigate high memory usage in a Pod?
```bash
# 1. Check current usage
kubectl top pod <pod-name>

# 2. Compare against limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

# 3. Check for OOMKilled events
kubectl describe pod <pod-name> | grep -i oom

# 4. Exec into the Pod and check processes
kubectl exec -it <pod-name> -- top

# 5. Review application metrics (if available)
kubectl port-forward <pod-name> 9090:9090
# Check the /metrics endpoint
```
Solutions:
- Increase memory limits
- Fix memory leaks in application
- Add horizontal scaling
- Review VPA recommendations
44. A node becomes NotReady. What happens and how do you respond?
What happens automatically:
- Pods on the node are marked `Unknown`
- After the eviction timeout (5 minutes by default), Pods are evicted and rescheduled (if managed by a Deployment)
- The node controller taints the node so no new Pods are scheduled there
Investigation:
```bash
# 1. Check node status
kubectl describe node <node-name>

# 2. Check kubelet status (SSH to the node)
systemctl status kubelet
journalctl -u kubelet -n 100

# 3. Check system resources
df -h    # disk space
free -m  # memory

# 4. Check node conditions
kubectl get node <node-name> -o jsonpath='{.status.conditions}'
```
Common causes:
- Kubelet crashed
- Disk pressure
- Memory pressure
- Network issues
45. How do you perform a zero-downtime upgrade of your application?
```yaml
# 1. Configure rolling update strategy
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # never reduce below desired count

# 2. Set a readiness probe
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

# 3. Configure a PDB
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 2
```

```bash
# 4. Apply the new version
kubectl set image deployment/web web=myapp:v2.0

# 5. Monitor the rollout
kubectl rollout status deployment/web
```
46. How do you handle secrets rotation without restarting Pods?
Option 1: External Secrets with refresh
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
spec:
  refreshInterval: 1h
```
Option 2: Reloader controller. Use Stakater Reloader to automatically restart Pods when ConfigMaps/Secrets change, as sketched below.
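A minimal sketch of opting a Deployment into Reloader (assumes the Reloader controller is installed; `web` is an illustrative name):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  annotations:
    reloader.stakater.com/auto: "true"   # rolling-restart when any referenced ConfigMap/Secret changes
```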
Option 3: Application-level reload. Design the application to watch for file changes and reload secrets without a restart.
47. A deployment is running but requests are timing out. How do you diagnose?
```bash
# 1. Check if Pods are Ready
kubectl get pods -l app=myapp

# 2. Check Service endpoints
kubectl get endpoints myapp-service

# 3. Test connectivity from another Pod
kubectl run debug --rm -it --image=busybox -- sh
wget -qO- --timeout=5 myapp-service:80

# 4. Check Pod logs for errors
kubectl logs -l app=myapp --tail=100

# 5. Check Network Policies
kubectl get networkpolicies -A

# 6. Check if Ingress is configured correctly
kubectl describe ingress myapp-ingress

# 7. Check resource usage (CPU throttling?)
kubectl top pods -l app=myapp
```
48. How would you migrate a stateful application to Kubernetes?
1. Assess the application:
   - Data persistence requirements
   - Network identity needs
   - Scaling characteristics
2. Choose the right workload type:
   - StatefulSet for ordered deployment
   - Operators for complex applications (databases)
3. Plan storage:
   ```yaml
   volumeClaimTemplates:
   - metadata:
       name: data
     spec:
       accessModes: ["ReadWriteOnce"]
       storageClassName: fast-ssd
       resources:
         requests:
           storage: 100Gi
   ```
4. Handle data migration:
   - Use backup/restore tools
   - Set up replication before cutover
   - Plan for rollback
5. Consider managed operators:
   - CloudNativePG for PostgreSQL
   - Strimzi for Kafka
   - MongoDB Community Operator
49. How do you implement cost optimization in a Kubernetes cluster?
```bash
# 1. Right-size resources using VPA recommendations
kubectl get vpa -A

# 2. Identify over-provisioned Pods
kubectl top pods -A --sort-by=cpu

# 3. Find workloads with no resource requests set
kubectl get pods -A -o json | jq '.items[] | select(.spec.containers[].resources.requests == null) | .metadata.name'
```
Strategies:
- Use Cluster Autoscaler to scale nodes down
- Implement spot/preemptible instances for non-critical workloads
- Use namespace resource quotas
- Schedule batch jobs during off-peak hours
- Use tools like Kubecost for visibility
See our guide on Kubernetes cost optimization.
50. Describe your approach to upgrading a production Kubernetes cluster.
1. Preparation:
   ```bash
   # Review release notes

   # Back up etcd
   etcdctl snapshot save backup.db

   # Document current versions
   kubectl version
   kubectl get nodes -o wide
   ```
2. Pre-flight checks:
   - Verify all Pods are healthy
   - Check PodDisruptionBudgets
   - Ensure backup/restore is tested
   - Plan the rollback procedure
3. Upgrade the control plane:
   - One control-plane node at a time (HA clusters)
   - Upgrade in order: API server → controller manager → scheduler
4. Upgrade worker nodes:
   ```bash
   # Cordon the node (prevent new Pods)
   kubectl cordon node1

   # Drain the node (evict Pods)
   kubectl drain node1 --ignore-daemonsets

   # Upgrade kubelet and kubectl

   # Uncordon the node
   kubectl uncordon node1
   ```
5. Post-upgrade:
   - Verify all components are healthy
   - Update add-ons (CNI, CoreDNS, Ingress)
   - Test critical applications
   - Monitor for issues
Interview Preparation Tips
Before the Interview
- Hands-on practice: Set up a local cluster with Minikube or Kind. Break things intentionally and fix them.
- Review core concepts: Understand the “why” behind each resource type, not just syntax.
- Know your experience: Be ready to discuss specific Kubernetes challenges you’ve solved.
- Stay current: Review recent Kubernetes news and updates.
During the Interview
- Think out loud: Explain your debugging process step-by-step.
- Ask clarifying questions: “Is this a single-tenant or multi-tenant cluster?” “What CNI are they using?”
- Acknowledge what you don’t know: It’s better to say “I’d need to look that up” than to guess incorrectly.
- Connect to real experience: “In my last role, we handled this by…”
Build Kubernetes Expertise with Expert Guidance
Preparing for Kubernetes interviews—or building production-ready clusters—is easier with experienced guidance.
Our Kubernetes consulting services help teams:
- Architect production clusters on AWS EKS, Azure AKS, or Google GKE
- Implement security best practices from RBAC to Network Policies
- Optimize costs with right-sizing and autoscaling
- Train engineering teams on Kubernetes operations
We’ve helped organizations from startups to enterprises build reliable, scalable Kubernetes platforms.