When a Kubernetes pod is in an Evicted state, it signifies that the Kubelet, the agent running on each node, has proactively terminated the pod to reclaim resources and maintain node stability. This critical action is typically triggered by resource pressure, most commonly related to ephemeral storage or memory exhaustion. Understanding the precise cause and implementing appropriate resource management strategies are crucial for maintaining a healthy and reliable Kubernetes cluster.
1. Symptoms
The primary symptom of this error is a pod transitioning into the Evicted status. This can be observed using standard Kubernetes commands:
kubectl get pods
Output will show:
NAME         READY   STATUS    RESTARTS   AGE
my-app-pod   0/1     Evicted   0          5m
Further investigation using kubectl describe pod will provide detailed events and the specific reason for the eviction:
kubectl describe pod my-app-pod
Look for events similar to these:
For Ephemeral Storage Pressure:
Events:
  Type     Reason                 Age               From             Message
  ----     ------                 ----              ----             -------
  Warning  Evicted                2m                kubelet, node-1  Pod was evicted by the kubelet.
  Normal   Killing                2m                kubelet, node-1  Stopping container my-container
  Warning  NodeHasNoDiskPressure  2m (x2 over 5m)   kubelet, node-1  Node has sufficient disk space available
  Warning  NodeHasDiskPressure    2m (x2 over 5m)   kubelet, node-1  Node has disk pressure -- the image garbage collector is disabled
  Warning  EvictionThresholdMet   2m (x2 over 5m)   kubelet, node-1  A pod was evicted because its ephemeral-storage usage exceeded the node's configured eviction threshold.
For Memory Pressure:
Events:
  Type     Reason                   Age               From             Message
  ----     ------                   ----              ----             -------
  Warning  Evicted                  3m                kubelet, node-1  Pod was evicted by the kubelet.
  Normal   Killing                  3m                kubelet, node-1  Stopping container my-container
  Warning  NodeHasNoMemoryPressure  3m (x2 over 6m)   kubelet, node-1  Node has sufficient memory available
  Warning  NodeHasMemoryPressure    3m (x2 over 6m)   kubelet, node-1  Node has memory pressure -- the image garbage collector is disabled
  Warning  EvictionThresholdMet     3m (x2 over 6m)   kubelet, node-1  A pod was evicted because its memory usage exceeded the node's configured eviction threshold.
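The Message field of the EvictionThresholdMet event identifies whether the eviction was due to ephemeral-storage or memory pressure. Exact event wording varies between Kubernetes versions; the pod’s own Evicted message usually names the resource as well, for example “The node was low on resource: memory”. Node conditions such as MemoryPressure and DiskPressure can be confirmed with kubectl describe node <node-name>.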
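On a busy cluster it can help to filter the pod listing for evicted pods programmatically. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE) for a single namespace; the function name evicted_pods and the second sample pod are hypothetical.

```python
def evicted_pods(kubectl_output):
    """Return names of pods whose STATUS column reads 'Evicted'.

    Assumes the default `kubectl get pods` layout, where STATUS is the
    third whitespace-separated column.
    """
    names = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip header row
        columns = line.split()
        if len(columns) >= 3 and columns[2] == "Evicted":
            names.append(columns[0])
    return names


sample = """\
NAME         READY   STATUS    RESTARTS   AGE
my-app-pod   0/1     Evicted   0          5m
other-pod    1/1     Running   0          2h
"""
print(evicted_pods(sample))
```

The text passed in would normally be the captured output of kubectl get pods; the parsing is deliberately positional, so it would need adjusting for wide output or all-namespaces listings.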
2. Root Cause
The k8s-evicted error stems from the kubelet’s eviction manager, a component designed to protect the stability and health of the Kubernetes node. The kubelet monitors eviction signals such as available memory, available filesystem space and inodes, and available process IDs. CPU is not an eviction signal: it is a compressible resource that is throttled rather than reclaimed through eviction. When a signal falls below a configured eviction threshold (hard thresholds act immediately, soft thresholds after a grace period), the kubelet evicts pods to free resources. This prevents the node from becoming unresponsive or crashing, which would affect every pod running on it.
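The threshold comparison described above can be sketched in a few lines. This is an illustration only, not the kubelet’s implementation: the threshold values are assumed to match the documented kubelet defaults on Linux and may be overridden in your cluster, and the input dictionaries and the function name signals_under_pressure are hypothetical.

```python
# Hard eviction thresholds, assumed here to match the documented kubelet
# defaults on Linux; real clusters may override them in the kubelet config.
HARD_THRESHOLDS = {
    "memory.available": ("absolute", 100 * 1024**2),  # 100Mi
    "nodefs.available": ("percent", 10.0),
    "imagefs.available": ("percent", 15.0),
    "nodefs.inodesFree": ("percent", 5.0),
}


def signals_under_pressure(available, capacity):
    """Return the eviction signals whose available amount is below threshold.

    `available` and `capacity` map signal names to raw numbers
    (bytes or inode counts). Both dicts are hypothetical inputs.
    """
    breached = []
    for signal, (kind, threshold) in HARD_THRESHOLDS.items():
        if signal not in available:
            continue
        if kind == "absolute":
            minimum = threshold
        else:
            minimum = capacity[signal] * threshold / 100.0
        if available[signal] < minimum:
            breached.append(signal)
    return breached


node_available = {"memory.available": 80 * 1024**2, "nodefs.available": 30 * 1024**3}
node_capacity = {"memory.available": 8 * 1024**3, "nodefs.available": 100 * 1024**3}
print(signals_under_pressure(node_available, node_capacity))
```

In this sample the node has 80Mi of memory available, which is below the assumed 100Mi threshold, while 30% of the node filesystem is still free, so only the memory signal is reported.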
Ephemeral Storage Pressure: Ephemeral storage refers to the local, non-persistent disk space available to a pod on its node. This includes:
- emptyDir volumes: Temporary directories created for a pod, deleted when the pod is removed.
- Container writable layer: The filesystem layer where a container writes data during its lifecycle.
- Container logs: Logs generated by the application within the container.
If a pod, or a collection of pods on a node, consumes an excessive amount of this local storage, it can trigger ephemeral-storage pressure. The Kubelet monitors the total ephemeral storage usage on the node and, if it crosses a configured threshold (e.g., 90% disk usage), it will evict pods to reclaim space. Pods that are consuming the most ephemeral storage are typically targeted first.
Memory Pressure: Memory pressure occurs when the total memory usage on a node approaches or exceeds its capacity. This can be caused by:
- Individual Pod Exceeding Limits: A container that uses more memory than its resources.limits.memory allows is terminated by the kernel’s Out-Of-Memory (OOM) killer. This is a container kill rather than an eviction, but pods whose usage exceeds their requests are the first candidates for eviction when the node itself comes under memory pressure.
- Aggregate Pod Memory Usage: Even if individual pods stay within their limits, the sum of all pods’ memory consumption, combined with system processes, can exhaust the node’s total memory.
- Memory Leaks: Applications with memory leaks will continuously consume more memory, eventually leading to pressure.
When the kubelet detects that the node’s available memory has fallen below a configured threshold, it evicts pods to free memory. Candidates are ranked first by whether their usage exceeds their requests (a pod with no requests exceeds them as soon as it uses any memory), then by pod priority, and then by how far usage exceeds requests. Pods that stay within their requests are evicted last, which limits disruption while keeping the node stable.
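The ranking for memory pressure can be illustrated with a small sort. This is a sketch of the documented ordering (usage above requests, then pod priority, then usage relative to requests), not the kubelet’s actual code; the PodUsage class, its fields, and the sample pods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int  # pod priority value; lower values are evicted sooner
    request: int   # memory request in bytes (0 if unset)
    usage: int     # observed memory usage in bytes


def eviction_order(pods):
    """Sort pods from first-to-evict to last-to-evict."""
    def key(pod):
        exceeds_request = pod.usage > pod.request
        overage = pod.usage - pod.request
        # `not exceeds_request` is False for over-request pods, and False
        # sorts first; then lower priority first; then largest overage first.
        return (not exceeds_request, pod.priority, -overage)
    return sorted(pods, key=key)


Mi = 1024**2
pods = [
    PodUsage("within-request", priority=0, request=512 * Mi, usage=300 * Mi),
    PodUsage("small-overage", priority=0, request=256 * Mi, usage=300 * Mi),
    PodUsage("large-overage", priority=0, request=0, usage=400 * Mi),
    PodUsage("high-priority", priority=1000, request=128 * Mi, usage=900 * Mi),
]
print([p.name for p in eviction_order(pods)])
```

Note that the high-priority pod is ranked after both low-priority pods even though its overage is the largest, and the pod that stays within its request comes last.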
3. Step-by-Step Fix
Resolving k8s-evicted requires identifying the specific resource under pressure and adjusting pod resource requests/limits or optimizing application behavior.
Step 1: Identify the specific resource pressure (Ephemeral Storage or Memory)
As shown in the symptoms section, kubectl describe pod <pod-name> will indicate whether the eviction was due to ephemeral-storage or memory pressure in the Events section.
Step 2: Implement the appropriate fix
Fix for Ephemeral Storage Pressure:
- Analyze Ephemeral Storage Usage:
  - Check application logs for excessive output.
  - Review emptyDir volume usage within your pod’s containers. Are temporary files growing unexpectedly large?
  - Consider whether persistent storage (e.g., a PersistentVolumeClaim) should be used instead of emptyDir for data that needs to persist or is large.
- Adjust Pod Resource Limits for Ephemeral Storage: If your application legitimately requires more ephemeral storage, explicitly request and limit it in your pod definition.

  Before:

      # Abbreviated manifest: selector and pod labels are omitted for brevity.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        template:
          spec:
            containers:
            - name: my-container
              image: my-image:latest
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"

  After:

      # Abbreviated manifest: selector and pod labels are omitted for brevity.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        template:
          spec:
            containers:
            - name: my-container
              image: my-image:latest
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                  ephemeral-storage: "1Gi" # Request a baseline ephemeral storage
                limits:
                  memory: "512Mi"
                  cpu: "500m"
                  ephemeral-storage: "2Gi" # Set a hard limit for ephemeral storage

  - requests.ephemeral-storage: The amount of ephemeral storage the scheduler reserves for the container when placing the pod.
  - limits.ephemeral-storage: The maximum amount of ephemeral storage the container can consume. Exceeding it causes the kubelet to evict the pod.
- Optimize Application Ephemeral Storage Usage:
  - Implement log rotation and compression within your application or via a sidecar container.
  - Ensure temporary files are properly cleaned up.
  - Avoid storing large, non-essential data in emptyDir volumes.
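Before applying new ephemeral-storage values, it can be useful to sanity-check that requests and limits are both present and consistent. The sketch below is a hypothetical helper, not part of any Kubernetes tooling: it handles only the plain, binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) quantity suffixes, which is an assumption that covers the values used in this article but not every valid Kubernetes quantity.

```python
import re

# Subset of the suffixes used by Kubernetes resource quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "": 1,
}


def parse_quantity(text):
    """Convert a quantity such as '512Mi' or '2Gi' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", text.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {text!r}")
    return int(float(match.group(1)) * _SUFFIXES[match.group(2)])


def check_resource(resources, name):
    """Return a list of problems for one resource in a container's resources."""
    request = resources.get("requests", {}).get(name)
    limit = resources.get("limits", {}).get(name)
    problems = []
    if request is None:
        problems.append(f"{name}: no request set")
    if limit is None:
        problems.append(f"{name}: no limit set")
    if request and limit and parse_quantity(request) > parse_quantity(limit):
        problems.append(f"{name}: request exceeds limit")
    return problems


resources = {
    "requests": {"memory": "256Mi", "cpu": "250m", "ephemeral-storage": "1Gi"},
    "limits": {"memory": "512Mi", "cpu": "500m", "ephemeral-storage": "2Gi"},
}
print(check_resource(resources, "ephemeral-storage"))
```

An empty list means the resource has both a request and a limit and the request does not exceed the limit. CPU quantities such as "250m" are intentionally not parsed here.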
Fix for Memory Pressure:
- Analyze Memory Usage:
  - Use kubectl top pod <pod-name> to see current memory consumption.
  - Profile your application to identify memory leaks or inefficient memory usage patterns.
  - Check kubectl describe pod <pod-name> for containers whose last state shows the reason OOMKilled, which often precedes memory-related evictions.
- Adjust Pod Resource Limits for Memory: If your application requires more memory, increase its memory requests and limits.

  Before:

      # Abbreviated manifest: selector and pod labels are omitted for brevity.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        template:
          spec:
            containers:
            - name: my-container
              image: my-image:latest
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"

  After:

      # Abbreviated manifest: selector and pod labels are omitted for brevity.
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        template:
          spec:
            containers:
            - name: my-container
              image: my-image:latest
              resources:
                requests:
                  memory: "512Mi" # Increase memory request
                  cpu: "250m"
                limits:
                  memory: "1Gi" # Increase memory limit
                  cpu: "500m"

  - requests.memory: The amount of memory the scheduler reserves for the container when placing the pod.
  - limits.memory: The maximum amount of memory the container can consume. Exceeding it causes the container to be OOM-killed and restarted according to the pod’s restart policy.
- Optimize Application Memory Usage:
  - Refactor code to reduce memory footprint.
  - Tune JVM settings (for Java applications) or other runtime environments.
  - Reduce cache sizes if they are consuming excessive memory.
General Considerations:
- Node Capacity: If multiple pods are being evicted, or if kubectl top node shows consistently high resource utilization, your nodes might be undersized. Consider adding more nodes to your cluster or upgrading existing nodes with more memory/disk space.
- Monitoring: Implement robust monitoring for pod and node resource usage to proactively identify potential pressure points.
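As a lightweight form of the node capacity check, the output of kubectl top node can be scanned for nodes running hot. This is a minimal sketch that assumes the usual five-column layout (name, CPU cores, CPU percent, memory bytes, memory percent); the 80% threshold, the function name busy_nodes, and the sample values are assumptions for illustration only.

```python
def busy_nodes(top_output, threshold_percent=80):
    """Return (node, resource, percent) tuples at or above the threshold.

    Parsing is positional: column 3 is the CPU percentage and column 5
    is the memory percentage.
    """
    findings = []
    for line in top_output.strip().splitlines()[1:]:  # skip header row
        columns = line.split()
        if len(columns) < 5:
            continue
        name, cpu_pct, mem_pct = columns[0], columns[2], columns[4]
        for resource, value in (("cpu", cpu_pct), ("memory", mem_pct)):
            if not value.endswith("%"):
                continue  # metrics can be unavailable for a node
            percent = int(value.rstrip("%"))
            if percent >= threshold_percent:
                findings.append((name, resource, percent))
    return findings


sample = """\
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   850m         42%    6900Mi          91%
node-2   300m         15%    2100Mi          28%
"""
print(busy_nodes(sample))
```

A single snapshot is only a hint; the advice above concerns consistently high utilization, so a real check would sample repeatedly or rely on a monitoring system.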
4. Verification
After applying the changes, verify the fix using the following steps:
- Monitor Pod Status:

      kubectl get pods -w

  An evicted pod is not restarted in place: it keeps the Evicted status until it is deleted, and the owning controller (for example, the Deployment’s ReplicaSet) creates a replacement. Watch the replacement move through ContainerCreating to Running, and ensure it remains in the Running state without further evictions.
- Check Pod Events:

      kubectl describe pod <pod-name>

  Review the Events section. There should be no new Evicted or EvictionThresholdMet warnings related to ephemeral storage or memory pressure.
- Monitor Resource Usage:
  - For Memory: Ensure the pod’s memory usage stays well within its new limits and that the node’s overall memory utilization is healthy.

        kubectl top pod <pod-name>
        kubectl top node <node-name>

  - For Ephemeral Storage: kubectl top does not report ephemeral storage. Monitor the node’s disk usage on the node itself or through your cloud provider’s monitoring tools. For individual pods, you might need to exec into the container and check disk usage within relevant directories (e.g., /tmp, /var/log).
- Application Functionality: Confirm that your application is running as expected, serving requests, and not exhibiting any new errors or performance degradation. If possible, run load tests to simulate production traffic and ensure stability under stress.
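To put a number on “well within its new limits”, the memory column of kubectl top pod can be compared with the configured limit. The sketch below assumes the default three-column output (NAME, CPU(cores), MEMORY(bytes)) and binary suffixes only; the function names and the sample pod name are hypothetical.

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity):
    """Convert values such as '768Mi' or '1Gi' to bytes (binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def memory_headroom(top_output, limit):
    """Map pod name -> fraction of the memory limit that is still unused."""
    limit_bytes = to_bytes(limit)
    headroom = {}
    for line in top_output.strip().splitlines()[1:]:  # skip header row
        name, _cpu, memory = line.split()[:3]
        headroom[name] = 1 - to_bytes(memory) / limit_bytes
    return headroom


sample = """\
NAME                      CPU(cores)   MEMORY(bytes)
my-app-7d4b9c6f5d-x2k8q   120m         768Mi
"""
result = memory_headroom(sample, "1Gi")
print(result)
```

Here the pod uses 768Mi of a 1Gi limit, leaving a quarter of the limit unused. How much headroom counts as healthy depends on the workload’s peak behavior, so treat any fixed cutoff as a judgment call.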
5. Common Pitfalls
When addressing k8s-evicted errors, several common mistakes can lead to recurring issues or suboptimal cluster performance:
- Setting Limits Too Low: Arbitrarily increasing resource limits without proper analysis can lead to the same eviction problem if the new limits are still insufficient for the application’s actual needs. This often results in a cycle of evictions.
- Setting Limits Too High: Excessively high limits let pods burst far beyond their requests. Because the scheduler places pods by requests rather than limits, the node can become overcommitted, and pods that grow toward their limits can starve their neighbors and trigger evictions. Inflated requests carry the opposite cost: fewer pods fit on each node and infrastructure costs rise.
- Ignoring requests: While limits prevent a pod from consuming too much, requests are crucial for scheduling. If requests are not set, or set too low, the scheduler might place a pod on a node that doesn’t have enough guaranteed resources, leading to resource contention and potential evictions for other pods.
- Not Monitoring Application Behavior: Simply adjusting Kubernetes resource definitions without understanding the application’s actual resource consumption patterns (e.g., memory leaks, excessive logging) is a temporary fix. The underlying application issue will eventually resurface.
- Misunderstanding emptyDir: Treating emptyDir volumes as persistent storage or assuming they have infinite capacity. emptyDir is ephemeral and contributes to the pod’s ephemeral storage usage, which is subject to Kubelet eviction thresholds.
- Ignoring Node-Level Pressure: Focusing solely on individual pod limits while the entire node is under severe resource pressure (e.g., 95% memory usage) will not solve the problem. In such cases, scaling up the node’s resources or adding more nodes is necessary.
- Using hostPath for Temporary Storage: While hostPath can provide local storage, it bypasses Kubernetes’ ephemeral storage management. If an application writes excessively to a hostPath volume, it can fill up the host’s disk without Kubelet being able to track or evict based on that usage, potentially leading to a full node disk and broader cluster instability.
6. Related Errors
Understanding errors related to k8s-evicted can provide a broader perspective on Kubernetes resource management challenges:
- OOMKilled: This status indicates that a container within a pod was terminated by the operating system’s Out-Of-Memory (OOM) killer. This happens when a container attempts to use more memory than its resources.limits.memory allows. OOMKilled is a direct container termination rather than an eviction, but the two often appear together: the same memory growth that gets one container killed can push the node past its memory eviction threshold.
- CrashLoopBackOff: This status means a pod is repeatedly starting, crashing, and restarting. While it can be caused by various application-level issues (e.g., misconfiguration, bugs), OOMKilled events caused by memory limits are a common underlying cause. If a pod’s container is repeatedly OOMKilled, the pod will enter a CrashLoopBackOff state. Exceeding an ephemeral storage limit, by contrast, results in eviction rather than an OOM kill.
- Pending (due to insufficient resources): A pod remains in the Pending state if the Kubernetes scheduler cannot find a suitable node to run it. One common reason for this is “Insufficient <resource>” (e.g., Insufficient memory, Insufficient ephemeral-storage). This occurs when no node in the cluster has enough available resources to satisfy the pod’s resources.requests. While not an eviction, it’s a related resource management issue indicating that the cluster’s capacity is insufficient for the requested workloads.
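The “Insufficient <resource>” condition behind a Pending pod comes down to comparing the pod’s requests with what each node still has unreserved. The sketch below illustrates only that comparison; the real scheduler also weighs taints, affinity, and other constraints. The data layout, the function name nodes_that_fit, and the sample figures are hypothetical.

```python
def nodes_that_fit(pod_requests, nodes):
    """Return names of nodes whose unreserved allocatable covers every request.

    pod_requests: resource name -> amount requested (bytes).
    nodes: node name -> {"allocatable": {...}, "requested": {...}}.
    """
    fitting = []
    for name, node in nodes.items():
        fits = True
        for resource, amount in pod_requests.items():
            free = node["allocatable"].get(resource, 0) - node["requested"].get(resource, 0)
            if amount > free:
                fits = False  # this node would report "Insufficient <resource>"
                break
        if fits:
            fitting.append(name)
    return fitting


Gi = 1024**3
nodes = {
    "node-1": {"allocatable": {"memory": 8 * Gi, "ephemeral-storage": 50 * Gi},
               "requested": {"memory": 7 * Gi, "ephemeral-storage": 10 * Gi}},
    "node-2": {"allocatable": {"memory": 16 * Gi, "ephemeral-storage": 50 * Gi},
               "requested": {"memory": 4 * Gi, "ephemeral-storage": 49 * Gi}},
    "node-3": {"allocatable": {"memory": 16 * Gi, "ephemeral-storage": 100 * Gi},
               "requested": {"memory": 4 * Gi, "ephemeral-storage": 20 * Gi}},
}
pod = {"memory": 2 * Gi, "ephemeral-storage": 2 * Gi}
print(nodes_that_fit(pod, nodes))
```

In this sample node-1 lacks memory and node-2 lacks ephemeral storage, so only node-3 qualifies. An empty result corresponds to the case where the pod would stay Pending.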