Automating Resource Management in GKE: A Guide to Resolving Pending Pods
Introduction
In production Kubernetes environments, pods stuck in a Pending state due to resource exhaustion represent one of the most common operational challenges. While manual intervention can provide immediate relief, modern cloud-native practices demand automated, self-healing solutions. This article explores the theoretical foundations of Kubernetes resource management and presents a practical approach to implementing automated remediation in Google Kubernetes Engine (GKE).
Understanding Kubernetes Resource Scheduling
The Scheduling Contract
Kubernetes operates on a principle of resource guarantees: before a pod can run, the scheduler must find a node capable of satisfying all of the pod's declared resource requests. When no node can, the pod remains in a Pending state until capacity becomes available.
Resource Requests and Limits
Resource Requests define the guaranteed minimum amount of CPU and memory a pod requires. The Kubernetes scheduler uses these values to determine node placement. If the sum of existing pods' requests plus a new pod's requests exceeds a node's allocatable capacity, the new pod cannot be scheduled onto that node.
Resource Limits establish the maximum resources a container may consume. A container can burst above its requests up to this ceiling when spare capacity exists, but exceeding the memory limit causes the container to be OOMKilled, while hitting the CPU limit results in throttling rather than termination.
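A minimal container fragment illustrating the two settings side by side; the name, image, and values here are purely illustrative:

# Illustrative container spec fragment: requests vs. limits
containers:
- name: example-app                  # placeholder name
  image: gcr.io/project/example:v1   # placeholder image
  resources:
    requests:        # guaranteed minimum; the scheduler uses these for placement
      cpu: 250m
      memory: 512Mi
    limits:          # hard ceiling; memory overage -> OOMKill, CPU overage -> throttling
      cpu: 500m
      memory: 1Gi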
Node Allocatable Capacity
Not all of a node's resources are available to pods. A portion is reserved for the operating system, the kubelet, and other system components, and a further slice is held back as an eviction threshold. The remaining capacity, termed Allocatable, represents the actual resource pool available for scheduling:
Allocatable = Node Capacity - System Reserved - Kube Reserved - Eviction Threshold
A pod remains pending when, for every node in the cluster, the following condition is false:
Node Allocatable Resource ≥ Σ(Existing Pod Requests) + (New Pod Request)
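Both values are visible on the node object itself; a quick way to compare them (the node name is a placeholder):

# Compare total capacity with schedulable (allocatable) capacity for one node
kubectl get node <node-name> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'

# Or scan the same data for every node at once
kubectl describe nodes | grep -A 6 "Allocatable"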
Considerations for Spot VMs
GKE Spot VMs (the successor to Preemptible VMs) offer significant cost savings but come with availability trade-offs. These instances can be reclaimed by Google Cloud with only a brief termination notice (roughly 30 seconds), making them suitable for fault-tolerant, stateless, or batch workloads. Because nodes can disappear at any time, resource exhaustion occurs more frequently in spot-based node pools, making automation essential rather than optional.
Case Study: Resource Exhaustion in Production
Scenario Overview
A DevOps team deployed a stateless microservice to a dedicated Spot VM node pool with the following configuration:
- Cluster Type: GKE with single spot node pool
- Initial Pool Size: 1 node (minimum)
- New Deployment: 2 replicas requested
- Result: Both pods stuck in Pending state
Diagnostic Process
Step 1: Examine Pod Events
The first diagnostic step involves querying the scheduler's decision log:
kubectl describe pod <pod-name>
Output:
Warning FailedScheduling 1m default-scheduler
0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory
This event confirms the scheduler's inability to place the pod due to resource constraints.
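To catch the same situation cluster-wide rather than pod by pod, field selectors can surface everything the scheduler has not placed yet:

# List all Pending pods across namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Review the most recent scheduling failures
kubectl get events --all-namespaces --field-selector=reason=FailedScheduling --sort-by=.lastTimestamp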
Step 2: Analyze Node Capacity
Next, examine the node's current resource allocation:
kubectl describe node <node-name>
The node's Allocated resources section showed 95% of CPU requests and 85% of memory requests already committed. The remaining capacity (roughly 5% of CPU and 15% of memory) was insufficient for the new pods, each requesting 500m CPU and 1Gi memory.
Immediate Resolution
The engineer manually scaled the node pool from 1 to 2 nodes via the Google Cloud Console. Within minutes, the new node became available, and the scheduler placed the pending pods, restoring service.
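The same one-off fix can be applied from the CLI. A sketch of the equivalent command, reusing the cluster and pool names from this scenario; the --location flag mirrors the article's other gcloud examples, though some gcloud releases expect --zone or --region here instead:

# Manually resize the spot node pool from 1 to 2 nodes (one-off remediation, not a long-term fix)
gcloud container clusters resize production-cluster \
  --node-pool spot-pool \
  --num-nodes 2 \
  --location us-central1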
Implementing Automated Solutions
1. Cluster Autoscaler (CAS)
The Cluster Autoscaler eliminates the need for manual intervention by automatically adjusting node pool size based on pod scheduling needs.
Configuration
Enable CAS on the node pool with defined scaling boundaries:
gcloud container node-pools update spot-pool \
  --cluster production-cluster \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --location us-central1
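To confirm the setting took effect, the pool's autoscaling block can be read back; a quick check along these lines:

# Verify the autoscaling settings on the pool
gcloud container node-pools describe spot-pool \
  --cluster production-cluster \
  --location us-central1 \
  --format="yaml(autoscaling)"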
Operation
- Detection: CAS continuously monitors for pods in Pending state due to insufficient resources
- Action: When detected, it provisions additional nodes within configured limits
- Self-Healing: Once nodes reach Ready state, the scheduler places pods automatically
- Scale-Down: CAS also removes underutilized nodes to optimize costs
Best Practices
- Set min-nodes to handle baseline load without delay
- Configure max-nodes to prevent runaway scaling and cost overruns
- Enable CAS on all production node pools, especially cost-optimized pools
- Consider separate node pools for different workload types (CPU-intensive, memory-intensive, GPU workloads)
2. Accurate Resource Specification
Automation effectiveness depends on accurate input. Resource requests in deployment manifests must reflect actual application needs.
Common Pitfalls
Underspecified Requests: Leads to node overcommitment, performance degradation, and "noisy neighbor" problems where pods compete for resources.
Overspecified Requests: Results in wasted capacity, unnecessary scaling, and increased costs. Nodes may appear full while actual utilization remains low.
Determining Appropriate Values
Option 1: Vertical Pod Autoscaler (VPA)
Run VPA in recommendation mode to analyze actual resource usage:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Recommendation only; no automatic pod updates
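After the VPA has observed the workload for a while (and assuming the VPA components are installed), its recommendations can be read back:

# Inspect the recommended request values produced by the VPA
kubectl describe vpa my-app-vpa

# Or pull just the recommendation block
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'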
Option 2: Metrics Analysis
Monitor actual usage via Prometheus, Cloud Monitoring, or similar tools:
kubectl top pods --containers
Analyze usage patterns over time and set requests at the 90th percentile of observed usage, with limits at 2-3x requests for burst capacity.
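As a sketch of that percentile approach in PromQL, assuming the standard cAdvisor metrics are being scraped and using a placeholder container label:

# 90th percentile of memory working set over the past 7 days, per container
quantile_over_time(0.9,
  container_memory_working_set_bytes{container="app"}[7d])

# Average CPU usage (cores) over 5-minute windows; feed this into a recording rule
# before applying quantile_over_time for a CPU percentile
rate(container_cpu_usage_seconds_total{container="app"}[5m])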
Example Resource Definition
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: microservice-app
  template:
    metadata:
      labels:
        app: microservice-app
    spec:
      containers:
      - name: app
        image: gcr.io/project/app:v1
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
3. Horizontal Pod Autoscaler (HPA)
Complement CAS with HPA to scale pod replicas based on demand:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: microservice-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
This creates a self-regulating system where HPA scales pods horizontally and CAS scales nodes to accommodate them.
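To observe the two loops working together during a load test (names follow the manifests above), two watches side by side are usually enough:

# Watch replica counts and target utilization as HPA reacts to load
kubectl get hpa microservice-hpa --watch

# In parallel, watch nodes join as CAS responds to new Pending pods
kubectl get nodes --watch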
4. Pod Disruption Budgets (PDB)
Protect application availability during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: microservice-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: microservice-app
PDBs ensure that automated operations (including CAS scale-downs) maintain minimum replica counts.
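The budget's current headroom is easy to check; the ALLOWED DISRUPTIONS column shows how many pods a voluntary eviction (such as a CAS drain) may remove right now:

# Check how many voluntary disruptions the budget currently allows
kubectl get pdb microservice-pdb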
Monitoring and Observability
Essential Alerts
Configure alerting for critical scheduling events:
FailedScheduling Events:
# Example Prometheus alert rule
- alert: PodsStuckPending
  expr: |
    kube_pod_status_phase{phase="Pending"} > 0
  for: 5m
  annotations:
    summary: "Pods stuck in Pending state"
Node Pool Scaling Events: Monitor when CAS reaches maximum node limits, indicating potential capacity planning issues.
Resource Pressure: Alert on nodes approaching allocatable capacity thresholds before pods become pending.
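A hedged sketch of such a pre-emptive rule, assuming kube-state-metrics is scraped and using an illustrative 90% threshold:

# Example: warn when committed CPU requests approach a node's allocatable capacity
- alert: NodeCpuRequestsNearAllocatable
  expr: |
    sum by (node) (kube_pod_container_resource_requests{resource="cpu"})
      / on (node) kube_node_status_allocatable{resource="cpu"} > 0.9
  for: 10m
  annotations:
    summary: "CPU requests on {{ $labels.node }} exceed 90% of allocatable"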
Key Metrics to Track
- Pod scheduling latency
- Node provisioning time
- Resource utilization vs. requests ratio
- Frequency of scaling events
- Cost trends correlated with scaling patterns
Dashboards
Create dashboards displaying:
- Cluster-wide resource allocation and utilization
- Per-node capacity breakdown
- Pod pending duration histogram
- Autoscaler activity timeline
Advanced Considerations
Multi-Pool Strategies
Consider separating workloads across specialized node pools:
- On-Demand Pool: Critical, stateful applications requiring high availability
- Spot Pool: Fault-tolerant, stateless workloads with cost optimization priority
- GPU Pool: ML/AI workloads requiring specialized hardware
Use node affinity and taints/tolerations to control pod placement:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
Priority Classes
Define priority classes to ensure critical workloads are scheduled first:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
globalDefault: false
description: "High priority for critical services"
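Workloads then opt in by name; a minimal pod template fragment referencing the class above (image reused from the earlier example):

# Pod template fragment: attach the priority class to a critical workload
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: gcr.io/project/app:v1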
Cluster Overprovisioning
Deploy placeholder pods to maintain buffer capacity for faster scaling:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: overprovisioner
  template:
    metadata:
      labels:
        app: overprovisioner
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9   # minimal placeholder container
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
These placeholder pods are evicted when higher-priority workloads need resources, ensuring capacity is immediately available.
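For the pattern to work, the placeholder pods' class must rank below every real workload; a commonly used shape for that class, with an illustrative negative value:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                 # lower than any real workload, so the pause pods are preempted first
globalDefault: false
description: "Placeholder capacity; evicted whenever real pods need the room"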
Cost Optimization
Right-Sizing Recommendations
- Regularly review VPA recommendations
- Analyze actual vs. requested resources across all namespaces
- Identify overprovisioned applications for cost reduction
- Consider committed use discounts for stable baseline capacity
Spot VM Best Practices
- Set appropriate max-nodes to prevent excessive scaling
- Implement graceful shutdown handling for preemption events (see the sketch after this list)
- Use pod disruption budgets to maintain availability
- Monitor spot instance reclamation rates and adjust strategies accordingly
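For the graceful-shutdown item above, a hedged pod spec sketch: Spot preemption gives roughly 30 seconds of notice, so cleanup must fit inside that window (the drain script is a hypothetical placeholder for application-specific logic):

spec:
  terminationGracePeriodSeconds: 25   # stay inside the ~30s Spot preemption window
  containers:
  - name: app
    image: gcr.io/project/app:v1
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "/app/drain-connections.sh"]   # hypothetical drain script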
Conclusion
Resource exhaustion leading to pending pods is a preventable operational issue. While manual intervention provides immediate relief, sustainable operations require architectural solutions centered on automation.
Implementation Checklist
✅ Enable Cluster Autoscaler on all production node pools with appropriate min/max boundaries
✅ Define accurate resource requests based on actual application usage patterns
✅ Implement Horizontal Pod Autoscaler to scale replicas based on demand
✅ Configure Pod Disruption Budgets to protect application availability
✅ Establish monitoring and alerting for scheduling failures and resource pressure
✅ Create documentation for resource specification standards and autoscaling policies
✅ Conduct regular reviews of resource utilization and autoscaling effectiveness
By implementing these automated solutions and best practices, teams can build resilient, self-healing Kubernetes environments that respond to resource demands without manual intervention, allowing engineers to focus on delivering value rather than firefighting infrastructure issues.
For more information on GKE best practices and resource management, refer to the official Google Kubernetes Engine documentation.