Kubernetes Cost Optimization
As organizations scale their Kubernetes deployments, managing costs becomes increasingly important. Kubernetes provides several mechanisms to optimize resource usage and control costs, but it requires careful planning and monitoring.
Understanding Kubernetes Resource Management
Before diving into cost optimization strategies, it's essential to understand how Kubernetes manages resources:
Resource Requests and Limits
Kubernetes uses two key concepts for resource management:
- Resource Requests: The minimum amount of resources that Kubernetes will guarantee to a container. These are used for scheduling decisions.
- Resource Limits: The maximum amount of resources that a container can use. These are enforced by the container runtime.
Here's an example of setting resource requests and limits in a Pod specification:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: resource-demo-container
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
```
Understanding CPU Resources
CPU resources are specified in units of CPU cores:
- `1` means one full CPU core
- `500m` means 500 millicores, or 0.5 CPU cores
- `0.1` means 0.1 CPU cores, or 100 millicores

CPU is a compressible resource, meaning that containers can be throttled when they exceed their CPU limit but won't be terminated.
Understanding Memory Resources
Memory resources are specified in bytes:
- `128Mi` means 128 mebibytes (134,217,728 bytes)
- `1Gi` means 1 gibibyte (1,073,741,824 bytes)
Memory is a non-compressible resource, meaning that containers cannot be throttled when they exceed their memory limit and will be terminated (OOM killed).
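To make the unit arithmetic above concrete, here is a small self-contained sketch. The `parse_cpu` and `parse_memory` helpers are illustrative only, not part of any Kubernetes client library, and handle just the suffixes discussed above:

```python
# Illustrative converters for Kubernetes resource quantity strings.
# These are NOT a Kubernetes API; they only cover the suffixes from the text.

def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity like '500m' or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Convert a memory quantity like '128Mi' or '1Gi' to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # plain byte count

print(parse_cpu("500m"))      # 0.5 cores
print(parse_memory("128Mi"))  # 134217728 bytes
```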
Cost Optimization Strategies
1. Right-sizing Resource Requests and Limits
One of the most effective ways to optimize costs is to right-size your resource requests and limits:
- Too high requests: Wastes resources and increases costs
- Too low requests: Risks application performance and stability
- Too high limits: Can lead to resource hogging and noisy neighbor problems
- Too low limits: Can cause application crashes or performance issues
Best Practices for Resource Sizing:
- Start with a reasonable estimate based on application requirements
- Monitor actual usage and adjust accordingly
- Use tools like Vertical Pod Autoscaler (VPA) in recommendation mode
- Consider using Goldilocks or other recommendation tools
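In practice, right-sizing starts with comparing actual consumption against what you requested. A sketch of that workflow, assuming metrics-server is installed and that a VPA named `webapp-vpa` exists in the (hypothetical) namespace `my-namespace`:

```shell
# Observed usage per pod (requires metrics-server)
kubectl top pods -n my-namespace

# Requests/limits as currently declared, for comparison
kubectl get pods -n my-namespace \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# If a VPA is running in recommendation mode, inspect its suggestions
kubectl describe vpa webapp-vpa -n my-namespace
```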
2. Implementing Autoscaling
Kubernetes offers several autoscaling mechanisms:
Horizontal Pod Autoscaler (HPA)
Scales the number of Pod replicas based on CPU or memory utilization or custom metrics.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
Vertical Pod Autoscaler (VPA)
Adjusts the CPU and memory requests/limits of containers based on historical usage.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 50Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
```
Cluster Autoscaler
Adjusts the size of the Kubernetes cluster based on pending pods and node utilization.
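On GKE the cluster autoscaler is built into node pool settings, but on other platforms it is typically installed via Helm. A hedged sketch for AWS, where `my-cluster` and `us-east-1` are placeholders you would replace with your own values:

```shell
# Add the Kubernetes autoscaler Helm repository
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Install the cluster autoscaler on AWS (placeholder cluster name and region)
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1
```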
3. Using Namespaces and Resource Quotas
Namespaces and Resource Quotas help control resource usage across teams and applications:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```
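Once a quota like this is in place, you can check how much of it a team is actually consuming:

```shell
# Show current consumption against the quota's hard limits
kubectl describe resourcequota team-quota -n team-a
```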
4. Implementing Pod Disruption Budgets
Pod Disruption Budgets (PDBs) ensure high availability while allowing for efficient resource usage:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: webapp-pdb
spec:
  minAvailable: 2  # or maxUnavailable: 1
  selector:
    matchLabels:
      app: webapp
```
5. Using Spot/Preemptible Instances
For non-critical workloads, consider using spot or preemptible instances:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"  # For GKE
        # On EKS managed node groups, use: eks.amazonaws.com/capacityType: SPOT
      tolerations:
      - key: cloud.google.com/gke-spot
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
```
Monitoring and Optimization Tools
1. Kubernetes Dashboard
The Kubernetes Dashboard provides a basic overview of resource usage:
```shell
# Deploy Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

# Create a service account and get a token
kubectl create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin
kubectl create token dashboard-admin  # requires kubectl v1.24+
```
2. Prometheus and Grafana
Prometheus and Grafana provide more detailed monitoring and visualization:
```shell
# Add Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install Prometheus stack (includes Grafana)
helm install prometheus prometheus-community/kube-prometheus-stack
```
3. Goldilocks
Goldilocks is a tool that helps you identify the right resource requests and limits:
```shell
# Add Fairwinds Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install Goldilocks
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace

# Enable Goldilocks for a namespace
kubectl label namespace default goldilocks.fairwinds.com/enabled=true
```
4. kube-resource-report
kube-resource-report generates reports on resource usage and costs:
```shell
# Run kube-resource-report
docker run --rm -v ~/.kube:/kube -v $(pwd):/output hjacobs/kube-resource-report:20.7.0 /output --kubeconfig-path=/kube/config
```
Best Practices for Cost Optimization
1. Establish Resource Standards
- Define standard resource requests and limits for common application types
- Create templates or Helm charts with sensible defaults
- Implement policies to enforce resource specifications
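One way to enforce sensible defaults is a LimitRange, which fills in requests and limits for containers that omit them. The values below are illustrative, and `team-a` reuses the namespace from the ResourceQuota example above:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:           # applied when a container omits limits
      cpu: 500m
      memory: 256Mi
```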
2. Implement Cost Allocation
- Use labels to track resources by team, application, or environment
- Implement chargeback or showback mechanisms
- Use tools like Kubecost or CloudHealth for cost allocation
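A hypothetical labeling scheme for cost allocation might look like this on a Deployment's pod template; the keys and values are examples, not a standard, but tools like Kubecost can aggregate costs by any such label:

```yaml
metadata:
  labels:
    team: payments           # hypothetical team name
    app: billing-api         # hypothetical application name
    environment: production
    cost-center: "cc-1234"   # hypothetical chargeback code
```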
3. Regular Resource Reviews
- Schedule regular reviews of resource usage and costs
- Identify and address resource waste
- Adjust resource requests and limits based on actual usage
4. Use Multi-dimensional Autoscaling
- Combine HPA, VPA, and Cluster Autoscaler for comprehensive scaling
- Use custom metrics for more accurate scaling decisions
- Implement scaling policies based on business metrics
5. Optimize Storage Costs
- Use appropriate storage classes for different workloads
- Implement storage lifecycle policies
- Consider using object storage for large, infrequently accessed data
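As a sketch of storage-class tiering, here is a StorageClass backed by a cheaper HDD disk type. The example assumes GKE's CSI driver; the provisioner and parameters vary by cloud:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-hdd
provisioner: pd.csi.storage.gke.io  # GKE CSI driver; differs on other clouds
parameters:
  type: pd-standard  # HDD-backed disks, cheaper than pd-ssd
reclaimPolicy: Delete
allowVolumeExpansion: true
```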
Conclusion
Kubernetes cost optimization is an ongoing process that requires a combination of proper resource management, monitoring, and automation. By implementing the strategies and best practices outlined in this guide, you can significantly reduce your Kubernetes costs while maintaining performance and reliability.
Remember that cost optimization should not come at the expense of application performance or reliability. Always test changes in a non-production environment before implementing them in production, and monitor the impact of changes on both costs and application performance.