Kubernetes High Availability Best Practices: Mastering Pod Distribution

13 February 2025

When running applications in Kubernetes, ensuring high availability isn’t just about having multiple replicas; it’s about intelligently distributing those replicas across your infrastructure. In this blog post, we’ll dive deep into advanced pod distribution strategies that help maintain application resilience and optimal performance. Assuming that Kubernetes will always keep these pods apart automatically is a big mistake: if pod distribution is not properly configured, the scheduler can place several replicas of the same application on the same node.

Understanding Pod Distribution Challenges

Before we explore solutions, let’s understand the challenges:

Multiple pods of the same application could end up on the same node, creating a single point of failure
Uneven pod distribution across zones can lead to degraded performance during zone failures
Resource contention between co-located pods can impact application performance
Network latency between pods in different regions can affect application response times

Pod Anti-Affinity: Keeping Pods Apart

Pod anti-affinity is one of the most powerful tools for ensuring high availability. It allows you to define rules that prevent pods from being scheduled together based on specified criteria. The diagram below shows what soft and hard anti-affinity do.

Here’s an example of how to implement hard pod anti-affinity:

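apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3
  # apps/v1 Deployments require a selector that matches the template labels
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: a pod is not scheduled if the rule cannot be met
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            # One matching pod per node
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx  # placeholder image, substitute your own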

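This configuration ensures that pods with the label app: web will never be scheduled on the same node. Because the rule is hard, any replica that cannot find a node without a matching pod stays Pending, so the cluster needs at least as many eligible nodes as replicas. For more flexibility, you can use preferredDuringSchedulingIgnoredDuringExecution to implement soft anti-affinity rules.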
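As a minimal sketch of the soft variant, the fragment below would replace the affinity section of the pod template above. The weight of 100 is an assumption chosen for illustration; valid weights range from 1 to 100, and the scheduler treats the rule as a preference rather than a requirement.

```yaml
# Pod template fragment: goes under spec.template.spec of the Deployment.
affinity:
  podAntiAffinity:
    # Soft rule: the scheduler tries to honour it but still places the pod if it cannot
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100  # illustrative value; higher weight means stronger preference
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
```

With this form, replicas can share a node when no other node is available, so a small cluster does not leave pods Pending.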

Topology Spread Constraints: Even Distribution Across Your Cluster

While anti-affinity helps keep pods apart, topology spread constraints ensure even distribution across your infrastructure. This is particularly important in multi-zone clusters. The diagram below shows how pod topology spread helps keep services resilient to node failure:

Here’s an example of implementing topology spread constraints:

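apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 6
  # apps/v1 Deployments require a selector that matches the template labels
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web  # must match the constraint's labelSelector below
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: nginx  # placeholder image, substitute your own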

This configuration ensures that:

Pods are distributed evenly across zones
The difference in the number of matching pods between any two zones won’t exceed 1
New pods won’t be scheduled if they would violate the maxSkew constraint
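To make the maxSkew rule concrete, the annotated fragment below walks through a hypothetical three-zone cluster. The zone names and pod counts are assumptions for illustration only. The skew of a placement is the number of matching pods in the target zone after placement, minus the smallest count in any eligible zone.

```yaml
# Hypothetical state before scheduling the next pod:
#   zone-a: 2 matching pods, zone-b: 2, zone-c: 1  (minimum = 1)
#
# Place the pod in zone-c: counts become 2/2/2, skew = 2 - 2 = 0 -> allowed
# Place the pod in zone-a: counts become 3/2/1, skew = 3 - 1 = 2 -> exceeds maxSkew, rejected
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule  # rejected placements are not used
  labelSelector:
    matchLabels:
      app: web
```

In this scenario the scheduler can only place the pod in zone-c. If no node in zone-c has room for it, the pod stays Pending rather than landing in a fuller zone.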

Combining Strategies for Maximum Resilience

For optimal high availability, combine both approaches:

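apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 6
  # apps/v1 Deployments require a selector that matches the template labels
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Node level: at most one matching pod per node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
      # Zone level: keep matching pod counts within 1 of each other
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: nginx  # placeholder image, substitute your own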

This configuration provides:

Node-level pod separation through anti-affinity
Zone-level even distribution through topology spread constraints
Protection against both node and zone failures

Best Practices and Recommendations

Start with Soft Constraints
Begin with soft anti-affinity rules and topology spread constraints during initial deployment. This provides flexibility while you evaluate the impact on your cluster.
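As a rough sketch of what a soft topology spread constraint could look like, the fragment below uses whenUnsatisfiable: ScheduleAnyway. The app: web label is carried over from the earlier examples; the rest is an illustration, not a recommended final configuration.

```yaml
# Pod template fragment: goes under spec.template.spec of the Deployment.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  # Soft: the scheduler favours placements that reduce skew,
  # but still schedules the pod when the constraint cannot be met
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: web
```

Once the observed distribution looks right, the constraint can be tightened by switching to DoNotSchedule.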
Monitor Pod Distribution
Regular monitoring of pod distribution is crucial. Use tools like:

   kubectl get pods -o wide
   kubectl describe nodes | grep -A5 "Non-terminated Pods"

Consider Resource Requirements
When implementing distribution strategies, account for:

Node resource capacity
Reserved resources for system components
Resource requirements of your applications
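Resource requests matter here because the scheduler only places a pod on a node whose unallocated capacity covers the pod's requests, and hard distribution rules shrink the set of candidate nodes further. A minimal sketch of declaring them follows; the CPU and memory figures are placeholder assumptions, not sizing advice.

```yaml
# Pod template fragment: goes under spec.template.spec of the Deployment.
containers:
- name: web
  image: nginx  # placeholder image, substitute your own
  resources:
    requests:        # used by the scheduler to decide whether the pod fits on a node
      cpu: 250m      # illustrative value
      memory: 256Mi  # illustrative value
    limits:
      memory: 512Mi  # illustrative value; the container is killed if it exceeds this
```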

Plan for Failure Scenarios
Test your configuration under different failure scenarios:

Node failures
Zone outages
Network partitions

Summary

Implementing proper pod distribution strategies is crucial for maintaining high availability in Kubernetes clusters, especially for mission-critical services. By combining pod anti-affinity with topology spread constraints, you can create resilient applications that can withstand various types of infrastructure failures.

Remember that these configurations should be tailored to your specific needs, considering factors like:

Application architecture
Infrastructure topology
Performance requirements
Business continuity needs

Regular testing and monitoring will help ensure your chosen strategies effectively maintain the desired level of availability for your applications.

This approach also helps us plan maintenance that requires no downtime, such as the cluster patching covered in a separate post.