Load Balancing

Load balancing distributes incoming network traffic across multiple servers to optimize utilization, ensure resilience and minimize response times for users.

When a web application grows and thousands or millions of users access it at once, a single server is no longer enough. Load balancing is the answer: a load balancer distributes incoming requests across multiple backend servers to even out load, keep the service available and keep response times low. Without load balancing, the modern internet – from Google and Netflix to your online shop – would not be possible. For businesses, load balancing is the foundation of every scalable and resilient IT infrastructure.

What is Load Balancing?

Load balancing is a technique that distributes incoming network traffic (HTTP requests, TCP connections, DNS queries) across a group of backend servers (pool or farm). The load balancer sits between clients and servers: it receives requests, chooses a target server using an algorithm and forwards the request. If a server fails, the load balancer detects it via health checks and sends traffic to the remaining servers – without user-visible interruption. Types include Layer-4 (TCP/UDP, very fast) and Layer-7 (HTTP, with access to URL, headers and cookies for smarter distribution). Load balancers can be hardware (F5, Citrix), software (NGINX, HAProxy, Traefik) or cloud services (AWS ALB/NLB, Azure Load Balancer, Google Cloud Load Balancing).
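
The core loop is small: accept a request, pick a server, forward it, relay the answer. Below is a minimal sketch of that loop in Python (standard library only); the backend addresses, the port and the round-robin choice are illustrative assumptions, not a production setup.

    import itertools
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical backend pool; real balancers read this from configuration.
    BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
    pool = itertools.cycle(BACKENDS)  # round robin: every request advances the cycle

    class Balancer(BaseHTTPRequestHandler):
        def do_GET(self):
            backend = next(pool)  # choose the target server for this request
            # Forward the request and relay status and body back to the client.
            with urllib.request.urlopen(backend + self.path) as upstream:
                body = upstream.read()
                status = upstream.status
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), Balancer).serve_forever()

A real load balancer would additionally copy headers, stream large bodies, pool connections and handle backend errors – the sketch only shows the shape of the idea.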

How does Load Balancing work?

A client sends a request (e.g. HTTPS) to the load balancer’s public IP or domain. The load balancer selects a backend server using a configured algorithm: Round Robin (servers take turns), Least Connections (the server with the fewest active connections), Weighted (more capable servers receive more traffic) or IP Hash (the same client IP always maps to the same server, giving session persistence). Health checks periodically verify each server (e.g. HTTP GET /health → 200 OK). Unresponsive servers are removed from the pool and re-added once healthy again. With Layer-7, requests can also be routed by URL path, header or cookie to specific server groups (content-based routing).
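
Each of these algorithms fits in a few lines. A sketch, assuming a hypothetical two-server pool whose active-connection counts the balancer tracks itself, and a /health endpoint on every backend:

    import hashlib
    import random
    import urllib.request

    # Hypothetical pool; "active" is the number of open connections per server.
    servers = [
        {"addr": "10.0.0.1", "weight": 3, "active": 0},
        {"addr": "10.0.0.2", "weight": 1, "active": 0},
    ]

    _rr = 0
    def round_robin():
        global _rr
        chosen = servers[_rr % len(servers)]  # take servers in rotation
        _rr += 1
        return chosen

    def least_connections():
        # The server with the fewest active connections wins.
        return min(servers, key=lambda s: s["active"])

    def weighted():
        # Higher weight -> proportionally more traffic.
        return random.choices(servers, weights=[s["weight"] for s in servers])[0]

    def ip_hash(client_ip):
        # The same client IP always lands on the same server
        # (md5 is used only to spread keys, not for security).
        digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return servers[digest % len(servers)]

    def is_healthy(addr):
        # HTTP GET /health; a timeout or any status other than 200 marks the server down.
        try:
            with urllib.request.urlopen(f"http://{addr}/health", timeout=2) as r:
                return r.status == 200
        except OSError:
            return False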

Practical Examples

1. NGINX as reverse proxy: An online shop uses NGINX in front of 4 Node.js backends with least-connections balancing and automatic failover.

2. AWS Application Load Balancer: A SaaS platform uses an ALB with path-based routing: /api/* to the backend, /app/* to the frontend, /ws/* to WebSocket servers.

3. Kubernetes Ingress: A microservices system uses Traefik as Ingress controller for automatic load balancing across pods with health checks.

4. Global Server Load Balancing: An international streaming service uses DNS-based load balancing to send users to the nearest data centre.

5. Database read replicas: A news site distributes read traffic across 3 PostgreSQL read replicas via HAProxy; writes go to the primary (see the sketch after this list).
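
Example 5 boils down to a single routing decision. In practice HAProxy makes it at the connection level; the sketch below only illustrates the logic, with hypothetical hostnames:

    import itertools

    PRIMARY = "db-primary:5432"  # hypothetical hostnames
    REPLICAS = itertools.cycle(
        ["db-replica-1:5432", "db-replica-2:5432", "db-replica-3:5432"]
    )

    def route(sql):
        """Writes go to the primary; reads rotate across the replicas."""
        is_read = sql.lstrip().upper().startswith(("SELECT", "SHOW"))
        return next(REPLICAS) if is_read else PRIMARY

    print(route("SELECT * FROM articles"))   # one of the replicas
    print(route("UPDATE articles SET ..."))  # db-primary:5432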

Typical Use Cases

High availability: Automatic failover when servers fail – no single point of failure

Horizontal scaling: Add servers to the pool when load increases – without downtime

Blue-green deployments: Gradually shift traffic from the old to the new version (canary/rolling) – see the sketch after this list

SSL termination: Load balancer handles TLS and offloads backend servers

Geo-routing: Users are directed to the geographically nearest data centre
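
For blue-green and canary deployments, the mechanism is a weighted split per request. A sketch, assuming hypothetical pool names and a starting canary share of 10%:

    import random

    CANARY_PERCENT = 10  # assumption: start with 10% on the new version

    def pick_pool():
        # Weighted coin flip per request; raise the percentage step by step
        # (10 -> 25 -> 50 -> 100) while watching error rates and latency.
        return "v2-pool" if random.random() * 100 < CANARY_PERCENT else "v1-pool"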

Advantages and Disadvantages

Advantages

  • High availability: If one server fails, others take over – users see no interruption
  • Scalability: Scale out by adding servers to the pool
  • Performance: Even distribution prevents overload of single servers and reduces response time
  • Flexibility: Maintain and update individual servers without downtime (rolling updates)
  • Security: Load balancer as central point for SSL termination, DDoS protection and rate limiting

Disadvantages

  • Extra complexity: The load balancer itself must be highly available (active-passive or active-active)
  • Session handling: Stateful apps need sticky sessions or an external session store (e.g. Redis) – see the sketch after this list
  • Cost: Cloud load balancers charge by processed data and number of rules – relevant at high traffic volumes
  • Debugging: Issues in distributed systems are harder to diagnose than on a single server
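
Sticky sessions are typically implemented with a cookie set by the load balancer. A sketch of that pinning logic, assuming a hypothetical cookie named backend:

    import itertools
    from http import cookies

    BACKENDS = ["app-1", "app-2"]  # hypothetical backend names
    _rotation = itertools.cycle(BACKENDS)

    def pick_backend(cookie_header):
        """Return (backend, set_cookie), honouring an existing sticky cookie."""
        jar = cookies.SimpleCookie(cookie_header)
        if "backend" in jar and jar["backend"].value in BACKENDS:
            return jar["backend"].value, ""  # already pinned: keep the server
        backend = next(_rotation)            # first visit: pin a server
        return backend, f"backend={backend}; Path=/; HttpOnly"

An external session store avoids the pinning entirely: any server can answer, because the session state no longer lives on the server.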

Frequently Asked Questions about Load Balancing

What is the difference between Layer-4 and Layer-7 load balancing?

Layer-4 works on TCP/UDP and forwards packets without inspecting content – very fast and lightweight. Layer-7 works on HTTP and can route by URL path, headers, cookies or body. Layer-7 allows smarter routing (e.g. API to backend A, static to backend B) but is somewhat slower.
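
Content-based routing is what Layer-7 buys you in practice. A sketch of a path match, assuming hypothetical server groups:

    # Hypothetical server groups, keyed by URL prefix.
    POOLS = {
        "/api/": ["api-1", "api-2"],
        "/ws/": ["ws-1"],
    }
    DEFAULT = ["web-1", "web-2"]

    def route(path):
        # Layer 7 sees the HTTP request, so it can match on the URL path;
        # a Layer-4 balancer only sees TCP segments and cannot do this.
        for prefix, pool in POOLS.items():
            if path.startswith(prefix):
                return pool
        return DEFAULT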

Which load-balancing algorithm should I use?

Round Robin is the simplest and works well when servers are similar and requests create similar load. Least Connections fits when request duration varies (e.g. an API with fast and slow endpoints). Weighted Round Robin suits servers with different capacity. IP Hash gives session persistence without cookie-based sticky sessions. For most web apps, Least Connections is a good default.

Do I need load balancing for a small application?

Small apps benefit too – not mainly for load, but for resilience. Two servers behind a load balancer enable zero-downtime deployments and automatic failover. Cloud load balancers (e.g. AWS ALB) are cheap and quick to set up. At the very beginning, a single server with NGINX as a reverse proxy is enough – the architecture can be extended to load balancing later.

Want to use Load Balancing in your project?

We are happy to advise you on Load Balancing and find the optimal solution for your requirements. Benefit from our experience across over 200 projects.

Next Step

Questions about the topic? We're happy to help.

Our experts are available for in-depth conversations – no strings attached.

30 min strategy call – 100% free & non-binding
