Load Balancers


In this tutorial, we discuss load balancers, another important system design concept. A load balancer is a machine (or a piece of software) that distributes incoming load among multiple servers.

A typical data center could receive millions of requests per second. To serve them, thousands (or even hundreds of thousands) of servers work together to share the load.

Here, it’s important that we consider how the incoming requests will be divided among all the available servers.

What is a load balancer?

A load balancer (LB) is the answer to the question. The job of the load balancer is to fairly divide all clients’ requests among the pool of available servers. Load balancers perform this job to avoid overloading or crashing servers.


The main goal of load balancing is to ensure high availability, reliability, and performance by avoiding overloading a single server and avoiding downtime.

Typically, a load balancer sits between the client and the server, accepting incoming network and application traffic and distributing it across multiple backend servers using various algorithms.

By balancing application requests across multiple servers, a load balancer reduces the load on individual servers and prevents any one server from becoming a single point of failure, thus improving overall application availability and responsiveness.

Placing load balancers

Generally, load balancers sit between clients and servers. Requests flow to the servers and responses flow back to the clients through the load balancing layer. However, that isn’t the only point where load balancers are used.

Let’s consider three well-known groups of servers: web servers, application servers, and database servers. To divide the traffic load among the available servers, load balancers can be placed between the instances of these three tiers in the following way:

  • Place Load Balancers between end users of the application and web servers/application gateway.
  • Place Load Balancers between the web servers and application servers that run the business/application logic.
  • Place Load Balancers between the application servers and database servers.

In reality, load balancers can potentially be used between any two services that run multiple instances within a system design.

Why do we need a load balancer?

Suppose we have a single-server setup, where several clients send requests to one server. When the number of requests increases, two critical issues arise:

Server overloading:  There is a limit to the number of requests a single server can handle. If the number of requests exceeds this limit, the server may become overloaded and unable to function properly.

Single point of failure:  If the single server goes down for any reason, the entire application will become unavailable to users. This can result in a poor user experience and impact the reliability of the system.

We can solve the above problems in two ways:

Vertical scaling: We can increase the power of our current server. But there are limits to how much we can increase the capabilities of a single machine.

Horizontal scaling: We can add more servers to our system. Now this will bring a new challenge: How to distribute requests evenly across these servers? The answer is: We should use load balancers!

The load balancer not only distributes requests across multiple servers but also lets us increase system capacity by adding more servers if the number of requests grows in the future.

In addition, the load balancer continuously checks the health of each server. If a server goes offline for any reason, the load balancer redirects the traffic destined for that server to another available server. In this way, load balancers help keep the service available even when one or more servers fail.
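The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` class, the `healthy` flag, and the `route` function are hypothetical names, and a real load balancer would set the flag from periodic health probes rather than by hand.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True  # assumed to be updated by a periodic health check


def healthy_pool(servers):
    """Return only the servers whose last health check succeeded."""
    return [s for s in servers if s.healthy]


def route(servers, request_id):
    """Pick a healthy server for the request; fail if none is available."""
    pool = healthy_pool(servers)
    if not pool:
        raise RuntimeError("no healthy servers available")
    # Simple rotation over the healthy servers only.
    return pool[request_id % len(pool)]


servers = [Server("s1"), Server("s2"), Server("s3")]
servers[1].healthy = False  # simulate a failed health check on s2
chosen = [route(servers, i).name for i in range(4)]
print(chosen)  # s2 never appears while it is marked unhealthy
```

Because the unhealthy server is filtered out before selection, its share of the traffic moves to the remaining servers without any change on the client side.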

How does a load balancer work?

Load balancers work by distributing incoming network traffic across multiple servers or resources to ensure efficient utilization of computing resources and prevent overload. Here are the general steps that a load balancer follows to distribute traffic:

  1. The load balancer receives a request from a client or user.
  2. The load balancer evaluates the incoming request and determines which server or resource should handle the request. This is done based on a predefined load-balancing algorithm that takes into account factors such as server capacity, server response time, number of active connections, and geographic location.
  3. The load balancer forwards the incoming traffic to the selected server or resource.
  4. The server or resource processes the request and sends a response back to the load balancer.
  5. The load balancer receives the response from the server or resource and sends it to the client or user who made the request.
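The five steps above can be sketched as a tiny in-process simulation. The `LoadBalancer` class, the `make_backend` helper, and the use of round robin as the selection policy are assumptions made for illustration; a real load balancer forwards traffic over the network rather than calling Python functions.

```python
from itertools import cycle


class LoadBalancer:
    def __init__(self, backends):
        # backends maps a server name to a callable that handles a request
        self.backends = backends
        self._order = cycle(list(backends))

    def select(self, request):
        # Step 2: choose a server (round robin as a placeholder policy)
        return next(self._order)

    def handle(self, request):
        # Step 1: the request arrives at the load balancer
        name = self.select(request)              # step 2: pick a server
        response = self.backends[name](request)  # steps 3 and 4: forward, process
        return response                          # step 5: return to the client


def make_backend(name):
    """Build a fake server that reports which server handled the request."""
    return lambda request: f"{name} handled {request}"


lb = LoadBalancer({"s1": make_backend("s1"), "s2": make_backend("s2")})
responses = [lb.handle(f"req{i}") for i in range(3)]
print(responses)
```

Keeping the selection policy in its own `select` method mirrors the way the algorithm can be swapped without touching the forwarding logic.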

Types of load balancers

1. Types of Load Balancer – Based on Configurations

There are mainly three types of load balancers based on configurations:

1. Software load balancers

Software load balancers are applications or components that run on general-purpose servers. They are implemented in software, making them flexible and adaptable to various environments.


Popular software load balancers include:

  • HAProxy: A TCP and HTTP load balancer.
  • NGINX: An HTTP load balancer with SSL termination support.
  • mod_athena: Apache-based HTTP load balancer.
  • Varnish: A reverse proxy-based load balancer.
  • Balance: Open-source TCP load balancer.
  • LVS: Linux Virtual Server, offering layer 4 load balancing.

2. Hardware load balancers

Hardware load balancers are physical devices installed in a network or data centre. They are generally less flexible and offer fewer options for customization, but they are often faster and more reliable than software load balancers because they are dedicated devices designed specifically for load balancing. They may also include additional features such as SSL acceleration and application-layer processing.

Hardware load balancers are expensive to acquire and configure, which is why many service providers use them only as the first entry point for user requests. Internal software load balancers then redirect the traffic behind that entry point.

Overall, the choice between a software and a hardware load balancer depends on the specific requirements of a system. Software load balancers are more suitable for systems that require a high level of customization and scalability, while hardware load balancers are better for systems that require high performance and reliability.


Popular hardware load balancers include:

  • F5 BIG-IP load balancer
  • Cisco Systems Catalyst
  • Barracuda load balancer
  • Coyote Point load balancer
  • Citrix NetScaler

3. Virtual Load Balancers

A virtual load balancer is a type of load balancing solution implemented as a virtual machine (VM) or software instance within a virtualized environment, such as data centers utilizing virtualization technologies like VMware, Hyper-V, or KVM. It plays a crucial role in distributing incoming network traffic across multiple servers or resources to ensure efficient utilization of resources, improve response times, and prevent server overload.

2. Types of Load Balancer – Based on Functions

1. Layer 4 (L4) Load Balancer

Layer-4 load balancers operate at the transport layer of the OSI model. They make forwarding decisions based on information available in network and transport layer headers (such as IP addresses and TCP/UDP port numbers), without inspecting the content of the request.

2. Layer 7 (L7) Load Balancer

Layer-7 load balancers operate at the application layer of the OSI model. They can make load balancing decisions based on content, including information such as URLs, HTTP headers, or cookies.  
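To make the difference concrete, here is a minimal sketch that contrasts what each layer can base its decision on. The pool names, the port mapping, the route prefixes, and the `beta=1` cookie are all hypothetical values chosen for illustration.

```python
# Hypothetical path prefixes and the server pools that serve them.
L7_ROUTES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def l4_pool(dst_port):
    """Layer 4: only addresses and ports are visible, not request content."""
    return {443: "https-pool", 25: "mail-pool"}.get(dst_port, "default-pool")


def l7_pool(path, headers):
    """Layer 7: the URL path, headers, and cookies can drive the decision."""
    if headers.get("Cookie", "").startswith("beta=1"):
        return "beta-pool"
    for prefix, pool in L7_ROUTES:
        if path.startswith(prefix):
            return pool
    return "web-pool"


print(l4_pool(443))
print(l7_pool("/api/users", {}))
print(l7_pool("/", {"Cookie": "beta=1"}))
```

The Layer-4 function never sees the path or the cookie, which is why content-based routing needs a Layer-7 load balancer.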

3. Global Server Load Balancing (GSLB)

GSLB stands for Global Server Load Balancing. This type of load balancing goes beyond traditional local load balancing and is designed for distributing traffic across multiple data centers or geographically distributed servers. It takes into account factors such as server proximity, server health, and geographic location to intelligently distribute traffic across multiple locations.
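As a rough sketch of this idea, the snippet below picks the healthy data center with the lowest latency for the client's region. The `DataCenter` class, the region names, and the latency figures are made-up assumptions; real GSLB products typically make this decision through DNS or anycast and use their own proximity metrics.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    healthy: bool
    latency_ms: dict  # hypothetical: client region -> measured latency


def pick_data_center(centers, client_region):
    """Choose the healthy data center closest (by latency) to the client."""
    healthy = [dc for dc in centers if dc.healthy]
    if not healthy:
        raise RuntimeError("no healthy data center available")
    return min(healthy, key=lambda dc: dc.latency_ms[client_region])


centers = [
    DataCenter("us-east", True, {"us": 20, "eu": 90}),
    DataCenter("eu-west", True, {"us": 95, "eu": 15}),
]
nearest = pick_data_center(centers, "eu").name

centers[1].healthy = False  # simulate an outage in eu-west
fallback = pick_data_center(centers, "eu").name
print(nearest, fallback)
```

European clients are served from the nearby site while it is healthy and fall back to the more distant one during an outage.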

Load Balancing Algorithms

Load balancers use various load-balancing algorithms to distribute incoming network traffic across multiple servers in a balanced manner. So, it is the responsibility of these algorithms to select a server from a pool of available servers to direct each incoming request.

Different systems select servers in different ways, and companies choose among load-balancing algorithms depending on their configuration. Some of the common load-balancing algorithms are given below:

1. Round Robin

This algorithm distributes incoming requests to servers in a cyclic order. It assigns a request to the first server, then moves to the second, third, and so on, and after reaching the last server, it starts again at the first.


Pros:
  • Distributes requests equally among the servers, as each server gets a turn in a fixed order.
  • Easy to implement and understand.
  • Works well when servers have similar capacities.

Cons:
  • May not perform optimally when servers have different capacities or varying workloads.
  • No consideration for server health or response time.
  • The distribution pattern is predictable, so an attacker who observes traffic could predict which server will handle a given request and target weaknesses in a specific server.

Example: A website with three web servers receives requests in the order A, B, C, A, B, C, and so on, distributing the load evenly among the servers.
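A minimal Python sketch of Round Robin, assuming a fixed list of servers; the `RoundRobin` class and its `pick` method are hypothetical names used only for this illustration.

```python
class RoundRobin:
    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server that gets the next request

    def pick(self):
        """Return the next server in cyclic order."""
        server = self.servers[self._next]
        self._next = (self._next + 1) % len(self.servers)
        return server


rr = RoundRobin(["A", "B", "C"])
order = [rr.pick() for _ in range(6)]
print(order)  # ['A', 'B', 'C', 'A', 'B', 'C']
```

The only state is a single index, which is why this algorithm is so cheap to run and also why it cannot react to load or health.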

2. Least Connections

The Least Connections algorithm directs incoming requests to the server with the lowest number of active connections. This approach accounts for the varying workloads of servers.


Pros:
  • Adapts to differing server capacities and workloads.
  • Balances load more effectively when dealing with requests that take a variable amount of time to process.

Cons:
  • Requires tracking the number of active connections for each server, which can increase complexity.
  • May not factor in server response time or health.

Example: An email service receives requests from users. The load balancer directs new requests to the server with the fewest active connections, ensuring that servers with heavier workloads are not overwhelmed.
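A minimal sketch of Least Connections, assuming the load balancer is told when a connection opens and closes. The `LeastConnections` class and its `acquire`/`release` methods are hypothetical names; ties are broken by insertion order in this sketch.

```python
class LeastConnections:
    def __init__(self, servers):
        # Active connection count per server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Record that a connection to the server has finished."""
        self.active[server] -= 1


lb = LeastConnections(["s1", "s2", "s3"])
first_three = [lb.acquire() for _ in range(3)]
lb.release("s2")          # s2 finishes its request early
next_pick = lb.acquire()  # so s2 now has the fewest connections
print(first_three, next_pick)
```

Unlike Round Robin, the next pick depends on which requests have completed, so servers stuck with slow requests receive fewer new ones.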

3. Weighted Round Robin

The Weighted Round Robin algorithm is an extension of the Round Robin algorithm that assigns different weights to servers based on their capacities. The load balancer distributes requests proportionally to these weights.


Pros:
  • Accounts for different server capacities, balancing load more effectively.
  • Simple to understand and implement.

Cons:
  • Weights must be assigned and maintained manually.
  • No consideration for server health or response time.

Example: A content delivery network has three servers with varying capacities. The load balancer assigns weights of 3, 2, and 1 to these servers, respectively, distributing requests in a 3:2:1 ratio.
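The 3:2:1 example can be sketched by repeating each server in the schedule as many times as its weight. This is the simplest possible approach and an assumption of this sketch; real implementations often interleave the picks more smoothly so that the heaviest server does not receive a burst of consecutive requests.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_round_robin(weights):
    """Cycle over a schedule in which each server appears `weight` times."""
    schedule = [server for server, w in weights.items() for _ in range(w)]
    return cycle(schedule)


picker = weighted_round_robin({"A": 3, "B": 2, "C": 1})
picks = list(islice(picker, 12))
counts = Counter(picks)
print(picks[:6])  # ['A', 'A', 'A', 'B', 'B', 'C']
print(counts)     # A gets 6, B gets 4, C gets 2 out of 12 requests
```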

4. Weighted Least Connections

The Weighted Least Connections algorithm combines the Least Connections and Weighted Round Robin algorithms. It directs incoming requests to the server with the lowest ratio of active connections to assigned weight.


Pros:
  • Balances load effectively, accounting for both server capacities and active connections.
  • Adapts to varying server workloads and capacities.

Cons:
  • Requires tracking active connections and maintaining server weights.
  • May not factor in server response time or health.

Example: An e-commerce website uses three servers with different capacities and assigned weights. The load balancer directs new requests to the server with the lowest ratio of active connections to weight, ensuring an efficient distribution of load.
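The ratio rule can be sketched as a one-line selection. The server names, connection counts, and weights below are made-up values for illustration.

```python
def pick_weighted_least_connections(active, weights):
    """Pick the server with the lowest active-connections-to-weight ratio."""
    return min(active, key=lambda server: active[server] / weights[server])


active = {"big": 4, "medium": 3, "small": 2}   # current active connections
weights = {"big": 4, "medium": 2, "small": 1}  # relative capacities

# Ratios: big 4/4 = 1.0, medium 3/2 = 1.5, small 2/1 = 2.0
choice = pick_weighted_least_connections(active, weights)
plain_least = min(active, key=active.get)
print(choice, plain_least)  # big small
```

Here the largest server wins even though it holds the most connections, whereas plain Least Connections would have picked the smallest one.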

5. IP Hash

The IP Hash algorithm determines the server to which a request should be sent based on the source and/or destination IP address. This method maintains session persistence, ensuring that requests from a specific user are directed to the same server.


Pros:
  • Maintains session persistence, which can be useful for applications requiring a continuous connection with a specific server.
  • Can distribute load evenly when using a well-designed hash function.

Cons:
  • May not balance load effectively when dealing with a small number of clients with many requests.
  • No consideration for server health, response time, or varying capacities.

Example: An online multiplayer game uses the IP Hash algorithm to ensure that all requests from a specific player are directed to the same server, maintaining a continuous connection for a smooth gaming experience.
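A minimal sketch of IP Hash using only the source address. The choice of SHA-256 and the `pick_by_ip` function name are assumptions of this sketch; any stable hash function would serve the same purpose.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to a server using a stable hash of the address."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
first = pick_by_ip("203.0.113.7", servers)
second = pick_by_ip("203.0.113.7", servers)
print(first == second)  # the same client always lands on the same server
```

Note that with this simple modulo scheme, adding or removing a server changes the mapping for many clients, so persistence holds only while the pool is unchanged.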

6. Least Response Time

The Least Response Time algorithm directs incoming requests to the server with the lowest response time and the fewest active connections. This method helps to optimize the user experience by prioritizing faster-performing servers.


Pros:
  • Accounts for server response times, improving user experience.
  • Considers both active connections and response times, providing effective load balancing.

Cons:
  • Requires monitoring and tracking server response times and active connections, adding complexity.
  • May not factor in server health or varying capacities.

Example: A video streaming service uses the Least Response Time algorithm to direct users to the server with the fastest response time, ensuring that videos start quickly and buffering is minimized.
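One way to sketch Least Response Time is to compare servers by average response time first and use active connections as a tie-breaker. How the two metrics are combined varies between products, so this ordering is an assumption, as are the sample figures.

```python
def pick_least_response_time(stats):
    """Pick the fastest server, breaking ties by fewest active connections.

    stats maps a server name to (avg_response_ms, active_connections).
    """
    return min(stats, key=lambda server: stats[server])


stats = {
    "s1": (120.0, 4),
    "s2": (45.0, 9),
    "s3": (45.0, 2),
}
fastest = pick_least_response_time(stats)
print(fastest)  # s3
```

Both s2 and s3 respond in 45 ms on average, so the smaller connection count decides in favour of s3.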

7. Custom Load

The Custom Load algorithm allows administrators to create their own load balancing algorithm based on specific requirements or conditions. This can include factors such as server health, location, capacity, and more.


Pros:
  • Highly customizable, allowing for tailored load balancing to suit specific use cases.
  • Can consider multiple factors, including server health, response times, and capacity.

Cons:
  • Requires custom development and maintenance, which can be time-consuming and complex.
  • May require extensive testing to ensure optimal performance.

Example: An organization with multiple data centers around the world develops a custom load balancing algorithm that factors in server health, capacity, and geographic location. This ensures that users are directed to the nearest healthy server with sufficient capacity, optimizing user experience and resource utilization.
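The rule in this example (nearest healthy server with sufficient capacity) could be sketched as a filter followed by a nearest-first choice. The `Candidate` class, the 20% free-capacity threshold, and the sample locations and distances are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    healthy: bool
    distance_km: float    # assumed distance from the user
    free_capacity: float  # fraction of capacity still free, 0.0 to 1.0


def custom_pick(candidates, min_free_capacity=0.2):
    """Nearest server that is healthy and has enough spare capacity."""
    eligible = [
        c for c in candidates
        if c.healthy and c.free_capacity >= min_free_capacity
    ]
    if not eligible:
        raise RuntimeError("no eligible server")
    return min(eligible, key=lambda c: c.distance_km)


candidates = [
    Candidate("frankfurt", True, 300, 0.05),  # nearest, but almost full
    Candidate("london", False, 500, 0.70),    # unhealthy
    Candidate("paris", True, 450, 0.50),
    Candidate("virginia", True, 6000, 0.60),
]
picked = custom_pick(candidates).name
print(picked)  # paris
```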

8. Random

The Random algorithm directs incoming requests to a randomly selected server from the available pool. This method can be useful when all servers have similar capacities and no session persistence is required.


Pros:
  • Simple to implement and understand.
  • Can provide effective load distribution when servers have similar capacities.

Cons:
  • No consideration for server health, response times, or varying capacities.
  • May not be suitable for applications requiring session persistence.
  • Security systems that rely on anomaly detection or rate limiting (e.g., to mitigate DDoS attacks) might find it slightly harder to identify malicious patterns, because the unpredictable distribution can dilute the visibility of an attack across servers.

Example: A static content delivery network uses the Random algorithm to distribute requests for images, JavaScript files, and CSS stylesheets among multiple servers. This ensures an even distribution of load and reduces the chances of overloading any single server.
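A minimal sketch of the Random algorithm using Python's standard `random` module. The seeded generator is used only to make the demonstration repeatable, and the server names and request count are made up.

```python
import random
from collections import Counter


def pick_random(servers, rng=random):
    """Pick any server from the pool with equal probability."""
    return rng.choice(servers)


servers = ["s1", "s2", "s3"]
rng = random.Random(42)  # seeded only so the demo is repeatable
counts = Counter(pick_random(servers, rng) for _ in range(3000))
print(counts)  # each server receives roughly a third of the requests
```

The distribution is only approximately even, and it evens out as the number of requests grows.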

9. Least Bandwidth

The Least Bandwidth algorithm directs incoming requests to the server currently utilizing the least amount of bandwidth. This approach helps to ensure that servers are not overwhelmed by network traffic.


Pros:
  • Considers network bandwidth usage, which can be helpful in managing network resources.
  • Can provide effective load balancing when servers have varying bandwidth capacities.

Cons:
  • Requires monitoring and tracking server bandwidth usage, adding complexity.
  • May not factor in server health, response times, or active connections.

Example: A file hosting service uses the Least Bandwidth algorithm to direct users to the server with the lowest bandwidth usage, ensuring that servers with high traffic are not overwhelmed and that file downloads are fast and reliable.
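A minimal sketch of Least Bandwidth, assuming the load balancer already has a current throughput measurement for each server. The `pick_least_bandwidth` function and the Mbps figures are hypothetical.

```python
def pick_least_bandwidth(mbps_in_use):
    """Pick the server currently serving the least traffic (in Mbps)."""
    return min(mbps_in_use, key=mbps_in_use.get)


usage = {"s1": 740.0, "s2": 120.5, "s3": 388.0}
target = pick_least_bandwidth(usage)
print(target)  # s2
```

The selection itself is trivial; the real cost of this algorithm lies in collecting accurate, up-to-date bandwidth measurements.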

Different algorithms have different strengths and weaknesses, so the right choice depends on the characteristics of the workload and the goals of the load-balancing strategy. In practice, load balancers often provide configuration options for selecting the algorithm.

That’s all about load balancers in system design. If you have any queries or feedback, please write to us at contact@waytoeasylearn.com. Enjoy learning, enjoy system design!
