Load Balancing Algorithms

In this tutorial, we discuss load balancing algorithms. A load balancing algorithm is the method a load balancer uses to distribute incoming traffic and requests among multiple servers or resources. Its primary purpose is to ensure efficient utilization of available resources, improve overall system performance and scalability, and maintain high availability and reliability.

Load balancing algorithms help to prevent any single server or resource from becoming overwhelmed, which could lead to performance degradation or failure. By distributing the workload, load balancing algorithms can optimize response times, maximize throughput, and enhance user experience. These algorithms can consider factors such as server capacity, active connections, response times, and server health, among others, to make informed decisions on how to best distribute incoming requests.

The following are the most well-known load balancing algorithms:

1. Round Robin

Round Robin is one of the best-known load balancing algorithms. It distributes incoming requests to servers in cyclic order: it assigns a request to the first server, then the second, the third, and so on, and after reaching the last server it starts again at the first.

For example, with 3 servers available, the load balancer routes request 1 to server 1, request 2 to server 2, request 3 to server 3, request 4 back to server 1, request 5 to server 2, and so on.
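
The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not a production balancer; the class name `RoundRobinBalancer` and the server names are assumptions made for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed cyclic order (hypothetical helper)."""

    def __init__(self, servers):
        # cycle() repeats the server list endlessly in its original order.
        self._rotation = cycle(servers)

    def next_server(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["server1", "server2", "server3"])

# Requests 1 to 5 go to servers 1, 2, 3, 1, 2.
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)
```

Note that the sketch keeps no information about server load or health, which is why the algorithm suits pools of similar, healthy servers.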


Pros:
  • Distributes requests equally among the servers, since each server gets a turn in a fixed order.
  • Easy to implement and understand.
  • Works well when servers have similar capacities.

Cons:
  • May not perform optimally when servers have different capacities or varying workloads.
  • No consideration for server health or response time.
  • The distribution pattern is predictable, so an attacker who observes traffic could predict which server will handle a given request and target weaknesses in that server.

2. Least Connections

Least Connections is another simple and well-known load balancing algorithm. It sends each incoming request to the server with the lowest number of active connections, which accounts for the varying workloads of servers.

For example, suppose there are 3 servers: server 1 has 10 active connections, server 2 has 100, and server 3 has 1000. Server 1 has the fewest active connections, so the load balancer routes new requests to server 1 until its connection count is no longer the lowest, sparing the more heavily loaded servers.

Example: An email service receives requests from users. The load balancer directs new requests to the server with the fewest active connections, ensuring that servers with heavier workloads are not overwhelmed.
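
A minimal Python sketch of this selection rule follows. It assumes the load balancer already tracks the active connection count per server in a dictionary; the function name `pick_least_connections` and the counts are illustrative only.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    """
    return min(active, key=active.get)


# Hypothetical connection counts for three servers.
active = {"server1": 10, "server2": 100, "server3": 1000}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request opens a connection on the chosen server
print(chosen, active[chosen])
```

A real balancer would also decrement the count when a connection closes, which is the bookkeeping cost this algorithm adds over Round Robin.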


Pros:
  • Adapts to differing server capacities and workloads.
  • Balances load more effectively when requests take a variable amount of time to process.

Cons:
  • Requires tracking the number of active connections for each server, which can increase complexity.
  • May not factor in server response time or health.

3. Weighted Round Robin

The Weighted Round Robin algorithm is an extension of the Round Robin algorithm that assigns different weights to servers based on their capacities. The load balancer distributes requests in proportion to these weights.

Example: A content delivery network has three servers with varying capacities. The load balancer assigns weights of 3, 2, and 1 to these servers, respectively, distributing requests in a 3:2:1 ratio.
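
One simple way to get a 3:2:1 ratio is to repeat each server in the rotation as many times as its weight. The sketch below takes that approach; the function name and server names are assumptions, and some real balancers interleave the picks more smoothly instead of sending consecutive requests to the same server.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """Return an endless rotation where each server appears `weight` times.

    `weights` maps a server name to a positive integer weight.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


# Weights 3, 2, and 1, as in the example above.
rotation = weighted_round_robin({"server1": 3, "server2": 2, "server3": 1})

# One full cycle of six requests.
first_six = [next(rotation) for _ in range(6)]
print(first_six)
```

Over every six requests, server1 receives three, server2 two, and server3 one.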


Pros:
  • Accounts for different server capacities, balancing load more effectively.
  • Simple to understand and implement.

Cons:
  • Weights must be assigned and maintained manually.
  • No consideration for server health or response time.

4. Weighted Least Connections

The Weighted Least Connections algorithm is a combination of the Least Connections and Weighted Round Robin algorithms. It routes incoming requests to the server that has the lowest ratio of active connections to allocated weight. This ensures an efficient distribution of load in scenarios where there are multiple servers with varying capacities and assigned weights.

Example: An e-commerce website uses 3 servers with different capacities and assigned weights. The load balancer directs new requests to the server with the lowest ratio of active connections to weight, ensuring an efficient distribution of load.
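
The ratio rule can be illustrated with a short Python sketch. The connection counts and weights below are made-up values, and `pick_weighted_least_connections` is a hypothetical helper name.

```python
def pick_weighted_least_connections(servers):
    """Return the server with the lowest connections-to-weight ratio.

    `servers` maps a server name to an (active_connections, weight) pair,
    where weight is a positive number.
    """
    return min(servers, key=lambda name: servers[name][0] / servers[name][1])


# Hypothetical state: ratios are 30/3 = 10, 12/2 = 6, and 8/1 = 8.
servers = {
    "server1": (30, 3),
    "server2": (12, 2),
    "server3": (8, 1),
}

chosen = pick_weighted_least_connections(servers)
print(chosen)
```

Here server3 has the fewest connections in absolute terms, but server2 is chosen because it is the least loaded relative to its capacity.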


Pros:
  • Balances load effectively, accounting for both server capacities and active connections.
  • Adapts to varying server workloads and capacities.

Cons:
  • Requires tracking active connections and maintaining server weights.
  • May not factor in server response time or health.

5. IP Hash

The IP Hash algorithm chooses the server for a request by hashing the source and/or destination IP address of each incoming packet. Because the same address always produces the same hash, requests from a specific user are directed to the same server, which maintains session persistence as long as the client's address and the server pool stay the same.

Here’s how the IP hash algorithm typically works:

  1. Extraction: When a packet arrives at the load balancer or router, the source IP address of the packet is extracted.
  2. Hashing: The source IP address is then used as input to a hash function, which generates a hash value. This hash value is typically a numeric value.
  3. Mapping: The hash value is mapped to one of the available servers or resources in the pool. This mapping can be done in various ways, such as by taking the modulus of the hash value with the total number of servers to determine the server index.
  4. Routing: The packet is then forwarded to the server or resource that corresponds to the mapped index.

Example: An online multiplayer game uses the IP Hash algorithm to ensure that all requests from a specific player are directed to the same server, maintaining a continuous connection for a smooth gaming experience.
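
The hashing, mapping, and routing steps listed above can be sketched as follows. This is an illustrative sketch that assumes SHA-256 as the hash function and a simple modulus mapping; the function name `pick_by_ip_hash`, the server names, and the IP address are assumptions, and real load balancers may use different hash functions.

```python
import hashlib
import ipaddress


def pick_by_ip_hash(source_ip, servers):
    """Map a source IP address to a server via hash modulo pool size."""
    # Normalize the address to its packed byte form (works for IPv4 and IPv6).
    packed = ipaddress.ip_address(source_ip).packed
    # A stable hash is used here because Python's built-in hash() of bytes
    # varies between interpreter runs.
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server1", "server2", "server3"]

first = pick_by_ip_hash("203.0.113.7", servers)
second = pick_by_ip_hash("203.0.113.7", servers)
print(first == second)  # the same client always maps to the same server
```

With the modulus mapping, adding or removing a server changes `len(servers)` and therefore remaps many clients, which is one reason persistence depends on a stable pool.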


Pros:
  • Maintains session persistence, which can be useful for applications requiring a continuous connection with a specific server.
  • Can distribute load evenly when using a well-designed hash function.

Cons:
  • May not balance load effectively when a small number of clients send many requests.
  • No consideration for server health, response time, or varying capacities.

6. Least Response Time

The Least Response Time (LRT) algorithm distributes incoming requests based on how quickly each server responds. It directs each request to the server with the lowest response time and the fewest active connections, which optimizes the user experience by prioritizing faster-performing servers.

Here’s how the Least Response Time algorithm typically works:

  1. Monitoring Response Times: Load balancers continuously monitor the response times of each server in the pool. This monitoring can be done using various techniques such as sending periodic health checks or measuring the time taken to respond to requests.
  2. Selecting the Server: When a new request arrives at the load balancer, it selects the server with the lowest current response time from the pool of available servers. This selection can be based on real-time response time measurements or on historical data, depending on the implementation.
  3. Routing the Request: The incoming request is then forwarded to the selected server for processing.
  4. Updating Response Time Metrics: After the server processes the request, the measured response time is added to that server's metrics, which the load balancer uses for future load balancing decisions.

Example: A video streaming service uses the Least Response Time algorithm to direct users to the server with the fastest response time, so that videos start quickly and buffering is minimized.
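
The monitor, select, route, and update loop described above can be sketched in Python. Several details here are assumptions made for illustration: response times are smoothed with an exponential moving average (`alpha=0.2`), servers are ranked by average response time with active connections as the tie-breaker, and unmeasured servers are treated as having a response time of zero so they get tried first.

```python
class LeastResponseTimeBalancer:
    """Illustrative least-response-time selection (hypothetical class)."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha                         # smoothing factor (assumed)
        self.avg_ms = {s: None for s in servers}   # None means not yet measured
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Rank by average response time, then by active connections.
        server = min(
            self.avg_ms,
            key=lambda s: (self.avg_ms[s] or 0.0, self.active[s]),
        )
        self.active[server] += 1
        return server

    def record(self, server, elapsed_ms):
        # Called when a response completes; updates the moving average.
        self.active[server] -= 1
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = elapsed_ms
        else:
            self.avg_ms[server] = (1 - self.alpha) * prev + self.alpha * elapsed_ms


lb = LeastResponseTimeBalancer(["a", "b"])
first = lb.pick()        # both unmeasured, so the first server is tried
lb.record(first, 120.0)
second = lb.pick()       # "b" is still unmeasured, so it is tried next
lb.record(second, 40.0)
third = lb.pick()        # "b" now has the lower average (40 ms vs 120 ms)
print(first, second, third)
```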


Pros:
  • Accounts for server response times, improving user experience.
  • Considers both active connections and response times, providing effective load balancing.

Cons:
  • Requires monitoring and tracking server response times and active connections, adding complexity.
  • May not factor in server health or varying capacities.

7. Random

The Random algorithm directs incoming requests to a randomly selected server from the available pool. This method can be useful when all servers have similar capacities and no session persistence is required.

Example: A static content delivery network uses the Random algorithm to distribute requests for images, JavaScript files, and CSS style sheets among multiple servers. Over a large number of requests, this yields a roughly even distribution of load and reduces the chances of overloading any single server.
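
The sketch below shows random selection and checks how evenly it spreads a large number of requests. The function name `pick_random`, the fixed seed, and the request count are assumptions chosen to make the example repeatable.

```python
import random
from collections import Counter


def pick_random(servers, rng=random):
    """Return a uniformly random server from the pool."""
    return rng.choice(servers)


servers = ["server1", "server2", "server3"]
rng = random.Random(42)  # seeded only so the example is repeatable

# Simulate 30,000 requests and count how many each server receives.
counts = Counter(pick_random(servers, rng) for _ in range(30_000))
print(counts)
```

Each server ends up with close to 10,000 requests, but any short run of requests can be noticeably uneven, unlike Round Robin.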


Pros:
  • Simple to implement and understand.
  • Can provide effective load distribution when servers have similar capacities.

Cons:
  • No consideration for server health, response times, or varying capacities.
  • May not be suitable for applications requiring session persistence.
  • Because request distribution is unpredictable, security systems that rely on anomaly detection or rate limiting (e.g., to mitigate DDoS attacks) may find it slightly harder to identify malicious patterns, which can dilute the visibility of attack patterns.

8. Least Bandwidth

The Least Bandwidth algorithm distributes incoming network traffic based on how much bandwidth each server is currently using. It directs each request to the server with the least utilized bandwidth, which optimizes the use of network resources, minimizes congestion, and helps ensure that no server is overwhelmed by network traffic.

Example: A file hosting service uses the Least Bandwidth algorithm to direct users to the server with the lowest bandwidth usage, ensuring that servers with high traffic are not overwhelmed and that file downloads are fast and reliable.
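
A minimal sketch of this idea follows. It assumes bandwidth usage is approximated by the number of bytes each server has transferred in the current measurement window; the class and method names are hypothetical, and real load balancers typically measure throughput over a sliding time interval.

```python
class LeastBandwidthBalancer:
    """Picks the server that moved the fewest bytes in the current window."""

    def __init__(self, servers):
        self.bytes_in_window = {s: 0 for s in servers}

    def record_transfer(self, server, num_bytes):
        # Called whenever a server sends or receives data.
        self.bytes_in_window[server] += num_bytes

    def pick(self):
        return min(self.bytes_in_window, key=self.bytes_in_window.get)

    def reset_window(self):
        # Start a new measurement window, e.g. once per second.
        for server in self.bytes_in_window:
            self.bytes_in_window[server] = 0


lb = LeastBandwidthBalancer(["server1", "server2", "server3"])
lb.record_transfer("server1", 5_000_000)
lb.record_transfer("server2", 1_000_000)
lb.record_transfer("server3", 3_000_000)

chosen = lb.pick()
print(chosen)  # server2 has transferred the least data in this window
```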


Pros:
  • Considers network bandwidth usage, which can be helpful in managing network resources.
  • Can provide effective load balancing when servers have varying bandwidth capacities.

Cons:
  • Requires monitoring and tracking server bandwidth usage, adding complexity.
  • May not factor in server health, response times, or active connections.

9. Custom Load

A “Custom Load” algorithm refers to a load balancing method that is specifically designed or customized to meet the unique requirements or constraints of a particular system, application, or network environment. Unlike standard load balancing algorithms like Round Robin or Least Connections, a custom load algorithm is tailored to address specific performance, scalability, or resource utilization goals.


Pros:
  • Highly customizable, allowing for tailored load balancing to suit specific use cases.
  • Can consider multiple factors, including server health, response times, and capacity.

Cons:
  • Requires custom development and maintenance, which can be time-consuming and complex.
  • May require extensive testing to ensure optimal performance.

Example: An organization with multiple data centers around the world develops a custom load balancing algorithm that factors in server health, capacity, and geographic location. This ensures that users are directed to the nearest healthy server with sufficient capacity, optimizing user experience and resource utilization.
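
As one possible illustration of such a policy, the sketch below filters out unhealthy or full servers and then picks the nearest remaining one. Everything in it is an assumption made for the example: the `Server` fields, the use of a precomputed `distance_km`, and the rule of returning `None` when no server qualifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    name: str
    healthy: bool
    active: int          # current active connections
    capacity: int        # maximum connections the server should hold
    distance_km: float   # distance from the user (assumed to be known)


def pick_custom(servers) -> Optional[Server]:
    """Return the nearest healthy server with spare capacity, if any."""
    candidates = [s for s in servers if s.healthy and s.active < s.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.distance_km)


servers = [
    Server("nearby-unhealthy", healthy=False, active=10, capacity=100, distance_km=50),
    Server("nearby-full", healthy=True, active=100, capacity=100, distance_km=120),
    Server("regional", healthy=True, active=40, capacity=100, distance_km=800),
    Server("remote", healthy=True, active=5, capacity=100, distance_km=6000),
]

chosen = pick_custom(servers)
print(chosen.name)  # "regional": the closest server that is healthy and not full
```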

10. URL Hash

URL hash load balancing, also known as content-based or request-based load balancing, distributes incoming requests based on attributes of the request itself. Instead of spreading requests evenly across servers, it routes each request to a server chosen by a hash value generated from those attributes, typically the URL.

The URL hash load balancing algorithm is similar to source IP hashing, except that the hash created is based on the URL in the client request. This ensures that client requests to a particular URL are always sent to the same back-end server.
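
A short sketch of URL hashing follows. It assumes that only the path component of the URL is hashed (so query strings do not affect routing), that SHA-256 is the hash function, and that the hash is mapped with a modulus; the function name and URLs are illustrative.

```python
import hashlib
from urllib.parse import urlsplit


def pick_by_url_hash(url, servers):
    """Map a request URL to a server by hashing its path."""
    # Hashing only the path is an assumption of this sketch.
    path = urlsplit(url).path or "/"
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["cache1", "cache2", "cache3"]

a = pick_by_url_hash("https://example.com/images/logo.png", servers)
b = pick_by_url_hash("https://example.com/images/logo.png?v=2", servers)
print(a == b)  # both requests for the same path reach the same server
```

Routing all requests for a given URL to one back-end server can improve cache hit rates, since each server only needs to cache its own share of the content.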

That’s all about load balancing algorithms in system design. The choice of algorithm depends on factors such as application requirements, system architecture, scalability goals, and performance objectives. Complex environments often combine multiple algorithms or use a customized one to achieve optimal load distribution and resource utilization.

If you have any queries or feedback, please write to us at contact@waytoeasylearn.com. Enjoy learning, and enjoy system design!
