The term load balancing can refer to many different things, such as routers, TCP, UDP or HTTP requests. Today we are going to discuss the last of these. Load balancing for HTTP requests is the process of forwarding traffic as evenly as possible among the instances of your server pool. Keep in mind that your machines may vary in size, load or even geographic region, so that’s not an easy task!
So, what are the key benefits of a load balancer?
Let’s say that your data centers are spread across different areas and one of them gets flooded. Obviously you don’t want some of your requests to be served and others to return a 5XX error. In this case the load balancer comes to the rescue, as it directs traffic only to running instances.
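To make the idea concrete, here is a minimal sketch of how a balancer might skip instances that are down. The `Server` class, its `healthy` field and the server names are made up for illustration; a real load balancer would update that flag from its own health checks.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool  # assumed field: result of the most recent health check


def routable(pool):
    """Return only the servers that passed their last health check."""
    return [server for server in pool if server.healthy]


pool = [
    Server("dc-east-1", healthy=True),
    Server("dc-west-1", healthy=False),  # the flooded data center
    Server("dc-east-2", healthy=True),
]

print([server.name for server in routable(pool)])
```

Whatever algorithm picks the final server then only sees the instances that are still running.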
Provides an abstraction layer
When a client interacts with your application, the internals of your hardware structure are hidden. Your disaster-recovery servers, cloud infrastructure etc. are not accessible to the outside world, which gives you the opportunity to add or remove hardware at will without anyone noticing. What is accessible, though, is a public IP through which everyone interacts with your system. Simply put, the load balancer provides an abstraction layer over your internals.
Load balancers also provide better performance: the incoming traffic is divided across different servers, so the load on each one is reduced. You can have hardware load balancers, software ones like nginx, virtual ones, or even do DNS load balancing.
Now, there are many ways to determine which server node is the most appropriate to route traffic to at any given instant. First you have to determine the state of each node. This is done with so-called health checks, which provide information about latency, load and other important metrics. Common algorithms used by HTTP load balancers are:
Round robin, which hands requests to each server in turn, is a good option if all of your servers are the same size. It won’t work well with different sizes, as lower-spec machines will receive the same amount of traffic as bigger ones. There is a variation of this algorithm where a weight is put on each machine. For example, server A will have weight 3 and server B will have weight 1 because it’s much smaller. Proportionally, if your load balancer receives 40 requests, about 30 will go to server A and only 10 to server B. This is called weighted round robin.
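The 3-to-1 example above can be sketched in a few lines. This is a deliberately naive version that repeats each server `weight` times in a fixed cycle; the function name is made up for illustration, and production balancers usually interleave the picks more smoothly so that one server does not receive its share in a burst.

```python
from collections import Counter
from itertools import cycle


def weighted_round_robin(weights):
    """Yield server names forever, each appearing `weight` times per cycle."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


picker = weighted_round_robin({"A": 3, "B": 1})

# Route 40 requests and count where they end up.
counts = Counter(next(picker) for _ in range(40))
print(counts)
```

With weights 3 and 1, every four requests make a full cycle, so the 40 requests split into 30 for server A and 10 for server B.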
With least connections, traffic is routed to the server with the fewest active connections. As with the previous algorithm, if your servers vary in size you can use a weighted least connections algorithm.
With random selection, requests are distributed randomly among the server cluster. As you can imagine, this can put heavy stress on some servers, since the load they have at any given instant is not taken into account.
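Here is a small sketch of both variants, assuming the balancer keeps a running count of open connections per server. The connection counts and weights below are made-up numbers, and dividing the count by the weight is just one common way to do the weighting.

```python
def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_least_connections(active, weights):
    """Pick the server with the fewest connections per unit of weight."""
    return min(active, key=lambda name: active[name] / weights[name])


active = {"A": 12, "B": 5, "C": 9}   # hypothetical open connections
weights = {"A": 3, "B": 1, "C": 1}   # A is roughly three times bigger

print(least_connections(active))
print(weighted_least_connections(active, weights))
```

In the unweighted case server B wins with 5 connections. Once the weights are applied, server A has 12 / 3 = 4 connections per unit of capacity, which is lower than B's 5, so it gets the next request.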
Unique source hash
Hashing is applied to the client’s IP address, and the generated key is mapped to a particular server. This way we can make sure that a specific client will always be served by the same server. The hash can also be calculated from a combination of source IP and port, and the result is stored in a cache.
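A minimal sketch of the idea, assuming the key is just the client IP and the mapping is a simple modulo over the server list. The function name and the addresses are illustrative only. Note that it uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would not give a stable mapping across restarts.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server, deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into an integer index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["A", "B", "C"]
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
print(first, second)
```

Both calls return the same server for the same IP. One caveat of the modulo approach is that adding or removing a server changes the mapping for many clients.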
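With least response time, the load balancer checks the response time of each server. If it is low, the server is under limited stress and can accept more traffic.
With least bandwidth, traffic is directed to the node that is currently serving the least amount of traffic, measured in bandwidth.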
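As a rough sketch, assume the balancer keeps a few recent response-time samples per server (in seconds) and picks the one with the lowest average. The sample values below are invented, and a real balancer might combine this with the connection count or use a moving average instead.

```python
from statistics import mean


def least_response_time(samples):
    """Pick the server whose recent response times have the lowest mean."""
    return min(samples, key=lambda name: mean(samples[name]))


samples = {  # hypothetical measurements, in seconds
    "A": [0.120, 0.135, 0.110],
    "B": [0.045, 0.060, 0.050],
    "C": [0.300, 0.280, 0.310],
}

print(least_response_time(samples))
```

Server B answers fastest on average here, so it is picked for the next request.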
Now, in a stateless environment there is no problem with how client requests are served. If client A makes a request, it doesn’t matter which server provides the response. But in a stateful environment things get a bit tricky. Handling this case is called sticky sessions, or session persistence.
Let’s say you have a stateful request flow: a simple registration form with three pages. Before proceeding to page 2 you must complete page 1, and before proceeding to page 3 you must complete page 2. Now, there are two ways to accomplish session persistence. First, by mapping the relationship between source IP address and server, all future requests can be routed appropriately. The second option is to store a cookie on the client’s machine and check it on every request. In both cases, of course, you must also determine the life span of your sessions.
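Both options boil down to a lookup table from some key (the source IP, or the value of a cookie) to a server, with an expiry. Below is a minimal sketch under those assumptions. The `StickyTable` class, the session key and the 600-second life span are all made up for illustration, and the clock is injectable only so the example can simulate time passing.

```python
import time


class StickyTable:
    """Remember which server a session key was sent to, for a limited time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (server, expires_at)

    def route(self, key, choose):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            server = entry[0]      # session still alive: stick to its server
        else:
            server = choose()      # new or expired session: pick a server
        # Refresh the expiry on every request (sliding life span).
        self.entries[key] = (server, now + self.ttl)
        return server


now = [0.0]                        # fake clock for the example
table = StickyTable(ttl_seconds=600, clock=lambda: now[0])
next_server = iter(["A", "B"])     # stand-in for any balancing algorithm


def choose():
    return next(next_server)


page1 = table.route("session-123", choose)
now[0] = 60.0
page2 = table.route("session-123", choose)
now[0] = 700.0                     # more than 600 s after the last request
after_expiry = table.route("session-123", choose)
print(page1, page2, after_expiry)
```

Pages 1 and 2 of the form land on the same server because the entry is still alive. Once the session outlives its life span, the key is treated as new and the balancing algorithm picks again.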