Abstract
A decade or so ago, setting up an online business called for upfront investments in hardware and
software, along with recurring expenses for maintenance, upgrades, and so on. With the advent of
cloud computing, however, this infrastructure can now be “rented” from a shared pool of remotely
hosted resources. This is much cheaper than investing in one’s own hardware and software, and it
brings many other advantages such as scalability, reliability, and easier maintenance. Today, any new
online business can launch its application with minimal cost, simply by renting servers in the cloud.
Load balancing means optimizing the distribution of workloads across multiple computing resources,
so as to maximize the efficiency of each resource. Load balancing can be done at different layers of the
stack, such as the application layer, the network layer, the CPU, and the disk; in this thesis, we focus
on improving it at the application layer and the network layer.
We first look at an example of load balancing at the application layer. Most applications on the
cloud are built to be horizontally scalable, i.e. adding more servers enables the application to handle
more load. However, this also increases costs for the business, since more servers have to be rented. It
is therefore important to distribute the load among servers smartly, so that fewer servers can handle a
given load. This requires good data distribution strategies and load balancing algorithms that adapt to
the requests and to the kind of data being accessed. Today, most companies working with big data use
NoSQL databases to store huge volumes of data. They prefer NoSQL over relational databases because
NoSQL databases are horizontally scalable and offer much better read-write performance due to the
lack of a rigid schema. Distributed key-value stores are a type of NoSQL database used heavily in most
applications to store billions of key-value pairs. Key-value stores must sustain very high throughput,
as the applications using them perform millions of queries and updates per second. Good load balancing
algorithms help achieve better throughput with the same number of servers. We analyze and compare
the trade-offs of different load distribution schemes for consistent hashing in distributed key-value stores.
We construct different traffic patterns, including an adversarial pattern, to test the load distribution,
build a simulator for each load distribution scheme, and record the load on each node. We show that
increasing the number of consistent hash rings does not change the load distribution characteristics
under simple traffic, but under adversarial traffic it improves the load distribution statistics by around 15%.
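To make the setting concrete, the sketch below shows a minimal consistent-hash ring with virtual
nodes in Python. It is an illustrative assumption of ours rather than the exact schemes compared in
the thesis; in particular, the names HashRing, route, vnodes, and salt, and the multi-ring rule of
sending a key to the least-loaded candidate across rings, are hypothetical.

    # Minimal consistent-hash ring with virtual nodes (illustrative only).
    import bisect
    import hashlib

    def _hash(value: str) -> int:
        # Map an arbitrary string to a point on the ring.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, vnodes=100, salt=""):
            # Place each physical node at `vnodes` points on the ring; a
            # different `salt` yields an independent ring over the same nodes.
            self._points = sorted(
                (_hash(f"{salt}:{node}:{i}"), node)
                for node in nodes
                for i in range(vnodes)
            )
            self._hashes = [h for h, _ in self._points]

        def node_for(self, key: str) -> str:
            # Walk clockwise to the first point at or after hash(key).
            idx = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
            return self._points[idx][1]

    def route(key, rings, load):
        # Hypothetical multi-ring rule: consider the candidate owner on each
        # ring and send the key to the least-loaded of them.
        candidates = {ring.node_for(key) for ring in rings}
        target = min(candidates, key=lambda n: load[n])
        load[target] += 1
        return target

A simulator in this spirit only needs to feed a traffic pattern (for example, a skewed or adversarial
key sequence) through route and record the resulting per-node load.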
Secondly, we look at an example of load balancing at the network layer. Nowadays, large datacenters
are moving from traditional networks to software-defined networks. Software-Defined Networking (SDN)
lets network administrators write their own software that defines the rules used to create flow routes.
Since the network administrator knows the topology and other specifications of the network, packets
can be made to take different routes based on varying requirements, simply by programming the
OpenFlow controller. A datacenter hosts different kinds of applications with different requirements:
some require low latency and are not sensitive to bandwidth, others require low packet loss, and so on.
Thus, each application has its own requirements profile. For example, a gaming server hosted on the
cloud would require extremely low latency but would not care much about bandwidth, whereas video
streaming services are sensitive to bandwidth and packet loss. In existing networks, it is very hard to
control the routes that packets take. We create an OpenFlow controller that uses the requirements
profile of each application on the network and tries to route traffic so as to optimize the network
parameters that the application is most sensitive to. The controller adapts to changing network
conditions and intelligently updates existing routes to accommodate network disruptions. We test it
in different scenarios with different application requirements and study the improvements the new
controller gives over the existing one. In our simulations, a 10 MB file transfer on an unstable network
took 168 seconds with a simple controller and 88 seconds with our controller. We also simulate video
streaming on an unstable network and show that our controller adapts to the fluctuations and
proactively re-routes traffic, keeping the stream free of losses.
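As an illustration of the idea, the sketch below chooses a route by minimizing a cost that weights
per-link latency, loss, and available bandwidth according to an application's profile. The profile
format, the weights, and the link_cost/best_path helpers are assumptions made for illustration,
not the controller's actual implementation.

    # Hypothetical requirement-aware path selection: each application supplies
    # weights over link metrics, and we pick the path minimizing the weighted cost.
    import heapq

    def link_cost(metrics, profile):
        # Lower latency/loss and more free bandwidth should lower the cost,
        # hence the inverse bandwidth term.
        return (profile["latency"] * metrics["latency"]
                + profile["loss"] * metrics["loss"]
                + profile["bw"] / max(metrics["bw"], 1e-6))

    def best_path(graph, src, dst, profile):
        # graph[u] is a list of (v, metrics) edges, with
        # metrics = {"latency": ms, "loss": percent, "bw": free Mbps}.
        dist, prev = {src: 0.0}, {}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, metrics in graph.get(u, []):
                nd = d + link_cost(metrics, profile)
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        if dst not in dist:
            return None  # no route found
        path, node = [dst], dst
        while node != src:
            node = prev[node]
            path.append(node)
        return path[::-1]

    # Example profile for a latency-sensitive game server.
    game_profile = {"latency": 1.0, "loss": 0.2, "bw": 0.0}

One possible way to react to disruptions, in the spirit of the adaptive behaviour described above, is
to re-run this computation with refreshed link metrics and push the new route to the switches.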
We also look at how the migration of virtual machines can be optimized for load balancing by leveraging
the knowledge of migration, on t