Scalability in Web Applications

This page introduces scalability and its importance in web applications. It also details good programming practices and aspects to be considered while increasing the scalability of a web application more scalable. Finally, the page briefly describes how to achieve scalability in a Rails application.

Introduction

Scalability is the ability of a system, network or process to handle an increase in load in a capable manner, or to expand to accommodate this increase. In a typical scenario, it can be understood as the ability of a system to perform extra work by the addition of resources (mostly hardware). It is generally difficult to define scalability precisely and, in any particular case, it is necessary to define the specific requirements for scalability on those dimensions that are deemed important. It is a highly significant issue in electronics systems, databases, routers, and networking. A system whose performance improves after adding hardware, proportional to the capacity added, is said to be a scalable system. A very good example of a scalable system would be the Domain Name System. The distributed nature of the Domain Name System allows it to scale well even when new systems are added all over the world daily.

Scalability in Web Applications

The scalability of a web application is its ability to handle an increase in the number of users or site traffic in a timely manner. Ideally, the experience of using the web application should be the same if the site has a single user or a million users. This kind of scalability is usually achieved with the addition of hardware in either a scaling up (vertical scaling) or a scaling out (horizontal scaling) manner. A typical web application follows a multi-layered architecture. The web application is considered to be scalable if each layer or component in its architecture is scalable. Fig 1 shows a web application which is able to linearly scale out with the addition of resources in its application layer and database layer.<ref>http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html</ref>

Need for scalability in web applications

In this internet age, web applications or web sites are essential for a company’s survival. Most applications are generally developed and tested with an initial small load. As the application gains popularity, the number of users i.e. traffic to the website increases. The server which was able to handle the initial load may not be able to handle the new traffic. This causes the application to become unresponsive and result in a bad user experience which will affect the website in an economic and/or defamatory manner. It is, thus, important to design a web application with scalability in mind i.e. the application should be able to scale with the addition of new hardware without affecting the usability of the web application. Many popular web applications, especially social networking sites like Facebook and Twitter thrive because of their ability to scale for their ever increasing user base. For e.g., Twitter has grown from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to 200,000,000+ monthly users and 500,000,000+ daily tweets in <ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer Michael Jackson died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour <ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it is able to adjust to increases in traffic. Even if the web application is overwhelmed by a sudden spike, it should be able to quickly recover through the addition of more resources to ensure that there is minimum, if not zero, downtime.

Good Programming Practices for Scalability

While some of the practices include following certain approaches during coding, many of the good practices are the outcomes of good architectural design for an application. Some of the good programming practices to support a scalable web application is listed below.

Database Indexes and Query Optimisation<ref>http://mrkirkland.com/prepare-for-web-application-scalability/</ref>

Not just with respect to scalability, but having indexes is a must in databases. Getting the indexes right is going to significantly improve performance. For example, consider a table ORDER which has a column, CustomerID referenced from another table CUSTOMER. ie CustomerID is a foregin key in the table ORDER. If the table does not have an index for CustomerID, then a query to retrieve the orders for a customer would require the entire ORDER table to be scanned to get the required Orders. This would be a serious performance hit and would be terrible when scaled, especially if getting the orders for a customer is a frequent operation. Having an index for CustomerID in ORDER table would prevent having to scan the entire ORDER table. Also, queries should take full advantage of the power of databases. For example, doing a loop to retrieve the rows of a table is a big NO. Instead, all DB operations must be set-based operations as much as possible and should be done in a single query.

Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>

It is good practice to make small isolated tasks and run them in parallel. For example, if the web application has a page to update information and a widget on the same page showing a live news feed, they should be run in parallel as both tasks are independent. Today, most machines have multi-core processors and have support for multi-threading which should be taken advantage of.

Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

It is a common practice to save data in server-side sessions between HTTP requests. This increases the per connection overhead and, hence, does not scale well. A good practice is to save as much data as possible on the client-side in cookies. Trust concerns can be mitigated by encrypting information. Only the required information should be sent to the server, purely on as much as required basis.

Use a modular design

There should be zero coupling between various independent components, such as :

Media server
DB server
Mail server
Front End (UI)
Back End etc.

Combining these logically separate entities into the same module is a recipe for disaster in terms of maintainability, debugging, re-usability, and so on.

Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

If the application receives a lot of data which must be stored but will not be used frequently, it is a good idea to compress and store the data. For example, old emails (older than 3 months) in a mail server. Since the chance of old emails being checked is low, this practice saves a lot of money in terms of disk storage costs, especially when the application supports a huge number of users. Similarly, it also a good practice to compress the data while sending it over the network. The time spent in compressing and decompressing the data is overshadowed by the time saved while sending a smaller payload over the network.

Design around seek

IO operations incur a lot of overhead. A good programming practice is to reduce the number of seeks to read data. This can be achieved by ensuring that logically coherent data is written in sequential rather than random locations. Caching data also helps to reduce the number of seeks required.

Non-programming factors affecting Scalability

In addition to the software aspects, there are certain hardware related considerations that have to be made for a scalable application. This section explains in detail about the hardware related factors.

Vertical Scaling

This is generally the the first as well as most intuitive approach to achieve scalability. Let us consider an analogy to understand this concept. Assume you have a 512MB RAM computer. Initially, when it was brand new and you had very few applications running on it, it was working fine. But once you install many applications and start multitasking your machine becomes slow and you regularly get “Application not responding” errors. The most obvious thing to do is upgrade your machine to have around 2GB of RAM instead of the previous 512 MB of RAM. Now, all applications run as smoothly as before. The same is done to scale web applications. Here, the bottleneck is your server. It can only accommodate a fixed number of requests per second. If the incoming traffic to your web servers increases, users will experience a lag while accessing your application. You go out and buy a web server which offers a higher performance than the previous one. This is called vertical scaling.

The obvious benefit of this approach is that it is so simple that it does not require complex changes to get it up and running. You simply port your code to the new machine and it will run fine.

The drawback of this approach is that eventually you will run out of resources to scale vertically. These resources could be money or even technological barriers. There may not be a machine developed yet which is as powerful as you currently need.

Horizontal Scaling

This approach comes in when you have reached a technical barrier to scale vertically or you do not have enough money to buy that powerful machine you need. This approach also has the same motive, to make sure that all machines are run below the max load so as to behave in an acceptable manner. But, instead of having a more powerful machine, this approach utilizes many smaller machines to achieve the same end result. This approach requires use of a load balancer to do the majority of the heavy lifting.

The advantage of this approach is that it is cost effective. However, it has a major disadvantage in that it complicates the system. Now, setting this up is complex and managing them is a major issue. You need to make sure that you split the traffic evenly among the servers so as to keep everyone running under max load. You now also need to worry about consistency in the system as you still need to make the multiple servers seem like a single logical system to the end user.

Caching

Caching can be a really effective technique if your application frequently deals with similar results/queries. The crux of this method is to cache data in an In-Memory cache. The cached data could be results of queries to the database or a collections of objects in the database. This saves repeated calls to the database, saving computation time as well as avoiding a database access bottleneck. There are a good number of tools like Memcached (http://memcached.org/) and Redis(http://redis.io/) available which help ease the process of utilizing this method.

The great part about this is that you can store any values that you want in these caches. It could be as simple as a leaderboard or as complex as a multi-table query. These caches are stored as Hashes (key-value pairs).

These are tested tools being used by some of the top web application product companies:

Youtube and Twitter use Memcached
GitHub, Pinterest and StackOverflow use Redis

One more advantage of these tools are that they are open source and have an active community support

Load Balancing

Let us assume your application has become an overnight success and you receive thousands of requests/sec. If you have a single server, it will become a bottleneck and the speed of your website will suffer. To overcome this, you add 2 more servers so that you can route incoming traffic into other web servers too. This looks like a really appealing solution until you run into the fundamental problem of how are you going to ensure that the incoming traffic is not unevenly distributed to the servers. In this scenario, the load balancer is an ideal solution. A load balancer is a software/hardware that is used to distribute the incoming traffic evenly among the web servers. The load balancer can utilize many different metrics to achieve this. It can use a simple round robin approach to cycle between the web servers. It could also use a more intelligent approach by checking the approximate load on each server and then using it to allocate web requests to web servers. Regardless of which method it uses, the overall effect is the speedup of applications and better response times.

A benefit of this approach is that you do not need to make public the internal structure of your web application network components such as IP address of each server. The load balancer can act as a single point of entry to the application.

There are a few caveats in this method though. We would like to maintain sessions in a single server i.e. If my first request is to server A, for the rest of the requests, it would be beneficial to use the same web server. For example, if I have been authenticated as a user in Web Server A and if my next request goes to Web Server B, then it may happen that my authentication object is not passed and the user is asked to authenticate himself/herself once again. In a more real-world scenarion, it would be faster to search all related tasks in a web application server than to call everything again in all the web servers.

Database Replication

Fig 7. DB Replication with multiple writes

In the earlier scenario, let us assume that you now have multiple servers but only a single database. Now, you are back in more or less the same situation that you were in earlier. Your rendering has become faster but the reads/writes are still a problem as there is just one database and the web servers have to wait for their turn to access it.

An obvious solution to this is to replicate the database. You now have multiple copies of the databases too, but it comes with its own set of difficulties. You will have to ensure that the data is consistent across the databases. Any writes that you have will be written back to all the databases. This will be a time-consuming as well as complex task. One can argue that instead of writing to all the databases, any web request should write to a single database and let it propagate the changes to the others. This is a feasible solution if your application is Read-heavy. If it is a Write heavy application, we can use another approach known as database partitioning, detailed in the next section.

Fig 8. DB Replication with single writes

Database partitioning

From the previous section, we know that database partitioning will not be useful in case of a write heavy web application. There are a couple of techniques that form a part of this general umbrella of database partitioning techniques. One way is to maintain separate databases for separate kinds of data eg. separate media database, separate user profile database etc.

Another widely used approach is sharding. In this technique, you split the data horizontally. i.e. , instead of separating columns of data alone, you split the data rows / tuples. The main goal behind this step is to convert the large , bulky database into smaller, manageable databases which do not have an inter dependency on each other. For example, you could split a customers database geographically, so, a user who is a part of the Asia region will be stored separately from an American user.

The following are the advantages of this technique:

High availability - Only a part of the application is affected if one of the databases go down.
Faster queries - Since each database has less amount of data, queries take lesser time.
More write bandwidth - Since data is separate, you could write in multiple databases at the same time.

Scalability in Rails applications

Ruby on Rails(RoR), simply referred to as Rails, is a very elegant open source web application framework for creating the full stack by emphasizing on the use of well-known software engineering patterns and paradigms.<ref> http://guides.rubyonrails.org/getting_started.html#what-is-rails-questionmark</ref>. Many high-profile web firms use Ruby on Rails to build agile, scalable web applications. Some of the largest sites running Ruby on Rails include GitHub, Yammer, Scribd, Shopify, Hulu, and Basecamp<ref> http://articles.businessinsider.com/2011-05-11/tech/30035869_1_ruby-rails-custom-software</ref>. As of May 2014, it is estimated that more than 600,000 web sites are running Ruby on Rails.<ref><http://trends.builtwith.com/framework/Ruby-on-Rails></ref>

When Twitter had some significant downtimes in early 2008, a widespread myth started about Rails, ‘Rails doesn’t scale’<ref> http://techcrunch.com/2008/05/01/twitter-said-to-be-abandoning-ruby-on-rails></ref><ref>http://canrailsscale.com/></ref>. However, when Twitter’s lead architect, Blaine Cook held Ruby blameless<ref> http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster</ref>, the developer community came to the general conclusion that it is not the languages or frameworks that is a bottleneck to scaling, but the architecture. Therefore, Rails does scale well, but it is dependent on the condition that it has been architected to scale well.

Even though it applies to web applications in general, the following sections describe practices which help in the development of scalable Rails applications.

Optimizing the application

In spite of the fact that Ruby has a lot of limitations in terms of lack of true support for multithreading, a lot of scaling issues can be solved by optimizing certain ways in which operations are done.

Scaling Reads by Precomputing<ref>http://devblog.moz.com/2011/09/scaling-the-f-out-of-a-rails-app/</ref>

Often, because of the quick nature of development of Rails applications, developers tend to miss anticipating operations which might require a lot of time to compute as data grows in the future. A typical example of such an operation would be an aggregate view which is generated based on usage in the past. At some point, calculating the aggregate view would take a lot of time and would result in a long time to render UI controls like graphs if it is done on-demand. To solve such issues, it is a good idea to precompute the values, frequency depending on the input volume, and store the results. Rendering would eventually be reduced to either displaying the data or doing minimal calculation so that the end-user does not have to wait long for results.

Cache expensive operations

Sometimes it is not possible to pre-compute for every possible set of operation. However, if such operations are going to be used frequently in a short period of time, it is a good idea to cache the first result so that subsequent results can be displayed faster. A typical example<ref>https://speakerd.s3.amazonaws.com/presentations/50044249a555b00002037c2c/Fast.pdf</ref> would look like:

Caching in Rails.rb (Slower first time)

  Benchmark.measure do
    Rails.cache.fetch(“cache_key”) do 
      User.some_expensive_query
    end
  end.real
  => 20.5s

Caching in Rails 2.rb (Super fast after that)

  Benchmark.measure do
    Rails.cache.fetch(“cache_key”) do 
      User.some_expensive_query
    end
  end.real
  => 0.00001s

Make the databases queries faster<ref>http://rakeroutes.com/blog/increase-rails-performance-with-database-indexes/</ref>

Database accesses is a significant factor affecting the throughput of a web application. Some of the common methods to make database IO faster are:

Indexing

DB reads can increase significantly by the addition of database indexes. Indexes are helpful when a search has to be made for an attribute other than id. For example, a query in the Products table,

  products.where(:name => :shampoo)

would result in the query having to look at every row of the table and see if the name of the product is shampoo. This unnecessarily expensive operation can be speeded up by the addition of an index as below

  class AddNameIndexToProducts < ActiveRecord::Migration
    def change
      add_index: :products, :name
    end
  end

This eliminates the need to look-up every row as all the products with name ‘shampoo’ would have references stored together and thus can be quickly accessed. Another common example of indexes is in join statements. i.e if the design has belongs_to and has_many relationship. Rails does not create indexes by default for the foreign keys. Creation of an index similar to the one shown above in the table for the foreign key would also result in a significant improvement for join queries. In Ruby 3.2, the automated explain plan<ref>http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain</ref> was introduced which writes out warnings in application logs when the query takes longer than 0.5s(by default). It is a useful tool to analyze the performance of queries and can give important pointers to identify where indexes should be added. In most web applications, the percentage of reads is significantly higher than writes and hence they can really benefit by using indexes. However, if a particular table has mostly writes then it would not be a good idea to have indexes for the table as having the index causes extra work while inserting and updating entries.

Eliminating n+1 queries

n+1 queries are solved by using eager load. Consider the following example<ref>http://guides.rubyonrails.org/active_record_querying.html</ref>,

  users = User.limit(20)
  
  users.each do |user|
    puts user.address.city
  end

Assuming users and address are different tables, the above set of statements would result in 21 queries. One for retrieving 20 users and then twenty queries for getting the city of each user. This is known as the n+1 problem. This solution can be mitigated by using an eager load. In this case, we can load the addresses along with user with a statement like this.

  users = User.includes(:address).limit(20)

  users.each do |user|
    puts user.address.city
  end

This code will only require 2 queries, one to get the users and one to get their associated addresses. This is known as eager loading where the data is loaded in advance. However, this should be used only when it is absolutely require. Unused eager loading might burden the IO bandwidth by retrieving unnecessary data. Gems like Bullet can be used analyze database queries and detect n+1 queries and unused eager loading.

Architecture

Irrespective of the efficiency of the code, at some point, the hardware resources reach the threshold of the maximum load it can take as the application goes popular. It is thus essential to design applications in a way that the application can take advantage of the additional resources added to it. The most essential to support this requirement is to write applications in a modularized way. Thus, a typical Rails application should have the following three parts as separate components:

Web: The front end or the web module should run Ruby code. Its sole purpose is to handle requests and/or render UI to the user to display information
Worker:

The worker module performs compute-intensive operations or other background jobs such as sending emails or running Resque which is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.

Datastores:

Datastores persist the data. Typically, in the form of databases, data stores should exist on separate machines.

A modularized architecture like this can be horizontally scaled or vertically scaled depending on which module needs to have the performance increased.

Application Deployment

At a future stage, when the user base grows beyond a certain threshold, it is a good idea to add load balancers to distribute the incoming requests. A very popular deployment model is the use of Unicorn Application Servers and Nginx Load Balancers.<ref>https://www.digitalocean.com/community/tutorials/how-to-scale-ruby-on-rails-applications-across-multiple-droplets-part-1</ref>

Unicorn Application Server<ref>https://github.com/blog/517-unicorn</ref>

Unicorn is an application server which runs Rails applications to process incoming requests. These application servers are responsible for handling only those requests that require processing once they have been filtered and pre-processed by the front-facing Nginx servers. Unicorn is simplistic in design by handling what needs to be done by a web application and it delegates the rest of the responsibilities to the operating system.

Unicorn's master process spawns slave processes to serve requests. This master process also monitors other processes to prevent memory and process related staggering issues.

Nginx HTTP Server / Reverse-Proxy / Load-Balancer<ref>http://nginx.org/en/docs/http/load_balancing.html</ref>

Nginx HTTP server is a multi-purpose, front-facing web server. It acts as the first entry point for requests and is capable of balancing connections and dealing with certain exploit attempts. It filters out the invalid requests and forwards the valid requests to the Unicorn Application servers to be processed. Together, they serve well to make Rails applications scalable.

References

CSC/ECE 517 Fall 2014/ch1a 20 rn

Contents

Introduction

Scalability in Web Applications

Need for scalability in web applications

Good Programming Practices for Scalability

Database Indexes and Query Optimisation<ref>http://mrkirkland.com/prepare-for-web-application-scalability/</ref>

Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>

Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

Use a modular design

Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

Design around seek

Non-programming factors affecting Scalability

Vertical Scaling

Horizontal Scaling

Caching

Load Balancing

Database Replication

Database partitioning

Scalability in Rails applications

Optimizing the application

Scaling Reads by Precomputing<ref>http://devblog.moz.com/2011/09/scaling-the-f-out-of-a-rails-app/</ref>

Cache expensive operations

Make the databases queries faster<ref>http://rakeroutes.com/blog/increase-rails-performance-with-database-indexes/</ref>

Architecture

Application Deployment

Unicorn Application Server<ref>https://github.com/blog/517-unicorn</ref>

Nginx HTTP Server / Reverse-Proxy / Load-Balancer<ref>http://nginx.org/en/docs/http/load_balancing.html</ref>

References

Navigation menu

CSC/ECE 517 Fall 2014/ch1a 20 rn

Introduction

Scalability in Web Applications

Need for scalability in web applications

Good Programming Practices for Scalability

Database Indexes and Query Optimisation<ref>http://mrkirkland.com/prepare-for-web-application-scalability/</ref>

Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>

Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

Use a modular design

Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

Design around seek

Non-programming factors affecting Scalability

Vertical Scaling

Horizontal Scaling

Caching

Load Balancing

Database Replication

Database partitioning

Scalability in Rails applications

Optimizing the application

Scaling Reads by Precomputing<ref>http://devblog.moz.com/2011/09/scaling-the-f-out-of-a-rails-app/</ref>

Cache expensive operations

Make the databases queries faster<ref>http://rakeroutes.com/blog/increase-rails-performance-with-database-indexes/</ref>

Architecture

Application Deployment

Unicorn Application Server<ref>https://github.com/blog/517-unicorn</ref>

Nginx HTTP Server / Reverse-Proxy / Load-Balancer<ref>http://nginx.org/en/docs/http/load_balancing.html</ref>

References

Navigation menu

Search