CSC/ECE 517 Fall 2014/ch1a 20 rn

From Expertiza_Wiki


=Introduction=
Scalability is the ability of a system, network or process to handle an increase in load in a capable manner, or to expand to accommodate this increase. In a typical scenario, it can be understood as the ability of a system to perform extra work through the addition of resources (mostly hardware). It is generally difficult to define scalability precisely, and in any particular case it is necessary to define the specific requirements for scalability along the dimensions that are deemed important. It is a highly significant issue in electronic systems, databases, routers, and networking. A system whose performance improves after adding hardware, in proportion to the capacity added, is said to be a '''scalable system'''. A very good example of a scalable system is the Domain Name System, whose distributed nature allows it to scale well even as new systems are added all over the world daily.
==Scalability in Web Applications==
The scalability of a web application is its ability to handle an increase in the number of users or site traffic in a timely manner. Ideally, the experience of using the web application should be the same if the site has a single user or a million users. This kind of scalability is usually achieved with the addition of hardware in either a scaling up (vertical scaling) or a scaling out (horizontal scaling) manner.   
A typical web application follows a multi-layered architecture. The web application is considered to be scalable if each layer or component in its architecture is scalable. Fig 1 shows a web application which is able to linearly scale out with the addition of resources in its application layer and database layer.<ref>http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html</ref>  
[[File:Scalable_Architecture.jpg|frame|Fig 1: A web application scaling out]]


=Need for scalability in web applications=
In this internet age, web applications and websites are essential to a company’s survival. Most applications are developed and tested with a small initial load. As the application gains popularity, the number of users, i.e. the traffic to the website, increases. The server that could handle the initial load may not be able to handle the new traffic. The application then becomes unresponsive, resulting in a bad user experience that hurts the website both economically and reputationally. It is thus important to design a web application with scalability in mind, i.e. the application should be able to scale with the addition of new hardware without affecting its usability. Many popular web applications, especially social networking sites like Facebook and Twitter, thrive because of their ability to scale for their ever-increasing user base. For example, Twitter grew from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to 200,000,000+ monthly users and 500,000,000+ daily tweets in 2013<ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However, there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer [http://en.wikipedia.org/wiki/Michael_Jackson Michael Jackson] died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour<ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it can adjust to increases in traffic. Even if the application is overwhelmed by a sudden spike, it should be able to recover quickly through the addition of more resources, ensuring minimum, if not zero, downtime.


=Good Programming Practices for Scalability=


==Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>==
It is good practice to isolate small tasks and run them in parallel. For example, if the web application has a page to update information and a widget on the same page showing a live news feed, the two should run in parallel since the tasks are independent. Most machines today have multi-core processors and support multithreading, which should be taken advantage of.
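As an illustrative sketch in Ruby (the language used by the Rails discussion later on this page), two such independent tasks can run concurrently in threads; the task bodies below are hypothetical stand-ins for real work:

```ruby
# Two independent tasks (hypothetical stand-ins for rendering an
# update page and fetching a live news feed) run concurrently.
def render_update_page
  sleep 0.05            # simulate rendering work
  "update page HTML"
end

def fetch_news_feed
  sleep 0.05            # simulate an independent I/O-bound call
  ["story 1", "story 2"]
end

threads = [Thread.new { render_update_page },
           Thread.new { fetch_news_feed }]
page, feed = threads.map(&:value)   # wait for both results
```

Because the two tasks share no state, neither has to wait for the other, so the total latency is roughly that of the slower task rather than the sum of both.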
==Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>==
It is common practice to save data in server-side sessions between HTTP requests. This increases per-connection overhead and hence does not scale well. A good practice is to save as much data as possible on the client side in cookies. Trust concerns can be mitigated by encrypting the information. Only the required information should be sent to the server, on an as-needed basis.
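A minimal sketch of the encryption idea using Ruby's OpenSSL bindings with AES-256-GCM; the key handling and payload here are simplified assumptions for illustration, not a production scheme:

```ruby
require "openssl"
require "base64"
require "json"

# Server-side secret; in practice this would come from app config.
KEY = OpenSSL::Random.random_bytes(32)

# Encrypt a session hash so it can live safely in a client-side cookie.
def encrypt_for_cookie(data)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = KEY
  iv = cipher.random_iv                        # 12-byte IV for GCM
  ciphertext = cipher.update(JSON.dump(data)) + cipher.final
  Base64.strict_encode64(iv + cipher.auth_tag + ciphertext)
end

def decrypt_from_cookie(value)
  raw = Base64.strict_decode64(value)
  iv, tag, ciphertext = raw[0, 12], raw[12, 16], raw[28..]
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = KEY
  cipher.iv = iv
  cipher.auth_tag = tag                        # verified on final
  JSON.parse(cipher.update(ciphertext) + cipher.final)
end

cookie = encrypt_for_cookie("user_id" => 42, "theme" => "dark")
```

The server then keeps only the key, while the session payload travels with the client; tampering with the cookie makes GCM's authentication check fail on decryption.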
==Use a modular design==
There should be zero coupling between independent components such as:
*Media server
*DB server
*Mail server
*Front end (UI)
*Back end
Combining these logically separate entities into the same module is a recipe for disaster in terms of maintainability, debugging, and reusability.


==Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>==
If the application receives a lot of data that must be stored but will not be used frequently, it is a good idea to compress it before storing, for example, old emails (say, older than 3 months) on a mail server. Since the chance of old emails being checked is low, this practice saves a lot of money in disk storage costs, especially when the application supports a huge number of users.
Similarly, it is also good practice to compress data while sending it over the network. The time spent compressing and decompressing the data is overshadowed by the time saved sending a smaller payload over the network.
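A quick sketch with Ruby's built-in Zlib; the email body is a made-up stand-in for infrequently accessed data:

```ruby
require "zlib"

# Hypothetical old email body; repetitive text compresses well.
email_body = "Meeting notes: status update for the week. " * 200

compressed = Zlib::Deflate.deflate(email_body)
restored   = Zlib::Inflate.inflate(compressed)   # lossless round trip
```

The same pair of calls works for network payloads: compress before sending, inflate on receipt.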
==Design around seek==
IO operations incur a lot of overhead. A good programming practice is to reduce the number of seeks needed to read data. This can be achieved by ensuring that logically coherent data is written to sequential rather than random locations. Caching data also helps reduce the number of seeks required.
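A small Ruby sketch of the idea: related fixed-width records are written contiguously, so one rewind plus one sequential read replaces a seek per record (the 8-byte record layout is a hypothetical choice so the records can be split apart again):

```ruby
require "tempfile"

# Fixed-width records written contiguously; the 8-byte format is a
# hypothetical layout chosen so the records can be split back apart.
records = (1..100).map { |i| format("%08d", i) }

file = Tempfile.new("records")
file.write(records.join)         # related data lands in sequential locations
file.flush

file.rewind                      # one seek to the start...
loaded = file.read.scan(/.{8}/)  # ...then a single sequential read
file.close
```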


=Non-programming factors affecting Scalability=
In addition to the software aspects, certain hardware-related considerations must be made for a scalable application. This section discusses these hardware-related factors in detail.
==Vertical Scaling==
[[File:Scaling_up.png|frame|center|Fig 2. Vertical Scaling]]
This is generally the first and most intuitive approach to achieving scalability. Let us consider an analogy to understand this concept. Assume you have a computer with 512 MB of RAM. Initially, when it was brand new and you had very few applications running on it, it worked fine. But once you install many applications and start multitasking, the machine becomes slow and you regularly get “Application not responding” errors. The obvious thing to do is upgrade your machine to, say, 2 GB of RAM instead of the previous 512 MB. Now all applications run as smoothly as before. The same is done to scale web applications. Here, the bottleneck is your server: it can only accommodate a fixed number of requests per second. If the incoming traffic to your web servers increases, users will experience lag while accessing your application. You go out and buy a web server that offers higher performance than the previous one. This is called vertical scaling.


The obvious benefit of this approach is that it is so simple that it does not require complex changes to get it up and running. You simply port your code to the new machine and it will run fine.


The drawback of this approach is that eventually you will run out of resources to scale vertically. These resources could be money or even technological barriers. There may not be a machine developed yet which is as powerful as you currently need.


==Horizontal Scaling==
[[File:Scaling_out.png|frame|center|Fig 3. Horizontal Scaling]]
This approach comes in when you have reached a technical barrier to scaling vertically, or you do not have enough money to buy the powerful machine you need. It has the same motive: to make sure that every machine runs below its maximum load so that it behaves acceptably. But instead of one more powerful machine, this approach utilizes many smaller machines to achieve the same end result. It requires a load balancer to do the majority of the heavy lifting.


The advantage of this approach is that it is cost-effective. However, it has a major disadvantage: it complicates the system. Setting it up is complex and managing the machines is a major issue. You need to make sure that you split the traffic evenly among the servers so as to keep each one running under its maximum load. You also need to worry about consistency, as you still need to make the multiple servers appear as a single logical system to the end user.


==Caching==
[[File:Caching.jpg|frame|center|Fig 4. Caching: Redis Data Example]]
Caching can be a really effective technique if your application frequently deals with similar results or queries. The crux of this method is to cache data in an in-memory cache. The cached data could be the results of queries to the database or collections of objects from the database. This saves repeated calls to the database, saving computation time as well as avoiding a database access bottleneck. There are a good number of tools, such as Memcached (http://memcached.org/) and Redis (http://redis.io/), which help ease the process of utilizing this method.


The great part about this is that you can store any values you want in these caches. It could be as simple as a leaderboard or as complex as a multi-table query result. These caches store data as hashes (key-value pairs).
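A minimal cache-aside sketch in Ruby; a plain Hash stands in for Memcached or Redis, and the "expensive query" is a simulated stand-in:

```ruby
require "json"

# A plain Hash stands in for Memcached/Redis; the pattern is the
# same: look the key up, compute and store the value on a miss.
CACHE = {}

def expensive_leaderboard_query
  # stands in for a slow multi-table database query
  { "alice" => 120, "bob" => 95 }
end

def leaderboard
  CACHE["leaderboard"] ||= JSON.dump(expensive_leaderboard_query)
  JSON.parse(CACHE["leaderboard"])   # values are stored serialized
end

first  = leaderboard   # miss: runs the query and fills the cache
second = leaderboard   # hit: served from the cache
```

With a real cache server, the Hash lookup and assignment become `GET` and `SET` calls, and an expiry time is usually attached so stale entries age out.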


These are tested tools used by some of the top web application product companies:
* YouTube and Twitter use Memcached
* GitHub, Pinterest and StackOverflow use Redis


Another advantage of these tools is that they are open source and have active community support.


==Load Balancing==
[[File:Load_balancing.png|frame|center|Fig 5. Load Balancing]]
Let us assume your application has become an overnight success and you receive thousands of requests per second. If you have a single server, it will become a bottleneck and the speed of your website will suffer. To overcome this, you add two more servers so that you can route incoming traffic to them as well. This looks like a really appealing solution until you run into the fundamental problem of how to ensure that the incoming traffic is evenly distributed among the servers.
In this scenario, a load balancer is the ideal solution. A load balancer is software or hardware used to distribute the incoming traffic evenly among the web servers. The load balancer can utilize many different metrics to achieve this. It can use a simple round-robin approach to cycle between the web servers, or a more intelligent approach that checks the approximate load on each server and uses that to allocate requests. Regardless of which method it uses, the overall effect is a speedup of the application and better response times.
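The round-robin approach can be sketched in a few lines of Ruby (the server names are hypothetical):

```ruby
# Cycles through the server pool so requests are spread evenly.
class RoundRobinBalancer
  def initialize(servers)
    @servers = servers
    @index = 0
  end

  def next_server
    server = @servers[@index % @servers.size]
    @index += 1
    server
  end
end

lb = RoundRobinBalancer.new(%w[web1 web2 web3])
picks = 6.times.map { lb.next_server }
```

A load-aware balancer would replace the modulo arithmetic with a query of each server's current load before choosing.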


A benefit of this approach is that you do not need to make the internal structure of your web application's network components public, such as the IP address of each server. The load balancer can act as a single point of entry to the application.
      
      
There are a few caveats to this method, though. We would like to maintain sessions on a single server, i.e. if my first request goes to Server A, it would be beneficial for the rest of my requests to use the same web server. For example, if I have authenticated as a user on Web Server A and my next request goes to Web Server B, it may happen that my authentication object is not passed along and I am asked to authenticate once again. More generally, it is faster to serve all related requests from one web application server than to rebuild that state on every server.
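One common way to get this "sticky" behavior is to hash the session id to a server, so every request in the same session lands on the same machine (the server pool below is hypothetical):

```ruby
require "zlib"

SERVERS = %w[web-a web-b web-c]   # hypothetical server pool

# Hash the session id so every request in the same session is
# routed to the same server ("sticky" routing).
def server_for(session_id)
  SERVERS[Zlib.crc32(session_id) % SERVERS.size]
end

first  = server_for("session-1234")
repeat = server_for("session-1234")
```

The trade-off is that naive modulo hashing reshuffles most sessions when the pool size changes; consistent hashing is the usual refinement.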


==Database Replication==
[[File:DB_Replication.png|frame|center|Fig 7. DB Replication with multiple writes]]
In the earlier scenario, let us assume that you now have multiple servers but only a single database. You are back in more or less the same situation as before: your rendering has become faster, but reads and writes are still a problem, as there is just one database and the web servers have to wait for their turn to access it.
      
      
An obvious solution is to replicate the database, so that you have multiple copies of it too. But this comes with its own set of difficulties. You now need to make sure the data is consistent across the databases, so any write must be written back to all of them. This is a time-consuming as well as complex task. One could argue that instead of writing to all databases, any web request should write to just one database and let it propagate the changes to the others. This is a feasible solution if your application is read-heavy. If it is write-heavy, there is another approach known as database partitioning, detailed in the next section.
[[File:DB_Replication2.png|frame|center|Fig 8. DB Replication with single writes]]
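A toy Ruby sketch of the single-writer scheme: writes go only to the primary, which propagates each change to its replicas, while reads may be served from any copy. The classes here are illustrative stand-ins for real database nodes:

```ruby
# Writes go only to the primary, which propagates each change to
# its replicas; reads can then be served from any copy.
class ToyDatabase
  def initialize
    @rows = {}
    @replicas = []
  end

  def add_replica(db)
    @replicas << db
  end

  def write(key, value)
    @rows[key] = value
    @replicas.each { |r| r.write(key, value) }   # propagate the change
  end

  def read(key)
    @rows[key]
  end
end

primary  = ToyDatabase.new
replicas = [ToyDatabase.new, ToyDatabase.new]
replicas.each { |r| primary.add_replica(r) }

primary.write("user:1", "alice")
```

Real systems propagate asynchronously, which is why replicas can briefly serve stale reads.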


==Database partitioning==
[[File:DB_Partitioning.png|frame|center|Fig 9. DB Partitioning]]
From the previous section, we know that database partitioning is useful in the case of a write-heavy web application. A couple of techniques fall under this general umbrella. One way is to maintain separate databases for separate kinds of data, e.g. a separate media database, a separate user profile database, and so on.


Another widely used approach is sharding. In this technique, you split the data horizontally, i.e. instead of separating columns of data alone, you split the data rows (tuples). The main goal is to convert the large, bulky database into smaller, manageable databases that have no interdependency. For example, you could split a customer database geographically, so that a user from the Asia region is stored separately from an American user.
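A minimal Ruby sketch of region-based sharding; the regions and in-memory "shards" are illustrative stand-ins for real databases:

```ruby
# Each region maps to its own shard; an in-memory Hash stands in
# for each regional database.
SHARDS = { "asia" => {}, "america" => {} }

def shard_for(region)
  SHARDS.fetch(region)   # raises on an unknown region
end

def save_user(id, region, name)
  shard_for(region)[id] = name
end

save_user(1, "asia", "Wei")
save_user(2, "america", "Dana")
```

Because each shard holds a disjoint slice of the rows, writes to different regions never contend with each other.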


The following are the advantages of this technique:
*High availability - Only part of the application is affected if one of the databases goes down.
*Faster queries - Since each database holds less data, queries take less time.
*More write bandwidth - Since the data is separate, you can write to multiple databases at the same time.


=Scalability in Rails applications=
The worker module performs compute-intensive operations or other background jobs such as sending emails or running [https://github.com/resque/resque Resque] which is a [http://redis.io/ Redis]-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
*'''Datastores''':  
Datastores persist the data. Typically in the form of databases, datastores should exist on separate machines.
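The worker module described above can be sketched in-process with a Ruby Queue and a background thread standing in for a Resque worker; the email job is simulated, and in a real deployment the queue would live in Redis so workers can run on separate machines:

```ruby
# In-process stand-in for a background worker: the web layer
# enqueues jobs, and a worker thread processes them asynchronously.
jobs = Queue.new
sent = Queue.new

worker = Thread.new do
  loop do
    job = jobs.pop
    break if job == :stop          # sentinel to shut the worker down
    sent << "emailed #{job[:to]}"  # simulate the slow email send
  end
end

jobs << { to: "alice@example.com" }
jobs << { to: "bob@example.com" }
jobs << :stop
worker.join

results = []
results << sent.pop until sent.empty?
```

The web request only pays the cost of the enqueue; the slow work happens off the request path.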


A modularized architecture like this can be horizontally scaled or vertically scaled depending on which module needs to have the performance increased.

Latest revision as of 02:41, 26 September 2014

Scalability in Web Applications

This page introduces scalability and its importance in web applications. It also details good programming practices and other aspects to be considered when making a web application more scalable. Finally, the page briefly describes how to achieve scalability in a Rails application.

Topic Document

Introduction

Scalability is the ability of a system, network or process to handle an increase in load in a capable manner, or to expand to accommodate this increase. In a typical scenario, it can be understood as the ability of a system to perform extra work by the addition of resources (mostly hardware). It is generally difficult to define scalability precisely and, in any particular case, it is necessary to define the specific requirements for scalability on those dimensions that are deemed important. It is a highly significant issue in electronics systems, databases, routers, and networking. A system whose performance improves after adding hardware, proportional to the capacity added, is said to be a scalable system. A very good example of a scalable system would be the Domain Name System. The distributed nature of the Domain Name System allows it to scale well even when new systems are added all over the world daily.

Scalability in Web Applications

The scalability of a web application is its ability to handle an increase in the number of users or site traffic in a timely manner. Ideally, the experience of using the web application should be the same if the site has a single user or a million users. This kind of scalability is usually achieved with the addition of hardware in either a scaling up (vertical scaling) or a scaling out (horizontal scaling) manner. A typical web application follows a multi-layered architecture. The web application is considered to be scalable if each layer or component in its architecture is scalable. Fig 1 shows a web application which is able to linearly scale out with the addition of resources in its application layer and database layer.<ref>http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html</ref>

Fig 1: A web application scaling out

Need for scalability in web applications

In this internet age, web applications or web sites are essential for a company’s survival. Most applications are developed and tested with an initially small load. As an application gains popularity, the number of users, i.e. the traffic to the website, increases. The server which was able to handle the initial load may not be able to handle the new traffic. This causes the application to become unresponsive and results in a bad user experience, which can harm the website economically and reputationally. It is, thus, important to design a web application with scalability in mind, i.e. the application should be able to scale with the addition of new hardware without affecting its usability. Many popular web applications, especially social networking sites like Facebook and Twitter, thrive because of their ability to scale for their ever-increasing user base. For example, Twitter has grown from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to more than 200,000,000 monthly users and 500,000,000 daily tweets by 2013<ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However, there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer Michael Jackson died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour<ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it can adjust to increases in traffic. Even if the web application is overwhelmed by a sudden spike, it should be able to recover quickly through the addition of more resources, ensuring minimal, if not zero, downtime.

Good Programming Practices for Scalability

While some of these practices involve following certain approaches during coding, many of them are the outcome of good architectural design. Some good programming practices that support a scalable web application are listed below.

Database Indexes and Query Optimisation<ref>http://mrkirkland.com/prepare-for-web-application-scalability/</ref>

Not just with respect to scalability, having the right indexes is a must in databases, and getting the indexes right significantly improves performance. For example, consider a table ORDER which has a column CustomerID referenced from another table CUSTOMER, i.e., CustomerID is a foreign key in the table ORDER. If the table does not have an index on CustomerID, then a query to retrieve the orders for a customer would require the entire ORDER table to be scanned. This is a serious performance hit and becomes terrible at scale, especially if getting the orders for a customer is a frequent operation. Having an index on CustomerID in the ORDER table avoids scanning the entire table. Also, queries should take full advantage of the power of databases. For example, looping to retrieve the rows of a table one at a time is a big no; instead, DB operations should be set-based as much as possible and done in a single query.
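As an illustrative sketch (not tied to any particular database, and with made-up order data), the difference between a full scan and an index lookup can be simulated in plain Ruby, where a Hash keyed on customer_id plays the role of the index:

```ruby
# Toy illustration of an index: a Hash keyed on customer_id avoids
# scanning every order row. The data and field names are hypothetical.
orders = [
  { id: 1, customer_id: 42, total: 10.0 },
  { id: 2, customer_id: 7,  total: 25.0 },
  { id: 3, customer_id: 42, total: 5.5  }
]

# Without an index: scan every row on every lookup (O(n))
scan_result = orders.select { |o| o[:customer_id] == 42 }

# With an "index": build it once, then each lookup is a single hash access
index = orders.group_by { |o| o[:customer_id] }
indexed_result = index[42]

puts scan_result == indexed_result  # same two orders either way
```

A real database index works on the same principle: the per-lookup cost stops growing with the size of the table.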

Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>

It is good practice to split work into small isolated tasks and run them in parallel. For example, if the web application has a page to update information and a widget on the same page showing a live news feed, the two should run in parallel, as the tasks are independent. Today, most machines have multi-core processors and support multi-threading, which should be taken advantage of.
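In Ruby, two such independent tasks can be sketched with threads; the sleeps below are stand-ins for I/O-bound work such as fetching the news feed:

```ruby
# Two independent tasks run concurrently instead of one after the other.
# Each sleep stands in for ~0.2s of I/O-bound work.
start = Time.now

threads = [
  Thread.new { sleep 0.2; :profile_updated },
  Thread.new { sleep 0.2; :feed_loaded }
]
results = threads.map(&:value)  # waits for both threads and collects results

elapsed = Time.now - start
puts results.inspect  # [:profile_updated, :feed_loaded]
puts elapsed.round(2) # roughly 0.2, not 0.4, because the tasks overlap
```

Note that even MRI's global interpreter lock does not prevent overlap here, since I/O (and sleep) releases the lock.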

Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

It is a common practice to save data in server-side sessions between HTTP requests. This increases the per-connection overhead and, hence, does not scale well. A good practice is to save as much data as possible on the client side in cookies; trust concerns can be mitigated by encrypting the information. Only the required information should be sent to the server, strictly on an as-needed basis.

Use a modular design

There should be zero coupling between various independent components, such as:

  • Media server
  • DB server
  • Mail server
  • Front End (UI)
  • Back End etc.

Combining these logically separate entities into the same module is a recipe for disaster in terms of maintainability, debugging, re-usability, and so on.

Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>

If the application receives a lot of data which must be stored but will not be used frequently, it is a good idea to compress the data before storing it; for example, old emails (older than 3 months) in a mail server. Since the chance of old emails being checked is low, this practice saves a lot of money in disk storage costs, especially when the application supports a huge number of users. Similarly, it is also a good practice to compress data while sending it over the network. The time spent compressing and decompressing the data is overshadowed by the time saved sending a smaller payload over the network.
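Ruby's standard Zlib library can be used to sketch this; highly redundant data, like text-heavy old emails, compresses dramatically (the email body here is made up):

```ruby
require 'zlib'

# A stand-in for an old email body: repetitive text compresses very well.
original = "Dear user, your order has shipped. " * 200

compressed = Zlib::Deflate.deflate(original)
restored   = Zlib::Inflate.inflate(compressed)

puts original.bytesize    # 7000
puts compressed.bytesize  # far smaller
puts restored == original # true -- compression is lossless
```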

Design around seek

IO operations incur a lot of overhead. A good programming practice is to reduce the number of seeks to read data. This can be achieved by ensuring that logically coherent data is written in sequential rather than random locations. Caching data also helps to reduce the number of seeks required.

Non-programming factors affecting Scalability

In addition to the software aspects, there are certain hardware related considerations that have to be made for a scalable application. This section explains in detail about the hardware related factors.

Vertical Scaling

This is generally the first and most intuitive approach to achieving scalability. Let us consider an analogy to understand this concept. Assume you have a computer with 512MB of RAM. Initially, when it was brand new and you had very few applications running on it, it worked fine. But once you install many applications and start multitasking, your machine becomes slow and you regularly get “Application not responding” errors. The most obvious thing to do is upgrade your machine to have around 2GB of RAM instead of the previous 512MB. Now, all applications run as smoothly as before. The same is done to scale web applications. Here, the bottleneck is your server: it can only accommodate a fixed number of requests per second. If the incoming traffic to your web servers increases, users will experience a lag while accessing your application. You go out and buy a web server which offers higher performance than the previous one. This is called vertical scaling.

The obvious benefit of this approach is that it is so simple that it does not require complex changes to get it up and running. You simply port your code to the new machine and it will run fine.

The drawback of this approach is that eventually you will run out of resources to scale vertically. These resources could be money or even technological barriers. There may not be a machine developed yet which is as powerful as you currently need.

Fig 2. Vertical Scaling

Horizontal Scaling

This approach comes in when you have reached a technical barrier to scaling vertically, or you do not have enough money to buy the powerful machine you need. It has the same motive: to make sure that all machines run below maximum load so that they behave acceptably. But instead of one more powerful machine, this approach utilizes many smaller machines to achieve the same end result. It requires a load balancer to do the majority of the heavy lifting.

The advantage of this approach is that it is cost effective. However, it has a major disadvantage in that it complicates the system. Now, setting this up is complex and managing them is a major issue. You need to make sure that you split the traffic evenly among the servers so as to keep everyone running under max load. You now also need to worry about consistency in the system as you still need to make the multiple servers seem like a single logical system to the end user.

Fig 3. Horizontal Scaling

Caching

Caching can be a really effective technique if your application frequently deals with similar results/queries. The crux of this method is to cache data in an in-memory cache. The cached data could be the results of queries to the database or collections of objects from the database. This saves repeated calls to the database, saving computation time as well as avoiding a database access bottleneck. There are a good number of tools, like Memcached (http://memcached.org/) and Redis (http://redis.io/), available which ease the process of using this method.

The great part about this is that you can store any values that you want in these caches. It could be as simple as a leaderboard or as complex as a multi-table query. These caches are stored as Hashes (key-value pairs).
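The fetch-or-compute pattern these tools expose can be sketched in plain Ruby; here a Hash stands in for the cache server, and the expensive query is a made-up lambda:

```ruby
# A minimal in-memory cache with the fetch-or-compute pattern used by
# Memcached/Redis clients: compute on a miss, return the stored value on a hit.
class SimpleCache
  def initialize
    @store = {}
  end

  # Returns the cached value for key, computing and storing it on first use.
  def fetch(key)
    @store.fetch(key) { |k| @store[k] = yield }
  end
end

cache = SimpleCache.new
calls = 0
expensive_query = -> { calls += 1; [1, 2, 3] }  # stand-in for a slow DB query

cache.fetch("leaderboard") { expensive_query.call }  # miss: runs the query
cache.fetch("leaderboard") { expensive_query.call }  # hit: served from memory

puts calls  # 1 -- the expensive query ran only once
```

Real cache servers add expiry, eviction, and sharing across processes, but the calling pattern is the same.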

These are tested tools being used by some of the top web application product companies:

  • YouTube and Twitter use Memcached
  • GitHub, Pinterest and StackOverflow use Redis

Another advantage of these tools is that they are open source and have active community support.

Fig 4. Caching: Redis Data Example

Load Balancing

Let us assume your application has become an overnight success and you receive thousands of requests/sec. If you have a single server, it will become a bottleneck and the speed of your website will suffer. To overcome this, you add 2 more servers so that you can route incoming traffic to other web servers too. This looks like a really appealing solution until you run into the fundamental problem of how to ensure that the incoming traffic is evenly distributed among the servers. In this scenario, a load balancer is the ideal solution. A load balancer is software or hardware that distributes the incoming traffic evenly among the web servers. It can utilize many different metrics to achieve this. It can use a simple round-robin approach to cycle between the web servers, or a more intelligent approach that checks the approximate load on each server and uses that to allocate requests. Regardless of which method it uses, the overall effect is a speedup of the application and better response times.
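The round-robin strategy mentioned above can be sketched in a few lines of Ruby (the server names are hypothetical):

```ruby
# Minimal round-robin load balancer: each request goes to the next server
# in the list, wrapping around, so traffic is spread evenly.
class RoundRobinBalancer
  def initialize(servers)
    @servers = servers
    @next    = 0
  end

  # Pick the server for the next incoming request.
  def pick
    server = @servers[@next % @servers.size]
    @next += 1
    server
  end
end

lb = RoundRobinBalancer.new(%w[web1 web2 web3])
assignments = 6.times.map { lb.pick }
puts assignments.inspect  # ["web1", "web2", "web3", "web1", "web2", "web3"]
```

A production balancer (hardware or software such as Nginx or HAProxy) adds health checks and smarter policies, but this is the core idea.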

A benefit of this approach is that you do not need to make public the internal structure of your web application network components such as IP address of each server. The load balancer can act as a single point of entry to the application.

There are a few caveats in this method, though. We would like to maintain sessions on a single server, i.e., if my first request goes to server A, it is beneficial for the rest of my requests to be served by the same web server. For example, if I have been authenticated on web server A and my next request goes to web server B, my authentication object may not be passed along, and I would be asked to authenticate myself once again. More generally, it is faster to handle all related requests on one web application server than to recompute shared state on every web server.

Fig 5. Load Balancing

Database Replication

Fig 7. DB Replication with multiple writes

In the earlier scenario, let us assume that you now have multiple servers but only a single database. Now, you are back in more or less the same situation that you were in earlier. Your rendering has become faster but the reads/writes are still a problem as there is just one database and the web servers have to wait for their turn to access it.

An obvious solution to this is to replicate the database. You now have multiple copies of the database, but this comes with its own set of difficulties. You have to ensure that the data is consistent across the databases: any write has to be applied to all of them, which is a time-consuming as well as complex task. One can argue that instead of writing to all the databases, any web request should write to a single database and let it propagate the changes to the others. This is a feasible solution if your application is read-heavy. If it is a write-heavy application, we can use another approach known as database partitioning, detailed in the next section.

Fig 8. DB Replication with single writes

Database partitioning

From the previous section, we know that database replication will not be very useful in the case of a write-heavy web application. There are a couple of techniques that fall under the general umbrella of database partitioning. One way is to maintain separate databases for separate kinds of data, e.g., a separate media database, a separate user profile database, etc.

Another widely used approach is sharding. In this technique, you split the data horizontally, i.e., instead of separating columns alone, you split the data rows/tuples. The main goal is to convert the large, bulky database into smaller, manageable databases which have no interdependency. For example, you could split a customer database geographically, so that a user in the Asia region is stored separately from an American user.

The following are the advantages of this technique:

  • High availability - Only a part of the application is affected if one of the databases goes down.
  • Faster queries - Since each database holds less data, queries take less time.
  • More write bandwidth - Since the data is separate, you can write to multiple databases at the same time.
Fig 9. DB Partitioning
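The geographic split described above boils down to a shard-routing function. A minimal Ruby sketch, with entirely hypothetical region and shard names:

```ruby
# Route each user to a shard based on region, as in the geographic split
# described above. The region-to-shard mapping is made up for illustration.
SHARD_BY_REGION = {
  "asia"    => "db_shard_asia",
  "america" => "db_shard_america",
  "europe"  => "db_shard_europe"
}.freeze

# Returns the shard that should hold this user's row.
def shard_for(user)
  SHARD_BY_REGION.fetch(user[:region], "db_shard_default")
end

puts shard_for(region: "asia")     # db_shard_asia
puts shard_for(region: "america")  # db_shard_america
```

The important property is that the routing function is deterministic, so every query for a given user lands on the same shard.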

Scalability in Rails applications

Ruby on Rails (RoR), simply referred to as Rails, is an elegant open source web application framework for creating the full stack, emphasizing the use of well-known software engineering patterns and paradigms.<ref>http://guides.rubyonrails.org/getting_started.html#what-is-rails-questionmark</ref> Many high-profile web firms use Ruby on Rails to build agile, scalable web applications. Some of the largest sites running Ruby on Rails include GitHub, Yammer, Scribd, Shopify, Hulu, and Basecamp<ref>http://articles.businessinsider.com/2011-05-11/tech/30035869_1_ruby-rails-custom-software</ref>. As of May 2014, it is estimated that more than 600,000 web sites are running Ruby on Rails.<ref>http://trends.builtwith.com/framework/Ruby-on-Rails</ref>

When Twitter had some significant downtime in early 2008, a widespread myth started about Rails: ‘Rails doesn’t scale’<ref>http://techcrunch.com/2008/05/01/twitter-said-to-be-abandoning-ruby-on-rails</ref><ref>http://canrailsscale.com/</ref>. However, when Twitter’s lead architect, Blaine Cook, held Ruby blameless<ref>http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster</ref>, the developer community came to the general conclusion that it is not the language or framework that is the bottleneck to scaling, but the architecture. Rails therefore does scale well, provided the application has been architected to scale.

Even though it applies to web applications in general, the following sections describe practices which help in the development of scalable Rails applications.

Optimizing the application

Although Ruby has limitations, such as the lack of true multithreading support, a lot of scaling issues can be solved by optimizing the way certain operations are done.

Scaling Reads by Precomputing<ref>http://devblog.moz.com/2011/09/scaling-the-f-out-of-a-rails-app/</ref>

Often, because of the quick nature of Rails development, developers fail to anticipate operations which might take a lot of time to compute as data grows. A typical example of such an operation is an aggregate view generated from past usage. At some point, calculating the aggregate view on demand would take a long time, resulting in a long wait to render UI elements such as graphs. To solve such issues, it is a good idea to precompute the values (at a frequency depending on the input volume) and store the results. Rendering is then reduced to either displaying the stored data or doing minimal calculation, so that the end user does not have to wait long for results.
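A minimal plain-Ruby sketch of the idea, with made-up event data: the expensive aggregation runs once (in a real app, from a scheduled background job), and rendering only reads the stored result:

```ruby
# Precompute an aggregate (page views per day) once, on a schedule,
# instead of recomputing it on every render. The data is hypothetical.
events = [
  { day: "2014-09-01", views: 120 },
  { day: "2014-09-01", views: 80  },
  { day: "2014-09-02", views: 300 }
]

# Expensive step, run periodically by a background job:
precomputed = events.group_by { |e| e[:day] }
                    .transform_values { |es| es.sum { |e| e[:views] } }

# Cheap step, run on every request: just read the stored value.
puts precomputed["2014-09-01"]  # 200
puts precomputed["2014-09-02"]  # 300
```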

Cache expensive operations

Sometimes it is not possible to precompute results for every possible operation. However, if an operation is going to be used frequently within a short period of time, it is a good idea to cache the first result so that subsequent results can be served faster. A typical example<ref>https://speakerd.s3.amazonaws.com/presentations/50044249a555b00002037c2c/Fast.pdf</ref> would look like:

First call (slower — the expensive query actually runs):

  Benchmark.measure do
    Rails.cache.fetch("cache_key") do
      User.some_expensive_query
    end
  end.real
  => 20.5s

Subsequent calls (much faster — the result is served from the cache):

  Benchmark.measure do
    Rails.cache.fetch("cache_key") do
      User.some_expensive_query
    end
  end.real
  => 0.00001s

Make the databases queries faster<ref>http://rakeroutes.com/blog/increase-rails-performance-with-database-indexes/</ref>

Database access is a significant factor affecting the throughput of a web application. Some common methods to make database IO faster are:

  • Indexing

DB reads can be sped up significantly by the addition of database indexes. Indexes are helpful when a search has to be made on an attribute other than the id. For example, a query on the Products table,

  Product.where(:name => 'shampoo')

would result in the query having to look at every row of the table to check whether the name of the product is shampoo. This unnecessarily expensive operation can be sped up by the addition of an index, as below:

  class AddNameIndexToProducts < ActiveRecord::Migration
    def change
      add_index :products, :name
    end
  end

This eliminates the need to look at every row, as all the products with name ‘shampoo’ have their references stored together and can thus be quickly accessed. Another common use of indexes is in join statements, i.e., when the design has belongs_to and has_many relationships. Rails does not create indexes for foreign keys by default; creating an index similar to the one shown above on the foreign key column also results in a significant improvement for join queries. In Rails 3.2, automated explain plans<ref>http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain</ref> were introduced, which write warnings to the application logs when a query takes longer than 0.5s (by default). This is a useful tool for analyzing the performance of queries and can give important pointers on where indexes should be added. In most web applications, the percentage of reads is significantly higher than that of writes, so they can really benefit from indexes. However, if a particular table receives mostly writes, then indexing it is not a good idea, as each index causes extra work on every insert and update.

  • Eliminating n+1 queries

n+1 queries can be eliminated by using eager loading. Consider the following example<ref>http://guides.rubyonrails.org/active_record_querying.html</ref>,

  users = User.limit(20)
  
  users.each do |user|
    puts user.address.city
  end

Assuming users and addresses are stored in different tables, the above statements would result in 21 queries: one to retrieve the 20 users and then twenty queries to get the city of each user. This is known as the n+1 problem. It can be mitigated by using eager loading. In this case, we can load the addresses along with the users with a statement like this:

  users = User.includes(:address).limit(20)

  users.each do |user|
    puts user.address.city
  end

This code requires only 2 queries: one to get the users and one to get their associated addresses. This is known as eager loading, where the data is loaded in advance. However, it should be used only when it is absolutely required; unused eager loading can burden the IO bandwidth by retrieving unnecessary data. Gems like Bullet can be used to analyze database queries and detect both n+1 queries and unused eager loading.

Architecture

Irrespective of the efficiency of the code, at some point the hardware reaches the maximum load it can take as the application grows popular. It is thus essential to design the application so that it can take advantage of additional resources as they are added. The most essential step to support this requirement is to write the application in a modularized way. Thus, a typical Rails application should have the following three parts as separate components:

  • Web: The front end or web module runs the Ruby code. Its sole purpose is to handle requests and/or render the UI to display information to the user.
  • Worker: The worker module performs compute-intensive operations or other background jobs such as sending emails, often managed with Resque, a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
  • Datastores: Datastores persist the data, typically in the form of databases, and should exist on separate machines.

A modularized architecture like this can be horizontally scaled or vertically scaled depending on which module needs to have the performance increased.

Application Deployment

At a future stage, when the user base grows beyond a certain threshold, it is a good idea to add load balancers to distribute the incoming requests. A very popular deployment model is the use of Unicorn Application Servers and Nginx Load Balancers.<ref>https://www.digitalocean.com/community/tutorials/how-to-scale-ruby-on-rails-applications-across-multiple-droplets-part-1</ref>

Unicorn Application Server<ref>https://github.com/blog/517-unicorn</ref>

Unicorn is an application server which runs Rails applications to process incoming requests. These application servers handle only those requests that require processing, after the requests have been filtered and pre-processed by the front-facing Nginx servers. Unicorn's design is deliberately simple: it handles only what a web application needs and delegates the rest of its responsibilities to the operating system.

Unicorn's master process spawns worker processes to serve requests. The master process also monitors the workers to guard against memory bloat and hung processes.

Nginx HTTP Server / Reverse-Proxy / Load-Balancer<ref>http://nginx.org/en/docs/http/load_balancing.html</ref>

Nginx HTTP server is a multi-purpose, front-facing web server. It acts as the first entry point for requests and is capable of balancing connections and dealing with certain exploit attempts. It filters out invalid requests and forwards valid requests to the Unicorn application servers to be processed. Together, the two serve well to make Rails applications scalable.
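As a rough sketch of how the two fit together, an Nginx configuration distributing requests across Unicorn backends might look like the fragment below. The socket paths and server name are hypothetical; consult the Nginx load-balancing documentation for a production setup.

```nginx
# Hypothetical pool of Unicorn workers listening on Unix sockets;
# Nginx distributes requests across them round-robin by default.
upstream unicorn_app {
    server unix:/tmp/unicorn.app1.sock fail_timeout=0;
    server unix:/tmp/unicorn.app2.sock fail_timeout=0;
}

server {
    listen 80;
    server_name example.com;

    location / {
        # Preserve the original host and client address for the Rails app
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://unicorn_app;
    }
}
```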

References

<references/>