CSC/ECE 517 Fall 2014/ch1a 20 rn: Difference between revisions
Line 15: | Line 15: | ||
In this age ruled by the internet, web applications or web sites are essential for a company’s survival. Most applications are generally developed and tested with an initial small load. As the application gains popularity, the number of users ie traffic to the web site increases. The server which was able to handle the initial load might not be able to handle the new traffic. This causes the application to become unresponsive and result in a bad user experience which would affect the web site in an economic and/or defaming manner. It is thus important to design a web application with scalability in mind i.e. the application should be able to scale with the addition of new hardware without affecting the usability of the web application. Many popular web applications, especially social networking sites like Facebook and Twitter thrive because of their ability to scale for their ever increasing user base. For eg. Twitter has grown from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to 200,000,000+ monthly users and 500,000,000+ daily tweets in <ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer [http://en.wikipedia.org/wiki/Michael_Jackson Michael Jackson] died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour <ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it is able to adjust to increase in traffic. Even if the web application is overwhelmed by a sudden spike, it should be able to quickly recover by the addition of more resources to ensure that there is minimum downtime, if not zero. | In this age ruled by the internet, web applications or web sites are essential for a company’s survival. Most applications are generally developed and tested with an initial small load. As the application gains popularity, the number of users ie traffic to the web site increases. The server which was able to handle the initial load might not be able to handle the new traffic. This causes the application to become unresponsive and result in a bad user experience which would affect the web site in an economic and/or defaming manner. It is thus important to design a web application with scalability in mind i.e. the application should be able to scale with the addition of new hardware without affecting the usability of the web application. Many popular web applications, especially social networking sites like Facebook and Twitter thrive because of their ability to scale for their ever increasing user base. For eg. Twitter has grown from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to 200,000,000+ monthly users and 500,000,000+ daily tweets in <ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer [http://en.wikipedia.org/wiki/Michael_Jackson Michael Jackson] died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour <ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it is able to adjust to increase in traffic. Even if the web application is overwhelmed by a sudden spike, it should be able to quickly recover by the addition of more resources to ensure that there is minimum downtime, if not zero. | ||
=Good | =Good Programming Practices for Scalability= | ||
While some of the practices include following certain approaches during coding, many of the good practices are the outcomes of good architectural design for an application. Some of the good programming practices to support a scalable web application is listed below. | While some of the practices include following certain approaches during coding, many of the good practices are the outcomes of good architectural design for an application. Some of the good programming practices to support a scalable web application is listed below. | ||
Line 38: | Line 38: | ||
Similarly, it also a good practice to compress the data while sending it over the network. The time spent in compressing and decompressing the data is overshadowed by the time saved while sending a smaller payload over the network. | Similarly, it also a good practice to compress the data while sending it over the network. The time spent in compressing and decompressing the data is overshadowed by the time saved while sending a smaller payload over the network. | ||
==Design around seek== | ==Design around seek== | ||
A lot of overhead is incurred because of IO operations. A good programming practice is to reduce the number of seeks to read data. This can be achieved by ensuring that logically coherent data is written in sequential locations instead of random locations. Caching data also helps to reduce the number of seeks needed. | A lot of overhead is incurred because of IO operations. A good programming practice is to reduce the number of seeks to read data. This can be achieved by ensuring that logically coherent data is written in sequential locations instead of random locations. Caching data also helps to reduce the number of seeks needed. | ||
=Web technology agnostic Scalability= | =Web technology agnostic Scalability= |
Revision as of 01:21, 20 September 2014
Scalability in Web Applications
This page starts with an introduction of scalability and its importance in web application. It then explains good programming practices and aspects to be considered to make a web application more scalable. In the end, there is a brief description on how to achieve scalability in Rails application
Introduction
Scalability is the ability of a system, network or process to handle an increase in load in a capable manner or its ability to expand to accommodate this increase. In a typical scenario, it can be understood as the ability of a system to perform extra work by the addition of resources (mostly hardware). It is generally difficult to define scalability precisely and in any particular case it is necessary to define the specific requirements for scalability on those dimensions that are deemed important. It is a highly significant issue in electronics systems, databases, routers, and networking. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system. A very good example of a scalable system would be the Domain Name System. The distributed nature of the Domain Name System allows it to scale well even when new systems are added all over the world daily.
Scalability in Web Applications
Scalability of a web application is its ability to handle an increase in number of users or site traffic in a timely manner. Ideally, the experience in using the web application should be the same if the site has a single user or a million user. This kind of scalability is usually achieved with the addition of hardware in either a scaling up (vertical scaling) or a scaling out (horizontal scaling) manner. A typical web application follows a multilayered architecture. The web application is considered to be scalable if each layer or component in its architecture is scalable. Fig 1 shows a web application which is able to linearly scale out with the addition of resources in its application layer and database layer.<ref>http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html</ref>
Need for scalability in web applications
In this age ruled by the internet, web applications or web sites are essential for a company’s survival. Most applications are generally developed and tested with an initial small load. As the application gains popularity, the number of users ie traffic to the web site increases. The server which was able to handle the initial load might not be able to handle the new traffic. This causes the application to become unresponsive and result in a bad user experience which would affect the web site in an economic and/or defaming manner. It is thus important to design a web application with scalability in mind i.e. the application should be able to scale with the addition of new hardware without affecting the usability of the web application. Many popular web applications, especially social networking sites like Facebook and Twitter thrive because of their ability to scale for their ever increasing user base. For eg. Twitter has grown from 400,000 tweets posted per quarter in 2007<ref>http://www.telegraph.co.uk/technology/twitter/7297541/Twitter-users-send-50-million-tweets-per-day.html</ref> to 200,000,000+ monthly users and 500,000,000+ daily tweets in <ref>http://www.sec.gov/Archives/edgar/data/1418091/000119312513390321/d564001ds1.htm</ref>. However there are instances where even the most scalable architecture might not be able to handle the traffic. As an example, when American singer Michael Jackson died on June 25, 2009, Twitter servers crashed after users started updating their status at a rate of 100,000 tweets per hour <ref>http://news.bbc.co.uk/2/hi/technology/8120324.stm</ref>. Therefore, it is important for a web application to be highly scalable so that it is able to adjust to increase in traffic. Even if the web application is overwhelmed by a sudden spike, it should be able to quickly recover by the addition of more resources to ensure that there is minimum downtime, if not zero.
Good Programming Practices for Scalability
While some of the practices include following certain approaches during coding, many of the good practices are the outcomes of good architectural design for an application. Some of the good programming practices to support a scalable web application is listed below.
Database Indexes + Query Optimisation<ref>http://mrkirkland.com/prepare-for-web-application-scalability/</ref>
Not just with respect to scalability, but having indexes is a must in databases. Getting the indexes right is going to significantly improve performance. For example, consider a table ORDER which has a column, CustomerID referenced from another table CUSTOMER. ie CustomerID is a foregin key in the table ORDER. If the table does not have an index for CustomerID, then a query to retrieve the orders for a customer would require the entire ORDER table to be scanned to get the required Orders. This would be a serious performance hit and would be terrible when scaled, especially if getting the orders for a customer is a frequent operation. Having an index for CustomerID in ORDER table would prevent having to scan the entire ORDER table. Also, perform queries using the power of databases. For example, doing a loop to retrieve the rows of a table is a big NO. Instead, all DB operations must be set-based operations as much as possible.
Use parallel processing <ref>http://lwn.net/Articles/441790/</ref>
It is a good practice to make isolate small tasks and run them in parallel. For example, if the web application has a page to update information and a widget on the same page showing a live news feed, they should be running in parallel as both these tasks are independent. Most machines these days have multi-core processors and have support for multi-threading which should be taken advantage of.
Save minimal data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>
It is a common practice to save data in server-side sessions between HTTP requests. This increases the per connection overhead and hence does not scale well. A good practice is to save as much data as possible on the client side in cookies. Trust concerns can be mitigated by encrypting the information. Only the required information should be sent to the server, purely on as much as required basis.
Use a modular design
There should be zero coupling between different independent components such as :
- Media server.
- DB server
- Mail server
- Front End(UI)
- Back End etc.
Combining these logically separate entities into the same module is a recipe for disaster in terms of maintainability, debugging, reusability etc.
Compressing data<ref>http://www.rackspace.com/blog/writing-code-that-scales/</ref>
If the application receives a lot of data which has to be stored but which wouldn’t be used frequently, it is a good idea to compress and store the data. For example, old emails(say >3 months) in a mail server. Since the chance of old emails being checked is low, this practice saves a lot of money in terms of disk storage costs, especially when the application is supporting a huge number of users. Similarly, it also a good practice to compress the data while sending it over the network. The time spent in compressing and decompressing the data is overshadowed by the time saved while sending a smaller payload over the network.
Design around seek
A lot of overhead is incurred because of IO operations. A good programming practice is to reduce the number of seeks to read data. This can be achieved by ensuring that logically coherent data is written in sequential locations instead of random locations. Caching data also helps to reduce the number of seeks needed.
Web technology agnostic Scalability
Vertical Scaling
This is generally the the first as well as most intuitive approach to achieve scalability. Let us take an analogy to understand this concept. let’s say you have a 512MB RAM computer. Initially when it was brand new and you had very few applications running on it, it was working fine. But now, you’ve started installing many applications as well as started multitasking with many applications running together. this has caused your machine to become laggy and you regularly get “Application not responding” errors. the obvious thing that you’ll do is upgrade your machine to have lets say 2GB of RAM instead of the previous 512 MB of RAM. Now, all applications run as smoothly as before. the same is done to scale web applications. Here, the bottleneck is your server. It can only accommodate a fixed number of requests per second. if you start getting more traffic to your web servers, the users will experience lags while accessing your application. So, now you go out and buy a web server which offers a higher performance than the previous one. This is called vertical scaling.
The obvious benefit of this approach is that it is so simple that it doesn’t require any complex changes to be made in order to get it up and running. You simply port your code to the new machine and it will run fine
the drawback of this approach is that eventually you will run out of resources to scale vertically. these resources could either be Money or even technological barriers. there may not be a machine developed yet which is as powerful as you currently need.
Horizontal Scaling
This approach comes in when you have reached a technical barrier to scale vertically or you do not have enough money to buy that powerful machine you need. This approach also has the same motive, to make sure that all machines are run below the max load so as to behave in an acceptable manner. But, instead of having a more powerful machine, this approach utilizes many smaller machines to achieve the same end result. This approach requires use of a load balancer to do the majority of the heavy lifting.
the advantage of this approach is that it is cost effective. but, it has a major disadvantage in that it complicates the system. Now, setting this up is complex and managing them is a major issue. Now you need to make sure that you split the traffic evenly among the servers so as to keep everyone running under max load. You now also need to worry about consistency in the system as you still need to make the multiple servers seem like a single logical system to the end user
Caching
This could be a really effective technique if your application deals with more or less similar results / queries most of the time. the crux of this method is to cache the data in an In-Memory cache. this data could be results of the queries to the database or a collections of objects in the database. This saves repeated calls to the database thus saving computation time as well as database access bottleneck. There are a good number of tools like Memcached (http://memcached.org/) and Redis(http://redis.io/) available which help ease the process of utilizing this method
The great part about this is that you can store any values that you want in these caches. It could be as simple as a leaderboard or as complex as a multi - table query. These caches are stored as Hashes (key-value pairs).
These are tested tools being used by some of the top web application product companies (Youtube, Twitter user Memcached ; GitHub, Pinterest, StackOverflow use Redis)
One more advantage of these tools are that they are open source and have an active community support
Load Balancing
Let us assume your application become an overnight success and you are getting thousands of requests / sec. Now, if you have just one server, it will become a bottleneck and your site speed will suffer,. to overcome this, you added 2 more servers so that you can route incoming traffic into other web servers too. This looks like a really appealing solution until you run into the fundamental problem of how are you going to make sure that the incoming traffic is not unevenly distributed to the servers. You would be using the load balancer in this case. A load balancer is a software / hardware that is used to distribute the incoming traffic evenly among the web servers. The load balancer can utilize many different metrics to achieve this.It could use a simple round robin approach cycling between the web servers. It could also use a more intelligent approach by checking the approximate load on each server and then using it to allocate web requests to web servers. whichever way it uses, the overall effect is the speedup of applications and better response times
Some of the benefits of this approach is that you do not need to make public the internal structure of your web application network components like IP address of each server. the load balancer can act as a single point of entry to the application.
There are a few caveats in this method though. We would like to maintain sessions in a single server i.e. If my first request if to server A, for the rest of the requests, it would be beneficial to use the same web server. For example, If i have authenticated as a user in web server A and if in my next request goes to web server B, then it may happen that my authentication object is not passed and the user is asked to authenticate itself once again.in a more believable scenarion, it would be faster to search all related tasks in a web application server than to call everything again in all the web servers.
Database Replication
In the same scenario as before, let us assume that now you have multiple servers but you only have one database. Now, you’re back at more or less the same position that you are in as before. Your rendering has become faster but the reads / writes are still a problem as there is just one database and the web servers need to wait for their turn to access it.
An obvious solution to this is to replicate the database so now you’ll have multiple copies of the databases too. but it comes with its own set of difficulties. You now need to make sure that the data is consistent across the databases. So any writes that you have will be written back to all the databases. This will be a time consuming as well as a complex task. one could argue that instead of having writes to all databases, any web request should write to just one of the database and let it propagate the changes to the others. This is a feasible solution if your application is read heavy. if it is write heavy, there is another approach known as database partitioning as we will see in the next section.
Database partitioning
From the previous section, we know that database partitioning will be a useful incase of a write heavy web application. There are a couple of techniques that form a part of this general umbrella of database partitioning techniques. one way is to maintain separate databases for separate kinds of data eg. separate media database, separate user profile database et. al.
Another widely used approach is sharding. In this technique, you split the data horizontally. i.e. , instead of separating columns of data alone, you split the data rows / tuples. The main goal behind this step is to convert the large , bulky database into smaller, manageable databases which do not have an interdependency on each other. for eg, you could split a customers database geographically, so, a user who is a part of the asia region will be stored separately from an american user.
The following are the advantages of this technique: High availability - only part of the application is affected if one of the databases go down Faster queries - since each database has less amount of data, queries take lesser time More write bandwidth - since data is separate, you could write in multiple databases at the same time
Scalability in Rails applications
Ruby on Rails(RoR), simply referred to as Rails, is a very elegant open source web application framework for creating the full stack by emphasizing on the use of well-known software engineering patterns and paradigms.<ref> http://guides.rubyonrails.org/getting_started.html#what-is-rails-questionmark</ref>. Many high-profile web firms use Ruby on Rails to build agile, scalable web applications. Some of the largest sites running Ruby on Rails include GitHub, Yammer, Scribd, Shopify, Hulu, and Basecamp<ref> http://articles.businessinsider.com/2011-05-11/tech/30035869_1_ruby-rails-custom-software</ref>. As of May 2014, it is estimated that more than 600,000 web sites are running Ruby on Rails.<ref><http://trends.builtwith.com/framework/Ruby-on-Rails></ref>
When Twitter had some significant downtimes in early 2008, a widespread myth started about Rails, ‘Rails doesn’t scale’<ref> http://techcrunch.com/2008/05/01/twitter-said-to-be-abandoning-ruby-on-rails></ref><ref>http://canrailsscale.com/></ref>. However, when Twitter’s lead architect, Blaine Cook held Ruby blameless<ref> http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster</ref>, the developer community came to the general conclusion that it is not the languages or frameworks that is a bottleneck to scaling, but the architecture. Therefore, Rails does scale well, but it is dependent on the condition that it has been architected to scale well.
Even though it applies to web applications in general, the following sections describe practices which help in the development of scalable Rails applications.
Optimizing the application
In spite of the fact that Ruby has a lot of limitations in terms of lack of true support for multithreading, a lot of scaling issues can be solved by optimizing certain ways in which operations are done.
Scaling Reads by Precomputing<ref>http://devblog.moz.com/2011/09/scaling-the-f-out-of-a-rails-app/</ref>
Often, because of the quick nature of development of Rails applications, developers tend to miss anticipating operations which might require a lot of time to compute as data grows in the future. A typical example of such an operation would be an aggregate view which is generated based on usage in the past. At some point, calculating the aggregate view would take a lot of time and would result in a long time to render UI controls like graphs if it is done on-demand. To solve such issues, it is a good idea to precompute the values, frequency depending on the input volume, and store the results. Rendering would eventually be reduced to either displaying the data or doing minimal calculation so that the end-user does not have to wait long for results.
Cache expensive operations
Sometimes it is not possible to precompute for every possible set of operation. However, if such operations are going to be used frequently in a short period of time, it is a good idea to cache the first result so that subsequent results can be displayed faster. A typical example<ref>https://speakerd.s3.amazonaws.com/presentations/50044249a555b00002037c2c/Fast.pdf</ref> would look like:
Caching in Rails.rb (Slower first time)
Benchmark.measure do Rails.cache.fetch(“cache_key”) do User.some_expensive_query end end.real => 20.5s
Caching in Rails 2.rb (Super fast after that)
Benchmark.measure do Rails.cache.fetch(“cache_key”) do User.some_expensive_query end end.real => 0.00001s
Make the databases queries faster<ref>http://rakeroutes.com/blog/increase-rails-performance-with-database-indexes/</ref>
Database accesses is a significant factor affecting the throughput of a web application. Some of the common methods to make database IO faster are:
- Indexing
DB reads can increase significantly by the addition of database indexes. Indexes are helpful when a search has to be made for an attribute other than id. For example, a query in the Products table,
products.where(:name => :shampoo)
would result in the query having to look at every row of the table and see if the name of the product is shampoo. This unnecessarily expensive operation can be speeded up by the addition of an index as below
class AddNameIndexToProducts < ActiveRecord::Migration def change add_index: :products, :name end end
This eliminates the need to look-up every row as all the products with name ‘shampoo’ would have references stored together and thus can be quickly accessed. Another common example of indexes is in join statements. i.e if the design has belongs_to and has_many relationship. Rails does not create indexes by default for the foreign keys. Creation of an index similar to the one shown above in the table for the foreign key would also result in a significant improvement for join queries. In Ruby 3.2, the automated explain plan<ref>http://weblog.rubyonrails.org/2011/12/6/what-s-new-in-edge-rails-explain</ref> was introduced which writes out warnings in application logs when the query takes longer than 0.5s(by default). It is a useful tool to analyze the performance of queries and can give important pointers to identify where indexes should be added. In most web applications, the percentage of reads is significantly higher than writes and hence they can really benefit by using indexes. However, if a particular table has mostly writes then it would not be a good idea to have indexes for the table as having the index causes extra work while inserting and updating entries.
- Eliminating n+1 queries
n+1 queries are solved by using eager load. Consider the following example<ref>http://guides.rubyonrails.org/active_record_querying.html</ref>,
users = User.limit(20) users.each do |user| puts user.address.city end
Assuming users and address are different tables, the above set of statements would result in 21 queries. One for retrieving 20 users and then twenty queries for getting the city of each user. This is known as the n+1 problem. This solution can be mitigated by using an eager load. In this case, we can load the addresses along with user with a statement like this.
users = User.includes(:address).limit(20) users.each do |user| puts user.address.city end
This code will only require 2 queries, one to get the users and one to get their associated addresses. This is known as eager loading where the data is loaded in advance. However, this should be used only when it is absolutely require. Unused eager loading might burden the IO bandwidth by retrieving unnecessary data. Gems like Bullet can be used analyze database queries and detect n+1 queries and unused eager loading.
Architecture
Irrespective of the efficiency of the code, at some point, the hardware resources reach the threshold of the maximum load it can take as the application goes popular. It is thus essential to design applications in a way that the application can take advantage of the additional resources added to it. The most essential to support this requirement is to write applications in a modularized way. Thus, a typical Rails application should have the following three parts as separate components:
- Web: The front end or the web module should run Ruby code. Its sole purpose is to handle requests and/or render UI to the user to display information
- Worker:
The worker module performs compute-intensive operations or other background jobs such as sending emails or running Resque which is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
- Datastores:
Datastores persist the data. Typically, in the form of databases, datastores should exist on separate machines.
A modularized architecture like this can be horizontally scaled or vertically scaled depending on which module needs to have the performance increased.
Application Deployment
At a future stage, when the user base grows beyond a certain threshold, it is a good idea to add load balancers to distribute the incoming requests. A very popular deployment model is the use of Unicorn Application Servers and Nginx Load Balancers.<ref>https://www.digitalocean.com/community/tutorials/how-to-scale-ruby-on-rails-applications-across-multiple-droplets-part-1</ref>
Unicorn Application Server<ref>https://github.com/blog/517-unicorn</ref>
Unicorn is an application server which runs Rails applications to process the incoming requests. These application servers are responsible for handling only those requests that need processing after having them filtered and pre-processed by the front-facing Nginx servers. Unicorn is simplistic in design by handling what needs to be done by a web application and it delegates the rest of the responsibilities to the operating system.
Unicorn's master process spawns slave processes to serve the requests. This master process also monitors the other processes in order to prevent memory and process related staggering issues.
Nginx HTTP Server / Reverse-Proxy / Load-Balancer<ref>http://nginx.org/en/docs/http/load_balancing.html</ref>
Nginx HTTP server is a multi-purpose, front-facing web server. It acts as the first entry point for requests and is capable of balancing connections and dealing with certain exploit attempts. It filters out the invalid requests and forwards the valid requests to the Unicorn Application servers to be processed. Together, they serve well to make Rails applications scalable.
References
<references/>