CSC/ECE 517 Fall 2015/ossA1550RAN

From Expertiza_Wiki
Latest revision as of 19:03, 8 November 2015

A1550 - Web Socket Implementation in Apache Ambari

Ambari-Web uses a simple AJAX polling mechanism to fetch data from Ambari-Server. It polls constantly to show the current service status, alerts, service graphs, and so on. On a large cluster with multiple active browser sessions, this continuous stream of heavy requests can degrade the performance of Ambari-Server.

WebSocket is a protocol providing full-duplex communication channels over a single TCP connection. Implementing WebSocket between Ambari-Web and Ambari-Server would help address this scenario, because the server can push updates to the browser instead of answering repeated polls.
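The difference between the two models can be illustrated with a minimal in-memory sketch. It is not Ambari code: `StatusPublisher` and its methods are hypothetical names, and no network or WebSocket library is involved. It only shows that polling costs one request per call whether or not anything changed, while a push-style subscriber is notified only on an actual change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a server that tracks one service status. */
class StatusPublisher {
    private final List<Consumer<String>> subscribers = new ArrayList<>();
    private String status = "UNKNOWN";
    private int pollRequests = 0;

    /** Pull model: every call costs one request, whether the status changed or not. */
    String poll() {
        pollRequests++;
        return status;
    }

    /** Push model: a client registers once and immediately receives the current status. */
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
        subscriber.accept(status);
    }

    /** Notifies subscribers only when the status actually changes. */
    void update(String newStatus) {
        if (newStatus.equals(status)) {
            return;
        }
        status = newStatus;
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(newStatus);
        }
    }

    /** Number of poll() calls served so far. */
    int pollRequests() {
        return pollRequests;
    }
}
```

In this sketch a subscriber that stays connected across three `update` calls, two of which carry the same status, receives only the initial status and the two real changes, whereas a polling client would generate a request at every interval regardless.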

What is Apache Ambari

Apache Ambari is a software project of the Apache Software Foundation that aims to make Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. Ambari was a sub-project of Hadoop but is now a top-level project in its own right.

An Apache Hadoop cluster consists of a group of nodes (machines). Each node acts as an Ambari Agent that runs various service components (e.g. DataNode for HDFS). One of the nodes acts as the Ambari Server, which handles task allocation and management, gathers service statuses and other information from the agents, and passes that information to the Ambari UI for display. More information about the Agent-Server-Web flow is available in the following sections.

Current Implementation

Ambari is a multi-module Maven project. From an implementation perspective, the modules relevant to our project are Ambari Web and Ambari Server. Ambari Agent is another module that is functionally important.

Ambari Agent is a component that resides on every node of the cluster. Its responsibility is to collect the node's overall health, which comprises the health metrics of all the services running on that node. Ambari Agent sends these statuses to the Ambari Server as periodic messages, called 'heartbeats' in Ambari nomenclature. In return, Ambari agents receive heartbeat responses from Ambari Server. Heartbeat responses can contain commands for the agent to perform.

Ambari Web presents a user interface to monitor and manage a Hadoop cluster. The Web module is written in JavaScript using the Ember.js framework. During our code study, we observed that this module follows the MVC design methodology. The purpose of this module is to make it convenient for the end user to manage the Hadoop cluster.

Ambari Server is the central component of the Ambari ecosystem. It exposes APIs to its clients, such as Ambari Web, which monitors the health of a cluster. Clients can query Ambari Server for node health statuses, which the server persists in a PostgreSQL database in the backend. Internally, Ambari Server has several components that help it manage its responsibilities. Its most important responsibilities are as follows:

1. Collect health statuses (heartbeats) from Ambari agents, process them, and send heartbeat responses back to the agents.

2. Handle requests from Ambari Server's clients. One such use case is installing a new service on an Ambari agent: the request could come from Ambari Web, and it is the server's responsibility to send the appropriate instructions to that agent to get the new service up and running.
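The two responsibilities above can be sketched together as a simplified heartbeat exchange. This is an illustration only, not the actual HeartBeatHandler: the class names, fields, and the command string format are assumptions made for the example. The server records what each agent reports and returns any commands that a client queued for that host.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical heartbeat message: one agent reporting its service states. */
final class Heartbeat {
    final String hostname;
    final Map<String, String> serviceStates;

    Heartbeat(String hostname, Map<String, String> serviceStates) {
        this.hostname = hostname;
        this.serviceStates = serviceStates;
    }
}

/** Hypothetical heartbeat response: commands the agent should run next. */
final class HeartbeatResponse {
    final List<String> commands;

    HeartbeatResponse(List<String> commands) {
        this.commands = commands;
    }
}

/** Simplified server side: record reported states, hand out queued commands. */
class HeartbeatProcessor {
    private final Map<String, Map<String, String>> latestStates = new HashMap<>();
    private final Map<String, Deque<String>> pendingCommands = new HashMap<>();

    /** Called when a client (for example the web UI) requests an action on a host. */
    void queueCommand(String hostname, String command) {
        pendingCommands.computeIfAbsent(hostname, h -> new ArrayDeque<>()).add(command);
    }

    /** Stores the reported states and drains the commands queued for that host. */
    HeartbeatResponse handle(Heartbeat heartbeat) {
        latestStates.put(heartbeat.hostname, new HashMap<>(heartbeat.serviceStates));
        Deque<String> queue = pendingCommands.remove(heartbeat.hostname);
        List<String> commands = new ArrayList<>();
        if (queue != null) {
            commands.addAll(queue);
        }
        return new HeartbeatResponse(commands);
    }

    /** Returns the last reported state of a service on a host, or null if unknown. */
    String stateOf(String hostname, String service) {
        Map<String, String> states = latestStates.get(hostname);
        return states == null ? null : states.get(service);
    }
}
```

A command queued for a host is delivered in the response to that host's next heartbeat and is not repeated in later responses.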

Testing Framework

The codebase is very well supported with unit tests. The following are a few important characteristics of the testing process that we observed during our code study:

1. Mockito, a popular mocking framework, is used to mock responses from downstream dependencies. An in-memory cluster is also mocked using Mockito. Mocking keeps the test cases fast and independent, and this project demonstrates that well.

2. Every controller method is tested thoroughly. The tests are extensive and cover almost all code branches.

3. Only the results or responses obtained from controller methods are asserted. We did not find any significant "verify() / expect()" calls, which are used to assert that a given method was invoked during test execution. This is a good practice: the tests are result-oriented and are not bound to the method implementation in any way. It also makes the tests stable, because a future change to a method's implementation will not fail them as long as the results stay the same.
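The result-oriented style described in point 3 can be sketched as follows. To keep the example dependency-free, a hand-written stub stands in for a Mockito mock, and `ServiceStateSource`, `ClusterHealthController`, and the state strings are hypothetical names rather than Ambari classes. The check asserts only on the returned value and never on which methods were invoked.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical downstream dependency that a controller would query. */
interface ServiceStateSource {
    List<String> statesFor(String clusterName);
}

/** Hypothetical controller under test. */
class ClusterHealthController {
    private final ServiceStateSource source;

    ClusterHealthController(ServiceStateSource source) {
        this.source = source;
    }

    /** Returns "HEALTHY" only if every reported service state is STARTED. */
    String clusterHealth(String clusterName) {
        for (String state : source.statesFor(clusterName)) {
            if (!"STARTED".equals(state)) {
                return "DEGRADED";
            }
        }
        return "HEALTHY";
    }
}

class ClusterHealthControllerCheck {
    static void run() {
        // Hand-written stubs standing in for a mocked dependency.
        ServiceStateSource allUp = cluster -> Arrays.asList("STARTED", "STARTED");
        ServiceStateSource oneDown = cluster -> Arrays.asList("STARTED", "INSTALLED");

        // Assert only on the returned result, not on how it was computed.
        check("HEALTHY".equals(new ClusterHealthController(allUp).clusterHealth("c1")));
        check("DEGRADED".equals(new ClusterHealthController(oneDown).clusterHealth("c1")));
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("unexpected cluster health");
        }
    }
}
```

Because nothing in the check depends on the loop inside `clusterHealth`, that method could be rewritten (for example with streams) without touching the test.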

Steps to build and test Ambari-WebSocket

1. Clone the repository to your local system

  git clone https://github.com/nisarg64/ambari

2. Go to the ambari directory and check out the "migration" branch using the command below

  git checkout migration

3. You will need Node.js, Maven, and other tools to build Ambari. Check the requirements described in Ambari PreConfigs (https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Development)

4. Once the required configuration is in place, run the command below to build

  mvn clean package -DskipTests -Drat.ignoreErrors=true


Project Goals, Benefits and Challenges

Goals:

1. Understand the architecture of Ambari-Server and Ambari-Web

2. Replace the current pull-based mechanism with a push-based mechanism via WebSocket

3. Write test cases for the WebSocket implementation that test the functionality of the WebSocket client and the WebSocket server

Benefits:

1. WebSocket will allow Ambari-Server to perform robustly in a large cluster setup with multiple browser sessions, even when continuous heavy requests are being made

Challenges:

1. Ambari Server uses Jetty 8.x, while the WebSocket support needed for this project became available in Jetty 9.x

2. Ambari has a huge codebase with multiple modules, frameworks, and design patterns. Understanding the flow of the project and taking care of dependencies is a major challenge

3. Testing the code requires a cluster of around 3 nodes, and running 3 virtual machines requires a high-performance machine.

Learning Outcomes

1. We observed that the Ambari project adopts various design patterns, such as the Singleton pattern in the Data Access Object classes

2. We also noticed that one of the classes, HeartBeatHandler, obeys the Law of Demeter

3. For testing, the project uses the Mockito framework to mock the heartbeat and cluster functionality

4. For configuration management, the Puppet configuration management tool is used

5. Ambari-Web is implemented in Ember.js, an open-source JavaScript framework that follows the MVC pattern, similar to the Rails framework.

6. Ambari-Server uses Jetty, a web server, to handle all the HTTP requests made to Ambari-Server from the UI
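As an illustration of the Singleton pattern mentioned in point 1, here is a minimal sketch of a data access object with a single shared instance. `HostStateDao` and its methods are hypothetical and use an in-memory map instead of a database; the actual Ambari DAO classes may be structured or wired differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DAO; the name and methods are illustrative only. */
final class HostStateDao {
    // Initialization-on-demand holder: the JVM creates the instance
    // lazily and thread-safely on first access to Holder.
    private static final class Holder {
        static final HostStateDao INSTANCE = new HostStateDao();
    }

    // In-memory storage standing in for a database table.
    private final Map<String, String> statesByHost = new ConcurrentHashMap<>();

    // Private constructor prevents callers from creating further instances.
    private HostStateDao() {
    }

    /** Returns the single shared instance. */
    static HostStateDao getInstance() {
        return Holder.INSTANCE;
    }

    void save(String hostname, String state) {
        statesByHost.put(hostname, state);
    }

    /** Returns the stored state for a host, or null if none was saved. */
    String find(String hostname) {
        return statesByHost.get(hostname);
    }
}
```

Every caller of `HostStateDao.getInstance()` receives the same object, so a state saved through one reference is visible through any other.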

Github Location

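Ambari Project: https://github.com/apache/ambari

Forked Repository:

Forked Repo: https://github.com/nisarg64/ambari

Changes committed to forked repository:

Code changes: https://github.com/nisarg64/ambari/commit/f7b9131cb7324dda39b95db72673a63ab2ac109b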

Major code changes:

Addition of WebSocket client-server files WebSocketClient.java and WebSocketServer.java
Changes in the heartbeat handler, HeartBeatHandler.java
Version changes in POM files.

Note: We are required to submit only one pull request, and only at the end of the final project.
