CSC/ECE 517 Fall 2015 A1550 Web Socket Implementation in Apache Ambari

From Expertiza_Wiki

Introduction

Apache Ambari is a software project of the Apache Software Foundation, aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. Ambari was a sub-project of Hadoop but is now a top-level project in its own right.

An Apache Hadoop cluster consists of a group of nodes/machines. Each node acts as an Ambari Agent that runs various service components (e.g. DataNode for HDFS). One of the nodes acts as the Ambari Server, which takes care of task allocation and management, gathers service status and other information from the agents, and passes that information to the Ambari UI for display. More information about the Agent-Server-Web flow is available in the following sections.

Ambari-Web uses a simple AJAX polling mechanism to fetch data from Ambari-Server. It polls constantly to show the current service status, alerts, service graphs, etc. On a large cluster with multiple active browser sessions, this continuous stream of heavy requests can degrade the performance of Ambari-Server.

WebSocket is a protocol that provides full-duplex communication channels over a single TCP connection. Implementing WebSocket between Ambari-Web and Ambari-Server addresses this scenario, because the server can push updates instead of answering repeated polls.
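To make the load difference concrete, here is a back-of-the-envelope sketch in Java. The class and method names are hypothetical, and the session count, polling interval, and number of status changes used below are illustrative assumptions, not measurements taken from Ambari.

```java
// Hypothetical helper (not part of Ambari): compares message volume
// for interval polling against server push.
final class PollingLoadEstimate {
    private PollingLoadEstimate() {}

    /** Requests the server must answer when every session polls at a fixed interval. */
    static long pollingRequests(int sessions, long durationSeconds, long intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        return sessions * (durationSeconds / intervalSeconds);
    }

    /** Messages the server sends when it pushes each status change to every session. */
    static long pushMessages(int sessions, long statusChanges) {
        return sessions * statusChanges;
    }
}
```

Under these assumptions, 20 browser sessions polling one endpoint every 5 seconds for an hour generate `pollingRequests(20, 3600, 5)` = 14400 requests, whether or not anything changed. If the cluster status changes 30 times in that hour, `pushMessages(20, 30)` = 600 messages are sent.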

Project Description

Note: This is a continuation of our OSS project.

Project Goals

1. Understand architecture of Ambari-Server and Ambari-Web

2. Replace the current polling mechanism with a push-based mechanism via WebSocket

3. Write test cases for the WebSocket implementation that exercise the functionality of the WebSocket client and the WebSocket server

Project Design

Ambari is a multi-module Maven project. From an implementation perspective, the modules relevant to our project are Ambari Web and Ambari Server. Ambari Agent is another module that is functionally important.

Major Components

Ambari Agent is a component that resides on every node of the cluster. Its responsibility is to collect the node's overall health, which comprises the health metrics of all the services running on that node. Ambari Agent sends these statuses to the Ambari Server as periodic messages, termed 'Heartbeats' in the Ambari nomenclature. In return, Ambari agents receive Heartbeat responses from Ambari Server. Heartbeat responses can contain commands for the agent to perform.

Ambari Web presents a user interface to monitor and manage a Hadoop cluster. The Web module is written in JavaScript using the Ember.js framework. During our code study, we observed that this module follows the MVC design methodology. The purpose of this module is to make it convenient for the end user to manage the Hadoop cluster.

Ambari Server is the central component of an Ambari ecosystem. Ambari Server exposes APIs to its clients, such as Ambari Web, which monitors the health of a cluster. Ambari Server clients can query for node health statuses. Ambari Server persists the nodes' statuses in a PostgreSQL database in the backend. Internally, Ambari Server has several components that help it manage its responsibilities. Ambari Server's important responsibilities are as follows:

1. Collect health statuses (Heartbeats) from Ambari agents, process them, and send Heartbeat responses back to the agents.

2. Handle requests from Ambari Server's clients: One such use case is installing a new service on an Ambari agent. This request could be received from Ambari Web and it is the server's responsibility to send appropriate instructions to the particular agent to get the new service up and running on the agent.

Architectural design of WebSocket client-server in Apache Ambari

Flow Diagram

Implementation

Design Pattern

We observed that the Ambari project adopts various design patterns, such as the Singleton pattern in the Data Access Object classes. For this project, we will use request-response, one of the messaging patterns.

Request–response is a message exchange pattern in which a requestor sends a request message to a replier system, which receives and processes the request and ultimately returns a message in response. This simple but powerful messaging pattern allows two applications to have a two-way conversation with one another over a channel. In our project, Ambari-Server's HeartBeat handler acts as the requestor and sends the message to the Ambari UI via the WebSocket protocol. In response, the Ambari UI acknowledges the received information and sends new requests, if any, to Ambari-Server. The same messaging pattern is followed for bidirectional communication between Ambari-Agent and Ambari-Server.
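The exchange described above can be sketched in plain Java as follows. This is a minimal in-memory illustration of the pattern, not Ambari code: all type names (`StatusUpdate`, `Reply`, `Replier`, `Requestor`) and the follow-up request string are hypothetical, and a direct method call stands in for the WebSocket channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Request message: a status pushed by the requestor (here, the heartbeat handler). */
final class StatusUpdate {
    final String host;
    final String status;
    StatusUpdate(String host, String status) { this.host = host; this.status = status; }
}

/** Response message: an acknowledgement plus an optional new request from the replier. */
final class Reply {
    final boolean acknowledged;
    final Optional<String> followUpRequest;
    Reply(boolean acknowledged, String followUpRequestOrNull) {
        this.acknowledged = acknowledged;
        this.followUpRequest = Optional.ofNullable(followUpRequestOrNull);
    }
}

/** Replier side of the pattern (here, the UI). */
interface Replier {
    Reply handle(StatusUpdate update);
}

/** Requestor side: sends a request and processes the response it gets back. */
final class Requestor {
    private final Replier replier;
    private final List<String> followUps = new ArrayList<>();

    Requestor(Replier replier) { this.replier = replier; }

    Reply send(StatusUpdate update) {
        Reply reply = replier.handle(update);
        // Queue any new request the replier attached to its acknowledgement.
        reply.followUpRequest.ifPresent(followUps::add);
        return reply;
    }

    /** Returns a copy of the follow-up requests collected so far. */
    List<String> followUps() { return new ArrayList<>(followUps); }
}
```

A replier can be supplied as a lambda, for example `new Requestor(u -> new Reply(true, null))` for a UI that only acknowledges and never asks for more.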

UML Diagrams

Class Diagram

1. A new class is added for each of the WebSocket client and the WebSocket server.

2. One method is added to the existing class "HeartBeatHandler".

WebSocketClient

1. onOpen: An event listener called when the connection to the server is established.

2. onClose: An event listener called when the connection to the server is closed.

3. onText: An event listener called when a message is received from the server.

4. getLatch: Method to get the CountDownLatch.

5. sendMessage: Method to send a message to the server.
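The five methods listed above could fit together roughly as in the following sketch. It is written against the JDK only so that it runs on its own: the `Consumer<String>` passed to `onOpen` is a hypothetical stand-in for the WebSocket session, and the class name and `received()` accessor are illustrative. In a Jetty 9 implementation, the callbacks would normally be bound through Jetty's WebSocket annotations instead. The sketch also assumes that the latch is released when the connection closes, which the class diagram does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

// Hypothetical sketch of the client described in the class diagram.
final class WebSocketClientSketch {
    // Assumption: the latch lets a caller wait until the connection has closed.
    private final CountDownLatch latch = new CountDownLatch(1);
    private final List<String> received = Collections.synchronizedList(new ArrayList<>());
    // Stand-in for the WebSocket session; null while disconnected.
    private volatile Consumer<String> outbound;

    /** Called when the connection to the server is established. */
    void onOpen(Consumer<String> outbound) { this.outbound = outbound; }

    /** Called when the connection to the server is closed. */
    void onClose(int statusCode, String reason) {
        outbound = null;
        latch.countDown();
    }

    /** Called when a text message is received from the server. */
    void onText(String message) { received.add(message); }

    CountDownLatch getLatch() { return latch; }

    /** Sends a message if connected; returns false when there is no open connection. */
    boolean sendMessage(String message) {
        Consumer<String> out = outbound;
        if (out == null) {
            return false;
        }
        out.accept(message);
        return true;
    }

    /** Returns a copy of the messages received so far. */
    List<String> received() { return new ArrayList<>(received); }
}
```

A caller that needs to block until the exchange is over can call `getLatch().await(...)` with a timeout after sending its message.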

WebSocketServer

1. onOpen: An event listener called when a connection has been opened.

2. onClose: An event listener called when a connection has been closed.

3. onMessage: An event listener called when a message is received.
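One way the three server-side listeners could cooperate is sketched below. It assumes, beyond what the class diagram states, that the server keeps a registry of open sessions and relays each incoming message (for example, one sent by HeartBeatHandler) to all of them. The class name, the session ids, and the `Consumer<String>` used as a stand-in for a session are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of the server endpoint described in the class diagram.
final class WebSocketServerSketch {
    // Session id -> outbound channel of that session (stand-in for a WebSocket session).
    private final Map<String, Consumer<String>> sessions = new ConcurrentHashMap<>();

    /** Called when a connection has been opened: remember the session. */
    void onOpen(String sessionId, Consumer<String> outbound) {
        sessions.put(sessionId, outbound);
    }

    /** Called when a connection has been closed: forget the session. */
    void onClose(String sessionId) {
        sessions.remove(sessionId);
    }

    /** Called when a message is received: relay it to every open session (assumed behavior). */
    void onMessage(String message) {
        for (Consumer<String> outbound : sessions.values()) {
            outbound.accept(message);
        }
    }

    int openSessions() { return sessions.size(); }
}
```

Keeping the registry in a `ConcurrentHashMap` lets sessions open and close while a relay is in progress without extra locking.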

HeartBeatHandler

1. sendMessage: Method that instantiates a WebSocketClient to send a message to the server.
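The following sketch shows how such a `sendMessage` method might be hooked into heartbeat processing. It is not the actual HeartBeatHandler: the class name, the `handleHeartBeat` signature, the in-memory map that stands in for the database, the `host:status` message format, and the `Supplier` that stands in for creating a WebSocketClient are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch; the real HeartBeatHandler has many more responsibilities.
final class HeartBeatHandlerSketch {
    // Stand-in for the persisted node statuses.
    private final Map<String, String> lastStatusByHost = new HashMap<>();
    // Stand-in for "instantiate a WebSocketClient": each call yields a fresh sender.
    private final Supplier<Consumer<String>> clientFactory;

    HeartBeatHandlerSketch(Supplier<Consumer<String>> clientFactory) {
        this.clientFactory = clientFactory;
    }

    /** Records the heartbeat first, then pushes the new status. */
    void handleHeartBeat(String host, String status) {
        lastStatusByHost.put(host, status);
        sendMessage(host + ":" + status);
    }

    /** Obtains a client and sends the message through it. */
    void sendMessage(String message) {
        Consumer<String> client = clientFactory.get();
        client.accept(message);
    }

    String lastStatus(String host) { return lastStatusByHost.get(host); }
}
```

Passing the client factory in through the constructor means a test can substitute a fake sender and assert on what was pushed, without opening a real connection.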

Testing Framework

Mockito, a popular mocking framework, is used to mock responses from dependent downstream services. A whole prototype of an in-memory cluster is also mocked using Mockito. Mocking keeps the test cases fast and independent, which this project demonstrates well.

Only the results/responses obtained from controller methods are asserted. We did not find any significant "verify/expect" calls (calls that assert whether a given method was invoked during the test method execution). Hence, the tests are result-oriented and are not bound to the method implementation in any way.

Test cases

The following is a brief overview of the test cases that we plan to add to the Ambari project to test our functionality:

1. Testing that the WebSocket response to Ambari Web is triggered when a Heartbeat is received and persisted in the backend PostgreSQL DB.
2. Mocking an agent Heartbeat and asserting that the same status is reflected on the front-end Ambari Web.
3. Verifying the correctness of WebSocket behavior for agent Heartbeats received with different contents. Heartbeats can carry a variety of data, such as the node status, the status of a component on a node, or the volume of requests a node processed over a past time period.

Steps to build and test Ambari-WebSocket

1. Clone the repository to your local system

  git clone https://github.com/nisarg64/ambari

2. Go to the ambari directory and check out the branch "migration" using the command below

  git checkout migration

3. You will need Node.js, Maven, etc. to build Ambari. Check the requirements described in Ambari PreConfigs

4. Once the required configuration is in place, run the command below to build

  mvn clean package -DskipTests -Drat.ignoreErrors=true


Project Benefits and Challenges

Benefits:

  1. WebSocket will allow Ambari-Server to perform robustly in a large cluster setup with multiple browser sessions, even when continuous heavy requests are being made.
  2. This project also involves migrating the current Ambari infrastructure to Jetty 9. The benefits of this upgrade are:
    1. Inherent support for the WebSocket protocol (Jetty 8 only provides interfaces for a WebSocket implementation)
    2. Jetty 9 efficiently implements the multi-protocol, TLS, push, and multiplexing features of the SPDY protocol. SPDY is an improvement in the network transport layer, and it can greatly improve page load times without any changes to the web application.

Challenges:

  1. Ambari Server uses Jetty 8.x, while the WebSocket support this project relies on is available only from Jetty 9.x onward.
  2. Ambari has a huge codebase with multiple modules, frameworks, and design patterns. Understanding the flow of the project and taking care of dependencies is a huge challenge.
  3. Testing the code requires a cluster of around 3 nodes. Running 3 virtual machines requires a high-performance machine.
  4. We encountered the following challenges while upgrading to Jetty 9:
    1. Conflicts in Maven packages. Resolution: We ran a dependency analysis to identify the Jetty 8 packages that needed to be removed and the new Jetty 9 packages that needed to be included. This also required upgrading the Java Servlet API itself from v3.0.1 to v3.1.0.
    2. Several Jetty 8 APIs that were used in the Ambari implementation are deprecated in Jetty 9.
    3. Changes in the API from Jetty 8 to Jetty 9: A Jetty 9 server is configured quite differently from a Jetty 8 server. Some major changes are:
      1. The different types of server connectors are abstracted into a single ServerConnector class in Jetty 9.
      2. Server context factories in Jetty 9 need an HttpConfiguration instance attached to them. Previously, all of this functionality was available in the server context factory classes themselves.
    4. Issue with rendering the front end: This was caused by the different JSP engine that we switched to in the Jetty 9 version. In the implementation, the web content was stored in compressed form. Resolution: We changed the Ambari Web pom.xml so that the web content is not compressed.

Learning Outcomes

  1. We noticed that one of the classes, HeartBeatHandler, obeys the Law of Demeter.
  2. For testing, the project uses the Mockito framework to mock the heartbeat and cluster functionality.
  3. For configuration management, the Puppet configuration management tool is used.
  4. Ambari-Web is implemented in Ember.js, an open-source JavaScript framework that follows the MVC pattern, similar to the Rails framework.
  5. Ambari-Server uses Jetty, a web server, to handle all the HTTP requests made to Ambari-Server from the UI.

Next Steps

  1. Making the Jetty thread pool dynamic: In Jetty 8, the thread pool is configured dynamically. The Jetty 8 APIs that allowed this are now deprecated in Jetty 9.
  2. Adding WebSocket classes to Ambari: We developed WebSocket classes that were successfully deployed on Jetty 9 (as part of our OSS project). We need to add these classes once the migration patch is successfully applied to the latest Ambari trunk.
  3. Addressing the review comments

Github Location

Ambari Project

Forked Repository:

Forked Repo

Screencast Link

Project Walkthrough

References

https://ambari.apache.org/

https://en.wikipedia.org/wiki/Apache_Ambari

https://cwiki.apache.org/confluence/display/AMBARI/Ambari

https://issues.apache.org/jira/secure/attachment/12559939/Ambari_Architecture.pdf