CSC517 Fall 2017 OSS M1705

From Expertiza_Wiki
Revision as of 04:28, 1 April 2017 by Nbalaji (talk | contribs) (Added changes from reviews)

M1705 - Automatically report new contributors to all git repositories

Introduction

Servo<ref>https://en.wikipedia.org/wiki/Servo_(layout_engine)</ref> is a browser layout engine developed by Mozilla<ref>https://en.wikipedia.org/wiki/Mozilla</ref>. It is at an early stage, but it can already render Wikipedia and GitHub<ref>https://en.wikipedia.org/wiki/GitHub</ref>, and it successfully passes the Acid2<ref>https://en.wikipedia.org/wiki/Acid2</ref> test. It aims to create a parallel environment in which different components are handled by small, separate tasks.

New contributors are currently listed only for the servo/servo repository. The problem we have to solve is tracking this information across all the repositories in the servo organization. The final goal of this work is to build a system that uses the GitHub API to determine this information on a regular basis.


Scope - Initial Phase

The scope of the project was to complete the following steps:

  • create a GitHub organization with several repositories that can be used for manual tests
  • create a tool that initializes a JSON file with the known authors for a local git repository
  • create a tool that clones every git repository in a given GitHub organization (use the GitHub API to retrieve this information)


Project Description

The initial target of our project required us to get familiar with the GitHub API and JSON. We did not have an existing code base to start from, so we had to decide which language and tools to use for the initial tasks. The rationale behind our choices can be found in the Design Choices subsection below.


Tool # 1: JSON Author File

Create a tool that initializes a JSON file with the known authors for a local git repository

  • First, it fetches the author name from every commit in the repository.
  • Then, it adds each name to a set. A set is used so that an author who has made more than one commit is recorded only once.
  • Finally, it saves the set to a JSON file.
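The three steps above can be sketched as follows. This is a minimal illustration, not the code in our repository: it assumes the `git` command-line client is installed and calls it through Node's `child_process` module instead of a git library, and the function names `uniqueAuthors` and `writeAuthorFile` are hypothetical.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');

// Turn the raw output of `git log --format=%an` (one author name per line)
// into a sorted array of unique names. The Set removes duplicates.
function uniqueAuthors(gitLogOutput) {
  const names = gitLogOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return Array.from(new Set(names)).sort();
}

// Hypothetical helper: read the commit authors of the repository at
// `repoPath` and write them to `outFile` as a JSON array.
function writeAuthorFile(repoPath, outFile) {
  const output = execFileSync('git', ['log', '--format=%an'], {
    cwd: repoPath,
    encoding: 'utf8',
  });
  const authors = uniqueAuthors(output);
  fs.writeFileSync(outFile, JSON.stringify(authors, null, 2));
  return authors;
}
```

For example, `writeAuthorFile('.', 'authors.json')` would write the authors of the repository in the current directory. A JSON array is used because a JavaScript `Set` has no direct JSON representation.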

Tool # 2: Clone Repositories Tool

Create a tool that clones every git repository in a given GitHub organization (use the GitHub API to retrieve this information)

  • `package.json` lists the dependencies needed to run both tools.
  • `gitApi.js` clones all the repositories in the given organization into the `./tmp` folder.
    • On line 12, define the GitHub API token through the environment variable `GITHUB_KEY` (see the references below for the token).
    • On line 13, define the username of the git account user.
    • On line 18, define the organization from which the repositories are to be cloned.


git clone https://github.com/OODD-Mozilla/ToolRepository.git
cd ToolRepository
npm install  
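The cloning step described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the contents of `gitApi.js`: it builds the request for GitHub's "list organization repositories" endpoint (`GET /orgs/{org}/repos`), reads the `clone_url` field of each returned repository, and clones with the `git` command-line client. The function names and the `User-Agent` value are hypothetical, and sending the request itself (for example with the request library) is left out.

```javascript
const { execFileSync } = require('child_process');
const path = require('path');

// Build the options for listing an organization's repositories.
// GitHub rejects API requests that carry no User-Agent header.
function orgReposRequest(org, token, page = 1) {
  return {
    url: `https://api.github.com/orgs/${encodeURIComponent(org)}/repos?per_page=100&page=${page}`,
    headers: {
      'User-Agent': 'new-contributors-tool', // hypothetical client name
      Accept: 'application/vnd.github.v3+json',
      Authorization: `token ${token}`,
    },
  };
}

// Map the parsed API response (an array of repository objects) to the
// clone URL and local destination folder of each repository.
function cloneTargets(repos, destDir) {
  return repos.map((repo) => ({
    cloneUrl: repo.clone_url,
    dest: path.join(destDir, repo.name),
  }));
}

// Clone every target with the git command-line client.
function cloneAll(targets) {
  for (const { cloneUrl, dest } of targets) {
    execFileSync('git', ['clone', cloneUrl, dest], { stdio: 'inherit' });
  }
}
```

The API returns repositories in pages, so an organization with more than 100 repositories would need the request repeated with an increasing `page` value until an empty array comes back.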


Security Concern

To access and make requests to the GitHub API, a user must be properly authenticated. Details about basic authentication, along with an example, are provided on the developer's manual page. Any user can generate a personal OAuth token for his or her account from the settings page. When this randomly generated token is placed inside a source file, it authorizes the user and allows the program to make API requests. However, exposing the token in a source file stored in a version control system is the same as hard-coding a password, and it introduces a security risk for the user. One way to avoid this is to generate the token and store it in a local environment variable. When the program references the environment variable, it can read the token, but the token never appears in the source files that are pushed and pulled.

Please check the references section of our README.md, which links to instructions on (i) how to generate the token on GitHub and (ii) how to add it as an environment variable.
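A minimal sketch of reading the token from the environment is shown below. The variable name `GITHUB_KEY` is the one our tools use; the function name `getGithubToken` and the error message are hypothetical.

```javascript
// Read the GitHub token from the environment instead of the source file.
// `env` defaults to process.env; it is a parameter only so that the
// function can be exercised without changing the real environment.
function getGithubToken(env = process.env) {
  const token = env.GITHUB_KEY;
  if (!token) {
    throw new Error('GITHUB_KEY is not set; export it before running the tools');
  }
  return token;
}
```

Failing early with a clear message avoids sending unauthenticated requests by accident when the variable has not been set.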

Design Patterns

Design patterns are not directly applicable to our task. Our end goal is to identify any new contributors in all the repositories of a given GitHub organization. More details can be found on the Project Description page.

Design Choices

However, since we were not working on an existing code base, we had to make design and tool choices when we started the project. We picked NodeJS because it is a lightweight runtime environment based on JavaScript. It also has a rich collection of third-party and open source packages that can easily be added as dependencies and managed with the npm package manager. Since we were working with the GitHub API, we used two libraries, nodegit and request, to help us wrap the calls.

We rely heavily on the callbacks that JavaScript supports, since many of the function calls are asynchronous. The callback() idea falls under API patterns. Promises are another JavaScript feature that will help us when we need to manage the sequence of these asynchronous calls.
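The difference between the two styles can be illustrated with a small sketch. The function `fetchRepoNames` and the data it returns are placeholders invented for this example; only `util.promisify` and `setImmediate` are real Node.js APIs.

```javascript
const { promisify } = require('util');

// Callback style: a hypothetical asynchronous step that reports its result
// through a Node-style callback(err, result).
function fetchRepoNames(org, callback) {
  setImmediate(() => {
    if (!org) {
      callback(new Error('organization name is required'));
      return;
    }
    callback(null, [`${org}/repo-a`, `${org}/repo-b`]); // placeholder data
  });
}

// Promise style: the same function wrapped with util.promisify, so that
// dependent steps can be sequenced with .then() or await.
const fetchRepoNamesAsync = promisify(fetchRepoNames);

async function countRepos(org) {
  const names = await fetchRepoNamesAsync(org);
  return names.length;
}
```

With callbacks, each dependent step has to be nested inside the previous callback. The promise version keeps the steps in reading order, which matters once cloning has to wait for the repository list.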

Test Plan

Rich testing frameworks, such as Mocha and Chai, are also available for unit testing. In the initial phase, we have not explored these frameworks completely, but we have a test plan in place and very basic test cases implemented in the main class for both tools. To see the test plan and how to run the tests, please check the README.md file in our Tool Repository. Unit testing will be part of the next phase, in which we will follow a test-driven development (TDD) model.

Conclusion

After studying the GitHub API and the way JSON objects can be used to access and record data about new contributors to a repository, we have observed that the process can be automated. The steps taken so far are described in this article.

Next Phase

Now that we are familiar with the GitHub API and JSON, we will next create a tool to report new contributors for a particular pull request.

To achieve this, we will carry out the following steps:<ref>https://github.com/servo/servo/wiki/Report-new-contributors-project</ref>

  • make the initialization tool support stopping at a particular date or commit
  • create a tool that processes all of the closed pull requests for a github organization during a particular time period
    • for each pull request, get the list of commits present
    • for each commit, get the author/committer present
    • if the author is not known, add it to the list of new contributors
    • return the list of new contributors with names and links to github profiles
    • update the JSON file containing known authors
  • create unit tests that mock the GitHub API usage and validate the behaviour of the tool
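The core of the planned pull request tool, comparing commit authors against the known authors, might be sketched as follows. This is only an outline of the steps listed above: the input shape (pull requests whose commits carry an `author` name and a `profileUrl`) and the function name `findNewContributors` are assumptions made for illustration, and fetching the data from the GitHub API is not shown.

```javascript
// knownAuthors: array of author names loaded from the JSON file.
// pullRequests: hypothetical, already-fetched data of the form
//   [{ commits: [{ author: 'name', profileUrl: 'https://github.com/login' }] }]
function findNewContributors(knownAuthors, pullRequests) {
  const known = new Set(knownAuthors);
  const newContributors = new Map(); // author name -> profile link

  for (const pr of pullRequests) {
    for (const commit of pr.commits) {
      // Record an author only if unknown and not already recorded.
      if (!known.has(commit.author) && !newContributors.has(commit.author)) {
        newContributors.set(commit.author, commit.profileUrl);
      }
    }
  }

  // `added` is the report; `updatedKnown` is what would be written back
  // to the JSON file of known authors.
  const added = Array.from(newContributors, ([name, profileUrl]) => ({ name, profileUrl }));
  const updatedKnown = Array.from(new Set([...knownAuthors, ...newContributors.keys()]));
  return { added, updatedKnown };
}
```

Keeping this step free of network calls means it can be unit tested with plain data, without mocking the GitHub API.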

References

<references/>