CSC517 Fall 2017 OSS M1705

Latest revision as of 03:26, 29 April 2017

M1705 - Automatically report new contributors to all git repositories

Introduction

Servo is a browser layout engine developed by Mozilla. It aims to take advantage of parallelism while eliminating many security vulnerabilities.

In the weekly This Week in Servo blog post, the new contributors to the servo/servo repository are listed. We want to be able to track this information across all repositories in the servo organization. Thus, the purpose of this work is to build a system that uses the GitHub API to determine this information on a regular basis.

Initial Phase

Tasks

The first part of the project was to complete the following steps:

  • Create a GitHub organization with several repositories that can be used for manual tests
  • Create a tool that initializes a JSON file with the known authors for a local git repository
  • Create a tool that clones every git repository in a given GitHub organization (use the GitHub API to retrieve this information)

More details about it can be found directly on the Project Description page.

Our Process

The initial target of our project required us to get familiar with the GitHub API and JSON format. We did not have an existing code base to start from. We had to make decisions on what language and tools would be best to accomplish the initial tasks. The rationale behind our choices can be found in the Design subsection below.

In this phase, we built two tools: Clone Tool and Initialization Tool. For more details, please see the Tools section below. We also created a test GitHub organization, OODD-Mozilla.

In the initial phase, we did not explore any testing frameworks. However, we put a test plan in place and created very basic test cases for both tools.

Second Phase

Having become familiar with the GitHub API in the first phase, we moved on to the subsequent steps mentioned on the wiki page.<ref>https://github.com/servo/servo/wiki/Report-new-contributors-project</ref> We had three main tasks. We worked on them in parallel, since they are independent.

Task 1

First of all, we extended the Initialization Tool to support stopping at a particular date.

  • We used the until flag for the git log command to achieve this.<ref>https://git-scm.com/docs/git-log</ref>
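As a sketch, passing the cutoff date down to git log amounts to appending the documented --until option; the helper below only builds the argument list (the function name and wiring are illustrative, not the tool's actual code):

```javascript
// Builds the argument list for a "git log" invocation that stops at a
// given date. "--until" is a documented git-log option; "%an" prints
// only the author name for each commit.
function buildGitLogArgs(untilDate) {
  const args = ['log', '--pretty=format:%an'];
  if (untilDate) {
    args.push(`--until=${untilDate}`);
  }
  return args;
}

console.log(buildGitLogArgs('2017-04-01').join(' '));
```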

Task 2

We created a tool (Tool #3, the Pull Request Tool) that processes all of the closed pull requests for a GitHub organization during a particular time period.

  • For each pull request, we got the list of commits present.
  • For each commit, we retrieved the authors/committers present.
  • Each author/committer is checked against the list of known authors/committers. If they are new, they are added to a list of new contributors. This list is returned along with links to the author profiles.
  • Finally, the JSON file containing the known authors is updated with the new authors.
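The first step above queries the GitHub REST API for closed pull requests; as a sketch, the request URL uses the standard pulls endpoint with the state and per_page query parameters (the owner/repo names below are just examples):

```javascript
// Builds the GitHub API URL for listing the closed pull requests of a
// repository. "state=closed" and "per_page" are standard GitHub REST
// API query parameters.
function closedPullsUrl(owner, repo, perPage = 100) {
  return `https://api.github.com/repos/${owner}/${repo}/pulls` +
         `?state=closed&per_page=${perPage}`;
}

console.log(closedPullsUrl('OODD-Mozilla', 'ToolRepository'));
```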

Using the GitHub web API to get the commits has limitations: we can only get the last 30 commits. However, for organizations like servo, the number of commits can be much higher. Hence, we first need to clone all the repositories present in an organization (using tool #1) and then fetch the list of commits (with a modified version of tool #2) in order to build our tool #3. The images below Task 3 show how these tools come together, along with a sketch plan for the pull request tool.

Task 3

We tested the tools we created.

  • Created unit tests mocking the GitHub API usage to validate the tools. See the Test Plan section on this page for more information.
  • Black box test cases can also be found in the Testing section.

The Tools

Dependencies

The package.json file contains a list of all of the necessary libraries. To install the dependencies, run npm install. This will create a node_modules folder with the necessary libraries.
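For illustration, a package.json of roughly this shape would declare the libraries listed below (the version numbers here are assumptions, not the ones actually pinned in the repository; path is a Node built-in and needs no entry):

```json
{
  "name": "tool-repository",
  "version": "1.0.0",
  "scripts": {
    "test": "mocha"
  },
  "dependencies": {
    "fs-extra": "^2.0.0",
    "nodegit": "^0.18.0",
    "request": "^2.81.0",
    "slash": "^1.0.0",
    "underscore": "^1.8.0"
  },
  "devDependencies": {
    "mocha": "^3.0.0",
    "chai": "^3.5.0",
    "nock": "^9.0.0"
  }
}
```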

Libraries Used

For Tools:

  • fs-extra
  • nodegit
  • path
  • request
  • slash
  • underscore

For Testing:

  • mocha
  • chai
  • nock

Tool # 1: Clone Tool

  • Location: tools/CloneTool.js
  • Clones all repositories in the given organization into the specified folder
  • Parameters
     * folderPath - the path to the folder that will hold the repos folder, where the repositories will be cloned
     * token - the GitHub token, required to use the GitHub API
     * organization - the organization whose repositories will be cloned
  • Logic
     * First, the tool removes the folder at folderPath, if it exists
     * Next, it uses the GitHub API to find all repositories for the given organization
     * Then, it clones all of the repositories into the <folderPath>/repos folder
  • Returns a promise that is resolved if all repositories are cloned successfully, and is rejected otherwise
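The resolve-only-if-everything-cloned behavior can be sketched with Promise.all; the cloneRepo argument below is a hypothetical stand-in for the real nodegit-based clone call:

```javascript
// Sketch of the CloneTool's promise logic: clone every repository and
// resolve only if all clones succeed. cloneRepo is a stand-in for the
// real nodegit clone function.
function cloneAll(repoUrls, cloneRepo) {
  // Promise.all rejects as soon as any single clone fails, which gives
  // the "rejected otherwise" behavior described above.
  return Promise.all(repoUrls.map(url => cloneRepo(url)));
}
```

For example, cloneAll(urls, fakeClone).then(...) resolves with one result per repository, in order.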

Tool # 2: Initialization Tool

  • Location: tools/InitTool.js
  • Creates / updates a JSON file with the authors for the repositories in the given path
  • Parameters
     * folderPath - the path to the folder that holds repos, the folder with the local repositories cloned by the CloneTool
     * untilDate - fetches the authors who have committed up until that particular date
  • Logic
     * First, the tool uses the git log command to find all author names from all the commits in a repository
     * Then, it saves the authors array to authors.json in the provided folderPath
  • Returns a promise that is resolved if the authors are saved to authors.json, and is rejected otherwise
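The core of the first logic step, turning raw git log output into a deduplicated author list ready for authors.json, can be sketched as (the function name is illustrative):

```javascript
// Sketch of the InitTool's dedup step: split raw "git log" author output
// into lines and drop duplicates for authors with multiple commits.
function uniqueAuthors(gitLogOutput) {
  const lines = gitLogOutput.split('\n').filter(line => line.trim() !== '');
  // A Set keeps each author once, regardless of how many commits they made.
  return [...new Set(lines)];
}

console.log(uniqueAuthors('Alice\nBob\nAlice\n'));
```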

Tool # 3: Pull Request Tool

  • Location: tools/PullRequestTool.js
  • Gives a list of authors of closed pull requests that are not listed in authors.json
  • Parameters
     * folderPath - the path to the folder that has the repositories and authors
     * token - the GitHub token, required to use the GitHub API
     * organization - the organization whose repositories will be analyzed
     * sinceDate - fetches the pull requests from that particular date to the present date
  • Logic
     * First, the tool gets all closed pull requests for each repo in an organization
     * For each pull request, the tool gets the authors for each commit
     * After all of the repos have been analyzed, the tool returns any new authors that were not in the authors file
     * Finally, the tool saves the new authors to the authors file
  • Returns a promise that is resolved with the new authors, or rejected if something goes wrong
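The comparison at the heart of this tool, checking the authors seen on pull-request commits against the known list, can be sketched as (a simplified stand-in, not the tool's actual code):

```javascript
// Sketch of the PullRequestTool's comparison step: given authors seen on
// pull-request commits and the known authors from authors.json, return
// only the newcomers.
function findNewAuthors(commitAuthors, knownAuthors) {
  const known = new Set(knownAuthors);
  // The outer Set also deduplicates newcomers with several commits.
  return [...new Set(commitAuthors.filter(author => !known.has(author)))];
}

console.log(findNewAuthors(['Alice', 'Carol', 'Carol'], ['Alice', 'Bob']));
```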

Running the Tools

We created a driver, Main.js, to make it easier to run the tools. This runs the tools on the OODD-Mozilla organization, the test organization we created in the initial phase. It outputs the repositories and the authors.json file into /toolfolder.

For security reasons, the GitHub token is not hard coded. Instead, you must set the environment variable GITHUB_KEY to your token. In Git Bash, this can be done with the following command:

 export GITHUB_KEY=<your token here>
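On the code side, the lookup amounts to reading the environment and failing fast when the variable is missing; a minimal sketch (the function name is illustrative):

```javascript
// Sketch of reading the GitHub token from the environment instead of
// hard coding it in a source file. The env object is passed in so the
// function can be exercised without touching the real environment.
function getToken(env) {
  const token = env.GITHUB_KEY;
  if (!token) {
    throw new Error('GITHUB_KEY environment variable is not set');
  }
  return token;
}

// In the real tools this would be called as getToken(process.env).
```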

To run the tools, execute the following commands:

git clone https://github.com/OODD-Mozilla/ToolRepository.git
cd ToolRepository
npm install
node Main.js

This will print all new authors for the OODD-Mozilla organization.
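A driver in the spirit of Main.js chains the tools' promises so each step runs only after the previous one resolves; the three tool functions below are hypothetical stand-ins for the real modules in tools/:

```javascript
// Sketch of a Main.js-style driver: clone, initialize the author list,
// then report new pull-request authors. The tool functions are
// stand-ins for the real modules.
function runPipeline(cloneTool, initTool, pullRequestTool) {
  return cloneTool()
    .then(() => initTool())
    .then(() => pullRequestTool());
  // The returned promise resolves with the new authors from the
  // PullRequestTool, matching the behavior described above.
}
```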

Design

[[File:Done.jpg]]

Design Patterns

The InitTool and PullRequestTool both require access to the author list file, and if left separate the tools would have duplicate code. To solve this, we plan on using the dependency injection design pattern ([https://blog.risingstack.com/dependency-injection-in-node-js/ see this blog post]). We will create an AuthorUtils module that will be passed to both tools when they are created. This will provide a common interface to the author file and lead to DRYer code.
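A minimal sketch of that injection, with an assumed AuthorUtils interface (the load/save method names and factory style are illustrative, not the module's actual API):

```javascript
// Sketch of the dependency-injection idea: one AuthorUtils object owns
// access to the author list, and each tool receives it at creation time
// instead of re-implementing file access.
function createAuthorUtils(store) {
  return {
    load: () => store.authors.slice(),
    save: (authors) => { store.authors = authors.slice(); },
  };
}

function createInitTool(authorUtils) {
  return { init: (authors) => authorUtils.save(authors) };
}

function createPullRequestTool(authorUtils) {
  return {
    newAuthors: (seen) => {
      const known = new Set(authorUtils.load());
      return seen.filter(a => !known.has(a));
    },
  };
}
```

Because both tools share the injected object, a change to how authors are stored touches only AuthorUtils.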

Design Choices

Since we did not start with an existing code base, we had to make design and tool choices when we began the project. We picked NodeJS as the language of choice since it is a lightweight runtime environment based on JavaScript. It also has a rich collection of third-party and open source packages that can easily be added as dependencies and managed with the npm package manager. Since we were working with the GitHub API, we used libraries to wrap the calls, including nodegit and request.

We rely heavily on JavaScript's asynchronous features to design tools with asynchronous requests. Every tool returns a JavaScript Promise which allows the user to flexibly respond when the tools are done running.

Security Concerns

In order to access and make requests to the GitHub API, a user must have proper authentication. Details about basic authentication, along with an example, are provided on this developer's manual page. Any user can generate a personal OAuth token for his or her account from the settings page. This randomly generated token authorizes the user and allows him or her to place API requests. Having this token exposed in a source file stored in a version control system is the same as hard coding a password. This introduces a security risk for the user. One way to avoid this is to store the token in a local environment variable. This allows the token to be accessed while its contents remain hidden when source file changes are pushed and pulled.

Please check out the references section in our README.md, which provides links showing (i) how to generate the token on GitHub and (ii) how to add it as an environment variable.

Testing

To test our tools we used both white-box and black-box testing. For white-box testing we used the Mocha framework, with Chai as the assertion library. We chose the assert style because its syntax makes the most sense to us. Additionally, we used the Nock library to mock our GitHub API requests. We unit tested all of the tools, following the methodology provided by Sandi Metz in her "Magic Tricks of Testing" lecture. Our black box test plan is shown in the figure below.

1. White Box Testing - Using Mocha, Chai, and Nock.

  • The test suite can be found in our test/ directory. All tests are in test.js, and mock data is in mock.json.
  • To run the test suite, type:

 npm test

2. Black Box Testing - The tables below outline test cases for each of the tools

Test Plan

[[File:bbtest.png]]

[[File:Table2.PNG]]

[[File:Table3.PNG]]

Conclusion

Having studied the GitHub API and the way JSON can be used to access and record the data of new contributors in a repository, we have shown that the reporting process can be automated, following the steps described in this article. Now that the tools are written and tested, they can easily be integrated for the dev-servo organization if approved by the Mozilla dev-servo community.

References

<references/>