CSC/ECE 517 Spring 2017/finalproject E1744

From Expertiza_Wiki
Latest revision as of 02:00, 29 April 2017

CSC517 Final Project - E1744 Github Metrics

(asorgiu, george2, mdunlap, ygou14)


Proposed Design Document

Description

We will add a feature that provides Expertiza with GitHub metrics (for example, the number of committers, commits, lines of code modified, lines added, and lines deleted) for the repository link submitted by each group. This information should help differentiate the performance of individual team members for grading purposes. It may also help instructors predict which projects are likely to be accepted or rejected, even before the final due dates.

Work to be done

This project is divided into two parts. The first is to extract GitHub metadata from the submitted repositories and pull requests. The second, to be built at a later time, is a classifier (e.g., Bayesian) that predicts early which projects are likely to fail. The prediction will be based on more than 200 past projects (Fall 2012 to Fall 2016). Using the metadata from students' repositories and pull requests, we can warn both the authors and the teaching staff when the model predicts that a project is likely to fail.

The approach is to add a way to monitor the individual contributions of each team member throughout the project so that their work can be assessed quantitatively. This will help the teaching staff and team members during the review process, and it gives each student better visibility into the work he or she has committed. When an instructor opens the submission records page for a particular team on a project, a link labeled "View Github Metrics" will appear below each submitted hyperlink so that the metrics can be requested from GitHub on demand.

[Figure: Submission Record.png]
If the link is not a valid GitHub page, the controller returns a "No Results Found" page.

[Figure: Invalid Github link.png]
If the link is valid, the controller pulls data from GitHub using the API described below and shows the lines added, lines updated, and lines deleted.

[Figure: Github metrics.png]
Extract Github metadata
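  • First, obtain a personal access token from GitHub by following these steps: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
  • Save the access token in the environment variable 'EXPERTIZA_GITHUB_TOKEN'.
  • GitHub data is then fetched from GitHub's Statistics API: https://developer.github.com/v3/repos/statistics/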
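The link check and the three steps above could fit together roughly as follows. This is a sketch that uses only Ruby's standard library. It is not the project's implementation. `parse_github_repo` and `build_contributor_stats_request` are hypothetical names. Pull request links are assumed to map to their repository. The request targets the Statistics API's contributors endpoint (`GET /repos/:owner/:repo/stats/contributors`) and reads the token from `EXPERTIZA_GITHUB_TOKEN`. A `nil` result from the parser is the point where the controller would render the "No Results Found" page.

```ruby
require "net/http"
require "uri"

GITHUB_API = "https://api.github.com".freeze

# Hypothetical helper: extract [owner, repo] from a submitted hyperlink such as
# https://github.com/owner/repo or https://github.com/owner/repo/pull/12.
# Returns nil for anything that is not a github.com repository link.
def parse_github_repo(link)
  uri = URI.parse(link.to_s.strip)
  return nil unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  return nil unless %w[github.com www.github.com].include?(uri.host.to_s.downcase)

  # "/owner/repo/pull/12" -> ["owner", "repo", "pull", "12"]
  owner, repo = uri.path.split("/").reject(&:empty?)
  return nil if owner.nil? || repo.nil?

  [owner, repo.sub(/\.git\z/, "")]
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: build (but do not send) an authenticated GET request
# for the per-contributor statistics of owner/repo.
def build_contributor_stats_request(owner, repo, token = ENV["EXPERTIZA_GITHUB_TOKEN"])
  uri = URI("#{GITHUB_API}/repos/#{owner}/#{repo}/stats/contributors")
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/vnd.github.v3+json"
  # The personal access token goes in the Authorization header when present.
  request["Authorization"] = "token #{token}" if token && !token.empty?
  [uri, request]
end
```

You could then send the request with `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`. That call needs network access and a valid token, so it is left out of the sketch.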
Architectural Design

This feature works much like a web crawler: it fetches data from a remote server and stores it locally. We therefore chose a client/server architectural style for this subsystem. This style separates the system into two applications, and the client (Expertiza) sends a request to the server whenever a user asks to see the metrics. In many client/server systems the server is a database with application logic implemented as stored procedures; in our case, the server is GitHub.
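A rough sketch of the client half of this design follows. It asks GitHub for statistics only when a user requests the metrics. GitHub computes repository statistics in the background and may answer the first request with `202 Accepted` and no data. For that reason, the client retries a few times before giving up. The HTTP call is passed in as a block so that the sketch runs without network access. The class name, retry count, and delay are assumptions; none of them appear in the design above.

```ruby
require "json"

# Hypothetical on-demand client. The block receives an API path and must return
# [status_code, body_string]; in production it would wrap Net::HTTP.
class GithubStatsClient
  def initialize(max_attempts: 3, delay: 1.0, &fetch)
    @fetch = fetch
    @max_attempts = max_attempts
    @delay = delay
  end

  # Returns parsed contributor statistics, or nil when GitHub does not provide
  # them (e.g. 404 for a missing or private repository, or still 202 after
  # every attempt).
  def contributor_stats(owner, repo)
    path = "/repos/#{owner}/#{repo}/stats/contributors"
    @max_attempts.times do |attempt|
      status, body = @fetch.call(path)
      return JSON.parse(body) if status == 200
      return nil unless status == 202 # only 202 means "still computing"

      sleep(@delay) if attempt < @max_attempts - 1
    end
    nil
  end
end
```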

UML

[Figure: 1744 design.png]
Database Schema Changes

[Figure: 1744_db_schema.png]

A new table called github_contributors is created to store the data for each committer. The table contains the committer's email, github_id, and all the metrics associated with a project. At the moment we handle the following metrics:

  • Committer email - commiter_url
  • Committer id - commiter_id
  • Total number of commits - total_commits
  • Number of files changed - files_changed
  • Lines of code changed - lines_changed
  • Lines of code added - lines_added
  • Lines of code removed - lines_removed
  • Lines of code added that survived until final submission - lines_persisted
  • submission_record_id - Foreign Key to submission_records table

An index on committer_id is added to speed up searches by committer.
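The following minimal sketch shows how a Statistics API response could fill some of the github_contributors columns. It assumes the documented shape of `/stats/contributors`. Each entry has a `total` commit count and an `author` object with `login`. It also has a `weeks` list, where `a` holds additions, `d` deletions, and `c` commits. Several mappings are our own assumptions. Storing the login in `commiter_id` is one. Treating `lines_changed` as additions plus deletions is another. The method name `contributor_rows` is hypothetical. This endpoint does not report email, files changed, or lines that persisted, so those columns would need other data sources and are left out.

```ruby
require "json"

# Hypothetical mapping from a /stats/contributors response body to row hashes
# keyed by the github_contributors column names listed above.
def contributor_rows(stats_json)
  JSON.parse(stats_json).map do |entry|
    weeks = entry.fetch("weeks", [])
    added = weeks.sum { |w| w["a"].to_i }   # weekly additions
    removed = weeks.sum { |w| w["d"].to_i } # weekly deletions
    {
      commiter_id: entry.dig("author", "login"), # assumption: GitHub login
      total_commits: entry["total"].to_i,
      lines_added: added,
      lines_removed: removed,
      lines_changed: added + removed # assumption: changed = added + removed
    }
  end
end
```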

Test Plan

The unit tests use RSpec to test the GitHub contributors controller. To run the four unit tests, execute the following command from the top-level expertiza directory:

  rspec spec/controllers/github_contributors_controller_spec.rb

Unit Tests

Unit Test Summary

Method   Parameter                                      Expected result
show     submission_id with valid github hyperlink      Github Metrics and status code 200
show     submission_id with file uploaded               render 'github_contributors/not_found'
show     submission_id with non github hyperlink        render 'github_contributors/not_found'
show     submission_id with private github hyperlink    render 'github_contributors/not_found'
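The plain-Ruby sketch below illustrates the four cases in the table as the decision the `show` action has to make. It is not the controller itself. `SubmissionStub` is a hypothetical stand-in for a submission record. The repository parser and the statistics fetcher are passed in as lambdas, which lets each row of the table be checked without Rails, RSpec, or network access.

```ruby
# Hypothetical stand-in for a submission record: hyperlink is nil when the
# team uploaded a file instead of submitting a link.
SubmissionStub = Struct.new(:hyperlink)

# Hypothetical decision logic behind the show action, mirroring the table:
#   parse: link -> [owner, repo], or nil for a non-GitHub link
#   fetch: (owner, repo) -> stats, or nil when GitHub returns nothing
#          (GitHub answers 404 for a private repository the token cannot read)
# Returns [:metrics, stats] or [:not_found, nil]; :not_found corresponds to
# rendering 'github_contributors/not_found'.
def github_metrics_for(record, parse:, fetch:)
  return [:not_found, nil] if record.hyperlink.nil? # file uploaded

  owner_repo = parse.call(record.hyperlink)
  return [:not_found, nil] if owner_repo.nil? # non-GitHub hyperlink

  stats = fetch.call(*owner_repo)
  return [:not_found, nil] if stats.nil? # private or inaccessible repository

  [:metrics, stats]
end
```

In the RSpec controller spec, these collaborators would typically be stubbed instead (for example with `allow(...).to receive(...)`), one stub setup per row of the table.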

Build a classifier

THIS WILL NOT BE IMPLEMENTED AS PART OF THIS PROJECT. This is future work to be done.