CSC/ECE 517 Fall 2018- Project E1858. Github metrics integration


Introduction

Problem Statement

Expertiza provides Teammate Reviews under the View Scores functionality for each assignment. The purpose of this project is to augment existing assignment submissions with data from external tools like GitHub that gives a more realistic view of each team member's contribution. For each group's submitted GitHub repository link, this external data may include the number of commits and the number of lines of code modified, added, and deleted.

The Teammate Reviews functionality in the View Scores page gauges teammates' views on how much the other team members contributed to the project. We need to augment this data with data from external tools like GitHub in order to validate that feedback. The new metrics will be appended under each student's data within the same functionality.

While this data will not have marks associated with it directly, it will help the instructor differentiate the performance of team members and award marks according to contribution. Overall data for the team, like the number of committers and the number of commits, may also help instructors predict which projects are likely to be merged.

Current Scenario

At present, group assignments are submitted as a single submission that shows the work done by the team as a whole. This does not show the contribution of each teammate.

Teammate reviews show peer reviews among teammates. Currently, however, there is no way to validate and verify these reviews.

Checking the commits made by each team member on GitHub is one solution, but it is inefficient from the instructor's or reviewer's perspective because there are many assignments, many submissions, and tight deadlines.

Proposed Solution Design

Design Considerations

  • The first thing was to determine exactly which metrics we need. The requirements document suggested the following:
    1. Total number of commits.
    2. Number of files changed.
    3. Lines of code added.
    4. Lines of code modified.
    5. Lines of code deleted.
    6. Lines of code that survived until final submission (excluded from the MVP due to its complexity and lower priority).
We are assuming these metrics are needed on a per-user basis.
The requirements document expects the metrics to be shown under a new tab, “GitHub Metrics”, under “View Submissions” in the instructor view.
A line chart of the number of lines vs. time needs to be included in the view.
  • The next thing was to narrow down which version control hosting service we would use. For now, the plan is to support only GitHub integration due to its popularity, ease of use, and API documentation. Future projects could add support for GitLab and others, though it is far easier to just require that all students use GitHub.
    1. The main impact of this change will be that all submission repositories need to be made public, as we need access to pull in the data.
    2. We also considered whether to ask students for the GitHub repository link separately (which requires changes to views) or to parse all the uploaded links and determine the correct one (which requires extra logic to handle students uploading multiple links or no links at all). We have decided to parse the links, because submitting the link to the PR is mandatory anyway.
  • An important question was whether we needed to store metric information in our own db at all.
    1. An older investigation proposed a schema for this, but storing the data would likely cause issues with stale information and would be difficult to maintain.
    2. Having a db would be redundant: every time a user wants to see the metrics, we would need to sync the db with GitHub before showing the results, so we would end up hitting the GitHub API anyway.
    3. An alternative to the above approach was to take snapshots of the metrics and store them in the db right at the submission deadline of each project. This would allow for fairer grading by making sure we pull in data at the correct moment. Unfortunately, doing this for so many projects would put a lot of load on the server. Also, for open source projects, this would mean that we don’t have the latest data to work with (people will keep committing past the deadline). Thus, this approach might have been good for grading purposes but wouldn't have helped with determining the current status of a project.
    4. We have decided against using our own tables for this data and will fetch the GitHub data on demand directly through the GitHub API.
    5. All that said, we will maintain some metadata in our db about which metrics we are retrieving (like LOCs, committers, frequencies, etc.).
  • We also considered if we needed to account for different branches. The plan is to only consider the master branch.
  • A suggestion was also to make sure that there isn’t a lot of padding in the tables we show. We will be keeping this in mind when we implement the views.
  • With respect to showing GitHub metrics in the View Scores page, it would be very difficult to map Expertiza users and their names to public GitHub profiles, as students may use a different name. So instead of appending GitHub data to the Teammate Reviews table, we will show a new table below it to display the metrics. This will give the instructor a full view of how teammates rated each other and how that maps to factual information from GitHub.
  • The instructors will need to spell out exact guidelines for committing to the project's repositories (for example: everyone should commit their own code, keep master as the PR branch, commit regularly, and be mindful of squashing too many commits under one user). This gives us proper and correct data, and it also means students can’t later claim that they worked but forgot to commit or didn’t know the rules.
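
The link-parsing decision above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper name parse_github_pr_links and the returned hash keys are hypothetical, and it only recognises links of the form https://github.com/<owner>/<repo>/pull/<number>.

```ruby
require 'uri'

# Path shape of a GitHub pull request link: /<owner>/<repo>/pull/<number>[/...]
GITHUB_PR_PATH = %r{\A/([^/]+)/([^/]+)/pull/(\d+)(?:/.*)?\z}

# Hypothetical helper: given all hyperlinks a team submitted, keep only the
# ones that look like GitHub pull requests and extract their parts.
# Non-GitHub links, malformed URLs, and blank entries are silently skipped.
def parse_github_pr_links(links)
  parsed = links.map do |link|
    uri = begin
      URI.parse(link.to_s.strip)
    rescue URI::InvalidURIError
      nil
    end
    next unless uri && uri.host && uri.host.downcase == 'github.com'

    match = GITHUB_PR_PATH.match(uri.path.to_s)
    next unless match

    { owner: match[1], repo: match[2], pull_number: match[3].to_i }
  end
  parsed.compact
end

# Example with made-up links (example-org/example-repo is a placeholder).
submitted = [
  'https://example.com/demo-video',
  'https://github.com/example-org/example-repo/pull/42',
  'not a url'
]
p parse_github_pr_links(submitted)
```

If a team submits several pull request links, the helper returns all of them, so the caller still has to decide which one to use (for instance, the most recently submitted).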

Use Case Diagram

Use Case diagram of two approaches to append 'GitHub contribution metric' in teammate review.
Use Case diagram explaining the approach to add a new column 'GitHub contribution metric' in 'View submission'

Use Case Diagram Details

Actors:

  • Instructor: This actor is responsible for creating assignments and adding students to the assignment.
  • Student: This actor is responsible for submitting, self-reviewing, and viewing the scores.

Database:

  • The database where all Expertiza data is stored.

Pre Conditions:

  • The Student should submit the assignment and self-review.
  • The other students should submit the reviews of the work submitted.

Primary Sequence:

  • The student should log in.
  • The student should browse and upload the assignment.
  • The student should submit the assignment.
  • The student should submit teammate-reviews.

Post Conditions:

  • The instructor will be able to see the contribution made by each team member in the 'View Submission' page using graph diagrams, as shown in the figure.
  • The instructor will be able to see the work done by each student in the 'Teammate Review' tab, with the new metrics appended at the end, as shown in the figure.

Design Principles

  • MVC – The project is implemented in Ruby on Rails, which uses the MVC architecture. It separates an application’s data model, user interface, and control logic into three distinct components (model, view, and controller, respectively). We intend to follow the same pattern when implementing our end-point for pulling GitHub data.
  • DRY Principle – We are trying to reuse the existing functionality in Expertiza, thus avoiding code duplication. Whenever possible, we will modify existing classes, controllers, or tables instead of creating new ones.

Design Detail

First Change

  • The first change would be under the "Teammate Review" tab in the "View Scores" page.
Approach 1
  • We either add new rows to the table, one for each of the GitHub metrics, and display the results for every student in the project as mocked up here:
The GitHub metrics are colored in blue
Approach 2

Or we might add a new tab called "GitHub Metrics" in the View Scores page and then show the metrics and the results in tabular format as mocked up:

The GitHub metrics are colored in blue

Second Change

  • The second change is in the View Submissions page, where we intend to add a new column to the table that shows a chart per assignment team.
The GitHub metrics column shows the line graphs for every team
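
As a rough sketch of the data behind such a line chart, the commits of a team can be bucketed by day and accumulated over time. The input shape here (hashes with :date, :additions, and :deletions keys) and both function names are assumptions made for illustration; the real data would come from the GitHub API and may be shaped differently.

```ruby
require 'time'
require 'date'

# Sums lines changed (additions + deletions) per UTC calendar day.
# Each commit is assumed to be a hash with an ISO 8601 :date string
# and integer :additions and :deletions counts.
def lines_changed_by_day(commits)
  totals = Hash.new(0)
  commits.each do |commit|
    day = Time.iso8601(commit[:date]).utc.to_date
    totals[day] += commit[:additions] + commit[:deletions]
  end
  totals.sort.to_h
end

# Turns the per-day totals into [date_string, running_total] points,
# i.e. the number of lines changed since the first commit.
def cumulative_series(daily_totals)
  running = 0
  daily_totals.map do |day, count|
    running += count
    [day.iso8601, running]
  end
end

# Made-up sample data for one team.
sample = [
  { date: '2018-11-02T10:00:00Z', additions: 5,  deletions: 1 },
  { date: '2018-11-01T23:30:00Z', additions: 10, deletions: 0 },
  { date: '2018-11-02T12:00:00Z', additions: 2,  deletions: 2 }
]
p cumulative_series(lines_changed_by_day(sample))
```

The resulting list of date and value pairs is in the general form that most charting helpers accept for a line series.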

Database Design

As of now, we do not plan any database modifications. Because the number of commits and the LOCs (lines of code) will keep changing, and stale data does not seem to be of significant benefit for analysis, we will concentrate on the latest state of the project repositories.

Plan of action

  • In line with our investigation, we will use the official Octokit gem to get GitHub data from the API. The idea is to always get the latest data from GitHub and show it to the user. For grading purposes, the view will allow switching between the latest data and the data up to the submission deadline.
  • Unfortunately, the GitHub API allows only 60 unauthenticated requests per hour, compared with 5,000 per hour for authenticated requests. We thus intend to use the omniauth-github gem to integrate GitHub OAuth and authenticate the instructor.
  • There is ample documentation for GitHub API. The GitHub API returns JSON and we have been able to retrieve sample data using it.
  • The "Teammate Review" section in the View Scores page will get an additional table. We will introduce our logic in the Grades controller (grades_controller.rb). A new function will be created that uses the Octokit functions to retrieve the required data for the entire team.
  • Our new function will be called from the view action in the same controller. It will return the GitHub data, which will be stored in an instance variable accessible to the views.
  • We then implement the view. To show the GitHub metrics, we will modify the _teammate_reviews_tab.html.erb partial under the grades view folder. The instance variable will be accessible here and will be used to pass data to the table.
  • For charting, we will use the bar-chart function in the grades controller, which was also used in a previous project (see References).
  • The table will be shown below the actual "Teammate Review" table as per the requirements.
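
To make the plan above more concrete, here is a hedged sketch of the aggregation step that could run once the commit data has been retrieved. It deliberately leaves out the Octokit and OAuth calls and works on plain hashes, so it makes no claim about how the final controller code looks. The function name github_metrics_by_author is hypothetical. The hash layout is modelled on the JSON that GitHub's "get a commit" endpoint returns (author.login, commit.author.date, stats.additions, stats.deletions, files[].filename); GitHub reports additions and deletions, so a separate "lines modified" figure is not computed here. The optional deadline argument illustrates the "data up to the submission deadline" view.

```ruby
require 'time'
require 'set'

# Hypothetical helper: aggregates per-author metrics from commit hashes.
# When a deadline (Time) is given, commits authored after it are ignored.
def github_metrics_by_author(commits, deadline: nil)
  metrics = Hash.new do |hash, key|
    hash[key] = { commits: 0, additions: 0, deletions: 0, files: Set.new }
  end

  commits.each do |commit|
    committed_at = Time.iso8601(commit.dig('commit', 'author', 'date'))
    next if deadline && committed_at > deadline

    # 'author' can be nil when the commit email is not linked to a GitHub
    # account, so fall back to the name recorded in the commit itself.
    author = commit.dig('author', 'login') ||
             commit.dig('commit', 'author', 'name') ||
             'unknown'

    row = metrics[author]
    row[:commits] += 1
    row[:additions] += commit.dig('stats', 'additions').to_i
    row[:deletions] += commit.dig('stats', 'deletions').to_i
    Array(commit['files']).each { |file| row[:files] << file['filename'] }
  end

  # Replace the set of file names with a simple count for display.
  metrics.transform_values do |row|
    row.merge(files_changed: row[:files].size).reject { |key, _| key == :files }
  end
end

# Made-up sample data for illustration only.
sample = [
  { 'author' => { 'login' => 'alice' },
    'commit' => { 'author' => { 'name' => 'Alice', 'date' => '2018-11-01T10:00:00Z' } },
    'stats'  => { 'additions' => 10, 'deletions' => 2 },
    'files'  => [{ 'filename' => 'a.rb' }, { 'filename' => 'b.rb' }] },
  { 'author' => nil,
    'commit' => { 'author' => { 'name' => 'Bob', 'date' => '2018-11-02T09:00:00Z' } },
    'stats'  => { 'additions' => 4, 'deletions' => 0 },
    'files'  => [{ 'filename' => 'c.rb' }] }
]
p github_metrics_by_author(sample)
p github_metrics_by_author(sample, deadline: Time.utc(2018, 11, 2))
```

Keeping the aggregation separate from the API calls means it can be unit-tested without network access, and the controller only needs to store its return value in the instance variable that the partial reads.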

Test Plan

Subtask 1: GitHub metrics in teammate reviews

Test plan for proposed solution 1:

1) Log in as an instructor

2) Navigate to assignments through Manage --> Assignments

3) Select "View scores" icon for the assignment of your choice

4) Select the team for which you wish to view scores

5) Go to "GitHub metrics" tab

6) View data based on different GitHub metrics (e.g. lines of code added/changed/removed etc.) for each teammate

Test plan for proposed solution 2:

1) Log in as an instructor

2) Navigate to assignments through Manage --> Assignments

3) Select "View scores" icon for the assignment of your choice

4) Select the team for which you wish to view scores

5) Go to "Teammate reviews" tab

6) Select the student for whom you wish to view teammate review

7) Below the usual criteria, view criteria for different GitHub metrics (e.g. lines of code added/changed/removed etc.) portrayed in a different color scheme (light blue)

Subtask 2: Line chart for # of lines changed by the overall team

1) Log in as an instructor

2) Navigate to assignments through Manage --> Assignments

3) Select "View submissions" icon for the assignment of your choice

4) Select the team whose submissions you wish to view

5) A new GitHub metrics column shows the # of lines changed since the start of the assignment

References

Expertiza_wiki

E1815:_Improvements_to_review_grader

Expertiza_PR_1179

Expertiza_PR_1179_Video

GitHub API documentation

Change-log for reviewers

This section will be removed in the final draft. It is here only for the convenience of reviewers, to show which sections were substantially updated since the last review.

  • Updated "Plan of Action" section to specify exactly which files and functions will be updated.
  • Removed "View submission" changes as per updated requirements (from the Professor).
  • Updated "Design Considerations" with results of further investigation and our conclusions as to which approach we will take.