CSC/ECE 517 Spring 2017/finalproject E1744: Difference between revisions

From Expertiza_Wiki

Revision as of 05:35, 6 April 2017

CSC517 Final Project - E1744 GitHub Metrics

(asorgiu, george2, mdunlap, ygou14)


Proposed Design Document

Description

We will add a new feature that provides Expertiza with GitHub metrics (for example, number of committers, number of commits, number of lines of code modified, number of lines added, and number of lines deleted) from each group’s submitted repo link. This information should help instructors differentiate the performance of team members for grading purposes. It may also help instructors predict which projects are likely to be merged.

Work to be done

This project is divided into two parts. The first is to extract GitHub metadata from the submitted repos and pull requests. The second is to build a classifier (e.g., Bayesian) that predicts early which projects are likely to fail. This prediction is based on more than 200 past projects. The metrics listed above should be used as features, together with some temporal features (e.g., the temporal pattern of the team’s commits so far). Eventually, we would like to e-mail students whose metrics are poor, giving them advice on how to improve.

Extract GitHub metadata
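
As a minimal sketch of the aggregation step, assume the commit records have already been fetched and follow the shape of GitHub's REST API single-commit response (`commit.author.email`, `stats.additions`, `stats.deletions`, `files[].filename`). Per-committer totals could then be computed roughly as shown. The function name `summarize_commits` and the output keys are hypothetical, and treating "lines changed" as additions plus deletions is an assumption, not something this design specifies.

```python
from collections import defaultdict


def summarize_commits(commits):
    """Aggregate per-committer metrics from a list of commit records.

    Each record is assumed to look like GitHub's single-commit response:
    {"commit": {"author": {"email": ...}},
     "stats": {"additions": int, "deletions": int},
     "files": [{"filename": ...}, ...]}
    """
    totals = defaultdict(
        lambda: {"commits": 0, "files": set(), "added": 0, "removed": 0}
    )
    for record in commits:
        email = record["commit"]["author"]["email"]
        entry = totals[email]
        entry["commits"] += 1
        entry["added"] += record["stats"]["additions"]
        entry["removed"] += record["stats"]["deletions"]
        # Count each distinct file once per committer.
        entry["files"].update(f["filename"] for f in record["files"])

    return {
        email: {
            "commits": entry["commits"],
            "files_changed": len(entry["files"]),
            "lines_added": entry["added"],
            "lines_removed": entry["removed"],
            # Assumption: "changed" means added plus removed.
            "lines_changed": entry["added"] + entry["removed"],
        }
        for email, entry in totals.items()
    }
```

The "lines of code added that survived until final submission" metric is not covered here, since it would need blame information from the final revision rather than per-commit statistics.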

Build a classifier
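
To make the Bayesian option concrete, here is a small Gaussian naive Bayes sketch in plain Python. It is only an illustration under stated assumptions: the feature rows (for example, total commits and lines added per team), the label names, and the function names `fit_gaussian_nb` and `predict` are all hypothetical, and the actual classifier and feature set for this project are not yet decided.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        count = len(members)
        columns = list(zip(*members))
        means = [sum(col) / count for col in columns]
        # A small floor keeps the variance non-zero for constant features.
        variances = [
            sum((x - m) ** 2 for x in col) / count + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (count / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one feature row."""

    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))
```

In use, each past project would contribute one feature row and one outcome label; a new team's current metrics would then be passed to `predict` to flag projects that look likely to fail.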

Database Schema Changes

  • Add table: github_contributors:

      • Columns:
          • Committer email
          • Committer id
          • Total number of commits
          • Number of files changed
          • Lines of code changed
          • Lines of code added
          • Lines of code removed
          • Lines of code added that survived until final submission
      • Indices:
          • Committer ID

  • Add table: submission_records_github_contributors

      • Columns:
          • submission_record_id
          • github_contributor_id
      • Indices:
          • submission_record_id: Foreign Key to submission_records
          • github_contributor_id: Foreign Key to github_contributors
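
The two tables can be made concrete with a short sketch. In Expertiza these changes would presumably be written as a Rails migration; the version below uses Python's built-in `sqlite3` module only to show the table shapes and to keep the example self-contained. The column names, column types, and index names are assumptions derived from the lists above, and `submission_records` is reduced to a one-column stand-in for the existing table.

```python
import sqlite3

SCHEMA = """
-- Stand-in for the existing Expertiza table (assumption: integer primary key).
CREATE TABLE submission_records (id INTEGER PRIMARY KEY);

CREATE TABLE github_contributors (
    id              INTEGER PRIMARY KEY,
    committer_email TEXT,
    committer_id    TEXT,
    total_commits   INTEGER,
    files_changed   INTEGER,
    lines_changed   INTEGER,
    lines_added     INTEGER,
    lines_removed   INTEGER,
    lines_survived  INTEGER  -- lines added that survived until final submission
);
CREATE INDEX index_github_contributors_on_committer_id
    ON github_contributors (committer_id);

-- Join table linking submission records to contributors.
CREATE TABLE submission_records_github_contributors (
    submission_record_id  INTEGER NOT NULL REFERENCES submission_records (id),
    github_contributor_id INTEGER NOT NULL REFERENCES github_contributors (id)
);
CREATE INDEX index_srgc_on_submission_record_id
    ON submission_records_github_contributors (submission_record_id);
CREATE INDEX index_srgc_on_github_contributor_id
    ON submission_records_github_contributors (github_contributor_id);
"""


def create_schema(conn):
    """Create the proposed tables on an open sqlite3 connection."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

The join table lets one submission record be associated with many contributors, and one contributor with many submission records.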