CSC/ECE 517 Spring 2023 - E2324. Github metrics integration

From Expertiza_Wiki
Revision as of 10:04, 3 May 2023 by Skaalam (talk | contribs)

About Expertiza

Expertiza is an open source project based on the Ruby on Rails framework. Expertiza allows the instructor to create new assignments and customize new or existing assignments. It also allows the instructor to create a list of topics the students can sign up for. Students can form teams in Expertiza to work on various projects and assignments. Students can also peer review other students' submissions. Expertiza supports submission across various document types, including URLs and wiki pages.

Introduction

Expertiza provides teammate reviews to gauge how much each team member contributed, but we would like to augment this data with data from external tools like GitHub (for example, number of commits, number of lines of code modified, number of lines added, and number of lines deleted) from each group’s submitted repo link. This information should prove useful for differentiating the performance of team members for grading purposes. Overall data for the team, like the number of committers and the number of commits, may also help instructors predict which projects are likely to be merged.

Project Overview and Mission

  • The first step is to extract Github metadata of the submitted repos and pull requests.
  • The metadata should be stored in the local Expertiza DB. For each participant, the record should include at least:
  -Committer id
  -Total number of commits
  -Number of files changed
  -Lines of code changed
  -Lines of code added
  -Lines of code removed
  -Lines of code added that survived until final submission [if available from Github]
  • The code should sync the data with Github whenever someone (student or instructor) looks at a view that shows Github data.
  • The data for teams should be shown in the instructor’s View Scores window, in a new tab, probably between Reviews and Author Feedback:
  -Design a good view for showing data on individuals
  -This should be on the Teammate Reviews tab, right below the grid for teammate reviews. The reason for this is that we’d like to see all the data on an individual in a single view. For teams, by contrast, there is already a pretty large grid, and potentially multiple grids for multiple rounds, so adding a new grid is more likely to clutter the display.
  • Create a bar chart for the # of lines changed for each assignment team on “view_submissions” page. The x-axis should be the time starting from the assignment creation time, till the last deadline of the assignment, or current time, whichever is earlier.
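
The x-axis rule for the bar chart can be illustrated with a minimal sketch. The method name lines_changed_by_day and the shape of the commit hashes (:date, :additions, :deletions) are assumptions made for this example; they are not taken from the Expertiza code base.

```ruby
require "date"

# Hypothetical helper: bucket the lines changed per day for a bar chart.
# commits is assumed to be an array of hashes with :date (Date),
# :additions and :deletions (Integers).
def lines_changed_by_day(commits, created_on, last_deadline, today = Date.today)
  # The x-axis ends at the last deadline or today, whichever is earlier.
  end_on = [last_deadline, today].min

  # Start every day in the window at zero so empty days still get a bar.
  buckets = (created_on..end_on).to_h { |day| [day, 0] }

  commits.each do |commit|
    day = commit[:date]
    next unless buckets.key?(day) # ignore commits outside the window

    buckets[day] += commit[:additions] + commit[:deletions]
  end
  buckets
end

commits = [
  { date: Date.new(2023, 4, 2), additions: 10, deletions: 3 },
  { date: Date.new(2023, 4, 2), additions: 5, deletions: 0 },
  { date: Date.new(2023, 4, 9), additions: 7, deletions: 7 } # after the deadline
]
chart = lines_changed_by_day(commits, Date.new(2023, 4, 1),
                             Date.new(2023, 4, 3), Date.new(2023, 5, 1))
```

Here chart has one entry per day from the creation date to the deadline, and the commit made after the deadline is left out.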



Current Implementation

Use Case Diagram

Use Github Metrics Functionality

  • This functionality gives the instructor the option to use Github metrics when creating an assignment.
  • In the screenshot below, taken while creating an assignment, the "Use github metrics?" option marked by the red arrow is not selected.


  • As shown below, if the "Use github metrics?" option is not selected, the instructor cannot see the GitHub metrics for the project submissions.



  • In the screenshot below, taken while creating an assignment, the "Use github metrics?" option is selected so that the GitHub metrics for that assignment can be viewed.



  • Now the instructor has the option "GitHub metrics" under each team as indicated by the red arrow.



When the instructor clicks on the "GitHub metrics" option, they can see the following:

  • A graph showing the number of commits made by each contributor on each date.
  • A pie chart showing the total number of commits made by each contributor.
  • Team statistics, including the total number of commits, total lines of code added and deleted, total number of files changed, merge status, and check statuses.
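
As a rough illustration of the data behind the graph and the pie chart, the sketch below groups commits per contributor per date and totals them per contributor. The method names and the commit hash shape (:author, :date) are assumptions for this example, not the structures used by Expertiza.

```ruby
# Assumed input: one hash per commit with the author's name and the commit date.

# Data for the graph: commits per contributor on each date.
def commits_per_author_per_day(commits)
  counts = Hash.new { |hash, author| hash[author] = Hash.new(0) }
  commits.each { |commit| counts[commit[:author]][commit[:date]] += 1 }
  counts
end

# Data for the pie chart: total commits per contributor.
def total_commits_per_author(commits)
  commits.each_with_object(Hash.new(0)) do |commit, totals|
    totals[commit[:author]] += 1
  end
end

sample_commits = [
  { author: "alice", date: "2023-04-01" },
  { author: "alice", date: "2023-04-01" },
  { author: "bob",   date: "2023-04-01" },
  { author: "alice", date: "2023-04-02" }
]
per_day = commits_per_author_per_day(sample_commits)
totals  = total_commits_per_author(sample_commits)
```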


Refactored Files

The following files have been refactored:

  • github_metric_uses_controller.rb
  • metrics_controller.rb
  • grades_helper.rb
  • metrics_helper.rb
  • metric.rb


github_metric_uses_controller.rb :

  • The code has been refactored to improve readability and reduce duplication.
  • The refactored code has replaced the self methods with instance methods, which makes the code more object-oriented.
  • The delete method has been simplified by using the safe navigation operator (&.) to avoid raising an exception if the record is not found.
  • Finally, the code has been restructured to remove unnecessary comments and improve overall clarity.

Before:


class GithubMetricUsesController < ApplicationController

  skip_before_action :authorize
  helper_method :exist

  #check if assignment_id exists in the table github_metric_uses
  def self.exist(assignment_id)
    github_use = GithubMetricUses.find_by(assignment_id: assignment_id)
    !github_use.nil?
  end

  #if assignment_id does not exist in the table github_metric_uses, save it
  def self.save(assignment_id)
    unless exist(assignment_id)
      github_use = GithubMetricUses.new(assignment_id)
      github_use.assignment_id = assignment_id
      github_use.save
    end
  end

  #if assignment_id exists in the table github_metric_uses, delete it
  def self.delete(assignment_id)
    if exist(assignment_id)
      github_use = GithubMetricUses.find_by(assignment_id: assignment_id)
      github_use.destroy
    end
  end

  #check if assignment_id exists in the table github_metric_uses
  def exist
    assignment_id = params[:assignment_id]
    GithubMetricUsesController.exist(assignment_id)
  end

  #if assignment_id does not exist in the table github_metric_uses, save it
  def save
    assignment_id = params[:assignment_id]
    GithubMetricUsesController.save(assignment_id)
    respond_to do |format|
      format.json { render json: assignment_id }
    end
  end

  #if assignment_id exists in the table github_metric_uses, delete it
  def delete
    assignment_id = params[:assignment_id]
    GithubMetricUsesController.delete(assignment_id)
    respond_to do |format|
      format.json { render json: assignment_id }
    end
  end

end

After:

class GithubMetricUsesController < ApplicationController
  skip_before_action :authorize
  
  # Check if a record with the given assignment_id exists in the GithubMetricUses table
  def record_exists?
    assignment_id = params[:assignment_id]
    github_use = GithubMetricUses.find_by(assignment_id: assignment_id)
    render json: !github_use.nil?
  end

  # Save a new record with the given assignment_id to the GithubMetricUses table
  # if a record with the same assignment_id doesn't already exist
  def save
    assignment_id = params[:assignment_id]
    GithubMetricUses.find_or_create_by(assignment_id: assignment_id)
    render json: assignment_id
  end

  # Delete the record with the given assignment_id from the GithubMetricUses table
  # if it exists
  def delete
    assignment_id = params[:assignment_id]
    github_use = GithubMetricUses.find_by(assignment_id: assignment_id)
    github_use&.destroy
    render json: assignment_id
  end
end
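
The two idioms used in the refactored controller, find_or_create_by and the safe navigation operator, can be tried outside Rails with a small stand-in. FakeGithubMetricUse below is a hypothetical in-memory class written for this sketch; it only mimics the two model calls and is not the GithubMetricUses model.

```ruby
# Hypothetical in-memory stand-in for the model; NOT the Rails class.
class FakeGithubMetricUse
  attr_reader :assignment_id

  @records = {} # class-level store keyed by assignment_id

  class << self
    # Returns the stored record, or nil when none exists.
    def find_by(assignment_id:)
      @records[assignment_id]
    end

    # Returns the existing record, or creates and stores a new one.
    def find_or_create_by(assignment_id:)
      @records[assignment_id] ||= new(assignment_id)
    end

    def remove(assignment_id)
      @records.delete(assignment_id)
    end

    def count
      @records.size
    end
  end

  def initialize(assignment_id)
    @assignment_id = assignment_id
  end

  def destroy
    self.class.remove(assignment_id)
  end
end

first  = FakeGithubMetricUse.find_or_create_by(assignment_id: 7)
second = FakeGithubMetricUse.find_or_create_by(assignment_id: 7) # reuses the record
count_after_saves = FakeGithubMetricUse.count

FakeGithubMetricUse.find_by(assignment_id: 7)&.destroy           # removes the record
missing = FakeGithubMetricUse.find_by(assignment_id: 7)&.destroy # nil&.destroy does nothing
```

Saving twice leaves a single record, and deleting a record that is not there returns nil instead of raising NoMethodError, which is why the refactored actions need no explicit existence check.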

metrics_controller.rb:

  • Extracting methods - Several large and complex methods were extracted into smaller and more specific methods, which improved the code's readability and made it easier to understand and maintain.
  • Removing repetitive code - The refactored code removed duplicated code by using helper methods and inheritance.
  • Better naming conventions - The naming conventions used in the refactored code are more consistent and descriptive, making it easier to understand what the code is doing.
  • Improved use of Rails features - The refactored code takes advantage of features built into the Rails framework, such as the use of scopes to simplify queries and reduce code complexity.
  • Better organization of code - The refactored code is organized into sections that group related methods and functionality together, making it easier to navigate and understand the codebase.
  • A few methods were moved to metrics_helper.rb, because that logic belongs in a helper rather than in the controller.

grades_helper.rb:

  • Comments were removed or updated to better reflect the intent of the code.
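  • A guard clause was added at the beginning of the method to return nil if there are no metrics for the given team. The original "unless metrics.nil?" check never took effect, because "where" returns an empty relation rather than nil.
  • The logic to find a user associated with a metric was simplified by chaining the lookups with "||", so each fallback runs only if the previous lookup found nothing.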
  • The use of "unless" was replaced with "if" to improve the readability of the code.
  • The key creation and commit aggregation logic was updated to use the "||=" operator to create a new sub-hash if the user key does not yet exist in the hash and to increment the total number of commits for the current user.

Before:

# E2111 This method creates the code to display the metrics heatgrid in view_team in grades.
  def metrics_table(team)
    metrics = Metric.where("team_id = ?", team)

    unless metrics.nil?
      data_array = {}
      metrics.each do |metric|
        unless metric.participant_id.nil?
          #Lookup user if ID was stored at query time
          user = User.find(metric.participant_id)
        else
          # If not, try to find user by recently-entered github ID
          user = User.find_by_github_id(metric.github_id)
          # If still not, try to find user by their NCSU email if it's the same as github.com
          user = User.find_by_email(metric.github_id) if user.nil?
        end

        #Finally, if user was not found, handle by using github email in the
        # Student Name field, or Student Fullname if found.
        user_fullname = user.nil? ? "Github Email: " + metric.github_id : user.fullname
        if data_array[user_fullname]
          data_array[user_fullname][:commits] += metric.total_commits
        else
          data_array[user_fullname] = {}
          data_array[user_fullname][:commits] = metric.total_commits
        end
      end

After:

  # This method generates a metrics table for a given team and returns it as a hash
  def metrics_table(team)
    # Find all metrics associated with the given team
    metrics = Metric.where("team_id = ?", team)
    
    # Return nil if no metrics are found
    return nil if metrics.empty?
    
    # Initialize an empty hash to store the table data
    data_array = {}

    # Loop through each metric and extract relevant data
    metrics.each do |metric|
      # Try to find a user by the stored ID, github_id, or email
      user = User.find_by(id: metric.participant_id) || User.find_by_github_id(metric.github_id) || User.find_by_email(metric.github_id)
      
      # Use the user's full name or Github email address as the key in the hash
      user_fullname = user.nil? ? "Github Email: #{metric.github_id}" : user.fullname
      
      # If the user key is not yet present in the hash, create a new sub-hash
      data_array[user_fullname] ||= {}
      
      # Increment the total number of commits for the current user
      data_array[user_fullname][:commits] ||= 0
      data_array[user_fullname][:commits] += metric.total_commits
    end
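
The "||=" aggregation described above can be run on its own as follows. This is a minimal sketch: the input is assumed to be a list of [display name, commit count] pairs rather than Metric records, and the second method shows an equivalent formulation with a Hash default block for comparison.

```ruby
# Assumed input: [display_name, total_commits] pairs instead of Metric records.
def aggregate_commits(rows)
  table = {}
  rows.each do |name, commits|
    table[name] ||= {}           # create the sub-hash on first sight of a name
    table[name][:commits] ||= 0  # start the counter at zero
    table[name][:commits] += commits
  end
  table
end

# Equivalent formulation: let the Hash build missing sub-hashes itself.
def aggregate_commits_with_default(rows)
  table = Hash.new { |hash, name| hash[name] = { commits: 0 } }
  rows.each { |name, commits| table[name][:commits] += commits }
  table
end

rows = [["Ann", 3], ["Github Email: bob@example.com", 2], ["Ann", 4]]
by_name = aggregate_commits(rows)
by_name_default = aggregate_commits_with_default(rows)
```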

metrics_helper.rb :

  • The methods "parse_hyperlink_data" and "sort_commit_dates" were moved from metrics_controller.rb to metrics_helper.rb
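
The source does not show what "parse_hyperlink_data" does internally. As a purely illustrative sketch, a helper of this kind could turn a submitted GitHub link into the hash keys that the query methods in metric.rb read ("owner_name", "repository_name", "pull_request_number"). The name parse_github_link and the regular expression below are assumptions for this example; the actual Expertiza helper may differ.

```ruby
# Hypothetical pattern for repository and pull request links on github.com.
GITHUB_LINK = %r{\Ahttps?://github\.com/(?<owner>[^/]+)/(?<repo>[^/]+)(?:/pull/(?<number>\d+))?}

# Hypothetical helper: returns a hash of link parts, or nil for non-GitHub links.
def parse_github_link(url)
  match = GITHUB_LINK.match(url)
  return nil unless match

  data = { "owner_name" => match[:owner], "repository_name" => match[:repo] }
  # Only pull request links carry a number.
  data["pull_request_number"] = match[:number] if match[:number]
  data
end

pull_data = parse_github_link("https://github.com/expertiza/expertiza/pull/2524")
repo_data = parse_github_link("https://github.com/expertiza/expertiza")
other     = parse_github_link("https://example.com/not-github")
```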

metric.rb :

  • Constants for GraphQL queries: In the refactored code, the GraphQL query text is stored as constants (PULL_REQUEST_QUERY and REPO_QUERY) within the Metric class. This makes the code more readable and easier to modify in case the GraphQL queries need to be changed.
  • Query text format: The refactored code uses the format method to substitute named placeholders such as %<owner_name>s in the GraphQL query text with the actual values. This is more concise and readable than the string concatenation used in the previous implementation, and it reduces the risk of syntax errors.
  • Interpolation: String interpolation is used only to build the optional after clause.
  • Default parameters: In the refactored code, the after parameter in both the pull_query and repo_query methods is an optional keyword argument with a default value of nil. Callers can omit it for the first page of results and pass the cursor for later pages.
  • Date format: The refactored code formats the date parameter in the repo_query method as an ISO 8601 string using the DateTime class. This ensures that the date is formatted correctly for use in the GraphQL query.

After (refactored version, using constants and format):

class Metric < ActiveRecord::Base
  # Constants for GraphQL queries.
  PULL_REQUEST_QUERY = <<~QUERY
    query {
      repository(owner: "%<owner_name>s", name: "%<repository_name>s") {
        pullRequest(number: %<pull_request_number>s) {
          number additions deletions changedFiles mergeable merged headRefOid
          commits(first: 100 %<after_clause>s) {
            totalCount
            pageInfo {
              hasNextPage startCursor endCursor
            }
            edges {
              node {
                id commit {
                  author {
                    name email
                  }
                  additions deletions changedFiles committedDate
                }
              }
            }
          }
        }
      }
    }
  QUERY

  REPO_QUERY = <<~QUERY
    query {
      repository(owner: "%<owner_name>s", name: "%<repository_name>s") {
        ref(qualifiedName: "master") {
          target {
            ... on Commit {
              id
              history(%<after_clause>s since: "%<start_date>s") {
                edges {
                  node {
                    id author {
                      name email date
                    }
                  }
                }
                pageInfo {
                  endCursor
                  hasNextPage
                }
              }
            }
          }
        }
      }
    }
  QUERY

  # Generate the GraphQL query text for a PULL REQUEST link.
  #
  # hyperlink_data - a hash containing the owner name, repository name, and pull request number.
  # after - a pointer provided by the Github API to where the last query left off.
  #
  # Returns the query text as a String.
  def self.pull_query(hyperlink_data, after: nil)
    format(PULL_REQUEST_QUERY, {
      owner_name: hyperlink_data["owner_name"],
      repository_name: hyperlink_data["repository_name"],
      pull_request_number: hyperlink_data["pull_request_number"],
      after_clause: after ? "after: \"#{after}\"" : ""
    })
  end

  # Generate the GraphQL query text for a REPOSITORY link.
  #
  # hyperlink_data - a hash containing the owner name and repository name.
  # date - the assignment start date, in the format "YYYY-MM-DD".
  # after - a pointer provided by the Github API to where the last query left off.
  #
  # Returns the query text as a String.
  def self.repo_query(hyperlink_data, date, after: nil)
    format(REPO_QUERY, {
      owner_name: hyperlink_data["owner_name"],
      repository_name: hyperlink_data["repository_name"],
      start_date: DateTime.parse(date).to_time.iso8601(3),
      after_clause: after ? "after: \"#{after}\"" : ""
    })
  end
end

Before (original version, using string concatenation):

class Metric < ActiveRecord::Base

  # Generate the graphQL query text for a PULL REQUEST link, based on the link data and "after", which is a pointer
  # provided by the Github API to where the last query left off. Used to handle pulls containing more than 100 commits.
  def self.pull_query(hyperlink_data, after)
    {
      query: "query {
        repository(owner: \"" + hyperlink_data["owner_name"] + "\", name:\"" + hyperlink_data["repository_name"] + "\") {
          pullRequest(number: " + hyperlink_data["pull_request_number"] + ") {
            number additions deletions changedFiles mergeable merged headRefOid
              commits(first:100 #{ "after:\""+ after + "\"" unless after.nil? }){
                totalCount
                  pageInfo{
                    hasNextPage startCursor endCursor
                    }
                      edges{
                        node{
                          id  commit{
                                author{
                                  name email
                                }
                               additions deletions changedFiles committedDate
                        }}}}}}}"
    }
  end

  # Generate the graphQL query text for a REPOSITORY link, based on the link data and "after", which is a pointer
  # provided by the Github API to where the last query left off. Used to handle repositories containing more than 100 commits.
  def self.repo_query(hyperlink_data, date, after=nil)
    date = date.to_time.iso8601.to_s[0..18] # Format assignment start date for github api
    { query: "query {
        repository(owner: \"" + hyperlink_data["owner_name"] + "\", name: \"" + hyperlink_data["repository_name"] + "\") {
          ref(qualifiedName: \"master\") {
            target {
              ... on Commit {
                id
                  history(#{ "after:\""+ after + "\"" unless after.nil? } since:\"#{date}\") {
                    edges {
                      node {
                        id author {
                          name email date
                        }
                      }
                    }
                  pageInfo {
                    endCursor
                    hasNextPage
                  }
                }
              }
            }
          }
        }
      }"
    }
  end
end
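
The format-based query construction and the "after" cursor described in this section can be tried without calling GitHub. The sketch below is written under stated assumptions: QUERY_TEMPLATE is a shortened template, and build_query, collect_commits and fake_api are hypothetical names introduced for the example. The lambda returns canned pages in place of a real API call.

```ruby
# Shortened template using the same named placeholders as the Metric constants.
QUERY_TEMPLATE = <<~QUERY
  query {
    repository(owner: "%<owner_name>s", name: "%<repository_name>s") {
      pullRequest(number: %<pull_request_number>s) {
        commits(first: 100 %<after_clause>s) { totalCount }
      }
    }
  }
QUERY

# Hypothetical builder: fills the placeholders; `after` is only added when given.
def build_query(data, after: nil)
  format(QUERY_TEMPLATE,
         owner_name: data["owner_name"],
         repository_name: data["repository_name"],
         pull_request_number: data["pull_request_number"],
         after_clause: after ? %(after: "#{after}") : "")
end

# Hypothetical pagination loop: the block stands in for the API call and must
# return a hash with :commits, :has_next_page and :end_cursor.
def collect_commits(data)
  after = nil
  commits = []
  loop do
    page = yield build_query(data, after: after)
    commits.concat(page[:commits])
    break unless page[:has_next_page]

    after = page[:end_cursor] # continue where the last page stopped
  end
  commits
end

# Canned responses: a second page is served once the query carries the cursor.
fake_api = lambda do |query|
  if query.include?('after: "c1"')
    { commits: %w[c], has_next_page: false, end_cursor: nil }
  else
    { commits: %w[a b], has_next_page: true, end_cursor: "c1" }
  end
end

link_data = { "owner_name" => "expertiza", "repository_name" => "expertiza",
              "pull_request_number" => "2524" }
first_query = build_query(link_data)
all_commits = collect_commits(link_data, &fake_api)
```

The first query carries no after clause, and the loop keeps requesting pages until has_next_page is false, which is how a pull request with more than 100 commits would be covered.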

Test Plan

UI Testing

Our UI tests aim to capture the following core pieces of functionality:

Turn on and turn off functionality button:

To test the functionality manually, follow these steps:

1. Log in to Expertiza as an instructor.
2. Navigate to Assignments through Manage.
3. Confirm that the page shows an option to turn GitHub metrics on or off.
4. Submit the page without selecting GitHub metrics for a particular assignment.
5. Confirm that the page does not display the "Login to Query of Github data" option.


Rspec

Details of the GitHub usage tests: the first test goes to the assignment edit page, checks the "Use github metrics?" checkbox, then goes to the list submissions page and checks that the page contains the "Github data" content. The second test unchecks the checkbox and expects that content to be absent.

   it "check the box of the use github metrics? and the list_submissions page will have the content 'Github data'" do
     visit "/assignments/#{@assignment.id}/edit"
     check('Use github metrics?', allow_label_click: true)
     visit "/assignments/list_submissions?id=#{@assignment.id}"
     expect(page).to have_content("Github data")
   end

   it "uncheck the checkbox of the use github metrics? and the list_submissions page will not have the content 'Github data'" do
     visit "/assignments/#{@assignment.id}/edit"
     check('Use github metrics?', allow_label_click: false)
     page.uncheck('Use github metrics?')
     visit "/assignments/list_submissions?id=#{@assignment.id}"
     expect(page).to have_no_content("Github data")
   end


Future Implementation

  • Add more test cases.
  • Pass all tests required for the pull request to the Expertiza repository.

Project Mentor

Ed Gehringer (efg@ncsu.edu)

Contributors to this project

  • Soham Bapat (sbapat2@ncsu.edu)
  • Srihitha Reddy Kaalam (skaalam@ncsu.edu)
  • Sukruthi Modem (smodem@ncsu.edu)

Pull Request Link

https://github.com/expertiza/expertiza/pull/2524