CSC/ECE 517 Spring 2022 - E2212: Testing for hamer.rb


This page describes the changes made for the Spring 2022 OSS Project E2212: Testing for hamer.rb

About Expertiza

Expertiza is a multi-purpose web application built with Ruby on Rails for students and instructors. Instructors can create and customize courses, teams, assignments, quizzes, and more, while students can form teams, attempt quizzes, and complete assignments. Expertiza also lets students provide peer reviews, enabling them to work together to improve one another's learning experience. It is an open-source application, and its GitHub repository is Expertiza.

Description

hamer.rb implemented one of the “reputation systems” that can be used to determine the reliability of peer reviewers. However, this file is no longer current, having been replaced by a web service in 2015. The following sections therefore describe and test this web service.

Mentor

Ed Gehringer, efg@ncsu.edu

Team Members

  • Joshua Lin (jlin36@ncsu.edu)
  • Muhammet Mustafa Olmez (molmez@ncsu.edu)
  • Soumyadeep Chatterjee (schatte5@ncsu.edu)


Reputation System

Online peer-review systems are now in common use in higher education. They free the instructor and course staff from having to personally provide all the feedback that students receive on their work. However, if we want to ensure that all students receive competent feedback, or even use peer-assigned grades, we need a way to judge which peer reviewers are most credible. The solution is to use a reputation system, which is meant to assign an objective value to student peer-review scores.

Students select from a list of tasks to be performed, prepare their work, and submit it to a peer-review system. The work is then reviewed by other students, who offer comments and graded feedback to help the submitters improve their work. During the peer-review period it is important to determine which reviews are more accurate and of higher quality. Reputation is one way to achieve this goal; it is a quantitative measure of which peer reviewers are more reliable. Peer reviewers can use Expertiza to score an author. If Expertiza shows a confidence rating for each grade based on the reviewer's reputation, then authors can more easily judge the legitimacy of a peer-assigned score. In addition, the teaching staff can examine the quality of each peer review based on reputation values and, potentially, crowd-source a significant portion of the grading function.

Currently the reputation system is implemented in Expertiza through a web service. The service does not work reliably: although the Expertiza staff can sometimes run it, we could not reach the service even when trying from both a local machine and the VCL. Nevertheless, we have implemented some test scenarios based on the algorithms used in the web service.


Algorithms

Reputation systems may take various factors into account:

  • Does a reviewer assign scores that are similar to scores assigned by the instructor (on work that they both grade)?
  • Does a reviewer assign scores that match those assigned by other reviewers?
  • Does the reviewer assign different scores to different work?
  • How competent has the reviewer been on other work done for the class?

There are two algorithms in use: the Hamer-peer algorithm has the lowest maximum absolute bias, and the Lauw-peer algorithm has the lowest overall bias. From the instructor's perspective, this indicates that if there are further assignments of this kind, expert grading may not be necessary. The article (https://ieeexplore.ieee.org/abstract/document/7344292) observes one case in which the overall bias is only slightly high but the maximum absolute bias is very high (more than 20); this indicates that for future similar courses the instructor can trust most students' peer grading, but should be aware that some students may give inflated grades, so spot-checking is necessary. In another case the overall bias is quite large in the other direction, with students giving grades at least 16 points lower than expert grades; this may be because more training is needed or because the review rubric is inadequate, and it suggests that for future courses of this kind the instructor cannot trust the students' grades, so expert grades are still necessary. The main difference between the Hamer-peer and the Lauw-peer algorithm is that the Lauw-peer algorithm keeps track of the reviewer's leniency (“bias”), which can be either positive or negative; a positive leniency indicates the reviewer tends to give higher scores than average. Additionally, the range for Hamer's algorithm is (0,∞) while for Lauw's algorithm it is [0,1]. A simplified sketch of the leniency computation follows.
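
To make the leniency idea concrete, here is a minimal sketch in Python. The alternating update rules are our simplification for illustration, not the exact formulas from the paper or from the web service, and the reviewer/assignment names are made up:

# Simplified illustration of the Lauw-style leniency ("bias") idea:
# alternately estimate each assignment's quality and each reviewer's bias.
reviews = {  # reviewer -> {assignment: grade}
    "r1": {"a1": 5, "a2": 4, "a3": 4},
    "r2": {"a1": 5, "a2": 3, "a3": 4},
    "r3": {"a1": 4, "a2": 3, "a3": 4},
}

bias = {r: 0.0 for r in reviews}                 # leniency; positive = generous
quality = {a: 0.0 for a in ["a1", "a2", "a3"]}   # estimated true quality

for _ in range(20):  # a few alternating passes converge on this small example
    # Estimated quality: average grade after removing each reviewer's bias.
    for a in quality:
        quality[a] = sum(g[a] - bias[r] for r, g in reviews.items()) / len(reviews)
    # Estimated bias: how far the reviewer sits above the estimated quality.
    for r, g in reviews.items():
        bias[r] = sum(g[a] - quality[a] for a in g) / len(g)

print({r: round(b, 2) for r, b in bias.items()})
# r1 ends up with a positive bias (most generous), r3 with a negative one.

Unlike the Hamer weight, this leniency value is signed, which matches the description above.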


Reputation Web Service Test Snippet
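
In equation form, the computation implemented by the snippet below is (our transcription of the code, where x_{ij} is reviewer i's grade on assignment j, m is the number of reviewers, and n is the number of assignments):

\bar{g}_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}, \qquad
\Delta R_i = \frac{1}{n} \sum_{j=1}^{n} \left( x_{ij} - \bar{g}_j \right)^2

w'_i = \frac{\overline{\Delta R}}{\Delta R_i}, \qquad
w_i = \begin{cases} w'_i, & w'_i \le 2 \\ 2 + \ln(w'_i - 1), & w'_i > 2 \end{cases}

where \overline{\Delta R} is the mean of the \Delta R_i over all reviewers.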

import math

# Parameters: reviews list
# reviews list - a list of each reviewer's grades for each assignment
# Example:
# reviews = [[5,4,4,3,2],[5,3,4,4,2],[4,3,4,3,2]]
# Corresponding reviewer and grade for each assignment table
# Essay          Reviewer1 Reviewer2 Reviewer3
# Assignment1    5         5          4
# Assignment2    4         3          3
# Assignment3    4         4          4
# Assignment4    3         4          3
# Assignment5    2         2          2

# Reviewer's grades given to each assignment 2D array
# Each index of reviews is a reviewer. Each index in reviews[i] is a review grade
reviews = [[5,4,4,3,2],[5,3,4,4,2],[4,3,4,3,2]]

# Number of reviewers
numReviewers = len(reviews)
# Number of assignments
numAssig = len(reviews[0])
# Initial empty grades for each assignment array
grades = []
# Initial empty delta R array
deltaR = []
# Weight prime
weightPrime = []
# Reviewer's reputation weight
weight = []

# Calculating the average grade for each assignment (averaged across reviewers)
for numAssigIndex in range(numAssig):
    assignmentGradeAverage = 0
    for numReviewerIndex in range(numReviewers):
        assignmentGradeAverage += reviews[numReviewerIndex][numAssigIndex]
    grades.append(assignmentGradeAverage/numReviewers)
print("Average Grades:")
print(grades)

# Calculating delta R
for numReviewerIndex in range(numReviewers):
    reviewerDeltaR = 0
    assignmentAverageGradeIndex = 0
    for reviewGrade in reviews[numReviewerIndex]:
        reviewerDeltaR += ((reviewGrade - grades[assignmentAverageGradeIndex]) ** 2)
        assignmentAverageGradeIndex += 1
    reviewerDeltaR /= numAssig
    deltaR.append(reviewerDeltaR)
print("deltaR:")
print(deltaR)

# Calculating the average delta R across reviewers
averageDeltaR = 0
for reviewerDeltaR in deltaR:
    averageDeltaR += reviewerDeltaR
averageDeltaR /= numReviewers
print("averageDeltaR:")
print(averageDeltaR)

# Calculating weight prime (guarding against a zero delta R, which occurs
# when a reviewer's grades exactly match the average grades)
for reviewerDeltaR in deltaR:
    if reviewerDeltaR > 0:
        weightPrime.append(averageDeltaR/reviewerDeltaR)
    else:
        weightPrime.append(float("inf"))
print("weightPrime:")
print(weightPrime)
    
# Calculating reputation weight
for reviewerWeightPrime in weightPrime:
    if reviewerWeightPrime <= 2:
        weight.append(reviewerWeightPrime)
    else:
        weight.append(2 + math.log(reviewerWeightPrime - 1))
print("reputation per reviewer:")
for i, reviewerWeight in enumerate(weight, start=1):
    print("Reputation of Reviewer ", i)
    print(round(reviewerWeight, 1))

Output

Reputation of Reviewer  1
1.0
Reputation of Reviewer  2
1.0
Reputation of Reviewer  3
1.0

All three reviewers deviate from the per-assignment average grades by the same amount, so their delta R values are equal and each receives the neutral reputation weight of 1.0.

Scenarios

1) Reviewer gives all max scores
2) Reviewer gives all min scores
3) Reviewer completes no review
Alternative scenario: the reviewer gives max scores even when there is nothing to review. A sketch that runs the numbered scenarios through the algorithm appears below.
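
The following is a minimal sketch of how these scenarios could be fed to the algorithm, re-packaging the snippet above as a function (the function name hamer_reputations, the zero-delta-R guard, and the sample grade matrices are our own choices):

import math

def hamer_reputations(reviews):
    # reviews[i][j] is reviewer i's grade for assignment j.
    # Returns one reputation weight per reviewer; 1.0 is neutral.
    num_reviewers = len(reviews)
    num_assignments = len(reviews[0])
    # Average grade per assignment across reviewers.
    grades = [sum(r[j] for r in reviews) / num_reviewers
              for j in range(num_assignments)]
    # Mean squared deviation of each reviewer from those averages.
    delta_r = [sum((r[j] - grades[j]) ** 2 for j in range(num_assignments))
               / num_assignments for r in reviews]
    avg_delta_r = sum(delta_r) / num_reviewers
    weights = []
    for d in delta_r:
        # A reviewer matching the averages exactly gets an infinite weight prime.
        w_prime = avg_delta_r / d if d > 0 else float("inf")
        weights.append(w_prime if w_prime <= 2 else 2 + math.log(w_prime - 1))
    return weights

# Scenario 1: the first reviewer gives all max scores (5s).
print(hamer_reputations([[5, 5, 5, 5, 5], [5, 3, 4, 4, 2], [4, 3, 4, 3, 2]]))
# Scenario 2: the first reviewer gives all min scores (1s).
print(hamer_reputations([[1, 1, 1, 1, 1], [5, 3, 4, 4, 2], [4, 3, 4, 3, 2]]))
# Scenario 3: a reviewer who completes no reviews contributes no row,
# so only the remaining reviewers are scored.
print(hamer_reputations([[5, 3, 4, 4, 2], [4, 3, 4, 3, 2]]))

In Scenario 1 the all-max reviewer deviates most from the per-assignment averages and therefore receives the lowest weight, while the reviewers who actually discriminate between submissions are rewarded.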


Conclusion

As a team, we figured out the algorithms and their applications and wrote some test scenarios. However, we did not have the chance to work with the web service itself, since it does not run due to module errors; the error we encountered was an undefined method strip in the Reputation Web Service Controller. Although the service sometimes works on the Expertiza team's side, we were never able to see it working ourselves. We therefore created test scenarios and wrote Python code for testing the reputation algorithm.


GitHub Links

Link to Expertiza repository: https://github.com/expertiza/expertiza

Link to the forked repository: here


References

1. Expertiza on GitHub (https://github.com/expertiza/expertiza)
2. The live Expertiza website (http://expertiza.ncsu.edu/)
3. Pluggable reputation systems for peer review: A web-service approach (https://doi.org/10.1109/FIE.2015.7344292)