CSC/ECE 517 Fall 2021 - E2168. Testing - Reputations

From Expertiza_Wiki

Revision as of 21:42, 29 November 2021

Project Overview

Introduction

Online peer-review systems are now in common use in higher education. They free the instructor and course staff from having to personally provide all the feedback that students receive on their work. However, if we want to ensure that all students receive competent feedback, or even use peer-assigned grades, we need a way to judge which peer reviewers are most credible. The solution is to use a reputation system.
The reputation system is meant to give objective value to student-assigned peer-review scores. Students select from a list of tasks to be performed, prepare their work, and submit it to a peer-review system. The work is then reviewed by other students, who offer comments and graded feedback to help the submitters improve their work. During the peer-review period it is important to determine which reviews are more accurate and of higher quality. Reputation is one way to achieve this goal; it is a quantitative measure of which peer reviewers are more reliable. Peer reviewers can use Expertiza to score an author. If Expertiza shows confidence ratings for grades based on the reviewers' reputations, authors can more easily judge the legitimacy of a peer-assigned score. In addition, the teaching staff can examine the quality of each peer review based on reputation values and, potentially, crowd-source a significant portion of the grading function. Currently the reputation system is implemented in Expertiza through a web service, but no tests have been written for it. Our goal is therefore to write tests that verify Hamer's and Lauw's algorithms in the reputation system.

System Design

The description below is taken from project E1625 and gives an overview of the reputation system.

There are two algorithms intended for use in calculation of the reputation values for participants.

A web service is available (the link is accessible only to instructors) that serves a JSON response containing the reputation value. The value is computed from a seed, namely the last known reputation value, which we store in the participants table. An instructor can specify which algorithm to use for a particular assignment to calculate the confidence rating.

As the paper on reputation systems observes, “the Hamer-peer algorithm has the lowest maximum absolute bias and the Lauw-peer algorithm has the lowest overall bias. This indicates, from the instructor’s perspective, if there are further assignments of this kind, expert grading may not be necessary.”

Reputation ranges for Hamer’s algorithm:
red            value < 0.5
yellow         0.5 <= value <= 1
orange         1 < value <= 1.5
light green    1.5 < value <= 2
green          value > 2


The main difference between the Hamer-peer and the Lauw-peer algorithm is that the Lauw-peer algorithm keeps track of the reviewer's leniency (“bias”), which can be either positive or negative. A positive leniency indicates the reviewer tends to give higher scores than average. This project determines reputation by subtracting the absolute value of the leniency from 1. Additionally, the range for Hamer’s algorithm is (0,∞) while for Lauw’s algorithm it is [0,1].

Reputation ranges for Lauw’s algorithm:
red            value < 0.2
yellow         0.2 <= value <= 0.4
orange         0.4 < value <= 0.6
light green    0.6 < value <= 0.8
green          value > 0.8
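
The two range tables and the "1 minus the absolute leniency" rule can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the names BANDS, reputation_color and lauw_reputation are hypothetical and do not come from the Expertiza code base; only the thresholds are taken from the tables above.

```ruby
# Inclusive upper bounds of the red/yellow/orange/light green bands,
# copied from the two tables above. Anything larger is "green".
# (BANDS and both method names are hypothetical, for illustration only.)
BANDS = {
  'hamer' => [0.5, 1.0, 1.5, 2.0],
  'lauw'  => [0.2, 0.4, 0.6, 0.8]
}.freeze

# Map a reputation value to its display colour for the given algorithm.
def reputation_color(value, algorithm = 'lauw')
  red_limit, yellow_max, orange_max, light_green_max = BANDS.fetch(algorithm)
  return 'red' if value < red_limit            # red excludes its limit
  return 'yellow' if value <= yellow_max
  return 'orange' if value <= orange_max
  return 'light green' if value <= light_green_max
  'green'
end

# Lauw-style reputation as described in the text: 1 - |leniency|.
def lauw_reputation(leniency)
  1 - leniency.abs
end
```

For example, a reviewer with a leniency of -0.25 would get a Lauw reputation of 0.75, which falls into the light green band.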

The instructor can choose to show results from Hamer’s algorithm or Lauw’s algorithm. The default algorithm should be Lauw’s algorithm.

Objectives

Our objectives for this project are the following:

  • Double and stub an assignment and a few submissions to it, under different review rubrics
  • Manually calculate reputation scores based on the paper "Pluggable reputation systems for peer review: A web-service approach"
  • Validate the reputation scores produced by reputation management against the manually computed expectations, across the different reputation ranges of Hamer's and Lauw's algorithms, with and without the impact of the instructor score

Files Involved

  • reputation_web_service_controller_spec

Test Plan

Setup Testing Objects

To test reputation, it is crucial to create sample reviews from which a reputation score can be obtained. During the kickoff meeting, our team defined four necessary steps to follow for testing. The objects to be created and configured are discussed below.

Assignment

  • Objects involved

assignments

  • Essential Parameters to be configured

submitter_count = 0;
num_reviews = 5;
num_reviewers = 5;
num_reviews_allowed = 5;
rounds_of_reviews = 2;
reputation_algorithm = lauw/hamer;
Note: Two assignment objects were created. assignment_1 used Lauw's algorithm, whereas assignment_2 used Hamer's algorithm.

  • Code Implemented
  @assignment_1 = create(:assignment, created_at: DateTime.now.in_time_zone - 13.day, submitter_count: 0, num_reviews: 3, num_reviewers: 5, num_reviews_allowed: 5, rounds_of_reviews: 2, reputation_algorithm: 'lauw', id: 1)
  @assignment_2 = create(:assignment, created_at: DateTime.now.in_time_zone - 13.day, submitter_count: 0, num_reviews: 3, num_reviewers: 5, num_reviews_allowed: 5, rounds_of_reviews: 2, reputation_algorithm: 'hamer', id: 2)

Questionnaires (Rubrics)

  • Objects involved

questionnaires
assignment_questionnaires

  • Essential Parameters to be configured

instructor_id = from_fixture;
min_question_score = 0;
max_question_score = 5;
type = ReviewQuestionnaire;
Note: We will define the assignment with ReviewQuestionnaire type rubric.

  • Code Implemented
  @questionnaire_1 = create(:questionnaire, min_question_score: 0, max_question_score: 5, type: 'ReviewQuestionnaire', id: 1)
  @assignment_questionnaire_1_1 = create(:assignment_questionnaire, assignment_id: @assignment_1.id, questionnaire_id: @questionnaire_1.id, used_in_round: 1)
  @assignment_questionnaire_1_2 = create(:assignment_questionnaire, assignment_id: @assignment_1.id, questionnaire_id: @questionnaire_1.id, used_in_round: 2)
  @assignment_questionnaire_2_1 = create(:assignment_questionnaire, assignment_id: @assignment_2.id, questionnaire_id: @questionnaire_1.id, used_in_round: 1)
  @assignment_questionnaire_2_2 = create(:assignment_questionnaire, assignment_id: @assignment_2.id, questionnaire_id: @questionnaire_1.id, used_in_round: 2, id: 4)

Questions under Questionnaires

  • Objects involved

questions

  • Code Implemented
  @question_1_1 = create(:question, questionnaire_id: @questionnaire_1.id, id: 1)
  @question_1_2 = create(:question, questionnaire_id: @questionnaire_1.id, id: 2)
  @question_1_3 = create(:question, questionnaire_id: @questionnaire_1.id, id: 3)
  @question_1_4 = create(:question, questionnaire_id: @questionnaire_1.id, id: 4)
  @question_1_5 = create(:question, questionnaire_id: @questionnaire_1.id, id: 5)

Reviewers and Reviewees

  • Objects involved

participants
teams

  • Code Implemented

Reviewers (Participant):

  @reviewer_1 = create(:participant, can_review: 1)
  @reviewer_2 = create(:participant, can_review: 1)
  @reviewer_3 = create(:participant, can_review: 1)

Reviewees (Teams):

  # Teams are attached to @assignment_1; no plain @assignment is defined in the setup above
  @reviewee_1 = create(:assignment_team, assignment: @assignment_1)
  @reviewee_2 = create(:assignment_team, assignment: @assignment_1)
  @reviewee_3 = create(:assignment_team, assignment: @assignment_1)

Responses

  • Objects involved

response_maps
responses

  • Essential Parameters to be configured

reviewed_object_id = assignment_id;
reviewer_id = participants;
reviewee_id = AssignmentTeam;
Note: We will set up response maps to define the relationship between the reviewers and reviewees of an assignment.

  • Code Implemented

Response_maps:

  @response_map_1_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_1.id)
  @response_map_1_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_1.id)
  @response_map_1_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_1.id)

  @response_map_2_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_2.id)
  @response_map_2_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_2.id)
  @response_map_2_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_2.id)

  @response_map_3_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_3.id)
  @response_map_3_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_3.id)
  @response_map_3_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_3.id)

Responses:

  @response_1_1 = create(:response, is_submitted: true, map_id: @response_map_1_1.id)
  @response_1_2 = create(:response, is_submitted: true, map_id: @response_map_1_2.id)
  @response_1_3 = create(:response, is_submitted: true, map_id: @response_map_1_3.id)

  @response_2_1 = create(:response, is_submitted: true, map_id: @response_map_2_1.id)
  @response_2_2 = create(:response, is_submitted: true, map_id: @response_map_2_2.id)
  @response_2_3 = create(:response, is_submitted: true, map_id: @response_map_2_3.id)

  @response_3_1 = create(:response, is_submitted: true, map_id: @response_map_3_1.id)
  @response_3_2 = create(:response, is_submitted: true, map_id: @response_map_3_2.id)
  @response_3_3 = create(:response, is_submitted: true, map_id: @response_map_3_3.id)


The tests on reputation that follow depend on each of these objects being created. Some fields in each object can be left empty or keep their default values, and some attributes are not relevant to the tests. Where a field matters, the test scripts need to generate or set a fixed value for it.

Relevant Methods

db_query

This method runs the database query that returns the peer-review grades for a given assignment id. We test it in two respects: (1) whether the grade returned is correct for the specified algorithm, and (2) whether the query itself is correct.

  context 'test db_query' do
    it 'return average score' do
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 1)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 2)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 4)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)
      result = ReputationWebServiceController.new.db_query(1, 1, false)
      expect(result).to eq([[2, 1, 60.0]])
    end
  end
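
The expected grade of 60.0 in the test above can be reproduced by hand. The sketch below assumes that the grade is the sum of the answers divided by the maximum attainable total, expressed as a percentage, with all questions weighted equally; percentage_score is a hypothetical helper written for this illustration and is not the controller's actual implementation. The triple [2, 1, 60.0] appears to be reviewer id, reviewee id and grade, judging from how the json_generator tests label the same data.

```ruby
# Hypothetical helper: percentage grade for one review, assuming every
# question has the same weight and the same maximum score.
def percentage_score(answers, max_question_score)
  return nil if answers.empty? # no answers, no grade
  100.0 * answers.sum / (max_question_score * answers.size)
end

# Answers 1..5 on a 0-5 rubric: 15 of 25 possible points.
percentage_score([1, 2, 3, 4, 5], 5) # => 60.0
```

The same arithmetic gives 100.0, 60.0 and 20.0 for reviews that answer every question with 5, 3 and 1 respectively, which are the values used in the json_generator tests.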

json_generator

This method generates the reviews as a hash. We test it by calling it and checking the returned hash, which can also be converted to JSON and printed for inspection, against the expected value.

  context 'test json_generator' do
    it 'test 3 reviewer for one reviewee' do
      # reviewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      # reviewer_2's review for reviewee_1: [3, 3, 3, 3, 3]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_2.id, answer: 3)

      # reviewer_3's review for reviewee_1: [1, 1, 1, 1, 1]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_3.id, answer: 1)

      #result = ReputationWebServiceController.new.db_query(1, 1, false)
      #expect(result).to eq([[2, 1, 100.0], [3, 1, 60.0], [4, 1, 20.0]])
      result = ReputationWebServiceController.new.json_generator(1, 0, 1)
      expect(result).to eq({"submission1"=>{"stu2"=>100.0, "stu3"=>60.0, "stu4"=>20.0}})
      #repeat for different answers
    end

    it 'test same reviewer for different reviewee' do
      # reviewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      # reviewer_1's review for reviewee_2: [3, 3, 3, 3, 3]
      create(:answer, question_id: @question_1_1.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_2.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_3.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_5.id, response_id: @response_2_1.id, answer: 3)

      result = ReputationWebServiceController.new.json_generator(1, 0, 1)
      expect(result).to eq("submission1"=>{"stu2"=>100.0}, "submission2"=>{"stu2"=>60.0})
      #repeat for different answers
    end
  end
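
The relationship between the db_query rows and the hash expected from json_generator can be sketched as follows. This is a minimal illustration, assuming each row is [reviewer id, reviewee id, grade] and that the keys are built with the "stu" and "submission" prefixes seen in the expectations above; group_by_submission is a hypothetical name, not a method of the controller.

```ruby
# Hypothetical helper: nest [reviewer_id, reviewee_id, grade] rows into
# { "submission<reviewee>" => { "stu<reviewer>" => grade } }.
def group_by_submission(rows)
  rows.each_with_object({}) do |(reviewer_id, reviewee_id, grade), result|
    submission = (result["submission#{reviewee_id}"] ||= {})
    submission["stu#{reviewer_id}"] = grade
  end
end

rows = [[2, 1, 100.0], [3, 1, 60.0], [4, 1, 20.0]]
group_by_submission(rows)
# => {"submission1"=>{"stu2"=>100.0, "stu3"=>60.0, "stu4"=>20.0}}
```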

send_post_request

This method sends a POST request to peerlogic.csc.ncsu.edu/reputation/calculations/reputation_algorithms to calculate the reputation result, shows the result in the corresponding UI, and updates the given reviewer's reputation. We planned to test this method against the algorithm in the paper: first the resulting reputation value, then the updated value in the database.

Note that this method does not currently work because the public and private key files for RSA encryption are missing. The algorithm involving the "expert grade" was also not implemented. Thus, this method could not be properly tested. However, we have provided a template for creating tests in the future.

  context 'test send_post_request' do
    it 'failed because of no public key file' do
      # reivewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      params = {assignment_id: 1, round_num: 1, algorithm: 'hamer', checkbox: {expert_grade: "empty"}}
      session = {user: build(:instructor, id: 1)}

      expect(true).to eq(true)

      # Commented out because send_post_request requires the public key file, which is missing,
      # so send_post_request does not currently work
      # get :send_post_request, params, session
      # expect(response).to redirect_to '/reputation_web_service/client'
    end
  end
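
For future test authors, the shape of the request can be sketched without contacting the server. The snippet below only builds a POST request with a JSON body using Ruby's standard library. It makes several assumptions: the "http" scheme (the text gives only host and path), the payload shape (borrowed from the json_generator expectation), and the helper name build_reputation_request, which is hypothetical. It does not model the RSA encryption step, since the key files are not available.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: plain http; the source names only the host and path.
REPUTATION_URL = 'http://peerlogic.csc.ncsu.edu/reputation/calculations/reputation_algorithms'.freeze

# Hypothetical helper: builds (but does not send) the POST request.
def build_reputation_request(review_hash, url = REPUTATION_URL)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  request.body = review_hash.to_json # unencrypted; RSA step omitted
  [uri, request]
end

uri, request = build_reputation_request({ 'submission1' => { 'stu2' => 100.0 } })
# Sending would then look like:
#   Net::HTTP.new(uri.host, uri.port).request(request)
```

Keeping request construction separate from sending would let a spec assert on the body and headers even while the web service itself is unreachable.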

Results

Testing Results

Coverage

Future Tasks

As our team could not obtain the public/private key pair needed to access the reputation web service, we were only able to get to the step just before the JSON is sent to the reputation algorithm web service. Further steps are therefore required to test the reputation system.

  1. There is a lot of unused/commented code, which should be removed.
  2. Figure out what the code is doing and write appropriate comments for it.
  3. In the case of db_query, the name should say what it queries for. Also, this method not only queries, but calculates sums. Since each method should do only one thing, the code for calculating sums should be in another method. And there should be comments in the code!
  4. json_generator should be generate_json. There needs to be a method comment saying what the parameters are.
  5. In send_post_request, there are references to specific assignments, such as 724, 735, and 756. They were put in to gather data for a paper published in 2015. They are no longer relevant and should be removed. send_post_request is 91 lines long, far too long.
  6. There is a password for a private key in the code (and the code is open-sourced!) It should be in the db instead.
  7. Fix spelling of “dimention”
  8. client is a bad method name; why is stuff being copied from class variables to instance variables?

Collaborators

Jinku Cui (jcui23)

Henry Chen (hchen34)

Dong Li (dli35)

Zijun Lu (zlu5)

References

  1. Expertiza on GitHub
  2. The live Expertiza website
  3. Pluggable reputation systems for peer review: A web-service approach