CSC/ECE 517 Fall 2021 - E2168. Testing - Reputations


Project Overview

Introduction

Online peer-review systems are now in common use in higher education. They free the instructor and course staff from having to personally provide all the feedback that students receive on their work. However, if we want to ensure that all students receive competent feedback, or even to use peer-assigned grades, we need a way to judge which peer reviewers are most credible. The solution is to use a reputation system.
The reputation system is meant to give objective value to student-assigned peer-review scores. Students select from a list of tasks to be performed, then prepare their work and submit it to a peer-review system. The work is then reviewed by other students, who offer comments and graded feedback to help the submitters improve their work. During the peer-review period, it is important to determine which reviews are more accurate and of higher quality. Reputation is one way to achieve this goal; it is a quantitative measure of which peer reviewers are more reliable. Peer reviewers can use Expertiza to score an author. If Expertiza shows confidence ratings for grades based on the reviewers' reputations, then authors can more easily judge the legitimacy of a peer-assigned score. In addition, the teaching staff can examine the quality of each peer review based on reputation values and, potentially, crowd-source a significant portion of the grading function. Currently, the reputation system is implemented in Expertiza through a web service, but no tests have been written for it. Our goal is therefore to set up assignments and reviews that would produce specific reputation scores, and to test that the correct reputations are in fact being produced.

System Design

The following description is taken from project E1625 and gives an overall picture of the reputation system.

There are two algorithms intended for use in calculation of the reputation values for participants.

A web service (the link is accessible only to instructors) serves a JSON response containing the reputation value. The calculation is seeded with the last known reputation value, which we store in the participants table. An instructor can specify which algorithm to use for a particular assignment to calculate the confidence rating.

As the paper on reputation systems observes, “the Hamer-peer algorithm has the lowest maximum absolute bias and the Lauw-peer algorithm has the lowest overall bias. This indicates, from the instructor’s perspective, if there are further assignments of this kind, expert grading may not be necessary.”

Reputation range of Hamer’s algorithm:
red            value is < 0.5
yellow         value is >= 0.5 and <= 1
orange         value is > 1 and <= 1.5
light green    value is > 1.5 and <= 2
green          value is > 2


The main difference between the Hamer-peer and the Lauw-peer algorithm is that the Lauw-peer algorithm keeps track of the reviewer's leniency (“bias”), which can be either positive or negative. A positive leniency indicates the reviewer tends to give higher scores than average. This project determines reputation by subtracting the absolute value of the leniency from 1. Additionally, the range for Hamer’s algorithm is (0,∞) while for Lauw’s algorithm it is [0,1].
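The leniency-to-reputation rule described above can be sketched in a few lines of Ruby. This is only an illustration: `lauw_reputation` is a hypothetical helper name, not a method of the Expertiza code base, and it assumes the leniency has already been computed and lies in [-1, 1].

```ruby
# Hypothetical helper (not part of Expertiza): derives a reputation value
# from a reviewer's leniency by subtracting its absolute value from 1.
# Assumes leniency is already known and lies in [-1, 1].
def lauw_reputation(leniency)
  1.0 - leniency.abs
end

# A lenient reviewer (+0.25) and a harsh reviewer (-0.25) end up with the
# same reputation, because only the size of the bias matters.
lenient = lauw_reputation(0.25)
harsh   = lauw_reputation(-0.25)
puts lenient == harsh
```

Under this rule an unbiased reviewer (leniency 0) receives the maximum reputation of 1.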

Reputation range of Lauw’s algorithm:
red            value is < 0.2
yellow         value is >= 0.2 and <= 0.4
orange         value is > 0.4 and <= 0.6
light green    value is > 0.6 and <= 0.8
green          value is > 0.8
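The two colour ranges above follow the same pattern with different thresholds, which the following sketch tries to capture. The names `THRESHOLDS` and `reputation_color` are hypothetical and chosen only for illustration; how Expertiza itself maps values to colours is not shown in this document.

```ruby
# Hypothetical lookup of the colour band for a reputation value.
# Thresholds are copied from the two ranges listed above.
THRESHOLDS = {
  'hamer' => [0.5, 1.0, 1.5, 2.0],
  'lauw'  => [0.2, 0.4, 0.6, 0.8]
}.freeze

def reputation_color(value, algorithm)
  t = THRESHOLDS.fetch(algorithm)
  return 'red'         if value <  t[0]   # strictly below the first threshold
  return 'yellow'      if value <= t[1]
  return 'orange'      if value <= t[2]
  return 'light green' if value <= t[3]
  'green'                                 # strictly above the last threshold
end

puts reputation_color(1.2, 'hamer')
puts reputation_color(0.85, 'lauw')
```

Keeping the thresholds in one table makes the boundary cases (for example, exactly 0.5 under Hamer's algorithm being yellow rather than red) easy to test.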

The instructor can choose to show results from Hamer’s algorithm or Lauw’s algorithm. The default algorithm should be Lauw’s algorithm.

Objectives

Our objectives for this project are the following:

  • Double and stub an assignment and a few submissions to it, under different review rubrics.
  • Manually calculate reputation scores based on the paper "Pluggable reputation systems for peer review: A web-service approach".
  • Validate the reputation scores produced by the reputation management system under different review rubrics against scores computed manually using Hamer's and Lauw's algorithms.

Files Involved

  • reputation_web_service_controller_spec

Test Plan

Set up Testing Objects

To test reputations, it is crucial to create sample reviews from which a reputation score can be obtained. During the kickoff meeting, our team defined the steps needed for testing and the objects that must be created and configured, as discussed below.

Each object contributes to the success of the reputation tests that follow. Some fields in each object can be empty or keep their default values, and some attributes are not relevant to the tests. When implementing the tests, the test scripts need to generate or set fixed values for the relevant fields.

Assignment

  • Objects involved

assignments

  • Essential Parameters to be configured

submitter_count = 0;
num_reviews = 5;
num_reviewers = 5;
num_reviews_allowed = 5;
rounds_of_reviews = 2;
reputation_algorithm = lauw/hamer;
Note: Two assignment objects were created. Assignment_1 uses Lauw's algorithm, whereas assignment_2 uses Hamer's algorithm.

  • Code Implemented
  @assignment_1 = create(:assignment, created_at: DateTime.now.in_time_zone - 13.day, submitter_count: 0, num_reviews: 3, num_reviewers: 5, num_reviews_allowed: 5, rounds_of_reviews: 2, reputation_algorithm: 'lauw', id: 1)
  @assignment_2 = create(:assignment, created_at: DateTime.now.in_time_zone - 13.day, submitter_count: 0, num_reviews: 3, num_reviewers: 5, num_reviews_allowed: 5, rounds_of_reviews: 2, reputation_algorithm: 'hamer', id: 2)

Questionnaires (Rubrics)

  • Objects involved

questionnaires
assignment_questionnaires

  • Essential Parameters to be configured

min_question_score = 0;
max_question_score = 5;
type = ReviewQuestionnaire;
Note: The assignments are defined with a rubric of type ReviewQuestionnaire.

  • Code Implemented
  @questionnaire_1 = create(:questionnaire, min_question_score: 0, max_question_score: 5, type: 'ReviewQuestionnaire', id: 1)
  # assignment_questionnaire_<i>_<j> means assignment #i's round #j of review.
  @assignment_questionnaire_1_1 = create(:assignment_questionnaire, assignment_id: @assignment_1.id, questionnaire_id: @questionnaire_1.id, used_in_round: 1)
  @assignment_questionnaire_1_2 = create(:assignment_questionnaire, assignment_id: @assignment_1.id, questionnaire_id: @questionnaire_1.id, used_in_round: 2)
  @assignment_questionnaire_2_1 = create(:assignment_questionnaire, assignment_id: @assignment_2.id, questionnaire_id: @questionnaire_1.id, used_in_round: 1)
  @assignment_questionnaire_2_2 = create(:assignment_questionnaire, assignment_id: @assignment_2.id, questionnaire_id: @questionnaire_1.id, used_in_round: 2)

Questions under Questionnaires

  • Objects involved

questions

  • Code Implemented
  # question_i_j means question #j in questionnaire #i.
  @question_1_1 = create(:question, questionnaire_id: @questionnaire_1.id, id: 1)
  @question_1_2 = create(:question, questionnaire_id: @questionnaire_1.id, id: 2)
  @question_1_3 = create(:question, questionnaire_id: @questionnaire_1.id, id: 3)
  @question_1_4 = create(:question, questionnaire_id: @questionnaire_1.id, id: 4)
  @question_1_5 = create(:question, questionnaire_id: @questionnaire_1.id, id: 5)

Reviewers and Reviewees

  • Objects involved

participants
teams

  • Code Implemented

Reviewers (Participant):

  @reviewer_1 = create(:participant, can_review: 1)
  @reviewer_2 = create(:participant, can_review: 1)
  @reviewer_3 = create(:participant, can_review: 1)

Reviewees (Teams):

  @reviewee_1 = create(:assignment_team, assignment: @assignment_1)
  @reviewee_2 = create(:assignment_team, assignment: @assignment_1)
  @reviewee_3 = create(:assignment_team, assignment: @assignment_1)

Responses

  • Objects involved

response_maps
responses

  • Essential Parameters to be configured

reviewed_object_id = <target_assignment>.id ;
reviewer_id = <target_reviewer>.id ;
reviewee_id = <target_reviewee>.id ;
Note: The response map is set up to determine the relationship between reviewer and reviewee of an assignment.

  • Code Implemented

Response_maps:

  # response_map_<i>_<j> means the response map from reviewer #j to reviewee #i.
  @response_map_1_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_1.id)
  @response_map_1_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_1.id)
  @response_map_1_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_1.id)

  @response_map_2_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_2.id)
  @response_map_2_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_2.id)
  @response_map_2_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_2.id)

  @response_map_3_1 = create(:review_response_map, reviewer_id: @reviewer_1.id, reviewee_id: @reviewee_3.id)
  @response_map_3_2 = create(:review_response_map, reviewer_id: @reviewer_2.id, reviewee_id: @reviewee_3.id)
  @response_map_3_3 = create(:review_response_map, reviewer_id: @reviewer_3.id, reviewee_id: @reviewee_3.id)

Responses:

  # response_<i>_<j> means response of reviewer #j to reviewee #i. 
  @response_1_1 = create(:response, is_submitted: true, map_id: @response_map_1_1.id)
  @response_1_2 = create(:response, is_submitted: true, map_id: @response_map_1_2.id)
  @response_1_3 = create(:response, is_submitted: true, map_id: @response_map_1_3.id)

  @response_2_1 = create(:response, is_submitted: true, map_id: @response_map_2_1.id)
  @response_2_2 = create(:response, is_submitted: true, map_id: @response_map_2_2.id)
  @response_2_3 = create(:response, is_submitted: true, map_id: @response_map_2_3.id)

  @response_3_1 = create(:response, is_submitted: true, map_id: @response_map_3_1.id)
  @response_3_2 = create(:response, is_submitted: true, map_id: @response_map_3_2.id)
  @response_3_3 = create(:response, is_submitted: true, map_id: @response_map_3_3.id)

Relevant Methods

db_query

This is the standard database query method. Calling it returns the peer-review grades for a given assignment id. We test this method in two respects: (1) whether the grades returned are correct for the specified algorithm, and (2) whether the query itself is correct.

  context 'test db_query' do
    it 'return average score' do
      # reviewer_1's review for reviewee_1: [1, 2, 3, 4, 5]
      # create 5 answers for 5 related questions

      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 1)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 2)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 4)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)
      result = ReputationWebServiceController.new.db_query(1, 1, false)
      # expect an array of result rows computed from the scores given above
      expect(result).to eq([[2, 1, 60.0]])
    end
  end
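The expected value 60.0 in the example above is consistent with a score expressed as a percentage of the maximum attainable points: answers [1, 2, 3, 4, 5] on five questions with a maximum of 5 each sum to 15 out of 25. The sketch below reproduces that arithmetic. `percentage_score` is a hypothetical helper written for illustration, and the assumption that db_query computes the score this way is inferred from the expected values in the specs, not taken from the controller source.

```ruby
# Hypothetical helper (not part of the controller): expresses a review's
# total as a percentage of the maximum attainable score.
# Assumption: every question shares the same max_question_score.
def percentage_score(answers, max_question_score)
  total     = answers.sum
  max_total = max_question_score * answers.size
  total * 100.0 / max_total
end

puts percentage_score([1, 2, 3, 4, 5], 5)
puts percentage_score([3, 3, 3, 3, 3], 5)
```

The same arithmetic accounts for the 100.0, 60.0 and 20.0 values expected when a reviewer answers every question with 5, 3 or 1.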

json_generator

This method generates a hash representation of the reviews, which is later converted to JSON format. We test this method by calling it and checking the returned hash for correctness.

  context 'test json_generator' do
    it 'test 3 reviewer for one reviewee' do
      # reviewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      # reviewer_2's review for reviewee_1: [3, 3, 3, 3, 3]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_2.id, answer: 3)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_2.id, answer: 3)

      # reviewer_3's review for reviewee_1: [1, 1, 1, 1, 1]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_3.id, answer: 1)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_3.id, answer: 1)

      result = ReputationWebServiceController.new.json_generator(1, 0, 1)
      expect(result).to eq({"submission1"=>{"stu2"=>100.0, "stu3"=>60.0, "stu4"=>20.0}})
      #repeat for different answers
    end

    it 'test same reviewer for different reviewee' do
      # reviewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      # reviewer_1's review for reviewee_2: [3, 3, 3, 3, 3]
      create(:answer, question_id: @question_1_1.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_2.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_3.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_4.id, response_id: @response_2_1.id, answer: 3)
      create(:answer, question_id: @question_1_5.id, response_id: @response_2_1.id, answer: 3)

      result = ReputationWebServiceController.new.json_generator(1, 0, 1)
      expect(result).to eq("submission1"=>{"stu2"=>100.0}, "submission2"=>{"stu2"=>60.0})
      #repeat for different answers
    end
  end
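The shape of the hash expected in the examples above can be illustrated with a small stand-alone sketch. It assumes that each score row has the form [reviewer_id, reviewee_id, score], which is inferred from the expected values in the specs. `build_review_hash` is a hypothetical name and not the controller's implementation.

```ruby
require 'json'

# Hypothetical sketch: groups score rows by submission, then by reviewer.
# Assumption: each row is [reviewer_id, reviewee_id, score].
def build_review_hash(rows)
  rows.each_with_object({}) do |(reviewer_id, reviewee_id, score), result|
    submission = (result["submission#{reviewee_id}"] ||= {})
    submission["stu#{reviewer_id}"] = score
  end
end

rows = [[2, 1, 100.0], [3, 1, 60.0], [4, 1, 20.0]]
review_hash = build_review_hash(rows)

# The nested hash can then be serialised for the web service.
puts JSON.generate(review_hash)
```

Comparing the Ruby hash directly, as the specs do, avoids depending on key order or number formatting in the serialised JSON string.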

send_post_request

This method sends a POST request to peerlogic.csc.ncsu.edu/reputation/calculations/reputation_algorithms to calculate the reputation result, shows the result in the corresponding UI, and updates the given reviewer's reputation. We planned to test this method against the algorithm in the paper: first the resulting reputation value, and second the updated value in the database.

Note that this method does not currently work because the public and private key files for RSA encryption are missing. The algorithm involving the "expert grade" was also not implemented. Thus, this method could not be properly tested. However, we have provided a template for creating tests in the future.

  context 'test send_post_request' do
    it 'failed because of no public key file' do
      # reviewer_1's review for reviewee_1: [5, 5, 5, 5, 5]
      # create 5 answers for 5 related questions
      create(:answer, question_id: @question_1_1.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_2.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_3.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_4.id, response_id: @response_1_1.id, answer: 5)
      create(:answer, question_id: @question_1_5.id, response_id: @response_1_1.id, answer: 5)

      # choose Hamer's algorithm without expert grade (instructor's given grade)
      params = {assignment_id: 1, round_num: 1, algorithm: 'hamer', checkbox: {expert_grade: "empty"}}
      session = {user: build(:instructor, id: 1)}

      expect(true).to eq(true)

      # The request below is commented out because send_post_request requires a public key file that is missing,
      # so at this time send_post_request does not function normally.
      # If it functioned correctly, it would update the reviewer's reputation score according to the selected reputation algorithm.

      # get :send_post_request, params, session
      # expect(response).to redirect_to '/reputation_web_service/client'
    end
  end

aes_encrypt & aes_decrypt

These two methods are counterparts of each other. Instead of testing them separately, we test both in the same RSpec context. We generate a random string of eight uppercase letters, encrypt it with the aes_encrypt method, and receive the tuple [cipher, key, iv]. The test then invokes the aes_decrypt method with that tuple to retrieve the plain text. Finally, the test checks whether the decrypted text is the same as the original random data. The test covers both the aes_encrypt and aes_decrypt methods in the reputation web service controller.

context 'test aes_decrypt' do
  it 'return the correct plain text' do
    data = (0...8).map { (65 + rand(26)).chr }.join
    cipher, key, iv = ReputationWebServiceController.new.aes_encrypt(data)
    plain = ReputationWebServiceController.new.aes_decrypt(cipher, key, iv)
    expect(plain).to eq(data)
  end
end
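For readers who want to try the round trip outside Expertiza, the following is a minimal stand-alone sketch using Ruby's OpenSSL library. It assumes AES-256 in CBC mode, which this document does not confirm for the controller. The method names `aes_encrypt_sketch` and `aes_decrypt_sketch` are hypothetical stand-ins that only mirror the [cipher, key, iv] interface described above.

```ruby
require 'openssl'

# Hypothetical stand-in for aes_encrypt. Assumption: AES-256-CBC with a
# freshly generated random key and IV. Returns [cipher_text, key, iv].
def aes_encrypt_sketch(plain_text)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  key = cipher.random_key
  iv  = cipher.random_iv
  [cipher.update(plain_text) + cipher.final, key, iv]
end

# Hypothetical stand-in for aes_decrypt: reverses the step above using
# the same key and IV.
def aes_decrypt_sketch(cipher_text, key, iv)
  decipher = OpenSSL::Cipher.new('aes-256-cbc')
  decipher.decrypt
  decipher.key = key
  decipher.iv  = iv
  decipher.update(cipher_text) + decipher.final
end

data = (0...8).map { (65 + rand(26)).chr }.join
cipher_text, key, iv = aes_encrypt_sketch(data)
puts aes_decrypt_sketch(cipher_text, key, iv) == data
```

Testing the pair together, as the spec does, checks the property that matters (decryption inverts encryption) without depending on the random key and IV.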

Results

Testing Results

All current examples pass. There are 5 examples in the reputation_web_service_controller_spec.rb file, with no failures.

There are 10 methods in total in the reputation_web_service_controller.rb controller:

  1. action_allowed?
  2. db_query
  3. db_query_with_quiz_score
  4. json_generator
  5. client
  6. send_post_request
  7. rsa_public_key1
  8. rsa_private_key2
  9. aes_encrypt
  10. aes_decrypt

Our tests cover 7 of them:

  1. action_allowed?
  2. db_query
  3. json_generator
  4. client
  5. send_post_request
  6. aes_encrypt
  7. aes_decrypt

The rsa_public_key1 and rsa_private_key2 methods require the public key file public1.pem and the private key file private1.pem. However, these two files are missing, so we could not test these two methods.

Coverage

Test coverage rose from 0% to 50.31%.

No tests had been implemented for reputation_web_service_controller.rb prior to our work, so the previous test coverage was 0%.

According to the coverage report generated by the SimpleCov gem, 80 lines are covered by our tests. The remaining 72 lines of code are related to the public key file and to deprecated functions for gathering data for a paper [1] published in 2015.

The code under the first if statement in the send_post_request method of reputation_web_service_controller.rb should be removed. The team in Fall 2020 [2] also mentioned this issue.

If the redundant method send_post_request in the controller reputation_web_service_controller.rb is commented out, the test coverage reaches 52.63%.

Relevant Links

  1. Github Repo
  2. Pull Request
  3. Demo Video

Future Tasks

Because our team could not obtain the public/private key pair needed to access the reputation web service, we were only able to get to the step just before sending the JSON to the reputation algorithms web service. Further steps are therefore required to test the reputation system.

  1. In send_post_request, there are references to specific assignments, such as 724, 735, and 756. They were put in to gather data for a paper published in 2015. They are no longer relevant and should be removed.
  2. Implement reputation score correctness tests for both Lauw's and Hamer's algorithms, assuming the reputation web service becomes available in the future.
  3. The current test cases are based only on round 1 reputation scoring, even though assignment_1 and assignment_2 are configured as assignments with 2 rounds of review. Because the reputation web service is not accessible, there is no round 1 reputation score, so creating round 2 objects is meaningless. Future test cases therefore need to stub the behavior of completing assignment_questionnaire_1_2 and assignment_questionnaire_2_2 (the 2nd round questionnaires), assuming the reputation web service is available at that time.
  4. The db_query method violates the DRY principle, as it repeatedly calculates the sum for the assignment. This sum calculation should be handled in assignment.rb.

Collaborators

Jinku Cui (jcui23)

Henry Chen (hchen34)

Dong Li (dli35)

Zijun Lu (zlu5)

References

  1. Expertiza on GitHub
  2. The live Expertiza website
  3. Pluggable reputation systems for peer review: A web-service approach