CSC/ECE 517 Summer 2020 - Active Learning for Review Tagging
This page describes an Expertiza-based independent development project.
Introduction
Background
The web application Expertiza, used by students in CSC 517 and other courses, allows students to peer-review each other's work and give suggestions for improvement. Students are later asked to voluntarily participate in an extra-credit review-tagging assignment, in which they tag the comments they received for helpfulness, positive tone, and other characteristics of interest to researchers. Currently, students must tag hundreds of comments to earn full participation credit. Researchers are concerned that this workload causes inattentive participants to submit careless responses, corrupting the trained model. By having a machine-learning algorithm pre-compute a confidence level for each characteristic's presence in a comment, we can ask students to assign only the tags the algorithm is unsure of, so students can focus on fewer tags with more attention and accuracy.
Problem Statement
The goal of this project is to construct a workable infrastructure for active learning by incorporating machine-learning algorithms that evaluate which tags, if assigned manually, would help the model learn most effectively. In particular, the following requirements are fulfilled:
- Incorporate metrics analysis into the review-giving process
- Reduce the number of tags students have to assign
- Display the gathered information on report pages
- Update the web service to include paths to the confidence level of each prediction
- Decide a proper tag certainty threshold that says how certain the ML algorithm must be of a tag value before it will ask the author to tag it manually
Notes
This project is being conducted in parallel with another project named 'Integrate Suggestion Detection Algorithm.' Whereas that project focuses on building a central gateway to external web services, this project focuses on interpreting the results returned by those services.
Design
Control Flow Diagram
Peer Logic is an NSF-funded research project that provides services for educational peer-review systems. It has a set of mature machine learning algorithms and models that compute metrics on the reviews. It would be helpful for Expertiza to integrate these algorithms into the peer-review process. Specifically, we want to
- Let students see the quality of their review before submission, and
- Selectively query for manual tags that are used to further train the models (active learning)
In order to integrate these algorithms into the Expertiza system, we have to build a translator-like model, which we named ReviewMetricsQuery, that converts outputs from external sources into a form that our system can understand and use.
Below we show the control flow diagram to help illustrate the usage of the ReviewMetricsQuery model.
The ReviewMetricsQuery class is closely tied to the peer-review process. It is first called when students finish and are about to submit their reviews of other students' work. Before the system marks their reviews as submitted, the ReviewMetricsQuery class intercepts the content of these reviews and sends it to the Peer Logic web service for predictions. After it receives the results, it caches them in the local Expertiza database and then releases the intercept. Students, instead of being redirected to the list of reviews, are presented with an analysis report on the quality of their reviews. Depending on whether they are satisfied with the results displayed to them, they may go back and edit their review comments, or confirm the submission.
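The intercept-and-cache step can be illustrated with a minimal, framework-free sketch. The method name `cache_ws_results` comes from the spec list later in this page; the stub client and the hash standing in for the answer_tags table are assumptions for illustration, since the real class talks to the Peer Logic web service and Active Record instead.

```ruby
# Framework-free sketch of the intercept-and-cache step performed by
# ReviewMetricsQuery before a review is marked as submitted.
class ReviewMetricsQuery
  def initialize(ws_client, cache)
    @ws_client = ws_client # wraps the Peer Logic web service (stubbed here)
    @cache = cache         # stands in for the answer_tags table
  end

  # Send each comment to the web service and cache every prediction
  # (tag value plus confidence level) locally, so report pages never
  # have to query the service in real time.
  def cache_ws_results(comments)
    comments.each do |comment|
      predictions = @ws_client.predict(comment[:text])
      predictions.each do |metric, (value, confidence)|
        @cache[[comment[:id], metric]] = { value: value, confidence: confidence }
      end
    end
  end
end

# Hypothetical stub client returning one metric per comment.
class StubClient
  def predict(_text)
    { 'helpfulness' => [1, 0.92] }
  end
end

cache = {}
ReviewMetricsQuery.new(StubClient.new, cache)
                  .cache_ws_results([{ id: 7, text: 'Nice work on the tests.' }])
```

Caching keyed by comment and metric keeps one prediction per (comment, metric) pair, which matches the one-row-per-tag shape of the answer_tags table described below.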
Every prediction from the web service comes with a confidence level, indicating how confident the algorithm is in that prediction. Whenever a student opens the review-tagging page, before rendering any tags, the system consults the ReviewMetricsQuery class to see whether the value of each tag has previously been determined by the web service and whether its confidence level exceeds the pre-set threshold. If so, meaning the algorithm is confident of its prediction, the system applies a lightening effect to the tag to make it less noticeable. Students doing the tagging can easily distinguish normal tags from grayed-out tags and can focus their attention on the normal ones. This is the essence of active learning: query for manual input only when it adds to the algorithm's knowledge.
These cached data are also used in the instructor's report views, which is why they must be cached locally. One review consists of about 10 to 20 comments and takes minutes to process, and a report comprises thousands of such reviews, so querying the web service in real time is impractical. We minimize contact with the web service by sending requests only when students decide to submit their reviews. In this way, the predicted value of each tag stays up to date with the stored reviews.
Database Design
The only change to the database is the addition of a "confidence_level" column to the existing answer_tags table, which originally stored only tags assigned by students. One can think of the web-service results as a stack of tags assigned by the outside tool, with the confidence level indicating how confident the tool is in each tag it assigns. The answer_tags table therefore holds two types of tags: those from students, which have a user_id but no confidence_level, and those inferred by the web service, which have a confidence_level but no user_id. The system determines the type of a tag by checking which of these two fields is present.
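The presence check described above can be sketched as a small helper; the hash-based tag and the `:invalid` branch are illustrative assumptions, standing in for an AnswerTag record and the validation clause mentioned in the implementation notes.

```ruby
# Sketch of how the two tag types in answer_tags are told apart:
# student-assigned tags carry a user_id and no confidence_level, while
# web-service-inferred tags carry a confidence_level and no user_id.
def tag_source(tag)
  if tag[:user_id] && tag[:confidence_level].nil?
    :student
  elsif tag[:confidence_level] && tag[:user_id].nil?
    :inferred
  else
    :invalid # a model validation rejects rows with both or neither field
  end
end
```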
UI Design
Four pages need to be modified to reflect the new functionality.
Metrics Analysis Page
When students click the 'Submit' button on the review-giving page, the button is disabled to prevent students from submitting the request multiple times. The consequence of multiple submissions is that the same set of comments is sent to the external web service for processing repeatedly, wasting resources on both sides. To reassure students that the request has been made, we add a loader and a message below the 'Submit' button, asking them to wait patiently to avoid overloading the system.
About half a minute after students click the 'Submit' button, they are redirected to a page that shows the analysis of their submitted reviews. On that page, students can see each of their submitted comments along with the analyzed result for each metric. These metrics come from tag prompt deployments set by the instructor at the questionnaire level. Predictions with confidence levels under the predefined threshold are not rendered in the report, so students do not see predictions that are uncertain or inaccurate. When students confirm the submission, they are returned to the list of reviews to perform other actions.
Review Tagging Page
From the above image, one can see that the slider can now appear in three forms:
The original form, meaning the tag needs input from the user.
The gray-out form, presenting a tag inferred by the web service. A tag in this form is still editable, so students can override inferred tags if they wish.
The overridden form, which represents a tag that originally had a value assigned by the web service but was overridden by the user.
Review Report Page
In each row that represents a student, a metrics chart is added below the existing volume chart. Graders get useful information by looking at these two charts together and can offer more accurate review grades to students. Due to space limitations, metric names cannot be fully expanded; graders can hover the cursor over each bar to see its corresponding metric name.
Answer-Tagging Report Page
Changes made to this page include renaming columns and adding a column for the number of inferred tags. Below we explain how each column is calculated.
- % tags applied by author = # tags applied by author / # appliable tags
- # tags applied by author = of the # appliable tags, how many were tagged by the author
- # tags not applied by author = # appliable tags - # tags applied by the author
- # appliable tags = # tags whose comment is longer than the length threshold - # tags inferred by ML
- # tags inferred by ML = # tags whose comment is predicted by the machine learning algorithm with high confidence
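The column arithmetic above can be sketched in plain Ruby. The hash fields and the length threshold value are assumptions for illustration; the page states only that a length threshold exists, not its value.

```ruby
# Sketch of the answer-tagging report arithmetic. Each tag hash carries
# the comment length and two flags; LENGTH_THRESHOLD is an assumed value.
LENGTH_THRESHOLD = 10 # minimum comment length, in characters (assumed)

def report_counts(tags)
  # Only tags whose comment exceeds the length threshold are considered.
  long_enough = tags.select { |t| t[:comment_length] > LENGTH_THRESHOLD }
  inferred    = long_enough.count { |t| t[:inferred_with_high_confidence] }
  appliable   = long_enough.size - inferred # tags the author must assign
  by_author   = long_enough.count do |t|
    t[:tagged_by_author] && !t[:inferred_with_high_confidence]
  end
  {
    appliable: appliable,
    inferred: inferred,
    applied_by_author: by_author,
    not_applied_by_author: appliable - by_author,
    pct_applied: appliable.zero? ? 0.0 : by_author.to_f / appliable
  }
end

counts = report_counts([
  { comment_length: 50, inferred_with_high_confidence: false, tagged_by_author: true },
  { comment_length: 50, inferred_with_high_confidence: false, tagged_by_author: false },
  { comment_length: 50, inferred_with_high_confidence: true,  tagged_by_author: false },
  { comment_length: 3,  inferred_with_high_confidence: false, tagged_by_author: false }
])
```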
Implementation
Core Changes
app/models/review_metrics_query.rb
- The only model class responsible for communication between `MetricsController` and the rest of the Expertiza system where tags are used
- Added `average_number_of_qualifying_comments` method, which returns either the average for one reviewer or the average for the whole class, depending on whether the reviewer is supplied
db/migrate/20200825210644_add_confidence_level_to_answer_tags_table.rb
- Added `confidence_level` column to the `answer_tags` table
Cache Inferred Tags
app/models/answer.rb
- Added `de_tag_comments` method, which removes HTML tags from the submitted review comment
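Stripping HTML before sending comments to the web service can be done with a simple substitution; this regex-based one-liner is an assumed implementation, since the actual method body is not shown on this page.

```ruby
# Sketch of de_tag_comments: strip HTML tags from a review comment so
# only plain text is sent to the Peer Logic web service for prediction.
def de_tag_comments(comment)
  comment.gsub(/<[^>]*>/, '')
end
```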
app/models/answer_tag.rb
- Corrected typo (tag_prompt_deployment instead of tag_prompts_deployment)
- Added validation clause that checks for the presence of either the `user_id` or the `confidence_level`
app/controllers/metrics_controller.rb
- Created an empty `MetricsController` class so tests could pass
app/controllers/response_controller.rb
- Altered the redirection so students are redirected to the analysis page after they click to submit their reviews
- Added `confirm_submit` method, which marks the review given in the parameter as 'submitted'
app/views/response/analysis.html.erb
- Drafted the analysis page, which shows the web service's prediction for each comment on each metric
app/views/response/response.html.erb
- Added code that disables the "Submit" button after it is clicked
app/assets/stylesheets/response.scss
- Added styles for disabled button and spinning loader
config/routes.rb
- Added a `confirm_submit` route
Show Inferred Tags
app/models/tag_prompt.rb
- Added code to set the style of the slider (none, gray-out, or overridden) when it is about to be rendered
app/assets/javascripts/answer_tags.js
- Controlled the dynamic effect of overriding an inferred tag
app/assets/stylesheets/three_state_toogle.scss
- Added styles for different forms of tags (gray-out and overridden)
Show Summary of Inferred Tags
app/models/tag_prompt_deployment.rb
- Slightly changed how each column in the answer_tagging report is calculated
app/models/vm_user_answer_tagging.rb & app/helpers/report_formatter_helper.rb
- Added one variable that stores the number of tags inferred by ML
app/views/reports/_answer_tagging_report.html.erb
- Renamed columns in the answer_tagging report
- Added a new column to the table named "# tags inferred by ML"
app/views/reports/_review_report.html.erb & app/helpers/review_mapping_helper.rb
- Added an additional bar chart in each row of the review report for metrics
- Fixed redirection bug
Bug Fixes
app/views/popup/view_review_scores_popup.html.erb
- Fixed syntax error
app/views/versions/search.html.erb
- Fixed syntax error
Test Plan
RSpec Testing
Since this project involves changes to many parts of the system, some existing tests needed to be fixed. These include:
spec/features/peer_review_spec.rb
- The "Submit Review" button no longer redirects students to the list of reviews. We changed the test to click "Save Review" instead of "Submit Review" so the expected behavior could still be tested.
spec/models/tag_prompt_spec.rb
- This spec file tests functionality regarding `TagPrompt`. Some tests broke because we incorporated the logic of calling the `confident?` method to determine the slider's style. We fixed these tests by stubbing the `confident?` method to always return false, stripping that part of the logic out of testing.
Two new spec files are written for the new code:
spec/models/review_metrics_query_spec.rb
- Ensure the `cache_ws_results` method calls `MetricsController` with the right parameters and saves the results into the `answer_tags` table
- Ensure the `inferred_value` method interprets the web-service results correctly
- Ensure the `inferred_confidence` method flips the confidence value for predictions that have a negative meaning
- Ensure the `confident?`, `confidence()`, and `has?` methods access the right column in the `answer_tags` table
UI Testing
The following UI tests were done to ensure that:
- The 'Submit' button is disabled after the student clicks it to prevent multiple queries to the web service.
- The student gets redirected to the analysis page after the web service request completes.
- The student sees the analysis of their review comments on the analysis page.
- The sliders for inferred tags are grayed out.
- The student can override the gray-out tag with a new value, and the slider changes to the overridden style.
- The instructor sees the new bar chart for review metrics.
- The instructor sees the column summarizing inferred tags in the answer-tagging report.
Reference
Yulin Zhang (yzhan114@ncsu.edu)