CSC/ECE 517 Summer 2020 - Active Learning for Review Tagging


This page describes an Expertiza-based independent development project.

Introduction

Background

The web application Expertiza, used by students in CSC 517 and other courses, allows students to peer-review each other's work and give constructive comments. Students are later asked to voluntarily participate in an extra-credit review-tagging assignment in which they tag the comments they received for helpfulness, positive tone, and other characteristics of interest to researchers. Currently, students have to tag hundreds of received comments in order to get full participation credit. Researchers are concerned that this amount of work causes inattentive participants to submit responses that deviate from what they should be, thus corrupting the established model. By having a machine-learning algorithm pre-determine the confidence level of each characteristic's presence in a comment, we can ask students to assign only the tags that the algorithm is unsure of, so students can focus on fewer tags with more attention and accuracy.

Problem Statement

The goal of this project is to construct a workable infrastructure for active learning by incorporating machine-learning algorithms and determining which tags, if assigned manually, would help the model learn most effectively. In particular, the following requirements are fulfilled:

  • Incorporate metrics analysis into the review-giving process
  • Reduce the number of tags students have to assign
  • Reveal gathered information to report pages
  • Update the web service to include paths to the confidence level of each prediction
  • Decide a proper tag-certainty threshold, i.e., how certain the ML algorithm must be of a tag value before the author is no longer asked to tag it manually

Notes

This project is being carried out in parallel with another project named 'Integrate Suggestion Detection Algorithm.' Whereas that project focuses on building a central outlet to external web services, this project focuses on interpreting the results returned by those web services.

Design

Control Flow Diagram

Peer Logic is an NSF-funded research project that provides services for educational peer-review systems. It has a set of mature machine learning algorithms and models that compute metrics on the reviews. It would be helpful for Expertiza to integrate these algorithms into the peer-review process. Specifically, we want to

    1. Let students see the quality of their review before submission, and
    2. Selectively query manual tags that can be used to further train the models (active learning)

In order to integrate these algorithms into the Expertiza system, we have to build a translator-like model, which we named ReviewMetricsQuery, that converts outputs from external sources into a form that our system can understand and use.

Below we show the control flow diagram to help illustrate the usage of the ReviewMetricsQuery model.

The ReviewMetricsQuery class is closely tied to the peer-review process. It is first called when students finish and are about to submit their reviews of other students' work. Before the system marks a review as submitted, the ReviewMetricsQuery class intercepts the review's content and sends it to the Peer Logic web service for predictions. After it receives the predicted results, it caches them in the local Expertiza database and then releases the intercept. Instead of being redirected to the list of reviews, students are presented with an analysis report on the quality of their reviews. They may go back and edit their review comments or confirm the submission, depending on whether they are satisfied with the results displayed to them.
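
A minimal sketch of the caching step is shown below. The method name cache_ws_results appears later on this page, but the parameter names and the shape of the web service payload are assumptions made for illustration, not the final implementation.

  # Sketch: persist web service predictions as inferred tags (no user_id).
  # The payload shape and the scores association usage are assumptions.
  class ReviewMetricsQuery
    # response   - the Response record the student is submitting
    # ws_results - assumed hash of { answer_id => { tag_prompt_deployment_id => { value:, confidence: } } }
    def self.cache_ws_results(response, ws_results)
      response.scores.each do |answer|
        (ws_results[answer.id] || {}).each do |deployment_id, prediction|
          AnswerTag.create(answer_id: answer.id,
                           tag_prompt_deployment_id: deployment_id,
                           user_id: nil,                       # inferred tags carry no user
                           value: prediction[:value],
                           confidence_level: prediction[:confidence])
        end
      end
    end
  end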

Every prediction from the web service comes with a confidence level, indicating how confident the algorithm is about that prediction. Whenever a student visits the review-tagging page, before rendering any tags, the system consults the ReviewMetricsQuery class to check whether the web service has previously determined the value of the tag and whether its confidence level exceeds the pre-set threshold. If so, meaning the algorithm is confident about its prediction, the system applies a lightening effect to the tag to make it less noticeable. Students who do the tagging can easily distinguish between normal tags and gray-out tags and focus their attention on the normal tags. This is what active learning is about: querying manual input only when it adds to the algorithm's knowledge.
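
A minimal sketch of that threshold check follows; confident? is the method name used elsewhere on this page, while the threshold value and the query details are assumptions.

  # Sketch of the confidence check used to decide whether a tag is grayed out.
  class ReviewMetricsQuery
    THRESHOLD = 0.8 # assumed value of the pre-set threshold

    # True when the web service has already predicted this tag with
    # confidence at or above the threshold.
    def self.confident?(answer_id, tag_prompt_deployment_id)
      inferred = AnswerTag.where(answer_id: answer_id,
                                 tag_prompt_deployment_id: tag_prompt_deployment_id)
                          .where.not(confidence_level: nil)
                          .order(created_at: :desc)
                          .first
      inferred.present? && inferred.confidence_level >= THRESHOLD
    end
  end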

The cached data are also used in the instructor's report views, which is why the data need to be cached locally. One review consists of about 10 to 20 comments and takes minutes to process, while a report comprises thousands of such reviews, so querying the web service in real time is impractical. We keep the number of web service calls to a minimum by sending requests only when students decide to submit their reviews. In this way, the predicted value of each tag stays up to date with the stored reviews.

Database Design

The only change to the database is to add a confidence_level column to the existing answer_tags table, which was originally used to store tags assigned by students. One can think of the results from the web service as a stack of tags assigned by the outside tool, each with a confidence level indicating how confident the tool is about that tag. The answer_tags table therefore holds two types of tags: tags from students, which have a user_id but no confidence_level, and tags inferred by the web service, which have a confidence_level but no user_id. The system determines the type of a tag by checking which of these two fields is present, as sketched below the table.

answer_tags table
id      answer_id  tag_prompt_deployment_id  user_id  value  confidence_level  created_at           updated_at
1       1066685    5                         7513     0      NULL              2017-11-04 02:32:59  2017-11-04 02:32:59
353916  1430431    149                       NULL     -1     0.99700           2020-11-16 01:47:29  2020-11-16 01:47:29
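
That distinction could be expressed with helper predicates on the model, as in the sketch below; the method names are hypothetical.

  # Sketch (hypothetical method names): the two kinds of rows in answer_tags
  # are distinguished by which of the two columns is filled in.
  class AnswerTag < ActiveRecord::Base
    def assigned_by_student?
      user_id.present? && confidence_level.nil?
    end

    def inferred_by_web_service?
      confidence_level.present? && user_id.nil?
    end
  end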

UI Design

Four pages need to be modified to reflect the new functionality.

Metrics Analysis Page

When students click the 'Submit' button on the review-giving page, the button is disabled to prevent students from submitting the request multiple times. The consequence of submitting the request multiple times would be that the same set of comments is sent to the external web service for processing, wasting resources on both sides. To reassure students that the request has been made, we add a loader and a message below the 'Submit' button, asking them to wait patiently so as not to overload the system.

About a minute after students click the 'Submit' button, they are redirected to a page that shows the analysis of their submitted reviews. Students can see each of their submitted comments along with the analysis result for each metric. These metrics come from the tag prompt deployments set up by the instructor for each questionnaire. Predictions with a confidence level below the predefined threshold are not rendered in the report, so students do not see uncertain or inaccurate predictions. When students confirm the submission, they return to the list of reviews to perform other actions.

Review Tagging Page

From the above image, one can see that the slider now has three forms:

The original form, which needs input from the user.

The gray-out form, which presents a tag inferred by the web service. A tag in this form is still editable, meaning students can override an inferred tag if they wish to.

The overridden form, which represents a tag that originally had a value assigned by the web service but has been overridden by the user.

Review Report Page

In each row representing a student, a metrics chart is added below the volume chart that is already there. Graders get useful information by looking at these two charts combined and can offer more accurate feedback to students. Due to space limitations, the metric names cannot be fully displayed; graders can hover the cursor over each bar to view its corresponding metric name.

Answer-Tagging Report Page

Changes made to this page include renaming columns and adding a column for the number of inferred tags. Below we explain how each column is calculated; a worked example follows the list.

  • % tags applied by author = # tags applied by author / # applicable tags
  • # tags applied by author = of the # applicable tags, how many were tagged by the author
  • # tags not applied by author = # applicable tags - # tags applied by author
  • # applicable tags = # tags whose comment is longer than the length threshold - # tags inferred by ML
  • # tags inferred by ML = # tags whose comment is predicted by the machine-learning algorithm with high confidence
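
For example (hypothetical numbers): if 25 tags correspond to comments longer than the length threshold and 10 of those tags were inferred by ML with high confidence, then # applicable tags = 25 - 10 = 15. If the author manually assigned 12 of them, then # tags applied by author = 12, # tags not applied by author = 15 - 12 = 3, and % tags applied by author = 12 / 15 = 80%.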

Implementation

Core Changes

app/models/review_metrics_query.rb

  • The only model class responsible for communication between MetricsController and the parts of the Expertiza system where tags are used
  • Added average_number_of_qualifying_comments method, which returns either the metric average for one reviewer or the metric average for the whole class, depending on whether the reviewer is supplied
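
A rough sketch of that method's logic is given below; the way responses are collected and the number_of_qualifying_comments helper are assumptions.

  # Sketch: average a metric either for one reviewer or across the whole class.
  # The response lookup and the number_of_qualifying_comments helper are assumptions.
  def self.average_number_of_qualifying_comments(metric, assignment, reviewer = nil)
    responses = assignment.responses
    responses = responses.select { |r| r.reviewer == reviewer } if reviewer
    counts = responses.map { |r| number_of_qualifying_comments(r, metric) }
    return 0 if counts.empty?
    counts.sum.to_f / counts.size
  end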

db/migrate/20200825210644_add_confidence_level_to_answer_tags_table.rb

  • Added confidence_level column to the answer_tags table
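
A minimal sketch of what this migration could look like; the column type and the Rails version tag are assumptions (any numeric type that can hold a 0-1 confidence value would work).

  # db/migrate/20200825210644_add_confidence_level_to_answer_tags_table.rb
  class AddConfidenceLevelToAnswerTagsTable < ActiveRecord::Migration[5.1]
    def change
      add_column :answer_tags, :confidence_level, :decimal, precision: 6, scale: 5
    end
  end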

Cache Inferred Tags

app/models/answer.rb

  • Added de_tag_comments method which strips html tags from the submitted review comment
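
A possible one-method sketch is shown below; using Rails' strip_tags sanitizer is one way to do this, and the comments attribute name is assumed.

  # Sketch: remove HTML markup from a review comment before sending it to the
  # web service. strip_tags comes from ActionView's SanitizeHelper.
  class Answer < ActiveRecord::Base
    include ActionView::Helpers::SanitizeHelper

    def de_tag_comments
      strip_tags(comments.to_s)
    end
  end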

app/models/answer_tag.rb

  • Corrected typo (tag_prompt_deployment instead of tag_prompts_deployment)
  • Added validation clause that checks the presence of either the user_id or the confidence_level
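
The validation could be written roughly as follows (the custom validator name is hypothetical):

  # Sketch: every tag must come either from a student (user_id) or from the
  # web service (confidence_level); rows with neither are rejected.
  class AnswerTag < ActiveRecord::Base
    validate :user_or_confidence_present

    private

    def user_or_confidence_present
      return if user_id.present? || confidence_level.present?
      errors.add(:base, 'either user_id or confidence_level must be present')
    end
  end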

app/controllers/metrics_controller.rb

  • Created an empty MetricsController class so tests could pass

app/controllers/response_controller.rb

  • Altered the redirection so students are redirected to the analysis page after they click to submit their reviews
  • Added confirm_submit method which marks the review in the parameter as 'submitted'
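
A minimal sketch of such an action is shown below; the column name, redirect target, and parameter handling are assumptions.

  # Sketch of the confirm_submit action.
  class ResponseController < ApplicationController
    def confirm_submit
      response = Response.find(params[:id])
      response.update(is_submitted: true)   # mark the review as submitted (column name assumed)
      flash[:success] = 'Your review has been submitted.'
      redirect_to action: :index            # back to the list of reviews (target assumed)
    end
  end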

app/views/response/analysis.html.erb

  • Drafted the analysis page, which shows the web service's prediction for each comment on each metric

app/views/response/response.html.erb

  • Added code that disables the "Submit" button after it is clicked

app/assets/stylesheets/response.scss

  • Added styles for the disabled button and the spinning loader

config/routes.rb

  • Added a confirm_submit route
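
The route could look roughly like the following line in config/routes.rb (the HTTP verb and path are assumptions):

  post 'response/:id/confirm_submit', to: 'response#confirm_submit', as: :confirm_submit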

Show Inferred Tags

app/models/tag_prompt.rb

  • Added code to set the style of the slider (none, gray-out, or overridden) when it is about to be rendered, roughly as sketched below
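
In the sketch, the helper name, its parameters, and the CSS class names are hypothetical.

  # Sketch: an inferred tag is grayed out until the student assigns a value of
  # their own, after which it is shown in the overridden style.
  def slider_style(answer_id, tag_prompt_deployment_id, student_tag)
    return '' unless ReviewMetricsQuery.confident?(answer_id, tag_prompt_deployment_id)
    student_tag.nil? ? 'gray-out' : 'overridden'
  end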

app/assets/javascripts/answer_tags.js

  • Controlled the dynamic effect of overriding an inferred tag

app/assets/stylesheets/three_state_toogle.scss

  • Added styles for different forms of tags (gray-out and overridden)

Show Summary of Inferred Tags

app/models/tag_prompt_deployment.rb

  • Slightly changed how each column in the answer-tagging report is calculated

app/models/vm_user_answer_tagging.rb & app/helpers/report_formatter_helper.rb

  • Added one variable that stores the number of tags inferred by ML

app/views/reports/_answer_tagging_report.html.erb

  • Renamed columns in the answer-tagging report
  • Added a new column to the table named "# tags inferred by ML"

app/views/reports/_review_report.html.erb & app/helpers/review_mapping_helper.rb

  • Added a bar chart in each row of the review report for metrics

Test Plan

RSpec Testing

Since this project involves changes to many parts of the system, some existing tests needed to be fixed. These include:

spec/features/peer_review_spec.rb

  • The button "Submit Review" will no longer redirect students to the list of reviews. We made the test to click "Save Review" instead of "Submit Review" so the expected behavior could still be tested.

spec/models/tag_prompt_spec.rb

  • This spec file tests functionality related to TagPrompt. Some tests broke because we incorporated the logic of calling the confident? method to determine the slider's style. We fixed these tests by making the confident? method always return false, stripping that part of the logic out of the tests.

Two new spec files were written for the new code:

spec/models/review_metrics_query_spec.rb

  • Ensure the cache_ws_results method calls MetricsController with the right parameters and saves the results into the answer_tags table.
  • Ensure the inferred_value method interprets the web service results correctly
  • Ensure the inferred_confidence method flips the confidence value for predictions that have a negative meaning
  • Ensure the confident?, confidence, and has? methods access the right column in the answer_tags table.
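
One of the new examples could look roughly like the sketch below; the factory names, attribute values, and threshold are assumptions.

  # Sketch of a confident? example.
  describe ReviewMetricsQuery do
    describe '.confident?' do
      it 'is true when an inferred tag meets the confidence threshold' do
        answer = create(:answer)
        deployment = create(:tag_prompt_deployment)
        AnswerTag.create!(answer_id: answer.id,
                          tag_prompt_deployment_id: deployment.id,
                          user_id: nil, value: 1, confidence_level: 0.99)
        expect(ReviewMetricsQuery.confident?(answer.id, deployment.id)).to be true
      end
    end
  end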

UI Testing

The following UI tests were done to ensure that:

  1. The 'Submit' button is disabled after the student clicks it to prevent multiple web service queries.
  2. The student gets redirected to the analysis page after the web service request completes.
  3. The student sees the analysis of their review comments on the analysis page.
  4. The slider for inferred tags is shown in the gray-out style.
  5. The student can override the gray-out tag with a new value, and the slider changes to the overridden style.
  6. The instructor sees the new bar chart for review metrics.
  7. The instructor sees the column summarizing inferred tags in the answer-tagging report.

Reference

Yulin Zhang (yzhan114@ncsu.edu)