CSC/ECE 517 Summer 2020 - Active Learning for Review Tagging
This page describes an Expertiza-based independent development project.
Introduction
Background
The web application Expertiza, used by students in CSC 517 and other courses, allows students to peer-review each other's work and give suggestions for improvement. Students are later asked to voluntarily participate in an extra-credit review-tagging assignment, in which they tag the comments they received for helpfulness, positive tone, and other characteristics of interest to researchers. Currently, students must tag hundreds of comments to earn full participation credit. Researchers are concerned that this workload causes inattentive participants to submit careless responses, corrupting the trained model. By having a machine-learning algorithm pre-compute a confidence level for each characteristic's presence in a comment, we can ask students to assign only the tags the algorithm is unsure of, so students can focus on fewer tags with more attention and accuracy.
Problem Statement
The goal of this project is to construct a workable infrastructure for active learning by incorporating machine-learning algorithms that evaluate which tags, if assigned manually, would help the model learn most effectively. In particular, the following requirements are fulfilled:
- Incorporate metrics analysis into the review-giving process
- Reduce the number of tags students have to assign
- Display the gathered information on report pages
- Update the web service to include paths to the confidence level of each prediction
- Decide a proper tag certainty threshold that says how certain the ML algorithm must be of a tag value before it will ask the author to tag it manually
Notes
This project is being conducted in parallel with another project named 'Integrate Suggestion Detection Algorithm.' Whereas that project focuses on building a central gateway to external web services, this project focuses on interpreting the results returned by those services.
Design
Control Flow Diagram
Peer Logic is an NSF-funded research project that provides services for educational peer-review systems. It has a set of mature machine learning algorithms and models that compute metrics on the reviews. It would be helpful for Expertiza to integrate these algorithms into the peer-review process. Specifically, we want to
- Let students see the quality of their review before submission, and
- Selectively query for manual tags that are used to further train the models (active learning)
In order to integrate these algorithms into the Expertiza system, we have to build a translator-like model, which we named ReviewMetricsQuery, that converts outputs from external sources into a form that our system can understand and use.
Below we show the control flow diagram to help illustrate the usage of the ReviewMetricsQuery model.
The ReviewMetricsQuery class is closely tied to the peer-review process. It is first called when students finish and are about to submit their reviews of other students' work. Before the system marks their reviews as submitted, the ReviewMetricsQuery class intercepts the content of these reviews and sends it to the Peer Logic web service for predictions. After it receives the results, it caches them in the local Expertiza database and then releases the intercept. Students, instead of being redirected to the list of reviews, are presented with an analysis report on the quality of their reviews. Depending on whether they are satisfied with the results displayed to them, they may go back and edit their review comments, or confirm the submission.
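The intercept-and-cache step can be illustrated with a minimal, framework-free sketch. The method name `cache_ws_results` comes from the spec list later in this page; the stub client and the hash standing in for the answer_tags table are assumptions for illustration, since the real class talks to the Peer Logic web service and Active Record instead.

```ruby
# Framework-free sketch of the intercept-and-cache step performed by
# ReviewMetricsQuery before a review is marked as submitted.
class ReviewMetricsQuery
  def initialize(ws_client, cache)
    @ws_client = ws_client # wraps the Peer Logic web service (stubbed here)
    @cache = cache         # stands in for the answer_tags table
  end

  # Send each comment to the web service and cache every prediction
  # (tag value plus confidence level) locally, so report pages never
  # have to query the service in real time.
  def cache_ws_results(comments)
    comments.each do |comment|
      predictions = @ws_client.predict(comment[:text])
      predictions.each do |metric, (value, confidence)|
        @cache[[comment[:id], metric]] = { value: value, confidence: confidence }
      end
    end
  end
end

# Hypothetical stub client returning one metric per comment.
class StubClient
  def predict(_text)
    { 'helpfulness' => [1, 0.92] }
  end
end

cache = {}
ReviewMetricsQuery.new(StubClient.new, cache)
                  .cache_ws_results([{ id: 7, text: 'Nice work on the tests.' }])
```

Caching keyed by comment and metric keeps one prediction per (comment, metric) pair, which matches the one-row-per-tag shape of the answer_tags table described below.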
Every prediction from the web service comes with a confidence level, indicating how confident the algorithm is in that prediction. Whenever a student opens the review-tagging page, before rendering any tags, the system consults the ReviewMetricsQuery class to see whether the value of each tag has previously been determined by the web service and whether its confidence level exceeds the pre-set threshold. If so, meaning the algorithm is confident of its prediction, the system applies a lightening effect to the tag to make it less noticeable. Students doing the tagging can easily distinguish normal tags from grayed-out tags and can focus their attention on the normal ones. This is the essence of active learning: query for manual input only when it adds to the algorithm's knowledge.
These cached data are also used in the instructor's report views, which is why they must be cached locally. One review consists of about 10 to 20 comments and takes minutes to process, and a report comprises thousands of such reviews, so querying the web service in real time is impractical. We minimize contact with the web service by sending requests only when students decide to submit their reviews. In this way, the predicted value of each tag stays up to date with the stored reviews.
Database Design
The only change to the database is the addition of a "confidence_level" column to the existing answer_tags table, which originally stored only tags assigned by students. One can think of the web-service results as a stack of tags assigned by the outside tool, with the confidence level indicating how confident the tool is in each tag it assigns. The answer_tags table therefore holds two types of tags: those from students, which have a user_id but no confidence_level, and those inferred by the web service, which have a confidence_level but no user_id. The system determines the type of a tag by checking which of these two fields is present.
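The presence check described above can be sketched as a small helper; the hash-based tag and the `:invalid` branch are illustrative assumptions, standing in for an AnswerTag record and the validation clause mentioned in the implementation notes.

```ruby
# Sketch of how the two tag types in answer_tags are told apart:
# student-assigned tags carry a user_id and no confidence_level, while
# web-service-inferred tags carry a confidence_level and no user_id.
def tag_source(tag)
  if tag[:user_id] && tag[:confidence_level].nil?
    :student
  elsif tag[:confidence_level] && tag[:user_id].nil?
    :inferred
  else
    :invalid # a model validation rejects rows with both or neither field
  end
end
```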
UI Design
Four pages need to be modified to reflect the new functionality.
Metrics Analysis Page
When students click the 'Submit' button on the review-giving page, the button is disabled to prevent students from submitting the request multiple times. The consequence of multiple submissions is that the same set of comments is sent to the external web service for processing repeatedly, wasting resources on both sides. To reassure students that the request has been made, we add a loader and a message below the 'Submit' button, asking them to wait patiently to avoid overloading the system.
About half a minute after students click the 'Submit' button, they are redirected to a page that shows the analysis of their submitted reviews. On that page, students can see each of their submitted comments along with the analyzed result for each metric. These metrics come from tag prompt deployments set by the instructor at the questionnaire level. Predictions with confidence levels under the predefined threshold are not rendered in the report, so students do not see predictions that are uncertain or inaccurate. When students confirm the submission, they are returned to the list of reviews to perform other actions.
Review Tagging Page
From the above image, one can see that the slider can now appear in three forms:
The original form, meaning the tag needs input from the user.
The gray-out form, presenting a tag inferred by the web service. A tag in this form is still editable, so students can override inferred tags if they wish.
The overridden form, which represents a tag that originally had a value assigned by the web service but was overridden by the user.
Review Report Page
In each row that represents a student, a metrics chart is added below the existing volume chart. Graders get useful information by looking at these two charts together and can offer more accurate review grades to students. Due to space limitations, metric names cannot be fully expanded; graders can hover the cursor over each bar to see its corresponding metric name.
Answer-Tagging Report Page
Changes made to this page include renaming columns and adding a column for the number of inferred tags. Below we explain how each column is calculated.
- % tags applied by author = # tags applied by author / # appliable tags
- # tags applied by author = of the # appliable tags, how many were tagged by the author
- # tags not applied by author = # appliable tags - # tags applied by the author
- # appliable tags = # tags whose comment is longer than the length threshold - # tags inferred by ML
- # tags inferred by ML = # tags whose comment is predicted by the machine learning algorithm with high confidence
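The column arithmetic above can be sketched in plain Ruby. The hash fields and the length threshold value are assumptions for illustration; the page states only that a length threshold exists, not its value.

```ruby
# Sketch of the answer-tagging report arithmetic. Each tag hash carries
# the comment length and two flags; LENGTH_THRESHOLD is an assumed value.
LENGTH_THRESHOLD = 10 # minimum comment length, in characters (assumed)

def report_counts(tags)
  # Only tags whose comment exceeds the length threshold are considered.
  long_enough = tags.select { |t| t[:comment_length] > LENGTH_THRESHOLD }
  inferred    = long_enough.count { |t| t[:inferred_with_high_confidence] }
  appliable   = long_enough.size - inferred # tags the author must assign
  by_author   = long_enough.count do |t|
    t[:tagged_by_author] && !t[:inferred_with_high_confidence]
  end
  {
    appliable: appliable,
    inferred: inferred,
    applied_by_author: by_author,
    not_applied_by_author: appliable - by_author,
    pct_applied: appliable.zero? ? 0.0 : by_author.to_f / appliable
  }
end

counts = report_counts([
  { comment_length: 50, inferred_with_high_confidence: false, tagged_by_author: true },
  { comment_length: 50, inferred_with_high_confidence: false, tagged_by_author: false },
  { comment_length: 50, inferred_with_high_confidence: true,  tagged_by_author: false },
  { comment_length: 3,  inferred_with_high_confidence: false, tagged_by_author: false }
])
```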
Implementation
Core Changes
app/models/review_metrics_query.rb
- The only model class responsible for communication between `MetricsController` and the rest of the Expertiza system where tags are used
- Added `average_number_of_qualifying_comments` method, which returns either the average for one reviewer or the average for the whole class, depending on whether the reviewer is supplied
db/migrate/20200825210644_add_confidence_level_to_answer_tags_table.rb
- Added `confidence_level` column to the `answer_tags` table
Cache Inferred Tags
app/models/answer.rb
- Added `de_tag_comments` method, which removes HTML tags from the submitted review comment
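Stripping HTML before sending comments to the web service can be done with a simple substitution; this regex-based one-liner is an assumed implementation, since the actual method body is not shown on this page.

```ruby
# Sketch of de_tag_comments: strip HTML tags from a review comment so
# only plain text is sent to the Peer Logic web service for prediction.
def de_tag_comments(comment)
  comment.gsub(/<[^>]*>/, '')
end
```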
app/models/answer_tag.rb
- Corrected typo (tag_prompt_deployment instead of tag_prompts_deployment)
- Added validation clause that checks for the presence of either the `user_id` or the `confidence_level`
app/controllers/metrics_controller.rb
- Created an empty `MetricsController` class so tests could pass
app/controllers/response_controller.rb
- Altered the redirection so students are redirected to the analysis page after they click to submit their reviews
- Added `confirm_submit` method, which marks the review given in the parameter as 'submitted'
app/views/response/analysis.html.erb
- Drafted the analysis page, which shows the web service's prediction for each comment on each metric
app/views/response/response.html.erb
- Added code that disables the "Submit" button after it is clicked
app/assets/stylesheets/response.scss
- Added styles for disabled button and spinning loader
config/routes.rb
- Added a `confirm_submit` route
Show Inferred Tags
app/models/tag_prompt.rb
- Added code to set the style of the slider (none, gray-out, or overridden) when it is about to be rendered
app/assets/javascripts/answer_tags.js
- Controlled the dynamic effect of overriding an inferred tag
app/assets/stylesheets/three_state_toogle.scss
- Added styles for different forms of tags (gray-out and overridden)
Show Summary of Inferred Tags
app/models/tag_prompt_deployment.rb
- Slightly changed how each column in the answer_tagging report is calculated
app/models/vm_user_answer_tagging.rb & app/helpers/report_formatter_helper.rb
- Added one variable that stores the number of tags inferred by ML
app/views/reports/_answer_tagging_report.html.erb
- Renamed columns in the answer_tagging report
- Added a new column to the table named "# tags inferred by ML"
app/views/reports/_review_report.html.erb & app/helpers/review_mapping_helper.rb
- Added an additional bar chart in each row of the review report for metrics
- Fixed redirection bug
Bug Fixes
app/views/popup/view_review_scores_popup.html.erb
- Fixed syntax error
app/views/versions/search.html.erb
- Fixed syntax error
Test Plan
RSpec Testing
Since this project involves changes to many parts of the system, some existing tests needed to be fixed. These include:
spec/features/peer_review_spec.rb
- The "Submit Review" button no longer redirects students to the list of reviews. We changed the test to click "Save Review" instead of "Submit Review" so the expected behavior could still be tested.
spec/models/tag_prompt_spec.rb
- This spec file tests functionality regarding `TagPrompt`. Some tests broke because we incorporated the logic of calling the `confident?` method to determine the slider's style. We fixed these tests by stubbing the `confident?` method to always return false, stripping that part of the logic out of testing.
Two new spec files are written for the new code:
spec/models/review_metrics_query_spec.rb
- Ensure the `cache_ws_results` method calls `MetricsController` with the right parameters and saves the results into the `answer_tags` table
- Ensure the `inferred_value` method interprets the web-service results correctly
- Ensure the `inferred_confidence` method flips the confidence value for predictions that have a negative meaning
- Ensure the `confident?`, `confidence()`, and `has?` methods access the right column in the `answer_tags` table
UI Testing
The following UI tests were done to ensure that:
- The 'Submit' button is disabled after the student clicks it to prevent multiple queries to the web service.
- The student gets redirected to the analysis page after the web service request completes.
- The student sees the analysis of their review comments on the analysis page.
- The sliders for inferred tags are grayed out.
- The student can override the gray-out tag with a new value, and the slider changes to the overridden style.
- The instructor sees the new bar chart for review metrics.
- The instructor sees the column summarizing inferred tags in the answer-tagging report.
Reference
Yulin Zhang (yzhan114@ncsu.edu)