Design for Automatic Evaluation of Peer Reviews
Evaluate Using LLMs Integration

What we want to do

The goal of this project is to integrate LLMs with the peer-assessment process to improve learning. This can be done in various ways, including:

  • Letting the LLM rate the submission and grade reviewers on how close their review is to the LLM’s (LLM as oracle).
  • Using the LLM to read a review and give the reviewer advice on how to improve it (LLM as advisor).
  • Using the LLM to rate the reviews and using that metric to weight reviewers’ scores when calculating a grade (LLM as reputation system).
  • Letting the LLM do the review and asking reviewers (or authors) whether they agree with the LLM and why or why not (LLM as opening salvo).

Students learn best by doing reviews, so we may also want the LLM to teach students how to review.

We also want to show instructors an evaluation of each reviewer’s overall reviewing performance. Rather than looking at individual reviews, instructors will get a summary report describing how well each reviewer performed overall. The evaluation includes aspects like quality of comments, score consistency, engagement, and other rubric-related metrics. This report will be generated using a Large Language Model (LLM), such as GPT.

Thus, instructors and TAs will be able to edit, overwrite, and finalize reviewer evaluations based on the LLM-generated suggestions.

How it is going to be added to Expertiza

We can add an "Evaluate using LLM" option to the dropdown menu alongside Review Report, Author Feedback Report, etc.

When the instructor selects "Evaluate using LLM" and clicks View:

  • Review data (responses, reviewer info, reviewee info, scores, comments, etc.) is gathered.
  • The data is sent to an external API (currently stubbed with fake data).
  • The returned evaluation is populated into an editable table view similar to the existing Review Report.
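
As a rough illustration, the gathered review data might be shaped like the hash below before being sent to the API. This is a sketch only; the field names and values are illustrative, not the exact structure the service builds.

```ruby
# Illustrative shape of the review data sent to the evaluation API
# (field names and values are examples, not Expertiza's exact schema).
payload = {
  assignment_id: 42,
  reviewer: { id: 7, name: 'student1' },
  reviews: [
    {
      reviewee_id: 13,
      scores:   [4, 5, 3],                                   # per-criterion rubric scores
      comments: ['Clear intro', 'Add tests', 'Cite sources'] # per-criterion comments
    }
  ]
}
```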

Classes

  • LlmEvaluationService: Located in `app/services/llm_evaluation_service.rb`. Handles outbound API requests and inbound responses.
  • ReportsController: New method `llm_evaluation_report` added to generate the LLM evaluation page.

Web Service(s)

  • External API call: Currently stubbed with fake data.
  • View partial: `_llm_evaluation_report.html.erb` created to display editable evaluations.

How much of it is implemented now

The following parts are fully working:

  • "Evaluate using LLM" dropdown option in the _searchbox.html.erb partial.
  • Routing to the correct controller action (`llm_evaluation_report`).
  • Service object (`LlmEvaluationService`) created to collect and send review data.
  • Stubbed API returning a hardcoded response.
  • Editable report table rendered via a new partial called `_llm_evaluation_report.html.erb`.
  • "Overwrite" button added (placeholder functionality); the submit method still needs to be modified to overwrite the existing grades and comments in the `reviews_grade` table with the LLM-generated grades and comments.

Thus, the feature works end-to-end with fake data for now.

How to continue development

To complete this project:

  • Connect to the real LLM API instead of fake responses.
  • Extend the schema of the `reviews_grade` table to store the LLM-generated scores and feedback in new columns.
  • Implement saving functionality for the "Overwrite" button, which will replace the existing `grade_for_reviewer` and `comment_for_reviewer` values with the LLM-generated grades and comments (see the sketch after this list).
  • Add robust error handling (for timeout, API errors, etc.).
  • Extend support for multiple rounds and varying rubrics.
  • Add RSpec tests for the service and controller logic.
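
A minimal sketch of the schema change and the overwrite step described above, assuming hypothetical `llm_grade` and `llm_comment` column names and a `ReviewGrade` model backed by the `reviews_grade` table; none of these names are final, and the migration superclass version should match the app's Rails version.

```ruby
# Hypothetical migration: new columns on reviews_grade for the
# LLM-generated evaluation (column names are placeholders).
class AddLlmFieldsToReviewsGrade < ActiveRecord::Migration[5.1]
  def change
    add_column :reviews_grade, :llm_grade,   :integer
    add_column :reviews_grade, :llm_comment, :text
  end
end

# Sketch of the "Overwrite" save: replace the existing values with the
# LLM-generated ones for a given reviewer (model name is assumed).
review_grade = ReviewGrade.find_by(participant_id: params[:participant_id])
review_grade&.update(
  grade_for_reviewer:   review_grade.llm_grade,
  comment_for_reviewer: review_grade.llm_comment
)
```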


Pull request details

The pull request contains the following:

  • New dropdown entry: Evaluate using LLM

A new option titled "Evaluate using LLM" has been added alongside existing report types (such as Review report, Author feedback report) in the reports searchbox partial (`_searchbox.html.erb`). This allows instructors and TAs to select the LLM-based evaluation report from the interface just like any other report type.
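
Conceptually the change adds one more option to the report-type select in `_searchbox.html.erb`; the snippet below is a sketch, and the real partial's markup and option values may differ.

```erb
<%# Sketch: one extra option in the report-type dropdown
    (_searchbox.html.erb); the option values shown are hypothetical. %>
<select name="report_type">
  <option value="ReviewResponseMap">Review report</option>
  <option value="FeedbackResponseMap">Author feedback report</option>
  <option value="LlmEvaluationReport">Evaluate using LLM</option>
</select>
```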

  • New controller action: llm_evaluation_report

The `llm_evaluation_report` method has been introduced inside the `ReportsController`. This action is responsible for gathering the necessary assignment and participant data, invoking the service object to fetch LLM-evaluated responses, and rendering the new LLM Evaluation Report view for the instructor or TA.
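
A condensed sketch of what the action might look like; the instance variables and parameter names are assumptions rather than the exact implementation.

```ruby
# app/controllers/reports_controller.rb (sketch; names are assumptions)
def llm_evaluation_report
  @assignment = Assignment.find(params[:id])
  # Delegate data collection and the API round-trip to the service object.
  @evaluations = LlmEvaluationService.new(@assignment).evaluate
  # The report view then renders the editable table from @evaluations.
end
```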

  • New service object: LlmEvaluationService

A service class `LlmEvaluationService` has been created under `app/services/llm_evaluation_service.rb`. This service collects review and reviewer information from the database, formats it into the expected API request payload, sends a POST request to an external (currently dummy) LLM evaluation API using `HTTParty`, and parses the received response into a structured format that can be easily rendered in the view.
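
A condensed sketch of such a service, assuming a placeholder endpoint URL and payload; it also shows the kind of timeout handling the development roadmap above calls for.

```ruby
# app/services/llm_evaluation_service.rb (sketch; the endpoint URL and
# payload details are placeholders).
require 'httparty'

class LlmEvaluationService
  API_URL = 'https://llm-eval.example.com/evaluate'.freeze # placeholder

  def initialize(assignment)
    @assignment = assignment
  end

  # Collects review data, POSTs it to the evaluation API, and returns
  # the parsed response as a hash; returns {} on failure.
  def evaluate
    response = HTTParty.post(
      API_URL,
      body:    build_payload.to_json,
      headers: { 'Content-Type' => 'application/json' },
      timeout: 30
    )
    JSON.parse(response.body)
  rescue Net::OpenTimeout, Net::ReadTimeout, HTTParty::Error => e
    Rails.logger.error("LLM evaluation failed: #{e.message}")
    {}
  end

  private

  # Assembles the review data payload (details elided in this sketch).
  def build_payload
    { assignment_id: @assignment.id }
  end
end
```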

  • Dummy API interaction using HTTParty

For now, the API interaction is stubbed using a static JSON response that simulates an LLM’s feedback. This stubbed interaction ensures the full data flow is functional even without a live backend service. The dummy API response provides evaluation metrics such as reviewer scores, average scores, and LLM-generated comments for the reviews.
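
For illustration, the stub could return a static structure along these lines; the actual field names in the stubbed response may differ.

```ruby
# Example of a hardcoded stub standing in for the real API response
# (field names are illustrative).
STUBBED_RESPONSE = {
  'reviewers' => [
    { 'name'          => 'student1',
      'reviews_done'  => 3,
      'average_score' => 87,
      'llm_grade'     => 90,
      'llm_comment'   => 'Consistent, specific feedback across reviews.' }
  ]
}.freeze
```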

  • New view partial: _llm_evaluation_report.html.erb

A new view partial `_llm_evaluation_report.html.erb` has been created to display the LLM evaluation report. The styling and table structure closely follow the traditional Review Report design. It presents reviewer names, the number of reviews completed, teams reviewed, scores (both awarded and average), review volume metrics, and editable input fields for grades and comments.
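
One row of that table might look roughly like the sketch below, with the grade and comment rendered as editable inputs; the local `evaluation` hash and field names are illustrative.

```erb
<%# Sketch of one row in _llm_evaluation_report.html.erb; the local
    `evaluation` hash and its keys are illustrative. %>
<tr>
  <td><%= evaluation['name'] %></td>
  <td><%= evaluation['reviews_done'] %></td>
  <td><%= evaluation['average_score'] %></td>
  <td><%= text_field_tag 'grade',  evaluation['llm_grade'] %></td>
  <td><%= text_area_tag 'comment', evaluation['llm_comment'] %></td>
</tr>
```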

  • Editable fields populated with API-returned evaluation data

The fields for assigning grades and writing comments are populated directly from the LLM-evaluated API data. These fields remain editable so that instructors and TAs can modify the suggested grades and comments before deciding to overwrite and save them manually if needed.

  • "Overwrite" button for future saving

Each review entry now features an "Overwrite" button that is intended to allow instructors to save changes made to the LLM-suggested evaluations. Currently, this button is connected to a placeholder action, and future development will implement functionality to update and persist these evaluations into the database.

  • Updated routes and views to integrate the feature cleanly

The `routes.rb` file was modified to define a route for the new `llm_evaluation_report` controller action. Additionally, `response_report.html.haml` was updated to render the `_llm_evaluation_report.html.erb` partial when the user selects the "Evaluate using LLM" report type, ensuring a seamless integration into the existing reporting infrastructure.
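
The route addition is small; it might look like the line below, though the exact path should match however Expertiza declares its other report routes.

```ruby
# config/routes.rb (sketch; the actual path and verb may differ)
get 'reports/llm_evaluation_report', to: 'reports#llm_evaluation_report'
```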

The pull request provides a working prototype ready to be connected to a real API.