Report generation framework (WIP)
Report Generation
Design and Implementation Documentation
Expertiza Reimplementation Back-End
Overview
The reporting subsystem was ported from the original Expertiza codebase (referred to as Repo X) into the reimplemented Rails API back-end (referred to as Repo Y) and redesigned in the process.
Repo X used a Rails helper module (ReportFormatterHelper) that
assigned instance variables, such as @reviewers and
@review_scores, for ERB views.
Since Repo Y is a JSON API with no views, instance variables are not applicable. The architecture was therefore redesigned around a streaming reduce pipeline that returns plain Ruby hashes rendered as JSON.
Design Goals
- Avoid loading entire result sets into memory at once.
- Keep domain-specific computation out of the base class.
- Make each report type independently composable and testable.
- Fix N+1 query patterns present in Repo X.
Anti-Patterns Addressed
The design directly addresses two anti-patterns identified during architectural review of Repo X.
The fetch_responses Anti-Pattern
Repo X loaded all response records into an unnamed, ad-hoc array before processing:
# WRONG
responses = fetch_responses # full memory load
grouped = group(responses)
metrics = compute_metrics(grouped)
This forces the entire result set into Ruby memory, prevents streaming, and makes the intermediate structure implicit.
The fix is to never materialise all rows at once. Instead, use
find_each to stream records in batches, so memory usage scales
with the number of groups, not the number of raw rows.
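The streaming idea can be sketched without a database. The snippet below is a minimal illustration, not the project's code: FakeRelation, Row, and stream_counts are hypothetical names, and FakeRelation only imitates the batching behaviour of find_each.

```ruby
# Hypothetical row type standing in for a Response record.
Row = Struct.new(:reviewer_id, :score)

# Hypothetical stand-in for an ActiveRecord relation. Rows are built one
# batch at a time and then dropped, so at most batch_size rows exist at once.
class FakeRelation
  def initialize(row_count, reviewer_count)
    @row_count = row_count
    @reviewer_count = reviewer_count
  end

  # Imitates find_each(batch_size:) by yielding rows batch by batch.
  def find_each(batch_size: 500)
    (0...@row_count).each_slice(batch_size) do |ids|
      batch = ids.map { |i| Row.new(i % @reviewer_count, 50 + (i % 50)) }
      batch.each { |row| yield row }
    end
  end
end

# Folds rows into per-reviewer running totals instead of an array of rows.
def stream_counts(relation)
  state = Hash.new { |hash, key| hash[key] = { count: 0, sum: 0 } }
  relation.find_each(batch_size: 500) do |row|
    entry = state[row.reviewer_id]
    entry[:count] += 1
    entry[:sum] += row.score
  end
  state
end
```

Calling stream_counts(FakeRelation.new(1_200, 3)) leaves three entries in the state hash, one per reviewer, regardless of how many rows were streamed.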
Default Metrics in the Base Class
Repo X placed domain-specific math directly in the base class:
# WRONG: in BaseReport
def compute_metrics(grouped)
  grouped.transform_values do |responses|
    {
      count: responses.size,
      avg_score: responses.map(&:score).sum / responses.size.to_f
    }
  end
end
This ties every subclass to one particular shape of computation. The fix is to move all domain math into per-report accumulator logic. The base class contains zero domain math.
What "Domain Math" Means
Domain math refers to the business-logic calculations specific to a particular report type — the actual formulas and aggregations that answer what the report is trying to show.
It is called "domain" math because it belongs to the problem domain (peer assessment), not to the generic pipeline machinery.
Each report type has its own domain math:
| Report | Its domain math |
|---|---|
| Review scores | (raw_score / max_score) * 100, the percentage score per reviewer per round |
| Avg/ranges | max, min, sum / size, the score aggregates across a team's reviewers |
| Feedback | Bucket response IDs into round_1, round_2, round_3 arrays |
| Bookmark rating | Collect distinct bookmark IDs into a Set |
Notice that these are completely different in shape: one computes a percentage, another computes min/max/avg, and another just collects IDs.
If avg_score lived in BaseReport, every subclass
would either inherit math it does not need — for example, a bookmark report has
no scores — or be forced to override the method just to suppress it.
The rule is therefore:
> BaseReport only defines how to run the pipeline — stream, group, fold, finalize.
> The what to compute belongs entirely inside each report's own accumulate and finalize methods.
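
As a rough illustration of this rule, the sketch below runs two reports with unrelated domain math through one shared loop. MiniBaseReport, MiniScoreReport, and MiniBookmarkReport are hypothetical names, and the base iterates a plain array rather than calling find_each so that the example runs on its own.

```ruby
require 'set'

# Hypothetical, database-free stand-in for BaseReport. It owns the loop
# and nothing else.
class MiniBaseReport
  def initialize(rows)
    @rows = rows
  end

  def run
    state = initial_state
    @rows.each { |row| accumulate(state, grouper.call(row), row) }
    finalize(state)
  end

  private

  def finalize(state)
    state
  end
end

# Domain math: percentage score per reviewer.
class MiniScoreReport < MiniBaseReport
  private

  def grouper
    ->(row) { row[:reviewer_id] }
  end

  def initial_state
    {}
  end

  def accumulate(state, key, row)
    state[key] = (row[:raw_score].to_f / row[:max_score] * 100).round(2)
  end
end

# Domain math: collect distinct bookmark IDs. No scores are involved.
class MiniBookmarkReport < MiniBaseReport
  private

  def grouper
    ->(row) { row[:bookmark_id] }
  end

  def initial_state
    Set.new
  end

  def accumulate(state, key, _row)
    state << key
  end

  def finalize(state)
    { bookmark_ids: state.to_a }
  end
end
```

Neither subclass inherits a formula it does not need; the bookmark report never sees a score.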
System Diagrams [To be updated]
Overall Request Flow
The diagram below shows how an incoming HTTP request travels through the system from the controller down to the pipeline and back out as JSON.
Since MediaWiki does not render TikZ directly, this diagram is represented as a text-based flow.
HTTP Client (Front-end)
|
| GET /reports/response_report?assignment_id=&type=
v
ReportsController#response_report
|
| params[:type]
v
REPORT_CLASSES[type]
look up concrete class
|
| .new(assignment).run
v
Concrete Report
for example, FeedbackReport
|
| inherits run
v
BaseReport#run
inherited find_each streaming loop
|
| calls subclass methods
v
source -> grouper -> accumulate -> finalize
|
| output hash
v
render json: { ... }
|
| JSON response
v
HTTP Client (Front-end)
Figure: End-to-end request flow for report generation
Pipeline Internals
This diagram shows the four stages inside BaseReport#run.
The stages are defined by each concrete subclass; the pipeline loop itself
never changes.
+------------------+ +------------------+
| 1. Source | rows | 2. Grouper |
| AR relation | -----> | lambda: row->key |
| streamed via | | e.g. reviewer_id |
| find_each | +------------------+
+------------------+ |
| key, row
v
+------------------+ +------------------+
| 4. Finalize | state | 3. Accumulate |
| shape state into | <----- | fold row into |
| output hash | | state |
+------------------+ | domain math here |
| +------------------+
|
v
Hash -> JSON
Additional notes:
- source uses includes(...) where needed to avoid N+1 queries.
- accumulate handles scores, deduplication, bucketing, and counting depending on the report type.
Figure: The four stages every report passes through inside BaseReport#run
Class Hierarchy
This diagram shows how concrete report classes relate to BaseReport.
Solid inheritance from BaseReport is represented by indentation.
Composition from ReviewReport to its inner pipelines is represented
under the coordinator.
BaseReport
|
|-- ReviewReport (coordinator)
| |
| |-- ReviewersPipeline
| |-- ScoresPipeline
| |-- AvgRangesPipeline
|
|-- FeedbackReport
|-- TeammateReviewReport
|-- BookmarkRatingReport
|-- BasicReport
Legend:
- Direct child under BaseReport = inherits BaseReport
- Pipelines under ReviewReport = coordinator runs inner pipeline
- The inner pipelines also inherit BaseReport
Figure: Class hierarchy for the report generation subsystem
Architecture: The Streaming Pipeline
Pipeline Shape
All reports are built on a single pipeline template defined in
Reports::BaseReport:
def run
  state = initial_state
  source.find_each(batch_size: 500) do |row|
    accumulate(state, grouper.call(row), row)
  end
  finalize(state)
end
The pipeline consists of four concerns:
| Concern | Responsibility |
|---|---|
| Source | ActiveRecord relation streamed via find_each. Subclasses use includes(...) to eagerly load associations and prevent N+1 queries. |
| Grouper | A lambda (row) -> key that determines how rows are bucketed in the accumulator state. |
| Accumulate | Folds one row into the state in place. Contains all domain-specific math for that report. |
| Finalize | Post-processes the finished state into the output hash. Default implementation returns state unchanged. |
Memory Model
State grows in proportion to the number of groups, such as the number of distinct reviewers, not the number of raw rows.
For example, a dataset with 10,000 responses across 20 reviewers keeps at most 20 entries in the accumulator state at any point, alongside the current batch of up to 500 rows.
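The 10,000-by-20 figure can be checked with a small fold that records the largest state size seen along the way. This is a sketch under simple assumptions: fold_with_peak is a hypothetical helper, and reviewer IDs are assigned round-robin purely for illustration.

```ruby
# Folds response_count synthetic rows into a per-reviewer counter and
# tracks the largest number of state entries observed during the fold.
def fold_with_peak(response_count, reviewer_count)
  state = {}
  peak = 0
  response_count.times do |i|
    reviewer_id = i % reviewer_count # round-robin assignment (assumption)
    entry = (state[reviewer_id] ||= { count: 0 })
    entry[:count] += 1
    peak = [peak, state.size].max
  end
  [state, peak]
end
```

With fold_with_peak(10_000, 20), the peak never exceeds 20 entries even though 10,000 rows pass through the loop.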
Base Class Definition
module Reports
  # Abstract pipeline template. Contains no domain math.
  class BaseReport
    def initialize(assignment)
      @assignment = assignment
    end

    # Streams the source in batches and folds each row into state.
    def run
      state = initial_state
      source.find_each(batch_size: 500) do |row|
        accumulate(state, grouper.call(row), row)
      end
      finalize(state)
    end

    private

    # Hooks that every concrete report must define.
    def source = raise(NotImplementedError)
    def grouper = raise(NotImplementedError)
    def initial_state = raise(NotImplementedError)

    def accumulate(_state, _key, _row)
      raise NotImplementedError
    end

    # Default: return the accumulated state unchanged.
    def finalize(state) = state
  end
end
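Because run only requires that source respond to find_each, a report can be exercised without a database by swapping in a stub. The sketch below illustrates that idea; ArraySource, ResponseRow, and ResponseCountReport are hypothetical, the "assignment" is a plain hash, and the base class is a condensed copy included only so the example is self-contained.

```ruby
module Reports
  # Condensed copy of the base class so the sketch runs on its own.
  class BaseReport
    def initialize(assignment)
      @assignment = assignment
    end

    def run
      state = initial_state
      source.find_each(batch_size: 500) do |row|
        accumulate(state, grouper.call(row), row)
      end
      finalize(state)
    end

    private

    def finalize(state)
      state
    end
  end
end

# Hypothetical stub: any object that responds to find_each can stand in
# for an ActiveRecord relation.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def find_each(batch_size: 500, &block)
    @rows.each_slice(batch_size) { |batch| batch.each(&block) }
  end
end

ResponseRow = Struct.new(:id, :reviewer_id)

# Hypothetical report: counts responses per reviewer.
class ResponseCountReport < Reports::BaseReport
  private

  # The "assignment" here is a plain hash carrying pre-built rows.
  def source
    ArraySource.new(@assignment.fetch(:rows))
  end

  def grouper
    ->(row) { row.reviewer_id }
  end

  def initial_state
    Hash.new(0)
  end

  def accumulate(state, key, _row)
    state[key] += 1
  end

  def finalize(state)
    { counts: state }
  end
end
```

A unit test can then build a handful of ResponseRow structs, call run, and assert on the returned hash.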
Controller
Dispatch Mechanism
The controller uses a constant hash to map type strings to report classes,
replacing the send(type) meta-programming pattern used in Repo X.
REPORT_CLASSES = {
  'review_response_map'          => Reports::ReviewReport,
  'feedback_response_map'        => Reports::FeedbackReport,
  'teammate_review_response_map' => Reports::TeammateReviewReport,
  'bookmark_rating_response_map' => Reports::BookmarkRatingReport,
  'basic'                        => Reports::BasicReport
}.freeze

def response_report
  type = params.dig(:report, :type) || params[:type] || 'basic'
  report_class = REPORT_CLASSES[type]
  unless report_class
    return render json: { error: "Unknown report type: #{type}" },
                  status: :unprocessable_entity
  end

  assignment = Assignment.find(params[:assignment_id] || params[:id])
  data = report_class.new(assignment).run
  render json: { type: type, assignment_id: assignment.id }.merge(data)
end
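The dispatch logic can be tried outside Rails with plain hashes. In the sketch below, FakeFeedbackReport, FakeBasicReport, SKETCH_REPORT_CLASSES, and dispatch_report are all hypothetical stand-ins; the method returns a status and body pair instead of calling render.

```ruby
# Hypothetical stand-ins for the Reports::* classes.
FakeFeedbackReport = Struct.new(:assignment) do
  def run
    { review_response_ids: [] }
  end
end

FakeBasicReport = Struct.new(:assignment) do
  def run
    { assignment: { id: assignment[:id] } }
  end
end

SKETCH_REPORT_CLASSES = {
  'feedback_response_map' => FakeFeedbackReport,
  'basic' => FakeBasicReport
}.freeze

# Returns [status, body], mirroring what the controller would render.
def dispatch_report(params, assignment)
  type = params.dig(:report, :type) || params[:type] || 'basic'
  report_class = SKETCH_REPORT_CLASSES[type]
  unless report_class
    return [:unprocessable_entity, { error: "Unknown report type: #{type}" }]
  end

  data = report_class.new(assignment).run
  [:ok, { type: type, assignment_id: assignment[:id] }.merge(data)]
end
```

Unlike send(type), a lookup in a frozen hash can only ever reach the classes listed in it, so an unexpected type string produces an error response rather than an arbitrary method call.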
Route
GET /reports/response_report?assignment_id=<id>&type=<type>
POST /reports/response_report
Report Implementations
Review Report (review_response_map)
The review report is the most complex. It is implemented as a coordinator
class (ReviewReport) that runs three independent inner pipelines
and merges their results.
| Pipeline | Source | Groups by | Produces |
|---|---|---|---|
| ReviewersPipeline | ReviewResponseMap | reviewer_id | Sorted reviewer list |
| ScoresPipeline | Response JOIN map | reviewer_id | Score percentage per round/reviewee |
| AvgRangesPipeline | Response JOIN map | [reviewee_id, round] | Max/min/avg per team/round |
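One way the coordinator could combine its inner pipelines is sketched below. This is an assumption about the shape of the merge, not the actual ReviewReport code: StubPipeline and ReviewReportSketch are hypothetical, and each pipeline is assumed to return a hash holding its own section of the payload.

```ruby
# Hypothetical stand-in for an inner pipeline: returns a fixed result.
class StubPipeline
  def initialize(result)
    @result = result
  end

  def run
    @result
  end
end

# Hypothetical coordinator: runs each pipeline independently and merges
# the partial hashes into one payload.
class ReviewReportSketch
  def initialize(pipelines)
    @pipelines = pipelines
  end

  def run
    @pipelines.map(&:run).reduce({}, :merge)
  end
end
```

Because each pipeline owns a distinct top-level key, the merge order does not affect the result.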
N+1 Fix: Precomputed Max Question Score
Repo X called response.maximum_score inside the accumulation loop.
This method internally calls:
response_assignment.assignment_questionnaires
  .find_by(used_in_round: round)
  .questionnaire
This resulted in one query per response.
In Repo Y, both score pipelines precompute a
round -> max_question_score map with a single query before the
pipeline runs:
def precompute_max_q_scores
  AssignmentQuestionnaire
    .joins(:questionnaire)
    .where(assignment_id: @assignment.id)
    .pluck(:used_in_round, 'questionnaires.max_question_score')
    .to_h
end
The result, for example {nil => 10, 1 => 10, 2 => 5}, is stored in
@max_q_score and used as a lookup inside accumulate:
max_score = total_wt * (@max_q_score[round] || @max_q_score[nil] || 1)
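A worked example of the lookup and the resulting percentage, using the sample map above. The helper names max_score_for and percentage are hypothetical, total_weight is assumed to be the summed question weight for the response, and the zero guard is an added safety assumption rather than documented behaviour.

```ruby
# round => max_question_score, in the shape precompute_max_q_scores returns.
MAX_Q_SCORE = { nil => 10, 1 => 10, 2 => 5 }.freeze

# Falls back to the nil-round entry, then to 1, when a round is missing.
def max_score_for(round, total_weight, max_q_score = MAX_Q_SCORE)
  total_weight * (max_q_score[round] || max_q_score[nil] || 1)
end

# (raw_score / max_score) * 100, rounded to two decimals.
def percentage(raw_score, round, total_weight)
  max = max_score_for(round, total_weight)
  return 0.0 if max.zero? # guard added for the sketch (assumption)

  (raw_score.to_f / max * 100).round(2)
end
```

For round 2 with a total weight of 4, the maximum is 4 * 5 = 20, so a raw score of 17.5 yields 87.5. Round 3 has no entry and falls back to the nil key, giving a maximum of 40.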
Sample Response
{
  "type": "review_response_map",
  "assignment_id": 1,
  "reviewers": [
    { "id": 5, "user_id": 2, "name": "alice",
      "full_name": "Alice Smith", "handle": "alice" }
  ],
  "review_scores": { "5": { "1": { "3": 87.5 } } },
  "avg_and_ranges": { "3": { "1": { "max": 92.0,
                                    "min": 75.0,
                                    "avg": 83.5 } } }
}
Feedback Report (feedback_response_map)
Produces the list of authors and the IDs of review responses that received author feedback, bucketed by round for varying-rubric assignments.
def source
  Response
    .joins(:response_map)
    .where(
      response_maps: { type: 'ReviewResponseMap',
                       reviewed_object_id: @assignment.id }
    )
    .order(created_at: :desc)
end

# Group by [map_id, round] so each map contributes one response per round.
def grouper = ->(r) { [r.map_id, r.round] }

def initial_state = { seen: Set.new, round_1: [],
                      round_2: [], round_3: [], all: [] }

def accumulate(state, key, response)
  # Skip rows whose [map_id, round] key has already been seen.
  return if state[:seen].include?(key)
  state[:seen].add(key)

  if @assignment.varying_rubrics_by_round?
    case response.round
    when 1 then state[:round_1] << response.id
    when 2 then state[:round_2] << response.id
    when 3 then state[:round_3] << response.id
    end
  else
    state[:all] << response.id
  end
end
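Deduplication uses a Set, which gives O(1) average-case lookup, rather than the
array-based seen_map_round_keys.include? from Repo X, which gives
O(n) lookup.
Keeping only the latest response per key relies on rows arriving newest-first. ActiveRecord's find_each batches by primary key and ignores an order(...) clause on the relation, so this ordering should be confirmed, for example by passing order: :desc to find_each on Rails 6.1 or later.
Authors are fetched once in finalize, not inside the stream.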
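The deduplication and bucketing step can be reproduced with plain structs. The sketch below is illustrative only: FeedbackRow and bucket_latest are hypothetical, rows are assumed to be supplied newest-first, and Set#add? is used as a compact alternative to the include?/add pair.

```ruby
require 'set'

# Hypothetical stand-in for a Response record.
FeedbackRow = Struct.new(:id, :map_id, :round)

ROUND_BUCKETS = { 1 => :round_1, 2 => :round_2, 3 => :round_3 }.freeze

# Rows are assumed to arrive newest-first, so the first row seen for a
# [map_id, round] key is the one that is kept.
def bucket_latest(rows, varying_rubrics:)
  state = { seen: Set.new, round_1: [], round_2: [], round_3: [], all: [] }
  rows.each do |row|
    # Set#add? returns nil when the key is already present.
    next unless state[:seen].add?([row.map_id, row.round])

    if varying_rubrics
      bucket = ROUND_BUCKETS[row.round]
      state[bucket] << row.id if bucket
    else
      state[:all] << row.id
    end
  end
  state
end
```

Given rows with IDs 15 and 12 for the same map and round, only 15 is bucketed because it is encountered first.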
End-to-End Execution Flow
The run method is inherited from BaseReport.
FeedbackReport never defines it.
Calling FeedbackReport.new(assignment).run triggers the following
sequence:
ReportsController
REPORT_CLASSES['feedback_response_map'].new(assignment).run
|
| (inherited from BaseReport)
v
BaseReport#run
|
|-- Step 1: state = initial_state
| => { seen: Set.new,
| round_1: [], round_2: [], round_3: [], all: [] }
|
|-- Step 2: source.find_each(batch_size: 500)
| => Response.joins(:response_map)
| .where(type: 'ReviewResponseMap',
| reviewed_object_id: assignment.id)
| .order(created_at: :desc)
| streams Response records newest-first, in batches
|
|-- Step 3: for each Response row:
| key = grouper.call(row)
| => [row.map_id, row.round] e.g. [42, 1]
|
| accumulate(state, key, row)
| => skip if state[:seen] already has [map_id, round]
| (keeps only the latest response per map per round
| because source is ordered newest-first)
| => otherwise: add key to :seen, then bucket row.id:
| round == 1 => state[:round_1] << row.id
| round == 2 => state[:round_2] << row.id
| round == 3 => state[:round_3] << row.id
| (or state[:all] if single-rubric assignment)
|
|-- Step 4: finalize(state)
      => fetch_authors (one query: teams -> users -> participants)
      => if varying_rubrics_by_round?
           return { authors: [...],
                    review_response_ids: {
                      round_1: [...], round_2: [...], round_3: [...] } }
         else
           return { authors: [...],
                    review_response_ids: [...] }
The key point is that FeedbackReport only defines the five methods
the pipeline calls:
- source
- grouper
- initial_state
- accumulate
- finalize
Ruby's inheritance mechanism means calling .run on a
FeedbackReport instance automatically executes
BaseReport#run, which calls back into FeedbackReport's
implementations of those methods.
Sample Response (varying rubrics)
{
  "type": "feedback_response_map",
  "assignment_id": 1,
  "authors": [{ "id": 7, "name": "bob", "full_name": "Bob Jones" }],
  "review_response_ids": {
    "round_1": [12, 15], "round_2": [18], "round_3": []
  }
}
Teammate Review Report (teammate_review_response_map)
Streams TeammateReviewResponseMap records grouped by
reviewer_id.
Only the first occurrence per reviewer is kept: accumulate returns early if the key already exists in state. Reviewer associations are eagerly loaded.
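A minimal sketch of that keep-first rule, assuming the state is a hash keyed by reviewer_id. TeammateMap, accumulate_teammate, and finalize_teammate are hypothetical names, and the struct fields simply mirror the sample response.

```ruby
# Hypothetical flattened view of a map row joined with its reviewer.
TeammateMap = Struct.new(:reviewer_id, :user_id, :name, :full_name)

# The first occurrence wins: later rows for the same reviewer are ignored.
def accumulate_teammate(state, key, map)
  return if state.key?(key)

  state[key] = {
    reviewer_id: map.reviewer_id,
    user_id: map.user_id,
    name: map.name,
    full_name: map.full_name
  }
end

# Shapes the keyed state into the list form used in the response.
def finalize_teammate(state)
  { reviewers: state.values }
end
```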
Sample Response
{
  "type": "teammate_review_response_map",
  "assignment_id": 1,
  "reviewers": [
    { "reviewer_id": 5, "user_id": 2,
      "name": "alice", "full_name": "Alice Smith" }
  ]
}
Bookmark Rating Report (bookmark_rating_response_map)
Streams BookmarkRatingResponseMap records, accumulating distinct
bookmark IDs into a Set.
Project topics are fetched once in finalize.
Bug Fixed During Port
The model's bookmark_response_report in Repo Y was incorrectly
calling:
.pluck(:reviewed_object_id)
This returns assignment IDs, since reviewed_object_id is the
foreign key to Assignment.
Bookmark IDs are stored in reviewee_id. Therefore, this was fixed
to:
.pluck(:reviewee_id)
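The effect of the wrong column is easy to see with a few in-memory rows. In this sketch, BookmarkMap and distinct_ids are hypothetical; every map for one assignment shares the same reviewed_object_id, so collecting that column collapses to a single assignment ID.

```ruby
require 'set'

# Hypothetical stand-in for a BookmarkRatingResponseMap row.
BookmarkMap = Struct.new(:reviewed_object_id, :reviewee_id)

# Collects the distinct values of one column, preserving first-seen order.
def distinct_ids(maps, column)
  maps.each_with_object(Set.new) { |map, ids| ids << map.public_send(column) }.to_a
end
```

With four maps on assignment 1 pointing at bookmarks 10, 14, 22, and 10, the reviewed_object_id column yields only [1], while reviewee_id yields the three bookmark IDs.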
Sample Response
{
  "type": "bookmark_rating_response_map",
  "assignment_id": 1,
  "bookmark_ids": [10, 14, 22],
  "topics": [{ "id": 3, "topic_name": "Machine Learning" }]
}
Basic Report (basic)
Returns minimal assignment metadata.
No streaming is required since all data comes from the already-loaded
Assignment object.
This report is used as the default when no type parameter is provided.
Sample Response
{
  "type": "basic",
  "assignment_id": 1,
  "assignment": {
    "id": 1, "name": "Project 1",
    "num_review_rounds": 2,
    "varying_rubrics_by_round": true
  }
}
File Structure
app/
controllers/
reports_controller.rb Entry point, REPORT_CLASSES dispatch
helpers/
report_formatter_helper.rb Empty namespace (logic moved to services)
services/
reports/
base_report.rb Abstract pipeline template
review_report.rb 3-pipeline coordinator
feedback_report.rb Single pipeline, round bucketing
teammate_review_report.rb Single pipeline
bookmark_rating_report.rb Single pipeline
basic_report.rb Simple struct
models/
review_response_map.rb
.review_response_report class method
feedback_response_map.rb
.feedback_response_report class method
teammate_review_response_map.rb
.teammate_response_report class method
bookmark_rating_response_map.rb
.bookmark_response_report (bug fixed)
Blocked Report Types
The following report types exist in Repo X but cannot yet be implemented in Repo Y due to missing database tables or models.
Each is ready to be added once its dependency is ported.
| Report Type | Missing Dependency | Repo X Location |
|---|---|---|
| calibration | calibrate_to column on response_maps | report_formatter_helper.rb |
| self_review | SelfReviewResponseMap model | self_review_response_map.rb |
| survey | survey_deployments table | survey_response_map.rb |
| quiz | quiz_responses table | quiz_response_map.rb |
| answer_tagging | tag_prompt_deployments, answer_tags tables | tag_prompt_deployment.rb |
To add a blocked report once its dependencies are available:
- Create app/services/reports/<name>_report.rb inheriting BaseReport.
- Define source, grouper, initial_state, accumulate, and finalize.
- Add an entry to ReportsController::REPORT_CLASSES.
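The three steps might look roughly like the template below. Everything in it is a placeholder: ReportsSketch, ExampleReport, the 'example' type string, and the items it groups are hypothetical and do not correspond to any of the blocked reports. The stand-in base iterates with each instead of find_each so the sketch runs without a database.

```ruby
module ReportsSketch
  # Stand-in base: same hooks as BaseReport, but iterates with each.
  class BaseReport
    def initialize(assignment)
      @assignment = assignment
    end

    def run
      state = initial_state
      source.each { |row| accumulate(state, grouper.call(row), row) }
      finalize(state)
    end
  end

  # Steps 1 and 2: a new report class that defines the five hooks.
  class ExampleReport < BaseReport
    private

    def source
      @assignment.fetch(:items)
    end

    def grouper
      ->(item) { item[:owner_id] }
    end

    def initial_state
      {}
    end

    def accumulate(state, key, item)
      (state[key] ||= []) << item[:id]
    end

    def finalize(state)
      { items_by_owner: state }
    end
  end

  # Step 3: register the class under its type string.
  REPORT_CLASSES = {
    'example' => ExampleReport
  }.freeze
end
```

In the real code base the new entry would be added to the existing hash literal, since the constant is frozen.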
Comparison with Repo X
| Concern | Repo X | Repo Y |
|---|---|---|
| Output format | ERB instance variables, such as @reviewers and @review_scores | JSON hash from report.run |
| Loading strategy | All records loaded into arrays at once | find_each batched streaming |
| Metrics location | compute_metrics in helper base | Each report owns accumulate and finalize |
| Dispatch | send(@type.underscore, params, session) | REPORT_CLASSES[type].new(assignment).run |
| N+1 on scores | response.maximum_score per row, with a questionnaire lookup each time | Precomputed round->max_score map, one query before pipeline |
| Deduplication | Array#include?, O(n) per check | Set#include?, O(1) per check |
| Additional features | LLM evaluation, CSV export, calibration, self-review, survey, quiz, answer tagging | Not yet ported; blocked on schema |
Author
| Name | Role |
|---|---|
| Aanand Sreekumaran Nair Jayakumari | Project contributor / developer |