Design Document for E1632. Collusion detection

From Expertiza_Wiki
Jump to navigation Jump to search

Introduction to Expertiza

Expertiza is an open source project developed on Ruby on Rails framework. It is primarily developed as a platform for instructors to create, assign and grade assignments and quizzes and for the students to view, submit assignments and take quizzes. Other activities involved in a course can also be done via Expertiza, like creating teams, reviewing assignments among others.

Problem Statement

Collusion detection is a feature that Expertiza needs. Collusion detection is supposed to detect when peer reviewers “return favors” to each other by giving each other higher scores than they deserve. There are 2 types of collusion that instructors may worry about: small-circle collusion and pervasive collusion. Small-circle collusion refers to the practice that a few teams/authors reach an agreement that they give each other high scores if they get chances to review each other’s work. The number of the teams/authors could be any number. Pervasive collusion refers to the practice that giving each other high scores becomes a trend in some assignments. This practice has been observed by Dr. Luca de Alfaro in his paper:

... Once a nucleus of students starts to assign top grades to all the submissions they review, other initially honest students see what is happening, and join the colluders, both to save work in reviewing, and to avoid being penalized in the review precision as the only honest students who disagree with their colluding peers.

Issues Identified

  • The code in collusion_cycle.rb is not called anywhere. Even it were called, the code would not work (due to some nasty refactoring done previously). The algorithm was badly designed - creating separate methods for detecting x-node cycles.

Tasks Identified

The following tasks need to be accomplished at the end of this project:

  • Remove all the methods in this model. Redesign and recode the whole model from scratch.
  • In the new model we will create a new collusion cycle detection method which will take x as an input. Given the parameter x, the algorithm then can detect any collusion cycles consisting of x nodes.
  • We will use spread and the ratio spread/(# of peer reviews done) to measure the pervasive collusion in one assignment.

Here spread is the peer-review score difference a reviewer (max peer review score given - min peer review score given) A higher spread is better, because it indicates that the reviewer is discriminating between good and bad work. [Dr. Gehringer’s survey, 2014] Spread/(# of peer-review done) should work better than spread alone if the # of review done varies between reviewers.

Implementation

  • We will consider each student in an assignment a node and each peer-review record (namely, user A reviews user B, which is recorded in review_mappings table) as a directed edge. So finding small-circle collusion becomes a task of finding cycles within x nodes in a directed graph.
  • This algorithm can use depth-first search.
  • The idea of having a threshold :

If A gives B a low peer-review score (less than min_collusion_peer_review_score), we prune this A->B edge.

  • We will create a report page (small collusion cycles report) under “view review report” icon for instructors. The instructor can input x and run this algorithm to detect small collusion cycles. The instructor can also set the min_collusion_peer_review_score on this report page. The default of min_collusion_peer_review_score can be 95%.
  • We might also need a similar report page (pervasive collusion report) for pervasive collusion.

The Use Case diagram for such design is:

Proposed Code

We are planning to implement the modified DFS for finding a cycle of size "n". The basic algorithm for this looks like:

DFS(G)
{
    for each u in V{
        color[u]<-White
        pred[u]<-NULL
    }

    time = 0

    for each u in V{
        if(color[u]==White){
            DFSVisit(u)
        }
    }
}


DFSVisit(u){
    color[u]<-Grey
    d[u]=++time

    for each v in adj[u]{
        if(color[v]==White){
            pred[v]<-u
            DFSVisit(v)
        }else{
            //we have a cycle between current node u and already visited node v
            cycle_list<-(u,v)    //use later to find cycle length
        }
    }

    color[u]<-black
    f[u]= ++time
}

legend:
color:: it stores the information whether the node has been visited so far or not,
pred:: stores the parent information of the given node
adj:: adjacency list
d[u] :: gives you the discovery time
f[u]:: gives you the finish time, i.e all the adjacent nodes of u have been visited

This algorithm has to be implemented in "collusion_cycle.rb", and the implementation will roughly look like:

def cycle_detection (graph, n)
  white_set = set.new
  gray_set = set.new
  black_set = set.new
  vertex = vertex (graph) # get all the vertex from graph 
  vertex.each do |current|
    dfs_within_n (current, white_set, gray_set, black_set, [], n)
  end
end

def dfs_within_n (current, white_set, gray_set, black_set, recent_n_vertex, n)
  move_vertex (current, white_set, gray_set)
  neighbours = neighbours(current)
  neighbors.each do |neighbor|
    if black_set.contains(neighbor)
      continue
    end
    if gray_set.contains(neighbor) # a cycle detected
      if recent_n_vertex.contains(current) # a cycle within n vertex detected
        output_loop (recent_n_vertex, current)
      end
    end
    if recent_n_vertex.length==n
      recent_n_vertex.remove[0]
    end
    recent_n_vertex.add(neighbor)
    dfs_within_n (current, white_set, gray_set, black_set, recent_n_vertex, n)
  end
  move_vertex (current, gray_set, black_set)
end

def move_vertex(current, source_set, destination_set)
  source_set.remove(current)
  destination_set.add(current)
end

Test Cases

The test case will be written as "rspec"

  • We will need a test case, which returns the cycles detected in the graph, and confirms that the size of the cycles is no less or no more than the user specified size "n".
  • We will need to check the performance of this algorithm. In some cases when the graph becomes too large the algorithm should still perform in acceptable time frame.

Visual Representation

This is a soft commit for the project. We are planning to have a visually asthetic way of representing the collusion detected. For this purpose we have identified a few ways:

References

  1. Expertiza on GitHub
  2. GitHub Project Repository Fork
  3. The live Expertiza website
  4. Expertiza project documentation wiki
  5. Rspec Documentation
  6. Ruby Documentation
  7. viz.js
  8. Cytoscape.js
  9. D3.js