CSC/ECE 517 Fall 2013/oss ssv: Difference between revisions
Revision as of 05:48, 30 October 2013
E813. Refactoring and testing — degree_of_relevance.rb
Introduction
The class degree_of_relevance.rb is used to calculate how relevant one piece of text is to another. It is used to evaluate the relevance between the submitted work and a review (or metareview) for an assignment. Evaluating reviews matters because a review that is not relevant to the submission should be considered invalid and should not affect the student's grade. This class contains a lot of duplicated code and has long, complex methods. It has been assigned the grade "F" according to metrics such as code complexity, duplication, and lines of code per method. Since this class is important for research on Expertiza, it should be refactored to reduce its complexity and duplication and to introduce coding best practices. Our task for this project is to refactor this class and test it thoroughly. The class can be found at the following location in the Expertiza source code: Expertiza\expertiza\app\models\automated_metareview
Theory of Relevance
Existing Design
The current design has a method get_relevance, which is the single entry point to degree_of_relevance.rb. It takes as input the submission and review graphs in array form, along with other important parameters. The algorithm is broken down into parts, each of which is handled by a different helper method. The results obtained from these methods are used in the following formula to obtain the degree of relevance.
Helper Methods
The helper methods used by get_relevance are:
compare_vertices
This method compares the vertices across the two graphs (submission and review) to identify matches and quantify various metrics. Every vertex of one graph is compared with every vertex of the other graph to obtain the comparison results.
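The all-pairs comparison described above can be sketched roughly as follows. This is a simplified illustration, not the Expertiza implementation: `SIMILARITY` is a hypothetical stand-in for the WordNet-based comparison, `average_best_match` is an invented name, and the frequent-word and noun filtering of the real method are omitted.

```ruby
# Hypothetical stand-in for the WordNet-based similarity (an assumption):
# 1.0 for identical tokens ignoring case, 0.0 otherwise.
SIMILARITY = ->(a, b) { a.casecmp?(b) ? 1.0 : 0.0 }

# For each review vertex, take the best score over all submission vertices,
# then average those best scores. Returns 0.0 when there is nothing to compare.
def average_best_match(review_vertices, submission_vertices, similarity = SIMILARITY)
  return 0.0 if review_vertices.empty? || submission_vertices.empty?

  best_scores = review_vertices.map do |rev|
    submission_vertices.map { |sub| similarity.call(rev, sub) }.max
  end
  best_scores.sum / best_scores.size
end
```

Keeping the similarity function as a parameter makes it possible to exercise the averaging logic in tests without loading WordNet.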
compare_edges_non_syntax_diff
compare_edges_syntax_diff
compare_edges_diff_types
compare_SVO_edges
Implementation
New Design and Refactoring
New Design
We have taken ideas from the Template Method design pattern to improve the design of the class. Although we did not directly implement this design pattern, the idea of defining the skeleton of an algorithm in one class and deferring some steps to other classes allowed us to come up with a similar design, which separates different functionality into different classes. The following is a brief outline of the changes made:
- Divided the code into four classes to separate the functionality logically.
- These classes are:
- compare_graph_edges.rb
- compare_graph_svo_edges.rb
- compare_graph_vertices.rb
- degree_of_relevance.rb
- Extracted common code into methods that can be reused.
- After refactoring, the grade for the class according to Code Climate is "C".
Refactoring
Design of classes
The main class degree_of_relevance.rb calculates the scaled relevance using the formula described above. It calculates the relevance by comparing the submission and review graphs. As described above, the following types of comparison are made between the graphs, and the resulting metrics are used to calculate the relevance:
- Class compare_graph_edges.rb:
- Comparing edges without syntax difference: SUBJECT-VERB edges are compared with SUBJECT-VERB matches, where SUBJECT-SUBJECT and VERB-VERB (or VERB-VERB and OBJECT-OBJECT) comparisons are done.
- Comparing edges with syntax difference: Compares the edges across the two graphs to identify matches and quantify various metrics. SUBJECT-VERB edges are compared with VERB-OBJECT matches and vice versa, where SUBJECT-OBJECT and VERB-VERB comparisons are done (same-type comparisons).
- Comparing edges with different types: Compares the edges across the two graphs to identify matches and quantify various metrics. SUBJECT-VERB edges are compared with VERB-OBJECT matches and vice versa, where SUBJECT-VERB, VERB-SUBJECT, OBJECT-VERB, and VERB-OBJECT comparisons are done (different-type comparisons).
All the above functions are grouped in one class - compare_graph_edges.rb.
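To make the first two pairings concrete, here is a minimal sketch under stated assumptions. The `Edge` struct, the `TOKEN_MATCH` similarity, and both method names are hypothetical and do not come from compare_graph_edges.rb; the sketch only shows which vertices are paired, and it assumes the two vertex scores are simply averaged.

```ruby
# Hypothetical edge representation: in_vertex -> out_vertex,
# e.g. SUBJECT -> VERB or VERB -> OBJECT.
Edge = Struct.new(:in_vertex, :out_vertex)

# Hypothetical token similarity (an assumption): exact match only.
TOKEN_MATCH = ->(a, b) { a == b ? 1.0 : 0.0 }

# No syntax difference: pair vertices in the same position
# (e.g. SUBJECT with SUBJECT and VERB with VERB).
def same_position_score(rev_edge, sub_edge, match = TOKEN_MATCH)
  (match.call(rev_edge.in_vertex, sub_edge.in_vertex) +
   match.call(rev_edge.out_vertex, sub_edge.out_vertex)) / 2.0
end

# Syntax difference: pair vertices across positions
# (e.g. SUBJECT with OBJECT and VERB with VERB for S-V against V-O edges).
def cross_position_score(rev_edge, sub_edge, match = TOKEN_MATCH)
  (match.call(rev_edge.in_vertex, sub_edge.out_vertex) +
   match.call(rev_edge.out_vertex, sub_edge.in_vertex)) / 2.0
end
```

For example, a review edge `Edge.new("student", "writes")` compared against a submission edge `Edge.new("writes", "code")` finds its only match through the cross-position pairing, on the shared verb.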
- Class compare_graph_vertices.rb:
- Comparing vertices of the corresponding graphs: Compares the vertices across the two graphs to identify matches and quantify various metrics. Every vertex of one graph is compared with every vertex of the other graph.
This method is factored out to the class compare_graph_vertices.rb.
- Class compare_graph_SVO_edges.rb:
- Comparing SVO edges.
- Comparing SVO edges with different syntax.
These methods are grouped in the compare_graph_SVO_edges.rb class.
The main class degree_of_relevance.rb calls each of these methods to get the appropriate metrics required for evaluating relevance.
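The delegation described above might look roughly like the sketch below. `RelevanceMetricsSketch` and its `collect` method are hypothetical names, and the comparers are assumed to be any objects that respond to `call` with the review and submission graphs. The sketch stops at collecting the metrics and does not attempt to reproduce the relevance formula.

```ruby
# Hypothetical coordinator: holds one comparer per metric and asks each of
# them to score the same pair of graphs.
class RelevanceMetricsSketch
  # comparers: Hash of metric name => object responding to
  # #call(review_graph, submission_graph)
  def initialize(comparers)
    @comparers = comparers
  end

  # Returns a Hash of metric name => score, one entry per comparer.
  def collect(review_graph, submission_graph)
    @comparers.transform_values do |comparer|
      comparer.call(review_graph, submission_graph)
    end
  end
end
```

One reason to structure the main class this way is that each comparison class can then be tested in isolation, and the main class can be tested with simple lambdas in place of the real comparers.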
compare_vertices
Before Refactoring
def compare_vertices(pos_tagger, rev, subm, num_rev_vert, num_sub_vert, speller)
  # puts("****Inside compare_vertices:: rev.length:: #{num_rev_vert} subm.length:: #{num_sub_vert}")
  #for double dimensional arrays, one of the dimensions should be initialized
  @vertex_match = Array.new(num_rev_vert){Array.new}
  wnet = WordnetBasedSimilarity.new
  cum_vertex_match = 0.0
  count = 0
  max = 0.0
  flag = 0
  for i in (0..num_rev_vert - 1)
    if(!rev.nil? and !rev[i].nil?)
      rev[i].node_id = i
      # puts("%%%%%%%%%%% Token #{rev[i].name} ::: POS tags:: rev[i].pos_tag:: #{rev[i].pos_tag} :: rev[i].node_id #{rev[i].node_id}")
      #skipping frequent words from vertex comparison
      if(wnet.is_frequent_word(rev[i].name))
        next #ruby equivalent for continue
      end
      #looking for the best match
      #j tracks every element in the set of all vertices, some of which are null
      for j in (0..num_sub_vert - 1)
        if(!subm[j].nil?)
          if(subm[j].node_id == -1)
            subm[j].node_id = j
          end
          # puts("%%%%%%%%%%% Token #{subm[j].name} ::: POS tags:: subm[j].pos_tag:: #{subm[j].pos_tag} subm[j].node_id #{subm[j].node_id}")
          if(wnet.is_frequent_word(subm[j].name))
            next #ruby equivalent for continue
          end
          #comparing only if one of the two vertices is a noun
          if(rev[i].pos_tag.include?("NN") and subm[j].pos_tag.include?("NN"))
            @vertex_match[i][j] = wnet.compare_strings(rev[i], subm[j], speller)
            #only if the "if" condition is satisfied, since there could be null objects in between and you dont want unnecess. increments
            flag = 1
            if(@vertex_match[i][j] > max)
              max = @vertex_match[i][j]
            end
          end
        end
      end #end of for loop for the submission vertices
      if(flag != 0)#if the review edge had any submission edges with which it was matched, since not all S-V edges might have corresponding V-O edges to match with
        # puts("**** Best match for:: #{rev[i].name}-- #{max}")
        cum_vertex_match = cum_vertex_match + max
        count+=1
        max = 0.0 #re-initialize
        flag = 0
      end
    end #end of if condition
  end #end of for loop

  avg_match = 0.0
  if(count > 0)
    avg_match = cum_vertex_match/ count
  end
  return avg_match
end #end of compare_vertices