<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cmmcclen</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cmmcclen"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Cmmcclen"/>
	<updated>2026-05-06T12:49:23Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81312</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81312"/>
		<updated>2013-10-30T20:48:52Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
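&lt;br /&gt;
The walk above can be sketched in a few lines of Ruby. This is a simplified illustration only: it models just two states and the single negative word not, and the names NEGATIVE_WORDS_SKETCH, token_type_sketch and next_state_sketch are hypothetical and not part of Expertiza.&lt;br /&gt;

```ruby
# Simplified sketch: only two states and one negative word are modelled.
NEGATIVE_WORDS_SKETCH = %w(not).freeze

# Classify a single token as :negative or :positive.
def token_type_sketch(token)
  NEGATIVE_WORDS_SKETCH.include?(token.downcase) ? :negative : :positive
end

# A negative token flips the current state (so a double negative is
# positive again); a positive token leaves the state unchanged.
def next_state_sketch(prev_state, token_type)
  return prev_state if token_type == :positive
  prev_state == :positive ? :negative : :positive
end

tokens = %w(We are not not musicians)
state = tokens.reduce(:positive) do |prev_state, token|
  next_state_sketch(prev_state, token_type_sketch(token))
end
puts state # positive
```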
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy Design Pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility for parsing the sentence and its tokens can be split into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement which determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in the negations.rb file was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted the three for and if-else blocks into their own methods. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to turn each of the duplicated operations into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when the string contains no punctuation, because i is only incremented in the final else branch. To make this explicit, we can set a valid_token boolean that stays true only if no punctuation is found, and save the piece into tokens only in that case. This refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable. &lt;br /&gt;
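&lt;br /&gt;
As a rough sketch of that possible future refactoring, the tag removal could live in a lambda and the punctuation filter could use the punctuation array directly. The POS-tagged sample string is made up for illustration, and this is one possible shape for the code, not the version committed to Expertiza.&lt;br /&gt;

```ruby
# Hypothetical helper suggested in the text: strip the POS tag from a piece.
remove_tag_from_sp = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

punctuation = %w(. , ! ;)
# Made-up tagged sentence in word/TAG form.
pieces = 'The/DT code/NN works/VBZ ./.'.split(' ')

# Keep only the pieces that contain no punctuation once the tag is removed.
tokens = pieces
  .map { |sp| remove_tag_from_sp[sp, '/'] }
  .reject { |sp| punctuation.any? { |p| sp.include?(p) } }

p tokens # ["The", "code", "works"]
```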
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a long if-else chain and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions, there was a lot of duplicated code. Each method iterated through an array of words, and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  # &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         # &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      # &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      # &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. Lambda blocks get_word and get_phrase were created to handle parsing the current_token into a word or a phrase. A hash called types was created which holds the relationship between a type of token, the array it searches through to check for that type, and whether the values in the search array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, then checks if that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method as shown below. &lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable because it moved duplicated code from five methods into a single method, so there is only one place to read and understand the code. It also made the code easily extensible. If you want to check for a new type, just add the relationship to the types hash. If you need a different input, just make a new input_parser lambda. &lt;br /&gt;
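&lt;br /&gt;
To illustrate the extension point, here is a self-contained sketch in which a new, made-up :question type is supported by adding one entry to the lookup table. The shortened word lists, the _SKETCH constants and token_type_for are assumptions for illustration; unlike the method above, this variant returns on the first match.&lt;br /&gt;

```ruby
# Hypothetical, shortened word lists; the real constants live in Expertiza.
NEGATED_WORDS_SKETCH = %w(not no never).freeze
SUGGESTIVE_PHRASES_SKETCH = ['would like', 'could be'].freeze
# Assumed new category, added for illustration only.
QUESTION_WORDS_SKETCH = %w(why how).freeze

get_word = lambda { |c| c[0] }
get_phrase = lambda { |c| c[1].nil? ? nil : c[0] + ' ' + c[1] }

# word list => [input parser, type]; supporting a new type is one new entry.
TYPES_SKETCH = {
  NEGATED_WORDS_SKETCH => [get_word, :negative_word],
  SUGGESTIVE_PHRASES_SKETCH => [get_phrase, :suggestive],
  QUESTION_WORDS_SKETCH => [get_word, :question]
}

# current_token is a pair [current word, next word or nil].
def token_type_for(current_token)
  TYPES_SKETCH.each do |words, (parser, type)|
    token = parser.(current_token)
    next if token.nil?
    return type if words.any? { |w| token.casecmp(w) == 0 }
  end
  :positive
end

p token_type_for(['Why', 'is'])   # :question
p token_type_for(['could', 'be']) # :suggestive
p token_type_for(['code', nil])   # :positive
```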
&lt;br /&gt;
Finally the biggest if-else statement to refactor is in the next_state() method as shown below:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
  return state&lt;br /&gt;
&lt;br /&gt;
The key to refactoring this code was recognizing that the next state of the sentence depends on the current state of the sentence and the current_token_type. Understanding this revealed that a better design would be to have SentenceState subclasses (PositiveState, NegativeWordState, etc). The superclass SentenceState would hold the interim state variables, such as interim_noun_verb and prev_negative_word, and the current sentence clause state, while each subclass would only know the relationship between itself, the current_token_type, and the next state of the sentence. The superclass SentenceState would also be in charge of creating these states in a factory method using only the current state of the sentence. An example of one of the subclasses is shown below:&lt;br /&gt;
&lt;br /&gt;
 class NegativeDescriptorState &amp;lt; SentenceState&lt;br /&gt;
  def negative_word&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_WORD, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def positive&lt;br /&gt;
    set_interim_noun_verb(true)&lt;br /&gt;
    @state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
    #puts &amp;quot;next token is positive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_descriptor&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_DESCRIPTOR, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_phrase&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_PHRASE, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative phrase&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def suggestive&lt;br /&gt;
    @state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
                        #puts &amp;quot;next token is suggestive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def get_state&lt;br /&gt;
    #puts &amp;quot;negative_descriptor&amp;quot;&lt;br /&gt;
    NEGATED&lt;br /&gt;
  end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Every other subclass also has the same methods, so that each subclass can be responsible for knowing what to do for any current_token_type. (These methods are different in every subclass because the next state is different for every current state and current_token_type). The methods such as if_interim_then_state_is(thistype, elsethistype) are implemented in the superclass to remove duplicate code from the subclasses. &lt;br /&gt;
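&lt;br /&gt;
A minimal sketch of how such a shared helper might look is given below. Only the method names if_interim_then_state_is and set_interim_noun_verb come from the text; the method bodies and the class names ending in Sketch are assumptions, and symbols stand in for the real state constants.&lt;br /&gt;

```ruby
# Sketch of a superclass that owns the shared interim flag. The method
# bodies are assumptions; only the method names come from the text above.
class SentenceStateSketch
  def initialize
    @interim_noun_verb = false
  end

  def set_interim_noun_verb(value)
    @interim_noun_verb = value
  end

  # Returns this_type when a noun or verb was seen in between,
  # otherwise else_this_type.
  def if_interim_then_state_is(this_type, else_this_type)
    @interim_noun_verb ? this_type : else_this_type
  end
end

# Class.new(Parent) creates a subclass of Parent, just like the usual
# class keyword with an explicit superclass.
NegativeDescriptorSketch = Class.new(SentenceStateSketch) do
  def negative_word
    @state = if_interim_then_state_is(:negative_word, :positive)
  end
end

state = NegativeDescriptorSketch.new
p state.negative_word # :positive
state.set_interim_noun_verb(true)
p state.negative_word # :negative_word
```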
&lt;br /&gt;
This simplifies the next_state method so the superclass doesn't have to know anything about the relationships of the subclasses to find the next state of the sentence as shown below:&lt;br /&gt;
&lt;br /&gt;
 def next_state(current_token_type)&lt;br /&gt;
    method = {POSITIVE =&amp;gt; self.method(:positive), NEGATIVE_DESCRIPTOR =&amp;gt; self.method(:negative_descriptor), NEGATIVE_PHRASE =&amp;gt; self.method(:negative_phrase), SUGGESTIVE =&amp;gt; self.method(:suggestive), NEGATIVE_WORD =&amp;gt; self.method(:negative_word)}[current_token_type]&lt;br /&gt;
    method.call()&lt;br /&gt;
    if @state != POSITIVE&lt;br /&gt;
      set_interim_noun_verb(false) #resetting&lt;br /&gt;
    end&lt;br /&gt;
    @state&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The method variable calls the correct method in the subclass based on the current_token_type. Now the code is much more extensible. Instead of having to edit that awfully long if-else statement, now a programmer only has to make a new SentenceState subclass which defines all the relationships of that subclass with any possible current_token_types. Future refactoring would include changing the variable method to a more descriptive name. &lt;br /&gt;
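&lt;br /&gt;
One possible way to give the lookup a more descriptive name is sketched below, using a table of handler names and public_send. DispatchSketch and HANDLERS are hypothetical names, and symbols stand in for the real token type constants.&lt;br /&gt;

```ruby
# Hypothetical stand-in for a state class with one handler per token type.
class DispatchSketch
  # token type => name of the handler method to invoke
  HANDLERS = {
    positive: :positive,
    negative_word: :negative_word
  }.freeze

  def positive
    :positive
  end

  def negative_word
    :negative_word
  end

  # Look up the handler name for the token type and invoke it.
  # fetch raises KeyError for a token type with no handler.
  def next_state(current_token_type)
    handler = HANDLERS.fetch(current_token_type)
    public_send(handler)
  end
end

p DispatchSketch.new.next_state(:negative_word) # :negative_word
```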
&lt;br /&gt;
So now the sentence_state method has to be modified once more to use the new SentenceState subclasses:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(sentence_tokens) #str_with_pos_tags)&lt;br /&gt;
    #initialize state variables so that the original sentence state is positive&lt;br /&gt;
    @state = POSITIVE&lt;br /&gt;
    current_state = factory(@state)&lt;br /&gt;
    @@prev_negative_word = false&lt;br /&gt;
    @interim_noun_verb = false&lt;br /&gt;
    sentence_tokens.each_with_next do |curr_token, next_token|&lt;br /&gt;
      #get current token type&lt;br /&gt;
      current_token_type = get_token_type([curr_token, next_token])&lt;br /&gt;
      #Ask State class to get current state based on current state, current_token_type, and if there was a prev_negative_word&lt;br /&gt;
      current_state = factory(current_state.next_state(current_token_type))&lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      NEGATIVE_EMPHASIS_WORDS.each do |e|&lt;br /&gt;
        if curr_token.casecmp(e) == 0&lt;br /&gt;
          @@prev_negative_word = true&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    current_state.get_state()&lt;br /&gt;
  end&lt;br /&gt;
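&lt;br /&gt;
Note that each_with_next is not a Ruby core method, so it is presumably defined elsewhere in the project. A minimal stand-in, written here as a plain method with the hypothetical name each_with_next_sketch, could look like this:&lt;br /&gt;

```ruby
# Assumed stand-in for each_with_next: yields each element together with
# its successor, and nil as the successor of the last element.
def each_with_next_sketch(items)
  return enum_for(:each_with_next_sketch, items) unless block_given?
  items.each_with_index do |item, index|
    yield item, items[index + 1]
  end
end

pairs = each_with_next_sketch(%w(not too bad)).to_a
p pairs # [["not", "too"], ["too", "bad"], ["bad", nil]]
```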
&lt;br /&gt;
The factory method is implemented so that the SentenceState class only has to know what type of state it wants to make. At any one time, a SentenceState instance uses only one SentenceState subclass instance, so that the interim instance variables are not overridden by multiple subclasses. &lt;br /&gt;
&lt;br /&gt;
 def factory(state)&lt;br /&gt;
    {POSITIVE =&amp;gt; PositiveState, NEGATIVE_DESCRIPTOR =&amp;gt; NegativeDescriptorState, NEGATIVE_PHRASE =&amp;gt; NegativePhraseState, SUGGESTIVE =&amp;gt; SuggestiveState, NEGATIVE_WORD =&amp;gt; NegativeWordState}[state].new()&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Finally, there were some other simple refactorings which I did across the code. These included changing for loops into iterations over objects, changing variable names to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because it is implied in Ruby code. Overall I think these refactorings and new designs make the code much more readable, extensible, and maintainable.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review text does not properly quote the words or sentences it borrows; the reviewer simply copies what the author says, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactoring should result in four fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has two functions: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define four methods, each with one specific function: &lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the code that is repeated in the long methods and make it an individual method which can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
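The three possible outcomes of the extracted method can be illustrated with a small, self-contained sketch. The constant values and the Metareview struct below are stand-ins assumed for this example only (the real constants are defined in the project and the real object is the automated metareview), and the guard-clause style is a variation on the method shown above, not the committed code.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Assumed placeholder values; the project defines the real constants.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the automated metareview object.
Metareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and report nothing.
  return nil unless count_copies.positive?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

# Example call: one of two responses was copied.
example = Metareview.new(['original text'])
example_state = check_plagiarism_state(example, 1, ['kept text'], [10, 20])
```
&lt;br /&gt;
With one copy out of two scores, the sketch reports that some responses were plagiarised and replaces review_array; with zero copies it returns nil and leaves the object unchanged.&lt;br /&gt;
&lt;br /&gt;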
The next thing to do is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array before a comparison is made. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All the tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. From the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, which so far prevent them from working. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is done, more tests regarding plagiarism can be written, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug-fixing is needed.&lt;br /&gt;
&lt;br /&gt;
4. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81310</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81310"/>
		<updated>2013-10-30T20:45:59Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application which allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions which are needed in the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb has too many responsibilities; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they follow a more object-oriented style and have a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure the tests run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; Positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; Positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; Negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; Negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply-nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy Design Pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence and the sentence tokens can be split into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement which determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of the for or if-else statements, each into its own method, to clean up the code. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
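    #fetching all the tokens&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;) #assumed: st holds the space-separated tagged words&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;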
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to make each of the duplicated functions into lambda blocks, or commands, and to iterate over a punctuation array to remove undesired punctuation. Also, after inspecting the code it was seen that tokens[i] is only truly updated when there is no punctuation mark in the string, because i is not increased unless execution reaches the else branch of the if-else statement. To fix this we can set a valid_token boolean if no punctuation is found, and then save that value into tokens. This refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. If anyone wants to check for an additional punctuation mark, they do not have to update the if-else statement, which could easily introduce a bug; instead they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable. &lt;br /&gt;
&lt;br /&gt;
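One way the suggested future refactoring could look is sketched below. This is only an illustration, not code from the project: REMOVE_TAG_FROM_SP, PUNCTUATION and parse_tokens are hypothetical names, and the sketch uses an exclusive range (0...index), which differs slightly from the original line in that a tag at position 0 yields an empty word.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical lambda for the suggested refactoring: strip a POS tag such as '/NN'.
REMOVE_TAG_FROM_SP = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

# Punctuation marks that make a word invalid, as in the refactored method.
PUNCTUATION = %w[. , ! ;].freeze

# Keep only the words that contain none of the punctuation marks,
# mirroring the behaviour of the refactored parse_sentence_tokens.
def parse_tokens(str_with_pos_tags)
  words = str_with_pos_tags.split(' ').map { |sp| REMOVE_TAG_FROM_SP.call(sp, '/') }
  words.select { |word| PUNCTUATION.none? { |mark| word.include?(mark) } }
end

# Example call on a tagged sentence.
example_tokens = parse_tokens('We/PRP are/VBP not/RB musicians/NNS')
```
&lt;br /&gt;
Supporting a new punctuation mark in this sketch only requires adding it to PUNCTUATION.&lt;br /&gt;
&lt;br /&gt;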
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions, there was a lot of duplicated code. Each method iterated through an array of words, and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. Lambda blocks get_word and get_phrase were created to handle parsing the current_token into a word or a phrase. A hash called types was created which holds the relationship between a type of token, the array it searches through to check for that type, and whether the values in the search array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, then checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method as shown below. &lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable because it moved duplicated code from five methods into a single method, so there is only one place to read and understand the code. It also made it easily extensible. If you want to check for a new type, just add the relationship to the types hash. If you need a different input, just make a new input_parser lambda. &lt;br /&gt;
&lt;br /&gt;
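The table-driven lookup can also be tried out on its own. The sketch below is a simplified variant under stated assumptions: the word lists are tiny samples rather than the project's real lists, the returned types are plain symbols, and TOKEN_TYPES and token_type are hypothetical names. It uses find, which stops at the first matching row, so the order of the rows decides which type wins when a token appears in more than one list.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Assumed sample word lists; the project defines the real ones.
NEGATED_WORDS = %w[not no never].freeze
SUGGESTIVE_WORDS = %w[suggest could].freeze
NEGATIVE_PHRASES = ['too short'].freeze

# Input parsers: take [current_token, next_token] and build a word or a phrase.
GET_WORD = lambda { |pair| pair[0] }
GET_PHRASE = lambda { |pair| pair[1].nil? ? nil : pair[0] + ' ' + pair[1] }

# Each row: [list to search, input parser, type returned on a match].
TOKEN_TYPES = [
  [NEGATED_WORDS, GET_WORD, :negative_word],
  [SUGGESTIVE_WORDS, GET_WORD, :suggestive],
  [NEGATIVE_PHRASES, GET_PHRASE, :negative_phrase]
].freeze

# Returns the type of the first row that matches, or :positive if none does.
def token_type(pair)
  match = TOKEN_TYPES.find do |list, parser, _type|
    candidate = parser.call(pair)
    next false if candidate.nil?

    list.any? { |entry| candidate.casecmp(entry).zero? }
  end
  match ? match[2] : :positive
end
```
&lt;br /&gt;
Adding a new token type to this sketch means adding one row to TOKEN_TYPES; no conditional logic changes.&lt;br /&gt;
&lt;br /&gt;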
Finally the biggest if-else statement to refactor is in the next_state() method as shown below:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
  return state&lt;br /&gt;
&lt;br /&gt;
The key to refactoring this code was recognizing that the next state of the sentence depends on the current state of the sentence and the current_token_type. This suggested a better design built on SentenceState subclasses (PositiveState, NegativeWordState, etc.). The superclass SentenceState holds the interim state variables, such as interim_noun_verb and prev_negative_word, together with the current sentence clause state, while each subclass only knows how its own state combines with a current_token_type to give the next state of the sentence. The superclass SentenceState is also in charge of creating these state objects in a factory method, using only the current state of the sentence. An example of one of the subclasses is shown below:&lt;br /&gt;
&lt;br /&gt;
 class NegativeDescriptorState &amp;lt; SentenceState&lt;br /&gt;
  def negative_word&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_WORD, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def positive&lt;br /&gt;
    set_interim_noun_verb(true)&lt;br /&gt;
    @state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
    #puts &amp;quot;next token is positive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_descriptor&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_DESCRIPTOR, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_phrase&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_PHRASE, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative phrase&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def suggestive&lt;br /&gt;
    @state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
                        #puts &amp;quot;next token is suggestive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def get_state&lt;br /&gt;
    #puts &amp;quot;negative_descriptor&amp;quot;&lt;br /&gt;
    NEGATED&lt;br /&gt;
  end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Every other subclass has the same methods, so each subclass is responsible for knowing what to do for any current_token_type. (The method bodies differ between subclasses because the next state is different for every combination of current state and current_token_type.) Helper methods such as if_interim_then_state_is(thistype, elsethistype) are implemented in the superclass to remove duplicate code from the subclasses. &lt;br /&gt;
&lt;br /&gt;
This simplifies the next_state method: the superclass no longer has to know anything about the relationships encoded in the subclasses to find the next state of the sentence, as shown below:&lt;br /&gt;
&lt;br /&gt;
 def next_state(current_token_type)&lt;br /&gt;
    method = {POSITIVE =&amp;gt; self.method(:positive), NEGATIVE_DESCRIPTOR =&amp;gt; self.method(:negative_descriptor), NEGATIVE_PHRASE =&amp;gt; self.method(:negative_phrase), SUGGESTIVE =&amp;gt; self.method(:suggestive), NEGATIVE_WORD =&amp;gt; self.method(:negative_word)}[current_token_type]&lt;br /&gt;
    method.call()&lt;br /&gt;
    if @state != POSITIVE&lt;br /&gt;
      set_interim_noun_verb(false) #resetting&lt;br /&gt;
    end&lt;br /&gt;
    @state&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The method variable holds the subclass method that matches the current_token_type, and method.call() invokes it. Now the code is much more extensible. Instead of editing the very long if-else statement, a programmer only has to write a new SentenceState subclass that defines how that state responds to every possible current_token_type. Future refactoring would include changing the variable method to a more descriptive name. &lt;br /&gt;
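&lt;br /&gt;
As a rough illustration of that future refactoring, the sketch below dispatches by handler name with public_send instead of building a hash of Method objects on every call. It is only a sketch under assumptions: TOKEN_TYPES, InterimSupport and NegativeDescriptorSketch are hypothetical names, symbols stand in for the real constants from constants.rb, and a mixin replaces the SentenceState superclass to keep the example short.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical token types; the real constants live in constants.rb.
TOKEN_TYPES = %i[positive negative_word suggestive].freeze

# Shared helper, standing in for the one the write-up keeps in SentenceState.
module InterimSupport
  def if_interim_then_state_is(with_interim, without_interim)
    @interim_noun_verb ? with_interim : without_interim
  end
end

# A cut-down state object with one handler per token type.
class NegativeDescriptorSketch
  include InterimSupport

  def initialize(interim_noun_verb)
    @interim_noun_verb = interim_noun_verb
  end

  # A neutral word after the descriptor: remember that words came in between.
  def positive
    @interim_noun_verb = true
    :negative_descriptor
  end

  def negative_word
    if_interim_then_state_is(:negative_word, :positive)
  end

  def suggestive
    :suggestive
  end

  # Call the handler named after the token type, rejecting unknown types.
  def next_state(current_token_type)
    unless TOKEN_TYPES.include?(current_token_type)
      raise ArgumentError, 'unknown token type'
    end
    public_send(current_token_type)
  end
end
```
&lt;br /&gt;
Checking the token type against a fixed list before calling public_send keeps the dispatch from invoking arbitrary methods on the state object.&lt;br /&gt;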
&lt;br /&gt;
So now the sentence_state method has to be modified once more to use the new SentenceState subclasses:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(sentence_tokens) #str_with_pos_tags)&lt;br /&gt;
    #initialize state variables so that the original sentence state is positive&lt;br /&gt;
    @state = POSITIVE&lt;br /&gt;
    current_state = factory(@state)&lt;br /&gt;
    @@prev_negative_word = false&lt;br /&gt;
    @interim_noun_verb = false&lt;br /&gt;
    sentence_tokens.each_with_next do |curr_token, next_token|&lt;br /&gt;
      #get current token type&lt;br /&gt;
      current_token_type = get_token_type([curr_token, next_token])&lt;br /&gt;
      #Ask State class to get current state based on current state, current_token_type, and if there was a prev_negative_word&lt;br /&gt;
      current_state = factory(current_state.next_state(current_token_type))&lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      NEGATIVE_EMPHASIS_WORDS.each do |e|&lt;br /&gt;
        if curr_token.casecmp(e) == 0 #casecmp returns 0 on a match, and 0 is truthy in Ruby&lt;br /&gt;
          @@prev_negative_word = true&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    current_state.get_state()&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The factory method is implemented so that the SentenceState class only has to know what type of state it wants to make. At any one time, a SentenceState instance uses only one SentenceState subclass instance, so the interim instance variables are not overwritten by multiple subclasses. &lt;br /&gt;
&lt;br /&gt;
 def factory(state)&lt;br /&gt;
    {POSITIVE =&amp;gt; PositiveState, NEGATIVE_DESCRIPTOR =&amp;gt; NegativeDescriptorState, NEGATIVE_PHRASE =&amp;gt; NegativePhraseState, SUGGESTIVE =&amp;gt; SuggestiveState, NEGATIVE_WORD =&amp;gt; NegativeWordState}[state].new()&lt;br /&gt;
 end&lt;br /&gt;
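&lt;br /&gt;
One possible variation, shown here only as a sketch, keeps the lookup table in a constant and uses Hash#fetch so that an unknown state fails with a KeyError rather than a NoMethodError on nil. The names STATE_CLASSES, PositiveSketch and SuggestiveSketch are hypothetical, and symbols stand in for the real state constants.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical placeholder classes standing in for the real state subclasses.
class PositiveSketch; end
class SuggestiveSketch; end

# State name to class, built once instead of on every factory call.
STATE_CLASSES = { positive: PositiveSketch, suggestive: SuggestiveSketch }.freeze

# Build the state object for the given state name.
# Hash#fetch raises KeyError when the state is not in the table.
def factory(state)
  STATE_CLASSES.fetch(state).new
end
```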
&lt;br /&gt;
Finally, there were some other simple refactorings which I did across the code. These included changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because a Ruby method implicitly returns the value of its last expression. Overall I think these refactorings and new designs make the code much more readable, extensible, and maintainable.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, four kinds of plagiarism need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class needs four fundamental methods, each of which does exactly one thing. In the initial file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has two functions: comparing reviews with the review questions and comparing reviews with other reviewers' responses, which is confusing. As part of the refactoring, we need to split the two functions apart so that this code smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step, based on the above, is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has one specific function.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully or partially from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
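&lt;br /&gt;
The three possible outcomes of such a method can be sketched as follows. This is a minimal, self-contained illustration and not the project's code: check_plagiarism_state_sketch and MetareviewSketch are hypothetical names, and the two constants are given placeholder values because their real definitions are not shown here. The point of the sketch is that the result is nil when nothing was copied, so a caller has to handle that case.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values; the real constants are defined elsewhere in the project.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Minimal stand-in for the metareview object, holding only review_array.
MetareviewSketch = Struct.new(:review_array)

# Returns nil when no copies were found, otherwise one of the two constants.
def check_plagiarism_state_sketch(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies.positive?

  # resetting review_array only when plagiarism was found
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

metareview = MetareviewSketch.new(%w[original])
result = check_plagiarism_state_sketch(metareview, 1, %w[cleaned], [0, 0])
puts result.inspect
```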
&lt;br /&gt;
The next thing to do is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array and picking the phrase to compare. Once we extract the method, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
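&lt;br /&gt;
One detail worth keeping in mind: the extracted method no longer calls next, so it advances past at most one blank entry per call. In practice this rarely matters, because String#split with a single space as the separator already splits on runs of whitespace and never yields blank tokens. The sketch below illustrates this; without_blank_tokens is a hypothetical helper, shown only for the case where tokens come from some other source.&lt;br /&gt;
&lt;br /&gt;
```ruby
# split(' ') ignores leading whitespace and treats runs of whitespace
# as a single separator, so no blank entries end up in the array.
review = '  The  sweet potatoes   are fresh. '
tokens = review.split(' ')

# Hypothetical helper: drop empty or whitespace-only tokens up front,
# so the comparison loop does not need a skip branch at all.
def without_blank_tokens(tokens)
  tokens.reject { |token| token.strip.empty? }
end

puts tokens.inspect
puts without_blank_tokens(['a', ' ', '', 'b']).inspect
```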
&lt;br /&gt;
Please see the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you make sure that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible. (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. From the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring we've made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work remains to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, so they cannot be run successfully so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more tests regarding plagiarism can be written, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors within the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check functionality. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
4. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81307</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81307"/>
		<updated>2013-10-30T20:43:41Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed in the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb has too many responsibilities; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy Design Pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence into clauses and tokens can be split into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement which determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of the for and if-else blocks into their own methods to clean up the code. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
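    #fetching all the tokens&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;) #assumed definition; st was not defined in this excerpt&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;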
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems of this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the use of an if-else statement. To remove this duplication, the strategy design pattern can be used to make each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when there is no punctuation in the string, because i is not increased unless execution reaches the final else branch of the if-else statement. To fix this we can set a valid_token boolean if no punctuation is found, and then save that value into tokens. This refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to extend the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could move the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable. &lt;br /&gt;
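&lt;br /&gt;
The future refactoring mentioned above could look roughly like the following sketch. It is only an illustration: the lambda valid_token and the sample sentence are assumptions made for this example, and the code is not taken from the Expertiza source.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch: the lambda names follow the text above, not the Expertiza source.
tag = '/'
punctuation = %w(. , ! ;)

# Strip the part-of-speech tag, so 'code/NN' becomes 'code'.
remove_tag_from_sp = lambda do |sp, t|
  sp.include?(t) ? sp[0..sp.index(t) - 1] : sp
end

# A token is valid only when it contains none of the punctuation marks.
valid_token = lambda do |sp|
  punctuation.none? { |mark| sp.include?(mark) }
end

# Same steps as the refactored method, expressed as map and select.
parse_sentence_tokens = lambda do |str_with_pos_tags|
  str_with_pos_tags.split(' ')
                   .map { |sp| remove_tag_from_sp[sp, tag] }
                   .select { |sp| valid_token[sp] }
end

tokens = parse_sentence_tokens['The/DT code/NN is/VBZ clear/JJ ./.']
# tokens is ['The', 'code', 'is', 'clear']
```
&lt;br /&gt;
Keeping the tag removal and the punctuation check in separate lambdas means either rule can be replaced without touching the loop.&lt;br /&gt;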
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement and ambiguous variable names. Digging deeper into the methods called by the if-else conditions revealed a lot of duplicated code. Each method iterated through an array of words and returned one token_type if the current_token was found in that array, or POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. The lambdas get_word and get_phrase were created to parse the current_token into a word or a phrase. A hash called types holds the relationship between each word_or_phrase_array, the parser that prepares the input for it (word or phrase), and the type of token it identifies. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that array if it is found. This refactored five methods with duplicated code into one method as shown below. &lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable because it moved the duplicated code from five methods into a single method, so there is only one place to read and understand the code. It also made the method easily extensible. To check for a new type, just add the relationship to the types hash. If a different input is needed, just write a new input parser lambda. &lt;br /&gt;
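&lt;br /&gt;
As a variation on the idea, the lookup can also be written as an ordered table that stops at the first match, which keeps the priority order of the original if-else branches. This is a minimal sketch under stated assumptions: the word lists, the symbols used as type names, and the use of find are all made up for illustration and are not the Expertiza constants.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical word lists and type names, for illustration only.
negated_words = %w(not no never)
negative_phrases = ['too short']
suggestive_words = %w(suggest should)

# Input parsers: take [current_token, next_token] and build the text to look up.
get_word = lambda { |pair| pair[0] }
get_phrase = lambda { |pair| pair[1].nil? ? nil : pair[0] + ' ' + pair[1] }

# Each row: [list to search, input parser, type returned on a match].
# Rows are ordered by priority, like the branches of the original if-else.
types = [
  [negated_words, get_word, :negative_word],
  [negative_phrases, get_phrase, :negative_phrase],
  [suggestive_words, get_word, :suggestive]
]

get_token_type = lambda do |current_token|
  match = types.find do |list, parser, _type|
    text = parser.call(current_token)
    next false if text.nil?
    list.any? { |entry| entry.casecmp(text).zero? }
  end
  match.nil? ? :positive : match[2]
end

first_type = get_token_type.call(['too', 'short'])
# first_type is :negative_phrase
```
&lt;br /&gt;
Adding a new type in this sketch means adding one row to the table; the lookup itself stays unchanged.&lt;br /&gt;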
&lt;br /&gt;
Finally, the biggest if-else statement to refactor is in the next_state() method, as shown below:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
  return state&lt;br /&gt;
&lt;br /&gt;
The key to refactoring this code was recognizing that the next state of the sentence depends on the current state of the sentence and the current_token_type. This revealed that a better design would be to have SentenceState subclasses (PositiveState, NegativeWordState, etc.). The superclass SentenceState would hold the interim state variables, such as interim_noun_verb and prev_negative_word, and the current sentence clause state, while each subclass would only know the relationship between itself, the current_token_type, and the next state of the sentence. The superclass SentenceState would also be in charge of creating these states in a factory method using only the current state of the sentence. An example of one of the subclasses is shown below:&lt;br /&gt;
&lt;br /&gt;
 class NegativeDescriptorState &amp;lt; SentenceState&lt;br /&gt;
  def negative_word&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_WORD, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def positive&lt;br /&gt;
    set_interim_noun_verb(true)&lt;br /&gt;
    @state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
    #puts &amp;quot;next token is positive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_descriptor&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_DESCRIPTOR, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_phrase&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_PHRASE, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative phrase&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def suggestive&lt;br /&gt;
    @state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
                        #puts &amp;quot;next token is suggestive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def get_state&lt;br /&gt;
    #puts &amp;quot;negative_descriptor&amp;quot;&lt;br /&gt;
    NEGATED&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Every other subclass defines the same methods, so that each subclass is responsible for knowing what to do for any current_token_type. (The method bodies differ in every subclass because the next state is different for every combination of current state and current_token_type.) Helper methods such as if_interim_then_state_is(thistype, elsethistype) are implemented in the superclass to remove duplicate code from the subclasses. &lt;br /&gt;
&lt;br /&gt;
This simplifies the next_state method so the superclass doesn't have to know anything about the relationships of the subclasses to find the next state of the sentence as shown below:&lt;br /&gt;
&lt;br /&gt;
 def next_state(current_token_type)&lt;br /&gt;
    method = {POSITIVE =&amp;gt; self.method(:positive), NEGATIVE_DESCRIPTOR =&amp;gt; self.method(:negative_descriptor), NEGATIVE_PHRASE =&amp;gt; self.method(:negative_phrase), SUGGESTIVE =&amp;gt; self.method(:suggestive), NEGATIVE_WORD =&amp;gt; self.method(:negative_word)}[current_token_type]&lt;br /&gt;
    method.call()&lt;br /&gt;
    if @state != POSITIVE&lt;br /&gt;
      set_interim_noun_verb(false) #resetting&lt;br /&gt;
    end&lt;br /&gt;
    @state&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The method variable holds the subclass method that matches the current_token_type, which is then called. Now the code is much more extensible. Instead of having to edit that very long if-else statement, a programmer only has to write a new SentenceState subclass which defines how that state reacts to every possible current_token_type. Future refactoring would include renaming the variable method to something more descriptive. &lt;br /&gt;
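&lt;br /&gt;
The dispatch idea can be sketched in a self-contained form as follows. Several things here are assumptions made for the example: token types are plain symbols instead of the Expertiza constants, the shared helpers live in a mixin named InterimHelpers instead of the SentenceState superclass, and the method is looked up by name with public_send instead of through a hash of Method objects.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical, simplified state object; names follow the text above.
# Helpers are shared through a mixin here instead of the superclass.
module InterimHelpers
  TOKEN_TYPES = [:positive, :negative_word, :negative_descriptor,
                 :negative_phrase, :suggestive].freeze

  def initialize(interim_noun_verb = false)
    @interim_noun_verb = interim_noun_verb
  end

  # Pick the next state depending on whether a noun or verb came in between.
  def if_interim_then_state_is(with_interim, without_interim)
    @interim_noun_verb ? with_interim : without_interim
  end

  # Dispatch by name: the token type :suggestive calls the suggestive method.
  def next_state(current_token_type)
    unless TOKEN_TYPES.include?(current_token_type)
      raise ArgumentError, 'unknown token type: ' + current_token_type.inspect
    end
    public_send(current_token_type)
  end
end

class NegativeDescriptorState
  include InterimHelpers

  def positive
    @interim_noun_verb = true
    :negative_descriptor
  end

  def negative_word
    if_interim_then_state_is(:negative_word, :positive)
  end

  def negative_descriptor
    if_interim_then_state_is(:negative_descriptor, :positive)
  end

  def negative_phrase
    if_interim_then_state_is(:negative_phrase, :positive)
  end

  def suggestive
    :suggestive
  end
end

state = NegativeDescriptorState.new
after_descriptor = state.next_state(:negative_descriptor)
# after_descriptor is :positive because no interim noun or verb was seen
```
&lt;br /&gt;
Checking the token type against a fixed list before calling public_send keeps an unexpected value from invoking an unrelated public method.&lt;br /&gt;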
&lt;br /&gt;
So now the sentence_state method has to be modified once more to use the new SentenceState subclasses:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(sentence_tokens) #str_with_pos_tags)&lt;br /&gt;
    #initialize state variables so that the original sentence state is positive&lt;br /&gt;
    @state = POSITIVE&lt;br /&gt;
    current_state = factory(@state)&lt;br /&gt;
    @@prev_negative_word = false&lt;br /&gt;
    @interim_noun_verb = false&lt;br /&gt;
    sentence_tokens.each_with_next do |curr_token, next_token|&lt;br /&gt;
      #get current token type&lt;br /&gt;
      current_token_type = get_token_type([curr_token, next_token])&lt;br /&gt;
      #Ask State class to get current state based on current state, current_token_type, and if there was a prev_negative_word&lt;br /&gt;
      current_state = factory(current_state.next_state(current_token_type))&lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      NEGATIVE_EMPHASIS_WORDS.each do |e|&lt;br /&gt;
        if curr_token.casecmp(e) == 0&lt;br /&gt;
          @@prev_negative_word = true&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    current_state.get_state()&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The factory method is implemented so that the SentenceState class only has to know what type of state it wants to make. At any one time, a SentenceState instance only uses one SentenceState subclass instance, so that the interim instance variables are not overwritten by multiple subclasses. &lt;br /&gt;
&lt;br /&gt;
 def factory(state)&lt;br /&gt;
    {POSITIVE =&amp;gt; PositiveState, NEGATIVE_DESCRIPTOR =&amp;gt; NegativeDescriptorState, NEGATIVE_PHRASE =&amp;gt; NegativePhraseState, SUGGESTIVE =&amp;gt; SuggestiveState, NEGATIVE_WORD =&amp;gt; NegativeWordState}[state].new()&lt;br /&gt;
 end&lt;br /&gt;
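&lt;br /&gt;
One possible hardening of the factory, shown only as a sketch: with the hash lookup above, an unknown state yields nil and the call to new fails with a NoMethodError. Using fetch makes the failure explicit. The stand-in classes, the symbol keys, and the name STATE_CLASSES are assumptions for this example.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical stand-ins for the real state classes.
class PositiveState; end
class SuggestiveState; end

STATE_CLASSES = { positive: PositiveState, suggestive: SuggestiveState }.freeze

# Build the state object for a state name, failing loudly on unknown names.
def factory(state)
  state_class = STATE_CLASSES.fetch(state) { raise ArgumentError, 'no state class for ' + state.inspect }
  state_class.new
end

current_state = factory(:positive)
```
&lt;br /&gt;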
&lt;br /&gt;
Finally, there were some other simple refactorings which I did across the code. These included changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because the return is implicit in Ruby. Overall, I think these refactorings and new designs make the code much more readable, extensible, and maintainable.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's words and sentences without quoting them properly, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above point of view, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the initial file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split these two functions up and make sure such bad smells disappear.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the parts that long methods share and make each one an individual method which can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses have a common part, which checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
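&lt;br /&gt;
Note that the extracted method returns nil implicitly when count_copies is zero, so callers have to handle that case. The following sketch makes this explicit with a guard clause. The constant values and the AutoMetareview struct are stand-ins invented for the example, not the Expertiza definitions.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical constants and a stand-in for the metareview object.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
AutoMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # Nothing copied: leave review_array alone and report no plagiarism state.
  return nil unless count_copies.positive?

  # Resetting review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = AutoMetareview.new(['original text'])
state = check_plagiarism_state(review, 0, ['cleaned text'], [0.1, 0.9])
# state is nil here, and review.review_array is unchanged
```
&lt;br /&gt;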
&lt;br /&gt;
The next thing to do is to extract long loops or if-else statements into individual methods, so that the initial method is not too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the 1st method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
          rev_len+=1&lt;br /&gt;
          next&lt;br /&gt;
        end&lt;br /&gt;
        &lt;br /&gt;
        #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on generating the array without empty entries for the comparisons. Once we extract the method, the initial code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
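&lt;br /&gt;
One detail may be worth checking here, assuming the array always comes from review.split with a single space as in the loop above: in Ruby, splitting on a single space already ignores leading whitespace and runs of whitespace, so the array should not contain blank entries in the first place. The sample sentence in this small sketch is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```ruby
# String#split with a single space splits on runs of whitespace,
# so the resulting array contains no empty or blank entries.
array = '  These   sweet potatoes  are fresh. '.split(' ')
blank_entries = array.select { |token| token.strip.empty? }
# array is ['These', 'sweet', 'potatoes', 'are', 'fresh.'] and blank_entries is empty
```
&lt;br /&gt;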
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link only uses the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, which do not work so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Based on 1, more tests regarding plagiarism can be added, which would improve the development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors within the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug-fixing is needed.&lt;br /&gt;
&lt;br /&gt;
4. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81303</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81303"/>
		<updated>2013-10-30T20:42:49Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to this project along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb checks whether reviews are copied from other places, and sentence_state.rb determines the state of each sentence clause. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The plagiarism check compares the review text with text from the Internet to determine whether any copy-pasting has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear everywhere. Third, sentence_state.rb has too many responsibilities; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they follow a more object-oriented style and have a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. To find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). It then iterates through the tokens and determines the new state of the sentence depending on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take, for example, the test Identify State 8 in sentence_state_test:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
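&lt;br /&gt;
The walk-through above can be sketched in a few lines of Ruby. This is a deliberately simplified illustration, not the Expertiza implementation: it assumes only two states, a hypothetical NEGATION_WORDS list and a hypothetical clause_state method, and it ignores descriptors, phrases, and suggestive tokens.&lt;br /&gt;

```ruby
# Simplified sketch: only :positive and :negative states are modelled.
# NEGATION_WORDS and clause_state are hypothetical names for illustration.
NEGATION_WORDS = %w[not].freeze

def clause_state(tokens)
  tokens.reduce(:positive) do |state, token|
    if NEGATION_WORDS.include?(token.downcase)
      # A negative token flips the running state, so two in a row cancel out.
      state == :positive ? :negative : :positive
    else
      # A positive token leaves the running state unchanged.
      state
    end
  end
end

puts clause_state(%w[We are not not musicians])  # prints positive
puts clause_state(%w[We are not musicians])      # prints negative
```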
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. It was often possible to remove these smells using the Strategy design pattern, as explained in more detail below. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate sentence clauses, then separate the clauses into tokens, and only then iterate through the tokens to determine the state of the sentence. The responsibility for parsing the sentence and its tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in negations.rb was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted each of these three blocks into its own method. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the Strategy design pattern can be used to turn each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when the string contains no punctuation, because i is only incremented in the final else branch. To fix this, we can set a valid_token boolean when no punctuation is found and only then save the value into tokens. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable. &lt;br /&gt;
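&lt;br /&gt;
A minimal sketch of how the tag removal could be pulled into a lambda. The names TAG, PUNCTUATION, and REMOVE_TAG are assumptions made for illustration and are not taken from the project; the sketch keeps the behaviour of the refactored method above, in which pieces containing punctuation are skipped.&lt;br /&gt;

```ruby
# Hypothetical names; behaviour mirrors the refactored method above.
TAG = '/'
PUNCTUATION = %w[. , ! ;].freeze

# Strips the part-of-speech tag, e.g. code/NN becomes code.
REMOVE_TAG = lambda do |piece|
  piece.include?(TAG) ? piece[0...piece.index(TAG)] : piece
end

def parse_sentence_tokens(str_with_pos_tags)
  words = str_with_pos_tags.split(' ').map { |piece| REMOVE_TAG.call(piece) }
  # Keep only the words that contain none of the punctuation marks.
  words.reject { |word| PUNCTUATION.any? { |mark| word.include?(mark) } }
end

p parse_sentence_tokens('The/DT code/NN is/VBZ clean/JJ ./.')
# prints ["The", "code", "is", "clean"]
```

Adding another punctuation mark in this sketch only requires a new entry in PUNCTUATION.&lt;br /&gt;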
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class. The original logic is shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a long if-else chain and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions revealed a lot of duplicated code. Each method iterated through an array of words and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  # input could be a word or a phrase&lt;br /&gt;
  not_negated = POSITIVE         # default type is always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      # a different array of words in each method&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      # a different type, matching the array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the Strategy design pattern. The lambda blocks get_word and get_phrase were created to parse the current_token into a word or a phrase. A hash called types was created, which holds the relationship between a type of token, the array that is searched to check for that type, and whether the values in that array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method, as shown below. &lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable because it moved the duplicated code from five methods into a single method, so there is only one place to read and understand the code. It also made the method easily extensible. If you want to check for a new type, just add the relationship to the types hash. If you need a different input, just make a new input_parser lambda. &lt;br /&gt;
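&lt;br /&gt;
The table-driven idea can be shown in a self-contained sketch. The word lists below are shortened, hypothetical stand-ins for the project's constants, and symbols replace the project's type constants. Unlike the method above, this sketch returns on the first match, which is an assumption about the intended precedence rather than a description of the project's behaviour.&lt;br /&gt;

```ruby
# Hypothetical, shortened word lists standing in for the project's constants.
NEGATED_WORDS = %w[not never no].freeze
NEGATIVE_PHRASES = ['too short', 'taken from'].freeze
SUGGESTIVE_WORDS = %w[suggest should could].freeze

# Input parsers: the argument is a pair [current_token, next_token].
GET_WORD = lambda { |pair| pair[0] }
GET_PHRASE = lambda { |pair| pair[1].nil? ? nil : "#{pair[0]} #{pair[1]}" }

# Each row: [list to search, input parser, type returned on a match].
TYPES = [
  [NEGATED_WORDS, GET_WORD, :negative_word],
  [NEGATIVE_PHRASES, GET_PHRASE, :negative_phrase],
  [SUGGESTIVE_WORDS, GET_WORD, :suggestive]
].freeze

def get_token_type(current_and_next)
  TYPES.each do |list, parser, type|
    candidate = parser.call(current_and_next)
    next if candidate.nil?
    # Case-insensitive comparison, as in the original methods.
    return type if list.any? { |entry| entry.casecmp(candidate).zero? }
  end
  :positive
end

p get_token_type(['too', 'short'])  # prints :negative_phrase
p get_token_type(['code', nil])     # prints :positive
```

Supporting a new token type in this sketch means adding one row to TYPES, with no change to get_token_type itself.&lt;br /&gt;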
&lt;br /&gt;
Finally, the biggest if-else statement to refactor was in the next_state() method, as shown below:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
   if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
  return state&lt;br /&gt;
&lt;br /&gt;
The key to refactoring this code was recognizing that the next state of the sentence depends on the current state of the sentence and the current_token_type. Understanding this revealed that a better design would be to have SentenceState subclasses (PositiveState, NegativeWordState, etc.). The superclass SentenceState would hold the interim state variables, such as interim_noun_verb and prev_negative_word, and the current sentence clause state, while each subclass would only know the relationship between itself, the current_token_type, and the next state of the sentence. The superclass SentenceState would also be in charge of creating these states in a factory method using only the current state of the sentence. An example of one of the subclasses is shown below:&lt;br /&gt;
&lt;br /&gt;
 class NegativeDescriptorState &amp;lt; SentenceState&lt;br /&gt;
  def negative_word&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_WORD, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def positive&lt;br /&gt;
    set_interim_noun_verb(true)&lt;br /&gt;
    @state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
    #puts &amp;quot;next token is positive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_descriptor&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_DESCRIPTOR, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def negative_phrase&lt;br /&gt;
    @state = if_interim_then_state_is(NEGATIVE_PHRASE, POSITIVE)&lt;br /&gt;
    #puts &amp;quot;next token is negative phrase&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def suggestive&lt;br /&gt;
    @state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
                        #puts &amp;quot;next token is suggestive&amp;quot;&lt;br /&gt;
  end&lt;br /&gt;
  def get_state&lt;br /&gt;
    #puts &amp;quot;negative_descriptor&amp;quot;&lt;br /&gt;
    NEGATED&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Every other subclass also has the same methods, so that each subclass can be responsible for knowing what to do for any current_token_type. (These methods are different in every subclass because the next state is different for every current state and current_token_type). The methods such as if_interim_then_state_is(thistype, elsethistype) are implemented in the superclass to remove duplicate code from the subclasses. &lt;br /&gt;
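&lt;br /&gt;
The shared helper is not reproduced on this page, so the following is only a sketch of one plausible shape for it, inferred from the interim_noun_verb branches of the original if-else statement. The class name SentenceStateSketch is hypothetical, and the project's actual implementation may differ.&lt;br /&gt;

```ruby
# Assumed shape of the shared helper; the project's actual code may differ.
class SentenceStateSketch
  def initialize
    @interim_noun_verb = false
  end

  def set_interim_noun_verb(value)
    @interim_noun_verb = value
  end

  # Returns state_if_interim when a noun or verb appeared since the last
  # negation, otherwise state_otherwise.
  def if_interim_then_state_is(state_if_interim, state_otherwise)
    @interim_noun_verb ? state_if_interim : state_otherwise
  end
end

sketch = SentenceStateSketch.new
p sketch.if_interim_then_state_is(:negative_word, :positive)  # prints :positive
sketch.set_interim_noun_verb(true)
p sketch.if_interim_then_state_is(:negative_word, :positive)  # prints :negative_word
```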
&lt;br /&gt;
This simplifies the next_state method so the superclass doesn't have to know anything about the relationships of the subclasses to find the next state of the sentence as shown below:&lt;br /&gt;
&lt;br /&gt;
 def next_state(current_token_type)&lt;br /&gt;
    method = {POSITIVE =&amp;gt; self.method(:positive), NEGATIVE_DESCRIPTOR =&amp;gt; self.method(:negative_descriptor), NEGATIVE_PHRASE =&amp;gt; self.method(:negative_phrase), SUGGESTIVE =&amp;gt; self.method(:suggestive), NEGATIVE_WORD =&amp;gt; self.method(:negative_word)}[current_token_type]&lt;br /&gt;
    method.call()&lt;br /&gt;
    if @state != POSITIVE&lt;br /&gt;
      set_interim_noun_verb(false) #resetting&lt;br /&gt;
    end&lt;br /&gt;
    @state&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
The method variable calls the correct method in the subclass based on the current_token_type. Now the code is much more extensible. Instead of having to edit that awfully long if-else statement, now a programmer only has to make a new SentenceState subclass which defines all the relationships of that subclass with any possible current_token_types. Future refactoring would include changing the variable method to a more descriptive name. &lt;br /&gt;
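&lt;br /&gt;
One possible form of that renaming, as a sketch only: the dispatch table maps each token type to the name of a handler and calls it with public_send. The class NegativeWordStateSketch, the HANDLERS table, and the symbols used as token types are assumptions for illustration, and only two handlers are shown.&lt;br /&gt;

```ruby
# Hypothetical stand-in for one state subclass; only two handlers are shown.
class NegativeWordStateSketch
  # Token type mapped to the name of the handler method (assumed names).
  HANDLERS = { positive: :positive, negative_word: :negative_word }.freeze

  def positive
    :negative_word # a neutral token keeps the clause negated
  end

  def negative_word
    :positive # a second negation cancels the first
  end

  def next_state(current_token_type)
    # fetch raises KeyError for a token type that has no handler.
    handler_name = HANDLERS.fetch(current_token_type)
    public_send(handler_name)
  end
end

p NegativeWordStateSketch.new.next_state(:negative_word)  # prints :positive
```

Using fetch instead of [] makes an unknown token type fail immediately with a KeyError rather than with a call on nil.&lt;br /&gt;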
&lt;br /&gt;
So now the sentence_state method has to be modified once more to use the new SentenceState subclasses:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(sentence_tokens) #str_with_pos_tags)&lt;br /&gt;
    #initialize state variables so that the original sentence state is positive&lt;br /&gt;
    @state = POSITIVE&lt;br /&gt;
    current_state = factory(@state)&lt;br /&gt;
    @@prev_negative_word = false&lt;br /&gt;
&lt;br /&gt;
    @interim_noun_verb = false&lt;br /&gt;
    sentence_tokens.each_with_next do |curr_token, next_token|&lt;br /&gt;
      #get current token type&lt;br /&gt;
      current_token_type = get_token_type([curr_token, next_token])&lt;br /&gt;
&lt;br /&gt;
      #Ask State class to get current state based on current state, current_token_type, and if there was a prev_negative_word&lt;br /&gt;
&lt;br /&gt;
      current_state = factory(current_state.next_state(current_token_type))&lt;br /&gt;
&lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      NEGATIVE_EMPHASIS_WORDS.each do |e|&lt;br /&gt;
        if curr_token.casecmp(e) == 0&lt;br /&gt;
          @@prev_negative_word = true&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
&lt;br /&gt;
    current_state.get_state()&lt;br /&gt;
  end&lt;br /&gt;
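&lt;br /&gt;
each_with_next is not a method of Ruby's core Array or Enumerable, so it has to be defined somewhere in the project. That definition is not shown on this page; a minimal sketch of one possible stand-alone version, built on each_cons and taking the collection as a hypothetical items argument, is:&lt;br /&gt;

```ruby
# Hypothetical helper: yields each element together with its successor
# (nil for the last element).
def each_with_next(items)
  (items + [nil]).each_cons(2) { |current, following| yield current, following }
end

each_with_next(%w[not bad]) { |current, following| p [current, following] }
# prints ["not", "bad"] and then ["bad", nil]
```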
&lt;br /&gt;
The factory method is implemented so that the SentenceState class only has to know what type of state it wants to make. At any one time, a SentenceState instance uses only one SentenceState subclass instance, so that the interim instance variables are not overwritten by multiple subclasses. &lt;br /&gt;
&lt;br /&gt;
 def factory(state)&lt;br /&gt;
    {POSITIVE =&amp;gt; PositiveState, NEGATIVE_DESCRIPTOR =&amp;gt; NegativeDescriptorState, NEGATIVE_PHRASE =&amp;gt; NegativePhraseState, SUGGESTIVE =&amp;gt; SuggestiveState, NEGATIVE_WORD =&amp;gt; NegativeWordState}[state].new()&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Finally, there were some other simple refactorings which I did across the code. These included changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because it is implied in Ruby. Overall, I think these refactorings and new designs make the code much more readable, extensible, and maintainable.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them properly, so the check reports plagiarism.&lt;br /&gt;
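&lt;br /&gt;
To make the idea of such a comparison concrete, here is a naive sketch that flags a review when any of its sentences appears verbatim in the submission. It is not the algorithm used by check_for_plagiarism; the method names copied_sentences and naive_plagiarism? are hypothetical, and the sketch ignores quoting and near-duplicates.&lt;br /&gt;

```ruby
# Illustrative only: exact, case-insensitive sentence matching.
# Not the project's actual algorithm.
def copied_sentences(review_texts, submission_texts)
  submission = submission_texts.join(' ').downcase
  review_texts
    .flat_map { |text| text.split(/[.!?]/) }
    .map { |sentence| sentence.strip.downcase }
    .reject { |sentence| sentence.empty? }
    .select { |sentence| submission.include?(sentence) }
end

def naive_plagiarism?(review_texts, submission_texts)
  !copied_sentences(review_texts, submission_texts).empty?
end

review = ['The sweet potatoes in the vegetable bin are green with mold.']
p naive_plagiarism?(review, review)  # prints true
```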
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above point of view, the refactoring should produce four fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split these two functions apart so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, the method compare_reviews_with_questions_responses has to be split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we extract the code that the long methods have in common into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the part that checks whether the reviews are copied, fully or partly, from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
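&lt;br /&gt;
Because the extracted method no longer depends on the surrounding loops, it can be exercised on its own. The sketch below is only an illustration: the constant values and the Metareview struct are stand-ins that we assume for the example, and the method body is written to behave like the one above.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in values; the real constants are defined in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the auto_metareview object.
Metareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies.positive?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

metareview = Metareview.new(%w[first second])
scores = [3, 4]

puts check_plagiarism_state(metareview, 0, [], scores).inspect          # nil
puts check_plagiarism_state(metareview, 1, ['second'], scores).inspect  # :some_plagiarised
puts check_plagiarism_state(metareview, 2, [], scores).inspect          # :all_plagiarised
```
&lt;br /&gt;
Both compare_reviews_with_questions and compare_reviews_with_responses can then end with a single call to this method instead of repeating the two if statements.&lt;br /&gt;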
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries of the array and pick the next phrase to compare. Once the method is extracted, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
       #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
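&lt;br /&gt;
One detail is worth noting about this extraction. The original snippet used next to restart the loop after skipping a blank token, while skip_empty_array advances the index at most once, so two blank tokens in a row, or a blank token in the last position, would still reach the comparison. A possible alternative, offered only as a sketch, is to drop blank tokens before the loop starts. The helper name non_blank_tokens is hypothetical and is not part of the refactored file.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper: remove blank entries once, up front,
# so the comparison loop never has to skip anything.
def non_blank_tokens(tokens)
  tokens.reject { |token| token.strip.empty? }
end

# String#split with a single space splits on runs of whitespace,
# so it produces no blank tokens on its own.
puts 'too   many    spaces'.split(' ').length  # 3

tokens = ['good', ' ', ' ', 'work', ' ']
non_blank_tokens(tokens).each do |rev_phrase|
  puts rev_phrase
end
# prints good, then work
```
&lt;br /&gt;
Since the review is tokenized with split, the blank check may rarely trigger in practice, but filtering up front keeps the loop body free of index bookkeeping.&lt;br /&gt;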
&lt;br /&gt;
Please see the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. From the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring we have made the code easier to understand, partly by applying design patterns, which meets the requirements of this project. From our perspective, however, more work remains to improve the code as a whole:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, which therefore cannot be run successfully so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is addressed, more plagiarism tests can be written, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
4. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81282</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81282"/>
		<updated>2013-10-30T20:20:40Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the course CSC517-Object Oriented Programming at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities, some of which should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and clearly structured. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take, for example, the test Identify State 8 in sentence_state_test:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy design pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence into clauses and tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
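&lt;br /&gt;
The idea of letting each state know its own transitions can be sketched as follows. This is a reduced illustration under our own assumptions, not the Expertiza code: the class names are hypothetical, plain classes that share a next_state method stand in for the subclasses, only three states are covered, and the checks on prev_negative_word and interim_noun_verb are left out.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical state classes, not the Expertiza implementation.
class PositiveState
  def name
    :positive
  end

  # From positive, the token type itself becomes the new state.
  def next_state(token_type)
    STATES.fetch(token_type)
  end
end

class NegativeWordState
  def name
    :negative_word
  end

  # A second negative word cancels the first (double negative).
  def next_state(token_type)
    case token_type
    when :negative_word then STATES.fetch(:positive)
    when :suggestive then STATES.fetch(:suggestive)
    else self
    end
  end
end

class SuggestiveState
  def name
    :suggestive
  end

  # In this reduced sketch a suggestive clause stays suggestive.
  def next_state(_token_type)
    self
  end
end

STATES = {
  positive: PositiveState.new,
  negative_word: NegativeWordState.new,
  suggestive: SuggestiveState.new
}.freeze

# Fold the token types of one clause into the name of its final state.
def final_state(token_types)
  state = STATES.fetch(:positive)
  token_types.each { |token_type| state = state.next_state(token_type) }
  state.name
end

# Token types for: We / are / not / not / musicians
puts final_state(%i[positive positive negative_word negative_word positive])
# prints positive
puts final_state(%i[positive negative_word positive])
# prints negative_word
```
&lt;br /&gt;
With this shape, adding a state means adding one class and one entry in STATES, rather than another branch in every existing condition.&lt;br /&gt;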
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. Debugging showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in negations.rb was incomplete. After this was fixed, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for and if-else blocks into their own methods. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
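    #fetching all the tokens&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;) #assumed from the refactored version below; this line is missing from the excerpt&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;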
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to make each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when the string contains no punctuation, because i is not incremented unless execution reaches the final else branch. To fix this, we set a valid_token boolean that stays true only if no punctuation is found, and save the token into tokens only in that case. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; instead they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
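&lt;br /&gt;
The suggested follow-up could look roughly like the sketch below. Only the lambda name remove_tag_from_sp comes from the text above; the constants TAG and PUNCTUATION and the helper punctuation_free? are hypothetical names chosen for this example.&lt;br /&gt;
&lt;br /&gt;
```ruby
TAG = '/'.freeze
PUNCTUATION = %w[. , ! ;].freeze

# True when the word contains none of the punctuation marks.
def punctuation_free?(word)
  PUNCTUATION.none? { |mark| word.include?(mark) }
end

def parse_sentence_tokens(str_with_pos_tags)
  # Lambda named as proposed in the text: drop the tag and everything after it.
  remove_tag_from_sp = lambda do |sp, tag|
    sp.include?(tag) ? sp[0...sp.index(tag)] : sp
  end

  str_with_pos_tags.split(' ')
                   .map { |sp| remove_tag_from_sp.call(sp, TAG) }
                   .select { |word| punctuation_free?(word) }
end

# The last piece contains a period, so it is left out,
# as in the refactored method above.
puts parse_sentence_tokens('We/PRP are/VBP not/RB musicians./NNS').inspect
```
&lt;br /&gt;
Supporting another punctuation mark then only requires one more entry in PUNCTUATION.&lt;br /&gt;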
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class. The original code is shown below.&lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement and ambiguous variable names. Digging deeper into the methods called by the if-else conditions revealed a lot of duplicated code. Each of these methods iterated through an array of words and returned one token type if the current token was found in that array, and POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below:&lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  # &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         # &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      # &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      # &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. The lambdas get_word and get_phrase were created to parse the current_token into a word or a phrase. A hash called types was created that holds the relationship between each array of words or phrases, the lambda used to parse the input for it, and the token type returned on a match. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that array if it is found. This refactored five methods with duplicated code into one method as shown below.&lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the code more readable because it moved the duplicated code of five methods into a single method, so there is only one place to read and understand. It also made the code easily extensible. To check for a new type, just add the relationship to the types hash. If a different input is needed, just write a new input parser lambda.&lt;br /&gt;
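&lt;br /&gt;
To illustrate the extensibility, the sketch below adds one new type to a reduced version of the lookup. Everything in it is an assumption made for the example: the word lists are short stand-ins for the real ones in constants.rb and negations.rb, QUESTION_WORDS and the :question type are hypothetical, a list of triples replaces the hash, and the method returns on the first match.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in word lists; QUESTION_WORDS is a hypothetical addition.
NEGATED_WORDS = %w[not never].freeze
SUGGESTIVE_PHRASES = ['could be', 'try to'].freeze
QUESTION_WORDS = %w[why how].freeze

# Input parsers: take the first token, or the first two as a phrase.
GET_WORD = lambda { |tokens| tokens[0] }
GET_PHRASE = lambda { |tokens| tokens[1].nil? ? nil : tokens[0] + ' ' + tokens[1] }

# Each entry: [list to search, input parser, type returned on a match].
TOKEN_TYPES = [
  [NEGATED_WORDS, GET_WORD, :negative_word],
  [SUGGESTIVE_PHRASES, GET_PHRASE, :suggestive],
  [QUESTION_WORDS, GET_WORD, :question] # the newly added type
].freeze

def get_token_type(current_token)
  TOKEN_TYPES.each do |list, parser, type|
    candidate = parser.call(current_token)
    next if candidate.nil?
    return type if list.any? { |entry| candidate.casecmp(entry).zero? }
  end
  :positive
end

puts get_token_type(%w[Why is this here])  # question
puts get_token_type(%w[could be shorter])  # suggestive
puts get_token_type(%w[musicians])         # positive
```
&lt;br /&gt;
Adding the new type took one word list and one line in TOKEN_TYPES; the lookup method itself did not change.&lt;br /&gt;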
&lt;br /&gt;
Finally, the biggest if-else statement to refactor is in the next_state() method, shown below:&lt;br /&gt;
&lt;br /&gt;
 def next_state(...)&lt;br /&gt;
   if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
  return state&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, I made several smaller refactorings across the code: replacing for loops with iteration over collections, renaming variables so they are more readable for other programmers, and removing a trailing &amp;quot;return&amp;quot; from the end of methods, because a Ruby method implicitly returns the value of its last expression.&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses. This mixing of responsibilities makes the method confusing, so the refactoring splits it in two to remove the smell.&lt;br /&gt;
&lt;br /&gt;
Based on this, the first step is to define four methods, each with its own function.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, the method compare_reviews_with_questions_responses has to be split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is repeated in the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the code that checks whether all, or only some, of the reviews are copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract the shared code into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
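&lt;br /&gt;
The behaviour of the extracted method can be tried in isolation with the sketch below. It is only an illustration under stated assumptions: the SKETCH_ constants are placeholder values (the real ALL_RESPONSES_PLAGIARISED and SOME_RESPONSES_PLAGIARISED are defined in the Expertiza source), and SketchMetareview is a hypothetical stand-in for the auto_metareview object. One point it makes visible is that the method returns nil when no copies are found, so a caller has to handle that case.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values for illustration; the real constants live in the Expertiza source.
SKETCH_ALL_PLAGIARISED = :all_plagiarised
SKETCH_SOME_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the auto_metareview object.
SketchMetareview = Struct.new(:review_array)

def sketch_check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and report nothing.
  return nil if count_copies == 0

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? SKETCH_ALL_PLAGIARISED : SKETCH_SOME_PLAGIARISED
end

review = SketchMetareview.new(['original text'])
p sketch_check_plagiarism_state(review, 0, ['cleaned'], [1, 2])
p sketch_check_plagiarism_state(review, 1, ['cleaned'], [1, 2])
p sketch_check_plagiarism_state(review, 2, ['cleaned'], [1, 2])
```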
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries in the array and select the segment to compare. After extracting the method, compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of the VCL servers is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com) and we will fix it as soon as possible. Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns where they fit, which meets the requirements of this project. However, we believe more work is needed to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so these methods do not work at present. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is done, more plagiarism tests can be added, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81278</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81278"/>
		<updated>2013-10-30T20:18:11Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. It has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities, some of which should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
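&lt;br /&gt;
The walk-through above can be sketched in a few lines of Ruby. This is a deliberately simplified illustration, not the Expertiza implementation: it assumes only two states and a tiny hypothetical word list (SKETCH_NEGATIVE_WORDS), whereas the real SentenceState also tracks negative descriptors, negative phrases, and suggestive words.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Simplified illustration only: two states and a tiny, hypothetical negative-word list.
SKETCH_NEGATIVE_WORDS = %w(not no never none)

def sketch_sentence_state(sentence)
  state = :positive
  sentence.downcase.delete('.').split(' ').each do |token|
    next unless SKETCH_NEGATIVE_WORDS.include?(token)
    # A negative token flips the current state, so two in a row cancel out.
    state = (state == :positive ? :negative : :positive)
  end
  state
end

puts sketch_sentence_state('We are not not musicians.')
puts sketch_sentence_state('We are not musicians.')
```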
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy design pattern, as explained in more detail below. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence into clauses and tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for and if-else blocks into their own methods. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems of this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) because of the use of an if else statement. To remove this duplication, the strategy design pattern can be used to make each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, after inspecting the code it was seen that tokens[i] is only truly updated when there is not a punctuation in the string because i is not increased unless it gets to the end of the if-else statement. To fix this we can set a valid_token boolean if no punctuation is found, and then save that value into tokens. This refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code but it makes it more readable and extendable. If anyone wants to check for an additional punctuation, they do not have to update the if-else statement which could easily make a bug, instead they only have to update the punctuation array with the new punctuation. Also, the variables are defined so that the reader can understand their functionality. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
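&lt;br /&gt;
A minimal sketch of that future refactoring is shown below. It is an assumption about how such a lambda might look, not code from the repository: the SKETCH_ constants, REMOVE_TAG_FROM_SP, and sketch_parse_sentence_tokens are hypothetical names. The lambda uses an exclusive range so that a piece starting with the tag yields an empty string, and tokens containing punctuation are dropped as in the refactored method above.&lt;br /&gt;
&lt;br /&gt;
```ruby
SKETCH_TAG = '/'
SKETCH_PUNCTUATION = %w(. , ! ;)

# Hypothetical helper: strips a part-of-speech tag such as "/NN" from a sentence piece.
REMOVE_TAG_FROM_SP = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

def sketch_parse_sentence_tokens(str_with_pos_tags)
  # Remove the tag from every piece, then keep only pieces without punctuation.
  words = str_with_pos_tags.split(' ').map { |sp| REMOVE_TAG_FROM_SP[sp, SKETCH_TAG] }
  words.select { |word| SKETCH_PUNCTUATION.none? { |p| word.include?(p) } }
end

p sketch_parse_sentence_tokens('It/PRP is/VBZ hardly/RB confusing/JJ ./.')
```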
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if else statement and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions, there was a lot of duplicated code. Each method called iterated through an array of words, and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods was the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively.) An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. Lambda blocks get_word and get_phrase were created to handle parsing the current_token into a word or a phrase. A hash called types was created, which holds the relationship between a type of token, the array it searches through to check for that type, and whether the values in the search array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, then checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method as shown below. &lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable because it moved duplicated code in five methods into a single method, so there is only one place to read and understand the code. It also made it easily extensible. If you want to check for a new type, just add the relationship to the types hash. If you need a different input, just make a new input_parser lambda. &lt;br /&gt;
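&lt;br /&gt;
The extensibility claim can be illustrated with a small, self-contained sketch. Everything prefixed with SKETCH_ or sketch_ is hypothetical, including the word lists, which are not the lists from constants.rb. Unlike get_token_type above, this version returns on the first match, so the order of the entries decides which type wins. Adding a new type means adding one entry to the table.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Input parsers: take the remaining tokens and build a word or a two-word phrase.
SKETCH_GET_WORD = lambda { |tokens| tokens[0] }
SKETCH_GET_PHRASE = lambda { |tokens| tokens[1].nil? ? nil : tokens[0] + ' ' + tokens[1] }

# Hypothetical lookup table: each entry pairs a search list with its parser and type.
SKETCH_TYPES = [
  { list: %w(not never), parser: SKETCH_GET_WORD, type: :negative_word },
  { list: ['too short'], parser: SKETCH_GET_PHRASE, type: :negative_phrase },
  { list: %w(suggest should), parser: SKETCH_GET_WORD, type: :suggestive }
]

def sketch_token_type(remaining_tokens)
  SKETCH_TYPES.each do |entry|
    candidate = entry[:parser].call(remaining_tokens)
    next if candidate.nil?
    # Case-insensitive comparison, as in the original methods.
    return entry[:type] if entry[:list].any? { |w| candidate.casecmp(w) == 0 }
  end
  :positive
end

p sketch_token_type(%w(Never again))
p sketch_token_type(%w(too short here))
p sketch_token_type(%w(musicians))
```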
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, I made several smaller refactorings across the code: replacing for loops with iteration over collections, renaming variables so they are more readable for other programmers, and removing a trailing &amp;quot;return&amp;quot; from the end of methods, because a Ruby method implicitly returns the value of its last expression.&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copied the author's sentences word for word without quoting them, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also with other reviewers' responses, which makes it confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with a single specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied entirely or partly from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines deal only with skipping blank entries in the array before making comparisons. Once we extracted the method, the code of compare_reviews_with_submissions changed to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
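&lt;br /&gt;
The extracted helper tests a single entry per call. As a rough sketch only (the name skip_blank_entries is hypothetical and this is not the code in the repository), a variant that steps over any run of blank entries before returning the index and phrase might look like this:&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper, not part of plagiarism_check.rb.
# Advances index past consecutive blank entries, then returns the new index
# together with the entry found there (nil when the array is exhausted).
def skip_blank_entries(array, index)
  while (array.length - index).positive? and array[index].strip.empty?
    index += 1
  end
  [index, array[index]]
end
```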
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of the VCL servers is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush through your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. The other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns where appropriate, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code overall:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so they cannot be run successfully so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more plagiarism tests can be written, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may interfere with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81274</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81274"/>
		<updated>2013-10-30T20:15:56Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are refactoring are in plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities, some of which should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they pass without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; Positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; Positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; Negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; Negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; Positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
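&lt;br /&gt;
The walk-through above can be sketched in a few lines. This is a deliberately simplified illustration with only two states and one kind of negative token; the names clause_state_sketch and NEGATIVE_TOKENS are hypothetical, and the real SentenceState handles more states and token types.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical word list used only for this sketch.
NEGATIVE_TOKENS = %w[not no never none].freeze

# Walks the tokens of a clause and flips the running state on each negative token,
# so that a double negative cancels out.
def clause_state_sketch(sentence)
  state = :positive
  sentence.split(' ').each do |token|
    word = token.delete('.,!;').downcase
    next unless NEGATIVE_TOKENS.include?(word)
    state = (state == :positive ? :negative : :positive)
  end
  state
end
```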
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy design pattern, as explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence into clauses and tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which revealed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition was incomplete in the negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted the three for and if-else blocks into their own methods. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the Strategy design pattern can be used to turn each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when there is no punctuation in the string, because i is not increased unless execution reaches the final else branch of the if-else statement. To fix this, we set a valid_token boolean if no punctuation is found, and only then save the value into tokens. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; instead, they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
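&lt;br /&gt;
As a sketch of that possible follow-up, assuming the same behaviour as the refactored method (pieces that still contain punctuation after the tag is removed are dropped), the tag removal could live in its own lambda. The names REMOVE_TAG, PUNCTUATION, TAG_SEPARATOR, and parse_sentence_tokens_sketch are hypothetical and do not appear in the repository.&lt;br /&gt;
&lt;br /&gt;
```ruby
PUNCTUATION = %w[. , ! ;].freeze
TAG_SEPARATOR = '/'.freeze

# Hypothetical helper: strips the part-of-speech tag, so 'bin/NN' becomes 'bin'.
REMOVE_TAG = lambda do |piece|
  piece.include?(TAG_SEPARATOR) ? piece[0...piece.index(TAG_SEPARATOR)] : piece
end

# Splits a tagged sentence into words, removes the tags, and keeps only
# the words that contain none of the listed punctuation marks.
def parse_sentence_tokens_sketch(str_with_pos_tags)
  str_with_pos_tags.split(' ')
                   .map { |piece| REMOVE_TAG.call(piece) }
                   .reject { |word| PUNCTUATION.any? { |mark| word.include?(mark) } }
end
```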
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a long if-else chain and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions revealed a lot of duplicated code. Each method iterated through an array of words and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below:&lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  # &amp;lt;== input could be word or phrase&lt;br /&gt;
   not_negated = POSITIVE         # &amp;lt;== type always POSITIVE&lt;br /&gt;
   for i in (0..NEGATED_WORDS.length - 1)      # &amp;lt;== different array of words&lt;br /&gt;
     if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
       not_negated = NEGATIVE_WORD      # &amp;lt;== different type matching array&lt;br /&gt;
       break&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
   return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the Strategy design pattern. Lambda blocks get_word and get_phrase were created to handle parsing the current_token into a word or a phrase. A hash called types was created, which holds the relationship between a type of token, the array that is searched to check for that type, and whether the values in the search array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, then checks whether that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method, as shown below.&lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable and extensible. To check for a new type, add its relationship to the types hash. To handle a different kind of input, write a new input parser lambda.&lt;br /&gt;
&lt;br /&gt;
Finally, I made some simpler refactorings across the code: changing for loops into iterations over collections, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because Ruby implicitly returns the value of the last expression.&lt;br /&gt;
**Future note: the TaggedSentence class could be refactored further by allowing a sentence to be broken either into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copied the author's sentences word for word without quoting them, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also with other reviewers' responses, which makes it confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with a single specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied entirely or partly from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract each long loop or if-else statement into an individual method, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines are concerned with skipping blank entries in the array before making comparisons. Once we extract the method, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
For the complete refactored code, please see this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns where appropriate, which meets the requirements of this project. From our perspective, however, more work remains to improve the code overall, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so they do not yet work as intended. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is resolved, more plagiarism tests can be written, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug fixes are needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81272</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81272"/>
		<updated>2013-10-30T20:15:26Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and&lt;br /&gt;
sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-pasting has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be delegated to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they follow a more object-oriented style and have a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; Positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; Positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; Negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; Negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; Positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
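&lt;br /&gt;
The walk above can be illustrated with a toy model. This is only a sketch under our own assumptions: the method name toy_sentence_state, the symbols, and the short word list are hypothetical, and the real SentenceState handles many more token types (descriptors, phrases, suggestions).&lt;br /&gt;
&lt;br /&gt;
```ruby
# Toy model of the token walk; NOT the real SentenceState implementation.
# The word list below is an assumption made for illustration only.
TOY_NEGATIVE_TOKENS = %w[not no never none].freeze

def toy_sentence_state(sentence)
  state = :positive
  sentence.downcase.scan(/[a-z]+/).each do |token|
    # Positive tokens leave the state unchanged.
    next unless TOY_NEGATIVE_TOKENS.include?(token)
    # A negative token flips the current state, so two in a row cancel out.
    state = (state == :positive ? :negative : :positive)
  end
  state
end

puts toy_sentence_state('We are not not musicians.') # prints "positive"
puts toy_sentence_state('We are not musicians.')     # prints "negative"
```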
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy Design Pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence and its tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState, each of which knows the relationship between its own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition was incomplete in the negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted each of the three for and if-else blocks into its own method. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of the for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to turn each of the duplicated functions into lambda blocks, or commands, and to iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when the string contains no punctuation, because i is not incremented unless execution reaches the final else branch. To fix this, we set a valid_token boolean that stays true only if no punctuation is found, and save the value into tokens only in that case. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
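&lt;br /&gt;
As a rough sketch of that future refactoring, the tag removal can live in a lambda and the punctuation check can be expressed with reject and any?. The names REMOVE_TAG_FROM_SP and parse_sentence_tokens_sketch and the sample tagged sentence are our own assumptions, not code from the project; the lambda is stored in a constant here so that the method can reach it.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Sketch only; loosely mirrors the refactored parse_sentence_tokens.
TAG = '/'.freeze
PUNCTUATION = %w[. , ! ;].freeze

# Hypothetical lambda from the text: strips a POS tag such as "/NN" from a piece.
# Uses an exclusive range, so a piece that starts with the tag becomes empty.
REMOVE_TAG_FROM_SP = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

def parse_sentence_tokens_sketch(str_with_pos_tags)
  str_with_pos_tags.split(' ')
                   .map { |sp| REMOVE_TAG_FROM_SP.call(sp, TAG) }
                   # Drop any piece that still contains punctuation.
                   .reject { |sp| PUNCTUATION.any? { |mark| sp.include?(mark) } }
end

p parse_sentence_tokens_sketch('We/PRP are/VBP not/RB musicians/NNS ./.')
# prints ["We", "are", "not", "musicians"]
```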
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions revealed a lot of duplicated code. Each method iterated through an array of words and returned one token_type if the current_token was found in that array, and token_type = POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below:&lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was refactored using the strategy design pattern. Lambda blocks get_word and get_phrase were created to handle parsing the current_token into a word or a phrase. A hash called types was created which holds the relationship between a type of token, which array it searches through to check for that type, and whether the values in the search array are words or phrases. Iterating through the types hash calls get_word or get_phrase to parse the input current_token into a word or phrase, then checks if that word or phrase is in the word_or_phrase_array, and returns the type associated with that word_or_phrase_array if it is found. This refactored five methods with duplicated code into one method as shown below.&lt;br /&gt;
&lt;br /&gt;
 def get_token_type(current_token)&lt;br /&gt;
    #input parsers&lt;br /&gt;
    get_word = lambda { |c| c[0]}&lt;br /&gt;
    get_phrase = lambda {|c| c[1].nil? ? nil : c[0]+' '+c[1]}&lt;br /&gt;
&lt;br /&gt;
    #types holds relationships between word_or_phrase_array_of_type =&amp;gt; [input parser of type, type]&lt;br /&gt;
    types = {NEGATED_WORDS =&amp;gt; [get_word, NEGATIVE_WORD], NEGATIVE_DESCRIPTORS =&amp;gt; [get_word, NEGATIVE_DESCRIPTOR], SUGGESTIVE_WORDS =&amp;gt; [get_word, SUGGESTIVE], NEGATIVE_PHRASES =&amp;gt; [get_phrase,NEGATIVE_PHRASE], SUGGESTIVE_PHRASES =&amp;gt; [get_phrase, SUGGESTIVE]}&lt;br /&gt;
    current_token_type = POSITIVE&lt;br /&gt;
    types.each do |word_or_phrase_array, type_definition|&lt;br /&gt;
      get_word_or_phrase, word_or_phrase_type = type_definition[0], type_definition[1]&lt;br /&gt;
      token = get_word_or_phrase.(current_token)&lt;br /&gt;
      unless token.nil?&lt;br /&gt;
        word_or_phrase_array.each do |word_or_phrase|&lt;br /&gt;
            if token.casecmp(word_or_phrase) == 0&lt;br /&gt;
              current_token_type = word_or_phrase_type&lt;br /&gt;
              break&lt;br /&gt;
            end&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    current_token_type&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This refactoring made the method more readable and extensible. To check for a new type, just add the relationship to the types hash. If a different kind of input is needed, just write a new input parser lambda.&lt;br /&gt;
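&lt;br /&gt;
The table-driven idea can be tried in isolation with the toy version below. It is a sketch under stated assumptions: the abbreviated word lists, the symbols, and the names TOY_TYPES and toy_token_type are hypothetical, and, unlike the method above, this version returns on the first matching row instead of continuing through the remaining rows.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Assumed, abbreviated word lists for illustration; the real lists are
# defined in the project's constants and are much longer.
TOY_NEGATED_WORDS = %w[not no never].freeze
TOY_SUGGESTIVE_WORDS = %w[should could].freeze
TOY_NEGATIVE_PHRASES = ['too few'].freeze

# Input parsers: take the remaining tokens and build a word or a 2-gram.
GET_WORD = lambda { |tokens| tokens[0] }
GET_PHRASE = lambda { |tokens| tokens[1].nil? ? nil : tokens[0] + ' ' + tokens[1] }

# Each row: [list to search, input parser, type returned on a match].
TOY_TYPES = [
  [TOY_NEGATED_WORDS, GET_WORD, :negative_word],
  [TOY_SUGGESTIVE_WORDS, GET_WORD, :suggestive],
  [TOY_NEGATIVE_PHRASES, GET_PHRASE, :negative_phrase]
].freeze

def toy_token_type(remaining_tokens)
  TOY_TYPES.each do |list, parser, type|
    candidate = parser.call(remaining_tokens)
    next if candidate.nil?
    # Case-insensitive comparison, as in the original casecmp calls.
    return type if list.any? { |entry| candidate.casecmp(entry).zero? }
  end
  :positive
end

p toy_token_type(%w[Never mind])    # prints :negative_word
p toy_token_type(%w[too few tests]) # prints :negative_phrase
p toy_token_type(%w[musicians])     # prints :positive
```

Adding a new token type in this sketch means adding one row to TOY_TYPES, which is the extensibility benefit described above.&lt;br /&gt;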
&lt;br /&gt;
Finally, there were some other simple refactorings that I applied across the code. These included changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because it is implicit in Ruby.&lt;br /&gt;
**Future note: the TaggedSentence class could be refactored to allow a sentence to be broken either into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
As outlined above, the refactoring is organized around four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does roughly two things: it compares reviews with the review questions and it compares reviews with other reviewers’ responses, which makes it confusing to read. As part of the refactoring, we split these two functions apart so that this code smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step, following from the above, is to define four methods, each with a single purpose:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
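&lt;br /&gt;
To see the three possible outcomes of this helper in isolation, here is a minimal runnable sketch. The constant values, the FakeMetareview struct, and the name plagiarism_state_sketch are stand-ins that we assume for illustration; only the branching mirrors the method above.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in values; the real constants are defined in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the metareview object; only review_array is modeled.
FakeMetareview = Struct.new(:review_array)

def plagiarism_state_sketch(auto_metareview, count_copies, rev_array, scores)
  # Nothing was copied: leave review_array alone and report no state.
  return nil unless count_copies.positive?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new(['original sentence'])
p plagiarism_state_sketch(review, 0, [], [1, 2])       # prints nil
p plagiarism_state_sketch(review, 1, ['kept'], [1, 2]) # prints :some_plagiarised
p plagiarism_state_sketch(review, 2, [], [1, 2])       # prints :all_plagiarised
```

Because the helper returns nil when nothing was copied, a caller has to check the result before returning it.&lt;br /&gt;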
&lt;br /&gt;
The next step is to extract each long loop or if-else statement into an individual method, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines are concerned with skipping blank entries in the array before making comparisons. Once we extract the method, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  &lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All the tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you make sure that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the overall quality of the code, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, which do not work correctly so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Based on 1, more tests regarding plagiarism can be written, which would make further development of the code easier.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors within the methods of the text_preprocessing.rb file, which may conflict with the plagiarism-check functionality. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81260</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81260"/>
		<updated>2013-10-30T20:11:03Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in the course CSC517 Object Oriented Programming at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed in the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-pasting has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
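&lt;br /&gt;
The token walk above can be sketched as a small state machine. This is a simplified illustration only, not the SentenceState implementation: it assumes just two states, and the NEGATIVE_TOKENS list and the clause_state method are hypothetical names introduced for this sketch.&lt;br /&gt;

```ruby
# Simplified sketch with two states; the real class tracks more token types.
# NEGATIVE_TOKENS is a hypothetical sample list, not the Expertiza constant.
NEGATIVE_TOKENS = %w(not no never none)

# Walks the tokens of one clause and flips the state on every negative token,
# so that a second negative cancels the first (double negative).
def clause_state(sentence)
  tokens = sentence.downcase.scan(/[a-z']+/)
  tokens.reduce(:positive) do |state, token|
    if NEGATIVE_TOKENS.include?(token)
      state == :positive ? :negative : :positive
    else
      state # any other token keeps the previous state
    end
  end
end

puts clause_state("We are not not musicians.")
```

With a single "not" in the sentence, the same walk ends in the negative state.&lt;br /&gt;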
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy design pattern, as will be explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence and its tokens can be split into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three for or if-else blocks, each into its own method. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the Strategy design pattern can be used to turn each of the duplicated functions into lambda blocks, or commands, and to iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when there is no punctuation in the string, because i is not incremented unless execution reaches the final else branch of the if-else statement. To fix this, we can set a valid_token boolean that stays true only if no punctuation is found, and save the word into tokens only in that case. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to update the if-else statement, which could easily introduce a bug; instead, they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
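&lt;br /&gt;
As a rough sketch of the future refactoring mentioned above, the tag removal could be pulled into a lambda and the punctuation check expressed over the array. The names TAG_SEPARATOR, PUNCTUATION, remove_tag and parse_tokens are assumptions made for this illustration and do not appear in the Expertiza source.&lt;br /&gt;

```ruby
# Hypothetical names for illustration; not taken from tagged_sentence.rb.
TAG_SEPARATOR = '/'
PUNCTUATION = %w(. , ! ;)

# Strips a part-of-speech tag such as "/NN" from a tagged word.
remove_tag = lambda do |piece|
  piece.include?(TAG_SEPARATOR) ? piece[0...piece.index(TAG_SEPARATOR)] : piece
end

# Keeps only the words that contain none of the listed punctuation marks,
# mirroring the valid_token check in the refactored method.
parse_tokens = lambda do |str_with_pos_tags|
  str_with_pos_tags.split(' ')
                   .map { |piece| remove_tag.call(piece) }
                   .reject { |word| PUNCTUATION.any? { |mark| word.include?(mark) } }
end

tokens = parse_tokens.call('We/PRP are/VBP not/RB musicians/NNS ./.')
puts tokens.inspect
```

Supporting another punctuation mark in this sketch only requires adding it to PUNCTUATION.&lt;br /&gt;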
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement and ambiguous variable names. Digging deeper into each of the methods called by the if-else conditions revealed a lot of duplicated code. Each of these methods iterated through an array of words and returned one token type if the current token was found in that array, and POSITIVE if it was not. The only major differences between the methods were the array that was searched, the type that was returned, and whether the input was a word or a phrase (1 or 2 tokens, respectively). An example of one of these methods is shown below: &lt;br /&gt;
&lt;br /&gt;
 def is_negative_word(word)  &amp;lt;== input could be word or phrase&lt;br /&gt;
  not_negated = POSITIVE         &amp;lt;== type always POSITIVE&lt;br /&gt;
  for i in (0..NEGATED_WORDS.length - 1)      &amp;lt;== different array of words&lt;br /&gt;
    if(word.casecmp(NEGATED_WORDS[i]) == 0)&lt;br /&gt;
      not_negated = NEGATIVE_WORD      &amp;lt;== different type matching array&lt;br /&gt;
      break&lt;br /&gt;
    end&lt;br /&gt;
  end&lt;br /&gt;
  return not_negated&lt;br /&gt;
 end&lt;br /&gt;
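&lt;br /&gt;
Since these methods differ only in the array searched and the type returned, one way to remove the duplication is a single lookup that takes both as parameters. The following is a minimal sketch under that assumption; token_type_from, the constant values, and the sample word lists are hypothetical and are not the definitions from constants.rb or negations.rb.&lt;br /&gt;

```ruby
# Illustrative values only; the real constants live in the Expertiza source.
POSITIVE = 0
NEGATIVE_WORD = 1
NEGATIVE_PHRASE = 2
NEGATED_WORDS = %w(not never none)        # hypothetical sample list
NEGATIVE_PHRASES = ['no one', 'out of']   # hypothetical sample list

# Returns matched_type when token equals any entry of word_list, ignoring case,
# and POSITIVE otherwise. Works for single words and two-word phrases alike.
def token_type_from(token, word_list, matched_type)
  word_list.any? { |entry| token.casecmp(entry) == 0 } ? matched_type : POSITIVE
end

puts token_type_from('NOT', NEGATED_WORDS, NEGATIVE_WORD)
puts token_type_from('No one', NEGATIVE_PHRASES, NEGATIVE_PHRASE)
```

Each of the original is_negative_word style methods would then reduce to one call with its own list and type.&lt;br /&gt;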
&lt;br /&gt;
Finally, I made some other simple refactorings across the code. These included changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because it is implicit in Ruby code.&lt;br /&gt;
**Future note: The TaggedSentence class could be refactored to allow a sentence to be broken either into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review text does not properly quote the words and sentences; the reviewer simply copies what the author says, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactoring should produce 4 fundamental methods, each of which does only one thing and does it correctly. In the original file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions up and make sure such bad smells disappear.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the parts that the long methods have in common and make each part an individual method that can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extracted this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
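&lt;br /&gt;
To show how the extracted helper behaves when called, here is a self-contained sketch. The constant values and the FakeMetareview struct are stand-ins assumed for this illustration; in Expertiza the constants and the auto_metareview object come from the application itself.&lt;br /&gt;

```ruby
# Stand-in values and types, assumed for illustration only.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
FakeMetareview = Struct.new(:review_array)

# Same logic as the extracted helper: returns nil when nothing was copied.
def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies > 0
  auto_metareview.review_array = rev_array # reset only when plagiarism was found
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

metareview = FakeMetareview.new(['original sentence'])
state = check_plagiarism_state(metareview, 1, ['kept sentence'], [0.9, 0.2])
puts state
```

Both compare_reviews_with_questions and compare_reviews_with_responses can then end with one call to this helper instead of repeating the two if blocks.&lt;br /&gt;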
&lt;br /&gt;
The next step is to extract each long loop or if-else statement into an individual method, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
          rev_len+=1&lt;br /&gt;
          next&lt;br /&gt;
        end&lt;br /&gt;
        &lt;br /&gt;
        #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens in the array before the comparisons are made. Once we extracted the method, the code of compare_reviews_with_submissions changed to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  &lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All the tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you make sure that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the overall quality of the code, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, which do not work correctly so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Based on 1, more tests regarding plagiarism can be written, which would make further development of the code easier.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors within the methods of the text_preprocessing.rb file, which may conflict with the plagiarism-check functionality. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81250</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81250"/>
		<updated>2013-10-30T20:06:09Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Design Smells */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in the course CSC517 Object Oriented Programming at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
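&lt;br /&gt;
The walk-through above can be sketched as a small state machine. This is only a simplified illustration, not the actual SentenceState code: the NEGATIVE_TOKENS list and the method names token_state and clause_state are hypothetical, and suggestive states, negative descriptors and negative phrases are left out.&lt;br /&gt;

```ruby
# Hypothetical, simplified word list; the real lists live in Expertiza's
# constants.rb and negations.rb.
NEGATIVE_TOKENS = %w[not no never none].freeze

# Classify a single token as :negative or :positive.
def token_state(token)
  NEGATIVE_TOKENS.include?(token.downcase) ? :negative : :positive
end

# Fold the token states into one clause state. A negative token flips the
# current state, so two negatives cancel out; a positive token keeps it.
def clause_state(sentence)
  tokens = sentence.delete(".,!;").split(" ")
  tokens.reduce(:positive) do |state, token|
    if token_state(token) == :negative
      state == :positive ? :negative : :positive
    else
      state
    end
  end
end

puts clause_state("We are not not musicians.") # positive
puts clause_state("We are not musicians.")     # negative
```

Each token only needs the previous state and its own type, which is why the logic can be expressed as a fold over the tokens.&lt;br /&gt;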
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Often it was possible to remove these smells using the Strategy design pattern, as explained in more detail below. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence and its tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which revealed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition was incomplete in the negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted each of the three for or if-else blocks into its own method. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    #assumed: st holds the space-separated pieces of the tagged sentence&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1 #i only advances here, so a token that contained punctuation is overwritten&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the Strategy design pattern can be used to turn each of the duplicated operations into lambda blocks, or commands, and to iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly kept when the string contains no punctuation, because i is only incremented in the final else branch. To fix this, we set a valid_token boolean to false when punctuation is found and save the value into tokens only when it is still true. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the each loop&lt;br /&gt;
    tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark does not have to touch the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
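&lt;br /&gt;
A minimal sketch of the suggested remove_tag_from_sp lambda is given below. It is an assumption about how such a helper might look, not code from the refactored file. It uses an exclusive range so that a tag at index 0 yields an empty string, whereas sp[0..sp.index(tag)-1] would return the whole string in that case.&lt;br /&gt;

```ruby
# Hypothetical helper named after the suggestion in the text: strip a POS
# tag such as "/NNS" from a tagged word, leaving untagged words unchanged.
remove_tag_from_sp = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

puts remove_tag_from_sp["potatoes/NNS", "/"] # potatoes
puts remove_tag_from_sp.call("fresh", "/")   # fresh
```

A lambda can be invoked with either square brackets or call, so the remove_tag_from_sp[sp, tag] form mentioned above works as written.&lt;br /&gt;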
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if else statement, bad variable names, and duplicated code in the methods it was calling.  &lt;br /&gt;
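&lt;br /&gt;
One possible way to flatten such a chain is an ordered lookup table, sketched below. This is not necessarily the approach taken in the refactored file. The word lists, the TOKEN_TYPE_TABLE constant and the token_type method are hypothetical, and the 2-gram phrase checks are omitted for brevity.&lt;br /&gt;

```ruby
# Hypothetical word lists; in Expertiza the real lists live in
# constants.rb and negations.rb.
NEGATIVE_WORD_LIST = %w[not no never none].freeze
NEGATIVE_DESCRIPTOR_LIST = %w[hardly barely scarcely].freeze
SUGGESTIVE_WORD_LIST = %w[should could suggest].freeze

# Ordered [type, word list] pairs: the first list that contains the token
# wins, mirroring the order of the original if-elsif chain.
TOKEN_TYPE_TABLE = [
  [:negative_word, NEGATIVE_WORD_LIST],
  [:negative_descriptor, NEGATIVE_DESCRIPTOR_LIST],
  [:suggestive, SUGGESTIVE_WORD_LIST]
].freeze

# Return the type of a single token, defaulting to :positive.
def token_type(token)
  match = TOKEN_TYPE_TABLE.find { |_type, words| words.include?(token.downcase) }
  match ? match.first : :positive
end

puts token_type("Not")       # negative_word
puts token_type("hardly")    # negative_descriptor
puts token_type("should")    # suggestive
puts token_type("musicians") # positive
```

Adding a new token type then means adding one row to the table instead of another elsif branch.&lt;br /&gt;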
&lt;br /&gt;
Finally, I made some other simple refactorings across the code: changing for loops into iterations over collections, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because the return is implicit in Ruby.&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies what the author wrote without quoting the words and sentences properly, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactoring should produce four fundamental methods, each of which does exactly one thing. In the initial file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions up so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step, based on the above, is to define four methods with different functions:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the parts that long methods have in common and make each one an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
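&lt;br /&gt;
To see how the extracted method behaves, the sketch below runs an equivalent version outside Expertiza. The constant values and the FakeMetareview struct are stand-ins assumed only for this illustration; the real constants and the metareview object come from the automated_metareview code.&lt;br /&gt;

```ruby
# Hypothetical stand-ins so the method can run on its own.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies.positive? # no plagiarism found
  auto_metareview.review_array = rev_array  # reset only when plagiarism was found
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new(["original"])
p check_plagiarism_state(review, 0, ["trimmed"], [1, 2]) # nil
p check_plagiarism_state(review, 1, ["trimmed"], [1, 2]) # :some_plagiarised
p check_plagiarism_state(review, 2, ["trimmed"], [1, 2]) # :all_plagiarised
```

Note that the method returns nil when no copies are found, so callers have to handle that case separately.&lt;br /&gt;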
&lt;br /&gt;
The next thing to do is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
          rev_len+=1&lt;br /&gt;
          next&lt;br /&gt;
        end&lt;br /&gt;
        &lt;br /&gt;
        #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array before making comparisons. Once we extracted the method, the code of compare_reviews_with_submissions changed to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All the tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, so they cannot be used as intended so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more plagiarism tests can be written, which will help further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81246</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81246"/>
		<updated>2013-10-30T20:04:33Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Design Smells */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The responsibility of parsing the sentence and its tokens can be moved into another class, because this functionality could be useful elsewhere in the future and should be kept decoupled from SentenceState. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which revealed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition was incomplete in the negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted each of the three for or if-else blocks into its own method. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read, and it revealed that the class SentenceState had too many responsibilities. The class now had two methods that parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. Parsing a sentence into its clauses and individual tokens (words) could be useful elsewhere, so it is better to decouple this responsibility from SentenceState. These two methods were therefore moved into a new TaggedSentence class. Now when SentenceState is called, it creates a new TaggedSentence and calls break_at_coordinating_conjunctions, which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that had been extracted from sentence_state, because they still contained deeply nested if statements. The first of these was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One problem with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) across the branches of the if-else statement. To remove this duplication, the strategy design pattern can be used: each duplicated operation becomes a lambda block, or command, and the code iterates over a punctuation array to remove undesired punctuation. Inspecting the code also showed that tokens[i] is only kept when the string contains no punctuation, because i is incremented only in the final else branch, so a token written by any other branch is overwritten by the next one. To make this explicit, the refactored code sets a valid_token boolean to false when punctuation is found and saves the word into tokens only when valid_token is still true. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to extend the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could move the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
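&lt;br /&gt;
The lambda-based variant mentioned above can be sketched as follows. This is only an illustration, not the code in the repository: the constant names TAG and PUNCTUATION and the method name parse_sentence_tokens_sketch are hypothetical, and the sample input assumes tokens tagged in word/TAG form. As in the refactored method, a word that still contains punctuation after the tag is removed is dropped.&lt;br /&gt;

```ruby
# Hypothetical names; only the '/' tag marker and the punctuation list
# come from the refactored method.
TAG = '/'
PUNCTUATION = %w[. , ! ;].freeze

# Lambda that strips the part-of-speech tag, e.g. 'not/RB' becomes 'not'.
remove_tag_from_sp = lambda do |sp, tag|
  sp.include?(tag) ? sp[0...sp.index(tag)] : sp
end

# Split the tagged sentence, strip tags, and drop words containing punctuation.
def parse_sentence_tokens_sketch(str_with_pos_tags, remove_tag)
  str_with_pos_tags.split(' ')
                   .map { |sp| remove_tag[sp, TAG] }
                   .reject { |word| PUNCTUATION.any? { |mark| word.include?(mark) } }
end

tokens = parse_sentence_tokens_sketch('We/PRP are/VBP not/RB musicians/NNS ./.',
                                      remove_tag_from_sp)
```

Passing the lambda as an argument keeps the tag-stripping rule replaceable, which is the point of treating it as a strategy.&lt;br /&gt;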
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a long if-elsif chain, unclear variable names, and duplicated code in the methods it called.  &lt;br /&gt;
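&lt;br /&gt;
One way such a chain could be flattened is an ordered rule table, sketched below. This is not the refactoring that was committed: the SAMPLE_ word lists, the symbols, and get_token_type_sketch are hypothetical stand-ins for the project's real lists and predicates. The method receives the remaining tokens of the clause, so it can look at the current word and the two-word phrase that starts with it.&lt;br /&gt;

```ruby
# Hypothetical sample lists standing in for the project's real word lists.
SAMPLE_NEGATIVE_WORDS = %w[not no never none].freeze
SAMPLE_NEGATIVE_DESCRIPTORS = %w[hardly barely].freeze
SAMPLE_NEGATIVE_PHRASES = ['too little'].freeze
SAMPLE_SUGGESTIVE_WORDS = %w[should could suggest].freeze
SAMPLE_SUGGESTIVE_PHRASES = ['for example'].freeze

# Ordered rules: the first matching rule decides the token type.
TOKEN_RULES = [
  [:negative_word,       lambda { |word, _pair| SAMPLE_NEGATIVE_WORDS.include?(word) }],
  [:negative_descriptor, lambda { |word, _pair| SAMPLE_NEGATIVE_DESCRIPTORS.include?(word) }],
  [:negative_phrase,     lambda { |_word, pair| SAMPLE_NEGATIVE_PHRASES.include?(pair) }],
  [:suggestive,          lambda { |word, _pair| SAMPLE_SUGGESTIVE_WORDS.include?(word) }],
  [:suggestive,          lambda { |_word, pair| SAMPLE_SUGGESTIVE_PHRASES.include?(pair) }]
].freeze

# tokens is the rest of the clause, starting at the current token.
def get_token_type_sketch(tokens)
  word = tokens[0].to_s.downcase
  pair = tokens[0, 2].join(' ').downcase
  rule = TOKEN_RULES.find { |_type, matches| matches.call(word, pair) }
  rule ? rule.first : :positive
end
```

Unlike the original chain, this sketch does not advance the index past the second word of a matched phrase; a caller would still have to handle that step.&lt;br /&gt;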
&lt;br /&gt;
Finally, I made some other simple refactorings across the code: changing for loops into iterations over collections, renaming variables to make them more readable for other programmers, and removing the explicit &amp;quot;return&amp;quot; from the end of methods, because a Ruby method returns the value of its last evaluated expression.&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of the PlagiarismChecker class in plagiarism_check.rb is to determine whether reviews are simply copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb, the first test is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's words and sentences verbatim without quoting them properly, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has two functions: comparing reviews with the review questions and comparing reviews with other reviewers’ responses, which is confusing. As part of the refactoring, we need to split these two functions apart to remove this bad smell.&lt;br /&gt;
&lt;br /&gt;
Based on this, the first step is to define four methods, each with a single function: &lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As noted above, we have to split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract code that is repeated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extracted this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
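&lt;br /&gt;
A runnable sketch of how a caller might use this helper is given below. The constant values and the Metareview struct are stand-ins invented for the example, and the guard clause is a stylistic variation rather than the code in the repository. It also makes one property of the helper visible: when no copies are found, the method returns nil, so callers have to treat nil as &amp;quot;no plagiarism detected&amp;quot;.&lt;br /&gt;

```ruby
# Stand-in values; the real constants are defined in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the automated metareview object.
Metareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and report nothing.
  return nil unless count_copies.positive?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

metareview = Metareview.new(%w[first second])
clean = check_plagiarism_state(metareview, 0, [], [1, 2])
partial = check_plagiarism_state(metareview, 1, ['second'], [1, 2])
```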
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
          rev_len+=1&lt;br /&gt;
          next&lt;br /&gt;
        end&lt;br /&gt;
        &lt;br /&gt;
        #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries in the array and select the phrase to compare. Once the method was extracted, the code of compare_reviews_with_submissions changed to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza's functionality works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. From the command line, run the db:test:prepare task (rake db:test:prepare)&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the overall quality of the code, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so they do not yet work as intended. We hope that the maintainers of this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is addressed, more plagiarism tests can be added, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81243</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81243"/>
		<updated>2013-10-30T20:03:15Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course at North Carolina State University contribute to this project along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are in plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-and-paste has occurred&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb has too many responsibilities; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and clearly structured. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. To find the state of the sentence, SentenceState first splits the sentence into clauses, then splits each clause into sentence tokens (words). It then iterates through the tokens and determines the new state of the sentence based on the previous state and the type of the current token.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring was to get the tests to pass. This required some debugging, which showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in negations.rb was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make this method more readable, I extracted each of the three for/if-else blocks into its own method. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read, and it revealed that the class SentenceState had too many responsibilities. The class now had two methods that parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. Parsing a sentence into its clauses and individual tokens (words) could be useful elsewhere, so it is better to decouple this responsibility from SentenceState. These two methods were therefore moved into a new TaggedSentence class. Now when SentenceState is called, it creates a new TaggedSentence and calls break_at_coordinating_conjunctions, which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that had been extracted from sentence_state, because they still contained deeply nested if statements. The first of these was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One problem with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) across the branches of the if-else statement. To remove this duplication, the strategy design pattern can be used: each duplicated operation becomes a lambda block, or command, and the code iterates over a punctuation array to remove undesired punctuation. Inspecting the code also showed that tokens[i] is only kept when the string contains no punctuation, because i is incremented only in the final else branch, so a token written by any other branch is overwritten by the next one. To make this explicit, the refactored code sets a valid_token boolean to false when punctuation is found and saves the word into tokens only when valid_token is still true. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to extend the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could move the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement, unclear variable names, and duplicated code in the methods it called.&lt;br /&gt;
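&lt;br /&gt;
One possible way to flatten such a chain is sketched below with guard clauses. The word lists and the symbols are hypothetical stand-ins for the constants in constants.rb and negations.rb, and the method takes the remaining tokens as an array, which is an assumption about the refactored signature. Unlike the original, it does not advance the index after a two-word match, so the caller would still need to skip the second word.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical word lists; the real ones live in constants.rb and negations.rb.
NEGATIVE_WORDS = %w[not no never].freeze
NEGATIVE_PHRASES = ['too concise', 'taken from'].freeze
SUGGESTIVE_WORDS = %w[suggest should could].freeze

# Classifies the first token of the remaining tokens, checking the
# two-word phrase formed with its successor where one exists.
def get_token_type(tokens)
  word = tokens[0].downcase
  pair = tokens.first(2).join(' ').downcase
  return :negative_word if NEGATIVE_WORDS.include?(word)
  return :negative_phrase if NEGATIVE_PHRASES.include?(pair)
  return :suggestive if SUGGESTIVE_WORDS.include?(word)
  :positive
end

p get_token_type(%w[not musicians])     # :negative_word
p get_token_type(%w[too concise here])  # :negative_phrase
p get_token_type(%w[should add])        # :suggestive
p get_token_type(%w[musicians])         # :positive
```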
&lt;br /&gt;
Finally, I made some simpler refactorings across the code: changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because returning the value of the last expression is implicit in Ruby.&lt;br /&gt;
**Future note: the TaggedSentence class could be refactored to allow a sentence to be broken either into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies the author's sentences without quoting them properly, so the method reports plagiarism.&lt;br /&gt;
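&lt;br /&gt;
The idea behind such a comparison can be sketched as a sliding window of words from the review that is searched for in the submission. This is a simplified stand-in, not the Expertiza implementation: the method name copied_phrase?, the window size of 5, and the exact matching rule are all assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Simplified stand-in for check_for_plagiarism; the window size is an assumption.
WINDOW = 5

# Returns true when any run of `window` consecutive review words
# appears verbatim (ignoring case) in the submission text.
def copied_phrase?(review_sentences, submission_sentences, window = WINDOW)
  submission = submission_sentences.join(' ').downcase
  review_sentences.any? do |sentence|
    sentence.downcase.split.each_cons(window).any? do |tokens|
      submission.include?(tokens.join(' '))
    end
  end
end

subm_text = ['The sweet potatoes in the vegetable bin are green with mold. ' \
             'These sweet potatoes in the vegetable bin are fresh.']
p copied_phrase?(subm_text, subm_text)                                  # true
p copied_phrase?(['The essay needs more concrete examples.'], subm_text) # false
```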
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers’ responses, which is confusing. The refactoring therefore needs to split these two functions apart so that this code smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on this, the first step is to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
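&lt;br /&gt;
A usage sketch of the extracted method follows. The constant values and MetareviewStub are stand-ins invented for this example (the real constants and the metareview object belong to Expertiza), and the body is an equivalent guard-clause rewrite. It also makes explicit that the method returns nil when no copies were found.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in values and struct for illustration; the real ones belong to Expertiza.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
MetareviewStub = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies.positive?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

stub = MetareviewStub.new([])
p check_plagiarism_state(stub, 2, ['a'], [1, 1]) # :all_plagiarised
p check_plagiarism_state(stub, 1, ['a'], [1, 1]) # :some_plagiarised
p check_plagiarism_state(stub, 0, ['a'], [1, 1]) # nil
```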
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array before a comparison is made. Once the method is extracted, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
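&lt;br /&gt;
Note that the extracted method appears to advance past at most one blank entry, whereas the original next restarted the loop and so could skip several in a row. A hypothetical variant that keeps skipping is sketched below; the name skip_blank_entries is an assumption, and the caller would still need to handle a nil phrase at the end of the array.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical variant: skips any run of blank entries, not only a single one.
def skip_blank_entries(array, rev_len)
  # array[rev_len] is nil past the end of the array, so the loop always stops.
  rev_len += 1 while array[rev_len] == ' '
  [rev_len, array[rev_len]]
end

p skip_blank_entries(['The', ' ', ' ', 'bin'], 1) # [3, "bin"]
p skip_blank_entries(['The', ' '], 1)             # [2, nil]
```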
&lt;br /&gt;
Please see the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand by applying design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, which do not work as intended so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is resolved, more tests regarding plagiarism can be written, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism-check functionality. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81242</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81242"/>
		<updated>2013-10-30T20:02:30Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable for submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to this project along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether the reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and clearly structured. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===What Does SentenceState Do?===&lt;br /&gt;
The responsibility of SentenceState is to determine the state of each clause of a sentence. The possible states include positive, negative, and suggestive. In order to find the state of the sentence, SentenceState first splits the sentence into sentence clauses, then splits each clause into sentence tokens (words). Then it iterates through the tokens, and determines the new state of the sentence dependent on the previous state and the token state.&lt;br /&gt;
&lt;br /&gt;
Take for example the sentence_state_test Identify State 8:&lt;br /&gt;
&lt;br /&gt;
sentence = “We are not not musicians.”&lt;br /&gt;
&lt;br /&gt;
First token: We =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Second token: are =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Third token: not =&amp;gt; negative and prev_state =&amp;gt; positive, state =&amp;gt; negative&lt;br /&gt;
&lt;br /&gt;
Fourth token: not =&amp;gt; negative and prev_state =&amp;gt; negative, state =&amp;gt; positive (double negative!)&lt;br /&gt;
&lt;br /&gt;
Fifth token: musicians =&amp;gt; positive and prev_state =&amp;gt; positive, state =&amp;gt; positive&lt;br /&gt;
&lt;br /&gt;
Therefore the sentence state is positive.&lt;br /&gt;
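&lt;br /&gt;
The trace above can be reproduced with a few lines of Ruby. This is only a sketch of the negation toggling: the word list, the symbols, and the method names toggle_state and trace_sentence_state are assumptions for illustration, and the real SentenceState also handles descriptors, phrases, and suggestive tokens.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Illustrative only: this word list is an assumption, not the list in negations.rb.
NEGATION_WORDS = %w[not no never none].freeze

# A negation token flips the running state; any other token keeps it.
def toggle_state(state, token)
  return state unless NEGATION_WORDS.include?(token.downcase)

  state == :positive ? :negative : :positive
end

# Folds the tokens of a sentence into a final state, starting from positive.
def trace_sentence_state(sentence)
  sentence.delete('.').split.reduce(:positive) do |state, token|
    toggle_state(state, token)
  end
end

puts trace_sentence_state('We are not not musicians.') # positive
puts trace_sentence_state('We are not musicians.')     # negative
```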
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate sentence clauses, then split the clauses into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
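&lt;br /&gt;
A minimal sketch of this idea, in the style of the state design pattern, might look as follows. The class names and token types are hypothetical, only two states are shown, and for brevity the classes here do not inherit from SentenceState.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical state classes; names and token types are assumptions.
# Each state decides for itself which state follows a given token type.
class PositiveState
  def next_state(token_type)
    token_type == :negative_word ? NegativeState.new : self
  end

  def name
    :positive
  end
end

class NegativeState
  def next_state(token_type)
    # A second negation cancels the first (double negative).
    token_type == :negative_word ? PositiveState.new : self
  end

  def name
    :negative
  end
end

puts PositiveState.new.next_state(:negative_word).name # negative
puts NegativeState.new.next_state(:negative_word).name # positive
puts NegativeState.new.next_state(:positive).name      # negative
```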
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make this method more readable, I extracted each of these three blocks into its own method. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems of this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used: each duplicated operation becomes a lambda block, or command, and the code iterates over a punctuation array to remove undesired punctuation. Inspecting the code also showed that tokens[i] is only truly updated when the string contains no punctuation, because i is only incremented in the final else branch. To fix this, we set a valid_token boolean that stays true when no punctuation is found, and only then save the value into tokens. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the each loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to extend the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
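&lt;br /&gt;
The strategy idea mentioned earlier, with each punctuation operation expressed as a lambda, could be sketched as follows. This is not the refactored code above, which drops punctuated tokens instead of cleaning them; the names truncate_at, strip_all, PUNCTUATION_STRATEGIES, and clean_token are assumptions made for this example.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Assumed strategies mirroring the two operations in the original if-else chain.
truncate_at = ->(word, mark) { word[0...word.index(mark)] } # keep text before the mark
strip_all   = ->(word, mark) { word.delete(mark) }          # drop every occurrence

# Each punctuation mark is paired with the strategy that handles it.
PUNCTUATION_STRATEGIES = {
  '.' => truncate_at,
  ',' => strip_all,
  '!' => strip_all,
  ';' => strip_all
}.freeze

# Applies the strategy of the first mark found, like the original if-elsif chain.
def clean_token(word)
  PUNCTUATION_STRATEGIES.each do |mark, strategy|
    return strategy.call(word, mark) if word.include?(mark)
  end
  word
end

puts clean_token('fresh.') # fresh
puts clean_token('mold,')  # mold
puts clean_token('bin')    # bin
```
&lt;br /&gt;
Supporting a new punctuation mark then only means adding one entry to the hash.&lt;br /&gt;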
&lt;br /&gt;
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if-else statement, unclear variable names, and duplicated code in the methods it called.&lt;br /&gt;
&lt;br /&gt;
Finally, I made some simpler refactorings across the code: changing for loops into iterations over objects, renaming variables to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because returning the value of the last expression is implicit in Ruby.&lt;br /&gt;
**Future note: the TaggedSentence class could be refactored to allow a sentence to be broken either into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies the author's sentences without quoting them properly, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial file plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers’ responses, which is confusing. The refactoring therefore needs to split these two functions apart so that this code smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on this, the first step is to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called within the class. For example, the methods compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the 1st method compare_reviews_with_submissions as example, we noticed that the this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens and picking the next token to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
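As a side note on the blank-token guard, the short sketch below shows how Ruby's String#split behaves when it is given a single space as the separator. The sample sentence is made up for illustration. Because runs of whitespace are collapsed, the resulting array should not contain &amp;quot; &amp;quot; tokens, so the guard in skip_empty_array is likely to fire rarely, if ever, on arrays produced this way.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Splitting on a single space collapses repeated whitespace
# and drops leading and trailing whitespace.
tokens = "The  sweet   potatoes ".split(" ")

# Collect any tokens that consist of exactly one space,
# which is what the guard in skip_empty_array looks for.
blank_tokens = tokens.select { |token| token == " " }
```
&lt;br /&gt;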
The refactored code can be seen in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, with design patterns applied, we have made the code easier to understand, which meets the requirements of this project. From our perspective, however, more work remains to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, which so far cannot be run successfully. We hope that the people responsible for this project can fix them so that the methods work as expected.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more plagiarism tests can be written, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81237</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81237"/>
		<updated>2013-10-30T20:00:13Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be given to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply-nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement which determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted each of the three for or if-else statements into its own method. This reduced the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
       current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
       end&lt;br /&gt;
     end #end of the for loop&lt;br /&gt;
     return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to make each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly updated when there is no punctuation in the string, because i is not increased unless execution reaches the else branch of the if-else statement. To fix this, we can set a valid_token boolean to false when punctuation is found, and save the value into tokens only when valid_token is still true. This refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extendable. Anyone who wants to check for an additional punctuation mark does not have to update the if-else statement, which could easily introduce a bug; instead, they only have to add the new mark to the punctuation array. Also, the variables are named so that the reader can understand their purpose. Future refactoring could include moving the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
&lt;br /&gt;
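The future refactoring suggested above could look roughly like the following sketch. The constant names REMOVE_TAG and PUNCTUATION and the method-chain style are assumptions made for illustration, not code from the Expertiza repository. As in the refactored method, any word that still contains a punctuation mark after its tag is removed is dropped.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical lambda that strips the part-of-speech tag from a tagged word.
REMOVE_TAG = lambda do |piece, tag|
  piece.include?(tag) ? piece[0...piece.index(tag)] : piece
end

# Punctuation marks that disqualify a token; extend this array to check more.
PUNCTUATION = %w(. , ! ;)

# Splits a tagged sentence into words, removes the tags, and keeps
# only the words that contain no punctuation.
def parse_sentence_tokens(str_with_pos_tags)
  str_with_pos_tags.split(' ')
    .map { |piece| REMOVE_TAG.call(piece, '/') }
    .reject { |word| PUNCTUATION.any? { |mark| word.include?(mark) } }
end
```
&lt;br /&gt;
For example, parse_sentence_tokens('The/DT code/NN is/VBZ good/JJ ./.') keeps the four words and drops the final period.&lt;br /&gt;
&lt;br /&gt;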
The next method to refactor was get_token_type(current_token) in the SentenceState class as shown below. &lt;br /&gt;
&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
Problems with this code included a nested if else statement, bad variable names, and duplicated code in the methods it was calling.  &lt;br /&gt;
&lt;br /&gt;
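One possible way to flatten such an if-elsif chain is a table of rules that is searched in order. The sketch below illustrates that idea only; it is not the code in the refactored sentence_state.rb. The word lists, the symbols used as token types, and the method name token_type are all hypothetical, and the two-word phrase checks are left out to keep the example short.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical word lists; the real ones are defined elsewhere in Expertiza.
NEGATIVE_WORDS = %w(no not never none)
NEGATIVE_DESCRIPTORS = %w(hardly barely scarcely)
SUGGESTIVE_WORDS = %w(should could suggest)

# Each rule pairs a token type with a lambda that recognises it.
# Rules are tried in order, mirroring the order of the if-elsif chain.
TOKEN_RULES = [
  [:negative_word,       lambda { |word| NEGATIVE_WORDS.include?(word.downcase) }],
  [:negative_descriptor, lambda { |word| NEGATIVE_DESCRIPTORS.include?(word.downcase) }],
  [:suggestive,          lambda { |word| SUGGESTIVE_WORDS.include?(word.downcase) }]
]

# Returns the type of the first matching rule, or :positive when none matches.
def token_type(word)
  rule = TOKEN_RULES.find { |_type, matcher| matcher.call(word) }
  rule ? rule.first : :positive
end
```
&lt;br /&gt;
Adding a new token type then means adding one entry to TOKEN_RULES rather than another elsif branch.&lt;br /&gt;
&lt;br /&gt;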
Finally, there were some other simple refactorings which I did across the code. These included changing for loops into iterations over objects, changing variable names to make them more readable for other programmers, and removing &amp;quot;return&amp;quot; from the end of methods, because the return is implicit in Ruby.&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies what the author says without quoting the words or sentences properly, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
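The idea behind this test can be illustrated with a simplified sketch. It is not the PlagiarismChecker implementation; the method name copied_phrase? and the default window of five words are assumptions chosen for illustration. The sketch reports a match when any run of consecutive review words appears verbatim in the submission.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Returns true when any run of `window` consecutive review words
# appears verbatim in the submission text.
def copied_phrase?(review, submission, window = 5)
  review.split(" ").each_cons(window).any? { |words| submission.include?(words.join(" ")) }
end
```
&lt;br /&gt;
A review shorter than the window produces no word runs at all, so this sketch reports no match for it.&lt;br /&gt;
&lt;br /&gt;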
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above point of view, the refactoring should result in four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb file, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions apart so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first thing to do, based on the above, is to define four methods with different functions.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that the long methods have in common into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the part that checks whether the reviews are fully or partly copied from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens and picking the next token to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The refactored code can be seen in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, with design patterns applied, we have made the code easier to understand, which meets the requirements of this project. From our perspective, however, more work remains to improve the code as a whole, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, which so far cannot be run successfully. We hope that the people responsible for this project can fix them so that the methods work as expected.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more plagiarism tests can be written, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running the tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81228</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81228"/>
		<updated>2013-10-30T19:54:17Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-paste has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
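&lt;br /&gt;
The idea of letting each state decide its own successor can be sketched as follows. This is only an illustration under our own assumptions: the class names, the token types, and the transitions are hypothetical and are not taken from the Expertiza code, and the states are written as duck-typed classes rather than literal subclasses to keep the sketch short.&lt;br /&gt;

```ruby
# Hypothetical sketch: state classes, token types, and transitions are
# illustrative assumptions, not the Expertiza implementation.
class PositiveState
  # A positive clause turns negative when a negative word appears.
  def next_state(token_type)
    token_type == :negative_word ? NegativeWordState.new : self
  end

  def negated?
    false
  end
end

class NegativeWordState
  # In this sketch a second negative word cancels the first one.
  def next_state(token_type)
    token_type == :negative_word ? PositiveState.new : self
  end

  def negated?
    true
  end
end

# Walk the token types, letting each state pick its own successor.
def clause_negated?(token_types)
  final_state = token_types.inject(PositiveState.new) do |state, type|
    state.next_state(type)
  end
  final_state.negated?
end
```

With this shape, supporting a new state means adding a class with its own next_state method instead of extending one large if-else statement.&lt;br /&gt;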
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in the negations.rb file was incomplete. After updating it, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted the three for or if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
       current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read, and it revealed that the SentenceState class had too many responsibilities. The class now had two methods that parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. Parsing a sentence into its clauses and individual tokens (words) could be useful elsewhere, so it is better to decouple this responsibility from SentenceState and move it into a new class. These two methods were therefore moved into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now, when SentenceState is called, it creates a new TaggedSentence and calls break_at_coordinating_conjunctions, which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
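     #fetching all the tokens (st is presumably the space-separated pieces of str_with_pos_tags; its definition is not shown in this excerpt)&lt;br /&gt;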
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
       end     &lt;br /&gt;
     end #end of the for loop&lt;br /&gt;
     return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One problem with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else chain. To remove this duplication, the strategy design pattern can be used: each duplicated operation becomes a lambda block, or command, and the code iterates over a punctuation array to remove undesired punctuation. Inspecting the code also showed that tokens[i] is only truly kept when the string contains no punctuation, because i is incremented only in the final else branch. To make this explicit, we keep a valid_token boolean that stays true only if no punctuation is found, and save the piece into tokens only in that case. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to touch the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could move the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
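&lt;br /&gt;
A minimal sketch of that lambda-based variant is given below. The constant and method names are hypothetical; only the '/' tag separator and the punctuation list are taken from the refactored method above. As in that method, any piece that still contains punctuation after its tag is removed is dropped.&lt;br /&gt;

```ruby
# Hypothetical constant and method names; only the '/' tag separator and the
# punctuation list are taken from the refactored method.
TAG = '/'
PUNCTUATION = %w(. , ! ;)

# Strip the part-of-speech tag, so "good/JJ" becomes "good".
REMOVE_TAG = lambda { |piece| piece.split(TAG).first.to_s }

# True when the word contains any mark from the punctuation list.
HAS_PUNCTUATION = lambda do |word|
  PUNCTUATION.any? { |mark| word.include?(mark) }
end

# Split the tagged sentence, remove the tags, and keep only clean words.
def parse_tokens(str_with_pos_tags)
  words = str_with_pos_tags.split(' ').map { |piece| REMOVE_TAG.call(piece) }
  words.reject { |word| HAS_PUNCTUATION.call(word) }
end
```

For instance, parse_tokens("The/DT paper/NN is/VBZ good/JJ ./.") keeps the four words and drops the final period.&lt;br /&gt;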
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review text does not properly quote the words and sentences; the reviewer simply copies what the author wrote, which constitutes plagiarism.&lt;br /&gt;
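&lt;br /&gt;
To make this kind of comparison concrete, here is a minimal sketch of a phrase-window check. It is not the PlagiarismChecker implementation: the method name, the constant, and the window size of five words are assumptions made for illustration only.&lt;br /&gt;

```ruby
# Hypothetical sketch; the window size and the method name are assumptions.
WINDOW_SIZE = 5

# True when any run of `window` consecutive review words appears verbatim
# in the submission text.
def copied_phrase?(review, submission, window = WINDOW_SIZE)
  review.split(' ').each_cons(window).any? do |phrase|
    submission.include?(phrase.join(' '))
  end
end
```

Reviews shorter than the window never match in this sketch, because each_cons yields nothing for them.&lt;br /&gt;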
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From this point of view, the refactoring should produce four fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions apart so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the parts that the long methods share into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the part that checks whether the reviews are fully copied from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next thing to do is extract long loops or if-else statements into individual methods so that the initial method is not too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array and picking the phrase to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
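&lt;br /&gt;
Note that the original loop used next to restart the iteration after a blank entry, while skip_empty_array as shown advances past at most one blank entry. A possible variant that keeps advancing until it reaches a non-blank entry is sketched below; the name skip_blank_tokens is hypothetical, and the sketch assumes the array contains only strings.&lt;br /&gt;

```ruby
# Hypothetical variant of skip_empty_array; assumes an array of strings.
def skip_blank_tokens(array, index)
  # Keep advancing while we are inside the array and on a blank entry.
  index += 1 while (array.length > index and array[index].strip.empty?)
  # The phrase is nil when the index has moved past the end of the array.
  [index, array[index]]
end
```

The caller would still have to check for a nil phrase before comparing, since the index can move past the end of the array.&lt;br /&gt;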
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and introduced design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the code overall, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the initial methods compare_reviews_with_questions_responses and google_search_response, which therefore cannot be used so far. We hope that the people responsible for this project can fix them so that the methods perform their expected functions.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is fixed, more plagiarism tests can be added, which will support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in the methods of the text_preprocessing.rb file, which may conflict with the plagiarism check. Bug fixing is needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81227</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81227"/>
		<updated>2013-10-30T19:53:45Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built with Ruby on Rails and follows the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-pasting has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and the NEGATIVE_DESCRIPTORS definition was incomplete in negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted the three for or if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
       current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
       end     &lt;br /&gt;
     end #end of the for loop&lt;br /&gt;
     return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One problem with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else chain. To remove this duplication, the strategy design pattern can be used: each duplicated operation becomes a lambda block, or command, and the code iterates over a punctuation array to remove undesired punctuation. Inspecting the code also showed that tokens[i] is only truly kept when the string contains no punctuation, because i is incremented only in the final else branch. To make this explicit, we keep a valid_token boolean that stays true only if no punctuation is found, and save the piece into tokens only in that case. The refactored code is shown below. &lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
This does not shorten the code, but it makes it more readable and extensible. Anyone who wants to check for an additional punctuation mark no longer has to touch the if-else statement, which could easily introduce a bug; they only have to add the new mark to the punctuation array. The variables are also named so that the reader can understand their purpose. Future refactoring could move the line sp = sp[0..sp.index(tag)-1] into a lambda called remove_tag_from_sp[sp, tag] so that the code is even more readable.&lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies the author's sentences without quoting them, so the check reports plagiarism.&lt;br /&gt;
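&lt;br /&gt;
To make the idea of comparing review text with submission text concrete, here is a minimal sketch that looks for any run of consecutive review words that also appears verbatim in the submission. The method name copied_phrase? and the window size of 5 are assumptions made for this example; the actual check_for_plagiarism method may use a different algorithm.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical window size; the real checker may use a different value or algorithm.
WINDOW_SIZE = 5

# Returns true when any run of `window` consecutive review words
# appears verbatim in the submission text.
def copied_phrase?(review, submission, window = WINDOW_SIZE)
  review.split(' ').each_cons(window).any? do |words|
    submission.include?(words.join(' '))
  end
end
```
&lt;br /&gt;
A review with fewer words than the window never matches in this sketch, which is one reason a real checker would need more careful handling of short reviews.&lt;br /&gt;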
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four cases, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with other reviewers’ responses, which makes it confusing. The refactoring therefore splits the two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with a single responsibility:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
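&lt;br /&gt;
The following self-contained sketch shows how such an extracted method behaves for each case. The constant values and the MetareviewStub struct are stand-ins invented for this example; in Expertiza the constants and the auto_metareview object are defined elsewhere. Note that the method returns nil when no copies are found, so callers have to handle that case.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical values; the real constants live in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Stand-in for the auto_metareview object; only review_array is needed here.
MetareviewStub = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies.positive?

  auto_metareview.review_array = rev_array # reset only when plagiarism was found
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end
```
&lt;br /&gt;
Both compare_reviews_with_questions and compare_reviews_with_responses can then end with a single call to this method instead of repeating the if-else block.&lt;br /&gt;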
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array before making comparisons. Once we extract the method, the original code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
 &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
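&lt;br /&gt;
One subtlety is worth noting: the original snippet used next to restart the loop after a blank entry, while the extracted method advances rev_len only once. The sketch below is a hypothetical variant, not the code in the fork, that advances past every blank entry and returns nil as the phrase when the end of the array is reached. The name skip_blank_entries is an assumption made for this example. Also, String#split with a single space as separator already discards whitespace-only pieces, so for arrays built with review.split the blank check may never trigger.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical variant: advances past every blank entry starting at rev_len.
# Returns the new index and the phrase at that index (nil when none is left).
def skip_blank_entries(array, rev_len)
  offset = array.drop(rev_len).index { |entry| !entry.strip.empty? }
  next_len = offset ? rev_len + offset : array.length
  [next_len, array[next_len]]
end
```
&lt;br /&gt;
A caller would use it in the same way as skip_empty_array and stop the loop when the returned phrase is nil.&lt;br /&gt;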
&lt;br /&gt;
The full refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the overall performance of the code, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so they do not yet work as intended. We hope that the maintainers of this project can fix them so that the methods behave as expected.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is addressed, more plagiarism tests can be added, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in a method of the text_preprocessing.rb file, which may conflict with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81223</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81223"/>
		<updated>2013-10-30T19:49:50Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research. plagiarism_check.rb and sentence_state.rb are used to check whether reviews are copied from other places. Checking for plagiarism is important because reviewers tend to game the automated review system to get a high score instead of writing a high-quality review. The classes compare the review text with text from the Internet to determine whether any copy-pasting has happened&amp;lt;ref&amp;gt; [http://www.lib.ncsu.edu/resolver/1840.16/8813 Automated Assessment of Reviews]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops exist everywhere. Third, sentence_state.rb has too many responsibilities; some of its functions should be delegated to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
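&lt;br /&gt;
The idea of letting each state know its own transitions can be sketched as follows. This is a simplified illustration only: the class names, the :negative_word token type and the double-negation rule are assumptions made for the example, and the classes are standalone rather than subclasses of the real SentenceState.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical token types and state classes; names are illustrative only.
class PositiveState
  # Each state decides its own successor for a given token type.
  def next_state(token_type)
    token_type == :negative_word ? NegativeWordState.new : self
  end

  def negated?
    false
  end
end

class NegativeWordState
  # Assumption for this sketch: a second negative word cancels the first.
  def next_state(token_type)
    token_type == :negative_word ? PositiveState.new : self
  end

  def negated?
    true
  end
end

# Walks the token types of one clause and reports whether it ends up negated.
def clause_negated?(token_types)
  final_state = token_types.reduce(PositiveState.new) { |state, type| state.next_state(type) }
  final_state.negated?
end
```
&lt;br /&gt;
With this structure, adding a new state means adding a new class instead of another branch in a nested if-else statement.&lt;br /&gt;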
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition was incomplete in the negations.rb file. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted the three for or if-else statements into their own methods to clean up the code. This changed the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
One of the problems with this code is the duplication of ps[0..ps.index(punctuation)-1] and ps.gsub(punctuation, &amp;quot;&amp;quot;) caused by the if-else statement. To remove this duplication, the strategy design pattern can be used to turn each of the duplicated functions into lambda blocks, or commands, and iterate over a punctuation array to remove undesired punctuation. Also, inspecting the code showed that tokens[i] is only truly kept when the string contains no punctuation, because i is only incremented in the final else branch. To make this explicit, we set a valid_token boolean that stays true only if no punctuation is found, and save the token into tokens only in that case. The refactored code is shown below.&lt;br /&gt;
&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    sentence_pieces = str_with_pos_tags.split(' ')&lt;br /&gt;
    num_tokens = 0&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
&lt;br /&gt;
    tag = '/'&lt;br /&gt;
    punctuation = %w(. , ! ;)&lt;br /&gt;
    sentence_pieces.each do |sp|&lt;br /&gt;
      #remove tag from sentence word&lt;br /&gt;
      if sp.include?(tag)&lt;br /&gt;
        sp = sp[0..sp.index(tag)-1]&lt;br /&gt;
      end&lt;br /&gt;
&lt;br /&gt;
      valid_token = true&lt;br /&gt;
      punctuation.each do |p|&lt;br /&gt;
        if sp.include?(p)&lt;br /&gt;
          valid_token = false&lt;br /&gt;
          break&lt;br /&gt;
        end&lt;br /&gt;
      end&lt;br /&gt;
      if valid_token&lt;br /&gt;
        tokens[num_tokens] = sp&lt;br /&gt;
        num_tokens+=1&lt;br /&gt;
      end&lt;br /&gt;
    end&lt;br /&gt;
    #end of the for loop&lt;br /&gt;
    tokens&lt;br /&gt;
  end&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of plagiarism_check.rb is to determine whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies the author's sentences without quoting them, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four cases, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with other reviewers’ responses, which makes it confusing. The refactoring therefore splits the two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with a single responsibility:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array before making comparisons. Once we extract the method, the original code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
 &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The full refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. On the command line, run: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future Work=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through refactoring, we have made the code easier to understand and applied design patterns, which meets the requirements of this project. From our perspective, however, more work is needed to improve the overall performance of the code, including:&lt;br /&gt;
&lt;br /&gt;
1. There are some bugs in the original methods compare_reviews_with_questions_responses and google_search_response, so they do not yet work as intended. We hope that the maintainers of this project can fix them so that the methods behave as expected.&lt;br /&gt;
&lt;br /&gt;
2. Once item 1 is addressed, more plagiarism tests can be added, which would support further development of the code.&lt;br /&gt;
&lt;br /&gt;
3. While running tests, we found some errors in a method of the text_preprocessing.rb file, which may conflict with the plagiarism check. These bugs need to be fixed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81208</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81208"/>
		<updated>2013-10-30T19:41:56Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they are more object-oriented and clearly structured. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
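The idea of letting each state decide its own successor can be sketched roughly as follows. This is only an illustration: the class names, the token type symbols and the double-negation rule are assumptions made for the sketch, not the actual Expertiza classes or constants, and the sketch uses plain duck-typed classes rather than a shared SentenceState superclass.&lt;br /&gt;

```ruby
# Hypothetical sketch: token types and state classes are illustrative names,
# not the actual Expertiza constants or classes.
class PositiveState
  def next_state(token_type)
    case token_type
    when :negative_word       then NegativeWordState.new
    when :negative_descriptor then NegativeDescriptorState.new
    else self
    end
  end

  def negated?
    false
  end
end

class NegativeWordState
  def next_state(token_type)
    # Assumed rule: a second negative word cancels the first one.
    token_type == :negative_word ? PositiveState.new : self
  end

  def negated?
    true
  end
end

class NegativeDescriptorState
  def next_state(token_type)
    token_type == :negative_word ? PositiveState.new : self
  end

  def negated?
    true
  end
end

# Walk the token types of one clause and report whether it ends up negated.
def clause_negated?(token_types)
  final = token_types.inject(PositiveState.new) { |state, type| state.next_state(type) }
  final.negated?
end

puts clause_negated?([:positive, :negative_word])
puts clause_negated?([:negative_word, :positive, :negative_word])
```

With this shape, adding a new state means adding one small class instead of another branch in a nested if-else.&lt;br /&gt;
&lt;br /&gt;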
===Refactor Steps===&lt;br /&gt;
The first step in refactoring was to get the tests to pass. This required some debugging, which revealed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in negations.rb was incomplete. After updating it, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make this method more readable, I extracted the three loop and if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now had two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be useful elsewhere, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
&lt;br /&gt;
Once this change was done, the next step was to refactor each of the three new methods that were created earlier to clean up the sentence_state method, because these still contain deeply nested if statements. The first method created was parse_sentence_tokens(str_with_pos_tags). Originally this method was as shown below:&lt;br /&gt;
 def parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
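    st = str_with_pos_tags.split(&amp;quot; &amp;quot;) #assumed: st holds the whitespace-separated tagged tokens&lt;br /&gt;
    i = 0&lt;br /&gt;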
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    return tagged_tokens, tokens&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
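As a rough illustration of where this method could go, the same idea can be written more compactly with split and map. The helper name split_tagged_tokens is hypothetical, and the sketch simply deletes the punctuation characters, which differs slightly from truncating a token at its first period; it is not the code in the repository.&lt;br /&gt;

```ruby
# Hypothetical helper: mirrors the idea of parse_sentence_tokens, not the Expertiza code.
def split_tagged_tokens(str_with_pos_tags)
  tagged_tokens = str_with_pos_tags.split(' ')
  tokens = tagged_tokens.map do |tagged|
    word = tagged.split('/').first.to_s   # drop the part-of-speech tag
    word.delete('.,!;')                   # strip the punctuation handled above
  end
  [tagged_tokens, tokens]
end

tagged, plain = split_tagged_tokens('It/PRP is/VBZ not/RB good/JJ ./.')
puts tagged.inspect
puts plain.inspect
```

Both arrays keep the same length, so a token and its tagged form stay at the same index.&lt;br /&gt;
&lt;br /&gt;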
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does roughly 2 things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. During the refactoring we need to split these two functions apart so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This means splitting the method compare_reviews_with_questions_responses into 2 methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the code that is repeated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
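A minimal sketch of how a comparison method might delegate to the extracted method is shown here. The constant values, the OpenStruct stand-in for auto_metareview and the compare_reviews_with caller are all assumptions for illustration; the sketch also assumes that rev_array holds the reviews that were not copied.&lt;br /&gt;

```ruby
require 'ostruct'

# Assumed placeholder values; the real constants live in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil if count_copies.zero?
  auto_metareview.review_array = rev_array # reset only when plagiarism was found
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

# Hypothetical caller: counts reviews that exactly match one of the source texts.
def compare_reviews_with(sources, auto_metareview)
  scores = auto_metareview.review_array
  copies = scores.select { |review| sources.include?(review) }
  kept = scores - copies
  check_plagiarism_state(auto_metareview, copies.length, kept, scores)
end

metareview = OpenStruct.new(review_array: ['Is the design clear?', 'Nice use of patterns.'])
puts compare_reviews_with(['Is the design clear?'], metareview).inspect
puts metareview.review_array.inspect
```

Because the state check lives in one place, a caller for questions and a caller for responses would only differ in how they count the copies.&lt;br /&gt;
&lt;br /&gt;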
The next step is to extract long loops or if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines are only concerned with skipping blank entries in the array before a comparison is made. Once we extracted the method, the original code of compare_reviews_with_submissions became:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
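For comparison, blank tokens can also be dropped up front and the review walked 'n' tokens at a time with each_cons. The following is only a sketch under assumptions: review_phrases and phrase_copied? are hypothetical helper names, and a plain case-sensitive substring match stands in for whatever comparison the real method performs.&lt;br /&gt;

```ruby
# Hypothetical helper: returns each run of n consecutive non-blank tokens as a phrase.
def review_phrases(review, n)
  tokens = review.split(' ').reject { |token| token.strip.empty? }
  tokens.each_cons(n).map { |window| window.join(' ') }
end

# Hypothetical check: true when any n-token phrase of the review occurs in the submission.
def phrase_copied?(review, submission, n)
  review_phrases(review, n).any? { |phrase| submission.include?(phrase) }
end

submission = 'The sweet potatoes in the vegetable bin are green with mold.'
puts phrase_copied?('I agree the sweet potatoes in the vegetable bin look bad.', submission, 5)
puts phrase_copied?('The writing is clear and well organised.', submission, 5)
```

Filtering the tokens once removes the need to track and advance an index past empty entries inside the loop.&lt;br /&gt;
&lt;br /&gt;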
The full refactored code can be seen on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All the tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses and assignments, so that it is easy to verify that reviews work. You only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Setup mysql and start server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81201</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81201"/>
		<updated>2013-10-30T19:36:28Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear everywhere. Third, sentence_state.rb carries too much responsibility; some of its functions should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they are more object-oriented and clearly structured. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they run without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring was to get the tests to pass. This required some debugging, which revealed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the NEGATIVE_DESCRIPTORS definition in negations.rb was incomplete. After updating it, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make this method more readable, I extracted the three loop and if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does roughly 2 things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. During the refactoring we need to split these two functions apart so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This means splitting the method compare_reviews_with_questions_responses into 2 methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we need to extract the code that is repeated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied fully from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the next token to compare. After the extraction, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
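&lt;br /&gt;
The extracted helper above no longer contains the next statement of the original loop, so it steps over at most one blank token per call. A minimal sketch of one way to keep the skipping behaviour inside the helper follows; the method name next_review_phrase is hypothetical and is not part of the Expertiza code.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper (not in the Expertiza source): advance rev_len past
# blank tokens and return the new index together with the token found there.
def next_review_phrase(array, rev_len)
  # Step over every blank or whitespace-only token, not just the first one.
  while rev_len != array.length
    break unless array[rev_len].strip.empty?
    rev_len += 1
  end
  # array[rev_len] is nil when the index has run off the end of the array.
  [rev_len, array[rev_len]]
end

tokens = ["sweet", " ", " ", "potatoes"]
index, phrase = next_review_phrase(tokens, 1)
puts "#{index}: #{phrase}" # prints "3: potatoes"
```
&lt;br /&gt;
The caller would still need to stop its own loop when the returned phrase is nil, which is the case when only blank tokens remain.&lt;br /&gt;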
&lt;br /&gt;
The full refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The VCL servers let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com) and we will fix it as soon as possible. Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download the MySQL C connector from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. The other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81198</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81198"/>
		<updated>2013-10-30T19:36:04Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. It has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built with Ruby on Rails and follows the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. Both files are part of the automated_metareview functionality inside the models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and&lt;br /&gt;
sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities, and some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they are more object-oriented and have a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
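&lt;br /&gt;
The idea of giving each state its own class can be sketched as follows. This is a minimal illustration, not the Expertiza implementation: the class names, the token types and the simplified transition rules are all assumptions made for the example.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch: class names and token types are illustrative only.
class ClauseState
  # Default rule: a token type this state does not react to leaves it unchanged.
  def next_state(_token_type)
    self
  end

  def negated?
    false
  end
end

# Class.new(ClauseState) builds a subclass of ClauseState.
PositiveClause = Class.new(ClauseState) do
  # A negative word moves a positive clause into the negative state.
  def next_state(token_type)
    token_type == :negative_word ? NegativeClause.new : self
  end
end

NegativeClause = Class.new(ClauseState) do
  # Simplified rule for this sketch: a second negative word cancels the first.
  def next_state(token_type)
    token_type == :negative_word ? PositiveClause.new : self
  end

  def negated?
    true
  end
end

state = PositiveClause.new
[:positive, :negative_word, :positive].each do |token_type|
  state = state.next_state(token_type)
end
puts state.negated? # prints "true"
```
&lt;br /&gt;
With this layout, adding a new state means adding a new class rather than another branch in a nested if-else statement.&lt;br /&gt;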
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging, which showed that some constants were defined in two different files, [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/constants.rb constants.rb] and [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/negations.rb negations.rb], and that the definition of NEGATIVE_DESCRIPTORS in negations.rb was incomplete. After fixing this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make the method more readable, I extracted these three blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
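&lt;br /&gt;
The division of labour described above can be sketched roughly as follows. This is not the code in tagged_sentence.rb: the class name TaggedSentenceSketch, the &quot;word/TAG&quot; input format, the use of the CC tag for coordinating conjunctions, and the choice to drop the conjunction itself are all assumptions made for the illustration.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Illustrative stand-in for TaggedSentence; see tagged_sentence.rb for the real class.
# Assumption: input looks like "word/TAG word/TAG", and CC marks a
# coordinating conjunction (Penn Treebank style).
class TaggedSentenceSketch
  def initialize(str_with_pos_tags)
    @tagged_tokens = str_with_pos_tags.split(" ")
  end

  # Returns the clauses as arrays of tagged tokens, split at each CC token.
  def break_at_coordinating_conjunctions
    clauses = [[]]
    @tagged_tokens.each do |tagged|
      if tagged.end_with?("/CC")
        clauses.push([]) # the conjunction starts a new clause and is dropped
      else
        clauses.last.push(tagged)
      end
    end
    clauses.reject { |clause| clause.empty? }
  end

  # Strips the tags from one clause and returns the bare words.
  def parse_sentence_tokens(clause)
    clause.map { |tagged| tagged.split("/").first }
  end
end

sentence = TaggedSentenceSketch.new("it/PRP is/VBZ short/JJ but/CC not/RB clear/JJ")
sentence.break_at_coordinating_conjunctions.each do |clause|
  puts sentence.parse_sentence_tokens(clause).join(" ")
end
```
&lt;br /&gt;
Because the parsing lives in its own class, SentenceState only has to walk the tokens of each clause and track the state.&lt;br /&gt;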
 &lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
the first test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not quote the words or sentences properly; the reviewer simply copies what the author wrote, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and with others’ responses, which is confusing. During the refactoring we split these two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
First, we define four methods, each with a specific function: compare_reviews_with_submissions, compare_reviews_with_questions, compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that the long methods share into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the code that checks whether the reviews are copied, fully or partly, from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract the shared code into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
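&lt;br /&gt;
The three possible outcomes of the extracted method can be exercised with a small stand-alone sketch. The constant values and the MetareviewStub struct are placeholders invented for this example, and the method body is a guard-clause variant of the one shown above rather than the code in the repository.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values; the real constants are defined in the Expertiza source.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised

# Stand-in for the auto_metareview object; only review_array matters here.
MetareviewStub = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and report nothing.
  return nil if count_copies.zero?

  # Resetting review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

stub = MetareviewStub.new(["original"])
p check_plagiarism_state(stub, 0, [], [0.2, 0.9])       # nil
p check_plagiarism_state(stub, 1, ["kept"], [0.2, 0.9]) # :some_responses_plagiarised
p check_plagiarism_state(stub, 2, [], [0.2, 0.9])       # :all_responses_plagiarised
```
&lt;br /&gt;
Note that the method returns nil when no copies are found, so each caller has to decide what to do in that case.&lt;br /&gt;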
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods, so that the original method is no longer too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the next token to compare. After the extraction, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The full refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The VCL servers let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com) and we will fix it as soon as possible. Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download the MySQL C connector from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. The other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81196</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81196"/>
		<updated>2013-10-30T19:33:44Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. It has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built with Ruby on Rails and follows the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. Both files are part of the automated_metareview functionality inside the models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and&lt;br /&gt;
sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is not well structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities, and some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes so that they are more object-oriented and have a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes that encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they run without errors.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
[https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb sentence_state.rb]&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make the method more readable, I extracted these three blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
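&lt;br /&gt;
The three casecmp comparisons in the loop above test whether the current token is one of a fixed set of words. One possible simplification, shown here only as a sketch, moves that set into a constant; STANDALONE_NEGATIVES and standalone_negative? are hypothetical names that do not appear in the Expertiza code.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical constant; the method above compares against each word inline.
STANDALONE_NEGATIVES = %w[NO NEVER NONE].freeze

# True when the token is NO, NEVER or NONE, ignoring case.
def standalone_negative?(token)
  STANDALONE_NEGATIVES.include?(token.upcase)
end

%w[No never nothing].each do |token|
  puts "#{token}: #{standalone_negative?(token)}"
end
```
&lt;br /&gt;
Inside the loop, the condition would then read if standalone_negative?(tokens[j]).&lt;br /&gt;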
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
the first test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not quote the words or sentences properly; the reviewer simply copies what the author wrote, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and it compares reviews with others’ responses, which makes the method confusing. The refactoring therefore splits these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, we define four methods, each with a single function:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the following code, which checks whether all or only some of the reviews are copied from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the plagiarism state: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
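&lt;br /&gt;
To try the extracted method outside of Expertiza, a self-contained sketch could look like the following. The constant values and the FakeMetareview struct are hypothetical stand-ins for the real constants and metareview object, and the guard clause assumes count_copies is never negative.&lt;br /&gt;

```ruby
# Hypothetical stand-ins so the sketch runs on its own; the real constants
# and the metareview object are defined elsewhere in Expertiza.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # Nothing was copied: leave review_array untouched and report nothing.
  return nil if count_copies.zero?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

metareview = FakeMetareview.new(['original text'])
puts check_plagiarism_state(metareview, 1, ['kept text'], [0.9, 0.2])
```

Both callers can then return the result of this one method instead of repeating the same if-elsif chain.&lt;br /&gt;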
&lt;br /&gt;
The next step is to extract long loops or if-else statements into separate methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens and picking the next token to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
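&lt;br /&gt;
One detail is worth checking when reading this helper. In Ruby, String#split with a single-space separator ignores leading whitespace and treats each run of whitespace as one delimiter, so the resulting tokens are never blank. The short sketch below, with a made-up review string, shows this behavior; it is an observation about the snippet as quoted here, not about the rest of the file.&lt;br /&gt;

```ruby
review = '  The sweet   potatoes are fresh. '

# With a single-space separator, split drops leading whitespace and treats
# each run of whitespace as one delimiter.
tokens = review.split(' ')
p tokens

# No token is blank, so a check such as token == ' ' never matches here.
blank_tokens = tokens.select { |token| token.strip.empty? }
p blank_tokens
```

If the array always comes from split with a single-space separator, the blank-token check may never fire, which would make the helper a candidate for further simplification.&lt;br /&gt;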
&lt;br /&gt;
See the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of the VCL servers is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza works. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81195</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81195"/>
		<updated>2013-10-30T19:33:19Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
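&lt;br /&gt;
The idea of letting each state decide its own successor can be sketched as follows. The class names, token types, and transition rules here are hypothetical and are not taken from sentence_state.rb. The sketch also uses plain classes that share a next_state method instead of subclasses of SentenceState, to keep the example short.&lt;br /&gt;

```ruby
# Hypothetical state objects; names and transitions are illustrative only and
# are not taken from sentence_state.rb.
class PositiveState
  def next_state(token_type)
    token_type == :negative_word ? NegativeWordState.new : self
  end

  def negated?
    false
  end
end

class NegativeWordState
  def next_state(token_type)
    # In this toy rule set, a second negative word cancels the first.
    token_type == :negative_word ? PositiveState.new : self
  end

  def negated?
    true
  end
end

# Walks the token types of one clause, asking each state for its successor.
def clause_negated?(token_types)
  final_state = token_types.reduce(PositiveState.new) do |state, token_type|
    state.next_state(token_type)
  end
  final_state.negated?
end

puts clause_negated?(%i[word negative_word word])
puts clause_negated?(%i[negative_word negative_word])
```

With this layout, adding a new state means adding one class rather than another branch in a nested if-else statement.&lt;br /&gt;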
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted the three for and if-else blocks into their own methods. This reduced the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens. Now when SentenceState is called, it makes a new TaggedSentence and then calls break_at_coordinating_conjunctions which returns the sentence clauses as arrays of sentence tokens. See the new TaggedSentence class here: [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/tagged_sentence.rb tagged_sentence.rb]&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review repeats the author's sentences word for word without quoting them, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and it compares reviews with others’ responses, which makes the method confusing. The refactoring therefore splits these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, we define four methods, each with a single function:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the following code, which checks whether all or only some of the reviews are copied from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the plagiarism state: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into separate methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens and picking the next token to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
See the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of the VCL servers is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza works. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81190</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81190"/>
		<updated>2013-10-30T19:30:44Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions needed for the NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be moved to other classes. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and confirm that they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted the three for and if-else blocks into their own methods. This reduced the code from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens.&lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers' responses, which makes it confusing. As part of the refactoring, we need to split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with a single function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
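&lt;br /&gt;
As a rough illustration of how the extracted method behaves, the sketch below runs it outside Expertiza. The constant values and the FakeMetareview struct are hypothetical stand-ins, since the real ones are not shown here, and the method body is condensed but follows the same logic.&lt;br /&gt;

```ruby
# Hypothetical stand-ins so the method can run outside Expertiza; the real
# constant values and the real metareview object are not shown in the text.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies > 0 # nothing copied: leave review_array alone

  # resetting review_array only when plagiarism was found
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new(["original"])
p check_plagiarism_state(review, 0, [], [1, 2])        # no copies found
p check_plagiarism_state(review, 1, ["kept"], [1, 2])  # one of two copied
p check_plagiarism_state(review, 2, [], [1, 2])        # every response copied
```

Note that the method returns nil when nothing was copied, so callers have to handle that case before comparing against the two constants.&lt;br /&gt;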
&lt;br /&gt;
The next step is to extract long loops or if-else statements into separate methods so that the original method is not too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines skip blank tokens in the array and pick the phrase to compare. After extracting the method, the code of compare_reviews_with_submissions became:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
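&lt;br /&gt;
One detail worth keeping in mind when reading skip_empty_array: in Ruby, String#split with a single-space argument treats runs of whitespace as one separator and ignores leading whitespace, so the token array it produces should not contain blank entries. A quick check, using a made-up sample sentence:&lt;br /&gt;

```ruby
# Sample review sentence with irregular spacing (made up for illustration).
review = "  The  sweet potatoes   are fresh. "

# split(" ") collapses runs of whitespace and ignores leading whitespace.
tokens = review.split(" ")

# Collect any tokens that are empty or whitespace-only.
blank_tokens = tokens.select { |t| t.strip.empty? }

puts tokens.inspect
puts blank_tokens.length
```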
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
After the refactoring, all tests pass without failures.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Setup mysql and start server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81189</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81189"/>
		<updated>2013-10-30T19:30:16Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. Both files are part of the automated_metareview functionality inside models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are in plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed for NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState, each of which knows the relationship between its own state and any sentence token type.&lt;br /&gt;
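&lt;br /&gt;
A minimal sketch of the "each state knows its own transitions" idea follows. It is only an illustration under stated assumptions: it uses small duck-typed classes instead of literal SentenceState subclasses, and the token types (:negative_word, :negative_descriptor, :other) and transition rules are hypothetical, not the ones in Expertiza.&lt;br /&gt;

```ruby
# Hypothetical sketch: one small class per state, each deciding its own
# transition. The token types (:negative_word, :negative_descriptor, :other)
# and the transition rules are assumptions made for illustration.
class PositiveState
  def next_state(token_type)
    case token_type
    when :negative_word       then NegativeWordState.new
    when :negative_descriptor then NegativeDescriptorState.new
    else self
    end
  end

  def negated?
    false
  end
end

class NegativeWordState
  def next_state(token_type)
    # Assumed rule: a second negative token cancels the first ("not never").
    token_type == :other ? self : PositiveState.new
  end

  def negated?
    true
  end
end

class NegativeDescriptorState
  def next_state(token_type)
    # Same assumed cancellation rule as NegativeWordState.
    token_type == :other ? self : PositiveState.new
  end

  def negated?
    true
  end
end

# The driver needs no nested if-else: it just asks the current state.
def clause_negated?(token_types)
  state = token_types.reduce(PositiveState.new) { |s, type| s.next_state(type) }
  state.negated?
end

puts clause_negated?([:other, :negative_word, :other])
puts clause_negated?([:negative_word, :negative_descriptor])
```

With this shape, adding a new state means adding one class rather than another branch in a shared if-else chain.&lt;br /&gt;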
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make the method more readable, I extracted three of these for and if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
       current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read, and it revealed that the class SentenceState had too many responsibilities. After the extraction, the class had two methods that parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. Parsing a sentence into its clauses and individual tokens (words) is likely to be reused elsewhere, so it is better to decouple this responsibility from SentenceState. These two methods were therefore moved into a new TaggedSentence class, keeping the names break_at_coordinating_conjunctions and parse_sentence_tokens.&lt;br /&gt;
&lt;br /&gt;
**Future note. The TaggedSentence class could be refactored by allowing a sentence to be either broken into sentence clauses or into sentence clause arrays of parsed sentence tokens.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers' responses, which makes it confusing. As part of the refactoring, we need to split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with a single function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into separate methods so that the original method is not too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines skip blank tokens in the array and pick the phrase to compare. After extracting the method, the code of compare_reviews_with_submissions became:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
After the refactoring, all tests pass without failures.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Setup mysql and start server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81187</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81187"/>
		<updated>2013-10-30T19:29:52Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. Both files are part of the automated_metareview functionality inside models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
===Classes===&lt;br /&gt;
The classes we are going to refactor are in plagiarism_check.rb (155 lines) and sentence_state.rb (293 lines).&lt;br /&gt;
&lt;br /&gt;
===What it Does===&lt;br /&gt;
The two class files perform functions that are needed for NLP analysis of reviews in the research.&lt;br /&gt;
&lt;br /&gt;
===Our Job===&lt;br /&gt;
The code has many code smells. First, the methods are long and complex, and the code is poorly structured, which makes it hard to read and understand. Second, extremely long if-else branches and loops appear throughout. Third, sentence_state.rb has too many responsibilities; some of its functions should be moved elsewhere. Finally, there is some duplicated code.&lt;br /&gt;
&lt;br /&gt;
Our job is to refactor the two classes to make them more object-oriented and give them a clear structure. To eliminate the code smells, we need to remove the duplicated code, restructure the long if-else branches and loops, create new methods and classes to encapsulate the functionality of complex methods, and clarify the responsibilities of classes and methods. After refactoring, we also need to test the two classes thoroughly and make sure they pass without error.&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState, each of which knows the relationship between its own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three for or if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;    &lt;br /&gt;
&lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end&lt;br /&gt;
    end #end of for loop&lt;br /&gt;
&lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
This was much easier to read and it revealed that the class SentenceState had too many responsibilities. The class now has two methods which parse the sentence: break_at_coordinating_conjunctions and parse_sentence_tokens. However, parsing a sentence into its sentence clauses and individual tokens (words) could potentially be used again in the future, so it is better to decouple this responsibility from SentenceState into a new class. So these two methods were refactored into a TaggedSentence class with the methods break_at_coordinating_conjunctions and parse_sentence_tokens.&lt;br /&gt;
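&lt;br /&gt;
A minimal sketch of what such a TaggedSentence class might look like is shown below. The constructor, the PUNCTUATION pattern, and the sample input are assumptions made for illustration; only the method name parse_sentence_tokens and the idea of returning both the tagged tokens and the bare tokens come from the text above.&lt;br /&gt;

```ruby
# Hypothetical sketch of a TaggedSentence class; not the project's actual code.
class TaggedSentence
  # Assumed set of punctuation marks to strip from each word.
  PUNCTUATION = /[.,!;]/

  def initialize(str_with_pos_tags)
    @str_with_pos_tags = str_with_pos_tags
  end

  # Returns [tagged_tokens, tokens]:
  # tagged_tokens keeps the raw "word/TAG" strings, tokens holds the bare words.
  def parse_sentence_tokens
    tagged_tokens = @str_with_pos_tags.split(" ")
    tokens = tagged_tokens.map do |tagged|
      tagged.split("/").first.to_s.gsub(PUNCTUATION, "")
    end
    [tagged_tokens, tokens]
  end
end

sentence = TaggedSentence.new("The/DT code/NN is/VBZ not/RB clear/JJ")
tagged_tokens, tokens = sentence.parse_sentence_tokens
```

Keeping the parsing in its own class lets SentenceState work only with the token arrays, which is the decoupling described above.&lt;br /&gt;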
&lt;br /&gt;
Future note: the TaggedSentence class could be refactored further so that a sentence can be broken either into sentence clauses or into arrays of parsed tokens, one array per clause.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies what the author wrote without quoting the words or sentences, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism listed above, the refactored class needs four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does roughly two things: it compares reviews with the review questions and it compares reviews with others’ responses, which is confusing. As part of the refactoring, we need to split these two functions apart so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with its own specific function.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied entirely from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
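&lt;br /&gt;
The following sketch shows how the extracted method behaves for the three possible outcomes. The constant values and the FakeMetareview stand-in are placeholders assumed for illustration; in the project, the constants and the auto_metareview object are defined elsewhere.&lt;br /&gt;

```ruby
# Placeholder values (assumption); the project defines the real constants.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for auto_metareview; only review_array is needed here.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and return nil.
  return nil unless count_copies > 0

  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

metareview = FakeMetareview.new(["original review"])
all_copied  = check_plagiarism_state(metareview, 2, ["a", "b"], [0.9, 0.8])
some_copied = check_plagiarism_state(metareview, 1, ["a"], [0.9, 0.1])
none_copied = check_plagiarism_state(FakeMetareview.new(["kept"]), 0, [], [0.1])
```

Note that the method returns nil when no copies are found, just as the original snippet fell through without returning, so callers still need to handle that case.&lt;br /&gt;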
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array and selecting the token to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
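&lt;br /&gt;
One detail is worth noting: the original snippet used next to restart the loop after a blank entry, while the extracted method advances past at most one blank entry and may then read beyond the end of the array. The variant below is a hedged sketch of one way to keep the skipping behavior inside the helper; the name skip_blank_tokens is hypothetical and this is not the project's code. In Ruby, split(&amp;quot; &amp;quot;) already drops blank entries, so this branch may rarely be reached in practice.&lt;br /&gt;

```ruby
# Hypothetical variant of skip_empty_array (assumption, not the project's code):
# advances past every blank entry and returns nil as the phrase
# when the end of the array is reached.
def skip_blank_tokens(array, rev_len)
  while array.length > rev_len and array[rev_len].strip.empty?
    rev_len += 1
  end
  [rev_len, array[rev_len]]
end

index, phrase = skip_blank_tokens(["sweet", " ", "", "potatoes"], 1)
```

The caller can then stop its loop when the returned phrase is nil instead of relying on next.&lt;br /&gt;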
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza's functionality works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (note that there is no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81154</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81154"/>
		<updated>2013-10-30T19:19:00Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course at North Carolina State University contribute to the project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState, each of which knows the relationship between its own state and any token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three for or if-else blocks into their own methods. This reduced the method from 164 lines of if-else and for statements to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies what the author wrote without quoting the words or sentences, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism listed above, the refactored class needs four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does roughly two things: it compares reviews with the review questions and it compares reviews with others’ responses, which is confusing. As part of the refactoring, we need to split these two functions apart so that this smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with its own specific function.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied entirely from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array and selecting the token to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza's functionality works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (note that there is no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81146</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81146"/>
		<updated>2013-10-30T19:13:39Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course at North Carolina State University contribute to the project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState, each of which knows the relationship between its own state and any token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three for or if-else blocks into their own methods. This changed the code from:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
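&lt;br /&gt;
The body of the extracted parse_sentence_tokens helper is not shown here. A minimal sketch of what such a helper could look like, assuming tokens are separated by spaces and tagged as word/TAG, is given below. It is a simplification of the original tokenizing loop, not the project's actual implementation.&lt;br /&gt;

```ruby
# Hypothetical body for parse_sentence_tokens; the project's real helper may differ.
# Returns the tagged tokens (e.g. "not/RB") and the bare words with punctuation removed.
def parse_sentence_tokens(str_with_pos_tags)
  tagged_tokens = str_with_pos_tags.split(" ")
  tokens = tagged_tokens.map do |tagged|
    word = tagged.split("/").first.to_s # drop the POS tag: "not/RB" becomes "not"
    word.delete(".,!;")                 # strip the punctuation the original loop handled
  end
  [tagged_tokens, tokens]
end

tagged_tokens, tokens = parse_sentence_tokens("This/DT is/VBZ not/RB good,/JJ")
puts tokens.inspect
```

Returning both arrays from one method keeps sentence_state free of the tokenizing details while still giving it the tags it needs for the noun and verb checks.&lt;br /&gt;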
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them properly, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactored code should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method has roughly two functions: comparing reviews with the review questions and comparing reviews with others' responses, which is confusing. During refactoring, we need to split these two functions apart so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with a single, specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the duplicated code from the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully or partially from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
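&lt;br /&gt;
To see the three possible outcomes of such a method in isolation, here is a small self-contained sketch. The FakeMetareview struct and the symbol values given to the two constants are assumptions made only for this illustration, and the method body is a compact rewrite rather than a copy of the project's code.&lt;br /&gt;

```ruby
# Placeholder values; the project defines these constants elsewhere.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Minimal stand-in for the metareview object; only review_array is needed here.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil if count_copies.zero?         # nothing copied: leave review_array untouched
  auto_metareview.review_array = rev_array # reset only when plagiarism was found
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new(["original text"])
puts check_plagiarism_state(review, 2, ["kept"], [0.1, 0.2]).inspect # every response copied
puts check_plagiarism_state(review, 1, ["kept"], [0.1, 0.2]).inspect # some responses copied
puts check_plagiarism_state(review, 0, ["kept"], [0.1, 0.2]).inspect # no plagiarism found
```

Note that the method returns nil when no copies are found, so callers need to handle that case before comparing against the two constants.&lt;br /&gt;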
&lt;br /&gt;
The next thing to do is to extract long loops or if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines skip blank entries in the array and pick the phrase to compare. Once we extract the method, the code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up mysql and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81144</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81144"/>
		<updated>2013-10-30T19:12:41Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built with Ruby on Rails and follows the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. Both files are part of the automated_metareview functionality inside models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of these for loops and if-else blocks into their own methods. This changed the code from:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
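&lt;br /&gt;
The body of parse_sentence_tokens is not shown on this page. As a rough sketch only, assuming the helper simply wraps the token-fetching loop of the original method, it might look like the following. The punctuation handling here is simplified and is an assumption, not the code in the fork.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch of the extracted helper; the real method lives in the fork.
# Input:  a POS-tagged string such as "The/DT code/NN is/VBZ not/RB bad/JJ ./."
# Output: [tagged_tokens, tokens]; tokens are the words without tags or punctuation.
def parse_sentence_tokens(str_with_pos_tags)
  tagged_tokens = []
  tokens = []
  str_with_pos_tags.split(" ").each do |tagged|
    word = tagged.split("/").first.to_s # drop the "/TAG" suffix
    word = word.delete(".,!;")          # strip the punctuation marks the original loop handled
    next if word.empty?                 # skip tokens that were pure punctuation
    tagged_tokens.push(tagged)          # keep both arrays aligned by index
    tokens.push(word)
  end
  [tagged_tokens, tokens]
end

tagged, words = parse_sentence_tokens("The/DT code/NN is/VBZ not/RB bad/JJ ./.")
p words
p tagged
```

Returning both arrays lets sentence_state destructure them in one assignment, as in the refactored method above.&lt;br /&gt;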
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of the PlagiarismChecker class (plagiarism_check.rb) is to determine whether a review was simply copied from another source. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case the reviewer copied the author's sentences verbatim without quoting them, so the method reports plagiarism and returns true.&lt;br /&gt;
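&lt;br /&gt;
To illustrate the idea of comparing review text against submission text, here is a minimal sketch. It is not the PlagiarismChecker implementation: the helper name copied_phrase? and the five-word window are assumptions made for this example only.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper, not the PlagiarismChecker implementation.
# Returns true when any run of `window` consecutive review words
# appears verbatim in one of the submission sentences.
def copied_phrase?(review_text, subm_text, window = 5)
  # normalise case and whitespace so the comparison is not thrown off by spacing
  submissions = subm_text.map { |s| s.to_s.downcase.split(" ").join(" ") }
  review_text.any? do |review|
    words = review.to_s.downcase.split(" ")
    words.each_cons(window).any? do |phrase|
      needle = phrase.join(" ")
      submissions.any? { |subm| subm.include?(needle) }
    end
  end
end

review = ["The sweet potatoes in the vegetable bin are green with mold."]
subm = ["The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh."]
puts copied_phrase?(review, subm)
```

Because this sketch uses plain substring matching, it can also match across word boundaries; a stricter version would compare token arrays instead.&lt;br /&gt;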
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also with others’ responses, which is confusing. The refactoring splits these two functions apart so that this design smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step, based on the above, is to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, the method compare_reviews_with_questions_responses has to be split into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied, fully or partly, from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
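&lt;br /&gt;
The behaviour of check_plagiarism_state can be tried in isolation. In the sketch below the two constants are given stand-in symbol values and FakeMetareview is a hypothetical stand-in for the auto_metareview object; only the method body is taken from this page.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in values for illustration; the real constants are defined in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised

# Hypothetical stand-in for auto_metareview (only the attribute used here).
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  if count_copies > 0 # resetting review_array only when plagiarism was found
    auto_metareview.review_array = rev_array
    if count_copies == scores.length
      return ALL_RESPONSES_PLAGIARISED # plagiarism, with all other metrics 0
    else
      return SOME_RESPONSES_PLAGIARISED # plagiarism, while evaluating other metrics
    end
  end
end

metareview = FakeMetareview.new(["original review text"])
p check_plagiarism_state(metareview, 0, [], [1, 2])             # nothing copied
p check_plagiarism_state(metareview, 1, ["kept text"], [1, 2])  # one of two responses copied
p check_plagiarism_state(metareview, 2, [], [1, 2])             # every response copied
```

Note that the method returns nil when count_copies is zero, so callers need to handle the case where no plagiarism was found.&lt;br /&gt;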
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries in the token array and pick the phrase to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
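&lt;br /&gt;
As a possible alternative (a sketch only, not what the fork does), blank entries could be filtered out once, before the loop, so that the loop body never needs to skip anything. The method name review_tokens is hypothetical. Ruby's String#split with a single-space argument already discards runs of whitespace, so the extra reject is purely defensive.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical alternative to skipping blanks inside the loop:
# build the token array once, with blank entries already removed.
def review_tokens(review)
  review.to_s.split(" ").reject { |token| token.strip.empty? }
end

tokens = review_tokens("The  sweet potatoes   in the bin ")
tokens.each_with_index do |rev_phrase, rev_len|
  puts "#{rev_len}: #{rev_phrase}"
end
```

With the blanks removed up front, the index rev_len always points at a real word, and the caller does not need a second return value.&lt;br /&gt;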
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The VCL servers let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the MySQL server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from its lib folder into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your MySQL root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. The other changed files are listed below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81135</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81135"/>
		<updated>2013-10-30T19:08:58Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. It has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course at North Carolina State University contribute to it along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
For this year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and only then iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
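&lt;br /&gt;
The idea of letting each state know its own transitions could be sketched as follows. This is an illustration under assumptions: the class names, the symbol values of the constants, and the transition helper are hypothetical, and only three of the five states are covered. The transition rules themselves are taken from the if-else chain of the original method.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Stand-in constant values for illustration only.
POSITIVE = :positive
NEGATIVE_WORD = :negative_word
NEGATIVE_DESCRIPTOR = :negative_descriptor
NEGATIVE_PHRASE = :negative_phrase
SUGGESTIVE = :suggestive

class PositiveState
  # From POSITIVE, the token type simply becomes the new state.
  def next_state(token_type, _interim_noun_verb)
    token_type
  end
end

class SuggestiveState
  # From SUGGESTIVE, only a negative descriptor or phrase changes the state.
  def next_state(token_type, _interim_noun_verb)
    case token_type
    when NEGATIVE_DESCRIPTOR, NEGATIVE_PHRASE then token_type
    else SUGGESTIVE
    end
  end
end

class NegativePhraseState
  # A following negative word cancels the negation unless nouns or verbs came in between.
  def next_state(token_type, interim_noun_verb)
    case token_type
    when NEGATIVE_WORD then interim_noun_verb ? NEGATIVE_WORD : POSITIVE
    when NEGATIVE_DESCRIPTOR, NEGATIVE_PHRASE, SUGGESTIVE then token_type
    else NEGATIVE_PHRASE
    end
  end
end

# Lookup from the current state constant to the object that knows its transitions.
STATE_OBJECTS = {
  POSITIVE => PositiveState.new,
  SUGGESTIVE => SuggestiveState.new,
  NEGATIVE_PHRASE => NegativePhraseState.new
}

def transition(state, token_type, interim_noun_verb)
  STATE_OBJECTS.fetch(state).next_state(token_type, interim_noun_verb)
end

p transition(POSITIVE, NEGATIVE_PHRASE, false)
p transition(NEGATIVE_PHRASE, NEGATIVE_WORD, false)
```

Each class replaces one branch of the outer if-else, so adding or changing a state touches a single small class rather than one long method.&lt;br /&gt;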
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained several for loops and deeply nested if-else statements. To make this method more readable, I extracted three of these for and if-else blocks into their own methods. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
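&lt;br /&gt;
The token-parsing step that sentence_state now delegates to parse_sentence_tokens could look roughly like the sketch below. This is not the code from the repository: the body is an assumption reconstructed from the two return values used above (tagged_tokens and tokens), and it removes the punctuation marks . , ! ; in a single pass instead of reproducing the original branching.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch of parse_sentence_tokens; the repository version may differ.
def parse_sentence_tokens(str_with_pos_tags)
  # Keep every word together with its POS tag, e.g. 'code/NN'.
  tagged_tokens = str_with_pos_tags.split(' ')

  # Drop the '/TAG' suffix and the punctuation marks . , ! ; from each word.
  tokens = tagged_tokens.map do |tagged|
    tagged.split('/').first.to_s.delete('.,!;')
  end

  [tagged_tokens, tokens]
end

tagged_tokens, tokens = parse_sentence_tokens('The/DT code/NN is/VBZ not/RB bad./JJ')
```
&lt;br /&gt;
In this sketch both arrays always have the same length, so index j in tokens lines up with index j in tagged_tokens.&lt;br /&gt;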
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb reads:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four checks above, the refactored class needs four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method has two functions: it compares reviews with the review questions and it compares reviews with others’ responses, which makes the method confusing. During refactoring we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is to define four methods, one for each kind of check:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
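&lt;br /&gt;
The extracted method can be exercised on its own, as in the sketch below. The constant values and the FakeMetareview struct are placeholders invented for this illustration (the real constants are defined in the Expertiza code base), and the guard clause is only a restyling of the outer if shown above.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values: the real constants are defined in the Expertiza code base.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised

# Hypothetical stand-in for the automated metareview object.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No copies found: leave review_array untouched and report nothing.
  return nil if count_copies.zero?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

metareview = FakeMetareview.new(['original review'])
all_state = check_plagiarism_state(metareview, 2, ['kept'], [0.9, 0.8])
some_state = check_plagiarism_state(metareview, 1, ['kept'], [0.9, 0.8])

untouched = FakeMetareview.new(['untouched'])
none_state = check_plagiarism_state(untouched, 0, [], [0.9])
```
&lt;br /&gt;
When no copies are found the method returns nil, so callers have to handle that case separately.&lt;br /&gt;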
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the original method is not too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array and selecting the phrase to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating through the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
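&lt;br /&gt;
One thing to keep in mind with this extraction: the original snippet used next to restart the loop, while the extracted method moves on to the following token without checking it. A minimal sketch of a variant that skips every blank token and stops at the end of the array is given below; skip_blank_tokens is a hypothetical name and is not part of the Expertiza code base.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper, not part of Expertiza: skip every blank token.
def skip_blank_tokens(tokens, index)
  # tokens[index] is nil once index passes the end, which stops the loop.
  index += 1 until tokens[index].nil? || !tokens[index].strip.empty?
  [index, tokens[index]]
end

index, phrase = skip_blank_tokens([' ', '', 'good', 'work'], 0)
end_index, end_phrase = skip_blank_tokens(['good', ' '], 1)
split_tokens = 'good   work'.split(' ')
```
&lt;br /&gt;
Note that Ruby's split(' ') already discards runs of whitespace, so an array produced that way contains no blank entries in the first place.&lt;br /&gt;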
&lt;br /&gt;
The full refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81132</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81132"/>
		<updated>2013-10-30T19:08:21Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement which determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of the for loops or if-else statements into their own methods. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb reads:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the review is flagged as plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four checks above, the refactored class needs four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method has two functions: it compares reviews with the review questions and it compares reviews with others’ responses, which makes the method confusing. During refactoring we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is to define four methods, one for each kind of check:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the original method is not too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the phrase to compare. After the extraction, compare_reviews_with_submissions looks like this:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
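&lt;br /&gt;
Unlike the original loop, the extracted helper cannot use next, so after one increment it does not re-check whether the new position is blank as well. A minimal sketch of one possible alternative follows; the helper name next_phrase is hypothetical and is not part of Expertiza. It advances past every blank token and returns nil as the phrase once the array is exhausted:&lt;br /&gt;

```ruby
# Hypothetical helper (not in Expertiza): skip blank tokens starting at index,
# then return the index of the first non-blank token together with that token.
def next_phrase(tokens, index)
  # advance while the current token exists and contains only whitespace
  while tokens.length > index and tokens[index].strip.empty?
    index += 1
  end
  # tokens[index] is nil when index has run past the end of the array
  [index, tokens[index]]
end

tokens = ["good", " ", " ", "work"]
index, phrase = next_phrase(tokens, 1)
puts "#{index} #{phrase}"
```

A caller would still need to stop its loop when the returned phrase is nil.&lt;br /&gt;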
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&lt;YOUR_PASSWORD&gt; pg_development &lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81130</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81130"/>
		<updated>2013-10-30T19:07:29Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses, then split the clauses into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
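&lt;br /&gt;
One way to picture that design is sketched below. The class names, the symbols used for token types, and the simplified transition rules are all assumptions made for illustration; they are not the code in the Expertiza repository. For brevity, the sketch uses plain classes that share a next_state method rather than a formal SentenceState superclass:&lt;br /&gt;

```ruby
# Illustrative sketch only: class names and token-type symbols are hypothetical.
# Each state object knows which state follows for a given token type.
class PositiveSketch
  def next_state(token_type)
    case token_type
    when :negative_word then NegativeWordSketch.new
    when :suggestive then SuggestiveSketch.new
    else self
    end
  end
end

class NegativeWordSketch
  def next_state(token_type)
    case token_type
    # simplified rule: a second negation cancels the first, e.g. "not bad"
    when :negative_word, :negative_descriptor then PositiveSketch.new
    when :suggestive then SuggestiveSketch.new
    else self
    end
  end
end

class SuggestiveSketch
  def next_state(_token_type)
    self # simplified: stays suggestive whatever comes next
  end
end

# Fold a list of token types into the final state of the clause.
def final_state(token_types)
  token_types.inject(PositiveSketch.new) { |state, type| state.next_state(type) }
end

puts final_state([:positive, :negative_word, :negative_descriptor]).class
```

With this shape, each transition rule lives in the class for the state it belongs to, instead of in one nested if-else chain.&lt;br /&gt;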
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make this method more readable, I extracted three of the for loops or if-else statements into their own methods. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
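&lt;br /&gt;
The body of parse_sentence_tokens is not shown here. A minimal sketch of what such a helper might look like follows, assuming it drops the part-of-speech tag after the slash and removes the punctuation marks handled by the original loop. It is a simplification and not the implementation in the fork:&lt;br /&gt;

```ruby
# Assumed shape of the helper; the real implementation may differ.
# Returns the tagged tokens and the matching plain tokens as two arrays.
def parse_sentence_tokens(str_with_pos_tags)
  tagged_tokens = str_with_pos_tags.split(" ")
  tokens = tagged_tokens.map do |tagged|
    word = tagged.split("/").first.to_s # drop the "/TAG" suffix
    word.delete(".,!;")                 # remove the punctuation characters
  end
  [tagged_tokens, tokens]
end

tagged, plain = parse_sentence_tokens("It/PRP is/VBZ not/RB bad,/JJ")
puts tagged.inspect
puts plain.inspect
```

Returning both arrays keeps the indexes aligned, so a caller can look up the tag for any plain token.&lt;br /&gt;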
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer simply copies what the author wrote without quoting the words and sentences properly, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given the four kinds of plagiarism above, the refactoring should produce four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method has two responsibilities: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. During refactoring, we split these two responsibilities apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
The first step, based on the above, is to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code that is duplicated across the long methods into a separate method that both can call. For example, compare_reviews_with_questions and compare_reviews_with_responses share the following block, which checks whether some or all of the reviews are copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract the block into a single method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
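&lt;br /&gt;
A minimal sketch of how the extracted method might be exercised follows. The constant values and the Struct standing in for the metareview object are assumptions made so that the example runs on its own; in Expertiza they are defined elsewhere:&lt;br /&gt;

```ruby
# Placeholder values; the real constants are defined in Expertiza.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the automated metareview object.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies > 0 # nothing copied, nothing to report
  # resetting review_array only when plagiarism was found
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new([])
puts check_plagiarism_state(review, 2, ["kept sentence"], [0.9, 0.8]).inspect
puts check_plagiarism_state(review, 1, ["kept sentence"], [0.9, 0.8]).inspect
puts check_plagiarism_state(review, 0, [], [0.9, 0.8]).inspect
```

The method returns nil when no copies were found, so callers need to handle that case before comparing against the two constants.&lt;br /&gt;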
&lt;br /&gt;
The next step is to extract each long loop or if-else statement into its own method, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the phrase to compare. After the extraction, compare_reviews_with_submissions looks like this:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&lt;YOUR_PASSWORD&gt; pg_development &lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81119</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81119"/>
		<updated>2013-10-30T19:04:53Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application, which allows students to submit assignments and do peer review of each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and any document type of submission is acceptable&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engaging in the learning process. Expertiza is an open source project, for each year, students in the course of CSC517-Object Oriented Programmning of North Carolina State University will contributes to this project along with teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
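The idea of letting each state know its own transitions can be sketched as follows. This is only a minimal illustration under assumptions: the class names, the token-type symbols and the final_state helper are hypothetical and are not taken from the Expertiza code, the classes are duck-typed rather than real subclasses of SentenceState, and only a few of the transitions are covered (negative phrases and the double-negative case with "no", "never" and "none" are left out).&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch; class and symbol names are illustrative assumptions.
# Each state class knows only its own transitions, keyed by token type.
class PositiveClause
  def next_state(token_type, _interim_noun_verb)
    case token_type
    when :negative_word       then NegativeWordClause.new
    when :negative_descriptor then NegativeDescriptorClause.new
    when :suggestive          then SuggestiveClause.new
    else self
    end
  end
end

class NegativeWordClause
  def next_state(token_type, interim_noun_verb)
    case token_type
    when :negative_descriptor
      PositiveClause.new # e.g. "not bad"
    when :suggestive
      # words in between keep the negation; otherwise "do not suggest" stays suggestive
      interim_noun_verb ? self : SuggestiveClause.new
    else
      self
    end
  end
end

class NegativeDescriptorClause
  def next_state(token_type, interim_noun_verb)
    case token_type
    when :negative_word
      interim_noun_verb ? NegativeWordClause.new : PositiveClause.new
    when :suggestive
      SuggestiveClause.new
    else
      self
    end
  end
end

class SuggestiveClause
  def next_state(token_type, _interim_noun_verb)
    token_type == :negative_descriptor ? NegativeDescriptorClause.new : self
  end
end

# Folds a list of token types through the states, starting from positive.
# The interim noun/verb flag is fixed to false to keep the sketch short.
def final_state(token_types)
  token_types.reduce(PositiveClause.new) { |state, type| state.next_state(type, false) }
end
```
&lt;br /&gt;
With this shape, adding a new state means adding one class instead of another branch in a shared if-else chain.&lt;br /&gt;
&lt;br /&gt;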
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of the for or if-else statements into their own methods to clean up the code. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
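For readers who want a feel for the extracted helper, here is a rough sketch of what parse_sentence_tokens might look like. It is an assumption based on the token-fetching loop of the original method, not the code in our fork: it strips the POS tag after the slash, removes the same four punctuation characters, and skips tokens that end up empty.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch of the extracted helper; the real method may differ.
# Input looks like "not/RB bad/JJ ./." (word/POS-tag pairs separated by spaces).
def parse_sentence_tokens(str_with_pos_tags)
  tagged_tokens = []
  tokens = []
  str_with_pos_tags.split(" ").each do |tagged|
    word = tagged.split("/").first.to_s # drop the POS tag
    word = word.delete(".,!;")          # drop punctuation characters
    next if word.empty?                 # skip punctuation-only tokens
    tagged_tokens.push(tagged)
    tokens.push(word)
  end
  [tagged_tokens, tokens]
end
```
&lt;br /&gt;
The two returned arrays stay aligned, so tagged_tokens[j] always carries the POS tag of tokens[j].&lt;br /&gt;
&lt;br /&gt;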
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources; this may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not properly quote the words and sentences it takes from the submission; the reviewer simply copies what the author says, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
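As a rough illustration of this kind of comparison, the sketch below flags a review when any run of consecutive review words appears verbatim in a submission. The window size NGRAM and the method name copied_from_submission? are assumptions made for this example; it does not handle properly quoted text and is not the implementation of check_for_plagiarism.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch: NGRAM and the method name are assumptions for illustration.
NGRAM = 5

# True when any run of NGRAM consecutive review words appears verbatim
# in one of the submission sentences.
def copied_from_submission?(review_sentences, submission_sentences)
  review_sentences.any? do |review|
    words = review.to_s.split(" ")
    words.each_cons(NGRAM).any? do |window|
      phrase = window.join(" ")
      submission_sentences.any? { |subm| subm.to_s.include?(phrase) }
    end
  end
end
```
&lt;br /&gt;
Reviews shorter than the window are never flagged by this sketch, which is one reason a real checker needs more care.&lt;br /&gt;
&lt;br /&gt;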
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From this point of view, the refactoring should produce four fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb, the compare_reviews_with_questions_responses method does roughly two things: it compares reviews with the review questions and it also compares reviews with others' responses, which is confusing. As part of the refactoring, we split the two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define four methods, each with a distinct function.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
       auto_metareview.review_array = rev_array&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
      return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
    elsif(count_copies &amp;gt; 0)&lt;br /&gt;
      return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
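The three possible outcomes of the extracted method can be seen in the small self-contained sketch below. The constant values and the FakeMetareview struct are placeholders invented for this example (the real constants are defined elsewhere in Expertiza), and the method body is a condensed equivalent of the one above.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values, assumed for illustration only.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised

# Stand-in for the metareview object; only review_array is needed here.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # no plagiarism found: review_array is left untouched and nil is returned
  return nil unless count_copies > 0
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = FakeMetareview.new(["original sentence"])
none = check_plagiarism_state(review, 0, ["cleaned"], [1, 2])
some = check_plagiarism_state(review, 1, ["cleaned"], [1, 2])
all  = check_plagiarism_state(review, 2, ["cleaned"], [1, 2])
```
&lt;br /&gt;
Note that the method returns nil when no copies were found, so callers have to treat nil as "no plagiarism detected".&lt;br /&gt;
&lt;br /&gt;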
The next thing to do is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
          rev_len+=1&lt;br /&gt;
          next&lt;br /&gt;
        end&lt;br /&gt;
        &lt;br /&gt;
        #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries and pick the next phrase to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so that it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the MySQL server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81115</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81115"/>
		<updated>2013-10-30T19:04:11Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and submissions of any document type are accepted&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted three of the for or if-else statements into their own methods to clean up the code. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #getting next_state of the sentence clause      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, tagged_tokens)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord      &lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
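&lt;br /&gt;
The helper get_token_type is called above but its body is not shown here. A minimal sketch of what such a helper might look like follows, assuming it receives the remaining tokens and looks one token ahead for two-word phrases. The word lists and the symbol return values are placeholders made up for illustration; the project keeps its own lists and uses the constants NEGATIVE_WORD, NEGATIVE_PHRASE, SUGGESTIVE and POSITIVE.&lt;br /&gt;

```ruby
# Hypothetical word lists (assumption); the real project keeps its own lists.
NEGATIVE_WORDS = %w[no not never none].freeze
NEGATIVE_PHRASES = ['too short', 'taken from'].freeze
SUGGESTIVE_WORDS = %w[suggest should could].freeze

# Classifies the first token of the slice, looking one token ahead
# so that two-word negative phrases can be recognised.
# Returns symbols instead of the project's constants (assumption).
def get_token_type(remaining_tokens)
  first, second = remaining_tokens
  return :positive if first.nil?

  word = first.downcase
  return :negative_word if NEGATIVE_WORDS.include?(word)
  return :negative_phrase if second and NEGATIVE_PHRASES.include?("#{word} #{second.downcase}")
  return :suggestive if SUGGESTIVE_WORDS.include?(word)

  :positive
end
```

Passing the slice tokens[j..tokens.length-1] rather than a single token is what lets the helper see the following word without needing the loop index.&lt;br /&gt;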
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case the review repeats the author's sentences word for word without quoting them, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
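One simple way to detect this kind of copying is to look for runs of consecutive words that appear in both texts. The sketch below illustrates that general idea only; it is not the Expertiza implementation, and the method names and the default run length of 5 words are assumptions chosen for the example.&lt;br /&gt;

```ruby
require 'set'

# Returns the set of all runs of n consecutive words in the text, lower-cased.
def word_ngrams(text, n)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(n).map { |gram| gram.join(' ') }.to_set
end

# True when the review shares at least one n-word run with the submission.
# The default n = 5 is an arbitrary choice for this sketch.
def shares_ngram?(review, submission, n = 5)
  word_ngrams(review, n).intersect?(word_ngrams(submission, n))
end
```

A text shorter than n words produces no runs at all, so very short reviews are never flagged by this sketch.&lt;br /&gt;
&lt;br /&gt;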
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses, which makes the method confusing. The refactoring splits these two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions,&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next we extract code that is duplicated across the long methods into a separate method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part, which checks whether the reviews are copied in full from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
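&lt;br /&gt;
The extracted method returns nothing (nil) when no copies were found, so callers have to handle that case. The sketch below restates the method with a guard clause and exercises it on a stand-in object. The constant values and the Metareview struct are placeholders invented for this example; the project defines the real constants and metareview class elsewhere.&lt;br /&gt;

```ruby
# Placeholder values (assumption); the project defines the real constants.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Stand-in for the metareview object; only review_array matters here.
Metareview = Struct.new(:review_array)

# Same behaviour as the extracted method, written with a guard clause.
def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil unless count_copies > 0

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end
```

With two scores, one copy yields SOME_RESPONSES_PLAGIARISED, two copies yield ALL_RESPONSES_PLAGIARISED, and zero copies leave review_array untouched and return nil.&lt;br /&gt;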
&lt;br /&gt;
The next step is to extract long loops or if-else statements into separate methods, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries in the array and pick the phrase to compare. Once we extract the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    &lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      &lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 &lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   &lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
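Note that the original fragment used next to restart the loop after skipping a blank entry, while skip_empty_array advances the index only once. If several blank entries can follow each other, a helper that keeps advancing may be closer to the original behaviour. The following is a sketch of such a variant; the name skip_blank_tokens is hypothetical and this is not the code in the repository.&lt;br /&gt;

```ruby
# Hypothetical variant: advance past every blank entry, not just one.
# Returns the new index and the phrase at that index (nil past the end).
def skip_blank_tokens(array, rev_len)
  while array.length > rev_len and array[rev_len].strip.empty?
    rev_len += 1
  end
  [rev_len, array[rev_len]]
end
```

It can be called the same way as skip_empty_array, for example rev_len, rev_phrase = skip_blank_tokens(array, rev_len), with the caller checking for a nil phrase at the end of the array.&lt;br /&gt;
&lt;br /&gt;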
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of the VCL servers is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Setup mysql and start server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your mysql root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81110</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81110"/>
		<updated>2013-10-30T19:01:01Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is accepted as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project: each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to the project along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate sentence clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
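To illustrate how the nested if-else logic can be flattened, the sketch below expresses a few of the state transitions as a lookup table. This is a simplified illustration and not the design adopted in the project: it uses symbols in place of the project's constants, and it ignores both the interim noun/verb flag and the special handling of NO, NEVER and NONE.&lt;br /&gt;

```ruby
# Simplified transition table: [current state, token type] => next state.
# Assumption: symbols stand in for the project's constants, and the
# interim noun/verb flag and the NO/NEVER/NONE special case are ignored.
TRANSITIONS = {
  [:negative_word, :negative_descriptor]   => :positive,            # "not bad"
  [:negative_word, :negative_phrase]       => :positive,            # "not taken from"
  [:negative_word, :suggestive]            => :suggestive,          # "I do not suggest"
  [:negative_descriptor, :suggestive]      => :suggestive,          # "I hardly suggested"
  [:negative_phrase, :negative_descriptor] => :negative_descriptor, # "too short barely covering"
  [:negative_phrase, :suggestive]          => :suggestive,
  [:suggestive, :negative_descriptor]      => :negative_descriptor,
  [:suggestive, :negative_phrase]          => :negative_phrase
}.freeze

# Returns the next clause state; pairs missing from the table keep the state.
def next_state(state, token_type)
  return token_type if state == :positive
  return state if token_type == :positive

  TRANSITIONS.fetch([state, token_type], state)
end
```

A table like this makes each rule visible on one line, whereas the subclass approach described above would place each row's logic in the class for its current state.&lt;br /&gt;
&lt;br /&gt;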
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, the method sentence_state(str_with_pos_tags). There were three for loops, each with deeply nested if-else statements inside of them. To make this method more readable, I extracted each of three for or if-else blocks into its own method to clean up the code. This changed the code from:&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
to:&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    tagged_tokens, tokens = parse_sentence_tokens(str_with_pos_tags)&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..tokens.length-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      current_token_type = get_token_type(tokens[j..tokens.length-1])&lt;br /&gt;
      &lt;br /&gt;
      #comparing 'current_token_type' with the existing STATE of the sentence clause&lt;br /&gt;
      #after current_token_type is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and current_token_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      state = next_state(state, current_token_type, prev_negative_word, interim_noun_verb)&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
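&lt;br /&gt;
The body of parse_sentence_tokens is not shown above. The following is a minimal sketch, assuming it does roughly what the token-fetching loop of the original method did. The punctuation handling is a simplification made for this example, not the project's code:&lt;br /&gt;

```ruby
# Hypothetical sketch of parse_sentence_tokens (an assumption, not Expertiza's code).
# Input: a POS-tagged string such as 'not/RB good./JJ'.
# Output: the tagged tokens and the bare words, as two parallel arrays.
def parse_sentence_tokens(str_with_pos_tags)
  tagged_tokens = str_with_pos_tags.split(' ')
  tokens = tagged_tokens.map do |tagged|
    word = tagged.split('/').first.to_s # drop the POS tag after the slash
    word.delete('.,!;')                 # strip the punctuation marks the original loop handled
  end
  [tagged_tokens, tokens]
end

tagged_tokens, tokens = parse_sentence_tokens('The/DT code/NN is/VBZ not/RB good./JJ')
# tokens is now ['The', 'code', 'is', 'not', 'good']
```

Returning both arrays keeps the POS tags available for the interim noun/verb check while the state logic works on the bare words.&lt;br /&gt;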
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism to check for:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources (this may be detected through a Google search)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the method reports plagiarism.&lt;br /&gt;
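&lt;br /&gt;
To illustrate the idea (not the actual matching algorithm in plagiarism_check.rb), the sketch below flags review sentences that also appear word for word in the submission. The helper names split_sentences and copied_sentences are hypothetical:&lt;br /&gt;

```ruby
# Hypothetical helpers for illustration; not part of PlagiarismChecker.
def split_sentences(texts)
  texts.flat_map { |text| text.split('.') }
       .map { |sentence| sentence.strip.downcase }
       .reject { |sentence| sentence.empty? }
end

# Returns the review sentences that occur verbatim in the submission.
def copied_sentences(review_text, subm_text)
  submission = split_sentences(subm_text)
  split_sentences(review_text).select { |sentence| submission.include?(sentence) }
end

review = ['The sweet potatoes are green with mold. I liked the design.']
submission = ['The sweet potatoes are green with mold. They are fresh.']
copied = copied_sentences(review, submission)
# copied is ['the sweet potatoes are green with mold']
```

A real checker would also need to tolerate properly quoted passages and partial overlaps, which this sketch ignores.&lt;br /&gt;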
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses, which makes the method confusing. The refactoring needs to split these two functions apart so that this design smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods, each with a single responsibility:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As noted above, the method compare_reviews_with_questions_responses has to be split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
The next step is to extract code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully or partly copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
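&lt;br /&gt;
The extracted method can be exercised on its own. The sketch below restates its logic with a guard clause, using placeholder constant values and an OpenStruct standing in for the metareview object. Both are assumptions made only for this example:&lt;br /&gt;

```ruby
require 'ostruct'

# Placeholder values (assumption); the project defines its own constants.
ALL_RESPONSES_PLAGIARISED = :all_responses_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_responses_plagiarised

# Same decision logic as the extracted method, written with a guard clause.
# Assumes count_copies is never negative.
def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  return nil if count_copies.zero? # nothing copied: leave review_array untouched
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

metareview = OpenStruct.new(review_array: ['first response', 'second response'])
state = check_plagiarism_state(metareview, 1, ['first response'], [0.9, 0.1])
# state is SOME_RESPONSES_PLAGIARISED and metareview.review_array is ['first response']
```

Note that the method returns nil when nothing was copied, so callers have to handle that case explicitly.&lt;br /&gt;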
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines are concerned only with skipping blank tokens and picking the next segment to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
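&lt;br /&gt;
One detail is worth knowing when reading skip_empty_array: in Ruby, String#split with a single-space argument splits on runs of whitespace and ignores leading whitespace, so the array built from the review should not contain blank tokens in the first place. A short check, using a made-up review string:&lt;br /&gt;

```ruby
# Made-up input with leading, trailing and repeated spaces.
review = '  The  sweet   potatoes are fresh. '
array = review.split(' ')
# array is ['The', 'sweet', 'potatoes', 'are', 'fresh.']

# Collect any tokens that are empty or whitespace only.
blank_tokens = array.select { |token| token.strip.empty? }
# blank_tokens is []
```

Under that behavior, the blank-token branch is a defensive check rather than a common path.&lt;br /&gt;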
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush through your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up mysql and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your mysql root password and mysql port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81079</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81079"/>
		<updated>2013-10-30T18:53:08Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable as a submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are part of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
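&lt;br /&gt;
A minimal sketch of that idea follows, in which each state object decides what comes next for a given token type. The class names and symbols are hypothetical, the classes are duck-typed rather than real subclasses of SentenceState, and the negative-phrase type and the handling of a negative word following another negative word are left out for brevity:&lt;br /&gt;

```ruby
# Hypothetical sketch: class names and token-type symbols are illustrative only.
# Each state object answers one question: given the next token type, what follows?
class PositiveState
  def next_state(token_type, _interim_noun_verb)
    case token_type
    when :negative_word then NegativeWordState.new
    when :negative_descriptor then NegativeDescriptorState.new
    when :suggestive then SuggestiveState.new
    else self
    end
  end
end

class NegativeWordState
  def next_state(token_type, interim_noun_verb)
    case token_type
    when :negative_descriptor then PositiveState.new # e.g. 'not bad'
    when :suggestive
      interim_noun_verb ? self : SuggestiveState.new # e.g. 'I do not suggest'
    else self
    end
  end
end

class NegativeDescriptorState
  def next_state(token_type, interim_noun_verb)
    case token_type
    when :negative_word
      interim_noun_verb ? NegativeWordState.new : PositiveState.new
    when :negative_descriptor
      interim_noun_verb ? self : PositiveState.new
    when :suggestive then SuggestiveState.new # e.g. 'I hardly suggested'
    else self
    end
  end
end

class SuggestiveState
  def next_state(token_type, _interim_noun_verb)
    token_type == :negative_descriptor ? NegativeDescriptorState.new : self
  end
end

# Folds a list of token types into a final state, assuming no interim nouns or verbs.
def final_state(token_types)
  token_types.reduce(PositiveState.new) { |state, type| state.next_state(type, false) }
end

final_state([:negative_word, :negative_descriptor]).class # PositiveState ('not bad')
```

With this shape, adding a new state means adding one class instead of another branch in a nested if-else chain.&lt;br /&gt;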
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for or if-else statements into their own methods.&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
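&lt;br /&gt;
The type-checking chain at the top of the second loop is a natural candidate for extraction. A minimal sketch of such a helper follows. The name get_token_type, the symbol return values, and the tiny word lists are assumptions for illustration; the real method relies on is_negative_word, is_negative_descriptor, and the related checks, and also advances the loop index after a two-word match, which is omitted here:&lt;br /&gt;

```ruby
# Tiny illustrative word lists (assumption); the project uses its own, larger lists.
NEGATIVE_WORDS = %w[not no never none]
NEGATIVE_DESCRIPTORS = %w[hardly barely]
NEGATIVE_PHRASES = ['too short', 'taken from']
SUGGESTIVE_WORDS = %w[suggest might could]

# Classifies the first token of remaining_tokens, looking one token ahead
# for two-word negative phrases. Checks run in the same order as the original chain.
def get_token_type(remaining_tokens)
  word = remaining_tokens[0].to_s.downcase
  pair = remaining_tokens[0, 2].join(' ').downcase
  if NEGATIVE_WORDS.include?(word) then :negative_word
  elsif NEGATIVE_DESCRIPTORS.include?(word) then :negative_descriptor
  elsif NEGATIVE_PHRASES.include?(pair) then :negative_phrase
  elsif SUGGESTIVE_WORDS.include?(word) then :suggestive
  else :positive
  end
end

get_token_type(%w[too short to judge]) # :negative_phrase
```

Passing the remaining tokens instead of a single word lets the helper handle two-word phrases without knowing the loop index.&lt;br /&gt;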
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism to check for:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources (this may be detected through a Google search)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses, which makes the method confusing. The refactoring needs to split these two functions apart so that this design smell disappears.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods, each with a single responsibility:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As noted above, the method compare_reviews_with_questions_responses has to be split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
The next step is to extract code that is duplicated across the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully or partly copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extracted this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
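As a minimal sketch of how the extracted helper behaves, the stand-alone example below calls check_plagiarism_state directly. The constant values, the FakeMetareview struct standing in for the metareview object, and the sample inputs are assumptions made for illustration; they are not taken from the Expertiza source.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical constant values; the real ones are defined in the project.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the metareview object; only review_array is needed.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # count_copies is assumed to be non-negative; zero means nothing was copied.
  return nil if count_copies.zero?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

# Both comparison methods could finish with the same call.
metareview = FakeMetareview.new(['original sentence'])
state = check_plagiarism_state(metareview, 1, ['kept sentence'], [1, 0])
puts state.inspect
```
&lt;br /&gt;
When nothing was copied, the helper returns nil and leaves review_array untouched, so a caller can simply continue with the remaining metrics.&lt;br /&gt;
&lt;br /&gt;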
The next thing to do is to extract each long loop or if-else statement into an individual method, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the 1st method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on picking the next token of the array while skipping blank entries before making comparisons. Once we extract the method, the initial code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you make sure that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com) and we will fix it as soon as possible. Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (note: there is no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81076</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81076"/>
		<updated>2013-10-30T18:52:15Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable for submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the course CSC517 (Object Oriented Programming) at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses and then split the clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for or if-else blocks into their own methods.&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
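The design idea from the Design Smells section, where each state knows its own transitions, can be sketched as follows. This is a hedged illustration and not the refactored Expertiza code: the class names, the advance helper, and the symbol values are hypothetical, plain classes with a shared next_state method are used instead of real subclasses of SentenceState, and only three of the states from the method above are covered.&lt;br /&gt;
&lt;br /&gt;
```ruby
# State and token-type names follow the constants used in sentence_state;
# the symbol values are placeholders chosen for this sketch.
POSITIVE = :positive
NEGATIVE_WORD = :negative_word
NEGATIVE_DESCRIPTOR = :negative_descriptor
NEGATIVE_PHRASE = :negative_phrase
SUGGESTIVE = :suggestive

# Hypothetical state classes: each one only knows its own transitions.
class PositiveState
  # Any token type simply becomes the new state of the clause.
  def next_state(token_type, _interim_noun_verb)
    token_type
  end
end

class NegativeDescriptorState
  def next_state(token_type, interim_noun_verb)
    case token_type
    when POSITIVE then NEGATIVE_DESCRIPTOR # a neutral token changes nothing
    when SUGGESTIVE then SUGGESTIVE
    else
      # A second negation cancels the first unless a noun or verb came between.
      interim_noun_verb ? token_type : POSITIVE
    end
  end
end

class SuggestiveState
  def next_state(token_type, _interim_noun_verb)
    # Only negative descriptors and phrases override a suggestion.
    [NEGATIVE_DESCRIPTOR, NEGATIVE_PHRASE].include?(token_type) ? token_type : SUGGESTIVE
  end
end

# Lookup from the current state to its handler; the keys are the symbol
# values assigned to the constants above.
STATE_HANDLERS = {
  positive: PositiveState.new,
  negative_descriptor: NegativeDescriptorState.new,
  suggestive: SuggestiveState.new
}.freeze

def advance(state, token_type, interim_noun_verb)
  STATE_HANDLERS.fetch(state).next_state(token_type, interim_noun_verb)
end

puts advance(NEGATIVE_DESCRIPTOR, NEGATIVE_WORD, false).inspect
```
&lt;br /&gt;
With this shape, the long if-else chain shrinks to one lookup per token, and each transition rule can be tested in isolation.&lt;br /&gt;
&lt;br /&gt;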
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, four kinds of plagiarism need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies what the author says without properly quoting the words and sentences, which constitutes plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
From the above point of view, the refactoring should produce 4 fundamental methods, each of which does only one thing and does it correctly. In the initial file plagiarism_check.rb, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. As part of the refactoring, we need to split the two functions up and make sure such bad smells disappear.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods with different functions.&lt;br /&gt;
&lt;br /&gt;
They are: compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into 2 methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the code shared by the long methods into an individual method that can be called from within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses have a part in common, which checks whether the reviews are copied fully or partly from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next thing to do is to extract each long loop or if-else statement into an individual method, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the 1st method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on picking the next token of the array while skipping blank entries before making comparisons. Once we extract the method, the initial code of compare_reviews_with_submissions changes to:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
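The original loop used next to keep skipping blank entries, whereas skip_empty_array advances at most one position. A possible variant that keeps skipping is sketched below; next_token is a hypothetical name, and treating both empty and whitespace-only strings as blank is an assumption of this sketch rather than something taken from the project.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical variant of skip_empty_array: advance past every blank entry
# and return the new index together with the token found there.
# Assumes rev_len starts within 0..array.length.
def next_token(array, rev_len)
  # strip.empty? treats both '' and ' ' as blank.
  while rev_len != array.length and array[rev_len].strip.empty?
    rev_len += 1
  end
  # The token is nil when the end of the array was reached.
  [rev_len, array[rev_len]]
end

rev_len, rev_phrase = next_token(['good', ' ', '', 'work'], 1)
puts rev_len
puts rev_phrase
```
&lt;br /&gt;
A caller can stop its loop as soon as the returned token is nil, so it never compares against a missing entry.&lt;br /&gt;
&lt;br /&gt;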
Please see the refactored code in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests have passed without failures since the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you make sure that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com) and we will fix it as soon as possible. Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (note: there is no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81071</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81071"/>
		<updated>2013-10-30T18:49:49Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects, and any document type is acceptable for submission&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the course CSC517 (Object Oriented Programming) at North Carolina State University contribute to this project along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project. Expertiza is built using Ruby on Rails with the [http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller MVC design pattern]. plagiarism_check.rb and sentence_state.rb are parts of the automated_metareview functionality inside models. The responsibility of sentence_state.rb is to determine the state of each clause of a sentence, and the responsibility of plagiarism_check.rb is to determine whether the reviews are just copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
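&lt;br /&gt;
The idea of one class per state can be sketched as follows. This is only a minimal illustration under simplifying assumptions: the class names, the symbol token types, and the reduced set of transitions are hypothetical and are not the refactored Expertiza code. The states share a common module here rather than a SentenceState parent class, and handling of double negatives is left out.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch: names and transition rules are simplified assumptions,
# not the actual refactored Expertiza code.
module ClauseState
  # By default a token leaves the state unchanged.
  def next_state(_token_type, _interim_noun_verb)
    self
  end
end

class PositiveState
  include ClauseState

  # A positive clause takes on the state of the token it meets
  # (unknown token types leave it positive).
  def next_state(token_type, _interim_noun_verb)
    STATES.fetch(token_type, self)
  end
end

class NegativeWordState
  include ClauseState

  def next_state(token_type, interim_noun_verb)
    case token_type
    when :negative_descriptor, :negative_phrase
      STATES.fetch(:positive) # e.g. 'not bad'
    when :suggestive
      # words in between keep the negation, otherwise e.g. 'I do not suggest'
      interim_noun_verb ? self : STATES.fetch(:suggestive)
    else
      self # double-negative handling omitted in this sketch
    end
  end
end

class SuggestiveState
  include ClauseState
end

STATES = {
  positive: PositiveState.new,
  negative_word: NegativeWordState.new,
  suggestive: SuggestiveState.new
}.freeze

# Fold a list of token types into the final state of the clause.
def final_state(token_types)
  token_types.reduce(STATES.fetch(:positive)) do |state, type|
    state.next_state(type, false)
  end
end
```

With this shape, each state object answers only for its own transitions, so the caller no longer needs a nested if-else over every pair of previous state and token type.&lt;br /&gt;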
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make the method more readable, I extracted three of these for and if-else blocks into their own methods.&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Four kinds of plagiarism need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb, the first test reads:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's words and sentences without quoting them properly, so the review is flagged as plagiarism.&lt;br /&gt;
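&lt;br /&gt;
To give a feel for this kind of comparison, here is a rough sketch that looks for any run of consecutive review words appearing verbatim in the submission. The helper name copied_phrase? and the window size of 5 are assumptions made for illustration; this is not how check_for_plagiarism is implemented in Expertiza.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper, not the Expertiza implementation.
# Returns true if any run of `window` consecutive review words
# appears verbatim (ignoring case) in the submission.
def copied_phrase?(review, submission, window = 5)
  words = review.downcase.split(' ')
  return false if words.empty?

  # Shorter reviews are compared as a single phrase.
  size = [window, words.length].min
  haystack = submission.downcase
  words.each_cons(size).any? { |phrase| haystack.include?(phrase.join(' ')) }
end
```

Called with the identical review and submission sentences from the test above, a check along these lines would report a match.&lt;br /&gt;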
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does roughly two things: it compares reviews with the review questions, and it compares reviews with others' responses, which is confusing. As part of the refactoring, we split these two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the four checks above, we define four methods, each with a single purpose:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To do this, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share the following code, which checks whether the reviews are fully copied from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract the code into a method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
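&lt;br /&gt;
The three possible outcomes of such a helper can be exercised in isolation, as sketched below. The constant values and the MetareviewStub struct are stand-ins invented for this example (the real values are defined in the Expertiza source), and the method body is a condensed variant of the extracted helper with an early return when nothing was copied.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical stand-ins; the real constant values live in the Expertiza source.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Minimal substitute for the auto_metareview object used by the helper.
MetareviewStub = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # No plagiarism found: leave review_array untouched and return nil.
  return nil if count_copies.zero?

  # Reset review_array only when plagiarism was found.
  auto_metareview.review_array = rev_array
  count_copies == scores.length ? ALL_RESPONSES_PLAGIARISED : SOME_RESPONSES_PLAGIARISED
end

review = MetareviewStub.new(['original'])
check_plagiarism_state(review, 0, ['filtered'], [1, 2]) # nil, review_array unchanged
check_plagiarism_state(review, 1, ['filtered'], [1, 2]) # SOME_RESPONSES_PLAGIARISED
check_plagiarism_state(review, 2, ['filtered'], [1, 2]) # ALL_RESPONSES_PLAGIARISED
```

Note that the helper returns nil when no copies were found, so callers need to treat nil as the case with no plagiarism.&lt;br /&gt;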
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods, so that the original method is not too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank entries and pick the next array segment to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Please see code after refactoring in detail on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The VCL servers let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the MySQL server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After the refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81064</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81064"/>
		<updated>2013-10-30T18:47:58Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. It has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) at North Carolina State University contribute to it along with the teaching assistants and the professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb in the Expertiza project. Expertiza is built with Ruby on Rails and follows the MVC design pattern. Both files are part of the automated_metareview functionality inside the models. sentence_state.rb determines the state of each clause of a sentence, and plagiarism_check.rb determines whether reviews are simply copied from other sources.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements. To make the method more readable, I extracted three of these for and if-else blocks into their own methods.&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
===Main Responsibility ===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Four kinds of plagiarism need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case the reviewer copies the author's sentences word for word without quoting them, so the method reports plagiarism.&lt;br /&gt;
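&lt;br /&gt;
One way to picture such a comparison is the minimal sketch below. It is not the Expertiza implementation: the helper names ngrams and copied? and the window of 5 consecutive words are assumptions chosen only for illustration.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch, not the Expertiza code: a review counts as copied when
# any run of `size` consecutive words also appears in the submission.
def ngrams(text, size)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(size).map { |run| run.join(" ") }
end

def copied?(review_text, subm_text, size = 5)
  subm_runs = ngrams(subm_text, size)
  ngrams(review_text, size).any? { |run| subm_runs.include?(run) }
end

text = "The sweet potatoes in the vegetable bin are green with mold."
puts copied?(text, text)
```
&lt;br /&gt;
A fixed word window keeps the check simple; short common phrases do not trigger it, while a fully copied sentence does.&lt;br /&gt;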
&lt;br /&gt;
===Design Ideas===&lt;br /&gt;
&lt;br /&gt;
Based on the four cases above, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses, which makes it hard to follow. The refactoring splits these two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
We therefore define four methods, one for each kind of check:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As noted above, the method compare_reviews_with_questions_responses is split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into a separate method that the class can call. For example, compare_reviews_with_questions and compare_reviews_with_responses share the part that checks whether some or all of the reviews are copied from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
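&lt;br /&gt;
The extracted helper can be tried on its own. The sketch below repeats its logic with stand-ins so that it runs outside Expertiza: the constant values and the FakeMetareview struct are assumptions made for illustration, not the project's definitions. It also makes explicit that the helper returns nil when nothing was copied.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical stand-ins so the sketch runs by itself; the real constants
# and the metareview object are defined elsewhere in Expertiza.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # nothing copied: leave review_array untouched and report nil
  return nil if count_copies.zero?

  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

metareview = FakeMetareview.new(["original"])
p check_plagiarism_state(metareview, 0, ["trimmed"], [1, 2])
p check_plagiarism_state(metareview, 1, ["trimmed"], [1, 2])
p check_plagiarism_state(metareview, 2, [], [1, 2])
```
&lt;br /&gt;
Callers therefore need to treat a nil result as "no plagiarism found" before comparing it with either constant.&lt;br /&gt;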
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods, so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the phrase to compare. After extracting the method, compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
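&lt;br /&gt;
One detail is worth checking when reading this helper: in Ruby, String#split with a single-space argument splits on runs of whitespace and ignores leading whitespace, so the resulting array does not contain a blank element. The small sketch below illustrates this; review_tokens is a hypothetical helper name used only here, not part of the fork.&lt;br /&gt;
&lt;br /&gt;
```ruby
# review_tokens is a hypothetical helper used only for this illustration.
def review_tokens(review)
  review.split(" ") # a single-space pattern splits on whitespace runs
end

tokens = review_tokens("  The  sweet   potatoes ")
p tokens               # prints ["The", "sweet", "potatoes"]
p tokens.include?(" ") # prints false
```
&lt;br /&gt;
If every caller builds the array this way, the blank check in skip_empty_array is never triggered and the helper simply reads array[rev_len].&lt;br /&gt;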
&lt;br /&gt;
The complete refactored code is available on this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The VCL servers let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses and assignments, so it is easy to verify that reviews work: you only need to create users and have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/ Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files sentence_state.rb and plagiarism_check.rb, which are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81053</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81053"/>
		<updated>2013-10-30T18:45:12Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in the CSC517 Object Oriented Programming course contribute to it along with the teaching assistant and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split the clauses into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
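&lt;br /&gt;
The idea of letting each state decide its own transitions can be sketched as follows. This is only an illustration under stated assumptions: the class names, the symbols for token types and the clause_state helper are hypothetical, the classes use duck typing instead of a shared superclass, and only a few transitions of the original if-else chain are modelled (the interim noun/verb flag and the previous negative word are ignored).&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch of the State pattern; names are illustrative and most
# transitions of the original if-else chain are left out for brevity.
class PositiveState
  def next_state(token_type)
    case token_type
    when :negative_word then NegativeWordState.new
    when :suggestive then SuggestiveState.new
    else self # other token types are not modelled in this sketch
    end
  end
end

class NegativeWordState
  def next_state(token_type)
    case token_type
    when :negative_descriptor, :negative_phrase then PositiveState.new # "not bad"
    when :suggestive then SuggestiveState.new # "I do not suggest"
    else self
    end
  end
end

class SuggestiveState
  def next_state(_token_type)
    self # simplification: the sketch never leaves the suggestive state
  end
end

# Walks the token types of one clause and returns the final state object.
def clause_state(token_types)
  token_types.reduce(PositiveState.new) { |state, type| state.next_state(type) }
end

puts clause_state([:positive, :negative_word, :negative_descriptor]).class
```
&lt;br /&gt;
Each class holds only the rules for its own state, so adding a transition means touching one small method instead of one long conditional.&lt;br /&gt;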
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of these for loops or if-else statements into their own methods.&lt;br /&gt;
&lt;br /&gt;
[[collapsible]]&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
[[/collapsible]]&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
===Main functions===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Four kinds of plagiarism need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case the reviewer copies the author's sentences word for word without quoting them, so the method reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design ideas===&lt;br /&gt;
&lt;br /&gt;
Based on the four cases above, the refactored class should have four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions, and it compares reviews with other reviewers' responses, which makes it hard to follow. The refactoring splits these two functions apart to remove this smell.&lt;br /&gt;
&lt;br /&gt;
We therefore define four methods, one for each kind of check:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As noted above, the method compare_reviews_with_questions_responses is split into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor steps in general===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether all or only some of the reviews are copied from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
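&lt;br /&gt;
As a minimal sketch of how the extracted method behaves, the following standalone Ruby snippet reproduces check_plagiarism_state outside Expertiza. MetareviewStub and the symbol values given to the two constants are assumptions made for this illustration only; the real constants are defined elsewhere in Expertiza and may have different values. The method implicitly returns nil when nothing was copied, which the sketch makes explicit.&lt;br /&gt;

```ruby
# Placeholder values chosen for this sketch; the real constants may differ.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Hypothetical stand-in for the metareview object; only review_array is needed.
MetareviewStub = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  # Nothing copied: leave review_array untouched and report no state.
  return nil if count_copies.zero?

  # Plagiarism was found, so review_array is reset.
  auto_metareview.review_array = rev_array
  if count_copies == scores.length
    ALL_RESPONSES_PLAGIARISED
  else
    SOME_RESPONSES_PLAGIARISED
  end
end

review = MetareviewStub.new(["kept"])
none = check_plagiarism_state(review, 0, ["reset"], [1, 2])
kept = review.review_array
some = check_plagiarism_state(review, 1, ["reset"], [1, 2])
all  = check_plagiarism_state(review, 2, ["reset"], [1, 2])
```

Both compare_reviews_with_questions and compare_reviews_with_responses can then end with a single call to this method instead of repeating the two if statements.&lt;br /&gt;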
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the next token of the review to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
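&lt;br /&gt;
One detail worth checking in an extraction like this: the original lines used next to restart the loop after skipping a blank token, and a helper method cannot do that on behalf of its caller, so consecutive blank tokens or a blank token at the end of the array may be handled differently. A minimal sketch of one alternative is given here, assuming a hypothetical helper named next_token_index that is not part of Expertiza. It returns the position of the next non-blank token so that the caller's loop never sees a blank one.&lt;br /&gt;

```ruby
# Hypothetical helper (not part of Expertiza): returns the index of the
# first non-blank token at or after start, or array.length if none is left.
def next_token_index(array, start)
  offset = array.drop(start).index { |token| !token.strip.empty? }
  offset.nil? ? array.length : start + offset
end

# Usage: walk over the tokens while skipping any run of blank entries.
tokens = ["The", " ", " ", "sweet", "potatoes", " "]
phrases = []
rev_len = next_token_index(tokens, 0)
while rev_len != tokens.length
  phrases.push(tokens[rev_len])
  rev_len = next_token_index(tokens, rev_len + 1)
end
```

With this shape the loop condition alone decides when to stop, and the phrase taken from the array is never a blank token or nil.&lt;br /&gt;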
&lt;br /&gt;
For the complete refactored code, see this [https://github.com/shanfangshuiyuan/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb page].&lt;br /&gt;
&lt;br /&gt;
All tests pass without failures after the refactoring.&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your MySQL root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81046</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81046"/>
		<updated>2013-10-30T18:43:15Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
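&lt;br /&gt;
The idea of letting each state know its own transitions can be sketched as follows. This is a simplified illustration and not the refactored Expertiza code: the class names (ClauseState and its subclasses), the symbol token types, and the clause_state driver are all assumptions made for the example. Only a subset of the transitions from the nested if-else statement is covered, and the refinement based on the previous negative word is left out.&lt;br /&gt;

```ruby
# Hypothetical sketch: every state object decides its own successor.
class ClauseState
  def negative?
    false
  end

  # token_type is a symbol; interim is true when a noun or verb
  # appeared since the clause became negative.
  def next_state(_token_type, _interim)
    self
  end
end

# Class.new(ClauseState) defines a subclass of ClauseState.
PositiveState = Class.new(ClauseState) do
  def next_state(token_type, _interim)
    case token_type
    when :negative_word then NegativeWordState.new
    when :negative_descriptor then NegativeDescriptorState.new
    when :suggestive then SuggestiveState.new
    else self
    end
  end
end

NegativeWordState = Class.new(ClauseState) do
  def negative?
    true
  end

  def next_state(token_type, interim)
    case token_type
    when :negative_word, :negative_descriptor then PositiveState.new # e.g. "not bad"
    when :suggestive then interim ? self : SuggestiveState.new
    else self
    end
  end
end

NegativeDescriptorState = Class.new(ClauseState) do
  def negative?
    true
  end

  def next_state(token_type, interim)
    case token_type
    when :negative_word then interim ? NegativeWordState.new : PositiveState.new
    when :negative_descriptor then interim ? self : PositiveState.new
    when :suggestive then SuggestiveState.new
    else self
    end
  end
end

SuggestiveState = Class.new(ClauseState) do
  def next_state(token_type, _interim)
    token_type == :negative_descriptor ? NegativeDescriptorState.new : self
  end
end

# Driver: feeds token types to the current state, one at a time.
def clause_state(token_types)
  state = PositiveState.new
  interim = false
  token_types.each do |type|
    if type == :noun_or_verb
      interim = true if state.negative?
    elsif type != :positive
      state = state.next_state(type, interim)
      interim = false
    end
  end
  state
end
```

For instance, clause_state([:negative_word, :negative_descriptor]) ends in a PositiveState, mirroring the "not bad" case in the original comments. The driver no longer needs to know any transition rule, so adding a new state means adding one class rather than another branch in every if-else chain.&lt;br /&gt;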
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for loops or if-else statements, each into its own method.&lt;br /&gt;
&lt;br /&gt;
[[collapsible show=&amp;quot;+ show me the hidden content&amp;quot; hide=&amp;quot;- hide this content&amp;quot;]]&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
[[/collapsible]]&lt;br /&gt;
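&lt;br /&gt;
The token-fetching loop at the top of the method above is one candidate for extraction. The following is only a rough sketch of what such an extracted method could look like; the name fetch_tokens is hypothetical, and the punctuation handling is simplified (the original loop keeps only the text before the first period, while this version removes every listed punctuation character).&lt;br /&gt;

```ruby
# Hypothetical extraction of the token-fetching loop (simplified).
# Input: a string of space-separated word/TAG pairs.
# Output: the bare words with the listed punctuation characters removed.
def fetch_tokens(str_with_pos_tags)
  str_with_pos_tags.split(" ").map do |tagged|
    word = tagged.split("/").first.to_s # drop the part-of-speech tag
    word.delete(".,!;")                 # drop the punctuation handled by the loop
  end
end

tokens = fetch_tokens("The/DT code/NN is/VBZ not/RB bad./JJ")
```

Once the tokens come from a separate method, sentence_state only has to iterate over them to determine the state.&lt;br /&gt;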
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
===Main functions===&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, there are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the method reports plagiarism. &lt;br /&gt;
&lt;br /&gt;
===Design ideas===&lt;br /&gt;
&lt;br /&gt;
Based on the above, plagiarism_check.rb needs four fundamental methods, each doing exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method has two responsibilities: comparing reviews with the review questions and comparing reviews with other reviewers’ responses, which makes the method confusing. The refactoring splits these two responsibilities apart to remove this design smell.&lt;br /&gt;
&lt;br /&gt;
We therefore define four methods, each with a single responsibility: &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_questions, &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_responses, and&lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To get there, we split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor steps in general===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into a separate method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether all or only some of the reviews are copied from the questions/responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into separate methods so that the original method does not become too long or confusing for other readers.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines only skip blank tokens and pick the next token of the review to compare. After extracting the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
     rev_len+=1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your MySQL root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81040</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81040"/>
		<updated>2013-10-30T18:41:03Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Refactor Steps */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
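&lt;br /&gt;
As a minimal sketch of that idea (not the project's actual classes), each state object below answers next_state for a token type. Symbols stand in for the POSITIVE, SUGGESTIVE and NEGATIVE_* constants, every class and method name is hypothetical, and plain duck-typed classes are used instead of subclasses to keep the example short. Only the transitions out of the positive and suggestive states are modeled; the negative states are placeholders that never change.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical sketch: symbols stand in for the POSITIVE, SUGGESTIVE and
# NEGATIVE_* constants; class and method names are illustrative only.
class PositiveState
  def name
    :positive
  end

  # From a positive state, the token type itself decides the next state.
  def next_state(token_type)
    STATES.fetch(token_type)
  end
end

class SuggestiveState
  def name
    :suggestive
  end

  # Only a negative descriptor or phrase overrides a suggestion;
  # e.g. "I suggest you don't ..." stays suggestive.
  def next_state(token_type)
    case token_type
    when :negative_descriptor, :negative_phrase then STATES.fetch(token_type)
    else self
    end
  end
end

# Placeholder for the negative states; their real transitions are omitted.
class PlaceholderState
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def next_state(_token_type)
    self
  end
end

STATES = {
  positive: PositiveState.new,
  suggestive: SuggestiveState.new,
  negative_word: PlaceholderState.new(:negative_word),
  negative_descriptor: PlaceholderState.new(:negative_descriptor),
  negative_phrase: PlaceholderState.new(:negative_phrase)
}.freeze

# Folds a list of token types into the final state name of a clause.
def clause_state(token_types)
  final = token_types.reduce(STATES.fetch(:positive)) do |state, type|
    state.next_state(type)
  end
  final.name
end
```
&lt;br /&gt;
With this shape, the caller no longer needs a nested if-else: it asks the current state for the next one, and each state holds only its own rules.&lt;br /&gt;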
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for loops and if-else statements into their own methods.&lt;br /&gt;
&lt;br /&gt;
 def sentence_state(str_with_pos_tags)&lt;br /&gt;
    state = POSITIVE&lt;br /&gt;
    #checking single tokens for negated words&lt;br /&gt;
    st = str_with_pos_tags.split(&amp;quot; &amp;quot;)&lt;br /&gt;
    count = st.length&lt;br /&gt;
    tokens = Array.new&lt;br /&gt;
    tagged_tokens = Array.new&lt;br /&gt;
    i = 0&lt;br /&gt;
    interim_noun_verb  = false #0 indicates no interim nouns or verbs&lt;br /&gt;
        &lt;br /&gt;
    #fetching all the tokens&lt;br /&gt;
    for k in (0..st.length-1)&lt;br /&gt;
      ps = st[k]&lt;br /&gt;
      #setting the tagged string&lt;br /&gt;
      tagged_tokens[i] = ps&lt;br /&gt;
      if(ps.include?(&amp;quot;/&amp;quot;))&lt;br /&gt;
        ps = ps[0..ps.index(&amp;quot;/&amp;quot;)-1] &lt;br /&gt;
      end&lt;br /&gt;
      #removing punctuations &lt;br /&gt;
      if(ps.include?(&amp;quot;.&amp;quot;))&lt;br /&gt;
        tokens[i] = ps[0..ps.index(&amp;quot;.&amp;quot;)-1]&lt;br /&gt;
      elsif(ps.include?(&amp;quot;,&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;!&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;!&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      elsif(ps.include?(&amp;quot;;&amp;quot;))&lt;br /&gt;
        tokens[i] = ps.gsub(&amp;quot;;&amp;quot;, &amp;quot;&amp;quot;)&lt;br /&gt;
      else&lt;br /&gt;
        tokens[i] = ps&lt;br /&gt;
        i+=1&lt;br /&gt;
      end     &lt;br /&gt;
    end#end of the for loop&lt;br /&gt;
    &lt;br /&gt;
    #iterating through the tokens to determine state&lt;br /&gt;
    prev_negative_word =&amp;quot;&amp;quot;&lt;br /&gt;
    for j  in (0..i-1)&lt;br /&gt;
      #checking type of the word&lt;br /&gt;
      #checking for negated words&lt;br /&gt;
      if(is_negative_word(tokens[j]) == NEGATED)  &lt;br /&gt;
        returned_type = NEGATIVE_WORD&lt;br /&gt;
      #checking for a negative descriptor (indirect indicators of negation)&lt;br /&gt;
      elsif(is_negative_descriptor(tokens[j]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_DESCRIPTOR&lt;br /&gt;
      #2-gram phrases of negative phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp; &lt;br /&gt;
        is_negative_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == NEGATED)&lt;br /&gt;
        returned_type = NEGATIVE_PHRASE&lt;br /&gt;
        j = j+1      &lt;br /&gt;
      #if suggestion word is found&lt;br /&gt;
      elsif(is_suggestive(tokens[j]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
      #2-gram phrases suggestion phrases&lt;br /&gt;
      elsif(j+1 &amp;lt; count &amp;amp;&amp;amp; !tokens[j].nil? &amp;amp;&amp;amp; !tokens[j+1].nil? &amp;amp;&amp;amp;&lt;br /&gt;
         is_suggestive_phrase(tokens[j]+&amp;quot; &amp;quot;+tokens[j+1]) == SUGGESTIVE)&lt;br /&gt;
        returned_type = SUGGESTIVE&lt;br /&gt;
        j = j+1&lt;br /&gt;
      #else set to positive&lt;br /&gt;
      else&lt;br /&gt;
        returned_type = POSITIVE&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #----------------------------------------------------------------------&lt;br /&gt;
      #comparing 'returnedType' with the existing STATE of the sentence clause&lt;br /&gt;
      #after returnedType is identified, check its state and compare it to the existing state&lt;br /&gt;
      #if present state is negative and an interim non-negative or non-suggestive word was found, set the flag to true&lt;br /&gt;
      if((state == NEGATIVE_WORD or state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_PHRASE) and returned_type == POSITIVE)&lt;br /&gt;
        if(interim_noun_verb == false and (tagged_tokens[j].include?(&amp;quot;NN&amp;quot;) or tagged_tokens[j].include?(&amp;quot;PR&amp;quot;) or tagged_tokens[j].include?(&amp;quot;VB&amp;quot;) or tagged_tokens[j].include?(&amp;quot;MD&amp;quot;)))&lt;br /&gt;
          interim_noun_verb = true&lt;br /&gt;
        end&lt;br /&gt;
      end &lt;br /&gt;
      &lt;br /&gt;
      if(state == POSITIVE and returned_type != POSITIVE)&lt;br /&gt;
        state = returned_type&lt;br /&gt;
      #when state is a negative word&lt;br /&gt;
      elsif(state == NEGATIVE_WORD) #previous state&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          #these words embellish the negation, so only if the previous word was not one of them you make it positive&lt;br /&gt;
          if(prev_negative_word.casecmp(&amp;quot;NO&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NEVER&amp;quot;) != 0 and prev_negative_word.casecmp(&amp;quot;NONE&amp;quot;) != 0)&lt;br /&gt;
            state = POSITIVE #e.g: &amp;quot;not had no work..&amp;quot;, &amp;quot;doesn't have no work..&amp;quot;, &amp;quot;its not that it doesn't bother me...&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;no it doesn't help&amp;quot;, &amp;quot;no there is no use for ...&amp;quot;&lt;br /&gt;
          end  &lt;br /&gt;
          interim_noun_verb = false #resetting         &lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR or returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = POSITIVE #e.g.: &amp;quot;not bad&amp;quot;, &amp;quot;not taken from&amp;quot;, &amp;quot;I don't want nothing&amp;quot;, &amp;quot;no code duplication&amp;quot;// [&amp;quot;It couldn't be more confusing..&amp;quot;- anomaly we dont handle this for now!]&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          #e.g. &amp;quot; it is not too useful as people could...&amp;quot;, what about this one?&lt;br /&gt;
          if(interim_noun_verb == true) #there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD&lt;br /&gt;
          else&lt;br /&gt;
            state = SUGGESTIVE #e.g.:&amp;quot;I do not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative descriptor&lt;br /&gt;
      elsif(state == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g: &amp;quot;hard(-) to understand none(-) of the comments&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;He hardly not....&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_DESCRIPTOR #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is hardly confusing..&amp;quot;, but what about &amp;quot;it is a little confusing..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_PHRASE #e.g:&amp;quot;there is barely any code duplication&amp;quot;&lt;br /&gt;
          else &lt;br /&gt;
            state = POSITIVE #e.g.:&amp;quot;it is hard and appears to be taken from&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I hardly(-) suggested(S) ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is a negative phrase&lt;br /&gt;
      elsif(state == NEGATIVE_PHRASE)&lt;br /&gt;
        if(returned_type == NEGATIVE_WORD)&lt;br /&gt;
          if(interim_noun_verb == true)#there are some words in between&lt;br /&gt;
            state = NEGATIVE_WORD #e.g.&amp;quot;It is too short the text and doesn't&amp;quot;&lt;br /&gt;
          else&lt;br /&gt;
            state = POSITIVE #e.g.&amp;quot;It is too short not to contain..&amp;quot;&lt;br /&gt;
          end&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR #e.g.&amp;quot;It is too short barely covering...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE #e.g.:&amp;quot;it is too short, taken from ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        elsif(returned_type == SUGGESTIVE)&lt;br /&gt;
          state = SUGGESTIVE #e.g.:&amp;quot;I too short and I suggest ...&amp;quot;&lt;br /&gt;
          interim_noun_verb = false #resetting&lt;br /&gt;
        end&lt;br /&gt;
      #when state is suggestive&lt;br /&gt;
      elsif(state == SUGGESTIVE) #e.g.:&amp;quot;I might(S) not(-) suggest(S) ...&amp;quot;&lt;br /&gt;
        if(returned_type == NEGATIVE_DESCRIPTOR)&lt;br /&gt;
          state = NEGATIVE_DESCRIPTOR&lt;br /&gt;
        elsif(returned_type == NEGATIVE_PHRASE)&lt;br /&gt;
          state = NEGATIVE_PHRASE&lt;br /&gt;
        end&lt;br /&gt;
        #e.g.:&amp;quot;I suggest you don't..&amp;quot; -&amp;gt; suggestive&lt;br /&gt;
        interim_noun_verb = false #resetting&lt;br /&gt;
      end&lt;br /&gt;
      &lt;br /&gt;
      #setting the prevNegativeWord&lt;br /&gt;
      if(tokens[j].casecmp(&amp;quot;NO&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NEVER&amp;quot;) == 0 or tokens[j].casecmp(&amp;quot;NONE&amp;quot;) == 0)&lt;br /&gt;
        prev_negative_word = tokens[j]&lt;br /&gt;
      end  &lt;br /&gt;
          &lt;br /&gt;
    end #end of for loop&lt;br /&gt;
    &lt;br /&gt;
    if(state == NEGATIVE_DESCRIPTOR or state == NEGATIVE_WORD or state == NEGATIVE_PHRASE)&lt;br /&gt;
      state = NEGATED&lt;br /&gt;
    end&lt;br /&gt;
    &lt;br /&gt;
    return state&lt;br /&gt;
 end&lt;br /&gt;
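&lt;br /&gt;
One way such an extraction could look for the token-fetching loop is sketched here. The helper names strip_pos_tag, clean_token and tokenize are hypothetical and are not taken from the refactored file. This is also a simplified variant: it removes every listed punctuation mark instead of following the original elsif chain exactly.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helpers; names are illustrative, not from the refactored file.

# "good/JJ" becomes "good"; a token without a tag passes through unchanged.
def strip_pos_tag(tagged_token)
  tagged_token.split("/").first.to_s
end

# Cuts the word at the first period, then drops commas, exclamation
# marks and semicolons (a simplification of the original elsif chain).
def clean_token(tagged_token)
  strip_pos_tag(tagged_token).sub(/\..*/, "").delete(",!;")
end

# Returns the cleaned tokens together with the untouched tagged tokens,
# so that both arrays share the same indices.
def tokenize(str_with_pos_tags)
  tagged_tokens = str_with_pos_tags.split(" ")
  tokens = tagged_tokens.map { |tagged| clean_token(tagged) }
  [tokens, tagged_tokens]
end
```
&lt;br /&gt;
Because both arrays are built from the same split, no separate index counter is needed to keep them aligned.&lt;br /&gt;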
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
===Main functions===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the check reports plagiarism.&lt;br /&gt;
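&lt;br /&gt;
To illustrate the kind of comparison involved, here is a rough sketch of a phrase-overlap check. It is not the project's check_for_plagiarism: the method name shares_phrase? and the window size n are assumptions made for illustration, and quoted text is not treated specially.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper: true when any run of n consecutive review words
# also appears, in the same order, in the submission.
def shares_phrase?(review, submission, n)
  submission_phrases = submission.downcase.split(" ").each_cons(n).map do |words|
    words.join(" ")
  end
  review.downcase.split(" ").each_cons(n).any? do |words|
    submission_phrases.include?(words.join(" "))
  end
end
```
&lt;br /&gt;
A review shorter than n words yields no windows, so the sketch reports no overlap in that case.&lt;br /&gt;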
&lt;br /&gt;
===Design ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers' responses, which is confusing. The refactoring needs to split these two functions apart to remove this design smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor steps in general===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
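&lt;br /&gt;
A small usage sketch of the extracted method follows. The constant values and the FakeMetareview struct are placeholders invented for this example; the project defines its own constant values, and auto_metareview is the real metareview object there.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Placeholder values, assumed for this example only.
ALL_RESPONSES_PLAGIARISED = :all_plagiarised
SOME_RESPONSES_PLAGIARISED = :some_plagiarised

# Stand-in for the metareview object; only review_array is needed here.
FakeMetareview = Struct.new(:review_array)

def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)
  if count_copies > 0 # resetting review_array only when plagiarism was found
    auto_metareview.review_array = rev_array
    if count_copies == scores.length
      return ALL_RESPONSES_PLAGIARISED # plagiarism, with all other metrics 0
    else
      return SOME_RESPONSES_PLAGIARISED # plagiarism, while evaluating other metrics
    end
  end
end

# Both responses were copied, so every score counts as plagiarised.
all_copied = check_plagiarism_state(FakeMetareview.new([]), 2, ["a", "b"], [1, 1])

# No copies: the method falls through and returns nil.
none_copied = check_plagiarism_state(FakeMetareview.new([]), 0, [], [1, 1])
```
&lt;br /&gt;
Callers therefore need to handle a nil result when no plagiarism is found, just as they did with the duplicated code.&lt;br /&gt;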
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array to produce the phrase to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
  if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
    rev_len+=1&lt;br /&gt;
  end&lt;br /&gt;
  #generating the sentence segment you'd like to compare&lt;br /&gt;
  rev_phrase = array[rev_len]&lt;br /&gt;
  return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
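&lt;br /&gt;
As a hedged aside on the blank-entry check: in Ruby, calling split with a single-space argument treats runs of whitespace as one separator and ignores leading whitespace, so the resulting array contains no blank entries. The sketch below shows this, along with a hypothetical alternative (non_blank_tokens, not part of the project) that filters blanks once for arrays built some other way.&lt;br /&gt;
&lt;br /&gt;
```ruby
# split(" ") collapses runs of whitespace, so no returned token is blank.
def whitespace_tokens(text)
  text.split(" ")
end

# Hypothetical alternative to skipping by index: drop blank entries once,
# then iterate over the result without manual index bookkeeping.
def non_blank_tokens(array)
  array.reject { |token| token.strip.empty? }
end
```
&lt;br /&gt;
Filtering up front keeps the comparison loop free of skip logic, at the cost of building one extra array.&lt;br /&gt;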
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that the functionality of Expertiza works. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81037</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81037"/>
		<updated>2013-10-30T18:40:03Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work&amp;lt;ref&amp;gt; [https://github.com/expertiza/expertiza Expertiza github]&amp;lt;/ref&amp;gt;. Expertiza also supports team projects and accepts submissions of any document type&amp;lt;ref&amp;gt; [http://wikis.lib.ncsu.edu/index.php/Expertiza Expertiza wiki]&amp;lt;/ref&amp;gt;. Expertiza has been deployed for years to help professors and students engage in the learning process. Expertiza is an open source project; each year, students in CSC517 (Object Oriented Programming) contribute to it along with the teaching assistants and professor.&lt;br /&gt;
&lt;br /&gt;
This year, we are responsible for refactoring plagiarism_check.rb and sentence_state.rb of the Expertiza project.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities: it first had to parse the sentence into separate clauses, then split each clause into tokens, and finally iterate through the tokens to determine the state of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause from the previous state and the next sentence token. Instead of making the SentenceState class responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
===Refactor Steps===&lt;br /&gt;
The first step in refactoring is to get the tests to pass. This required some debugging to find that some constants were defined in two different files, and one of the definitions was incomplete. After updating this, all 18 original tests in sentence_state_test.rb passed.&lt;br /&gt;
&lt;br /&gt;
The first place to refactor was the longest method in SentenceState, sentence_state(str_with_pos_tags). It contained three for loops, each with deeply nested if-else statements inside. To make this method more readable, I extracted three of the for loops and if-else statements into their own methods.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Main functions===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences verbatim without quoting them, so the check reports plagiarism.&lt;br /&gt;
&lt;br /&gt;
===Design ideas===&lt;br /&gt;
&lt;br /&gt;
Given these four kinds of plagiarism, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with other reviewers' responses, which is confusing. The refactoring needs to split these two functions apart to remove this design smell.&lt;br /&gt;
&lt;br /&gt;
First, based on the above, we define four methods, each with its own specific function:&lt;br /&gt;
&lt;br /&gt;
1. compare_reviews_with_submissions&lt;br /&gt;
&lt;br /&gt;
2. compare_reviews_with_questions&lt;br /&gt;
&lt;br /&gt;
3. compare_reviews_with_responses&lt;br /&gt;
&lt;br /&gt;
4. compare_reviews_with_google_search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
===Refactor Step===&lt;br /&gt;
&lt;br /&gt;
Next, we extract code that is duplicated across the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are fully copied from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To remove this duplication, we extract this part into a method that checks the state of plagiarism:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops or if-else statements into individual methods, so that the initial method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Taking the first method, compare_reviews_with_submissions, as an example, we noticed that this part:&lt;br /&gt;
&lt;br /&gt;
 if(array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len+=1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank entries in the array to produce the phrase to compare. Once the method is extracted, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if array[rev_len] == &amp;quot; &amp;quot; #skipping empty&lt;br /&gt;
     rev_len += 1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
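Note that the original fragment used next to re-enter the loop, so consecutive blank tokens were all skipped, whereas the extracted skip_empty_array above advances at most once. A minimal sketch of a helper that keeps skipping until it reaches a non-blank token might look like the following; the name next_non_blank_token is hypothetical and not part of the project.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical helper, for illustration only.
# Advances index past blank tokens and returns the new index together with
# the token found there (nil when the end of the array is reached).
def next_non_blank_token(tokens, index)
  index += 1 until tokens[index].nil? || !tokens[index].strip.empty?
  [index, tokens[index]]
end

tokens = ['sweet', ' ', '', 'potatoes']
index, phrase = next_non_blank_token(tokens, 1)
puts "index=#{index} phrase=#{phrase.inspect}"
```
&lt;br /&gt;
Because the helper may return nil as the token, the caller would still need to check the result before comparing it.&lt;br /&gt;
&lt;br /&gt;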
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81009</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=81009"/>
		<updated>2013-10-30T18:33:28Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Design Smells */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work. Expertiza also supports team projects, and any document type is accepted as a submission. Expertiza has been deployed for years to help professors and students engage in the learning process.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply nested if-else statements and duplicated code. Another design smell was that SentenceState had too many responsibilities. It first had to parse the sentence into separate clauses, then split the clauses into tokens, and only then iterate through the tokens to determine the state of the sentence. The main responsibility of SentenceState should be just to determine the state of a sentence, and another class should be in charge of knowing the parts of the sentence. The worst problem was a deeply nested if-else statement that determined the next state of the sentence clause based on the previous state and the next sentence token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState that each know the relationship between their own state and any sentence token type.&lt;br /&gt;
&lt;br /&gt;
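The idea of letting each state decide its own successor can be sketched as follows. This is only an illustration under stated assumptions: the class names, the token types (:negation, :word) and the two-state rule are hypothetical simplifications, not the classes used in the refactored sentence_state.rb.&lt;br /&gt;
&lt;br /&gt;
```ruby
# Hypothetical names throughout; the real classes and token types are not shown here.
# Base state: each concrete state decides the next state for a given token type.
class ClauseState
  def next_state(_token_type)
    raise NotImplementedError, 'subclasses must implement next_state'
  end
end

# Class.new(ClauseState) creates a subclass of ClauseState.
PositiveState = Class.new(ClauseState) do
  def next_state(token_type)
    token_type == :negation ? NegativeState.new : self
  end
end

NegativeState = Class.new(ClauseState) do
  def next_state(token_type)
    # Simplified rule: a second negation flips the clause back to positive.
    token_type == :negation ? PositiveState.new : self
  end
end

# Fold the token types of one clause through the state objects.
def clause_state(token_types)
  token_types.inject(PositiveState.new) { |state, type| state.next_state(type) }
end

puts clause_state([:word, :negation, :word]).class
```
&lt;br /&gt;
With this shape, adding a new state means adding one class with its own next_state method rather than extending a nested if-else.&lt;br /&gt;
&lt;br /&gt;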
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not quote the author's words and sentences properly; the reviewer simply copies what the author says, which constitutes plagiarism. &lt;br /&gt;
&lt;br /&gt;
From this point of view, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb file, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. During the refactoring, we need to split these two functions up so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are compare_reviews_with_submissions, compare_reviews_with_questions, &lt;br /&gt;
compare_reviews_with_responses, and compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The next step is to extract long loops and if-else statements into individual methods, so that the original method does not become too long or confusing for others.&lt;br /&gt;
&lt;br /&gt;
Take the first method, compare_reviews_with_submissions, as an example. We noticed that this part: &lt;br /&gt;
&lt;br /&gt;
 if (array[rev_len] == &amp;quot; &amp;quot;) #skipping empty&lt;br /&gt;
   rev_len += 1&lt;br /&gt;
   next&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 #generating the sentence segment you'd like to compare&lt;br /&gt;
 rev_phrase = array[rev_len]&lt;br /&gt;
&lt;br /&gt;
can be extracted into a new method, skip_empty_array, since these lines focus on skipping blank tokens in the array before making comparisons. Once we extract the method, the code of compare_reviews_with_submissions becomes:&lt;br /&gt;
&lt;br /&gt;
expertiza/app/models/automated_metareview/plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
 …….&lt;br /&gt;
 review_text.each do |review_arr| #iterating through the review's sentences&lt;br /&gt;
    review = review_arr.to_s&lt;br /&gt;
    subm_text.each do |subm_arr|&lt;br /&gt;
      #iterating though the submission's sentences&lt;br /&gt;
      submission = subm_arr.to_s&lt;br /&gt;
      rev_len = 0&lt;br /&gt;
      #review's tokens, taking 'n' at a time&lt;br /&gt;
      array = review.split(&amp;quot; &amp;quot;)&lt;br /&gt;
      while(rev_len &amp;lt; array.length) do&lt;br /&gt;
        rev_len, rev_phrase = skip_empty_array(array, rev_len)&lt;br /&gt;
      ...&lt;br /&gt;
 def skip_empty_array(array, rev_len)&lt;br /&gt;
   if array[rev_len] == &amp;quot; &amp;quot; #skipping empty&lt;br /&gt;
     rev_len += 1&lt;br /&gt;
   end&lt;br /&gt;
   #generating the sentence segment you'd like to compare&lt;br /&gt;
   rev_phrase = array[rev_len]&lt;br /&gt;
   return rev_len, rev_phrase&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
*https://github.com/expertiza/expertiza&lt;br /&gt;
*http://wikis.lib.ncsu.edu/index.php/Expertiza&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80995</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80995"/>
		<updated>2013-10-30T18:26:18Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
Expertiza is a web application that allows students to submit assignments and peer-review each other's work. Expertiza also supports team projects, and any document type is accepted as a submission. Expertiza has been deployed for years to help professors and students engage in the learning process.&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
===Design Smells===&lt;br /&gt;
The original code had several design smells, mostly deeply-nested if-else statements, and duplicated code. Another problem was that SentenceState had too many responsibilities. It had to first parse the sentence into separate sentence clauses and then separate the sentence clauses into tokens before iterating through the tokens to determine the state of the sentence. The main responsibility of SentenceState should be just to determine the state of a sentence, and another class should be in charge of knowing the parts of the sentence. The worst problem was the if-else statement which determined the next state of the sentence clause based on the previous state and the next token. Instead of the SentenceState class being responsible for all of these relationships, it is better to have subclasses of SentenceState which each know the relationship between themselves and any token type.&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not quote the author's words and sentences properly; the reviewer simply copies what the author says, which constitutes plagiarism. &lt;br /&gt;
&lt;br /&gt;
From this point of view, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb file, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. During the refactoring, we need to split these two functions up so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are compare_reviews_with_submissions, compare_reviews_with_questions, &lt;br /&gt;
compare_reviews_with_responses, and compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (no space between -p and the password)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80983</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80983"/>
		<updated>2013-10-30T18:23:59Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* sentence_state.rb */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
To see the original code please go to this link: &lt;br /&gt;
https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, in the test file: expertiza/test/unit/automated_metareview/plagiarism_check_test.rb,&lt;br /&gt;
&lt;br /&gt;
The 1st test shows:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the review does not quote the author's words and sentences properly; the reviewer simply copies what the author says, which constitutes plagiarism. &lt;br /&gt;
&lt;br /&gt;
From this point of view, the refactoring should produce 4 fundamental methods, each of which does exactly one thing. In the initial plagiarism_check.rb file, the compare_reviews_with_questions_responses method has roughly 2 functions: comparing reviews with the review questions and comparing reviews with others’ responses, which is confusing. During the refactoring, we need to split these two functions up so that this bad smell disappears.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first thing to do is to define 4 methods with different functions. &lt;br /&gt;
&lt;br /&gt;
They are compare_reviews_with_submissions, compare_reviews_with_questions, &lt;br /&gt;
compare_reviews_with_responses, and compare_reviews_with_google_search. Each method has its own specific function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we have to split the method compare_reviews_with_questions_responses into 2 methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we need to extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses/questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid this duplication, we extract this part into a method that checks the state of plagiarism: &lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password:password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line (no space between -p and the password): mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80980</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80980"/>
		<updated>2013-10-30T18:22:46Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Link to VCL */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
&lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods with distinct functions: compare_reviews_with_submissions, compare_reviews_with_questions, compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
To avoid duplicating this code in both methods, we extract it into a single method that checks the plagiarism state:&lt;br /&gt;
&lt;br /&gt;
 def check_plagiarism_state(auto_metareview, count_copies, rev_array, scores)&lt;br /&gt;
   if count_copies &amp;gt; 0 #resetting review_array only when plagiarism was found&lt;br /&gt;
     auto_metareview.review_array = rev_array&lt;br /&gt;
     if count_copies == scores.length&lt;br /&gt;
       return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
     else&lt;br /&gt;
       return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
     end&lt;br /&gt;
   end&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line (no space between -p and the password): mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80976</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80976"/>
		<updated>2013-10-30T18:22:08Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Git Forked Repository URL */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
&lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods with distinct functions: compare_reviews_with_submissions, compare_reviews_with_questions, compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. If the first one does not work, please use this one. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line (no space between -p and the password): mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80974</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80974"/>
		<updated>2013-10-30T18:21:52Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Git Forked Repository URL */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
&lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods with distinct functions: compare_reviews_with_submissions, compare_reviews_with_questions, compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. If the first one does not work, please use this one. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza &amp;quot;Expertiza fork&amp;quot;]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use Ruby 1.9.3.&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the server.&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. On Windows, download http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder of the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Edit /config/database.yml to match your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line (no space between -p and the password): mysql -u root -p&amp;lt;YOUR_PASSWORD&amp;gt; pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests passed without error.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80973</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80973"/>
		<updated>2013-10-30T18:21:27Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* Git Forked Repository URL */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked:&lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
&lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's sentences word for word without quoting them, which counts as plagiarism.&lt;br /&gt;
&lt;br /&gt;
Given these four checks, the refactored class should have four fundamental methods, each doing exactly one thing. In the initial plagiarism_check.rb, however, the compare_reviews_with_questions_responses method does two things: it compares reviews with the review questions and also compares reviews with others’ responses, which is confusing. As part of the refactoring, we split these two functions apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
The first step is therefore to define four methods with distinct functions: compare_reviews_with_submissions, compare_reviews_with_questions, compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As described above, we split the method compare_reviews_with_questions_responses into two methods:&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we extract the duplicated part of the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the questions or responses:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 &lt;br /&gt;
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither of these links works, please do not rush your review; send us an email and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. If the first one does not work, please use this one. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; Expertiza fork [https://github.com/shanfangshuiyuan/expertiza]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the MySQL server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your MySQL root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80971</id>
		<title>CSC/ECE 517 Fall 2013/oss E816 cyy</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2013/oss_E816_cyy&amp;diff=80971"/>
		<updated>2013-10-30T18:21:02Z</updated>

		<summary type="html">&lt;p&gt;Cmmcclen: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction to Refactoring plagiarism_check.rb and sentence_state.rb =&lt;br /&gt;
&lt;br /&gt;
=Project description=&lt;br /&gt;
&lt;br /&gt;
=Design=&lt;br /&gt;
==sentence_state.rb==&lt;br /&gt;
&lt;br /&gt;
==plagiarism_check.rb==&lt;br /&gt;
&lt;br /&gt;
To see the original code please go to this [https://github.com/expertiza/expertiza/blob/master/app/models/automated_metareview/plagiarism_check.rb link].&lt;br /&gt;
&lt;br /&gt;
The main responsibility of Plagiarism_Check is to determine whether the reviews are just copied from other sources. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are four kinds of plagiarism that need to be checked: &lt;br /&gt;
&lt;br /&gt;
1. whether the review is copied from the submissions of the assignment&lt;br /&gt;
 &lt;br /&gt;
2. whether the review is copied from the review questions&lt;br /&gt;
&lt;br /&gt;
3. whether the review is copied from other reviews&lt;br /&gt;
&lt;br /&gt;
4. whether the review is copied from the Internet or other sources, which may be detected through a Google search&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For example, the first test in the test file expertiza/test/unit/automated_metareview/plagiarism_check_test.rb is:&lt;br /&gt;
&lt;br /&gt;
 test &amp;quot;check for plagiarism true match&amp;quot; do&lt;br /&gt;
    review_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
    subm_text = [&amp;quot;The sweet potatoes in the vegetable bin are green with mold. These sweet potatoes in the vegetable bin are fresh.&amp;quot;]&lt;br /&gt;
   &lt;br /&gt;
    instance = PlagiarismChecker.new&lt;br /&gt;
    assert_equal(true, instance.check_for_plagiarism(review_text, subm_text))&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
The check_for_plagiarism method compares the review text with the submission text. In this case, the reviewer copies the author's words and sentences without quoting them properly, which constitutes plagiarism. &lt;br /&gt;
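&lt;br /&gt;
To make the idea concrete, here is a minimal sketch of a verbatim-copy check. It is not the PlagiarismChecker implementation: the helper names sentences and copied_verbatim? are hypothetical, and the sketch assumes that plagiarism simply means a review sentence appearing word for word in the submission.&lt;br /&gt;

```ruby
# Split text into normalised sentences (lowercased, whitespace collapsed).
def sentences(text)
  text.split(/[.!?]/)
      .map { |s| s.strip.downcase.gsub(/\s+/, " ") }
      .reject { |s| s.empty? }
end

# True when any review sentence also appears verbatim in a submission.
# Assumption: both arguments are arrays of strings, as in the test above.
def copied_verbatim?(review_texts, submission_texts)
  submitted = submission_texts.flat_map { |t| sentences(t) }
  review_texts.flat_map { |t| sentences(t) }.any? { |s| submitted.include?(s) }
end

review_text = ["The sweet potatoes in the vegetable bin are green with mold."]
subm_text = ["The sweet potatoes in the vegetable bin are green with mold."]
puts copied_verbatim?(review_text, subm_text)
```

A real checker would likely also need to handle properly quoted passages and near-copies, which this sketch ignores.&lt;br /&gt;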
&lt;br /&gt;
From this point of view, the refactoring should produce four fundamental methods, each of which does exactly one thing. In the original plagiarism_check.rb, the compare_reviews_with_questions_responses method has two responsibilities: comparing reviews with the review questions and comparing reviews with others’ responses, which makes the method confusing. As part of the refactoring, we split these two responsibilities apart to remove this code smell.&lt;br /&gt;
&lt;br /&gt;
Based on the above, the first step is to define four methods, each with one specific function: &lt;br /&gt;
&lt;br /&gt;
compare_reviews_with_submissions, compare_reviews_with_questions, &lt;br /&gt;
compare_reviews_with_responses, and compare_reviews_with_google_search.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, we have to split the method compare_reviews_with_questions_responses into two methods: &lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_questions(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
 def compare_reviews_with_responses(auto_metareview, map_id)&lt;br /&gt;
 …&lt;br /&gt;
 end&lt;br /&gt;
&lt;br /&gt;
Next, we extract the code shared by the long methods into an individual method that can be called within the class. For example, compare_reviews_with_questions and compare_reviews_with_responses share a common part that checks whether the reviews are copied fully from the responses or questions:&lt;br /&gt;
&lt;br /&gt;
 if(count_copies &amp;gt; 0) #resetting review_array only when plagiarism was found&lt;br /&gt;
   auto_metareview.review_array = rev_array&lt;br /&gt;
 end&lt;br /&gt;
 
 if(count_copies &amp;gt; 0 and count_copies == scores.length)&lt;br /&gt;
   return ALL_RESPONSES_PLAGIARISED #plagiarism, with all other metrics 0&lt;br /&gt;
 elsif(count_copies &amp;gt; 0)&lt;br /&gt;
   return SOME_RESPONSES_PLAGIARISED #plagiarism, while evaluating other metrics&lt;br /&gt;
 end&lt;br /&gt;
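&lt;br /&gt;
One way to picture the split is two thin public methods that delegate to a single private comparison. The sketch below is illustrative only: the class name PlagiarismSketch, the helpers count_copies and normalise, and the array-of-strings parameters are assumptions, and they differ from the (auto_metareview, map_id) signatures used in the project.&lt;br /&gt;

```ruby
class PlagiarismSketch
  # Count reviews copied from the review questions.
  def compare_reviews_with_questions(reviews, questions)
    count_copies(reviews, questions)
  end

  # Count reviews copied from other reviewers' responses.
  def compare_reviews_with_responses(reviews, responses)
    count_copies(reviews, responses)
  end

  private

  # Number of reviews whose normalised text appears among the sources.
  def count_copies(reviews, sources)
    known = sources.map { |s| normalise(s) }
    reviews.count { |r| known.include?(normalise(r)) }
  end

  # Ignore case and surrounding whitespace when comparing.
  def normalise(text)
    text.strip.downcase
  end
end
```

Each public method keeps a single responsibility, while the comparison logic lives in one place.&lt;br /&gt;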
&lt;br /&gt;
=Test Our Code=&lt;br /&gt;
==Link to VCL==&lt;br /&gt;
The purpose of running the VCL server is to let you verify that Expertiza still works properly with our refactored code. The first VCL link is seeded with the expertiza-scrubbed.sql file, which includes questionnaires, courses, and assignments, so it is easy to verify that reviews work. You only need to create users and then have them review one another. The second link uses only the test.sql file, but you can still verify that Expertiza functions correctly. If neither link works, please do not rush your review; email us and we will fix it as soon as possible (yhuang25@ncsu.edu, ysun6@ncsu.edu, grimes.caroline@gmail.com). Thank you so much!&lt;br /&gt;
&lt;br /&gt;
1. http://152.46.20.30:3000/  Username: admin, password: password&lt;br /&gt;
&lt;br /&gt;
2. If the first one does not work, please use this one. http://vclv99-129.hpc.ncsu.edu:3000 Username: admin, password: admin&lt;br /&gt;
&lt;br /&gt;
==Git Forked Repository URL==&lt;br /&gt;
https://github.com/shanfangshuiyuan/expertiza &amp;lt;ref&amp;gt; [https://github.com/shanfangshuiyuan/expertiza Expertiza fork]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Steps to Setup Project==&lt;br /&gt;
1. Clone the git repository shown above.&lt;br /&gt;
&lt;br /&gt;
2. Use ruby 1.9.3&lt;br /&gt;
&lt;br /&gt;
3. Set up MySQL and start the MySQL server&lt;br /&gt;
&lt;br /&gt;
4. Command line: bundle install&lt;br /&gt;
&lt;br /&gt;
5. Download from http://dev.mysql.com/get/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip/from/pick and copy all files from the lib folder from the download into &amp;lt;Ruby193&amp;gt;\bin&lt;br /&gt;
&lt;br /&gt;
6. Change /config/database.yml according to your MySQL root password and MySQL port.&lt;br /&gt;
&lt;br /&gt;
7. Command line: rake db:create:all&lt;br /&gt;
&lt;br /&gt;
8. Command line: mysql -u root -p pg_development &amp;lt; expertiza-scrubbed_2013_07_10.sql (enter your MySQL root password when prompted)&lt;br /&gt;
&lt;br /&gt;
9. Command line: rake db:migrate&lt;br /&gt;
&lt;br /&gt;
10. Command line: rails server&lt;br /&gt;
&lt;br /&gt;
==Test Our Code==&lt;br /&gt;
1. Set up the project following the steps above&lt;br /&gt;
&lt;br /&gt;
2. Command line: rake db:test:prepare&lt;br /&gt;
&lt;br /&gt;
3. Run plagiarism_check_test.rb and sentence_state_test.rb, which are under /test/unit/automated_metareview. After refactoring, all tests pass without errors.&lt;br /&gt;
&lt;br /&gt;
4. Review the refactored files: sentence_state.rb and plagiarism_check.rb are under /app/models/automated_metareview. Other changed files are shown below.&lt;br /&gt;
&lt;br /&gt;
==Files Changed==&lt;br /&gt;
1. text_preprocessing.rb&lt;br /&gt;
&lt;br /&gt;
2. plagiarism_check.rb&lt;br /&gt;
&lt;br /&gt;
3. sentence_state.rb&lt;br /&gt;
&lt;br /&gt;
4. tagged_sentence.rb&lt;br /&gt;
&lt;br /&gt;
5. constants.rb&lt;br /&gt;
&lt;br /&gt;
6. negations.rb&lt;br /&gt;
&lt;br /&gt;
7. plagiarism_check_test.rb&lt;br /&gt;
&lt;br /&gt;
=Future work=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmmcclen</name></author>
	</entry>
</feed>