CSC/ECE 517 Spring 2015 E1527 SWAR: Difference between revisions


Latest revision as of 03:05, 9 April 2015

E1527. Refactor Autometareviews gem and migration to Web-Service

Introduction to Autometareview project <ref>https://github.com/lramach/autometareviews0.1</ref>

This project was developed as part of the Expertiza project <ref>http://wikis.lib.ncsu.edu/index.php/Expertiza</ref>.
The automated metareview tool assesses the quality of a review using natural language processing and machine learning techniques, with no manual intervention. Feedback is provided to reviewers on the following metrics:

  1. Review relevance: This metric tells the reviewer how relevant the review is to the content of the author's submission. Numeric feedback on a scale of 0--1 is provided to indicate a review's relevance.
  2. Review Content Type: This metric identifies whether the review contains 'summative content' -- positive feedback, 'problem detection content' -- problems identified by reviewers in the author's work, or 'advisory content' -- content indicating suggestions or advice provided by reviewers. Numeric feedback on a scale of 0--1 is provided for each content type to indicate whether the review contains that type of content.
  3. Review Coverage: This metric indicates the extent to which a review covers the main points of a submission. A numeric value in the range of 0--1 indicates the coverage of a review.
  4. Plagiarism<ref>http://www.plagiarism.org/plagiarism-101/what-is-plagiarism/</ref>: Indicates the presence of plagiarism in the review text.
  5. Tone: The metric indicates whether a review has a positive, negative or neutral tone.
  6. Quantity: Indicates the number of unique words used by the reviewer in the review.
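
The Quantity metric can be illustrated with a small Ruby sketch. This is not the gem's implementation: the method name `unique_word_count` and the tokenization rule (lowercase the text, then treat runs of letters and apostrophes as words) are assumptions made for illustration.

```ruby
# Hypothetical helper, not taken from the gem: counts distinct words in a review.
# Assumed tokenization: lowercase the text, then take runs of letters/apostrophes.
def unique_word_count(review_text)
  review_text.downcase.scan(/[a-z']+/).uniq.size
end

review = "The design is clear. The tests are clear too."
puts unique_word_count(review) # prints 7
```

Repeated words ("the", "clear") are counted once, which is why the nine-word sample yields 7.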


Problem Statement

Currently, the Autometareviews project is used as a gem<ref>http://guides.rubygems.org/what-is-a-gem/</ref> in the Expertiza project. The purpose of this project is to migrate this gem to a web service<ref>http://en.wikipedia.org/wiki/Web_service</ref> and expose its methods on the web, so that any application can consume them. The older gem depended on old libraries<ref>http://en.wikipedia.org/wiki/Library_%28computing%29</ref> such as Stanford-core-nlp, rwordnet, etc. We will migrate them to new libraries without breaking the existing feature set. We are also going to refactor the source code of this gem to improve readability and to reduce complexity and code redundancy. We will fix any bug or bottleneck that we can find to improve the performance of this service. We will not add any new feature to the existing feature set provided by the gem. Before making any modification to the existing features, we will present the proposed change to Dr. Gehringer and his Expertiza team.

Scope

This project has three separate scope items:

  • Migration of existing gem application to a web service
  • Refactoring the existing Ruby classes
  • Migrating to newer libraries, wherever possible.

The classes that we propose to refactor are:

  • tone.rb
  • degree_of_relevance.rb
  • wordnet_based_similarity.rb
  • sentence_state.rb
  • cluster_generation.rb
  • plagiarism_check.rb
  • graph_generator.rb
  • predict_class.rb
  • review_coverage.rb

No new features will be developed as part of this project. Any major code change due to the inclusion of newer libraries will be communicated to the Expertiza project team. Existing code will be tested to ensure that the functionality does not change.

Standards

All developed code will adhere to the Ruby on Rails coding guidelines<ref>https://docs.google.com/document/d/1qQD7fcypFk77nq7Jx7ZNyCNpLyt1oXKaq5G-W7zkV3k/edit</ref>.

List of Tasks

The tasks we will perform as part of this project are listed below.<ref>https://docs.google.com/document/d/10JTdEjCiRTre3nO4j_czBzhcxkqOSzmAT8jyiJHNz4c/edit#</ref>

This system is still at a nascent stage and has many performance-related issues. It takes a long time (about 2 minutes) to generate a single meta-review, which is unacceptable for Expertiza. We propose to refactor the code and identify the areas that affect the overall performance of the system. A few areas we identified in a preliminary review are:

  • Reading seed data from CSV files in each pass takes up a lot of time. We can move this data into MySQL and use ActiveRecord to speed up data fetches.
  • WordNet-based semantic matching takes a lot of time. We will review the method used and present our findings about areas of concern.
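
The cost of re-reading seed data on every pass can be illustrated with a small Ruby sketch. It does not use MySQL or ActiveRecord as proposed above; it substitutes a simpler technique, in-memory memoization, to show the underlying idea of parsing the data once and reusing it. The class name `SeedData`, the column names, and the sample rows are all hypothetical.

```ruby
require "csv"

# Hypothetical cache, not part of the gem: parses the seed CSV once
# and hands back the same parsed rows on every later call.
class SeedData
  attr_reader :parse_count

  def initialize(csv_text)
    @csv_text = csv_text
    @parse_count = 0
  end

  # Memoized accessor: the CSV text is parsed on the first call only.
  def rows
    @rows ||= begin
      @parse_count += 1
      CSV.parse(@csv_text, headers: true).map(&:to_h)
    end
  end
end

# Made-up seed data with assumed column names.
seed = SeedData.new("phrase,label\ngood work,summative\nfix the bug,problem\n")
3.times { seed.rows } # three "passes" over the seed data
puts seed.parse_count # prints 1
```

A database-backed version would replace the memoized parse with a query, but the goal is the same: the expensive read happens once rather than once per pass.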

1. Refactor Code

  1. Efficient loop constructs
    Description: Many loops over models are implemented using generic “for” loops.
    Solution: As specified by the Ruby guidelines, we plan to use idiomatic Ruby iterators such as “each” and “find_each”.
  2. Very large methods
    Description: Several methods contain a large amount of code, which makes them difficult to understand and debug.
    Solution: In most cases, large methods can be shortened by extracting smaller helper methods. Such methods could be reused across different components.
  3. Ambiguous method names
    Description: The names of many methods do not clearly match the feature they implement.
    Solution: We will rename such methods to clearly state the feature they implement.
  4. Legacy code
    Description: As the system has been modified for bug fixes and enhancements, unnecessary code has accumulated.
    Solution: Isolate and remove all dead code.
  5. Code beautification
    Description: The coding style used in the gem does not follow Ruby on Rails conventions, which makes it difficult for Ruby programmers to read.
    Solution: Beautify the code with a consistent standard of documentation and style.
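
The first refactoring item can be sketched in Ruby as follows. The sample sentences and variable names are hypothetical and are not taken from the gem. “find_each” is not shown because it is an ActiveRecord batch iterator and needs a database-backed model.

```ruby
# Hypothetical review sentences; not taken from the gem.
sentences = ["Good structure.", "The tests are missing.", "Consider adding comments."]

# Before: a generic "for" loop with manual index bookkeeping.
lengths_before = []
for i in 0..sentences.length - 1
  lengths_before << sentences[i].split.length
end

# After: the block-based "each" iterator expresses the same computation
# without an index variable.
lengths_after = []
sentences.each { |sentence| lengths_after << sentence.split.length }

puts lengths_before == lengths_after # prints true
```

Both loops compute the word count of each sentence; the second form avoids off-by-one risks in the index range and does not leave a loop variable in the enclosing scope.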

2. Upgrade system to use latest dependent ruby gems

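The libraries used by the gem are very old. We plan to migrate the dependent libraries to their latest versions.
The libraries we have identified are:

  • stanford-core-nlp <ref>http://nlp.stanford.edu/software/corenlp.shtml</ref>
  • rwordnet <ref>https://rubygems.org/gems/rwordnet</ref>
  • rjb <ref>https://rubygems.org/gems/rjb</ref>
  • bind-it <ref>https://rubygems.org/gems/bind-it</ref>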

We will also migrate the project to use Java 8<ref>http://java.com/en/download/whatis_java.jsp</ref>.

3. Migrate gem to Web service

Expertiza evaluates each review using an automated meta-review system, which is currently packaged as a library. The automated metareview system is an independent entity and can be used by other peer review systems as well. Many other peer review systems could benefit from it if it were available to them to evaluate their rubrics. We are therefore migrating this system from a library to a web service.

3.1 Design

The Metareview system is a Natural Language Processing based system that compares written reviews with the original article. The web service will expose the "AutomatedMetareview" method.

[Figure: Web Service Design]


The request JSON object passed to the method will have the following parameters:

  • original article
  • review written for this article
  • rubric used during article review.

The web service will return the meta-review as a JSON object. The response JSON object will have the parameters mentioned below:

  • plagiarism
  • relevance
  • content_summative
  • content_problem
  • content_advisory
  • coverage
  • tone_positive
  • tone_negative
  • tone_neutral
  • quantity

[Figure: Interaction between Client and Web Service]
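
The request and response objects described in this section can be sketched with Ruby's standard JSON library. The request field names ("submission", "review", "rubric") are assumptions, since the design names the three inputs but not their keys. The response keys follow the list in this section, and the values are placeholders rather than real service output.

```ruby
require "json"

# Request field names are assumptions; the design only names the three inputs.
request_body = JSON.generate(
  "submission" => "Text of the original article.",
  "review"     => "Text of the review written for this article.",
  "rubric"     => ["Is the design clearly explained?"]
)

# Response keys taken from the parameter list in this section.
RESPONSE_KEYS = %w[
  plagiarism relevance content_summative content_problem content_advisory
  coverage tone_positive tone_negative tone_neutral quantity
].freeze

# Placeholder response: every key present, values left empty (null in JSON).
sample_response = JSON.generate(RESPONSE_KEYS.to_h { |key| [key, nil] })

# A client could check that no expected key is missing from the reply.
parsed = JSON.parse(sample_response)
missing = RESPONSE_KEYS - parsed.keys
puts missing.empty? # prints true
```

A check like the last three lines could also serve as a starting point for a test of the web service response format.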

3.2 Assumptions

For this project, the code being modified is assumed to be correct and to meet all feature requirements of the system. Interactions modified during refactoring will not change the underlying system definitions.

4. Testing

We will use the gem's existing test suite to test any code modification. We will write new test cases for the web service implementation and for any new public method exposed by existing classes.

References

<references/>