Expertiza_Wiki - User contributions [en]

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:55:06Z

Eleill: /* Automated Testing within Rails */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Because submissions to Expertiza are digital, there is an opportunity to use tools that automate plagiarism detection. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck], which provides a web service API for comparing documents and checking them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza so that an automated plagiarism check takes place once a submission is closed.

== Background ==
This project has been attempted in previous semesters. The code from those projects did not clearly demonstrate successful integration between Expertiza and SimiCheck, and it was not deemed production-worthy. Based on this feedback, we started our development from scratch and used the previous project as a source of lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify Expertiza when the comparison is complete; if this does not work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
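The fetch-and-convert steps for an external website could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual fetcher: the names <code>fetch_html</code>, <code>strip_html</code>, <code>within_size_limit?</code>, and <code>MAX_FILE_BYTES</code> are hypothetical, the regex-based tag stripping is a simplification (a production version would more likely use an HTML parser), only a handful of entities are decoded, and the 1 MB limit is assumed to mean 1,048,576 bytes.

```ruby
require 'net/http'
require 'uri'

# Assumption: SimiCheck's 1 MB per-file limit is taken to mean 1,048,576 bytes.
MAX_FILE_BYTES = 1024 * 1024

# Small, deliberately incomplete entity table (illustrative only).
ENTITIES = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&nbsp;' => ' ' }.freeze

# Hypothetical helper: download the raw HTML behind a submitted link.
def fetch_html(url)
  Net::HTTP.get(URI.parse(url))
end

# Hypothetical helper: reduce HTML to plain text so it can be compared.
def strip_html(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, ' ')        # drop script/style blocks
  text = text.gsub(/<[^>]+>/, ' ')                               # drop remaining tags
  text = text.gsub(/&(?:amp|lt|gt|quot|nbsp);/) { |e| ENTITIES[e] } # decode known entities
  text.gsub(/\s+/, ' ').strip                                    # collapse whitespace
end

# Check the converted text against the assumed size limit before sending it.
def within_size_limit?(text)
  text.bytesize <= MAX_FILE_BYTES
end
```

A caller would pass the result of <code>fetch_html</code> through <code>strip_html</code> and skip (or report) any submission for which <code>within_size_limit?</code> is false.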

== Expertiza Modifications ==
Most of the new behavior is handled in new background tasks, so few existing Expertiza files were modified. The New Assignment interface has two new configuration parameters, SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage), which have also been added to the Assignment model. The Plagiarism Checker Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Because this project revolves around integration with several web services, our team follows the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to hide complex functionality behind a simpler interface and to encapsulate API details, so that application code does not need to change if a service changes or is replaced by a different one. We feel this is appropriate for the requirements because it gives us an easy-to-use interface within Expertiza that hides the actual API integration. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API's operation.
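As a rough illustration of the facade idea, the sketch below puts one simple entry point in front of several content sources. All class and method names here (<code>SubmissionContentFacade</code>, <code>GithubSource</code>, <code>GoogleDocSource</code>, <code>WebsiteSource</code>, <code>handles?</code>, <code>fetch</code>, <code>content_for</code>) are hypothetical and the sources return placeholder strings instead of calling real APIs; the project's actual classes may be organized differently.

```ruby
# Hypothetical subsystem classes. In a real integration each would wrap
# HTTP calls to its service; here they only return placeholder text.
class GithubSource
  def handles?(url)
    url.include?('github.com')
  end

  def fetch(url)
    "github text from #{url}"
  end
end

class GoogleDocSource
  def handles?(url)
    url.include?('docs.google.com/document')
  end

  def fetch(url)
    "google doc text from #{url}"
  end
end

class WebsiteSource
  # Fallback: any URL not claimed by another source is treated as a website.
  def handles?(_url)
    true
  end

  def fetch(url)
    "website text from #{url}"
  end
end

# The facade: callers ask for content and never see which API was used.
class SubmissionContentFacade
  def initialize(sources = [GithubSource.new, GoogleDocSource.new, WebsiteSource.new])
    @sources = sources
  end

  def content_for(url)
    @sources.find { |source| source.handles?(url) }.fetch(url)
  end
end
```

With this shape, supporting a new kind of link would mean adding one source class to the list, while code that calls <code>content_for</code> stays unchanged.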

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.
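The way these two properties could drive scheduling and filtering can be sketched as follows. This is a plain-Ruby illustration rather than the actual model code: <code>AssignmentSettings</code> and the three helper methods are hypothetical names, and treating the threshold as inclusive (a pair exactly at the threshold is kept) is an assumption.

```ruby
# Hypothetical stand-in for the two new Assignment properties.
AssignmentSettings = Struct.new(:simicheck, :simicheck_threshold)

# A value of -1 means no Plagiarism Checker is scheduled.
def plagiarism_check_enabled?(settings)
  settings.simicheck >= 0
end

# Time at which the check should run: due date plus the delay in hours.
# Returns nil when the check is disabled.
def plagiarism_check_time(settings, due_date)
  return nil unless plagiarism_check_enabled?(settings)

  due_date + settings.simicheck * 3600
end

# Keep only similarity percentages at or above the threshold (assumed inclusive).
def above_threshold(similarities, settings)
  similarities.select { |percent| percent >= settings.simicheck_threshold }
end
```

For example, a delay of 24 with a due date of noon on April 30 would schedule the check for noon on May 1, and a delay of -1 would schedule nothing.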

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.
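To make the stored fields concrete, the sketch below models one comparison as a plain Ruby <code>Struct</code> and orders rows the way the report requires (descending similarity). It is not the ActiveRecord model itself: the field names and the <code>report_rows</code> helper are hypothetical, and the diff URLs are placeholders.

```ruby
# Hypothetical plain-Ruby stand-in for one stored comparison.
ComparisonRow = Struct.new(:file1_id, :file2_id, :similarity, :team1_id, :team2_id, :diff_url)

# Order comparisons for display: highest similarity first.
def report_rows(comparisons)
  comparisons.sort_by { |comparison| -comparison.similarity }
end

rows = report_rows([
  ComparisonRow.new('f1', 'f2', 12.5, 1, 2, 'https://example.com/diff/1'),
  ComparisonRow.new('f1', 'f3', 87.0, 1, 3, 'https://example.com/diff/2'),
  ComparisonRow.new('f2', 'f3', 40.0, 2, 3, 'https://example.com/diff/3')
])
```

In the real application the same ordering would more likely be expressed as a database query on the association, which this sketch does not attempt to show.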

=== Diagrams ===
Typical overall system operation:
 
[[File:Simicheck_API_-_SimiCheck.png]]
 
Class hierarchy in the fetchers:
 
[[File:SimiCheck_Fetchers.png]]

=== User Interface ===
The assignment configuration UI has been modified to contain two new select boxes. These select boxes determine how long to delay the Plagiarism Checker after an assignment's due date, and the similarity percentage used to filter the Plagiarism Checker Comparison results.

After the results have been aggregated, they can be viewed on a results report page. This report includes the submission names, the responsible teams, the similarity percentage, and a link to view the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last two fields on the New Assignment page are labeled "SimiCheck Delay" and "SimiCheck Similarity Threshold". In "SimiCheck Delay", select a value between 0 and 100 to enable the Plagiarism Checker. In "SimiCheck Similarity Threshold", select a percentage value to filter the Plagiarism Checker Comparison results. The percentage refers to the portion of text that is the same between two documents.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
Expertiza already had functionality to schedule or queue tasks. We hooked into that system by adding a new task type, "compare_files_with_simicheck", and providing the appropriate date/time configuration. When a task's deadline arrives, a method invokes logic based on the task type; when this task type is detected on a scheduled task, the SimiCheck comparison is initiated.

=== Code Sample ===
==== Scheduled task expires, hook is called ====
The following code was added to the perform method in app/mailers/delayed_mailer.rb:
<pre>
# Start the SimiCheck comparison when the expired task is of the new type
if deadline_type == "compare_files_with_simicheck"
  perform_simicheck_comparisons(assignment_id)
end
</pre>
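To show how such a hook behaves when a task fires, here is a self-contained sketch. <code>ScheduledTask</code> is a hypothetical, heavily simplified stand-in for the real mailer class, the <code>log</code> array exists only so the effect can be observed, and the body of <code>perform_simicheck_comparisons</code> is a placeholder rather than the project's implementation.

```ruby
# Hypothetical, simplified stand-in for a scheduled task (not the real DelayedMailer).
class ScheduledTask
  attr_reader :deadline_type, :assignment_id, :log

  def initialize(deadline_type, assignment_id)
    @deadline_type = deadline_type
    @assignment_id = assignment_id
    @log = [] # records what the task did, for illustration only
  end

  # Called when the task's deadline arrives; dispatches on the task type.
  def perform
    if deadline_type == 'compare_files_with_simicheck'
      perform_simicheck_comparisons(assignment_id)
    else
      @log << "other task: #{deadline_type}"
    end
  end

  private

  # Placeholder: the real method would fetch submissions and call SimiCheck.
  def perform_simicheck_comparisons(assignment_id)
    @log << "simicheck comparison started for assignment #{assignment_id}"
  end
end

task = ScheduledTask.new('compare_files_with_simicheck', 42)
task.perform
```

Tasks of any other type follow the existing path, so adding the new type does not change their behavior.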

== Testing Strategy ==
=== Automated Testing within Rails ===
We wrote unit tests for the new functionality that we implemented, including models, helpers, and SimiCheck logic. To unit test properly, we mocked all external interfaces and black-box tested the new functionality. Our test cases can be found in the following location:
* spec/models/website_fetcher_spec.rb

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:51:25Z

Eleill: /* Testing Strategy */

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:51:01Z

Eleill: /* User Interface */


CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:50:33Z

Eleill: /* API Testing during Development */


CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:45:34Z

Eleill: /* Diagrams */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One of these such tools is called [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed. .

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify when the comparison is complete, however if this doesn’t work well will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similiarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs include SimiCheck, GitHub, Google Drive, etc. This pattern is commonly used to abstract complex functionalities into a simpler interface for use as well as encapsulate API changes to prevent having to update application code if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the miniscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the the existing Assignment model.

The "simicheck" property accommodates the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" is -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagrams ===
Typical overall system operation:
 
[[File:Simicheck_API_-_SimiCheck.png]]
 
Class heirarchy in the fetchers:
 
[[File:SimiCheck_Fetchers.png]]

=== User Interface ===
The current assignment configuration UI has been modified to contain two new select boxes. These select boxes determine how long to delay the Plagiarism Checker after an assignment's due date, and the similarity percentage at which to filter the Plagiarism Checker Comparison results.

After the results have been aggregated, they can be viewed in a results report page. This report includes the submission names, the responsible teams, the similarity percentage, and a link to view the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last two fields on the New Assignment page are "SimiCheck Delay" and "SimiCheck Similarity Threshold". In "SimiCheck Delay", select a value between 0 and 100 to enable the Plagiarism Checker. In "SimiCheck Similarity Threshold", select a percentage value to filter the Plagiarism Checker Comparison results. The percentage refers to the percentage of text that is the same between two documents.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, the report will load.

=== Task Triggering ===
Expertiza already had functionality to schedule or queue tasks. We hooked into that system by adding a new task type, "compare_files_with_simicheck", and providing the correct date/time configuration. When a task's deadline occurs, a method invokes logic based on the task type. Once this task type is detected on a scheduled task, the SimiCheck comparison is initiated.

=== Code Sample ===
====Scheduled task expires, hook is called====
The following code was added to the perform method in app/mailers/delayed_mailer.rb:
<code>
# Start the SimiCheck comparison when the expiring task is a plagiarism check
if self.deadline_type == "compare_files_with_simicheck"
  perform_simicheck_comparisons(self.assignment_id)
end
</code>
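
For readers who want to run the hook in isolation, the following self-contained sketch places the same check inside a stand-in class. Only deadline_type, assignment_id, perform, and perform_simicheck_comparisons come from the write-up; the class name ScheduledTask, the log array, the "review" task type, and the body of the comparison method are hypothetical placeholders.

```ruby
# Hypothetical stand-in for the scheduled task object.
class ScheduledTask
  attr_reader :deadline_type, :assignment_id, :log

  def initialize(deadline_type, assignment_id)
    @deadline_type = deadline_type
    @assignment_id = assignment_id
    @log = []
  end

  # Called when the task's deadline arrives; dispatches on the task type.
  def perform
    if deadline_type == "compare_files_with_simicheck"
      perform_simicheck_comparisons(assignment_id)
    end
  end

  private

  # Placeholder: the real method would fetch submissions and call SimiCheck.
  def perform_simicheck_comparisons(assignment_id)
    @log << "started SimiCheck comparison for assignment #{assignment_id}"
  end
end

task = ScheduledTask.new("compare_files_with_simicheck", 42)
task.perform
puts task.log.first

# A task of any other type (the name "review" is arbitrary) is left alone.
other = ScheduledTask.new("review", 42)
other.perform
```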

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what output to expect. In addition, this will lay the groundwork for testing, within Rails, the direct API calls that the Expertiza server will make.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:45:11Z

Eleill: /* Diagram */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

File:SimiCheck Fetchers.png

2017-04-30T03:43:54Z

Eleill:

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:34:15Z

Eleill: /* Code Sample(s) */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:32:20Z

Eleill: /* Scheduled task expires, hook is called */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:29:22Z

Eleill: /* Scheduled task expires, hook is called */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify when the comparison is complete; however, if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
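
The fetch-and-clean step for external websites, together with the 1 MB per-file limit, could be sketched in Ruby as follows. This is a minimal illustration under stated assumptions, not the project's implementation: <code>strip_html</code>, <code>within_simicheck_limit?</code>, and <code>MAX_SIMICHECK_FILE_BYTES</code> are hypothetical names, the regular-expression tag stripping is a simplification (it does not decode HTML entities), and we assume 1 MB means 1,048,576 bytes.

```ruby
# Hypothetical names throughout; only the 1 MB limit comes from the requirements.
MAX_SIMICHECK_FILE_BYTES = 1024 * 1024 # assumption: 1 MB = 1,048,576 bytes

# Naive HTML stripper for illustration only. A production version would
# more likely use an HTML parser than regular expressions.
def strip_html(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ") # drop script/style bodies
  text = text.gsub(/<[^>]+>/, " ")                          # drop the remaining tags
  text.gsub(/\s+/, " ").strip                               # collapse whitespace
end

# True when the raw text is small enough to send as a single file.
def within_simicheck_limit?(text)
  text.bytesize <= MAX_SIMICHECK_FILE_BYTES
end
```

Checking <code>bytesize</code> rather than <code>length</code> matters for submissions containing multi-byte characters, since the limit is stated in terms of file size.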

== Expertiza Modifications ==
Most of the updates are handled in new background tasks, so few existing Expertiza files were modified. The New Assignment interface has two new configuration parameters, SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage), which were also added to the Assignment model. The Plagiarism Checker Report was added to the select box of the "Review Report" interface for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or is replaced. We feel this is appropriate for the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API’s operation.
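
To illustrate the idea, here is a minimal Ruby sketch of a facade that picks a client based on the submission link. All class and method names (<code>SubmissionFetcher</code>, <code>StubClient</code>, <code>fetch_text</code>) are hypothetical, the host names used for routing are assumptions, and the stub clients stand in for the real API integrations, which are not shown.

```ruby
require "uri"

# Stand-in for a real API client (website, Google Drive, GitHub).
# Hypothetical: it only echoes which client handled the URL.
class StubClient
  def initialize(label)
    @label = label
  end

  def fetch_text(url)
    "#{@label}: #{url}"
  end
end

# Facade: callers ask for submission text and never see which API is used.
class SubmissionFetcher
  def initialize(website:, google_doc:, github:)
    @website = website
    @google_doc = google_doc
    @github = github
  end

  def fetch(url)
    client_for(url).fetch_text(url)
  end

  private

  # Route by host name; anything unrecognized is treated as a plain website.
  def client_for(url)
    case URI.parse(url).host.to_s
    when "docs.google.com" then @google_doc
    when "github.com" then @github
    else @website
    end
  end
end
```

With this shape, swapping one service for another only changes the client passed to the constructor, not the callers of <code>fetch</code>.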

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.
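
The delay semantics can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: the helper names <code>simicheck_enabled?</code> and <code>simicheck_run_time</code> are hypothetical, and the due date is assumed to be available as a Ruby <code>Time</code>.

```ruby
# Hypothetical helpers illustrating the "simicheck" delay property.
SIMICHECK_DISABLED = -1 # value stored when no Plagiarism Checker is scheduled

# The checker is enabled only for delays of 0 to 100 hours.
def simicheck_enabled?(delay_hours)
  delay_hours.between?(0, 100)
end

# Returns the time at which the comparison should start, or nil when disabled.
def simicheck_run_time(due_at, delay_hours)
  return nil unless simicheck_enabled?(delay_hours)

  due_at + delay_hours * 3600 # Time + seconds
end
```

A delay of 0 schedules the comparison at the due date itself, which is why -1 rather than 0 is used to mean "disabled".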

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.
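
As a rough sketch of how stored comparisons could become report rows, the following plain-Ruby example filters by the similarity threshold and sorts in descending similarity. It uses a <code>Struct</code> instead of the actual models, the field names are hypothetical, and it assumes that comparisons at or above the threshold are the ones shown.

```ruby
# Hypothetical stand-in for a stored comparison; field names are illustrative.
ComparisonRow = Struct.new(:file1_id, :file2_id, :similarity, :team1, :team2, :diff_url)

# Keep comparisons at or above the threshold (an assumption about the filter
# direction), ordered from most to least similar.
def report_rows(comparisons, threshold)
  comparisons
    .select { |c| c.similarity >= threshold }
    .sort_by { |c| -c.similarity }
end
```

Usage: <code>report_rows(all_comparisons, 50)</code> would return only the pairs that are at least 50% similar, most similar first.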

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The assignment configuration UI has been modified to contain two new select boxes. These select boxes determine how long to delay the Plagiarism Checker after an assignment's due date, and the similarity percentage used to filter the Plagiarism Checker Comparison results.

After the results have been aggregated they can be viewed in a results report page. This report includes the submission names, the responsible teams, the similarity percentage, and a link to view the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last two fields on the New Assignment page are "SimiCheck Delay" and "SimiCheck Similarity Threshold". In "SimiCheck Delay", select a value between 0 and 100 to enable the Plagiarism Checker. In "SimiCheck Similarity Threshold", select a percentage value to filter the Plagiarism Checker Comparison results. The percentage refers to the share of text that is the same between two documents.

After the assignment ends and the delay period has passed, you can view the Plagiarism Checker Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, the report will load.

=== Task Triggering ===
Expertiza already had functionality to schedule or queue tasks. We hooked into that system by adding a new task type, "compare_files_with_simicheck", and providing the appropriate date/time configuration. When a task's deadline arrives, a method invokes logic based on the task type. When this task type is detected on a scheduled task, the SimiCheck comparison is initiated.
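
The triggering flow can be sketched in plain Ruby as follows. This is a simplified illustration, not Expertiza's task system: <code>ScheduledTask</code>, <code>TaskRunner</code>, and <code>run_due</code> are hypothetical names, and the comparison itself is replaced by a log entry.

```ruby
# Hypothetical stand-in for a scheduled task record.
ScheduledTask = Struct.new(:assignment_id, :deadline_type, :run_at) do
  def due?(now)
    run_at <= now
  end
end

# Hypothetical runner that dispatches on the task type when a task is due.
class TaskRunner
  attr_reader :log

  def initialize
    @log = []
  end

  # Perform every task whose scheduled time has passed.
  def run_due(tasks, now)
    tasks.select { |t| t.due?(now) }.each { |t| perform(t) }
  end

  def perform(task)
    case task.deadline_type
    when "compare_files_with_simicheck"
      perform_simicheck_comparisons(task.assignment_id)
    else
      @log << "other:#{task.deadline_type}"
    end
  end

  private

  # Placeholder: the real method would start the SimiCheck comparison.
  def perform_simicheck_comparisons(assignment_id)
    @log << "simicheck:#{assignment_id}"
  end
end
```

Dispatching on the task type keeps the new behavior isolated: other task types continue down their existing paths untouched.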

=== Code Sample(s) ===
====Scheduled task expires, hook is called====
The following code was added to the perform method in app/mailers/delayed_mailer.rb:
<code>
# Start the SimiCheck comparison when the expired task has the new task type
if self.deadline_type == "compare_files_with_simicheck"
  perform_simicheck_comparisons(self.assignment_id)
end
</code>

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what output to expect. In addition, this will lay the groundwork for testing, within Rails, the direct API calls that the Expertiza server will make.

=== Automated Testing within Rails ===
We plan to write unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:28:25Z

Eleill: /* Code Sample(s) */

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:20:12Z

Eleill: /* Task Triggering */

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:10:52Z

Eleill: /* Potential Hurdles */

=== Task Triggering ===
The majority of this project is implemented as a background task; we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages, we will explore Ruby, Python, Node, and potentially others. For trigger methodologies, we will explore polling and event-based triggering.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:07:54Z

Eleill: /* Task Triggering */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza so that an automated plagiarism check can take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
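
The report table described in the list above could be assembled roughly as follows. This is a minimal Ruby sketch rather than the project's implementation: the <code>ComparisonRow</code> struct, its field names, and the <code>report_rows</code> helper are hypothetical, and treating the threshold as an inclusive lower bound is an assumption.

```ruby
# Hypothetical row for the report table; field names are illustrative only.
ComparisonRow = Struct.new(:similarity, :file_a, :file_b, :team_a, :team_b, :diff_url)

# Keep rows whose similarity (percent) meets the threshold, highest first.
# Assumption: the threshold is an inclusive lower bound.
def report_rows(rows, threshold)
  rows.select { |row| row.similarity >= threshold }
      .sort_by { |row| -row.similarity }
end

# Made-up sample data, only to show the shape of the table.
sample_rows = [
  ComparisonRow.new(42.0, "a.txt", "b.txt", "Team 1", "Team 2", "https://example.com/diff/1"),
  ComparisonRow.new(91.5, "a.txt", "c.txt", "Team 1", "Team 3", "https://example.com/diff/2"),
  ComparisonRow.new(67.0, "b.txt", "c.txt", "Team 2", "Team 3", "https://example.com/diff/3")
]

report_rows(sample_rows, 50).each do |row|
  puts format("%5.1f%%  %s vs %s  %s", row.similarity, row.team_a, row.team_b, row.diff_url)
end
```

Filtering before sorting keeps the sort limited to the rows that will actually be shown.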

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or is replaced. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API’s operation.
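
To make the facade idea concrete, here is a small Ruby sketch under stated assumptions. The class name <code>SubmissionContentFacade</code>, its methods, and the host-matching rules are hypothetical and are not taken from the project's code; the per-service clients are stubbed with lambdas so the example runs without network access.

```ruby
require "uri"

# Hypothetical facade: callers ask for text by URL and never touch a service client.
class SubmissionContentFacade
  # clients maps a source symbol to any object that responds to call(url).
  def initialize(clients)
    @clients = clients
  end

  # Classify a link by its host; these host rules are illustrative assumptions.
  def source_for(url)
    host = URI.parse(url).host.to_s.downcase
    return :github if host == "github.com"
    return :google_doc if host == "docs.google.com"
    :website
  end

  # Single entry point that hides which API is used behind the scenes.
  def fetch_text(url)
    @clients.fetch(source_for(url)).call(url)
  end
end

# Stub clients stand in for the real GitHub, Google Drive, and HTTP integrations.
facade = SubmissionContentFacade.new({
  github: ->(url) { "text from GitHub for #{url}" },
  google_doc: ->(url) { "text from Google Drive for #{url}" },
  website: ->(url) { "text from website for #{url}" }
})
puts facade.fetch_text("https://github.com/expertiza/expertiza")
```

Because callers only see <code>fetch_text</code>, swapping one service client for another would not require changes elsewhere in the application.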

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. It is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.
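
The delay semantics described above (-1 for disabled, 0 to 100 hours otherwise) can be sketched in plain Ruby. The helper name <code>plagiarism_check_time</code> and its arguments are hypothetical; only the value range and the meaning of -1 come from the description above.

```ruby
# Returns the Time at which the Plagiarism Checker should run,
# or nil when the check is disabled (simicheck == -1).
# Hypothetical helper; not part of the Assignment model itself.
def plagiarism_check_time(due_at, simicheck_hours)
  return nil if simicheck_hours == -1
  unless (0..100).cover?(simicheck_hours)
    raise ArgumentError, "delay must be -1 or between 0 and 100 hours"
  end
  due_at + simicheck_hours * 3600 # Time + seconds
end

due = Time.utc(2017, 4, 30, 0, 0, 0)
puts plagiarism_check_time(due, 2).inspect  # two hours after the due date
puts plagiarism_check_time(due, -1).inspect # nil: no check scheduled
```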

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI has been modified to contain two new select boxes. These select boxes determine how long to delay the Plagiarism Checker after an assignment's due date and the similarity percentage used to filter the Plagiarism Checker Comparison results.

After the results have been aggregated they can be viewed in a results report page. This report includes the submission names, the responsible teams, the similarity percentage, and a link to view the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last two fields on the New Assignment page are "SimiCheck Delay" and "SimiCheck Similarity Threshold". In "SimiCheck Delay", select a value between 0 and 100 to enable the Plagiarism Checker. In "SimiCheck Similarity Threshold", select a percentage value to filter the Plagiarism Checker Comparison results. The percentage refers to the proportion of text that is the same between two documents.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task; we can think of it as a separate application or process apart from Expertiza. Ideally, this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages, we will explore Ruby, Python, Node, and potentially others. For trigger methodologies, we will explore both polling and event-based triggering.
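
As an illustration of the polling option only, the loop below repeatedly asks whether a comparison has finished and gives up after a fixed number of attempts. It is a sketch under assumptions: <code>wait_for_comparison</code>, the <code>check</code> callable, and the interval values are hypothetical, and the status check is faked so the example does not call any real service.

```ruby
# Polls `check` until it returns true or the attempts run out.
# `sleeper` is injectable so the loop can be exercised without real waiting.
def wait_for_comparison(check, interval:, max_attempts:, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    return true if check.call
    # No need to wait after the final attempt.
    sleeper.call(interval) if attempt < max_attempts - 1
  end
  false
end

# Fake status check that reports completion on the third poll.
polls = 0
fake_check = -> { (polls += 1) >= 3 }
finished = wait_for_comparison(fake_check, interval: 60, max_attempts: 5,
                               sleeper: ->(_seconds) {}) # skip real sleeping in the demo
puts "finished=#{finished} after #{polls} polls"
```

An event-based trigger would replace this loop with a handler invoked when the service calls back, which avoids repeated status requests.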

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should allow the required number of checks to complete within a reasonable time frame.
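
The "delay until the next hour" approach can be sketched as a simple fixed-window counter. The class <code>HourlyRequestBudget</code> and its method are hypothetical, and the window here starts at the first request as a simplifying assumption; only the limit of 5000 requests per hour comes from the text above.

```ruby
# Counts requests in a fixed window and reports how long to wait once the
# budget is used up. Hypothetical helper; limit and window are configurable.
class HourlyRequestBudget
  def initialize(limit: 5000, window: 3600)
    @limit = limit
    @window = window
    @window_start = nil
    @count = 0
  end

  # Registers one request at time `now` and returns the number of seconds
  # the caller should wait before sending it (0 when under the limit).
  def wait_before_request(now)
    if @window_start.nil? || now - @window_start >= @window
      @window_start = now
      @count = 0
    end

    if @count < @limit
      @count += 1
      0
    else
      wait = @window - (now - @window_start)
      # After the wait, this request becomes the first one of the next window.
      @window_start += @window
      @count = 1
      wait
    end
  end
end

# Tiny limit so the behaviour is visible in a short demo.
budget = HourlyRequestBudget.new(limit: 2)
start = Time.utc(2017, 4, 30, 1, 0, 0)
[0, 10, 20].each do |offset|
  puts "request at +#{offset}s: wait #{budget.wait_before_request(start + offset)}s"
end
```

In the background task, the caller would sleep for the returned number of seconds before issuing the request.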

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails the direct API calls that the Expertiza server will make.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T03:07:03Z

Eleill: /* User Interface */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza so that an automated plagiarism check can take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or is replaced. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. It is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI has been modified to contain two new select boxes. These select boxes determine how long to delay the Plagiarism Checker after an assignment's due date and the similarity percentage used to filter the Plagiarism Checker Comparison results.

After the results have been aggregated they can be viewed in a results report page. This report includes the submission names, the responsible teams, the similarity percentage, and a link to view the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last two fields on the New Assignment page are "SimiCheck Delay" and "SimiCheck Similarity Threshold". In "SimiCheck Delay", select a value between 0 and 100 to enable the Plagiarism Checker. In "SimiCheck Similarity Threshold", select a percentage value to filter the Plagiarism Checker Comparison results. The percentage refers to the proportion of text that is the same between two documents.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally, this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages, we will explore Ruby, Python, Node, and potentially others. For trigger methodologies, we will explore both polling and event-based triggering.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should allow the required number of checks to complete within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails the direct API calls that the Expertiza server will make.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:59:14Z

Eleill: /* PlagiarismCheckerComparison */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza so that an automated plagiarism check can take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or is replaced. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. It is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, the Team IDs, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated, they will be viewable on a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally, this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages, we will explore Ruby, Python, Node, and potentially others. For trigger methodologies, we will explore both polling and event-based triggering.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should allow the required number of checks to complete within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails the direct API calls that the Expertiza server will make.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:57:52Z

Eleill: /* Assignment - SimiCheck delay added */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza so that an automated plagiarism check can take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or is replaced. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of each API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay and threshold added ====
We added "simicheck" and "simicheck_threshold" properties to the existing Assignment model.

The "simicheck" property accommodates the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" is -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

The "simicheck_threshold" property is a percentage that filters the Plagiarism Checker's Similarity results. The threshold refers to the percentage of text that is the same between two documents.
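
The semantics of the two properties can be illustrated with a short plain-Ruby sketch. The helper names below are hypothetical, and treating the threshold as an inclusive lower bound is an assumption; the text above only says that the threshold filters the results.

```ruby
# Value of "simicheck" meaning that no Plagiarism Checker is scheduled.
SIMICHECK_DISABLED = -1

# True when the delay is a valid number of hours (0 to 100).
def plagiarism_check_enabled?(simicheck)
  simicheck.between?(0, 100)
end

# Returns the Time at which the check should run, or nil when it is disabled.
def plagiarism_check_time(due_date, simicheck)
  return nil unless plagiarism_check_enabled?(simicheck)
  due_date + simicheck * 3600 # Time + seconds
end

# Assumption: a comparison is reported when it meets or exceeds the threshold.
def meets_threshold?(similarity, simicheck_threshold)
  similarity >= simicheck_threshold
end

due = Time.utc(2017, 4, 28, 23, 59, 0)
puts plagiarism_check_time(due, 12).inspect               # 12 hours after the due date
puts plagiarism_check_time(due, SIMICHECK_DISABLED).inspect # nil, nothing scheduled
puts meets_threshold?(42.0, 30)
```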

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.
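
One way the delay could work is a small client-side counter, sketched below in plain Ruby. The class <code>HourlyRequestBudget</code> is hypothetical, and the sketch assumes a fixed one-hour window that starts at the first request, which only approximates how GitHub itself tracks the limit. Times are plain integer seconds to keep the example deterministic.

```ruby
# Hypothetical client-side budget for requests per hour.
class HourlyRequestBudget
  def initialize(limit: 5000, window_seconds: 3600)
    @limit = limit
    @window_seconds = window_seconds
    @window_start = nil
    @used = 0
  end

  # Returns 0 and records the request when budget remains in the current
  # window; otherwise returns the seconds left until the window ends.
  def seconds_to_wait(now)
    if @window_start.nil? || now - @window_start >= @window_seconds
      @window_start = now # start a fresh window
      @used = 0
    end
    if @used < @limit
      @used += 1
      0
    else
      (@window_start + @window_seconds) - now
    end
  end
end

# A tiny limit makes the behaviour easy to see.
budget = HourlyRequestBudget.new(limit: 2)
[0, 10, 20, 3600].each do |t|
  puts "t=#{t}s wait=#{budget.seconds_to_wait(t)}s"
end
```

A background task would sleep for the returned number of seconds before retrying, which matches the plan of deferring the remaining requests to the next hour.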

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:53:05Z

Eleill: /* Expertiza Modifications */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
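
The source rules and the size limit from the requirements above could be sketched in plain Ruby as follows. The helper names are hypothetical, and the URL shapes used to tell the sources apart (<code>docs.google.com/document/</code> for Google Docs, <code>/pull/</code> in GitHub pull request paths) are assumptions about how students paste links. Whether SimiCheck counts 1 MB as 1,000,000 or 1,048,576 bytes is also an assumption.

```ruby
require "uri"

# Assumed byte count for the 1 MB per-file limit.
MAX_FILE_BYTES = 1024 * 1024

# Classifies a submitted link as :github, :google_doc, :website, or :unsupported.
# MediaWiki pages are treated like any other external website.
def submission_source(link)
  uri = URI.parse(link)
  host = uri.host.to_s.downcase
  path = uri.path.to_s
  if host == "github.com"
    path.include?("/pull/") ? :unsupported : :github # pull requests are excluded
  elsif host == "docs.google.com"
    path.start_with?("/document/") ? :google_doc : :unsupported # no sheets or slides
  else
    :website
  end
end

# True when the raw text fits within the per-file size limit.
def within_size_limit?(text)
  text.bytesize <= MAX_FILE_BYTES
end

puts submission_source("https://github.com/example/repo")
puts submission_source("https://docs.google.com/spreadsheets/d/abc/edit")
puts within_size_limit?("short submission text")
```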

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current New Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This will accommodate the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:51:17Z

Eleill: /* Background */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current new Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This will accommodate the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:50:21Z

Eleill: /* Background */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. The completed code from previous projects did not clearly demonstrate successful integration with SimiCheck from Expertiza, and was not deemed production worthy. Based on this feedback, we started our development from scratch and utilized the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks. Therefore, there weren't many modifications to existing Expertiza files. The current new Assignment interface has two new configuration parameters, which have also been added to the Assignment model. SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage) were added. The Plagiarism Comparison Report was added to the "Review Report" interface's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This will accommodate the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:46:13Z

Eleill: /* Expertiza Modifications */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; however, if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
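
The report ordering in the last requirement could be sketched as follows. The <code>ComparisonRow</code> structure and its field names are hypothetical and do not reflect SimiCheck's actual response schema, and the optional <code>threshold</code> argument is only our assumption about how a similarity threshold setting might be applied.

```ruby
# Hypothetical row for the report table; field names are illustrative only.
ComparisonRow = Struct.new(:similarity, :file1, :file2, :team1, :team2, :diff_url)

# Returns the rows to display, highest similarity first.
# Rows below the (assumed) threshold percentage are dropped.
def report_rows(rows, threshold = 0)
  rows.select { |row| row.similarity >= threshold }
      .sort_by { |row| -row.similarity }
end
```

With no threshold given, every row is kept and only the ordering changes.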

== Expertiza Modifications ==
The majority of the updates are handled in new background tasks, so few existing Expertiza files were modified. The New Assignment interface has two new configuration parameters, SimiCheck Delay (hours) and SimiCheck Similarity Threshold (percentage), which have also been added to the Assignment model. The Plagiarism Comparison Report was added to the select box of the "Review Report" interface for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, Google Drive, etc. This pattern is commonly used to abstract complex functionality behind a simpler interface, and to encapsulate API changes so that application code does not need to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.
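
A minimal sketch of what such a facade might look like is given here. All class and method names (<code>PlagiarismCheckerFacade</code>, <code>handles?</code>, <code>fetch_text</code>, <code>submit</code>) are hypothetical and chosen for illustration; the real integration may be organized differently.

```ruby
# Hypothetical facade: callers hand over submission links and never talk
# to the GitHub, Google Drive, or SimiCheck clients directly.
class PlagiarismCheckerFacade
  # fetchers - objects responding to handles?(link) and fetch_text(link);
  #            each one would wrap a single external API (assumption)
  # checker  - object responding to submit(texts); would wrap SimiCheck
  def initialize(fetchers:, checker:)
    @fetchers = fetchers
    @checker = checker
  end

  # Fetches raw text for every link a fetcher supports, skips the rest,
  # and forwards the collected texts to the checker in one call.
  def compare(links)
    texts = links.map do |link|
      fetcher = @fetchers.find { |f| f.handles?(link) }
      fetcher ? fetcher.fetch_text(link) : nil
    end.compact
    @checker.submit(texts)
  end
end
```

Because the API clients are passed in, swapping one service for another would only change the objects given to the constructor, not the code that calls <code>compare</code>.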

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "Simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.
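
As an illustration of how this value could be interpreted, the sketch below turns a due date and the "simicheck" value into a run time. The method name is hypothetical, and treating values above 100 as an error is our assumption rather than documented behavior.

```ruby
# Hypothetical helper: when should the Plagiarism Checker run?
# due_date        - Time of the assignment's due date
# simicheck_hours - value of the "simicheck" property (-1 means disabled)
# Returns a Time, or nil when no check is scheduled.
def plagiarism_check_time(due_date, simicheck_hours)
  # -1 (or a missing value) means no Plagiarism Checker is scheduled.
  return nil if simicheck_hours.nil? || simicheck_hours < 0

  # Assumption: anything above the documented 0..100 range is invalid.
  raise ArgumentError, "delay must be between 0 and 100 hours" if simicheck_hours > 100

  # Time + Integer adds seconds, so convert hours to seconds.
  due_date + simicheck_hours * 3600
end
```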

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.
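
The polling option could look roughly like the sketch below. It is not tied to any real SimiCheck call: the block passed in stands for whatever status check is eventually used, and the method name, parameters, and injectable <code>sleeper</code> are all assumptions made for illustration.

```ruby
# Hypothetical polling loop: repeatedly runs the given block (the status
# check) until it returns a truthy value or the attempts run out.
# Returns true if the block reported completion, false otherwise.
def wait_for_comparison(max_attempts:, interval: 0, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    # The block is expected to return true once the comparison is done.
    return true if yield

    # Not finished yet: pause before asking again.
    sleeper.call(interval)
  end
  false
end
```

An event-based alternative would skip the loop entirely and react to a callback instead.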

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub limits the number of requests we can make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely run overnight), if we would exceed 5000 requests in an hour we will delay the subsequent requests until the next hour. This should still allow the necessary checks to complete within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-30T02:41:45Z

Eleill: /* Diagram */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; however, if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, Google Drive, etc. This pattern is commonly used to abstract complex functionality behind a simpler interface, and to encapsulate API changes so that application code does not need to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "Simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_SimiCheck.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub limits the number of requests we can make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely run overnight), if we would exceed 5000 requests in an hour we will delay the subsequent requests until the next hour. This should still allow the necessary checks to complete within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

File:Simicheck API - SimiCheck.png

2017-04-30T02:40:48Z

Eleill:

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:35:36Z

Eleill: /* User Interface */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; however, if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, Google Drive, etc. This pattern is commonly used to abstract complex functionality behind a simpler interface, and to encapsulate API changes so that application code does not need to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "Simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:

[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub limits the number of requests we can make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely run overnight), if we would exceed 5000 requests in an hour we will delay the subsequent requests until the next hour. This should still allow the necessary checks to complete within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:35:15Z

Eleill: /* User Interface */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; however, if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, Google Drive, etc. This pattern is commonly used to abstract complex functionality behind a simpler interface, and to encapsulate API changes so that application code does not need to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. This stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. "Simicheck" will be -1 if there is no Plagiarism Checker scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:
 
[[File:SimiCheck_View_Mockup.png]]

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.
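One way to decide how long to delay is sketched below in Ruby. It assumes the caller has already read the remaining request count and the reset time (in epoch seconds) from GitHub's rate-limit response headers, X-RateLimit-Remaining and X-RateLimit-Reset; the function name seconds_until_next_request is hypothetical, and the sketch makes no HTTP calls itself.

```ruby
# Hypothetical helper: returns how many seconds the background task should
# sleep before issuing its next GitHub request.
#   remaining   - requests left in the current window
#   reset_epoch - epoch seconds at which the window resets
#   now         - current time, injectable so the logic can be tested
def seconds_until_next_request(remaining, reset_epoch, now = Time.now)
  return 0 if remaining > 0 # budget left, no need to wait

  wait = reset_epoch - now.to_i
  wait.positive? ? wait : 0 # window already reset, proceed immediately
end
```

The task could call sleep with the returned value before each request, which keeps it under the limit without any fixed hourly schedule.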

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:34:46Z

Eleill: /* Expertiza Modifications */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
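
The table described in the last requirement could be prepared along these lines. This is a plain Ruby sketch; the ComparisonRow struct, its field names, and the report_rows function are hypothetical and do not reflect the SimiCheck summary report format or the Expertiza models.

```ruby
# Hypothetical row type: one file combination from a comparison.
ComparisonRow = Struct.new(:similarity, :file1, :file2, :team1, :team2, :diff_link)

# Returns the rows sorted in descending similarity, as the report requires.
def report_rows(comparisons)
  comparisons.sort_by { |row| -row.similarity }
end
```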

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing files in Expertiza. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. It stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. The value is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:34:07Z

Eleill: /* User Interface */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing files in Expertiza. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. It stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. The value is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
The current assignment configuration UI will be modified to contain another field where an integer can be selected. This will configure how long to wait after an assignment's due date before running the comparison task.

After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. The Plagiarism Checker Report UI looks similar to this:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:31:27Z

Eleill: /* Assignment */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing files in Expertiza. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not need to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment - SimiCheck delay added ====
We added a "simicheck" property to the existing Assignment model. It stores the number of hours to delay the execution of the Plagiarism Checker after the assignment's due date. The value is -1 if no Plagiarism Checker is scheduled, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
In order to be able to configure how long to wait to run the comparison task the current assignment configuration UI will be modified to contain another field where an integer can be input.
After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. 
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour, we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T20:05:17Z

Eleill: /* Expertiza Modifications */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on before in previous semesters. However, the outcomes from these projects did not clearly demonstrate successful integration with SimiCheck from Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. In addition, based on this feedback so far we have decided to start our development from scratch and utilize the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
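
The content preparation and report ordering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper names, the <code>ComparisonRow</code> fields, the regex-based HTML stripping, and the interpretation of 1 MB as 1,000,000 bytes are all assumptions made for the example.

```ruby
# Assumption: SimiCheck's 1 MB limit is treated as 1,000,000 bytes here.
MAX_FILE_BYTES = 1_000_000

# Crude HTML-to-text conversion: drop script/style blocks, then all tags,
# then collapse whitespace. A real HTML parser would be more robust.
def strip_html(html)
  html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ")
      .gsub(/<[^>]+>/, " ")
      .gsub(/\s+/, " ")
      .strip
end

# True when the raw text is small enough to send as a single file.
def within_size_limit?(text)
  text.bytesize <= MAX_FILE_BYTES
end

# Hypothetical shape of one row in the report table.
ComparisonRow = Struct.new(:similarity, :file1, :file2, :team1, :team2, :diff_url)

# Orders rows from most similar to least similar.
def sort_rows_for_report(rows)
  rows.sort_by { |row| -row.similarity }
end

rows = [
  ComparisonRow.new(12.5, "a.txt", "b.txt", "Team 1", "Team 2", "https://example.com/diff/1"),
  ComparisonRow.new(87.0, "a.txt", "c.txt", "Team 1", "Team 3", "https://example.com/diff/2"),
  ComparisonRow.new(40.0, "b.txt", "c.txt", "Team 2", "Team 3", "https://example.com/diff/3")
]
sort_rows_for_report(rows).each do |row|
  puts "#{row.similarity}% #{row.file1} vs #{row.file2} (#{row.team1}, #{row.team2})"
end
```

Keeping these steps as small pure functions would make them easy to unit test without calling any external service.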

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

To view the interface changes, log in to Expertiza as an instructor and navigate to Manage... Assignments. Click "New public/private assignment". The last field on the New Assignment page says "SimiCheck Hour Delay". Enter a value between 0 and 100 to enable the Plagiarism Checker.

After the assignment ends and the delay period has passed, you can view the Plagiarism Report. Click the "View review report" icon containing a magnifying glass and two people (in the third row of per-assignment icons). Select "Plagiarism Checker Report" from the select box, and click "Submit". If there is any plagiarism to report, it will load.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.
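
As a rough illustration of the facade idea, the sketch below routes a submission URL to one of several service clients behind a single <code>fetch</code> method. All class names, the host-matching rules, and the stub clients are hypothetical; real clients would wrap HTTP calls to the GitHub and Google Drive APIs.

```ruby
require "uri"

# Stand-in clients (assumptions for this sketch). Each exposes the same
# fetch_text(url) method so the facade can treat them uniformly.
class StubGithubClient
  def fetch_text(url)
    "code from #{url}"
  end
end

class StubGoogleDriveClient
  def fetch_text(url)
    "doc from #{url}"
  end
end

class StubWebClient
  def fetch_text(url)
    "page from #{url}"
  end
end

# Facade: callers ask for the text behind a link and never see which
# service was used or how it was called.
class SubmissionFetcherFacade
  def initialize(github: StubGithubClient.new,
                 drive: StubGoogleDriveClient.new,
                 web: StubWebClient.new)
    @github = github
    @drive = drive
    @web = web
  end

  def fetch(url)
    client_for(url).fetch_text(url)
  end

  private

  # Picks a client from the URL's host; anything unrecognized is treated
  # as a plain website.
  def client_for(url)
    host = URI.parse(url).host.to_s
    if host == "github.com" || host.end_with?(".github.com")
      @github
    elsif host == "docs.google.com"
      @drive
    else
      @web
    end
  end
end

facade = SubmissionFetcherFacade.new
puts facade.fetch("https://github.com/example/repo")
```

Because the clients are injected through the constructor, a test can pass in fakes, and swapping one service for another only touches the facade.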

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment ====
To accommodate the number of hours to wait after the assignment due date before executing the Plagiarism Checking Comparison task, we added a "simicheck" property to the existing Assignment model. "Simicheck" will be -1 if there is no Plagiarism Checking, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.
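
The meaning of the "simicheck" value can be sketched in plain Ruby as shown here. The helper names are hypothetical and the code works on bare values rather than the actual Assignment model; it only illustrates the -1 and 0 to 100 convention described above.

```ruby
# -1 means no plagiarism check; 0..100 is the delay in hours.
SIMICHECK_DISABLED = -1

# True when the value enables the Plagiarism Checker.
def plagiarism_check_enabled?(simicheck)
  simicheck.between?(0, 100)
end

# Returns the Time at which the comparison task should start,
# or nil when the check is disabled.
def comparison_start_time(due_date, simicheck)
  return nil unless plagiarism_check_enabled?(simicheck)

  due_date + simicheck * 3600 # Time + seconds
end

due = Time.utc(2017, 4, 29, 0, 0, 0)
puts comparison_start_time(due, 2).inspect
puts comparison_start_time(due, SIMICHECK_DISABLED).inspect
```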

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
To configure how long to wait before running the comparison task, the current assignment configuration UI will be modified to contain another field that accepts an integer.
After the results have been aggregated, they can be viewed on a results report page. This report will include the submissions, their corresponding teams, and the similarity results.
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.
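
One way to delay requests once the hourly budget is used up is sketched below. The class name and its methods are hypothetical, and the hour window is tracked with the caller's clock as a simplification; GitHub also reports the actual reset time in the <code>X-RateLimit-Reset</code> response header, which a real client would likely prefer.

```ruby
# Tracks how many requests were made in the current one-hour window and
# tells the caller how long to wait once the limit is reached.
class HourlyRequestBudget
  WINDOW_SECONDS = 3600

  def initialize(limit: 5000)
    @limit = limit
    @window_start = nil
    @used = 0
  end

  # Seconds to wait before the next request may be sent (0 means "now").
  def seconds_to_wait(now)
    reset_if_new_window(now)
    return 0 if @used < @limit

    (@window_start + WINDOW_SECONDS - now).ceil
  end

  # Call once for every request actually sent.
  def record_request(now)
    reset_if_new_window(now)
    @used += 1
  end

  private

  # Starts a fresh window on first use or after an hour has elapsed.
  def reset_if_new_window(now)
    return unless @window_start.nil? || now - @window_start >= WINDOW_SECONDS

    @window_start = now
    @used = 0
  end
end

budget = HourlyRequestBudget.new(limit: 2) # tiny limit for demonstration
start = Time.utc(2017, 4, 29, 1, 0, 0)
2.times { budget.record_request(start) }
puts budget.seconds_to_wait(start + 600)
```

In a background task, the caller could sleep for the returned number of seconds before issuing the next request.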

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

File:SimiCheck View Mockup.png

2017-04-29T19:37:10Z

Eleill: uploaded a new version of "File:SimiCheck View Mockup.png"

First version of the SimiCheck UI

File:SimiCheck View Mockup.png

2017-04-29T19:33:57Z

Eleill: uploaded a new version of "File:SimiCheck View Mockup.png"

First version of the SimiCheck UI

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T06:18:25Z

Eleill: /* AssignmentComparisonWait */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== Assignment ====
To accommodate the number of hours to wait after the assignment due date before executing the Plagiarism Checking Comparison task, we added a "simicheck" property to the existing Assignment model. "Simicheck" will be -1 if there is no Plagiarism Checking, and between 0 and 100 (hours) if the assignment is to have a Plagiarism Checker Report.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
To configure how long to wait before running the comparison task, the current assignment configuration UI will be modified to contain another field that accepts an integer.
After the results have been aggregated, they can be viewed on a results report page. This report will include the submissions, their corresponding teams, and the similarity results.
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T06:12:53Z

Eleill: /* AssignmentComparisonWait */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model "belongs_to" the existing Assignment model. It will contain a foreign key to an assignment as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
To configure how long to wait before running the comparison task, the current assignment configuration UI will be modified to contain another field that accepts an integer.
After the results have been aggregated, they can be viewed on a results report page. This report will include the submissions, their corresponding teams, and the similarity results.
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T05:42:04Z

Eleill: /* Expertiza Modifications */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a SimiCheck Delay parameter. The Plagiarism Comparison Report will be added to the "Review Report" UI's select box for a selected assignment.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model "belongs_to" the existing Assignment model. It will contain a foreign key to an assignment as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
To configure how long to wait before running the comparison task, the current assignment configuration UI will be modified to contain another field that accepts an integer.
After the results have been aggregated, they can be viewed on a results report page. This report will include the submissions, their corresponding teams, and the similarity results.
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits on how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, with an additional 60 requests per hour for unauthenticated requests. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T05:35:42Z

Eleill: /* Requirements */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** We will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
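
The report table described in the requirements above could be built along these lines. This is only a sketch: <code>ComparisonRow</code> and its field names are hypothetical placeholders, not the final model or the SimiCheck response format.

```ruby
# Hypothetical row structure for the report table; the field names are
# illustrative and do not come from the SimiCheck API.
ComparisonRow = Struct.new(:similarity, :file_a, :file_b, :team_a, :team_b, :diff_url)

# Return the rows ordered from most similar to least similar, as the
# requirements ask for a table sorted in descending similarity.
def sort_rows(rows)
  rows.sort_by { |row| -row.similarity }
end
```

Sorting in Ruby keeps the view simple: the partial only needs to iterate over the rows it is given.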

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a comparison wait parameter. New links will need to be added to the current assignment status view to enable navigation to the new comparison report page.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.
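
As a rough illustration of the facade idea, the sketch below exposes a single <code>fetch</code> method and hides which service is used for a given link. The class name <code>SubmissionFetcher</code>, the host-based routing, and the lambda clients are all assumptions made for this example; the real clients would wrap the GitHub and Google Drive APIs and a plain HTTP GET.

```ruby
require "uri"

# Facade sketch: callers ask for the content behind a URL and never deal
# with the individual service clients directly.
class SubmissionFetcher
  # clients - Hash mapping a source type (:github, :google_doc, :web) to a
  #           callable that takes a URL and returns the submission text.
  def initialize(clients)
    @clients = clients
  end

  # Fetch the submission content using whichever client matches the URL.
  def fetch(url)
    @clients.fetch(source_for(url)).call(url)
  end

  # Classify a link by its host name (an assumed, simplified rule).
  def source_for(url)
    case URI.parse(url).host
    when "github.com" then :github
    when "docs.google.com" then :google_doc
    else :web
    end
  end
end

# Stand-in clients used only to demonstrate the interface.
clients = {
  github: ->(url) { "github:#{url}" },
  google_doc: ->(url) { "gdoc:#{url}" },
  web: ->(url) { "web:#{url}" }
}
fetcher = SubmissionFetcher.new(clients)
puts fetcher.fetch("https://github.com/expertiza/expertiza")
```

Swapping one service for another would then only require passing a different client into the facade.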

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model '''belongs_to''' the existing Assignment model. It will contain an assignment as its foreign key, as well as a field for how many hours to wait after the assignment due date before executing the comparison task.
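
The wait value could be turned into a start time as sketched below. The method names are hypothetical, and the sketch assumes the due date is available as a Ruby <code>Time</code> and the wait is stored as a whole number of hours.

```ruby
# Sketch: compute when the comparison task should start.
#
# due_date   - Time at which the assignment's submission deadline passes
# wait_hours - configured number of hours to wait after the due date
def comparison_run_at(due_date, wait_hours)
  due_date + wait_hours * 3600   # Time + Integer adds seconds
end

# True once the configured wait has elapsed.
def comparison_due?(due_date, wait_hours, now = Time.now)
  now >= comparison_run_at(due_date, wait_hours)
end
```

For example, with a due date of 23:59 UTC and a wait of 2 hours, the task would become eligible at 01:59 UTC the next day.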

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
In order to be able to configure how long to wait to run the comparison task the current assignment configuration UI will be modified to contain another field where an integer can be input.
After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. 
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, while unauthenticated requests are limited to 60 per hour. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-29T05:32:56Z

Eleill: /* Problem Statement */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html SimiCheck]. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed.

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** Will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** Will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
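
The "strip HTML" and 1 MB requirements above can be sketched as follows. The regular expression is a deliberately crude stand-in (a real implementation would more likely use an HTML parser or a Rails sanitizer), and treating 1 MB as 1,048,576 bytes is an assumption, since we have not confirmed how SimiCheck measures the limit.

```ruby
# Assumption: SimiCheck's "1 MB" limit is interpreted as 1,048,576 bytes.
MAX_FILE_BYTES = 1024 * 1024

# Crude tag removal for illustration: replace every <...> run with a space,
# collapse repeated whitespace, and trim the ends.
def strip_html(html)
  html.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
end

# Check the raw-text size before sending a file to SimiCheck.
def within_size_limit?(text)
  text.bytesize <= MAX_FILE_BYTES
end
```

Checking <code>bytesize</code> rather than <code>length</code> matters for submissions containing multi-byte characters.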

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a comparison wait parameter. New links will need to be added to the current assignment status view to enable navigation to the new comparison report page.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model '''belongs_to''' the existing Assignment model. It will contain an assignment as its foreign key, as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
In order to be able to configure how long to wait to run the comparison task the current assignment configuration UI will be modified to contain another field where an integer can be input.
After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. 
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, while unauthenticated requests are limited to 60 per hour. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-28T20:56:19Z

Eleill: /* Requirements */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is called SimiCheck. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed. More information on the SimiCheck API can be found [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html here].

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** Will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** Will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and point to the plagiarism_checker_report partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
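
If the callback approach in the requirements above does not work out, the polling fallback might look like the sketch below. The method name and parameters are hypothetical, and the block stands in for whatever call ends up asking SimiCheck whether the comparison has finished.

```ruby
# Polling sketch: call the given block until it reports completion or the
# attempt budget is exhausted.
#
# max_attempts - how many times to check before giving up
# interval     - seconds to sleep between checks (0 keeps the example fast)
#
# Returns true if the block returned a truthy value, false otherwise.
def wait_for_comparison(max_attempts:, interval: 0)
  max_attempts.times do
    return true if yield   # the block asks the service for the status
    sleep(interval)
  end
  false
end
```

The manual “Update Status” link could reuse the same status check with a single attempt.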

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a comparison wait parameter. New links will need to be added to the current assignment status view to enable navigation to the new comparison report page.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model '''belongs_to''' the existing Assignment model. It will contain an assignment as its foreign key, as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
In order to be able to configure how long to wait to run the comparison task the current assignment configuration UI will be modified to contain another field where an integer can be input.
After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. 
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, while unauthenticated requests are limited to 60 per hour. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-28T20:55:38Z

Eleill: /* Requirements */

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is called SimiCheck. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed. More information on the SimiCheck API can be found [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html here].

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** Will also provide an “Update Status” link to manually poll comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** Will edit view/review_mapping/response_report.html.haml to include a new PlagiarismCheckerReport type and a pointer to the new partial.
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a comparison wait parameter. New links will need to be added to the current assignment status view to enable navigation to the new comparison report page.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if the service changes or a different one is used. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model '''belongs_to''' the existing Assignment model. It will contain an assignment as its foreign key, as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
In order to be able to configure how long to wait to run the comparison task the current assignment configuration UI will be modified to contain another field where an integer can be input.
After the results have been aggregated they will be able to be viewed via a results report page. This report will include the submissions, their corresponding teams, and the similarity results. 
Here is a mock up of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, and as such we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For triggering we will explore both polling and event-based approaches.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub has rate limits with respect to how many requests we are allowed to make per auth token per hour. According to GitHub's documentation, we are allowed 5000 authenticated requests per hour, while unauthenticated requests are limited to 60 per hour. This documentation can be found [https://developer.github.com/v3/#rate-limiting here]. Because this is a background task and immediate results are not required (it will likely be an overnight task), if the need arises to exceed 5000 requests per hour we will delay the subsequent requests until the next hour. This should satisfy the number of checks that need to occur within a reasonable time frame.

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High level testing of the controllers, models, and view together will be done with Capybara. The tests we write will be alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-28T12:14:50Z

Eleill:

Team BEND - Bradford Ingersoll, Erika Eill, Nephi Grant, David Gutierrez

== Problem Statement ==
Given that submissions to Expertiza are digital in nature, the opportunity exists to utilize tools that automate plagiarism validation. One such tool is called SimiCheck. SimiCheck has a web service API that can be used to compare documents and check them for plagiarism. The goal of this project is to integrate the SimiCheck API into Expertiza in order to allow for an automated plagiarism check to take place once a submission is closed. More information on the SimiCheck API can be found [https://simicheck.com/simicheck/static/swagger-ui/dist/index.html here].

== Background ==
This project has been worked on in previous semesters. However, the outcomes of those projects did not clearly demonstrate successful integration of SimiCheck with Expertiza and were not deemed production worthy. As a result, part of our project involves researching the previous submission and learning from it. Based on this feedback, we have decided to start our development from scratch and use the previous project as a resource for lessons learned.

== Requirements ==
* Create a scheduled background task that starts a configurable amount of time after the due date of an assignment has passed
* The scheduled task should do the following:
** Fetch the submission content using links provided by the student in Expertiza from only these sources:
*** External website or MediaWiki
**** GET request to the URL then strip HTML from the content
*** Google Doc (not sheet or slides)
**** Google Drive API
*** GitHub project (not pull requests)
**** GitHub API, will only use the student or group’s changes
** Categorize the submission content as either a text submission or source code
** Convert the submission content to raw text format to facilitate comparison
** Use the SimiCheck API to check similarity among the assignment’s submissions
*** Notify the instructor that a comparison has started
*** Send the content of the submissions for each submission category
**** We will experiment with how many documents to send at a time
**** Note that each file is limited in size to 1 MB by SimiCheck
*** Wait for the SimiCheck comparison process to complete
**** We will provide SimiCheck with a callback URL to notify us when the comparison is complete; if this doesn’t work well, we will revert to polling
**** We will also provide an “Update Status” link to manually poll the comparison status
*** Notify the assignment’s instructor that a comparison has been completed
* Visualize the results in a report
** Will create this file: view/review_mapping/response_report.html.erb
** Since SimiCheck has already implemented file diffs, links will be provided in the Expertiza view that lead to the SimiCheck website for each file combination
** Comparison results for each category will be displayed within Expertiza in a table
*** Each row is a file combination with similarity, file names, team names, and diff link
*** Sorted in descending similarity
*** Uses available data from the SimiCheck API’s summary report
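A minimal sketch of how the report rows might be ordered, written in Python purely for illustration (Expertiza itself is a Rails application). The <code>ComparisonRow</code> fields, the 0-100 similarity scale, and the sample values are hypothetical and only mirror the columns listed above; they are not taken from the SimiCheck API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComparisonRow:
    """One report row: a pair of files and how similar they are (hypothetical shape)."""
    similarity: float  # percent similarity, assumed to be on a 0-100 scale
    file_a: str
    file_b: str
    team_a: str
    team_b: str
    diff_url: str      # link to the detailed diff on the SimiCheck site


def sort_rows(rows: List[ComparisonRow]) -> List[ComparisonRow]:
    """Return the rows ordered from most to least similar."""
    return sorted(rows, key=lambda row: row.similarity, reverse=True)


rows = [
    ComparisonRow(12.5, "a.txt", "b.txt", "Team 1", "Team 2", "https://example.com/diff/1"),
    ComparisonRow(87.0, "a.txt", "c.txt", "Team 1", "Team 3", "https://example.com/diff/2"),
]
for row in sort_rows(rows):
    print(f"{row.similarity:5.1f}%  {row.file_a} vs {row.file_b}")
```

Sorting on the server side keeps the view simple: the template only needs to iterate over rows that are already in display order.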

== Expertiza Modifications ==
Because the majority of the updates will be handled in a new background task, there won't be many modifications to existing Expertiza files. The current assignment configuration view will need to be expanded to add a configuration parameter, and the Assignment model will need to be expanded to include a comparison wait parameter. New links will need to be added to the current assignment status view to enable navigation to the new comparison report page.

== Design ==
=== Pattern(s) ===
Given that this project revolves around integration with several web services, our team is planning to follow the [https://en.wikipedia.org/wiki/Facade_pattern facade design pattern] to allow Expertiza to make REST requests to several APIs, including SimiCheck, GitHub, and Google Drive. This pattern is commonly used to abstract complex functionality behind a simpler interface and to encapsulate API changes, so that application code does not have to be updated if a service changes or is replaced. We feel this is appropriate based on the requirements because we can create an easy-to-use interface within Expertiza that hides the actual API integration behind the scenes. With this in place, current and future Expertiza developers can use our simplified functionality without needing to understand the minuscule details of the API’s operation.
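The idea can be sketched as follows. This is an illustrative Python outline, not the Expertiza implementation (which would be written in Ruby); the class name <code>SubmissionFacade</code>, the source names, and the stand-in fetcher functions are all hypothetical.

```python
from typing import Callable, Dict


class SubmissionFacade:
    """Single entry point that hides which web service serves a given link."""

    def __init__(self, fetchers: Dict[str, Callable[[str], str]]):
        # Maps a source name to a function that returns raw text for a URL.
        self._fetchers = fetchers

    def fetch_text(self, source: str, url: str) -> str:
        """Fetch a submission as raw text using the client registered for `source`."""
        try:
            fetch = self._fetchers[source]
        except KeyError:
            raise ValueError(f"unsupported submission source: {source}")
        return fetch(url)


# Stand-in fetchers; real ones would call the GitHub and Google Drive APIs.
facade = SubmissionFacade({
    "github": lambda url: f"source code from {url}",
    "google_doc": lambda url: f"document text from {url}",
})
print(facade.fetch_text("github", "https://github.com/example/repo"))
```

Callers only see <code>fetch_text</code>, so swapping one service client for another would mean changing the registered function rather than the calling code.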

=== Model(s) ===
''There is no need to store the raw content sent to SimiCheck.''

==== AssignmentComparisonWait ====
This model '''belongs_to''' the existing Assignment model. It will contain an assignment as its foreign key as well as a field for how many hours to wait after the assignment due date before executing the comparison task.

==== PlagiarismCheckerComparison ====
This model '''belongs_to''' the PlagiarismCheckerAssignmentSubmission model. It stores the file IDs returned from SimiCheck, the percent similarity between them, and a URL to a detailed comparison (diff).

==== PlagiarismCheckerAssignmentSubmission ====
This model '''has_many''' PlagiarismCheckerComparison models. It represents the results of the comparison among submissions for the assignment. As such it will contain all of the relevant fields that are shown in the view as described in the requirements.

=== Diagram ===
Typical overall system operation is shown here:
 
[[File:Simicheck_API_-_Page_1.png]]

=== User Interface ===
To configure how long to wait before running the comparison task, the current assignment configuration UI will be modified to contain another field that accepts an integer.
After the results have been aggregated, they can be viewed on a results report page. This report will include the submissions, their corresponding teams, and the similarity results.
Here is a mockup of our proposed UI:
 
[[File:SimiCheck_View_Mockup.png]]

=== Task Triggering ===
The majority of this project is implemented as a background task, so we can think of it as a separate application or process apart from Expertiza. Ideally this process will run in another thread to allow for concurrency. During development we will explore different options for handling the task with respect to languages and trigger methodologies. For languages we will explore Ruby, Python, Node, and potentially others. For trigger methodology we will explore polling, event-based triggering, or both.

=== Code Sample(s) ===
''Coming soon...''

=== Potential Hurdles ===
GitHub limits how many requests can be made per auth token per hour. According to [https://developer.github.com/v3/#rate-limiting GitHub's documentation], authenticated requests are limited to 5000 per hour, while unauthenticated requests are limited to 60 per hour. Because this is a background task and immediate results are not required (it will likely run overnight), if we need more than 5000 requests in an hour we will delay the remaining requests until the next hour. This should allow the necessary checks to complete within a reasonable time frame.
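One way the delay could be computed is from the rate-limit headers GitHub returns with each API response (<code>X-RateLimit-Remaining</code> and <code>X-RateLimit-Reset</code>). The following is a sketch in Python under stated assumptions: the function name is hypothetical, and the headers are assumed to arrive in a plain mapping with exactly this capitalization.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before sending the next GitHub request.

    X-RateLimit-Remaining is the number of requests left in the current
    window; X-RateLimit-Reset is the UTC epoch second at which it resets.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


# Example: quota exhausted and the window resets 90 seconds from "now".
print(seconds_to_wait({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1090"}, now=1000.0))
```

The background task could sleep for the returned number of seconds before continuing, which matches the plan of deferring the remaining requests to the next window.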

== Testing Strategy ==
=== API Testing during Development ===
API testing can be done outside of our code using applications like [https://www.getpostman.com/ Postman]. We will use this to determine which API calls to use and what our expected output will be. In addition, this will lay the groundwork for testing within Rails for the direct API calls that the Expertiza server will do.

=== Automated Testing within Rails ===
We plan on writing unit tests for any controllers and models that we create. High-level testing of the controllers, models, and views together will be done with Capybara. The tests we write will sit alongside the existing test framework in Expertiza.

CSC/ECE 517 Spring 2016/E1738 Integrate Simicheck Web Service

2017-04-06T21:25:30Z

Eleill:

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T21:16:53Z

Eleill: /* To Run The App Locally */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results given a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]
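The date-range query listed among the subsequent steps could look roughly like the sketch below. It is written for Python 3.7 or later (the project itself targets Python 2.7), the function name is hypothetical, and it assumes <code>fail_date</code> is stored as a plain <code>YYYY-MM-DD</code> string.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def most_frequent_failures(records: Iterable[Dict], start: date, end: date) -> List[Tuple[str, int]]:
    """Count failures per test file whose fail_date lies in [start, end]."""
    counts = Counter(
        rec["test_file"]
        for rec in records
        if start <= date.fromisoformat(rec["fail_date"]) <= end
    )
    # most_common() orders the pairs from highest to lowest count.
    return counts.most_common()


records = [
    {"test_file": "a.html", "fail_date": "2017-03-01"},
    {"test_file": "b.html", "fail_date": "2017-03-02"},
    {"test_file": "a.html", "fail_date": "2017-03-03"},
    {"test_file": "a.html", "fail_date": "2017-04-01"},  # outside the range below
]
print(most_frequent_failures(records, date(2017, 3, 1), date(2017, 3, 31)))
```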

== Design ==

===Design Pattern===

Both Servo and this project's code follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This design pattern breaks up functionality into smaller "services" and applies the services to the topmost "layer" of the project for which they are needed.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the db, in this case a JSON file. This flow can be seen in the diagram below.

<pre>
                    +--------------------------------------------+
                    |     Intermittent Test Failure Tracker      |
                    |                                            |
+--------------+    | +-----------+      +---------+    +------+ |
|              |    | |           |      |         |    |      | |       +--------+
|    Servo     |    | |           |      |         |    |      | |       |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      | |       |  file  |
|              |    | |           |      |         |    |      | |       |        |
+--------------+    | +-----------+      +---------+    +------+ |       +--------+
                    |                                            |
                    +--------------------------------------------+

</pre>
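The same flow can be sketched in a few lines of Python. This is only an outline under assumptions: <code>InMemoryDb</code> stands in for the JSON-file datastore, and the function names and the sample payload are hypothetical rather than taken from the project's code.

```python
from typing import Dict, List


class InMemoryDb:
    """Stand-in for the JSON-file datastore; keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[Dict] = []

    def add(self, record: Dict) -> None:
        self.records.append(record)


def record_handler(db: InMemoryDb, payload: Dict) -> Dict:
    """Business logic: normalise the incoming request, then persist it."""
    record = {
        "test_file": payload["test_file"].strip(),
        "platform": payload["platform"].strip(),
        "builder": payload["builder"].strip(),
        "number": int(payload["number"]),  # GitHub pull request number
    }
    db.add(record)
    return record


def webhook(db: InMemoryDb, request_body: Dict) -> str:
    """Entry point the build server would call; delegates to the handler."""
    record_handler(db, request_body)
    return "ok"


db = InMemoryDb()
webhook(db, {"test_file": " css/a.html ", "platform": "linux", "builder": "linux-rel", "number": "123"})
print(db.records)
```

Keeping the webhook thin and placing the transformation in the handler means the handler can be unit tested without any HTTP layer.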

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
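As an illustration of the table above, a record could be modelled as follows. This sketch assumes Python 3.7 or later for <code>dataclasses</code> (the project itself targets Python 2.7); the class name <code>IntermittentFailure</code> and the choice to default <code>fail_date</code> to today's date are assumptions.

```python
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class IntermittentFailure:
    """One intermittent test failure, mirroring the fields in the table."""
    test_file: str
    platform: str
    builder: str
    number: int  # GitHub pull request number
    # Assumed default: the current date as an ISO string (YYYY-MM-DD).
    fail_date: str = field(default_factory=lambda: date.today().isoformat())


failure = IntermittentFailure("css/a.html", "linux", "linux-rel", 123, fail_date="2017-03-30")
print(asdict(failure))
```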
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file, making it easier to use the file like a database. The format of the JSON file is simply an array of JSON objects, making the file easily human-readable.
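To make the idea of a JSON file used like a database concrete, here is a small stand-in written with only the Python standard library. It is not TinyDB and does not reproduce TinyDB's API; the class <code>JsonArrayStore</code> and its method names are hypothetical and merely imitate the add, search, and remove operations described on this page.

```python
import json
import os
import tempfile
from typing import Dict, List


class JsonArrayStore:
    """Keeps records as a JSON array in a single file (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path) as handle:
            return json.load(handle)

    def _save(self, records: List[Dict]) -> None:
        with open(self.path, "w") as handle:
            json.dump(records, handle, indent=2)

    def add(self, record: Dict) -> None:
        records = self._load()
        records.append(record)
        self._save(records)

    def search(self, test_file: str) -> List[Dict]:
        return [r for r in self._load() if r["test_file"] == test_file]

    def remove(self, test_file: str) -> None:
        self._save([r for r in self._load() if r["test_file"] != test_file])


path = os.path.join(tempfile.mkdtemp(), "failures.json")
store = JsonArrayStore(path)
store.add({"test_file": "css/a.html", "platform": "linux"})
store.add({"test_file": "css/b.html", "platform": "mac"})
print(store.search("css/a.html"))
store.remove("css/a.html")
print(store.search("css/a.html"))
```

Rewriting the whole file on every change is acceptable for a small store like this one, but it would not scale to large or concurrently written datasets.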

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the functionality of the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing. All unit tests are in tests.py.

{| class="wikitable"
! colspan="3" | Unit Test Summary
|-
! Test Purpose
! Function Tested
! Parameters
|-
| Add a record to a database
| db.add
| params[:self, :test_file, :platform, :builder, :number, :fail_date]
|-
| Delete a record from database
| db.remove
| params[:test_file]
|-
| Record a new Intermittent failure
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
| Query the Intermittent failure records
| handlers.query
| params[:db, :test_file]
|-
| Record a new Intermittent failure, test invalid values - 4 tests for blanks for each input item
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
|}
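The "invalid values" case in the last row could be tested along the following lines. This is a self-contained sketch, not the project's actual test file: the <code>record</code> function here is a hypothetical stand-in for <code>handlers.record</code>, and a plain list stands in for the database.

```python
import unittest

REQUIRED = ("test_file", "platform", "builder", "number")


def record(db, **params):
    """Hypothetical handler: store the record only if every field is present.

    Empty strings, None, and 0 all count as missing in this sketch.
    """
    if any(not params.get(name) for name in REQUIRED):
        return False
    db.append(dict(params))
    return True


class RecordTests(unittest.TestCase):
    def test_complete_record_is_added(self):
        db = []
        self.assertTrue(record(db, test_file="a.html", platform="linux", builder="linux-rel", number=1))
        self.assertEqual(len(db), 1)

    def test_blank_field_is_rejected(self):
        # One check per required field, each with that field left blank.
        for missing in REQUIRED:
            params = dict(test_file="a.html", platform="linux", builder="linux-rel", number=1)
            params[missing] = ""
            db = []
            self.assertFalse(record(db, **params))
            self.assertEqual(db, [])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecordTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```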

====Running Unit Tests and the App====

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo].

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* To launch the app, go to http://localhost:5000

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T21:16:21Z

Eleill: /* Running Unit Tests and the App */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results given a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project's code follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This design pattern breaks up functionality into smaller "services" and applies the services to the topmost "layer" of the project for which they are needed.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the db, in this case a JSON file. This flow can be seen in the diagram below.

<pre>
                    +--------------------------------------------+
                    |     Intermittent Test Failure Tracker      |
                    |                                            |
+--------------+    | +-----------+      +---------+    +------+ |
|              |    | |           |      |         |    |      | |       +--------+
|    Servo     |    | |           |      |         |    |      | |       |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      | |       |  file  |
|              |    | |           |      |         |    |      | |       |        |
+--------------+    | +-----------+      +---------+    +------+ |       +--------+
                    |                                            |
                    +--------------------------------------------+

</pre>

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file, making it easier to use the file like a database. The format of the JSON file is simply an array of JSON objects, making the file easily human-readable.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the functionality of the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing. All unit tests are in tests.py.

{| class="wikitable"
! colspan="3" | Unit Test Summary
|-
! Test Purpose
! Function Tested
! Parameters
|-
| Add a record to a database
| db.add
| params[:self, :test_file, :platform, :builder, :number, :fail_date]
|-
| Delete a record from database
| db.remove
| params[:test_file]
|-
| Record a new Intermittent failure
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
| Query the Intermittent failure records
| handlers.query
| params[:db, :test_file]
|-
| Record a new Intermittent failure, test invalid values - 4 tests for blanks for each input item
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
|}

====Running Unit Tests and the App====

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo].

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T21:16:07Z

Eleill: /* Datastore */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results given a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project's code follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This design pattern breaks up functionality into smaller "services" and applies the services to the topmost "layer" of the project for which they are needed.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the db, in this case a JSON file. This flow can be seen in the diagram below.

<pre>
                    +--------------------------------------------+
                    |     Intermittent Test Failure Tracker      |
                    |                                            |
+--------------+    | +-----------+      +---------+    +------+ |
|              |    | |           |      |         |    |      | |       +--------+
|    Servo     |    | |           |      |         |    |      | |       |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      | |       |  file  |
|              |    | |           |      |         |    |      | |       |        |
+--------------+    | +-----------+      +---------+    +------+ |       +--------+
                    |                                            |
                    +--------------------------------------------+

</pre>

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file, making it easier to use the file like a database. The format of the JSON file is simply an array of JSON objects, making the file easily human-readable.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the functionality of the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing. All unit tests are in tests.py.

{| class="wikitable"
! colspan="3" | Unit Test Summary
|-
! Test Purpose
! Function Tested
! Parameters
|-
| Add a record to a database
| db.add
| params[:self, :test_file, :platform, :builder, :number, :fail_date]
|-
| Delete a record from database
| db.remove
| params[:test_file]
|-
| Record a new Intermittent failure
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
| Query the Intermittent failure records
| handlers.query
| params[:db, :test_file]
|-
| Record a new Intermittent failure, test invalid values - 4 tests for blanks for each input item
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
|}

====Running Unit Tests and the App====

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo].

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T15:49:43Z

Eleill: /* Unit Testing */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make the [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]
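
The date-range query listed above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the sample records, the helper <code>parse_day</code>, and the function name <code>most_frequent_failures</code> are all hypothetical, and only the field names come from the data model.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical sample records; the field names follow the project's data model.
records = [
    {"test_file": "a.html", "platform": "linux", "builder": "linux-rel", "number": 101, "fail_date": "2017-03-01"},
    {"test_file": "b.html", "platform": "mac", "builder": "mac-rel", "number": 102, "fail_date": "2017-03-03"},
    {"test_file": "a.html", "platform": "linux", "builder": "linux-rel", "number": 105, "fail_date": "2017-03-05"},
    {"test_file": "a.html", "platform": "linux", "builder": "linux-rel", "number": 140, "fail_date": "2017-04-10"},
]


def parse_day(text):
    """Turn the leading YYYY-MM-DD part of an ISO date string into a date."""
    return datetime.strptime(text[:10], "%Y-%m-%d").date()


def most_frequent_failures(records, start, end):
    """Return (test_file, count) pairs for failures with start <= fail_date <= end,
    most frequent first."""
    counts = Counter(
        r["test_file"]
        for r in records
        if start <= parse_day(r["fail_date"]) <= end
    )
    return counts.most_common()


# Failures recorded during March 2017.
print(most_frequent_failures(records, date(2017, 3, 1), date(2017, 3, 31)))
# prints [('a.html', 2), ('b.html', 1)]
```

Comparing parsed dates rather than raw strings keeps the filter correct even if the stored value carries a time component after the date.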

== Design ==

===Design Pattern===

Both Servo and this project follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This pattern breaks functionality into smaller "services" and applies each service at the topmost "layer" of the project that needs it.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the database, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                    +---------------------------------------------+
                    |      Intermittent Test Failure Tracker      |
                    |                                             |
+--------------+    | +-----------+      +---------+    +------+  |
|              |    | |           |      |         |    |      |  |      +--------+
|    Servo     |    | |           |      |         |    |      |  |      |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      |  |      |  file  |
|              |    | |           |      |         |    |      |  |      |        |
+--------------+    | +-----------+      +---------+    +------+  |      +--------+
                    |                                             |
                    +---------------------------------------------+

</pre>
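
The webhook, handler, and db stages of this flow could be sketched in plain Python as below. This is only an illustration under stated assumptions: the class <code>ListStore</code>, the function <code>webhook</code>, and the dictionary payload are hypothetical, while the parameter lists of <code>record</code> and <code>add</code> mirror those listed in the unit test summary. The "business logic" shown here is limited to stamping the record with a failure date.

```python
import datetime


class ListStore(object):
    """Hypothetical stand-in for the JSON-backed db; rows are kept in a list."""

    def __init__(self):
        self.rows = []

    def add(self, test_file, platform, builder, number, fail_date):
        self.rows.append({
            "test_file": test_file,
            "platform": platform,
            "builder": builder,
            "number": number,
            "fail_date": fail_date,
        })


def record(db, test_file, platform, builder, number):
    """Handler: transform the request (add the failure date) and persist it."""
    fail_date = datetime.date.today().isoformat()
    db.add(test_file, platform, builder, number, fail_date)


def webhook(db, payload):
    """Entry point the build server would call; payload is assumed to be a dict."""
    record(db, payload["test_file"], payload["platform"],
           payload["builder"], payload["number"])


db = ListStore()
webhook(db, {"test_file": "a.html", "platform": "linux",
             "builder": "linux-rel", "number": 101})
print(len(db.rows))
```

Keeping the handler free of any web framework code means it can be exercised directly, without starting a server.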

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, the project uses a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB]. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used like a small database. The file is simply an array of JSON objects, which keeps it easily human-readable.
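
To make the idea of a JSON file used as a database concrete, here is a minimal sketch that uses only the standard library <code>json</code> module. It is not TinyDB and does not reproduce TinyDB's API or on-disk layout; the function names and the file name <code>intermittents.json</code> are hypothetical.

```python
import json
import os
import tempfile


def load_records(path):
    """Return the list of records stored at path, or [] if the file is absent."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def append_record(path, record):
    """Append one record and rewrite the file as a JSON array."""
    records = load_records(path)
    records.append(record)
    with open(path, "w") as f:
        json.dump(records, f, indent=2)


def find_by_test_file(path, test_file):
    """Return every stored record whose test_file matches."""
    return [r for r in load_records(path) if r["test_file"] == test_file]


# Usage with a throwaway directory so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "intermittents.json")
append_record(path, {"test_file": "a.html", "platform": "linux",
                     "builder": "linux-rel", "number": 101,
                     "fail_date": "2017-03-01"})
print(find_by_test_file(path, "a.html"))
```

Rewriting the whole file on every insert is acceptable for a small store like this one, but it is one reason a dedicated library is preferable once the file grows.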

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework (a "microframework") written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service maps URLs and HTTP verbs to Python functions, which makes it a convenient way to expose a REST (representational state transfer) API. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'

 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the intermittent test failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing. All unit tests are in tests.py.

{| class="wikitable"
! colspan="3" | Unit Test Summary
|-
! Test Purpose
! Function Tested
! Parameters
|-
| Add a record to a database
| db.add
| params[:self, :test_file, :platform, :builder, :number, :fail_date]
|-
| Delete a record from database
| db.remove
| params[:test_file]
|-
| Record a new Intermittent failure
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
| Query the Intermittent failure records
| handlers.query
| params[:db, :test_file]
|-
| Record a new Intermittent failure with invalid values (four tests, one for each blank input item)
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
|}
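
The blank-input checks summarized in the table could look roughly like the sketch below, written with the standard <code>unittest</code> module. It is an illustration only: <code>FakeDB</code>, the body of <code>record</code>, and the test names are hypothetical and are not copied from the project's test file; only the parameter list of <code>record</code> follows the table.

```python
import unittest

REQUIRED = ("test_file", "platform", "builder", "number")


class FakeDB(object):
    """Hypothetical in-memory db used only by these tests."""

    def __init__(self):
        self.rows = []

    def add(self, **row):
        self.rows.append(row)


def record(db, test_file, platform, builder, number):
    """Hypothetical handler: refuse to store a record with a blank field."""
    fields = dict(test_file=test_file, platform=platform,
                  builder=builder, number=number)
    if any(fields[name] in (None, "") for name in REQUIRED):
        return False
    db.add(**fields)
    return True


class RecordValidationTest(unittest.TestCase):
    VALID = dict(test_file="a.html", platform="linux",
                 builder="linux-rel", number=123)

    def test_valid_record_is_stored(self):
        db = FakeDB()
        self.assertTrue(record(db, **self.VALID))
        self.assertEqual(len(db.rows), 1)

    def test_blank_field_is_rejected(self):
        # One check per required field, each with that field left blank.
        for name in REQUIRED:
            args = dict(self.VALID, **{name: ""})
            db = FakeDB()
            self.assertFalse(record(db, **args), name)
            self.assertEqual(db.rows, [])
```

Saved to a file, a test case like this can be run with <code>python -m unittest</code> followed by the module name.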

====Running Unit Tests and the App====

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo]

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T14:43:55Z

Eleill: /* Test Plan */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make the [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This pattern breaks functionality into smaller "services" and applies each service at the topmost "layer" of the project that needs it.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the database, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                    +---------------------------------------------+
                    |      Intermittent Test Failure Tracker      |
                    |                                             |
+--------------+    | +-----------+      +---------+    +------+  |
|              |    | |           |      |         |    |      |  |      +--------+
|    Servo     |    | |           |      |         |    |      |  |      |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      |  |      |  file  |
|              |    | |           |      |         |    |      |  |      |        |
+--------------+    | +-----------+      +---------+    +------+  |      +--------+
                    |                                             |
                    +---------------------------------------------+

</pre>

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, the project uses a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB]. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used like a small database. The file is simply an array of JSON objects, which keeps it easily human-readable.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework (a "microframework") written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service maps URLs and HTTP verbs to Python functions, which makes it a convenient way to expose a REST (representational state transfer) API. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'

 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the intermittent test failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing. All unit tests are in tests.py.

{| class="wikitable"
! colspan="3" | Unit Test Summary
|-
! Test Purpose
! Function Tested
! Parameters
|-
| Add a record to a database
| db.add
| params[:self, :test_file, :platform, :builder, :number, :fail_date]
|-
| Delete a record from database
| db.remove
| params[:test_file]
|-
| Record a new Intermittent failure
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
| Query the Intermittent failure records
| handlers.query
| params[:db, :test_file]
|-
| Record a new Intermittent failure with invalid values (four tests, one for each blank input item)
| handlers.record
| params[:db, :test_file, :platform, :builder, :number]
|-
|}

====Running Unit Tests and the App====

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo]

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T13:52:47Z

Eleill: /* Submission/Pull Requests */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make the [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This pattern breaks functionality into smaller "services" and applies each service at the topmost "layer" of the project that needs it.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the database, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                    +---------------------------------------------+
                    |      Intermittent Test Failure Tracker      |
                    |                                             |
+--------------+    | +-----------+      +---------+    +------+  |
|              |    | |           |      |         |    |      |  |      +--------+
|    Servo     |    | |           |      |         |    |      |  |      |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      |  |      |  file  |
|              |    | |           |      |         |    |      |  |      |        |
+--------------+    | +-----------+      +---------+    +------+  |      +--------+
                    |                                             |
                    +---------------------------------------------+

</pre>

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, the project uses a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB]. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used like a small database. The file is simply an array of JSON objects, which keeps it easily human-readable.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework (a "microframework") written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service maps URLs and HTTP verbs to Python functions, which makes it a convenient way to expose a REST (representational state transfer) API. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'

 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the intermittent test failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing.

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo]

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. In the interim, we shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T13:51:37Z

Eleill: /* Submission/Pull Requests */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make the [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This pattern breaks functionality into smaller "services" and applies each service at the topmost "layer" of the project that needs it.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the database, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                    +---------------------------------------------+
                    |      Intermittent Test Failure Tracker      |
                    |                                             |
+--------------+    | +-----------+      +---------+    +------+  |
|              |    | |           |      |         |    |      |  |      +--------+
|    Servo     |    | |           |      |         |    |      |  |      |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      |  |      |  file  |
|              |    | |           |      |         |    |      |  |      |        |
+--------------+    | +-----------+      +---------+    +------+  |      +--------+
                    |                                             |
                    +---------------------------------------------+

</pre>

== Implementation ==
The implementation is driven entirely by the request: the Servo team clearly defined what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, the project uses a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB]. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like query helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used like a small database. The file is simply an array of JSON objects, which keeps it easily human-readable.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a lightweight web framework (a "microframework") written in Python that is well suited to building [https://en.wikipedia.org/wiki/Microservices microservices]. A Flask service maps URLs and HTTP verbs to Python functions, which makes it a convenient way to expose a REST (representational state transfer) API. Some basic examples of Flask routes:

 @app.route('/')
 def index():
     return 'Index page'

 @app.route('/user/<username>')
 def show_user(username):
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a URL parameter after <code>/user/</code> and returns the matching user from a database.

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, the code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the intermittent test failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The Unit Tests included in the code exercise the major functions of this system. The tests exercise the addition of a record into the database, the removal of a record given a filename, the retrieval of a record, and the assertion that a record will not be added if any of the record parameters (test_file, platform, builder, number) is missing.

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo]

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no Pull Request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When Servo developers are ready, the project will be pulled in to the Servo project on GitHub. We shared our repo with Josh, whose reply was "this looks really great! Thanks for tackling it!"

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T02:48:36Z

Eleill: /* Test Plan */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information regarding a test that fails intermittently and also provides means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make the [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This pattern breaks functionality into smaller "services" and applies each service at the topmost "layer" of the project that needs it.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the database, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                    +---------------------------------------------+
                    |      Intermittent Test Failure Tracker      |
                    |                                             |
+--------------+    | +-----------+      +---------+    +------+  |
|              |    | |           |      |         |    |      |  |      +--------+
|    Servo     |    | |           |      |         |    |      |  |      |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |    | |           |      |         |    |      |  |      |  file  |
|              |    | |           |      |         |    |      |  |      |        |
+--------------+    | +-----------+      +---------+    +------+  |      +--------+
                    |                                             |
                    +---------------------------------------------+

</pre>
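
The handler step in this flow can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the function name <code>handle_failure</code>, the list used in place of the db, and the sample values are all hypothetical. Only the field names are taken from the data model described on this page.

```python
from datetime import date

# Field names come from the data model table; the function name, the
# list-based store, and the sample values below are hypothetical.
REQUIRED_FIELDS = ("test_file", "platform", "builder", "number")


def handle_failure(params, store, fail_date=None):
    """Validate a webhook payload, stamp it with a date, and store it.

    Returns the stored record, or None when a required field is missing.
    """
    if any(params.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    record = {field: params[field] for field in REQUIRED_FIELDS}
    record["number"] = int(record["number"])  # pull request number as an integer
    record["fail_date"] = fail_date or date.today().isoformat()
    store.append(record)
    return record


store = []  # stands in for the JSON-backed db
saved = handle_failure(
    {"test_file": "a.html", "platform": "linux", "builder": "linux-rel", "number": "123"},
    store,
    fail_date="2017-03-29",
)
rejected = handle_failure({"test_file": "a.html"}, store)
print(saved["number"], rejected, len(store))
```

Keeping the validation and the date stamping in the handler means the webhook only has to pass the request parameters along, and the db layer only has to store what it is given.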

== Implementation ==
The implementation follows the request closely: the Servo team clearly defines what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used more easily like a database. The format of the JSON file is simply an array of JSON objects, making the file easy for humans to read.
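
The idea of treating a JSON file as a small database can be illustrated with the standard library alone. The sketch below is not TinyDB's API and is not the project's code. It is a stand-in that assumes the file holds a plain array of record objects, as described above, and all function names, the file name, and the sample records are made up for the example.

```python
import json
import os
import tempfile


def load_records(path):
    """Return the list of records in the JSON file, or [] if it does not exist."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return json.load(handle)


def add_record(path, record):
    """Append one record and rewrite the whole file."""
    records = load_records(path)
    records.append(record)
    with open(path, "w") as handle:
        json.dump(records, handle, indent=2)


def query_by_test_file(path, test_file):
    """Return every stored record whose test_file matches exactly."""
    return [r for r in load_records(path) if r["test_file"] == test_file]


# Example run against a throwaway file (path and values are made up).
path = os.path.join(tempfile.mkdtemp(), "failures.json")
add_record(path, {"test_file": "a.html", "platform": "linux",
                  "builder": "linux-rel", "number": 101,
                  "fail_date": "2017-03-28"})
add_record(path, {"test_file": "b.html", "platform": "mac",
                  "builder": "mac-rel", "number": 102,
                  "fail_date": "2017-03-29"})
matches = query_by_test_file(path, "a.html")
print(len(matches))
```

Rewriting the whole file on every insert is acceptable for a small store like this one, which is the same trade-off a JSON-file-backed library makes in exchange for a human-readable file.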

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a [https://en.wikipedia.org/wiki/Microservices microservice] framework written in Python. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

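 from flask import Flask
 
 app = Flask(__name__)
 
 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     # db is a placeholder for whatever datastore the service uses
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a username as a URL parameter after <code>/user/</code> and returns the matching user from a database.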

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, this code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], which are intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the functionality of the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The unit tests included in the code exercise the major functions of this system: adding a record to the database, removing a record given a filename, retrieving a record, and asserting that a record is not added if any of the record parameters (test_file, platform, builder, number) is missing.

Before attempting either of the following, clone the [https://github.com/adamw17/csc517ossproject repo].

=====To Run Unit Tests=====
* In the cloned repo folder, use the command <code>python test.py</code>

=====To Run The App Locally=====
* In the cloned repo folder, use the command <code>python -m flask_server</code>
* Go to http://localhost:5000 to launch the app

== Submission/Pull Requests ==

There is no pull request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When the Servo developers are ready, the project will be pulled into the Servo project on GitHub.

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-30T02:32:36Z

Eleill: /* Functional Testing */

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information about tests that fail intermittently and also provides a means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project's code follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This design pattern breaks up functionality into smaller "services" and applies the services to the topmost "layer" of the project for which they are needed.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the db, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                   +---------------------------------------------+
                   |      Intermittent Test Failure Tracker      |
                   |                                             |
+--------------+   |  +-----------+      +---------+    +------+ |
|              |   |  |           |      |         |    |      | |       +--------+
|    Servo     |   |  |           |      |         |    |      | |       |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |   |  |           |      |         |    |      | |       |  file  |
|              |   |  |           |      |         |    |      | |       |        |
+--------------+   |  +-----------+      +---------+    +------+ |       +--------+
                   |                                             |
                   +---------------------------------------------+
</pre>

== Implementation ==
The implementation follows the request closely: the Servo team clearly defines what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used more easily like a database. The format of the JSON file is simply an array of JSON objects, making the file easy for humans to read.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a [https://en.wikipedia.org/wiki/Microservices microservice] framework written in Python. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

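 from flask import Flask
 
 app = Flask(__name__)
 
 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     # db is a placeholder for whatever datastore the service uses
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a username as a URL parameter after <code>/user/</code> and returns the matching user from a database.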

== Test Plan ==

=== Functional Testing ===
As a convenience to testers, this code base includes a set of [http://csc517oss.zachncst.com/ testing web applications], which are intended only to illustrate the project's functionality.
This simple set of forms allows a tester to exercise the functionality of the [https://en.wikipedia.org/wiki/Representational_state_transfer REST] endpoints without having to write any REST code.
The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.
All of these can be used for thorough integration testing.

===Unit Testing===
The unit tests included in the code exercise the major functions of this system: adding a record to the database, removing a record given a filename, and retrieving a record.

== Submission/Pull Requests ==

There is no pull request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When the Servo developers are ready, the project will be pulled into the Servo project on GitHub.

CSC/ECE 517 Spring 2017/OSS M1706 Tracking intermittent test failures over time

2017-03-29T18:52:21Z

Eleill:

== Introduction ==
This wiki provides details on new functionality programmed for the Servo OSS project.

===Background===
"[https://github.com/servo/servo/wiki/Design Servo] is a project to develop a new Web browser engine. Our goal is to create an architecture that takes advantage of parallelism at many levels while eliminating common sources of bugs and security vulnerabilities associated with incorrect memory management and data races." Servo can be used through Browser.html, embedded in a website, or natively in Mozilla Firefox. It is designed to load web pages more efficiently and more securely.

===Motivation===
This project is a request from the Servo OSS project to reduce the impact intermittent test failures have on the software. The [https://github.com/servo/servo/wiki/Tracking-intermittent-failures-over-time-project request] made is for a [http://flask.pocoo.org/docs/0.12/ Flask] service using [https://en.wikipedia.org/wiki/Python_(programming_language) Python 2.7]. The intermittent test failure tracker stores information about tests that fail intermittently and also provides a means to quickly query for tests that have failed.

===Tasks===
The intermittent test failure tracker initial steps (for the OSS project) include:
* Build a Flask service
* Use a JSON file to store information
* Record required parameters: Test file, platform, test machine (builder), and related GitHub pull request number
* Query the stored results for a particular test file name
* Use the known intermittent issue tracker as an example of a simple Flask server

Subsequent steps (for the final project) include:
* Add the ability to query the service by a date range, to find out which failures occurred most often
* Build an HTML front-end to the service that queries using JS and reports the results
** Links to GitHub
** Sorting
* Make [https://github.com/servo/servo/blob/master/python/servo/testing_commands.py#L508-L574 filter-intermittents] command record a separate failure for each intermittent failure encountered
* Propagate the required information for recording failures in [https://github.com/servo/saltfs/issues/597 saltfs]

== Design ==

===Design Pattern===

Both Servo and this project's code follow a [https://en.wikipedia.org/wiki/Service_layers_pattern Service Layer] design pattern. This design pattern breaks up functionality into smaller "services" and applies the services to the topmost "layer" of the project for which they are needed.

===Application Flow===

==== Saving a Test ====
The Servo build agent calls a webhook (a way for an app to provide other applications with real-time information) inside the test tracker. The webhook then calls a handler that contains any business logic necessary to transform the request. Finally, the handler persists the request into the db, in this case a JSON file. This flow is shown in the diagram below.

<pre>
                   +---------------------------------------------+
                   |      Intermittent Test Failure Tracker      |
                   |                                             |
+--------------+   |  +-----------+      +---------+    +------+ |
|              |   |  |           |      |         |    |      | |       +--------+
|    Servo     |   |  |           |      |         |    |      | |       |        |
|    Build     +------>  webhook  +------> handler +---->  db  +--------->  json  |
|    Server    |   |  |           |      |         |    |      | |       |  file  |
|              |   |  |           |      |         |    |      | |       |        |
+--------------+   |  +-----------+      +---------+    +------+ |       +--------+
                   |                                             |
                   +---------------------------------------------+
</pre>

== Implementation ==
The implementation follows the request closely: the Servo team clearly defines what the service should do and how it should be built.

=== Data model ===
The model for an intermittent test is defined mostly by the request with a few additions to help with querying in later steps of the OSS request.

{| class="wikitable"
|-
! Name
! Type
! Description
|-
| test_file
| String
| Name of the intermittent test file
|-
| platform
| String
| Platform the test failed on
|-
| builder
| String
| The test machine (builder) the test failed on
|-
| number
| Integer
| The GitHub pull request number
|-
| fail_date
| ISO date (String)
| Date of the failure
|}
=== Datastore ===
To store the intermittent test failures, a library called [https://tinydb.readthedocs.io/en/latest/ TinyDB] is used. TinyDB is a pure-Python library that provides convenient, [https://en.wikipedia.org/wiki/SQL SQL]-like helpers around a [https://www.w3schools.com/js/js_json_syntax.asp JSON] file so that the file can be used more easily like a database. The format of the JSON file is simply an array of JSON objects, making the file easy for humans to read.

=== Flask Service ===
[http://flask.pocoo.org/ Flask] is a [https://en.wikipedia.org/wiki/Microservices microservice] framework written in Python. A Flask service is a REST (representational state transfer) API that maps URLs and HTTP verbs to Python functions. Some basic examples of Flask routes:

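 from flask import Flask
 
 app = Flask(__name__)
 
 @app.route('/')
 def index():
     return 'Index page'
 
 @app.route('/user/<username>')
 def show_user(username):
     # db is a placeholder for whatever datastore the service uses
     return db.lookup(username)

The first function returns 'Index page' at the root URL. The second function accepts a username as a URL parameter after <code>/user/</code> and returns the matching user from a database.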

== Test Plan ==

The sample [http://csc517oss.zachncst.com/ index page] is for illustrating the project's functionality only. The links on the page lead to demonstrations of the query and record handlers, as well as a display of the JSON file containing all the Intermittent Test Failure records.

== Submission/Pull Requests ==

There is no pull request because Servo manager Josh Matthews requested that we start a new (non-branched) repository for this project. The work has been started in a new GitHub repo located [https://github.com/adamw17/csc517ossproject/tree/832969c1cf01d94be340731c744854c25fdbb441 here]. When the Servo developers are ready, the project will be pulled into the Servo project on GitHub.