CSC/ECE 517 Fall 2020 - SQLFE. Refactor Submission.java

From Expertiza_Wiki
Jump to navigation Jump to search

Introduction

The SQLFE (SQL File Evaluation) open source software system automatically grades a set of files containing SQL queries. This project, written in Java, focuses on redesigning and rewriting one large method that parses one student submission file and builds a data structure for later evaluation.

About SQLFE

SQLFE is an open source software tool to assist computer science instructors by automatically grading assignments or tests involving multiple submitted files consisting of SQL queries added to a template submission file. Developed by Paul Wagner and others at the University of Wisconsin – Eau Claire, SQLFE allows instructors to specify a weighted set of tests for each assignment question, which are then used to evaluate a submitted query and/or compare that submitted query against an instructor-supplied solution query.
SQLFE also generates a detailed output file for each submission, which can be returned as feedback. SQLFE reduces instructor grading time for SQL queries and allows quicker, more detailed feedback as compared to hand grading. SQLFE source code, sample submission and instructor-generated files, and documentation can be found at https://github.com/wagnerpj42/SQL-File-Evaluation .
SQLFE is written entirely in Java. Since SQLFE executes SQL queries as part of its evaluation, it must be connected to an SQL database management system (DBMS). Currently Oracle and MySQL DBMSs are supported.
Here is what the home page of SQLFE looks like:
On the left side are the settings to connect to a DB server of your choice, either Oracle or MySQL. The right side menu is used to select which student submissions to read.

Team Members

Mentor: Dr. Paul Wagner
  1. Jack Maccdonald (jmmacdo4)
  2. Nick Garner (nrgarner)
  3. Sumitosh Pal (spal3)
  4. Abhishek Gupta (agupta38)

Problem Statement

Submission.java is the main file in SQLFE responsible for parsing student submissions. As the capabilities of SQLFE have increased, so has the complexity and responsibilities of the readSubmission method. At over 200 LoC, this method has become bloated and cumbersome. The challenge for this team is to refactor readSubmission and Submission.java to maintain proper functionality with proper code structure that will lend itself to future development.

  1. Restructure and rewrite the readSubmission() method in the Submission.java class, adding any new sub-methods desired, to accurately and efficiently parse any student submission file that follows the assignment instructions.
    • Each question number and submitted SQL query for that question should be placed in a new QuestionAnswer object in the class-level answers arraylist variable.
    • Any user SQL comments (either -- single line SQL comments or /* */ multi-line SQL comments) should be written out to the AAA_student_comments.out file through the commWriter object. Each user SQL comment line may be written out separately, though if possible it would be fine to write out any user SQL comment as a whole, especially for multi-line /* */ comments.
    • Any parsing problems that cannot be handled should be identified and written out to the AAA_parse_problems.out file through the parseWriter object.
  2. Optionally use utility methods in the Utilities.java class without modifing or removing any existing methods in Utilities.java as they are used elsewhere. Any new or modified methods should be given new method names if they remain in the Utilities.java class file.
  3. Place sub-methods created as part of the new readSubmission() method functionality in either Submission.java or Utilities.java as design logic dictates.
  4. Good functionality and good code commenting is expected.
    • Optionally provide suggestions for improving the Submission class overall.
    • Report any issues or possible improvements in working on this project by communicating them to Paul Wagner

General Approach

  • Make use of the BufferedReader class to parse input files on a per-character basis. This will allow us to detect things like comment syntax, improper formatting of solutions, and many other criteria to control the parsing exactly as desired by the project mentor.
  • Use a Finite State Machine to build submissions. One of the challenges with any parsing method is the variety of input available that can create tricky edge cases. We intend to help control this with a well defined finite state machine to keep track of what sort of text we are currently parsing. For example, if we parse the opening to a comment, we will treat all subsequent text as a comment until we parse a comment close. This will also allow us to define discrete transition methods and states to drastically reduce the current readSubmission method's code complexity.
  • Abstract commonly used code to helper methods in Utility.java. We will be looking closely for opportunities to abstract out commonly used to code to helper functions. These can be used in conjunction with the aforementioned boolean flags to help parse large chunks of input depending on surrounding syntax.
  • Create unit and user tests for Submission.java. Currently, only the classes dealing with grading have attached unit tests. We will be creating unit tests as well as user tests through the UI to help ensure that our refactor of readSubmission is working properly. More details below in Test Plan.

Rationale

Most of the requirements from Dr. Wagner are related to making the readSubmission() method in Submission.java modular i.e refactor/split the function accordingly to meet standard function requirements. These can involve using helper functions from Utilities.java as well. Secondary requirements are handling several scenarios in the submission document that might appear due to students’ mistakes or other factors.

Implementation Strategy

Initially, we forked the repository and created another branch where we committed our changes. Dr Wagner had agreed to add us team members as collaborators for open source SQLFE, so we were able to create a branch and add changes to it. We also added unit tests for the refactored method using JUnit4. After all the changes have been reviewed, we created a pull request to merge the changes back to main/master.

Changes introduced

  1. The readSumissions function contains 6 new functions defined within Submission.java and 3 new functions defined within Utilities.java. The function now follows the SOLID and DRY principles which are achieved by doing the following.
    1. Extracting lines of code into the methods getFileMetadata and reachFirstQuestion to make them adhere to single responsibility.
    2. Extracting the regex code that makes comparison for a new question in method Utilities.isQuestionFound and achieving interface segregation and DRY.
  2. The method skipExtraQueries and getQuestionNumber are added in Utilities.java to parsing anomalies in file such as extra statements before first question and add pattern support for special characters('.' and ')') before a question.
  3. The methods getAnswerQuery and ParseComments replace the lines of code which optimize file reading using buffered reader, improving performance and also improving code readability.
  4. The file SubmissionTests.java is added in JUnit folder to test the new readsubmissions method and provides a 100 percent coverage.
  5. Travis CI is created for the original open source repo for providing feature to run build and provide coverage whenever a commit is pushed into the repository.

Implementation

Flowchart

The diagram below shows the overall flow of the SQLFE java application from starting the application to closing the window. The part highlighted in red contains the section that our project focused upon primarily.

Finite State Machine

The diagram below shows our plan for controlling the flow of parsing in readSubmission. We used states with controlled transitions to track where we are in the submission and to build valid QuestionAnswer objects to interact with the grading utility.

Code updates

The main focus of our refactoring is Submission.java. The file changes are given below and the entire diff is available in the pull request added in references.

readSubmission method Changes

1. The general changes include removing usage of several boolean flags and declaring all called variables at the start of the function.


2. These changes are added to encapsulate the part of readSubmissions that gets the file headers present at the beginning of the line and also parses everything before reaching the first question.


3. The method isQuestionFound is used to replace redeclaration of pattern and matcher. The definition is added in Utilities.java and details are added below in this document.


4. The method getQuestionNumber replaces the code that was used to store the question number in variable qNumStr.


5. The methods below replace the lines of code that parse the comments to get the question and answer in the file and store it in QuestionAnswer arrayList. Also the method reachNewQuestion replaces the code that finds the next question.


Method Definitions Implemented


parseComments - The method definition to parse all comments using the BufferedReader instance before reaching the AnswerQuery.

Segregation of parsing user comments functionality into above method of one of the most important and critical tasks of refactoring. Diff Comparison in figure shown below highlights how some of the code from the original single monolith function/method readSubmission was refactored (Trimmed) by above new method

getanswerQuery - The method definition to get AnswerQuery from the line and return the answer as a string.

Segregation of this parsing answer query into getanswerQuery is of one of the most important task of refactoring. Diff Comparison in figure shown below highlights how some of the code from the original single monolith function/method readSubmission was refactored(Trimmed) by above new method


getFileMetadata The method definition that abstracts out the code to get the file name and the student's name.

reachFirstQuestion The method definition that abstracts out the code that parses file to reach the first question.


reachNextQuestion The method definition that is abstracting out the code that parses file for comments before reaching the next question.


writeToQuestionAnswerList The method definition that is abstracting out the code that writes the Question and Answer pair in the QuestionAnswer arraylist.


The secondary file that is refactored is Utilities.java. The file changes are below:


getQuestionNumber The method is implemented to read the line and extract the string that marks the question number using regex pattern matching.


isQuestionFound The method that is used to extract declaration of pattern match statements. This helps avoid the use of regex in the file and provides encapsulation like other helper functions.


skipExtraQueries The method is implemented to remove any extra queries or anomalies in the file such as incompatible format in instructor comment. This works by parsing the line and matching the regex for instructor comment and not saving any information that does not match the regex.

Test Plan

Currently, our main focus with testing is the creation of unit and user tests for the Submission.java class. This will involve creating test submission files with edge case text in them to try and break the parser.

Examples include:

  • Student answers inline with instructor comments
  • Student comments written with instructor comment syntax
  • Student answers with no terminating semicolon
  • Student answers with inline comments
  • Student comments with no termination
  • Non-ASCII characters
  • Submissions with question numbering different from the assignment

Here are some of our unit tests. The first tests a simple multi line join command. The second one makes sure we pick up a more complicated command with different indentation and a nested cross join.

           // Setup
           String assignmentName = "testAssignment";
           PrintWriter commWriter = new PrintWriter("./testEvaluations/commFileReadSubmissionTests.out", "UTF-8");
           PrintWriter parseWriter = new PrintWriter("./testEvaluations/parseFileReadSubmissionTests.out", "UTF-8");
           // output general comment file information
           commWriter.println("Assignment  : " + assignmentName);
           commWriter.println("");
           // output general parse file information
           parseWriter.println("Assignment  : " + assignmentName);
           parseWriter.println("");
           // Run Tests
           Submission s = new Submission();
           String folderPath = "./files-sample-MySQL/";
           String fileName = "lt_s66.sql";
           s.readSubmission(folderPath + fileName, commWriter, parseWriter);
           assertEquals(folderPath + fileName, s.getSubmissionFileName());
           ArrayList<QuestionAnswer> answers = s.getAnswers();
           assertNotNull(answers);
           assertEquals(7, answers.size());
           QuestionAnswer a = answers.get(0);
           assertEquals("1", a.getQNumStr());
           String qString = "SELECT CustID, FName, LName, AccClosedDate\n" +
                   "FROM Customer C\n" +
                   "JOIN Account A ON (C.CustID = A.Customer)\n" +
                   "WHERE A.AccOpenLocation = 'Central' AND A.AccClosedDate >= '2017-03-01'";
           assertEquals(qString, a.getActualQuery().toString());
           //Test cross join
           a = answers.get(5);
           qString = "SELECT CustID, FName, LName\n" +
                   "FROM Customer\n" +
                   "WHERE CustID NOT IN(\n" +
                   "    SELECT CustID\n" +
                   "    FROM Customer C2\n" +
                   "    JOIN Account A2 ON (C2.CustID = A2.Customer)\n" +
                   "    WHERE AccStatus = 'Active')";
           assertEquals(qString, a.getActualQuery().toString());
           //Test nonexistant file
           commWriter = new PrintWriter("./testEvaluations/commFileReadSubmissionTestsFail.out", "UTF-8");
           parseWriter = new PrintWriter("./testEvaluations/parseFileReadSubmissionTestsFail.out", "UTF-8");
           s.readSubmission("garbage", commWriter, parseWriter);
           assertEquals("Cannot find file garbage\n", errContent.toString());
       }

Additionally, we have achieved 100% line coverage in the readSubmission() method, which was the target of our refactoring.

In addition, we devised a series of blackbox user tests to ensure that our updates to readSubmission do not break any functionality in the UI. This will involve users attempting use cases such as creating an assignment, grading a single submission, and grading a batch of submissions.

Beyond testing, we will be implementing some repository features to help improve the project’s accessibility for Open Source work. This includes setting up continuous integration features like TravisCI and Coveralls, to ensure that proposed changes do not break the build or significantly lower code coverage.

Results

A small gif is provided to give a brief view of how parsing and evaluation is done after the changes are implemented. Once the application is running, it start parsing the student answer files and then moves to the evaluation state. After evaluation is done, the grade summary can be seen in the grade_summary file generated within evaluations folder.


Impact

  1. Refactored readSubmission from 154 LoC to 43 LoC.
  2. Abstracted commonly used behavior to maintain DRY principle by creating 7 methods which improves code maintainability.
  3. Created 2 methods to solve issue in parsing files with unusual occourences and increasing robustness of application.
  4. Improved coverage of method readSubmission to 100%.

Added Features

  • Maven build wrapper has been added to the project. This allows the project to be compiled and tests to be run from the command line, along with management of project dependencies. The configuration for this functionality is defined in pom.xml below, with necessary Maven dependencies added to .project. The changes in .classpath allow for the Maven build to manage dependencies defined in pom.xml automatically upon building, removing the burden from the user of manually installing jar files.



  • Travis CI support has been started for this project. When fully implemented, this continuous integration platform will run the JUnit tests in the sqlfe.junit package for every commit and pull request on the project. Configuration for the Travis build is below in .travis.yml. There is currently an outstanding bug with the Travis build preventing the test server from making a connection with the MySQL database that will have to be fixed by a future group to finish implementing this feature.
  • Codecov support has been added to the project. This service will receive test coverage information from the Travis CI build via the JaCoCo coverage library and will report on each commit and pull request the current test coverage as well as percentage increase or decrease in coverage, if applicable. The setup for this feature can be seen in Lines 22 - 40 of pom.xml and Line 15 of .travis.yml.

Proposed Future Scope

  1. Improve support for Travis CI and integrate with the main repository.
  2. Replace hard coded character matching with general regex matching for all methods in Utilites.java like skipInstructorComments, processUserComments, etc.
  3. Include option to perform JUnit testing with MySQL DB connnector which is currently exclusive for Oracle DB.
  4. Migrate the code to support the latest eclipse IDE and drivers and make JavaScript replacement for Nashorn Engine.

References

  1. Repository of the main project - https://github.com/wagnerpj42/SQL-File-Evaluation
  2. Project Board - https://github.com/nickrgarner/SQL-File-Evaluation/projects/1
  3. Link to Main Pull Request - https://github.com/wagnerpj42/SQL-File-Evaluation/pull/23
  4. Link to CI Pull Request - https://github.com/wagnerpj42/SQL-File-Evaluation/pull/24
  5. Youtube link of the screencast - https://www.youtube.com/watch?v=MGX64bZCVJs