CSC/ECE 517 Fall 2020 - SQLFE. Refactor Submission.java

From Expertiza_Wiki
Revision as of 19:41, 15 November 2020 by Spal3 (talk | contribs) (→‎Code updates)
Jump to navigation Jump to search

Introduction

The SQLFE (SQL File Evaluation) open source software system automatically grades a set of files containing SQL queries. This project, written in Java, focuses on redesigning and rewriting one large method that parses one student submission file and builds a data structure for later evaluation.

About SQLFE

SQLFE is an open source software tool to assist computer science instructors by automatically grading assignments or tests involving multiple submitted files consisting of SQL queries added to a template submission file. Developed by Paul Wagner and others at the University of Wisconsin – Eau Claire, SQLFE allows instructors to specify a weighted set of tests for each assignment question, which are then used to evaluate a submitted query and/or compare that submitted query against an instructor-supplied solution query.
SQLFE also generates a detailed output file for each submission, which can be returned as feedback. SQLFE reduces instructor grading time for SQL queries and allows quicker, more detailed feedback as compared to hand grading. SQLFE source code, sample submission and instructor-generated files, and documentation can be found at https://github.com/wagnerpj42/SQL-File-Evaluation .
SQLFE is written entirely in Java. Since SQLFE executes SQL queries as part of its evaluation, it must be connected to an SQL database management system (DBMS). Currently Oracle and MySQL DBMSs are supported.
Here is what the home page of SQLFE looks like:
On the left side are the settings to connect to a DB server of your choice, either Oracle or MySQL. The right side menu is used to select which student submissions to read.

Team Members

Mentor: Dr. Paul Wagner
  1. Jack Maccdonald (jmmacdo4)
  2. Nick Garner (nrgarner)
  3. Sumitosh Pal (spal3)
  4. Abhishek Gupta (agupta38)

Problem Statement

Submission.java is the main file in SQLFE responsible for parsing student submissions. As the capabilities of SQLFE have increased, so has the complexity and responsibilities of the readSubmission method. At over 200 LoC, this method has become bloated and cumbersome. The challenge for this team is to refactor readSubmission and Submission.java to maintain proper functionality with proper code structure that will lend itself to future development.

  1. Restructure and rewrite the readSubmission() method in the Submission.java class, adding any new sub-methods desired, to accurately and efficiently parse any student submission file that follows the assignment instructions.
    • Each question number and submitted SQL query for that question should be placed in a new QuestionAnswer object in the class-level answers arraylist variable.
    • Any user SQL comments (either -- single line SQL comments or /* */ multi-line SQL comments) should be written out to the AAA_student_comments.out file through the commWriter object. Each user SQL comment line may be written out separately, though if possible it would be fine to write out any user SQL comment as a whole, especially for multi-line /* */ comments.
    • Any parsing problems that cannot be handled should be identified and written out to the AAA_parse_problems.out file through the parseWriter object.
  2. Optionally use utility methods in the Utilities.java class without modifing or removing any existing methods in Utilities.java as they are used elsewhere. Any new or modified methods should be given new method names if they remain in the Utilities.java class file.
  3. Place sub-methods created as part of the new readSubmission() method functionality in either Submission.java or Utilities.java as design logic dictates.
  4. Good functionality and good code commenting is expected.
    • Optionally provide suggestions for improving the Submission class overall.
    • Report any issues or possible improvements in working on this project by communicating them to Paul Wagner

Initial Approach

  • Make use of the BufferedReader class to parse input files on a per-character basis. This will allow us to detect things like comment syntax, improper formatting of solutions, and many other criteria to control the parsing exactly as desired by the project mentor.
  • Use a Finite State Machine to build submissions. One of the challenges with any parsing method is the variety of input available that can create tricky edge cases. We intend to help control this with a well defined finite state machine to keep track of what sort of text we are currently parsing. For example, if we parse the opening to a comment, we will treat all subsequent text as a comment until we parse a comment close. This will also allow us to define discrete transition methods and states to drastically reduce the current readSubmission method's code complexity.
  • Abstract commonly used code to helper methods in Utility.java. We will be looking closely for opportunities to abstract out commonly used to code to helper functions. These can be used in conjunction with the aforementioned boolean flags to help parse large chunks of input depending on surrounding syntax.
  • Create unit and user tests for Submission.java. Currently, only the classes dealing with grading have attached unit tests. We will be creating unit tests as well as user tests through the UI to help ensure that our refactor of readSubmission is working properly. More details below in Test Plan.

Rationale

Most of the requirements from Dr. Wagner are related to making the readSubmission() method in Submission.java modular i.e refactor/split the function accordingly to meet standard function requirements. These can involve using helper functions from Utilities.java as well. Secondary requirements are handling several scenarios in the submission document that might appear due to students’ mistakes or other factors.

Implementation Strategy

Initially, we would fork the repository and create another branch where we will commit our changes. Dr Wagner has agreed to add us team members as collaborators for open source SQLFE, so we can create a branch and add changes to it. We also intend to add unit tests using JUnit4. After all the changes have been reviewed we would merge the changes back to main/master.

Changes introduced

  1. The readSumissions function contains 6 new functions defined within Submission.java and 3 new functions defined within Utilities.java. The function now follows the SOLID and DRY principles which are achieved by doing the following.
    1. Extracting lines of code into the methods getFileMetadata and reachFirstQuestion to make them adhere to single responsibility.
    2. Extracting the regex code that makes comparison for a new question in method Utilities.isQuestionFound and achieving interface segregation and DRY.
  2. The method skipExtraQueries and getQuestionNumber are added in Utilities.java to parsing anomalies in file such as extra statements before first question and add pattern support for special characters('.' and ')') before a question.
  3. The methods getAnswerQuery and ParseComments replace the lines of code which optimize file reading using buffered reader, improving performance and also improving code readability.
  4. The file SubmissionTests.java is added in JUnit folder to test the new readsubmissions method and provides a 100 percent coverage.
  5. Travis CI is created for the original open source repo for providing feature to run build and provide coverage whenever a commit is pushed into the repository.

Implementation

Flowchart

The diagram below shows the overall flow of the SQLFE java application from starting the application to closing the window. The part highlighted in red contains the section that our project will focus upon primarily.

Finite State Machine

The diagram below shows our plan for controlling the flow of parsing in readSubmission. We will be using states with controlled transitions to track where we are in the submission and to build valid QuestionAnswer objects to interact with the grading utility.

Code updates

The main file that needs to be refactored is Submission.java. You can see it highlighted below. You can also see in the reference github link


It has a function public void readSubmission(String submissionFileName, PrintWriter commWriter, PrintWriter parseWriter) which is the main work horse but is around 280 lines long. Below is snippet of the function

You can also view the complete function in the reference github link

The initial approach is to divide the code logic into functions to give it more modularity. Below you could see many line of logic where SQL query is being read from students answer is refactored into a separate function. This is not yet final but still will give reader an idea of what we are planning to do. In future as we move forward this block would be modified accordingly

Test Plan

Currently, our main focus with testing is the creation of unit and user tests for the Submission.java class. This will involve creating test submission files with edge case text in them to try and break the parser.

Examples include:

  • Student answers inline with instructor comments
  • Student comments written with instructor comment syntax
  • Student answers with no terminating semicolon
  • Student answers with inline comments
  • Student comments with no termination
  • Non-ASCII characters
  • Submissions with question numbering different from the assignment

Here are some simple unit test examples. The first tests a simple multi line join command. The second one makes sure we pick up a more complicated command with different indentation and a nested cross join.

QuestionAnswer a = answers.get(0);
assertEquals("1", a.getQNumStr());
   
String qString = "SELECT CustID, FName, LName, AccClosedDate\n" +
      "FROM Customer C\n" +
      "JOIN Account A ON (C.CustID = A.Customer)\n" +
      "WHERE A.AccOpenLocation = 'Central' AND A.AccClosedDate >= '01-MAR-2017'";
assertEquals(qString, a.getActualQuery().toString());
   //Test cross join
   
a = answers.get(5);
qString = "SELECT CustID, FName, LName\n" +
    "FROM Customer\n" +
    "WHERE CustID IN(\n" +
    "    SELECT CustID\n" +
    "    FROM Customer C1\n" +
    "    CROSS JOIN Account A1\n" +
    "    MINUS\n" +
    "    SELECT CustID\n" +
    "    FROM Customer C2\n" +
    "    JOIN Account A2 ON (C2.CustID = A2.Customer)\n" +
    "    WHERE AccStatus = 'Active')";
assertEquals(qString, a.getActualQuery().toString());

In addition, we will be devising a series of blackbox user tests to ensure that our updates to readSubmission do not break any functionality in the UI. This will involve users attempting use cases such as creating an assignment, grading a single submission, and grading a batch of submissions.

Beyond testing, we will be implementing some repository features to help improve the project’s accessibility for Open Source work. This includes setting up continuous integration features like TravisCI and Coveralls, to ensure that proposed changes do not break the build or significantly lower code coverage.

Results

Impact

  1. Refactored readSubmission from 154 LoC to 43 LoC.
  2. Abstracted commonly used behavior to maintain DRY principle by creating 7 methods which improves code maintainability.
  3. Created 2 methods to solve issue in parsing files with unusual occourences and increasing robustness of application.
  4. Improved coverage of method readSubmission to 100%.

Added Features

Continuous integration: TravisCI and Coveralls run unit tests on every commit to maintain code quality.

Proposed Future Scope

TBD

References

  1. Repository of the main project - https://github.com/wagnerpj42/SQL-File-Evaluation
  2. Project Board - https://github.com/nickrgarner/SQL-File-Evaluation/projects/1
  3. Youtube link of the screencast - TBD