CSC/ECE 517 Spring 2018- Project M1803: Implement a web page fuzzer to find rendering mismatches (Part 2)
By Alexander Simpson (adsimps3), Abhay Soni (asoni3), Dileep Badveli (dbadvel), and Jake Batty (jbatty)
Introduction
This Mozilla project was broken into two main parts: the initial steps and the subsequent steps. The initial steps were finished as part of the OSS project. The goal of this final project is to complete the subsequent steps. As part of the OSS project (explained in more detail below), we created a tool that generates random valid HTML files and automated Servo to take screenshots of them. For this project, we will extend the program to also control Firefox, compare the resulting screenshots, and expand the page generation tool.
Background
TODO: explain servo
Previous Work (Part of the OSS Project)
As per the project description, we were expected to complete the initial steps. The implementation is explained below for each of these steps.
- 1) In a new repository, create a program that can generate a skeleton HTML file with a doctype, head element, and body element, print the result to stdout
- - Here is the link to the repository, which contains the code_generation.py file used to generate random valid HTML files.
- 2) Add a module to the program that enables generating random content specific to the <head> element (such as inline CSS content inside of a <style> element) and add it to the generated output
- - The code_generation.py file contains the code that generates random content specific to the head element and adds styling to it. After the random content is generated, CSS rules are applied on top of it. We have established lists of commonly used styles, weights, fonts, font_styles, and alignments, from which values are chosen at random. For practical purposes, we limit the number of options.
- 3) Add a module to the program that enables generating random content specific to the <body> element (such as a block that contains randomly generated text) and add it to the generated output
- 4) Generate simple random CSS that affects randomly generated content (i.e., if there is an element with an id foo, generate a CSS selector like #foo that applies a style like color to it)
- 5) Create a program under Servo's etc/ that launches Servo and causes it to take a screenshot of a particular URL - use this to take screenshots of pages randomly generated by the previous program
- Sample Screenshot:
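The generation steps listed above (steps 1 through 4) can be sketched roughly as follows. This is a minimal illustration and not the contents of code_generation.py: the function names, the option lists, the word list, and the fixed id foo are all assumptions made for the example.

```python
import random

# Hypothetical option lists; the real tool keeps its own, deliberately limited, lists.
COLORS = ["red", "green", "blue", "black"]
FONTS = ["serif", "sans-serif", "monospace"]
WEIGHTS = ["normal", "bold"]
FONT_STYLES = ["normal", "italic"]
ALIGNMENTS = ["left", "center", "right"]
WORDS = ["servo", "gecko", "layout", "pixel", "render"]


def random_css_rule(rng, element_id):
    """Build one CSS rule that targets the element with the given id."""
    declarations = {
        "color": rng.choice(COLORS),
        "font-family": rng.choice(FONTS),
        "font-weight": rng.choice(WEIGHTS),
        "font-style": rng.choice(FONT_STYLES),
        "text-align": rng.choice(ALIGNMENTS),
    }
    body = " ".join("%s: %s;" % item for item in declarations.items())
    return "#%s { %s }" % (element_id, body)


def random_page(seed=None):
    """Return a skeleton HTML page with random head CSS and random body text."""
    rng = random.Random(seed)  # a seed makes a failing page reproducible
    element_id = "foo"
    text = " ".join(rng.choice(WORDS) for _ in range(rng.randint(3, 8)))
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        "<style>%s</style>" % random_css_rule(rng, element_id),
        "</head>",
        "<body>",
        '<p id="%s">%s</p>' % (element_id, text),
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    # Step 1 asks for the result to be printed to stdout.
    print(random_page())
```

Passing a seed is a design choice for this sketch: when a page triggers a rendering mismatch, the same seed regenerates the same page.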
Work to be done
Below is a list of the tasks to be done as a part of our final project. Below each task we have described what we think it will take to complete the respective task.
- 1) Extend the program that controls Servo to also control Firefox using geckodriver
- Task 1 is relatively simple: it involves downloading geckodriver and running it. Geckodriver is an open-source proxy that implements the WebDriver protocol, which lets a program control Firefox. It should allow us to take a screenshot of a particular URL, just as step 5 of the previous work section does for Servo.
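One possible way to drive Firefox through geckodriver is the Selenium Python bindings. The project description does not name Selenium, so treat that choice, the function names, and the headless option as assumptions of this sketch. It also assumes Firefox and geckodriver are installed and discoverable.

```python
import pathlib


def file_url(html_path):
    """Turn a local HTML file path into a file:// URL the browser can load."""
    return pathlib.Path(html_path).resolve().as_uri()


def firefox_screenshot(html_path, png_path):
    """Load a generated page in Firefox and save a screenshot of it.

    Selenium is imported here (an assumption of this sketch) so that the
    helper above stays usable without the third-party package installed.
    """
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run without opening a window
    driver = webdriver.Firefox(options=options)  # starts geckodriver
    try:
        driver.get(file_url(html_path))
        return driver.save_screenshot(png_path)  # True on success
    finally:
        driver.quit()  # always shut the browser down
```

A call such as firefox_screenshot("page.html", "firefox.png") would then produce the Firefox half of each screenshot pair.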
- 2) Compare the resulting screenshots and report the contents of the generated page if the screenshots differ
- This task involves rendering each generated page in both Firefox (driven through geckodriver) and Servo (driven by the current program), producing one screenshot from each browser. If Servo and Firefox render the page differently, we will report the generated file and mark the differences.
- To make the comparisons we will use OpenCV, which will allow us not only to compare the images but also to mark where the differences are. We will first convert the images to grayscale and then call compare_ssim() (provided by scikit-image). This returns a score and a diff. The score represents how close the two images are to each other, and the diff tells us where the differences are. If the score indicates that the images are different, we will pass the diff to findContours() and then draw a rectangle around each difference.
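The comparison described above could look roughly like the sketch below. It assumes OpenCV (cv2) and scikit-image are installed. Newer scikit-image releases expose compare_ssim() as structural_similarity(), so the sketch tries both names. The 0.99 threshold, the resize of mismatched screenshots, the Otsu thresholding step, and the function names are all illustrative assumptions.

```python
def is_mismatch(score, threshold=0.99):
    """Treat an SSIM score below the (assumed) threshold as a rendering difference."""
    return score < threshold


def compare_screenshots(servo_png, firefox_png, marked_png, threshold=0.99):
    """Return (score, boxes); boxes are (x, y, w, h) rectangles around differences."""
    # Third-party imports are kept local so is_mismatch() works without them.
    import cv2
    try:
        from skimage.metrics import structural_similarity
    except ImportError:  # older scikit-image releases
        from skimage.measure import compare_ssim as structural_similarity

    servo = cv2.imread(servo_png)
    firefox = cv2.imread(firefox_png)
    if servo is None or firefox is None:
        raise FileNotFoundError("could not read one of the screenshots")
    # SSIM needs equal dimensions; resizing is an assumption of this sketch.
    if servo.shape != firefox.shape:
        firefox = cv2.resize(firefox, (servo.shape[1], servo.shape[0]))

    gray_servo = cv2.cvtColor(servo, cv2.COLOR_BGR2GRAY)
    gray_firefox = cv2.cvtColor(firefox, cv2.COLOR_BGR2GRAY)

    # full=True also returns the per-pixel similarity image (the "diff").
    score, diff = structural_similarity(gray_servo, gray_firefox, full=True)
    if not is_mismatch(score, threshold):
        return score, []

    # Scale the similarity image to 8 bits; low values mark differing pixels.
    diff = (diff.clip(0, 1) * 255).astype("uint8")
    _, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(c) for c in contours]
    for x, y, w, h in boxes:
        cv2.rectangle(servo, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imwrite(marked_png, servo)
    return score, boxes
```

When the returned list of boxes is non-empty, the caller would report the HTML file that produced the two screenshots.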
- 3) Extend the page generation tool with a bunch of additional strategies, such as:
- -Generating element trees of arbitrary depth
- -Generating sibling elements
- -Extending the set of CSS properties that can be generated (display, background, float, padding, margin, border, etc.)
- -Extending the set of elements that can be generated (span, div, header elements, table (and associated table contents), etc.)
- -Randomly choose whether to generate a document in quirks mode or not
- Task 3 has several parts, but the main goal is to increase the complexity of our randomly generated pages. First, we will allow element trees of arbitrary depth. We will then generate sibling elements and increase the CSS styling options. Finally, we will increase the number of HTML elements that can be generated and randomly choose whether to generate a document in quirks mode or not.
- The code_generation.py file will be adapted to provide these functions. The function used to add random sections will be made recursive so that it generates elements within elements, resulting in a tree of HTML elements. The amount of nesting will be random but limited. Within the sections, tables, lists, spans, and divs may be created. This will involve writing a new function for creating each kind of element and allowing these functions to be chosen at random by the function that creates sections of the HTML document.
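The recursive strategy described for task 3 might be sketched as below. This is an illustration under assumptions, not the planned code_generation.py changes: the tag lists, the probabilities, the depth and sibling limits, and every function name are hypothetical. Quirks mode is triggered here by omitting the doctype.

```python
import random

# Hypothetical tag and word lists for the example.
CONTAINER_TAGS = ["div", "section", "article"]
LEAF_TAGS = ["p", "span", "h1", "h2"]
WORDS = ["servo", "gecko", "layout", "pixel", "render"]


def random_text(rng, words=3):
    return " ".join(rng.choice(WORDS) for _ in range(words))


def random_table(rng):
    """Build a small table with a random number of rows and columns."""
    rows, cols = rng.randint(1, 3), rng.randint(1, 3)
    body = "".join(
        "<tr>" + "".join("<td>%s</td>" % random_text(rng, 1) for _ in range(cols)) + "</tr>"
        for _ in range(rows)
    )
    return "<table>%s</table>" % body


def random_element(rng, depth=0, max_depth=4, max_siblings=3):
    """Recursively build an element; nesting is random but capped at max_depth."""
    if depth >= max_depth or rng.random() < 0.3:
        if rng.random() < 0.2:
            return random_table(rng)
        tag = rng.choice(LEAF_TAGS)
        return "<%s>%s</%s>" % (tag, random_text(rng), tag)
    tag = rng.choice(CONTAINER_TAGS)
    # Each container gets between one and max_siblings sibling children.
    children = "".join(
        random_element(rng, depth + 1, max_depth, max_siblings)
        for _ in range(rng.randint(1, max_siblings))
    )
    return "<%s>%s</%s>" % (tag, children, tag)


def random_document(seed=None):
    """Return a page that is randomly in quirks mode (no doctype) or standards mode."""
    rng = random.Random(seed)
    quirks = rng.random() < 0.5
    doctype = "" if quirks else "<!DOCTYPE html>\n"
    body = random_element(rng)
    return doctype + "<html><head><title>fuzz</title></head><body>%s</body></html>" % body
```

Capping the depth guarantees that the recursion terminates, while the early-exit probability keeps the tree shapes varied.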
Testing
Because of the nature of this project, testing the newly added features is straightforward. Task 2 itself consists of comparing screenshots, so running the comparison on actual screenshots from Servo and Firefox verifies that tasks 1 and 2 work. We will also visually inspect the generated pages to confirm that the new page generation features from task 3 work. While some of the testing for task 3 could probably be automated (to cover all possible scenarios), that is out of the scope of this project.
Conclusion
The previously completed work allows us to generate simple HTML documents with a randomized structure, render the page in Servo, and take a screenshot of the page. We plan on furthering this work by rendering the pages in both Servo and Firefox, taking screenshots of the pages within both browsers, and reporting differences between the two. Doing so will allow users to evaluate Servo’s ability to load web pages. To make this testing even more informative, we plan to increase the complexity of the structure and styling of the randomly generated HTML documents.