CSC/ECE 517 Spring 2018- Project M1803: Implement a web page fuzzer to find rendering mismatches (Part 2)

From Expertiza_Wiki

By Alexander Simpson (adsimps3), Abhay Soni (asoni3), Dileep Badveli (dbadvel), and Jake Batty (jbatty)

Introduction

This Mozilla project was broken into two main parts: the previous work and the work to be done. The previous work was finished as part of the OSS project, and the goal of this final project is to complete the remaining work. As part of the OSS project (explained in more detail below), we created a tool that generates random valid HTML files and a program that automates Servo. Servo is a research project developed by Mozilla "to create a new layout engine using a modern programming language". By automating Servo, we were able to quickly see whether it could render the randomly generated pages. In this project we will extend the program to also control Firefox, compare the resulting screenshots from Servo and Firefox, and expand the page generation tool.

Background

Servo is a research project developed by Mozilla "to create a new layout engine using a modern programming language".

Previous Work (Part of the OSS Project)

As per the project description, we were expected to complete the initial steps. The implementation is explained below for each of these steps.

1) In a new repository, create a program that can generate a skeleton HTML file with a doctype, head element, and body element, and print the result to stdout
- Our repository contains the code_generation.py file, which is used to generate random valid HTML files.
2) Add a module to the program that enables generating random content specific to the <head> element (such as inline CSS content inside of a <style> element) and add it to the generated output
- The file code_generation.py contains the code that generates random content specific to the head element and adds styling on top of it. After generating random content for the file, we add CSS rules that apply to this content. We have established lists of commonly used styles, weights, fonts, font_styles, and alignments, from which values are picked at random. For practical purposes, we are limiting the number of options.
3) Add a module to the program that enables generating random content specific to the <body> element (such as a <p> block that contains randomly generated text) and add it to the generated output
4) Generate simple random CSS that affects randomly generated content (i.e., if there is an element with an id foo, generate a CSS selector like #foo that applies a style like color to it)
5) Create a program under Servo's etc/ that launches Servo and causes it to take a screenshot of a particular URL - use this to take screenshots of pages randomly generated by the previous program
Sample Screenshot:
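Steps 1 through 4 above can be sketched roughly as follows. This is a minimal illustration rather than the contents of code_generation.py: the option lists and the names random_text, random_rule, and generate_page are assumptions made for the example.

```python
import random
import string

# Illustrative option lists; the project's actual lists are not shown here.
COLORS = ["red", "green", "blue", "black"]
FONT_WEIGHTS = ["normal", "bold"]
ALIGNMENTS = ["left", "center", "right"]


def random_text(rng, words=8):
    """Return a string of random lowercase 'words' separated by spaces."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 8)))
        for _ in range(words)
    )


def random_rule(rng, element_id):
    """Build one CSS rule that targets the element with the given id."""
    return (
        f"#{element_id} {{ color: {rng.choice(COLORS)}; "
        f"font-weight: {rng.choice(FONT_WEIGHTS)}; "
        f"text-align: {rng.choice(ALIGNMENTS)}; }}"
    )


def generate_page(rng, paragraphs=3):
    """Return a skeleton HTML page with random <style> rules and <p> blocks."""
    ids = [f"p{i}" for i in range(paragraphs)]
    css = "\n".join(random_rule(rng, i) for i in ids)
    body = "\n".join(f'<p id="{i}">{random_text(rng)}</p>' for i in ids)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>\n"
        f"{css}\n</style>\n</head>\n<body>\n{body}\n</body>\n</html>"
    )


if __name__ == "__main__":
    # Step 1 asks for the result to be printed to stdout.
    print(generate_page(random.Random()))
```

Passing a seeded random.Random instance makes a given page reproducible, which helps when a rendering mismatch needs to be regenerated later.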

Work to be done

Below is a list of the tasks to be done as a part of our final project. Below each task we have described what we think it will take to complete the respective task.

1) Extend the program that controls Servo to also control Firefox using geckodriver
Task 1 is relatively simple: it involves downloading geckodriver and running it. geckodriver is an open-source proxy that lets WebDriver clients control Gecko-based browsers such as Firefox. It should allow us to take screenshots of a particular URL, just as in step 5 of the previous work section.
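One possible way to drive Firefox through geckodriver is the Selenium Python bindings. The sketch below assumes Selenium is installed and geckodriver is available to it; the helper names file_url and firefox_screenshot and the default window size are illustrative choices, not part of the project.

```python
from pathlib import Path


def file_url(html_path):
    """Convert a local file path into a file:// URL the browser can load."""
    return Path(html_path).resolve().as_uri()


def firefox_screenshot(html_path, png_path, width=800, height=600):
    """Load a local HTML file in headless Firefox and save a screenshot."""
    # Imported here so file_url() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)  # starts geckodriver
    try:
        driver.set_window_size(width, height)
        driver.get(file_url(html_path))
        return driver.save_screenshot(str(png_path))  # True on success
    finally:
        driver.quit()  # always shut the browser down
```

Note that set_window_size sets the outer window size, so the Firefox and Servo screenshots may not have identical dimensions; the comparison step has to account for that.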
2) Compare the resulting screenshots and report the contents of the generated page if the screenshots differ
This task involves rendering each generated page in both Firefox (through geckodriver) and Servo (through the existing program), producing one screenshot from each browser. If Servo and Firefox render the page differently, we will report that file and mark the differences.
To make the comparisons we will use OpenCV together with the compare_ssim() function from scikit-image, which lets us not only compare the images but also mark where the differences are. We will first convert the images to grayscale and then call compare_ssim(), which returns a score and a diff value. The score represents how close the two images are to each other, and the diff tells us where the differences are. If the score indicates that the images are different, we will pass the diff to findContours() and then draw a rectangle around each difference.
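The comparison described above might look like the following sketch. It assumes OpenCV, NumPy, and a scikit-image release that provides skimage.metrics.structural_similarity (older releases exposed the same function as compare_ssim in skimage.measure). The 0.99 threshold, the cropping of both images to their shared area, the Otsu thresholding of the diff, and the function names are assumptions made for illustration.

```python
def common_size(shape_a, shape_b):
    """Return the (height, width) shared by two image shapes."""
    return min(shape_a[0], shape_b[0]), min(shape_a[1], shape_b[1])


def compare_screenshots(path_a, path_b, out_path, threshold=0.99):
    """Return (score, boxes); write an annotated copy of image A if they differ."""
    # Third-party imports are local so common_size() works without them.
    import cv2
    import numpy as np
    from skimage.metrics import structural_similarity

    img_a, img_b = cv2.imread(path_a), cv2.imread(path_b)
    if img_a is None or img_b is None:
        raise FileNotFoundError("could not read one of the screenshots")

    # SSIM needs equal dimensions; cropping to the shared area is an assumption.
    h, w = common_size(img_a.shape, img_b.shape)
    img_a = np.ascontiguousarray(img_a[:h, :w])
    img_b = np.ascontiguousarray(img_b[:h, :w])

    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    score, diff = structural_similarity(gray_a, gray_b, full=True)
    if score >= threshold:
        return score, []

    # Scale the per-pixel similarity map to 8 bits and mask low-similarity areas.
    diff8 = (np.clip(diff, 0, 1) * 255).astype("uint8")
    _, mask = cv2.threshold(diff8, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # findContours returns 2 or 3 values depending on the OpenCV version.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    boxes = [cv2.boundingRect(c) for c in contours]
    for x, y, bw, bh in boxes:
        cv2.rectangle(img_a, (x, y), (x + bw, y + bh), (0, 0, 255), 2)
    cv2.imwrite(out_path, img_a)
    return score, boxes
```

A score of 1.0 means the two grayscale images are identical. Any threshold below that is a tuning choice that trades missed mismatches against false reports caused by small anti-aliasing differences.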
3) Extend the page generation tool with additional strategies, such as:
-Generating element trees of arbitrary depth
-Generating sibling elements
-Extending the set of CSS properties that can be generated (display, background, float, padding, margin, border, etc.)
-Extending the set of elements that can be generated (span, div, header elements, table (and associated table contents), etc.)
-Randomly choosing whether to generate a document in quirks mode or not
For task 3 there are several different parts, but the main goal is to increase the complexity of our randomly generated pages. The code_generation.py file will be adapted to provide these functions. The function used to add random sections will become recursive so that it generates elements within elements, resulting in a tree of HTML elements. The amount of nesting will be random but limited. Within the sections, tables, lists, spans, and divs may be created. This will involve writing a new function for each element type and allowing the section-generating function to choose among them at random. Additionally, the function that provides randomized CSS styling will be extended to generate additional properties.
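The recursive generation described above could be sketched as follows. All names, probabilities, and value ranges here are hypothetical and chosen only to illustrate bounded nesting, sibling generation, extra CSS properties, and the quirks-mode choice (a document without a doctype is rendered in quirks mode).

```python
import random

LEAF_TAGS = ["p", "span", "h1", "h2", "h3"]
DISPLAY = ["block", "inline", "inline-block"]
FLOAT = ["none", "left", "right"]


def random_style(rng):
    """Inline style drawn from a few of the additional CSS properties."""
    return (
        f"display: {rng.choice(DISPLAY)}; float: {rng.choice(FLOAT)}; "
        f"padding: {rng.randint(0, 20)}px; margin: {rng.randint(0, 20)}px; "
        f"border: {rng.randint(0, 3)}px solid black"
    )


def random_table(rng):
    """Return a table with one to three rows of two cells each."""
    rows = "".join(
        "<tr>" + "".join(f"<td>cell {r}.{c}</td>" for c in range(2)) + "</tr>"
        for r in range(rng.randint(1, 3))
    )
    return f"<table>{rows}</table>"


def random_list(rng):
    """Return an unordered list with one to four items."""
    items = "".join(f"<li>item {i}</li>" for i in range(rng.randint(1, 4)))
    return f"<ul>{items}</ul>"


def random_element(rng, depth, max_depth):
    """Return one element; above max_depth it may recurse into a <div>."""
    if depth < max_depth and rng.random() < 0.5:
        # Siblings: each child is generated independently one level deeper.
        children = "".join(
            random_element(rng, depth + 1, max_depth)
            for _ in range(rng.randint(1, 3))
        )
        return f'<div style="{random_style(rng)}">{children}</div>'
    kind = rng.choice(["leaf", "table", "list"])
    if kind == "table":
        return random_table(rng)
    if kind == "list":
        return random_list(rng)
    tag = rng.choice(LEAF_TAGS)
    return f'<{tag} style="{random_style(rng)}">text</{tag}>'


def generate_document(rng, max_depth=3):
    """Return a full document, randomly in quirks or standards mode."""
    # Omitting the doctype makes browsers fall back to quirks mode.
    doctype = "" if rng.random() < 0.5 else "<!DOCTYPE html>\n"
    body = random_element(rng, 0, max_depth)
    return f"{doctype}<html><head></head><body>{body}</body></html>"
```

The max_depth parameter is what keeps the nesting "random but limited": recursion stops once depth reaches it, so at most max_depth div elements are ever nested inside one another.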

Testing

Because of the nature of this project, testing the newly added features will be very simple. For task 1, once we are actually controlling Firefox using geckodriver, we will know it is working. Task 2 involves comparing screenshots, so by running the code with images that are the same and images that are different, we will be able to test whether the comparisons work. Additionally, we will be able to visually verify that task 3 is complete by seeing that the new page generation features work. While we could probably automate some of the testing for tasks 2 and 3 (to test all possible scenarios), it is out of the scope of this project.

Conclusion

The previously completed work allows us to generate simple HTML documents with a randomized structure, render each page in Servo, and take a screenshot of it. We plan to further this work by rendering the pages in both Servo and Firefox, taking screenshots in both browsers, and reporting the differences between the two. Doing so will allow users to evaluate Servo’s ability to load web pages. To make this testing even more informative, we plan to increase the complexity of the structure and styling of the randomly generated HTML documents.