CSC/ECE 517 Spring 2018- Project M1803: Implement a web page fuzzer to find rendering mismatches (Part 2)
By Alexander Simpson(adsimps3), Abhay Soni (asoni3), Dileep badveli (dbadvel) and Jake Batty(jbatty)
Introduction
This Mozilla project was broken in to 2 main parts: the previous work and the work to be done. The previous work was finished as a part of the OSS project. As a part of the OSS project (explained more below) we created a tool which generates random valid HTML files and automated servo. Servo is experimental web browser developed by Mozilla to "to create a new layout engine using a modern programming language". By automating servo (the web browser) we were able to quickly see if servo could render the randomly generated pages.
The goal of this final project was to build off the original project while adding more features. The work we completed as a part of this project is split into a couple main parts. First, we extended the program created in the OSS project to also control Firefox. By also controlling Firefox, we then had 2 screenshots of the randomly generated content - Servo and Firefox . After getting both screenshots, we then compared them using Python and OpenCV. Finally, we were able to expand upon the page generation tool to allow the randomly generated web pages to have more properties/be more complex.
Previous Work (Part of the OSS Project)
As per the project description, we were expected to complete the initial steps. The implementation is explained below for each of these steps.
- 1) In a new repository, create a program that can generate a skeleton HTML file with a doctype, head element, and body element, print the result to stdout
- - Here is the link to the repository which contains code_generation.py file which will be used to generate random valid HTML files.
- 2) Add a module to the program that enable generating random content specific to the <head> element (such as inline CSS content inside of a <style> element) and add it to the generated output
- - The file code_generation.py, contains the code which generates random content specific to the head element and adds style on top of it. As seen in this code, after generating random content to the file, we will add CSS elements on top of this content. We have established a list of commonly used styles, weights, fonts, font_styles, and alignments which will be used at random. For practical purposes, we are limiting the number of options.
- 3) Add a module to the program that enables generating random content specific to the <body> element (such as a
block that contains randomly generated text) and add it to the generated output
- 4) Generate simple random CSS that affects randomly generated content (ie. if there is an element with an id foo, generate a CSS selector like #foo that applies a style like colorto it)
- 5) Create a program under Servo's etc/ that launches Servo and causes it to take a screenshot of a particular URL - use this to take screenshots of pages randomly generated by the previous program
- Sample Screenshot:
Lists of Tasks
Below is a list of the tasks that were completed as a part of our final project. Below each task we have described how exactly it was implemented with code examples.
- 1) Extend the program that controls Servo to also control Firefox using geckodriver
Task 1 is relatively simple. It just involves downloading geckodriver and running it. Geckodriver is an open source software engine that allows us to render marked content on a web browser. It should allow us to take screenshots of a particular URL, just like task 5 in the previous work section, but for Firefox instead.
- 2) Compare the resulting screenshots and report the contents of the generated page if the screenshots differ
This task involves automating Firefox to use geckodriver and the current servo program. They both will create 2 different screenshots. If servo and Firefox render it differently, we will report that file and mark the differences.
- 3) Extend the page generation tool with a bunch of additional strategies, such as:
- -Generating elements trees of arbitrary depth
- -Generating sibling elements
- -Extending the set of CSS properties that can be generated (display, background, float, padding, margin, border, etc.)
- -Extending the set of elements that can be generated (span, div, header elements, table (and associated table contents), etc.)
- -Randomly choose whether to generate a document in quirks mode or not
- UML Diagram of Tasks:
Below is a UML diagram of the tasks of the overall system. As you can see, it covers all 3 tasks. It starts with the code generation and then it splits and takes screenshots on both Firefox and Servo. It then compares then screenshots and reports the distance.
Implementation of Tasks
- 1) Extend the program that controls Servo to also control Firefox using geckodriver
- 2) Compare the resulting screenshots and report the contents of the generated page if the screenshots differ
- To actually make the comparisons we use OpenCV. OpenCV allows us to not only make comparisons but will also mark where the differences in. To actually do the comparisons we will first read in the images, convert the images to grayscale, and then call compare_ssim().
imageA = cv2.imread(image1) imageB = cv2.imread(image2) grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY) grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY) (score, diff) = compare_ssim(grayA, grayB, full=True)
This gets us a score and diff values. The score variables represents how close the to images are to each other and the diff variables tells us where the differences are. If the score indicates that the images are different we then use the diff value to findContours() and then draw a rectangle around the differences.
thresh = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1] contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) contours = contours[0] if imutils.is_cv2() else contours[1]
After getting contours, we get a rectangle for each contour by calling "(x, y, w, h) = cv2.boundingRect(c)". We then combine any rectangles that are inside each other or very close to each other. After combining the rectangles, we then add the rectangles to each image. Lastly we then save image A (with the rectangles to show the differences)
for rrr in rectangles: x = rrr[0] y = rrr[1] w = rrr[2] h = rrr[3] cv2.rectangle(imageA, (x, y), (x + w, y + h), (0, 0, 255), 2) cv2.rectangle(imageB, (x, y), (x + w, y + h), (0, 0, 255), 2) cv2.imwrite(filename,imageA)
- 3) Extend the page generation tool with a bunch of additional strategies, such as:
- For task 3 there are several different parts, but the main goal was to increase the complexity of our randomly generated pages. The code_generation.py file was adapted to provide these functions. The implementation of each subtask is explained below.
Generating elements trees of arbitrary depth
- this
Generating sibling elements
Extending the set of CSS properties that can be generated (display, background, float, padding, margin, border, etc.)
- Implementing this was relatively simple. We just did the same thing we did before but with more CSS properties. Below is the code we used to do this. It does not include creating the arrays with different property values but shows us accessing the different arrays.
string_css += 'display: ' + displayTypes[random.randint(0, 3)] + ';\r\n' string_css += 'background-color: rgb(' + str(random.randint(0, 255)) + ',' + str(random.randint(0, 255)) + ',' + str(random.randint(0, 255)) +');\r\n' string_css += 'float: ' + floatTypes[random.randint(0, 3)] + ';\r\n' string_css += 'padding: ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px;\r\n' string_css += 'margin: ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px;\r\n' string_css += 'border-width: ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px ' + str(random.randint(0, 20)) + 'px;\r\n' string_css += 'border-style: ' + borderTypes[random.randint(0, 4)] + ';\r\n'
Extending the set of elements that can be generated (span, div, header elements, table (and associated table contents), etc.)
Randomly choose whether to generate a document in quirks mode or not
- Quirks mode is enable whenever an HTML document does not have a DOCTYPE. To randomly enable Quirks mode we only added a DOCTYPE 50% of the time.
Testing
Because of the nature of this project, testing the newly added features will be very simple.
- * For task 1, once we are actually controlling Firefox using geckodriver we know its working (by getting a resulting screenshot).
- * Task 2 actually involves us comparing screenshots. By running the code with images that are the same and images that are different, we have proven that this comparison works.
- * Additionally, we have visually tested that task 3 is complete (seeing that the new page generation features work). While we could probably automate some of the testing for task 2 and 3 (to test all possible scenarios) it is out of the scope of this project.
Conclusion
The previously completed work allows us to generate simple html documents with a randomized structure, render the page in Servo, and take a screenshot of the page. This project furthered this work by rendering the pages in both Servo and Firefox, taking screenshots of the pages within both browsers, and reporting differences between the two. This project also expanded on the random content generator. This work will now allow users to evaluate Servo’s ability to load web pages. To make this testing even more informative.