CSC/ECE 517 Fall 2015 M1503 Integrate xml5ever XML parser

From Expertiza_Wiki

Rust

Rust is a general-purpose, compiled programming language developed by Mozilla Research. Its syntax is somewhat similar to that of C and C++: blocks of code are delimited by curly brackets, and it uses familiar control-flow keywords such as if, else, while and for. Unlike Java, Rust does not use automatic garbage collection. It achieves memory safety without a garbage collector and supports concurrency and parallelism in building platforms.
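The following small, generic Rust sketch (not Servo code) makes the point about memory safety without a garbage collector concrete. Ownership decides when a value is freed, and the `Drop` trait runs deterministically at the end of the owner's scope. The `Buffer` type, the `DROPS` counter and `ownership_demo` are illustrative names only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative counter that records how many Buffer values have been freed.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Buffer {
    data: Vec<u8>,
}

impl Drop for Buffer {
    // Runs automatically when a Buffer's owner goes out of scope;
    // no garbage collector is involved.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Borrowing (&Buffer) lets a function read the value without taking ownership.
fn total(buf: &Buffer) -> usize {
    buf.data.iter().map(|&b| b as usize).sum()
}

// Returns (sum, drops seen while buf is alive, drops seen after its scope ends).
fn ownership_demo() -> (usize, usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    let (sum, during) = {
        let buf = Buffer { data: vec![1, 2, 3] };
        let sum = total(&buf); // buf is only borrowed here
        (sum, DROPS.load(Ordering::SeqCst) - before) // buf is still alive
    }; // buf is dropped here, at the end of its scope
    let after = DROPS.load(Ordering::SeqCst) - before;
    (sum, during, after)
}
```

Calling `ownership_demo()` yields a sum of 6, zero drops while `buf` is in scope, and exactly one drop once the scope ends.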

Servo

Servo is a web browser layout engine developed by Mozilla Research and written in Rust. It runs components such as rendering, layout and image decoding as separate tasks so that they can execute in parallel. Servo provides APIs and JavaScript support. It was not developed explicitly to create a full web browser, but to achieve maximum parallelism.

Compilation and Build

Servo's build system automatically downloads a snapshot Rust compiler to build itself. This is normally a specific revision of upstream Rust, but it sometimes includes a backported patch or two.

Code link: https://github.com/servo/servo/ (forked to https://github.com/ronak6892/servo).

Servo is built with Cargo, the Rust package manager. It also uses Mozilla's Mach tools to orchestrate the build and other tasks.

Normal build

To build Servo in development mode (useful for development, but the resulting binary is very slow):

git clone https://github.com/servo/servo
cd servo
./mach build --dev
./mach run tests/html/about-mozilla.html

For benchmarking, performance testing, or real-world use, add the --release flag to create an optimized build:

./mach build --release
./mach run --release tests/html/about-mozilla.html

Building for Android target

git clone https://github.com/servo/servo
cd servo
ANDROID_TOOLCHAIN=/path/to/toolchain ANDROID_NDK=/path/to/ndk PATH=$PATH:/path/to/toolchain/bin ./mach build --android
cd ports/android
ANDROID_SDK=/path/to/sdk make install

Rather than setting the ANDROID_* environment variables every time, you can also create a .servobuild file and then edit it to contain the correct paths to the Android SDK/NDK tools:

cp servobuild.example .servobuild
# edit .servobuild

Running

./mach run [url] 

Project Description

Background information

Servo uses a custom HTML5 parser written in Rust, called HTML5ever. Servo currently lacks a parser for XML documents, which prevents it from running XHTML tests and from implementing APIs that rely on one. XML5ever is an experimental XML parser that works on XML5, a modified specification of XML that drops certain properties of XML, such as well-formedness, in favor of better compatibility with HTML and better error recovery. XML5ever is based largely on the HTML5ever parser.

Goal

The goal of the project is to integrate the XML5ever parser into Servo so that it can parse XML documents, which it currently cannot do. After the project, Servo will differentiate between HTML and XML documents and parse each with its respective parser.

Steps done as part of the OSS Project

Toward the project goal, we completed the following initial steps in our OSS project (these have already been merged into Servo's master branch):

  • Compiled Servo and added xml5ever as a dependency of the script crate using the Cargo package manager. To do this, we edited the Cargo.toml located at components/script and added xml5ever as a dependency.
  • Added xml.rs at components/script/parse with a parse_xml() function, and declared xml as a public module in mod.rs so that the new file is part of the crate.
  • Declared an empty ServoXMLParser interface in a WebIDL file located at components/script/dom/webidls.
  • Implemented the ServoXMLParser interface with the necessary stubs in servoxmlparser.rs, located at components/script/dom, and declared servoxmlparser as a public module in the mod.rs located at components/script/dom.
  • Called parse_xml from domparser.rs, located at components/script/dom, so that the new code is used and compiles as part of the build.
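The module wiring described in these steps can be pictured in a single file, with inline modules standing in for components/script/parse/xml.rs and components/script/dom/domparser.rs. This is a sketch under simplifying assumptions. The string-only `parse_xml` signature and the `parse_from_string` helper are hypothetical, and Servo's real functions take DOM and URL arguments that are omitted here.

```rust
// Inline modules stand in for separate files; hypothetical, simplified signatures.
pub mod parse {
    // Mirrors the `pub mod xml;` declaration added to parse/mod.rs.
    pub mod xml {
        /// Stub entry point: accepts XML input but does not build a tree yet.
        pub fn parse_xml(input: &str) -> Result<(), String> {
            let _ = input; // real parsing would be added in later steps
            Ok(())
        }
    }
}

pub mod dom {
    // Mirrors domparser.rs calling into the new parser module.
    pub mod domparser {
        use super::super::parse::xml::parse_xml;

        /// Hypothetical helper: routes XML content types to the parse_xml stub.
        pub fn parse_from_string(source: &str, content_type: &str) -> Result<(), String> {
            match content_type {
                "text/xml" | "application/xml" => parse_xml(source),
                other => Err(format!("content type not handled in this sketch: {}", other)),
            }
        }
    }
}
```

Having a real caller for `parse_xml` keeps the stub from being dead code. This is one reason a call site was added before the parser did any real work.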

Design

To integrate XML5ever, a dependency on the XML5ever parser was added, similar to the existing one for HTML5ever. A separate interface was defined in the ServoXMLParser WebIDL file, and this interface was implemented in its corresponding Rust file along with the stubs needed to parse XML.

UML diagram

Design pattern

The Adapter design pattern was applied to extend Servo's parsing mechanism to XML5. The interface defined with this pattern closely resembles the ServoHTMLParser interface. As a result, code that parses or modifies XML and HTML documents follows the same shape, and future readers do not have to understand the two separately, since their functionality is related.
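Here is a minimal sketch of the Adapter pattern as applied here, assuming two tokenizers with different native APIs. `HtmlTokenizer`, `XmlTokenizer`, `DocumentParser` and the adapter types are hypothetical stand-ins, not HTML5ever, XML5ever or Servo types. The point is that client code such as `parse_all` depends only on the shared interface.

```rust
// Hypothetical stand-ins for two parsers whose native APIs differ in shape.
pub struct HtmlTokenizer { tokens: Vec<String> }
impl HtmlTokenizer {
    pub fn new() -> Self { HtmlTokenizer { tokens: Vec::new() } }
    // Native API #1: borrows its input.
    pub fn feed(&mut self, s: &str) {
        self.tokens.extend(s.split_whitespace().map(|t| format!("html:{}", t)));
    }
}

pub struct XmlTokenizer { tokens: Vec<String> }
impl XmlTokenizer {
    pub fn new() -> Self { XmlTokenizer { tokens: Vec::new() } }
    // Native API #2: takes ownership of its input.
    pub fn push_input(&mut self, s: String) {
        self.tokens.extend(s.split_whitespace().map(|t| format!("xml:{}", t)));
    }
}

// Target interface that the rest of the (hypothetical) engine codes against.
pub trait DocumentParser {
    fn parse_chunk(&mut self, input: &str);
    fn tokens(&self) -> &[String];
}

// Adapter for the HTML tokenizer.
pub struct HtmlParserAdapter(pub HtmlTokenizer);
impl DocumentParser for HtmlParserAdapter {
    fn parse_chunk(&mut self, input: &str) { self.0.feed(input) }
    fn tokens(&self) -> &[String] { &self.0.tokens }
}

// Adapter for the XML tokenizer: converts &str into the owned String it expects.
pub struct XmlParserAdapter(pub XmlTokenizer);
impl DocumentParser for XmlParserAdapter {
    fn parse_chunk(&mut self, input: &str) { self.0.push_input(input.to_string()) }
    fn tokens(&self) -> &[String] { &self.0.tokens }
}

// Client code works with either parser through the shared interface.
pub fn parse_all(parser: &mut dyn DocumentParser, chunks: &[&str]) -> Vec<String> {
    for chunk in chunks {
        parser.parse_chunk(chunk);
    }
    parser.tokens().to_vec()
}
```

Callers never see whether the underlying tokenizer borrows or owns its input. Supporting a further format would mean writing one more adapter rather than touching every caller.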

Implementation

  • Modified Script::load in script_task.rs to check whether the document being parsed has the content type text/xml. Previously, parse_html was called for all documents. Now, when the content type is text/xml, the parse_xml method defined in the OSS project is called instead, and the appropriate flag is passed to the Document constructor.
  • A Sink was implemented for ServoXMLParser, with a utility function get_or_create that searches for a child and, if it is not found, creates a new one and returns it.
  • XML5ever defines a TreeSink interface for integrating it. An implementation of this TreeSink was provided in xml.rs, including the following functions:
    • get_document : returns the XML document being parsed
    • elem_name : returns the name of the XML document node specified in the argument
    • create_element : creates a new element, sets the attributes provided in the argument and returns the newly created element
    • create_comment : creates a new comment node using the text specified in the argument
    • append(parent, child) : finds the child using the get_or_create function defined in the Sink (previous step) and appends it to the parent node
    • append_doctype_to_document : creates a doctype using the public_id and system_id provided in the arguments and appends it to the document
    • create_pi : creates a processing instruction using the target and data in the arguments and returns a reference to it
  • To use the code added in the previous step, a new function was defined for ServoXMLParser. It creates the Sink, XmlTreeBuilder and XMLTokenizer, passes them to the constructor and returns a DOM object. To invoke the actual parser, parse_xml was implemented in xml.rs: it creates a new ServoXMLParser from the given arguments and passes the input document to the parse_chunk method.
  • After the above steps were implemented, some tests with the "text/xml" content type crashed. To fix this, resume, suspend, is_suspended and parse_sync methods were implemented in ServoXMLParser. These use XML5ever's tokenizer methods to run the parser on the input data. With them in place, the tests stopped crashing and behaved as expected.
  • When JavaScript on a web page creates an XMLHttpRequest, the code in the XMLHttpRequest.rs file is invoked. The send method initiates a network request, and the network response passes through the methods of the XHRContext structure. We implemented the handling of the Document response type. We check the final MIME type of the response, create a document object of the appropriate type, and pass the document object and the bytes extracted from the response to the appropriate parser. Our work overlapped with project M1504, particularly the handling of the text/html MIME type, so we collaborated on a single pull request. Our part then concentrated on handling the XML MIME type.
  • Implement the XMLDocument API: new requirements were added to use the newly added code to integrate the actual parser. As a result, the project scope was restricted to the first 3 tasks, and this last task was removed after consulting the TA and Josh Matthews.
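Both the page-load change and the XMLHttpRequest change come down to choosing a document kind, and hence a parser, from a MIME type. The sketch below is a simplification, not Servo's code. `DocumentKind` and the function names are hypothetical, and parameter handling is naive: it just drops everything after `;`. `xhr_document_kind` follows the XMLHttpRequest Standard's document-response rules for `responseType = "document"` only. It uses the MIME Sniffing Standard's definition of an XML MIME type (essence `text/xml` or `application/xml`, or a subtype ending in `+xml`).

```rust
/// Which kind of document (and parser) to use. Hypothetical names, loosely
/// echoing the HTML-vs-XML flag passed to the Document constructor.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum DocumentKind {
    Html,
    Xml,
}

/// Lower-cased "type/subtype", with parameters such as "; charset=utf-8" dropped.
fn essence(content_type: &str) -> String {
    content_type.split(';').next().unwrap_or("").trim().to_ascii_lowercase()
}

/// Page-load dispatch as described on this page: only text/xml goes to the XML parser.
pub fn parser_for_load(content_type: &str) -> DocumentKind {
    if essence(content_type) == "text/xml" {
        DocumentKind::Xml
    } else {
        DocumentKind::Html
    }
}

/// XML MIME type: essence is text/xml or application/xml, or the subtype ends in "+xml".
pub fn is_xml_mime_type(content_type: &str) -> bool {
    let e = essence(content_type);
    match e.split_once('/') {
        Some((_, subtype)) => {
            e == "text/xml" || e == "application/xml" || subtype.ends_with("+xml")
        }
        None => false,
    }
}

/// XHR document response: HTML for text/html, XML for XML MIME types,
/// otherwise no document (the response is null).
pub fn xhr_document_kind(final_mime: &str) -> Option<DocumentKind> {
    if essence(final_mime) == "text/html" {
        Some(DocumentKind::Html)
    } else if is_xml_mime_type(final_mime) {
        Some(DocumentKind::Xml)
    } else {
        None
    }
}
```

For example, `parser_for_load("text/xml; charset=utf-8")` selects the XML parser, while `xhr_document_kind("application/json")` produces no document.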
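To show how the TreeSink callbacks listed above fit together, here is a self-contained toy sink over an index-based arena. It is loosely modeled on the function names on this page and is not XML5ever's actual `TreeSink` trait. `Handle`, `NodeData`, `NodeOrText` and the simplified signatures are assumptions. `get_or_create` follows the description above in spirit: an existing node is returned unchanged, while raw text gets a newly created text node.

```rust
// Toy tree sink over an arena; a hypothetical simplification, NOT xml5ever's TreeSink.
pub type Handle = usize; // index of a node in the arena

#[derive(Debug, Clone, PartialEq)]
pub enum NodeData {
    Document,
    Doctype { name: String, public_id: String, system_id: String },
    Element { name: String, attrs: Vec<(String, String)> },
    Comment(String),
    Text(String),
    ProcessingInstruction { target: String, data: String },
}

// What append() receives: an existing node, or raw text that still needs a node.
pub enum NodeOrText { Node(Handle), Text(String) }

pub struct Sink {
    nodes: Vec<NodeData>,
    children: Vec<Vec<Handle>>,
}

impl Sink {
    pub fn new() -> Self {
        // Node 0 is always the document being built.
        Sink { nodes: vec![NodeData::Document], children: vec![Vec::new()] }
    }

    fn new_node(&mut self, data: NodeData) -> Handle {
        self.nodes.push(data);
        self.children.push(Vec::new());
        self.nodes.len() - 1
    }

    // Return an existing node unchanged, or create a text node for raw text.
    fn get_or_create(&mut self, child: NodeOrText) -> Handle {
        match child {
            NodeOrText::Node(h) => h,
            NodeOrText::Text(t) => self.new_node(NodeData::Text(t)),
        }
    }

    pub fn get_document(&self) -> Handle { 0 }

    // Name of an element node; None for any other node type.
    pub fn elem_name(&self, target: Handle) -> Option<&str> {
        match &self.nodes[target] {
            NodeData::Element { name, .. } => Some(name.as_str()),
            _ => None,
        }
    }

    pub fn create_element(&mut self, name: &str, attrs: Vec<(String, String)>) -> Handle {
        self.new_node(NodeData::Element { name: name.to_string(), attrs })
    }

    pub fn create_comment(&mut self, text: &str) -> Handle {
        self.new_node(NodeData::Comment(text.to_string()))
    }

    pub fn create_pi(&mut self, target: &str, data: &str) -> Handle {
        self.new_node(NodeData::ProcessingInstruction {
            target: target.to_string(),
            data: data.to_string(),
        })
    }

    pub fn append(&mut self, parent: Handle, child: NodeOrText) {
        let child = self.get_or_create(child);
        self.children[parent].push(child);
    }

    pub fn append_doctype_to_document(&mut self, name: &str, public_id: &str, system_id: &str) {
        let doctype = self.new_node(NodeData::Doctype {
            name: name.to_string(),
            public_id: public_id.to_string(),
            system_id: system_id.to_string(),
        });
        let doc = self.get_document();
        self.children[doc].push(doctype);
    }

    pub fn children_of(&self, node: Handle) -> &[Handle] { &self.children[node] }

    pub fn data(&self, node: Handle) -> &NodeData { &self.nodes[node] }
}
```

An arena with integer handles keeps the sketch free of reference-counting boilerplate. Servo's real sink instead hands out references to garbage-collected DOM nodes.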
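The suspend/resume contract mentioned in the implementation steps can be illustrated with a toy parser. It buffers input while suspended and drains it on resume. `ToyXmlParser` and its fields are hypothetical. In the real ServoXMLParser, parse_sync drives XML5ever's tokenizer rather than copying chunks into a vector.

```rust
use std::collections::VecDeque;

// Hypothetical toy; a real parser would run an XML tokenizer instead.
pub struct ToyXmlParser {
    pending: VecDeque<String>, // input received but not yet tokenized
    suspended: bool,
    consumed: Vec<String>, // stand-in for tokenizer output
}

impl ToyXmlParser {
    pub fn new() -> Self {
        ToyXmlParser { pending: VecDeque::new(), suspended: false, consumed: Vec::new() }
    }

    // Queue new input; it is tokenized right away unless the parser is suspended.
    pub fn parse_chunk(&mut self, input: &str) {
        self.pending.push_back(input.to_string());
        self.parse_sync();
    }

    // Drain queued input synchronously, stopping as soon as the parser is suspended.
    pub fn parse_sync(&mut self) {
        while !self.suspended {
            match self.pending.pop_front() {
                Some(chunk) => self.consumed.push(chunk),
                None => break,
            }
        }
    }

    pub fn suspend(&mut self) {
        self.suspended = true;
    }

    // Resuming a parser that is not suspended is treated as a logic error.
    pub fn resume(&mut self) {
        assert!(self.suspended, "resume() called while not suspended");
        self.suspended = false;
        self.parse_sync();
    }

    pub fn is_suspended(&self) -> bool {
        self.suspended
    }

    pub fn consumed(&self) -> &[String] {
        &self.consumed
    }
}
```

Input that arrives while the parser is suspended is only queued. Nothing is lost, and it is tokenized in order once `resume()` is called.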

Challenges

The primary challenge in this project is continuously syncing with the latest Servo commits and ensuring that the build succeeds after each integration step. Adding the necessary stubs to the new files as the parsing mechanism changes also requires continuous, careful effort. Parsing an XML document involves many input arguments, so checking errors for all edge cases will also be a challenge while implementing TreeSink and support for XMLHttpRequest.

Testing

No extra test cases were added to integrate the XML parser model into the current Servo code. We added interface files and method stubs, and the XML parser functionality will be implemented in subsequent steps of the final project. Integration was verified by successful compilation and build after adding our changes. The following commands were used to check that all test cases passed:

./mach run tests/html/about-mozilla.html

./mach test-tidy

All modifications suggested by the Servo community in comments on the pull request have been incorporated, and the pull request has been merged successfully.

References

Servo Documentation - http://doc.servo.org/servo/index.html

Project Definition - https://github.com/servo/servo/wiki/Integrate-xml5ever

Rust Documentation - https://doc.rust-lang.org/nightly/index.html

XMLHttpRequest Standard, document response - https://xhr.spec.whatwg.org/#document-response

YouTube - https://youtu.be/i8dONOzYwlc