CSC/ECE 517 Fall 2015/oss/M1503/IntegrateXMLParser

From Expertiza_Wiki
Revision as of 04:24, 10 November 2015 by Rghadiy

Rust

Rust is a general-purpose, compiled programming language developed by Mozilla Research. Its syntax is somewhat similar to that of C and C++, with blocks of code delimited by curly brackets and familiar control-flow constructs. Unlike Java, Rust does not use an automatic garbage collector. It achieves memory safety without garbage collection and supports concurrency and parallelism.

Servo

Servo is a web browser layout engine developed by Mozilla Research and written in Rust. Servo handles work such as rendering, layout, and image decoding as separate tasks so that they can run in parallel. It provides APIs and JavaScript support. Servo was not developed primarily to be a full web browser but to achieve maximum parallelism.

Compilation and Build

Code link: https://github.com/servo/servo/ . This repository was forked to https://github.com/ronak6892/servo . A successful build is verified by running ./mach build --dev followed by ./mach run tests/html/about-mozilla.html.

Project Description

Servo currently lacks a parser for XML documents, which prevents the implementation of several APIs. The goal of the project is to integrate the xml5ever parser into Servo.

Design

To integrate XML5ever, a dependency on the XML5ever parser was added in the same way as the existing HTML5ever dependency. A separate interface was defined in the ServoXMLParser WebIDL file and implemented in its corresponding Rust file, along with the stubs needed to parse XML. The Adapter design pattern was applied to extend Servo's parsing mechanism to XML. The resulting interface closely resembles the ServoHTMLParser interface, which makes it easier to read or modify the parsing code for both XML and HTML documents: because the two are related in functionality, future readers do not have to learn them separately.
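The adapter idea can be illustrated with a minimal Rust sketch. Every name here (DocumentParser, HtmlBackend, XmlBackend, the two adapter structs, parser_for) is hypothetical and does not correspond to the Servo, HTML5ever, or XML5ever APIs; the stand-in backends only count opening angle brackets instead of doing real parsing.

```rust
// Hypothetical stand-ins for two parsing libraries with different APIs.
// They are NOT html5ever or xml5ever; they only count '<' characters.
struct HtmlBackend;

impl HtmlBackend {
    fn feed_html(&self, input: &str) -> usize {
        input.matches('<').count()
    }
}

struct XmlBackend;

impl XmlBackend {
    fn tokenize_xml(&self, input: &str) -> Result<usize, String> {
        if input.trim().is_empty() {
            return Err("empty document".to_string());
        }
        Ok(input.matches('<').count())
    }
}

/// Common interface that the rest of the engine would program against
/// (an assumption made for this sketch).
trait DocumentParser {
    fn name(&self) -> &'static str;
    fn parse_chunk(&mut self, chunk: &str) -> Result<usize, String>;
}

/// Adapter that exposes the HTML backend through the common interface.
struct HtmlParserAdapter {
    backend: HtmlBackend,
}

/// Adapter that exposes the XML backend through the common interface.
struct XmlParserAdapter {
    backend: XmlBackend,
}

impl DocumentParser for HtmlParserAdapter {
    fn name(&self) -> &'static str {
        "html"
    }

    fn parse_chunk(&mut self, chunk: &str) -> Result<usize, String> {
        // The HTML stand-in never fails, so its result is wrapped in Ok.
        Ok(self.backend.feed_html(chunk))
    }
}

impl DocumentParser for XmlParserAdapter {
    fn name(&self) -> &'static str {
        "xml"
    }

    fn parse_chunk(&mut self, chunk: &str) -> Result<usize, String> {
        self.backend.tokenize_xml(chunk)
    }
}

/// Callers choose a parser once and then use only the shared trait.
fn parser_for(is_xml: bool) -> Box<dyn DocumentParser> {
    if is_xml {
        Box::new(XmlParserAdapter { backend: XmlBackend })
    } else {
        Box::new(HtmlParserAdapter { backend: HtmlBackend })
    }
}
```

Code that calls parser_for(true) or parser_for(false) handles both document kinds through parse_chunk alone, which is the property the design aims for by keeping the XML interface close to the HTML one.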

Initial steps

To achieve the project goal, we completed the following initial steps.

Compiled Servo and added xml5ever as a dependency of the script component using the Cargo package manager. To do this, we edited Cargo.toml in components/script and added xml5ever as a dependency.

Added xml.rs in components/script/parse with a parse_xml() function. mod.rs also needs to be modified to declare the new file.

Added the ServoXMLParser interface with the necessary stubs in servoxmlparser.rs in components/script/dom. servoxmlparser also needs to be declared in mod.rs in components/script/dom.

Called parse_xml from domparser.rs in components/script/dom so that the new code is exercised by the build.

Implementation

  • Modify Script::load in script_task.rs to check whether the document being parsed is of type text/xml. When the content type of the document is text/xml, we call parse_xml, which we defined in the OSS project. Previously parse_html was called for all documents; with the XML parser integrated, parse_xml is called instead of parse_html where appropriate, and the appropriate flag is passed to the Document constructor.
  • Implement a TreeSink for the XML parser, which takes nodes from the XML document and appends them to the XML tree in the correct hierarchy. The Serializable module also needs to be implemented in order to implement the TreeSink.
  • Implement support for XML document responses. We check the response document and its MIME type; if either is null, the function returns null. Otherwise, if the MIME type is text/html, we check its charset and, if it is null, set it to UTF-8. If the MIME type is text/xml, the document is defined to be a Document that represents the result of running the XML parser, with XML scripting support disabled, on the response bytes. Finally, we set the document's encoding to charset, its content type to the final MIME type, and its URL to the document's URL, and return the response document object.
  • Implement XMLDocument API:
  1. adding the new IDL file at components/script/dom/webidls/XMLDocument.webidl;
  2. creating components/script/dom/XMLDocument.rs;
  3. listing XMLDocument.rs in components/script/dom/mod.rs;
  4. defining the DOM struct XMLDocument with a #[dom_struct] attribute, a superclass or Reflector member, and other members as appropriate;
  5. implementing the dom::bindings::codegen::Bindings::XMLDocumentBindings::XMLDocumentMethods trait for &'a XMLDocument;
  6. declaring the load method in the XMLDocument.webidl file:
  partial interface XMLDocument {
	boolean load(DOMString url);
  };
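
The content-type check and charset default described in the bullets above can be sketched in Rust as follows. This is an illustration under stated assumptions, not Servo code: ParserKind, mime_essence, select_parser, and charset_or_utf8 are hypothetical names, and treating application/xml and any "+xml" type as XML follows the general definition of an XML MIME type rather than anything stated in this page, which mentions only text/xml.

```rust
/// Which parser a document should be handed to (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
enum ParserKind {
    Html,
    Xml,
}

/// Extracts the lowercased MIME essence, dropping any parameters,
/// e.g. "Text/XML; charset=utf-8" becomes "text/xml".
fn mime_essence(content_type: &str) -> String {
    content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase()
}

/// Returns the parser to use, or None when there is no content type or
/// it is neither HTML nor XML (the "return null" case for responses).
fn select_parser(content_type: Option<&str>) -> Option<ParserKind> {
    let essence = mime_essence(content_type?);
    match essence.as_str() {
        "text/html" => Some(ParserKind::Html),
        // Assumption: these are also treated as XML MIME types.
        "text/xml" | "application/xml" => Some(ParserKind::Xml),
        other if other.ends_with("+xml") => Some(ParserKind::Xml),
        _ => None,
    }
}

/// Returns the charset parameter if one is present, else "UTF-8".
fn charset_or_utf8(content_type: &str) -> String {
    content_type
        .split(';')
        .skip(1) // skip the essence, keep only the parameters
        .filter_map(|param| {
            let mut parts = param.splitn(2, '=');
            let key = parts.next()?.trim();
            let value = parts.next()?.trim().trim_matches('"');
            if key.eq_ignore_ascii_case("charset") && !value.is_empty() {
                Some(value.to_string())
            } else {
                None
            }
        })
        .next()
        .unwrap_or_else(|| "UTF-8".to_string())
}
```

In the page-load path, a caller could fall back to the earlier behavior with select_parser(content_type).unwrap_or(ParserKind::Html), whereas the document-response path would treat None as the case that returns null.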

The load(url) method must run the following steps:

  1. Let document be the XMLDocument object on which the method was invoked.
  2. Resolve the method's first argument, relative to the API base URL specified by the entry settings object. If this is not successful, throw a SyntaxError exception and abort these steps. Otherwise, let url be the resulting absolute URL.
  3. If the origin of url is not the same as the origin of document, throw a SecurityError exception and abort these steps.
  4. Remove all child nodes of document, without firing any mutation events.
  5. Set the current document readiness of document to "loading".
  6. Run the remainder of these steps in parallel, and return true from the method.
  7. Let result be a Document object.
  8. Let success be false.
  9. Let request be a new request whose url is url, client is entry settings object, destination is "subresource", synchronous flag is set, mode is "same-origin", credentials mode is "same-origin", and whose use-URL-credentials flag is set.
  10. Let response be the result of fetching request.
  11. If response's Content-Type metadata is an XML MIME type, then run these substeps:
    1. Create a new XML parser associated with the result document.
    2. Pass this parser response's body.
    3. If there is an XML well-formedness or XML namespace well-formedness error, then remove all child nodes from result. Otherwise let success be true.
  12. Queue a task to run the following steps.
    1. Set the current document readiness of document to "complete".
    2. Replace all the children of document by the children of result (even if it has no children), firing mutation events as if a DocumentFragment containing the new children had been inserted.
    3. Fire a simple event named load at document.
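
The control flow of the steps above can be modeled with a simplified, synchronous Rust sketch. It is not the Servo implementation. All types and functions (XmlDocumentModel, Origin, Response, LoadError, parse_origin, load) are hypothetical; the fetch and parse operations are passed in as closures; only absolute URLs of the form scheme://host[:port]/path are handled; the "in parallel" steps run inline; and no events are fired. The last_parse_ok field exists only so that the success flag can be observed in this sketch.

```rust
#[derive(Debug, PartialEq)]
enum LoadError {
    Syntax,
    Security,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Readiness {
    Loading,
    Complete,
}

/// Minimal stand-in for an origin: scheme, host, and port.
#[derive(Debug, PartialEq, Clone)]
struct Origin {
    scheme: String,
    host: String,
    port: u16,
}

/// Hypothetical model of the document; children are plain strings.
struct XmlDocumentModel {
    origin: Origin,
    readiness: Readiness,
    children: Vec<String>,
    last_parse_ok: bool, // sketch-only: exposes the "success" flag
}

struct Response {
    content_type: String,
    body: String,
}

/// Very small URL-to-origin parser (absolute URLs only, an assumption).
fn parse_origin(url: &str) -> Option<Origin> {
    let (scheme, rest) = url.split_once("://")?;
    let authority = rest.split('/').next()?;
    if scheme.is_empty() || authority.is_empty() {
        return None;
    }
    let (host, port) = match authority.split_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, if scheme == "https" { 443 } else { 80 }),
    };
    Some(Origin { scheme: scheme.to_string(), host: host.to_string(), port })
}

fn is_xml_mime_type(content_type: &str) -> bool {
    let essence = content_type.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
    essence == "text/xml" || essence == "application/xml" || essence.ends_with("+xml")
}

fn load<F, P>(doc: &mut XmlDocumentModel, url: &str, fetch: F, parse: P) -> Result<bool, LoadError>
where
    F: Fn(&str) -> Response,
    P: Fn(&str) -> Result<Vec<String>, String>,
{
    // Step 2 (simplified): an unparsable URL is a SyntaxError.
    let target = parse_origin(url).ok_or(LoadError::Syntax)?;
    // Step 3: a cross-origin URL is a SecurityError.
    if target != doc.origin {
        return Err(LoadError::Security);
    }
    // Steps 4-5: drop existing children and mark the document as loading.
    doc.children.clear();
    doc.readiness = Readiness::Loading;
    // Steps 7-11: fetch, then parse only if the response is XML.
    let response = fetch(url);
    let (result, success) = if is_xml_mime_type(&response.content_type) {
        match parse(&response.body) {
            Ok(children) => (children, true),
            Err(_) => (Vec::new(), false), // not well-formed: empty result
        }
    } else {
        (Vec::new(), false)
    };
    // Step 12: mark complete and swap in the parsed children.
    doc.readiness = Readiness::Complete;
    doc.children = result;
    doc.last_parse_ok = success;
    // Step 6: the method itself returns true once loading has started.
    Ok(true)
}
```

Note that, as in the steps above, load returns true even when the fetched document is not well-formed; in that case the document simply ends up with no children.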

Challenges

The primary challenge is staying in sync with the latest Servo commits and ensuring that the build succeeds after each integration step. Adding the necessary stubs as the parsing mechanism changes also requires continuous and careful effort.

Testing

No extra test cases were added for integrating the XML parser model into the current Servo code. We added interface files and method stubs; the XML parser functionality will be implemented in subsequent steps of the final project. Integration was tested by confirming that Servo compiles and builds successfully with our changes. The following commands were used to check that all test cases pass.

./mach run tests/html/about-mozilla.html

./mach test-tidy

All modifications suggested by the Servo community in comments on the pull request have been incorporated, and the pull request has been merged successfully.

External Links

HTML5ever - https://github.com/servo/html5ever/

XML5ever - https://github.com/Ygg01/xml5ever

Pull request - https://github.com/servo/servo/pull/8278

YouTube - https://youtu.be/i8dONOzYwlc

References

Servo Documentation - http://doc.servo.org/servo/index.html

Project Definition - https://github.com/servo/servo/wiki/Integrate-xml5ever

Rust Documentation - https://doc.rust-lang.org/nightly/index.html

XML specs - https://xhr.spec.whatwg.org/#document-response