Integrating an XML Parser

An important part of loading web pages is turning HTML source into a DOM. Servo currently has parsers that do this, but it lacks the ability to turn XHTML into a DOM. The goal of our project is to integrate an XML parser into Servo so that it can parse XHTML as well.<ref>https://github.com/servo/servo/wiki/XML-parser-student-project</ref>

Project Description

Background Information

Servo is an experimental web browser layout engine being developed by Mozilla Research. The prototype seeks to create a highly parallel environment, in which many components (such as rendering, layout, HTML parsing, image decoding, etc.) are handled by fine-grained, isolated tasks. The project has a symbiotic relationship with the Rust programming language, in which it is being developed.

Rust

Rust is a systems language for writing high-performance applications of the kind usually written in C or C++, but it was designed to prevent some of the problems related to invalid memory accesses that cause segmentation faults. It supports imperative, functional and object-oriented programming styles. It is designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety. <ref name=Rust>Overview of Mozilla Research Projects</ref>

Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory. <ref name=Rust>Overview of Mozilla Research Projects</ref>
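As a small illustration of the guarantees described above, the following sketch shows a stack-allocated record, bounds-checked slice access, and ownership transfer. It is not taken from Servo; the names Point, checked_get and sum_and_drop are hypothetical and exist only for this example.

```rust
// Illustrative sketch only: all names here are hypothetical.

/// A plain record; its fields are stored contiguously and it can live on the stack.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Bounds-checked access: returns None instead of reading past the end.
fn checked_get(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

/// Takes ownership of the vector; the caller can no longer use it afterwards.
fn sum_and_drop(values: Vec<i32>) -> i32 {
    values.iter().sum()
    // `values` goes out of scope here and its memory is freed exactly once.
}

fn main() {
    let p = Point { x: 3, y: 4 }; // no heap allocation involved
    println!("{} {}", p.x, p.y);

    let values = vec![10, 20, 30];
    println!("{:?}", checked_get(&values, 1)); // in range
    println!("{:?}", checked_get(&values, 7)); // out of range, yields None

    let total = sum_and_drop(values);
    println!("{}", total);
    // Using `values` after this point is rejected at compile time (use after move).
}
```

The out-of-range lookup and the use-after-move case are exactly the kinds of errors that would be undefined behaviour in C or C++; here the first is a checked Option and the second does not compile.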

Servo

Servo is a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, Mozilla are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.

Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.<ref name=Rust></ref>

Task Supervision Diagram

Task Projection Diagram

• Each box represents a Rust task.

• Blue boxes represent the primary tasks in the browser pipeline.

• Gray boxes represent tasks auxiliary to the browser pipeline.

• White boxes represent worker tasks. Each such box represents several tasks, the precise number of which will vary with the workload.

• Dashed lines indicate supervisor relationships.

• Solid lines indicate communication channels.
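The relationships in the legend above can be sketched in a few lines. This is a minimal illustration, not Servo code: it uses today's std::thread and std::sync::mpsc as stand-ins for the tasks and channels in the diagram, and the two stages ("parser" and "layout") and the run_pipeline function are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical two-stage pipeline: a "parser" task sends messages to a "layout" task.
fn run_pipeline(chunks: Vec<String>) -> Vec<String> {
    // Solid line in the diagram: a communication channel between two tasks.
    let (to_layout, from_parser) = mpsc::channel::<String>();

    // First task: pretends to parse each chunk and forwards the result.
    let parser = thread::spawn(move || {
        for chunk in chunks {
            to_layout.send(format!("parsed:{}", chunk)).unwrap();
        }
        // `to_layout` is dropped here, which closes the channel.
    });

    // Second task: receives messages until the channel is closed.
    let layout = thread::spawn(move || {
        from_parser
            .iter()
            .map(|msg| format!("laid-out:{}", msg))
            .collect::<Vec<String>>()
    });

    // Dashed line in the diagram: the spawning thread supervises both tasks
    // by joining them and observing whether either one panicked.
    parser.join().expect("parser task panicked");
    layout.join().expect("layout task panicked")
}

fn main() {
    let out = run_pipeline(vec!["<p>".to_string(), "hello".to_string()]);
    for line in out {
        println!("{}", line);
    }
}
```

Because the tasks share no mutable state and interact only through the channel, a failure in one is visible to the supervisor through join rather than corrupting the other.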

Prerequisites

Installing Dependencies for Servo

The commands to install the dependencies on various platforms are listed below.<ref name = github>Servo Github</ref>

On OS X (homebrew):

   brew install automake pkg-config python glfw3 cmake
   pip install virtualenv

On OS X (MacPorts):

   sudo port install python27 py27-virtualenv cmake

On Debian-based Linuxes:

   sudo apt-get install curl freeglut3-dev \
   libfreetype6-dev libgl1-mesa-dri libglib2.0-dev xorg-dev \
   msttcorefonts gperf g++ cmake python-virtualenv \
   libssl-dev libglfw-dev

On Fedora:

   sudo yum install curl freeglut-devel libtool gcc-c++ libXi-devel \
       freetype-devel mesa-libGL-devel glib2-devel libX11-devel libXrandr-devel gperf \
       fontconfig-devel cabextract ttmkfdir python python-virtualenv expat-devel \
       rpm-build openssl-devel glfw-devel cmake
   pushd .
   cd /tmp
   wget http://corefonts.sourceforge.net/msttcorefonts-2.5-1.spec
   rpmbuild -bb msttcorefonts-2.5-1.spec
   sudo yum install $HOME/rpmbuild/RPMS/noarch/msttcorefonts-2.5-1.noarch.rpm
   popd

On Arch Linux:

   sudo pacman -S base-devel git python2 python2-virtualenv mesa glfw ttf-font cmake

Building Servo

Normal Build

   git clone https://github.com/servo/servo
   cd servo
   ./mach build
   ./mach run tests/html/about-mozilla.html

Building for Android target

   git clone https://github.com/servo/servo
   cd servo
   ANDROID_TOOLCHAIN=/path/to/toolchain ANDROID_NDK=/path/to/ndk PATH=$PATH:/path/to/toolchain/bin ./mach build --android
   cd ports/android
   ANDROID_SDK=/path/to/sdk make install

Project Implementation

As part of our OSS project, we created a new Parser trait and implemented it for the ServoHTMLParser struct as follows.

→ Created a new file mod.rs in the following directory: servo/components/script/parse/mod.rs

→ Created a Parser trait with the parse_chunk and finish methods in this mod.rs

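   pub mod html;

   // Common interface for parsers that consume their input incrementally.
   pub trait Parser {
       // Feed the next chunk of source text to the parser.
       fn parse_chunk(&self, input: String);
       // Signal that no more input will arrive.
       fn finish(&self);
   }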

→ Implemented this Parser trait for the ServoHTMLParser struct in servo/components/script/dom/servohtmlparser.rs

   impl Parser for ServoHTMLParser {
       fn parse_chunk(&self, input: String) {
           self.tokenizer().borrow_mut().feed(input);
       }
       fn finish(&self) {
           self.tokenizer().borrow_mut().end();
       }
   }

→ Modified servo/components/script/parse/html.rs to invoke the parse_chunk method appropriately.

   InputString(s) => {
       parser.parse_chunk(s);
   }

   InputUrl(url) => {
       let load_response = load_response.unwrap();
       match load_response.metadata.content_type {
           Some((ref t, _)) if t.as_slice().eq_ignore_ascii_case("image") => {
               let page = format!("<html><body><img src='{:s}' /></body></html>", base_url.as_ref().unwrap().serialize());
               parser.parse_chunk(page);
           },

   Payload(data) => {
       // FIXME: use Vec<u8> (html5ever #34)
       let data = UTF_8.decode(data.as_slice(), DecodeReplace).unwrap();
       parser.parse_chunk(data);
   }
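The three excerpts above all end by handing a String to parse_chunk. A simplified, self-contained sketch of that idea in current Rust follows. The Input enum and to_chunk function are hypothetical and flatten the real control flow, and String::from_utf8_lossy is used here as an assumed stand-in for the UTF_8.decode call with DecodeReplace, since both substitute U+FFFD for malformed bytes.

```rust
// Hypothetical, simplified input kinds; the real code matches on Servo's own types.
enum Input {
    Text(String),
    ImageUrl(String),
    Payload(Vec<u8>),
}

/// Turns each kind of input into the text that would be passed to `parse_chunk`.
fn to_chunk(input: Input) -> String {
    match input {
        // Source text is passed through unchanged.
        Input::Text(s) => s,
        // An image resource is wrapped in a minimal HTML page that displays it.
        Input::ImageUrl(url) => {
            format!("<html><body><img src='{}' /></body></html>", url)
        }
        // Raw bytes are decoded as UTF-8, replacing malformed sequences.
        Input::Payload(bytes) => String::from_utf8_lossy(&bytes).into_owned(),
    }
}

fn main() {
    println!("{}", to_chunk(Input::Text("<p>hi</p>".to_string())));
    println!("{}", to_chunk(Input::ImageUrl("http://example.com/a.png".to_string())));
    println!("{}", to_chunk(Input::Payload(vec![0x68, 0x69])));
}
```

Keeping the conversion separate from the parser call means every branch reaches the parser through the same single method, which is what makes the trait-based invocation possible.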

Design Patterns/Principles

We followed the “program to an interface, not an implementation” design principle while implementing the initial step of our project. This principle is about carefully managing dependency relationships: when code depends only on interfaces, it is decoupled from the implementation, so the implementation can vary without affecting its callers. We modified the code in html.rs to invoke the Parser trait’s parse_chunk method instead of directly invoking the methods of ServoHTMLParser. As a result, if another parser is used in place of ServoHTMLParser in the future, the invocation logic in html.rs does not need to change. (Traits in Rust are similar to interfaces in other languages.)
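The benefit can be shown with a minimal sketch in current Rust. The Parser trait mirrors the one defined earlier; HtmlSketch, XmlSketch and drive are hypothetical names, and the two parsers only record what they receive instead of doing real parsing. A RefCell is assumed for the internal state because the trait methods take &self.

```rust
use std::cell::RefCell;

trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}

// Two hypothetical implementations that only log the calls they receive.
struct HtmlSketch {
    log: RefCell<Vec<String>>,
}

struct XmlSketch {
    log: RefCell<Vec<String>>,
}

impl Parser for HtmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("html:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("html:end".to_string());
    }
}

impl Parser for XmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("xml:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("xml:end".to_string());
    }
}

// The caller depends only on the trait, so it works unchanged with either parser.
fn drive(parser: &dyn Parser, chunks: &[&str]) {
    for chunk in chunks {
        parser.parse_chunk(chunk.to_string());
    }
    parser.finish();
}

fn main() {
    let html = HtmlSketch { log: RefCell::new(Vec::new()) };
    let xml = XmlSketch { log: RefCell::new(Vec::new()) };
    drive(&html, &["<p>", "text"]);
    drive(&xml, &["<root/>"]);
    println!("{}", html.log.borrow().join(","));
    println!("{}", xml.log.borrow().join(","));
}
```

Swapping one parser for the other requires no change to drive, which is the same property the html.rs change aims for.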

References

CSC/ECE 517 Fall 2014/oss M1452 jns
