CSC/ECE 517 Fall 2014/oss M1452 jns

From Expertiza_Wiki
= Integrating an XML Parser =
An important part of loading web pages is the process of turning HTML source into a [http://en.wikipedia.org/wiki/Document_Object_Model DOM]. Servo currently has parsers to do this, but the ability to turn [http://en.wikipedia.org/wiki/XHTML XHTML] into a DOM is missing. The goal of our project is to integrate an XML parser into Servo so that it can parse XHTML as well.<ref>https://github.com/servo/servo/wiki/XML-parser-student-project</ref>
 
[https://github.com/servo/servo/wiki/XML-parser-student-project Project Description]
 


__TOC__
= Background Information =
Servo is an experimental web browser layout engine being developed by Mozilla Research. The prototype seeks to create a highly parallel environment, in which many components (such as rendering, layout, HTML parsing, image decoding, etc.) are handled by fine-grained, isolated tasks. The project has a symbiotic relationship with the Rust programming language, in which it is being developed.
=== Rust ===
[http://www.rust-lang.org/ Rust] is a systems language for writing high-performance applications of the kind usually written in C or C++, but it was designed to prevent some of the invalid memory accesses that cause segmentation faults. It supports imperative, functional and object-oriented programming styles.
It is designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety. <ref name=Rust>[https://www.mozilla.org/en-US/research/projects/ Overview of Mozilla Research Projects]</ref>
Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory. <ref name=Rust>[https://www.mozilla.org/en-US/research/projects/ Overview of Mozilla Research Projects]</ref>
=== Servo ===
[https://github.com/servo/servo Servo] is a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, Mozilla are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.
Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.<ref name=Rust></ref>
''Task Supervision Diagram''
[[File:Diagram1.jpg]]
''Task Projection Diagram''
[[File:Diagram2.jpg]]
• ''Each box'' represents a Rust task.
• ''Blue boxes'' represent the primary tasks in the browser pipeline.
• ''Gray boxes'' represent tasks auxiliary to the browser pipeline.
• ''White boxes'' represent worker tasks. Each such box represents several tasks, the precise number of which will vary with the workload.
• ''Dashed lines'' indicate supervisor relationships.
• ''Solid lines'' indicate communication channels.
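The relationship between a supervising task, its workers and the channels connecting them can be illustrated with a small sketch. This is not Servo code: it uses the threads and <code>std::sync::mpsc</code> channels of the current Rust standard library as a stand-in for the 2014 task mechanism, and the function name <code>run_workers</code> is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns `workers` threads that each square their id
/// and send the result over a channel; the spawning thread acts as the
/// supervisor and collects every result.
fn run_workers(workers: u32) -> Vec<u32> {
    // The channel plays the role of a "solid line" in the diagram.
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for id in 0..workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(id * id).expect("receiver should still be alive");
        }));
    }
    // Drop the original sender so the receiving loop ends once all workers finish.
    drop(tx);

    // Joining the handles is the "dashed line": the supervisor waits for its workers.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let mut results: Vec<u32> = rx.iter().collect();
    results.sort(); // arrival order is not deterministic
    results
}

fn main() {
    println!("{:?}", run_workers(4));
}
```

Because each worker owns its own sender and no memory is shared between threads, the compiler can check that the workers do not race on shared data.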
= Prerequisites =
'''Installing Dependencies for Servo''' 
The commands to install the dependencies on each platform are listed below.<ref name = github>[https://github.com/servo/servo Servo Github]</ref>
On OS X (homebrew):
    brew install automake pkg-config python glfw3 cmake
    pip install virtualenv
On OS X (MacPorts):
    sudo port install python27 py27-virtualenv cmake
On Debian-based Linuxes:
    sudo apt-get install curl freeglut3-dev \
    libfreetype6-dev libgl1-mesa-dri libglib2.0-dev xorg-dev \
    msttcorefonts gperf g++ cmake python-virtualenv \
    libssl-dev libglfw-dev
On Fedora:
    sudo yum install curl freeglut-devel libtool gcc-c++ libXi-devel \
        freetype-devel mesa-libGL-devel glib2-devel libX11-devel libXrandr-devel gperf \
        fontconfig-devel cabextract ttmkfdir python python-virtualenv expat-devel \
        rpm-build openssl-devel glfw-devel cmake
    pushd .
    cd /tmp
    wget http://corefonts.sourceforge.net/msttcorefonts-2.5-1.spec
    rpmbuild -bb msttcorefonts-2.5-1.spec
    sudo yum install $HOME/rpmbuild/RPMS/noarch/msttcorefonts-2.5-1.noarch.rpm
    popd
On Arch Linux:
    sudo pacman -S base-devel git python2 python2-virtualenv mesa glfw ttf-font cmake
= Building Servo =
''Normal Build''


    git clone https://github.com/servo/servo
    cd servo
    ./mach build
    ./mach run tests/html/about-mozilla.html


''Building for Android target''
 
    git clone https://github.com/servo/servo
    cd servo
    ANDROID_TOOLCHAIN=/path/to/toolchain ANDROID_NDK=/path/to/ndk PATH=$PATH:/path/to/toolchain/bin ./mach build --android
    cd ports/android
    ANDROID_SDK=/path/to/sdk make install
 
= Project Implementation =
 
As part of our OSS project, we created a new Parser trait and implemented it for the ServoHTMLParser struct as follows.
 
→ Created a new file, mod.rs, at the following path: <code>servo/components/script/parse/mod.rs</code>
 
→ Created a Parser trait with the parse_chunk and finish methods in this mod.rs
 
    pub mod html;
    pub trait Parser {
        fn parse_chunk(&self, input: String);
        fn finish(&self);
    }
 
→ Implemented this Parser trait for the ServoHTMLParser struct in <code>servo/components/script/dom/servohtmlparser.rs</code>
 
    impl Parser for ServoHTMLParser {
        fn parse_chunk(&self, input: String) {
            self.tokenizer().borrow_mut().feed(input);
        }
        fn finish(&self) {
            self.tokenizer().borrow_mut().end();
        }
    }
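The trait methods take <code>&self</code> yet still mutate the tokenizer, which works because the tokenizer is reached through <code>borrow_mut()</code>. The following self-contained sketch shows that pattern in current Rust. The types <code>RecordingTokenizer</code> and <code>SketchParser</code> are hypothetical stand-ins, not part of Servo or html5ever, and the tokenizer here only records what it is fed.

```rust
use std::cell::RefCell;

pub trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}

/// Hypothetical stand-in for the real tokenizer: it only records its input.
#[derive(Default)]
struct RecordingTokenizer {
    chunks: Vec<String>,
    ended: bool,
}

impl RecordingTokenizer {
    fn feed(&mut self, input: String) {
        self.chunks.push(input);
    }
    fn end(&mut self) {
        self.ended = true;
    }
}

/// Hypothetical parser that owns its tokenizer behind a RefCell, so it can
/// be mutated through a shared `&self` reference.
#[derive(Default)]
struct SketchParser {
    tokenizer: RefCell<RecordingTokenizer>,
}

impl Parser for SketchParser {
    fn parse_chunk(&self, input: String) {
        // borrow_mut() checks at run time that no other borrow is active.
        self.tokenizer.borrow_mut().feed(input);
    }
    fn finish(&self) {
        self.tokenizer.borrow_mut().end();
    }
}

fn main() {
    let parser = SketchParser::default();
    parser.parse_chunk("<p>".to_string());
    parser.parse_chunk("hi</p>".to_string());
    parser.finish();
    let tokenizer = parser.tokenizer.borrow();
    println!("{} chunks, ended = {}", tokenizer.chunks.len(), tokenizer.ended);
}
```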
 
→ Modified <code>servo/components/script/parse/html.rs</code> to invoke the parse_chunk method appropriately.
 
    InputString(s) => {
        parser.parse_chunk(s);
    }
 
    InputUrl(url) => {
        let load_response = load_response.unwrap();
        match load_response.metadata.content_type {
            Some((ref t, _)) if t.as_slice().eq_ignore_ascii_case("image") => {
                let page = format!("<html><body><img src='{:s}' /></body></html>", base_url.as_ref().unwrap().serialize());
                parser.parse_chunk(page);
            },
            // ... (excerpt: the remaining arms are not shown)
 
    Payload(data) => {
        // FIXME: use Vec<u8> (html5ever #34)
        let data = UTF_8.decode(data.as_slice(), DecodeReplace).unwrap();
        parser.parse_chunk(data);
    }
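Two steps in the excerpts above produce the strings handed to parse_chunk: an image URL is wrapped in a minimal HTML page, and a raw payload is decoded as UTF-8 with invalid bytes replaced. A rough sketch of both steps in current Rust follows. The function names <code>image_page</code> and <code>decode_lossy</code> are hypothetical, and <code>String::from_utf8_lossy</code> from the standard library is used here in place of the <code>UTF_8.decode(..., DecodeReplace)</code> call from the original code.

```rust
/// Hypothetical helper: wraps an image URL in a minimal HTML page so that an
/// HTML parser can display it, mirroring the `format!` call in the excerpt.
fn image_page(url: &str) -> String {
    format!("<html><body><img src='{}' /></body></html>", url)
}

/// Hypothetical helper: decodes bytes as UTF-8, substituting U+FFFD for any
/// invalid sequence instead of failing.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    println!("{}", image_page("http://example.com/cat.png"));
    println!("{}", decode_lossy(b"<p>hello</p>"));
}
```

Either result is an owned <code>String</code>, which matches the <code>input: String</code> parameter of parse_chunk.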
 
= Design Patterns/Principles =
We followed the [http://www.artima.com/lejava/articles/designprinciples.html “Programming to Interfaces, not an implementation”] design principle in this initial step of our project. The principle is about managing dependency relationships carefully: code that depends only on an interface is decoupled from the implementation, so the implementation can vary without affecting its callers. We modified html.rs to invoke the Parser trait’s parse_chunk method instead of calling the methods of ServoHTMLParser directly. If a different parser is used in place of ServoHTMLParser in the future, the invocation logic in html.rs will not need to change. (Traits in Rust are similar to interfaces in other languages.)
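To make the principle concrete, here is a minimal sketch in current Rust in which the calling code depends only on the trait. Everything other than the Parser trait's two method names is hypothetical: <code>HtmlSketch</code>, <code>XmlSketch</code> and <code>feed_all</code> are illustrative names, and the two parsers simply log what they receive rather than parse anything.

```rust
use std::cell::RefCell;

trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}

/// Hypothetical parsers that only log what they receive.
struct HtmlSketch {
    log: RefCell<Vec<String>>,
}

struct XmlSketch {
    log: RefCell<Vec<String>>,
}

impl Parser for HtmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("html:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("html:end".to_string());
    }
}

impl Parser for XmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("xml:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("xml:end".to_string());
    }
}

/// The caller knows only the trait, so either parser can be passed in
/// without changing this function.
fn feed_all(parser: &dyn Parser, chunks: &[&str]) {
    for chunk in chunks {
        parser.parse_chunk(chunk.to_string());
    }
    parser.finish();
}

fn main() {
    let html = HtmlSketch { log: RefCell::new(Vec::new()) };
    let xml = XmlSketch { log: RefCell::new(Vec::new()) };
    feed_all(&html, &["<p>", "</p>"]);
    feed_all(&xml, &["<p/>"]);
    println!("{:?}", html.log.borrow());
    println!("{:?}", xml.log.borrow());
}
```

Swapping the parser changes only the value passed to <code>feed_all</code>, which is the property the html.rs change relies on.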


= References =
<references>
</references>

Latest revision as of 23:45, 29 October 2014
