CSC/ECE 517 Fall 2014/oss M1452 jns: Difference between revisions
Latest revision as of 23:45, 29 October 2014
Integrating an XML Parser
An important part of loading web pages is the process of turning HTML source into a DOM. Servo currently has parsers to do this, but the ability to turn XHTML into a DOM is missing. The goal of our project is to integrate an XML parser into Servo so that it can parse XHTML as well.<ref>https://github.com/servo/servo/wiki/XML-parser-student-project</ref>
Background Information
Servo is an experimental web browser layout engine being developed by Mozilla Research. The prototype seeks to create a highly parallel environment, in which many components (such as rendering, layout, HTML parsing, image decoding, etc.) are handled by fine-grained, isolated tasks. The project has a symbiotic relationship with the Rust programming language, in which it is being developed.
Rust
Rust is a systems language for writing high-performance applications of the kind usually written in C or C++, but it was designed to prevent some of the problems related to invalid memory accesses that generate segmentation faults. It supports imperative, functional and object-oriented programming styles. It is designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety. <ref name=Rust>Overview of Mozilla Research Projects</ref>
Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory. <ref name=Rust>Overview of Mozilla Research Projects</ref>
Servo
Servo is a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, Mozilla are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.
Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.<ref name=Rust></ref>
Task Supervision Diagram
Task Projection Diagram
• Each box represents a Rust task.
• Blue boxes represent the primary tasks in the browser pipeline.
• Gray boxes represent tasks auxiliary to the browser pipeline.
• White boxes represent worker tasks. Each such box represents several tasks, the precise number of which will vary with the workload.
• Dashed lines indicate supervisor relationships.
• Solid lines indicate communication channels.
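The task-and-channel structure in the diagram can be illustrated with a minimal sketch. It uses present-day Rust, where a task corresponds to a thread from std::thread and a communication channel to std::sync::mpsc. The Msg type and the run_pipeline function are hypothetical names chosen for this illustration and are not part of Servo.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type sent from a producer task to a consumer.
#[derive(Debug)]
enum Msg {
    Node(String),
    Done,
}

/// Spawns a producer task that sends one message per tag name, then
/// returns everything the consumer side received over the channel.
fn run_pipeline(tags: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Msg>();

    // Producer task: it owns `tx` and `tags`, so no memory is shared
    // with the consumer; the channel is the only link between them.
    let producer = thread::spawn(move || {
        for tag in tags {
            tx.send(Msg::Node(tag)).expect("receiver hung up");
        }
        tx.send(Msg::Done).expect("receiver hung up");
    });

    // Consumer: reads messages in order until it sees `Done`.
    let mut seen = Vec::new();
    for msg in rx {
        match msg {
            Msg::Node(name) => seen.push(name),
            Msg::Done => break,
        }
    }
    producer.join().expect("producer panicked");
    seen
}

fn main() {
    let seen = run_pipeline(vec!["html".to_string(), "body".to_string()]);
    println!("{:?}", seen);
}
```

Because the producer takes ownership of everything it uses, the compiler rejects any attempt to touch that data from the other task, which is the isolation property the diagram relies on.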
Prerequisites
Installing Dependencies for Servo
The commands to install the dependencies on various platforms are:<ref name = github>Servo Github</ref>
On OS X (homebrew):
brew install automake pkg-config python glfw3 cmake
pip install virtualenv
On OS X (MacPorts):
sudo port install python27 py27-virtualenv cmake
On Debian-based Linuxes:
sudo apt-get install curl freeglut3-dev \
    libfreetype6-dev libgl1-mesa-dri libglib2.0-dev xorg-dev \
    msttcorefonts gperf g++ cmake python-virtualenv \
    libssl-dev libglfw-dev
On Fedora:
sudo yum install curl freeglut-devel libtool gcc-c++ libXi-devel \
    freetype-devel mesa-libGL-devel glib2-devel libX11-devel libXrandr-devel gperf \
    fontconfig-devel cabextract ttmkfdir python python-virtualenv expat-devel \
    rpm-build openssl-devel glfw-devel cmake
pushd .
cd /tmp
wget http://corefonts.sourceforge.net/msttcorefonts-2.5-1.spec
rpmbuild -bb msttcorefonts-2.5-1.spec
sudo yum install $HOME/rpmbuild/RPMS/noarch/msttcorefonts-2.5-1.noarch.rpm
popd
On Arch Linux:
sudo pacman -S base-devel git python2 python2-virtualenv mesa glfw ttf-font cmake
Building Servo
Normal Build
git clone https://github.com/servo/servo
cd servo
./mach build
./mach run tests/html/about-mozilla.html
Building for Android target
git clone https://github.com/servo/servo
cd servo
ANDROID_TOOLCHAIN=/path/to/toolchain ANDROID_NDK=/path/to/ndk PATH=$PATH:/path/to/toolchain/bin ./mach build --android
cd ports/android
ANDROID_SDK=/path/to/sdk make install
Project Implementation
As part of our OSS project, we have created a new Parser trait and implemented it for the ServoHTMLParser struct as follows.
→ Created a new file mod.rs in the following directory: servo/components/script/parse/mod.rs
→ Created a Parser trait with the parse_chunk and finish methods in this mod.rs
pub mod html;

pub trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}
→ Implemented this Parser trait for the ServoHTMLParser struct in servo/components/script/dom/servohtmlparser.rs
impl Parser for ServoHTMLParser {
    fn parse_chunk(&self, input: String) {
        self.tokenizer().borrow_mut().feed(input);
    }
    fn finish(&self) {
        self.tokenizer().borrow_mut().end();
    }
}
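Both trait methods take &self and yet change the tokenizer's state, which works because the tokenizer is reached through borrow_mut(). The following self-contained sketch shows that pattern in present-day Rust. The Tokenizer and SketchHtmlParser types here are hypothetical stand-ins that only record their input; they are not Servo's or html5ever's actual types.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a real tokenizer: it only records its input.
#[derive(Default)]
struct Tokenizer {
    buffer: String,
    ended: bool,
}

impl Tokenizer {
    fn feed(&mut self, input: String) {
        self.buffer.push_str(&input);
    }
    fn end(&mut self) {
        self.ended = true;
    }
}

/// Same shape as the Parser trait shown above.
trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}

/// Hypothetical parser that owns its tokenizer behind a RefCell, so that
/// methods taking `&self` can still mutate it.
struct SketchHtmlParser {
    tokenizer: RefCell<Tokenizer>,
}

impl SketchHtmlParser {
    fn new() -> Self {
        SketchHtmlParser { tokenizer: RefCell::new(Tokenizer::default()) }
    }

    /// Returns a copy of the buffered input and the "ended" flag.
    fn snapshot(&self) -> (String, bool) {
        let t = self.tokenizer.borrow();
        (t.buffer.clone(), t.ended)
    }
}

impl Parser for SketchHtmlParser {
    fn parse_chunk(&self, input: String) {
        // The mutable borrow is checked at run time and released at the
        // end of this statement.
        self.tokenizer.borrow_mut().feed(input);
    }
    fn finish(&self) {
        self.tokenizer.borrow_mut().end();
    }
}

fn main() {
    let parser = SketchHtmlParser::new();
    parser.parse_chunk("<p>".to_string());
    parser.parse_chunk("hi</p>".to_string());
    parser.finish();
    println!("{:?}", parser.snapshot());
}
```

RefCell moves the borrow check from compile time to run time, so two overlapping mutable borrows would panic rather than fail to compile. Keeping each borrow confined to a single statement, as above, avoids that.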
→ Modified servo/components/script/parse/html.rs to invoke the parse_chunk method where input is handed to the parser.
InputString(s) => {
    parser.parse_chunk(s);
}
InputUrl(url) => {
    let load_response = load_response.unwrap();
    match load_response.metadata.content_type {
        Some((ref t, _)) if t.as_slice().eq_ignore_ascii_case("image") => {
            let page = format!("<html><body><img src='{:s}' /></body></html>",
                               base_url.as_ref().unwrap().serialize());
            parser.parse_chunk(page);
        },
        // ...
        Payload(data) => {
            // FIXME: use Vec<u8> (html5ever #34)
            let data = UTF_8.decode(data.as_slice(), DecodeReplace).unwrap();
            parser.parse_chunk(data);
        }
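The excerpt above appears to do two things before calling parse_chunk: wrap an image URL in a small HTML page, and decode a received payload as UTF-8 with replacement of invalid bytes. A minimal sketch of both steps in present-day Rust follows. The helper names is_image, page_for_image and decode_chunk are hypothetical, and String::from_utf8_lossy from the standard library is used here in place of the encoding library call in the original code.

```rust
/// Returns true when the top-level media type is "image", ignoring ASCII case.
fn is_image(top_level_type: &str) -> bool {
    top_level_type.eq_ignore_ascii_case("image")
}

/// Builds a minimal HTML page that embeds the image at `url`.
fn page_for_image(url: &str) -> String {
    format!("<html><body><img src='{}' /></body></html>", url)
}

/// Decodes bytes as UTF-8, replacing each invalid sequence with U+FFFD.
fn decode_chunk(data: &[u8]) -> String {
    String::from_utf8_lossy(data).into_owned()
}

fn main() {
    if is_image("Image") {
        println!("{}", page_for_image("http://example.com/a.png"));
    }
    println!("{}", decode_chunk(b"<p>hello</p>"));
}
```

Either result is a String, so both paths can hand their output to the same parse_chunk method.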
Design Patterns/Principles
We have followed the “Program to an interface, not an implementation” design principle in this initial step of our project. The principle is about dependency relationships, which have to be managed carefully. Code that depends only on an interface is decoupled from the implementation behind it, so the implementation can vary without affecting its callers. We have modified the code in html.rs to invoke the Parser trait’s parse_chunk method instead of directly invoking the methods of ServoHTMLParser. As a result, if another parser is used in place of ServoHTMLParser in the future, the invocation logic in html.rs does not need to change. (Traits are similar to interfaces in other languages.)
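The benefit of this principle can be sketched as follows, in present-day Rust. HtmlSketch, XmlSketch and feed_all are hypothetical names used only for illustration; the two structs simply record what they are given and do no real parsing.

```rust
use std::cell::RefCell;

/// Same shape as the Parser trait described above.
trait Parser {
    fn parse_chunk(&self, input: String);
    fn finish(&self);
}

/// Hypothetical stand-in for an HTML parser: records what it was given.
struct HtmlSketch {
    log: RefCell<Vec<String>>,
}

/// Hypothetical stand-in for a future XML parser.
struct XmlSketch {
    log: RefCell<Vec<String>>,
}

impl Parser for HtmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("html:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("html:end".to_string());
    }
}

impl Parser for XmlSketch {
    fn parse_chunk(&self, input: String) {
        self.log.borrow_mut().push(format!("xml:{}", input));
    }
    fn finish(&self) {
        self.log.borrow_mut().push("xml:end".to_string());
    }
}

/// The caller depends only on the trait, so it works unchanged with
/// any type that implements Parser.
fn feed_all(parser: &dyn Parser, chunks: &[&str]) {
    for chunk in chunks {
        parser.parse_chunk(chunk.to_string());
    }
    parser.finish();
}

fn main() {
    let html = HtmlSketch { log: RefCell::new(Vec::new()) };
    let xml = XmlSketch { log: RefCell::new(Vec::new()) };
    feed_all(&html, &["<p>", "</p>"]);
    feed_all(&xml, &["<a/>"]);
    println!("{:?}", html.log.borrow());
    println!("{:?}", xml.log.borrow());
}
```

Here feed_all plays the role of the invocation logic in html.rs: adding XmlSketch required no change to it.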
References
<references> </references>