CSC/ECE 517 Fall 2014/oss M1452 jns: Difference between revisions
No edit summary |
|||
Line 2: | Line 2: | ||
[https://github.com/servo/servo/wiki/XML-parser-student-project Project Description] | [https://github.com/servo/servo/wiki/XML-parser-student-project Project Description] | ||
Servo has the html parser but the ability to parse an XHTML into DOM is missing. The goal of this project is to integrate an | |||
__TOC__ | __TOC__ |
Revision as of 17:03, 29 October 2014
Integrating an XML Parser
Servo has the html parser but the ability to parse an XHTML into DOM is missing. The goal of this project is to integrate an
Background Information
Servo is an experimental web browser layout engine being developed by Mozilla Research. The prototype seeks to create a highly parallel environment, in which many components (such as rendering, layout, HTML parsing, image decoding, etc.) are handled by fine-grained, isolated tasks. The project has a symbiotic relationship with the Rust programming language, in which it is being developed.
Rust
Rust is a systems language for writing high performance applications that are usually written in C or C++ but it was developed to prevent some of the problems related to invalid memory accesses that generate segmentation faults. It covers imperative, functional and object-oriented programming. It's designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive and it provides strong guarantees about isolation, concurrency execution and memory safety. <ref name=Rust>Overview of Mozilla Research Projects</ref>
Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory. <ref name=Rust>Overview of Mozilla Research Projects</ref>
Servo
Servo is a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, Mozilla are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.
Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.<ref name=Rust></ref>
Task Supervision Diagram
Task Projection Diagram
• Each box represents a Rust task.
• Blue boxes represent the primary tasks in the browser pipeline.
• Gray boxes represent tasks auxiliary to the browser pipeline.
• White boxes represent worker tasks. Each such box represents several tasks, the precise number of which will vary with the workload.
• Dashed lines indicate supervisor relationships.
• Solid lines indicate communication channels.
Prerequisites
Installing Dependencies for Servo
The list of commands to install the dependencies on various platforms are<ref name = github>Servo Github</ref>
On OS X (homebrew):
brew install automake pkg-config python glfw3 cmake pip install virtualenv
On OS X (MacPorts):
sudo port install python27 py27-virtualenv cmake
On Debian-based Linuxes:
sudo apt-get install curl freeglut3-dev \ libfreetype6-dev libgl1-mesa-dri libglib2.0-dev xorg-dev \ msttcorefonts gperf g++ cmake python-virtualenv \ libssl-dev libglfw-dev
On Fedora:
sudo yum install curl freeglut-devel libtool gcc-c++ libXi-devel \ freetype-devel mesa-libGL-devel glib2-devel libX11-devel libXrandr-devel gperf \ fontconfig-devel cabextract ttmkfdir python python-virtualenv expat-devel \ rpm-build openssl-devel glfw-devel cmake pushd . cd /tmp wget http://corefonts.sourceforge.net/msttcorefonts-2.5-1.spec rpmbuild -bb msttcorefonts-2.5-1.spec sudo yum install $HOME/rpmbuild/RPMS/noarch/msttcorefonts-2.5-1.noarch.rpm popd
On Arch Linux:
sudo pacman -S base-devel git python2 python2-virtualenv mesa glfw ttf-font cmake
Building Servo
Normal Build
git clone https://github.com/servo/servo cd servo ./mach build ./mach run tests/html/about-mozilla.html
Building for Android target
git clone https://github.com/servo/servo cd servo ANDROID_TOOLCHAIN=/path/to/toolchain ANDROID_NDK=/path/to/ndk PATH=$PATH:/path/to/toolchain/bin ./mach build --android cd ports/android ANDROID_SDK=/path/to/sdk make install
Steps of Implementation
→ Created a new file mod.rs in the following directory: servo/components/script/parse/mod.rs
→ Created a Parser Trait with the parse_chunk method in this mod.rs
pub mod html; pub trait Parser { fn parse_chunk(&self,input: String); fn finish(&self); }
→ Implementation of this Parser trait for the ServoHTML Parser struct is in the servo /components/script/dom/servohtmlparser.rs
impl Parser for ServoHTMLParser{ fn parse_chunk(&self, input: String) { self.tokenizer().borrow_mut().feed(input); } fn finish(&self){ self.tokenizer().borrow_mut().end(); } }
→ Modified servo/components/script/parse/html.rs
to invoke the parse_chunk method appropriately.
InputString(s) => { parser.parse_chunk(s); }
InputUrl(url) => { let load_response = load_response.unwrap(); match load_response.metadata.content_type { Some((ref t, _)) if t.as_slice().eq_ignore_ascii_case("image") => { let page = format!("<html><body><img src='{:s}' /></body></html>", base_url.as_ref().unwrap().serialize()); parser.parse_chunk(page); }, Payload(data) => { // FIXME: use Vec<u8> (html5ever #34) let data = UTF_8.decode(data.as_slice(), DecodeReplace).unwrap(); parser.parse_chunk(data); }
Design Patterns/Principles
We have followed “Programming to Interfaces, not an implementation” design principle while implementing initial step of our project. This principle is really about dependency relationships which have to be carefully managed . When you depend on interfaces only, you're decoupled from the implementation. That means the implementation can vary, and that's a healthy dependency relationship . We have modified the code of html.rs to invoke Parser trait’s parse_chunk method instead of directly invoking the methods of ServoHtmlParser.Thereby in future,if in case we use any other parser instance instead of ServoHtmlParser , we need not modify the invocation logic in html.rs function.(Traits are similar to interfaces in other languages).
References
<references></references>