CSC/ECE 517 Spring 2020 - M2001. Implement charset prescanning for the HTML parser

From Expertiza_Wiki
Revision as of 23:20, 13 April 2020 by Bwanza (talk | contribs)

Servo is a modern, high-performance browser engine designed for both application and embedded use. Servo is written in the Rust language. It is currently developed on 64-bit macOS, 64-bit Linux, 64-bit Windows, and Android. As of February 17, 2020, Servo does not yet prescan an HTML document's byte stream to determine its character encoding (charset), a feature that other major browsers provide. The goal of this project is to implement HTML charset prescanning in the current version of Servo.

Introduction

Servo

Servo is an experimental browser engine developed to take advantage of the memory safety properties and concurrency features of the Rust programming language. The project was initiated by Mozilla Research, with Samsung contributing the effort to port it to Android and ARM processors. The prototype seeks to create a highly parallel environment in which many components (such as rendering, layout, HTML parsing, and image decoding) are handled by fine-grained, isolated tasks.

Rust

Rust is a multi-paradigm programming language focused on performance and safety, especially safe concurrency. Rust is syntactically similar to C++ but provides memory safety without using garbage collection.

DOM

  • The HTML DOM is an object model for HTML. It defines HTML elements as objects, along with the properties, methods, and events of all HTML elements.
  • The HTML DOM is an API (application programming interface) for JavaScript. JavaScript can add, change, and remove HTML elements, HTML attributes, and CSS styles.

Figure: The HTML DOM tree of objects
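To make the "tree of objects" idea concrete, the following is a minimal sketch in Rust. It is not Servo's DOM implementation: the `Element` type and its methods (`set_attribute`, `remove_attribute`, `append_child`, `remove_child`, `to_html`) are hypothetical names chosen for illustration, loosely mirroring the add/change/remove operations described above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified element node: a tag name, attributes, and children.
/// A real DOM also has text nodes, events, parent links, and much more.
#[derive(Debug, Default)]
pub struct Element {
    pub tag: String,
    pub attributes: BTreeMap<String, String>,
    pub children: Vec<Element>,
}

impl Element {
    pub fn new(tag: &str) -> Self {
        Element { tag: tag.to_string(), ..Default::default() }
    }

    /// Adds the attribute, or changes its value if it already exists.
    pub fn set_attribute(&mut self, name: &str, value: &str) {
        self.attributes.insert(name.to_string(), value.to_string());
    }

    /// Removes the attribute and returns its old value, if it was present.
    pub fn remove_attribute(&mut self, name: &str) -> Option<String> {
        self.attributes.remove(name)
    }

    /// Adds a child element at the end of the child list.
    pub fn append_child(&mut self, child: Element) {
        self.children.push(child);
    }

    /// Removes and returns the child at `index`, or `None` if out of range.
    pub fn remove_child(&mut self, index: usize) -> Option<Element> {
        if index < self.children.len() {
            Some(self.children.remove(index))
        } else {
            None
        }
    }

    /// Serializes the subtree rooted at this element (no escaping is done).
    pub fn to_html(&self) -> String {
        let mut out = format!("<{}", self.tag);
        for (name, value) in &self.attributes {
            out.push_str(&format!(" {}=\"{}\"", name, value));
        }
        out.push('>');
        for child in &self.children {
            out.push_str(&child.to_html());
        }
        out.push_str(&format!("</{}>", self.tag));
        out
    }
}
```

Building an `html` element, appending a `body` child, and appending a `p` child with a `class` attribute to that body gives a three-level tree; calling `to_html` on the root walks the tree depth-first.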

Setup

Setting up the local environment on our machines requires "rustup", an installer for the systems programming language Rust. The guide to set up the local environment for each operating system can be found here.

Final Project

Problem Statement

  • Our main focus is to complete the initial steps listed on the project page. The goal here is to create a new Rust module in the html5ever repository and implement the byte stream prescanning algorithm.
  • After completing the initial steps, we integrate the new prescan algorithm into Servo's HTML parser implementation, following the encoding sniffing algorithm. Rust's package manager, Cargo, will be used for this step.
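The idea behind byte stream prescanning can be sketched as follows. This is a deliberately simplified illustration, not the project's implementation and not the complete algorithm from the HTML specification. The function names (`prescan_charset`, `charset_in_tag`) and the constant `PRESCAN_LIMIT` are hypothetical. The sketch assumes a 1024-byte scan limit, looks only for `charset=` inside `<meta>` tags, does not require `http-equiv`, does not handle `>` inside quoted attribute values, and does not map the label it finds to a known encoding.

```rust
/// Assumed scan limit for this sketch: only the first 1024 bytes are examined.
const PRESCAN_LIMIT: usize = 1024;

/// ASCII whitespace bytes: tab, line feed, form feed, carriage return, space.
fn is_space(b: u8) -> bool {
    matches!(b, b'\t' | b'\n' | 0x0C | b'\r' | b' ')
}

/// Case-insensitive (ASCII) search for `needle` in `haystack`.
/// `needle` must be non-empty.
fn find_ignore_case(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack
        .windows(needle.len())
        .position(|w| w.eq_ignore_ascii_case(needle))
}

/// Extracts the value following the first `charset` + `=` in a tag's bytes.
/// Simplification: this also matches `charset=` inside a `content` attribute.
fn charset_in_tag(tag: &[u8]) -> Option<String> {
    let mut i = find_ignore_case(tag, b"charset")? + b"charset".len();
    while i < tag.len() && is_space(tag[i]) {
        i += 1;
    }
    if tag.get(i) != Some(&b'=') {
        return None;
    }
    i += 1;
    while i < tag.len() && is_space(tag[i]) {
        i += 1;
    }
    let value = match tag.get(i) {
        // Quoted value: read up to the matching quote.
        Some(&q) if q == b'"' || q == b'\'' => {
            let start = i + 1;
            let len = tag[start..].iter().position(|&b| b == q)?;
            &tag[start..start + len]
        }
        // Unquoted value: read up to whitespace, `;`, a quote, or `/`.
        _ => {
            let len = tag[i..]
                .iter()
                .position(|&b| is_space(b) || matches!(b, b';' | b'"' | b'\'' | b'/'))
                .unwrap_or(tag.len() - i);
            &tag[i..i + len]
        }
    };
    if value.is_empty() {
        return None;
    }
    Some(String::from_utf8_lossy(value).trim().to_ascii_lowercase())
}

/// Scans the start of `input` for a charset label declared in a `<meta>` tag,
/// skipping over `<!-- ... -->` comments. Returns the lowercased label.
pub fn prescan_charset(input: &[u8]) -> Option<String> {
    let buf = &input[..input.len().min(PRESCAN_LIMIT)];
    let mut pos = 0;
    while pos < buf.len() {
        let rest = &buf[pos..];
        if rest.starts_with(b"<!--") {
            // Skip the whole comment; give up if it is never closed.
            let end = find_ignore_case(&rest[4..], b"-->")?;
            pos += 4 + end + 3;
        } else if rest.len() > 5
            && rest[..5].eq_ignore_ascii_case(b"<meta")
            && (is_space(rest[5]) || rest[5] == b'/')
        {
            let tag_end = rest.iter().position(|&b| b == b'>').unwrap_or(rest.len());
            if let Some(label) = charset_in_tag(&rest[5..tag_end]) {
                return Some(label);
            }
            pos += tag_end + 1;
        } else {
            pos += 1;
        }
    }
    None
}
```

Under these assumptions, `prescan_charset` returns the label from the first `<meta>` tag that declares one, and `None` when no declaration appears within the scan limit. A caller following the encoding sniffing algorithm would still need to resolve the label to an actual encoding and fall back to a default when nothing is found.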

Design Pattern

No design pattern is applied here, since our main goal is to create a method that implements the byte stream prescanning algorithm.

Implementation