CSC/ECE 517 Fall 2014/oss M1456 kdv
Implement MIME sniffing - spec
This article describes the open source project M 1456: Implement MIME<ref>https://mimesniff.spec.whatwg.org/</ref> sniffing - spec<ref>https://github.com/servo/servo/issues/3144</ref>, which is part of the Servo Parallel Browser Project. Servo is a prototype web browser engine that is currently under development and is written in the Rust language. The article gives brief background on MIME sniffing in Servo and on the Rust programming language, and then describes the initial step and the first step implemented in the project.
Background Information
When an HTTP response declares the type of its content (using the Content-Type header), it is easy to decide how to present it: if it's an image, decode it; if it's an HTML page, parse and display it. When no such header is present, Servo currently falters (for example, when loading an image file or video file from disk, as in #3131). In this case, we want to implement the "sniffing" specification, which looks at the first bytes of the resource and guesses what kind of content is being received. For example, if the content is an image it can be handed to image rendering, and if it is a video it can be handed to video rendering.
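To make the idea of looking at the first bytes concrete, here is a minimal sketch in present-day Rust. It is not Servo code: the function name sniff_prefix is hypothetical, and the table holds only a few well-known image signatures.

```rust
/// Guess a MIME type from the leading bytes of a resource.
/// Returns None when no known signature matches.
/// (Hypothetical helper for illustration; not part of Servo.)
fn sniff_prefix(data: &[u8]) -> Option<&'static str> {
    // A few well-known image signatures ("magic numbers").
    const SIGNATURES: [(&[u8], &str); 4] = [
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
        (b"\xff\xd8\xff", "image/jpeg"),
    ];
    for &(magic, mime) in SIGNATURES.iter() {
        if data.starts_with(magic) {
            return Some(mime);
        }
    }
    None
}
```

A caller would pass the first chunk read from disk or the network and fall back to a default type when the result is None.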
What is Servo?
Servo<ref>https://www.mozilla.org/en-US/research/projects/</ref> is an experimental project to build a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, we are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.
Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.
Github link to Servo Project.
Rust Programming Language
Rust<ref>https://www.mozilla.org/en-US/research/projects/</ref> is a new programming language for developing reliable and efficient systems. It's designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety.
Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory.
Github link to Rust language.
Implement MIME sniffing
Initial Step
As the initial step, we first built Servo and learned how to create tasks in Rust. We then spawned a new sniffer task in the load method in resource_task.rs, creating a new sender and receiver pair as well. This task reads all the available data from the new receiver and sends it immediately via the original sender, while the new sender is handed off to the factory function that is executed. All web pages continued to load unchanged after we rebuilt Servo with our code.
Classes
As part of the OSS Servo project, we worked on the files below.
Modify
- components/net/resource_task.rs
Create
- components/net/sniffer_task.rs
Delete
- none
Step 1
- For every resource request (for example file, http, data, or any other kind of request), the load method interposes a sniffer task, which sniffs all the data, parses the headers if required, and passes the data on. Below is the code snippet from the modified resource_task.rs file.
fn load(&self, load_data: LoadData, start_chan: Sender<LoadResponse>) {
    let mut load_data = load_data;
    load_data.headers.user_agent = self.user_agent.clone();

    // Create new communication channel, create new sniffer task,
    // send all the data to the new sniffer task with the send
    // end of the pipe, receive all the data.
    let sniffer_task = sniffer_task::new_sniffer_task(start_chan.clone());

    let loader = match load_data.url.scheme.as_slice() {
        "file" => file_loader::factory,
        "http" | "https" => http_loader::factory,
        "data" => data_loader::factory,
        "about" => about_loader::factory,
        _ => {
            debug!("resource_task: no loader for scheme {:s}", load_data.url.scheme);
            start_sending(start_chan, Metadata::default(load_data.url))
                .send(Done(Err("no loader for scheme".to_string())));
            return
        }
    };
    debug!("resource_task: loading url: {:s}", load_data.url.serialize());

    loader(load_data, sniffer_task);
}
- The function called in the load method is new_sniffer_task. The sniffer task creates a channel and then calls SnifferManager::new, which creates a SnifferManager struct holding the channel's receiver. SnifferManager waits for input on the new receiver and passes it on to the old sender, while the new sender is handed off to the factory function. Below is the code snippet from the new sniffer_task.rs file.
struct SnifferManager {
    data_receiver: Receiver<LoadResponse>,
}

impl SnifferManager {
    fn new(data_receiver: Receiver<LoadResponse>) -> SnifferManager {
        SnifferManager {
            data_receiver: data_receiver,
        }
    }
}

impl SnifferManager {
    fn start(&self, next_rx: Sender<LoadResponse>) {
        loop {
            self.load(next_rx.clone(), self.data_receiver.recv());
        }
    }

    fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
        next_rx.send(snif_data);
    }
}
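The snippets above use the 2014 task and channel APIs, and the body of new_sniffer_task is not reproduced in this article. As a rough illustration only, the same relay idea could look like this in present-day Rust with std::sync::mpsc and std::thread. The Chunk type is a hypothetical stand-in for Servo's LoadResponse, and the function body is our assumption about the shape of the relay, not the project's actual code.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for Servo's `LoadResponse`.
type Chunk = Vec<u8>;

/// Spawn a relay thread and return the sender that a loader should write to.
/// Everything received is forwarded unchanged to `next`.
fn new_sniffer_task(next: Sender<Chunk>) -> Sender<Chunk> {
    let (sniffer_tx, sniffer_rx) = channel::<Chunk>();
    thread::spawn(move || {
        // The loop ends once every clone of `sniffer_tx` has been dropped.
        for chunk in sniffer_rx {
            // A real sniffer would inspect `chunk` here before forwarding it.
            if next.send(chunk).is_err() {
                break; // the consumer went away
            }
        }
    });
    sniffer_tx
}
```

Unlike the unconditional loop in SnifferManager::start, this sketch stops cleanly when either end of the pipeline hangs up.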
Design Patterns
We used the command design pattern in our project. As Wikipedia states, the command pattern is a design pattern in which an "object is used to represent and encapsulate all the information needed to call a method at a later time".<ref name="commandDesignPatternWiki">http://en.wikipedia.org/wiki/Command_pattern</ref>
In our specific case, the object is the Sniffer Task. A new Sniffer Task is created via the new_sniffer_task function. When the sniffer task is created, it is isolated, and it encapsulates all the information needed to sniff the MIME type of the URI or file. Since it is isolated, the only way for the Sniffer Task and the main Servo process to communicate is through channels, using the send and receive functions to pass data around. For example, the load function of SnifferManager in sniffer_task.rs uses send to pass the data on to the next stage of the processing pipeline.
fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
    next_rx.send(snif_data);
}
The command design pattern provides a few benefits. First, since the code block is encapsulated, it is independent and can be modified or swapped out easily, which makes the code easier to maintain. Second, because the code block has all the resources it needs to perform its task (in our case, sniffing the MIME type), we can easily run it in a separate task in parallel with the main Servo process.
Sniffer Task Project Build
- After the code changes, we built the Servo project by running the command below.
./mach build
- We tested the code by loading an HTML page.
./mach run tests/html/about-mozilla.html
Future Development Scope
Step 2
Move from a 1:1 sniffer:request task model to a shared sniffer task - create the sniffing task in ResourceTask::start, store the created sender in a field, and hand off clones of this sender to the factory function. The sniffer task will now also have to be sent the original sender as well, since it is not available at task creation.
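One possible reading of this step, sketched in present-day Rust rather than the 2014 task API: each message to the shared sniffer carries the original sender along with the data. The names SnifferMsg, reply_to, and start_shared_sniffer, and the Chunk stand-in for LoadResponse, are all hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for Servo's `LoadResponse`.
type Chunk = Vec<u8>;

/// Message to the shared sniffer: the payload plus the sender it is destined for.
struct SnifferMsg {
    reply_to: Sender<Chunk>,
    chunk: Chunk,
}

/// Start one sniffer thread that is shared by all requests.
fn start_shared_sniffer() -> Sender<SnifferMsg> {
    let (tx, rx) = channel::<SnifferMsg>();
    thread::spawn(move || {
        for msg in rx {
            // Ignore send errors: one consumer going away
            // must not stop the shared task.
            let _ = msg.reply_to.send(msg.chunk);
        }
    });
    tx
}
```

The returned sender would be stored in a field and cloned once per request, so only a single sniffer thread exists no matter how many loads are in flight.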
Step 3
When the headers are received, implement the sniffing heuristic algorithm.
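As we read the WHATWG specification, the first decision once the headers arrive is whether the supplied Content-Type is missing or is one of a few placeholder values, in which case the resource is treated as having an unknown type and is sniffed. The sketch below covers only that check. The function name is hypothetical, and the rest of the heuristic (for example the no-sniff flag) is left out.

```rust
/// Decide whether a resource should go through "unknown type" sniffing,
/// based only on the Content-Type value the server supplied (if any).
/// (Hypothetical helper; covers one branch of the heuristic only.)
fn needs_unknown_type_sniffing(supplied: Option<&str>) -> bool {
    match supplied {
        // No Content-Type header at all.
        None => true,
        Some(value) => {
            // Compare only the essence: drop parameters such as "; charset=utf-8".
            let essence = value
                .split(';')
                .next()
                .unwrap_or("")
                .trim()
                .to_ascii_lowercase();
            matches!(
                essence.as_str(),
                "unknown/unknown" | "application/unknown" | "*/*"
            )
        }
    }
}
```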
Step 4
Implement the MIME type matching algorithm, one category at a time. Look for a good way to store the patterns and resulting types to minimize code duplication.
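One way to keep the patterns and resulting types together is a static table whose rows pair a byte pattern and a mask with a MIME type, so that each category becomes data rather than code. The sketch below is an assumption about how this could be laid out, not the project's design. The Signature struct and the function names are hypothetical, and the leading bytes that the specification allows to be ignored for some types are not handled.

```rust
/// One row of a hypothetical signature table.
struct Signature {
    pattern: &'static [u8],
    mask: &'static [u8],
    mime: &'static str,
}

static IMAGE_SIGNATURES: [Signature; 3] = [
    Signature { pattern: b"\x89PNG\r\n\x1a\n", mask: &[0xffu8; 8], mime: "image/png" },
    Signature { pattern: b"\xff\xd8\xff", mask: &[0xffu8; 3], mime: "image/jpeg" },
    // "RIFF", four bytes of file size (masked out), then "WEBPVP".
    Signature {
        pattern: b"RIFF\x00\x00\x00\x00WEBPVP",
        mask: b"\xff\xff\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff",
        mime: "image/webp",
    },
];

/// True when `input`, after masking, starts with the signature's pattern.
fn matches(input: &[u8], sig: &Signature) -> bool {
    sig.pattern.len() == sig.mask.len()
        && input.len() >= sig.pattern.len()
        && input
            .iter()
            .zip(sig.pattern.iter().zip(sig.mask.iter()))
            .all(|(&byte, (&pat, &mask))| (byte & mask) == pat)
}

/// Return the MIME type of the first matching row, if any.
fn sniff_image(input: &[u8]) -> Option<&'static str> {
    IMAGE_SIGNATURES
        .iter()
        .find(|sig| matches(input, sig))
        .map(|sig| sig.mime)
}
```

Adding another category (audio, video, archives) would then mean adding another table and reusing matches unchanged.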
Step 5
Ensure that the resulting MIME type is present in the load that the consumer receives.
Step 6
Write tests that demonstrate working sniffing (for example, loading images with no extension or the wrong extension).
Appendix
Setting up Servo
There are two main steps to set up the environment for this project. A Linux environment is preferred because setup there is simple.
References
<references/>