CSC/ECE 517 Fall 2014/oss M1456 kdv
Implement MIME sniffing - spec
This article describes the open source project M 1456: Implement MIME<ref>https://mimesniff.spec.whatwg.org/</ref> sniffing - spec<ref>https://github.com/servo/servo/issues/3144</ref>, which is part of the Servo Parallel Browser Project. Servo is a prototype web browser engine that is currently under development and is written in the Rust language. The article gives brief background information about MIME sniffing in Servo and about the Rust programming language, then describes the initial step and the first step implemented in the project.
Background Information
When an HTTP resource declares the type of its content (using the Content-Type header), it's very easy to decide how to present it: if it's an image, decode it; if it's an HTML page, parse and display it. When no such header is present, Servo currently falters (for example, when loading an image file from disk, as in #3131). We want to implement the "sniffing" specification, which looks at the first bytes of the resource and guesses what kind of content is being received.
What is Servo?
Servo<ref>https://www.mozilla.org/en-US/research/projects/</ref> is an experimental project to build a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, we are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism.
Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation.
Github link to Servo Project.
Rust Programming Language
Rust<ref>https://www.mozilla.org/en-US/research/projects/</ref> is a new programming language for developing reliable and efficient systems. It's designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety.
Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory.
Github link to Rust language.
Implement MIME sniffing
Initial Step
Build Servo. Learn about tasks in Rust, then spawn a new sniffer task in the load method in resource_task.rs, creating a sender and receiver pair as well - the task should read all available data from the new receiver, and send it immediately via the original sender, while the new sender gets handed off to the factory function that is executed. All web pages should continue to load unchanged after rebuilding.
Step 1
Interpose a sniffer in every resource request in the load method in resource_task.rs: make a new sender/receiver pair, spawn a task that waits for input on the new receiver and passes it on to the old sender, and hand off the new sender to the factory function.
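The interposition described in Step 1 can be sketched roughly as follows. This is a minimal illustration written against current standard-library Rust (`std::thread` and `std::sync::mpsc`), not the 2014 task API that Servo used at the time and not the project's actual code. The names `Chunk` and `interpose_sniffer` are hypothetical; `Chunk` stands in for Servo's `LoadResponse`.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for Servo's `LoadResponse`: just a chunk of bytes.
type Chunk = Vec<u8>;

/// Interposes a pass-through sniffer between a loader and its consumer.
/// `original` is the sender the consumer is already listening on. The
/// returned sender is the one to hand to the loader (the "factory function").
fn interpose_sniffer(original: Sender<Chunk>) -> Sender<Chunk> {
    // New sender/receiver pair that sits in front of the original sender.
    let (new_tx, new_rx) = channel::<Chunk>();

    thread::spawn(move || {
        // Read everything that arrives on the new receiver and forward it
        // unchanged. The loop ends once every clone of `new_tx` is dropped.
        for chunk in new_rx {
            if original.send(chunk).is_err() {
                // The consumer has gone away, so stop forwarding.
                break;
            }
        }
    });

    new_tx
}
```

Because the sniffer only forwards data at this stage, a consumer should observe exactly the same sequence of chunks as before, which is why pages are expected to keep loading unchanged.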
Design Patterns
We used the command design pattern in our project. As Wikipedia states, the command pattern is a design pattern in which an "object is used to represent and encapsulate all the information needed to call a method at a later time".<ref name="commandDesignPatternWiki">http://en.wikipedia.org/wiki/Command_pattern</ref>
In our specific case, the object is the Sniffer Task. A new Sniffer Task is created via the new_sniffer_task function. When the sniffer task is created, it is isolated and it encapsulates all the information needed to sniff the MIME data out of the URI/file. Since it's isolated, the only way to communicate between the Sniffer Task and the main Servo process is by using channels and the send and receive functions to pass data around. For example, in the load function in SnifferManager in sniffer_task.rs, we use send to send the data on to the next stage of the processing pipeline.
fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
    // Forward the sniffed response to the next stage of the pipeline.
    next_rx.send(snif_data);
}
The command design pattern provides a few benefits. Firstly, since the code block is encapsulated, it is independent and can be modified or switched out easily, which makes the code easier to maintain. Secondly, because the code block has all the resources it needs to perform its task (in our case, sniffing the MIME type), we can easily run it in a separate task in parallel with the main Servo process.
Future Development
Step 2
Move from a 1:1 sniffer:request task model to a shared sniffer task: create the sniffing task in ResourceTask::start, store the created sender in a field, and hand off clones of this sender to the factory function. The sniffer task will now also have to be sent the original sender, since it is not available at task creation.
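One way to picture the shared model in Step 2 is a single long-lived sniffer whose messages carry both the data and the sender of the consumer that should receive it. The sketch below uses current standard-library Rust rather than the 2014 task API, and it is only an assumption about how the design could look. `Chunk`, `SnifferMsg` and `start_shared_sniffer` are hypothetical names, not identifiers from Servo.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for Servo's `LoadResponse`.
type Chunk = Vec<u8>;

/// Hypothetical message type: because the shared sniffer is created before
/// any request exists, each message must carry the consumer's sender.
struct SnifferMsg {
    data: Chunk,
    consumer: Sender<Chunk>,
}

/// Spawns the single shared sniffer and returns its sender. In the text this
/// would happen once in `ResourceTask::start`, with the sender stored in a
/// field and cloned for each request.
fn start_shared_sniffer() -> Sender<SnifferMsg> {
    let (tx, rx) = channel::<SnifferMsg>();

    thread::spawn(move || {
        // Serve every request from one loop until all senders are dropped.
        for msg in rx {
            // A consumer that has hung up is ignored so that other
            // requests keep being served.
            let _ = msg.consumer.send(msg.data);
        }
    });

    tx
}
```

Compared with spawning one sniffer per request, this trades a little extra data in each message for a single task whose lifetime matches that of the resource task.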
Step 3
Implement the sniffing heuristic algorithm, which runs when the headers are received.
Step 4
Implement the MIME type matching algorithm, one category at a time. Look for a good way to store the patterns and resulting types to minimize code duplication.
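A table-driven layout is one possible way to store the patterns from Step 4 without duplicating matching code. The sketch below covers only a few image signatures (PNG, GIF and JPEG) and applies a byte mask before comparing, in the spirit of the specification's pattern matching. It deliberately omits the specification's handling of ignored leading bytes, and `Pattern`, `IMAGE_PATTERNS`, `matches` and `sniff_image` are hypothetical names rather than Servo code.

```rust
/// One row of the pattern table: a byte pattern, a mask applied to the
/// input before comparison, and the MIME type reported on a match.
struct Pattern {
    bytes: &'static [u8],
    mask: &'static [u8],
    mime: &'static str,
}

/// A small subset of image signatures, for illustration only.
const IMAGE_PATTERNS: &[Pattern] = &[
    Pattern { bytes: b"\x89PNG\r\n\x1a\n", mask: &[0xFF; 8], mime: "image/png" },
    Pattern { bytes: b"GIF87a", mask: &[0xFF; 6], mime: "image/gif" },
    Pattern { bytes: b"GIF89a", mask: &[0xFF; 6], mime: "image/gif" },
    Pattern { bytes: b"\xFF\xD8\xFF", mask: &[0xFF; 3], mime: "image/jpeg" },
];

/// Returns true if the start of `data`, after masking, equals the pattern.
fn matches(data: &[u8], p: &Pattern) -> bool {
    data.len() >= p.bytes.len()
        && p.bytes
            .iter()
            .zip(p.mask)
            .zip(data)
            .all(|((b, m), d)| (d & m) == *b)
}

/// Returns the MIME type of the first matching pattern, if any.
fn sniff_image(data: &[u8]) -> Option<&'static str> {
    IMAGE_PATTERNS.iter().find(|p| matches(data, p)).map(|p| p.mime)
}
```

Adding another category then amounts to adding another table and reusing `matches`, which keeps the per-category code to data only.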
Step 5
Ensure that the resulting MIME type is present in the load that the consumer receives.
Step 6
Write tests that demonstrate working sniffing (for example, loading images with no extension or the wrong extension).
Appendix
Setting up Servo
There are two main steps to set up the environment for this project. A Linux environment is preferred because the setup is simple and easy.
References
<references/>