CSC/ECE 517 Fall 2014/oss M1456 kdv

From Expertiza_Wiki

Implement MIME sniffing - spec

This article is about the open source project M 1456: Implement MIME<ref>https://mimesniff.spec.whatwg.org/</ref> sniffing - spec <ref>https://github.com/servo/servo/issues/3144</ref>, which is part of the Servo Parallel Browser Project. Servo is a prototype web browser engine that is currently under development and is written in the Rust language. This article gives brief background information about MIME sniffing in Servo and about the Rust programming language, and describes the initial step and the first step implemented in the project.


Background Information

When an HTTP response declares the type of its content (using the Content-Type header), it is easy to decide how to present it: if it is an image, decode it; if it is an HTML page, parse and display it. When no such header is present (for example, when loading an image file or video file from disk, as in #3131), Servo currently falters. In this case, we want to implement the "sniffing" specification, which looks at the leading bytes of the resource and guesses what kind of content is being received. For example, if the content is an image it can be handed to image rendering, and if it is a video it can be handed to video rendering.
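The decision described above can be sketched roughly as follows. This is a simplified illustration and not Servo's code: the function names `sniff_leading_bytes` and `content_type` are hypothetical, only two well-known image signatures are checked, and the full specification covers many more cases than "header present or absent".

```rust
/// Fallback used only when no Content-Type was declared: check two
/// well-known image signatures at the start of the data.
/// (Hypothetical helper, for illustration only.)
fn sniff_leading_bytes(data: &[u8]) -> Option<&'static str> {
    if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if data.starts_with(b"GIF87a") || data.starts_with(b"GIF89a") {
        Some("image/gif")
    } else {
        None
    }
}

/// Prefer the declared type; look at the bytes only when it is missing.
fn content_type<'a>(declared: Option<&'a str>, data: &[u8]) -> Option<&'a str> {
    match declared {
        Some(declared_type) => Some(declared_type),
        None => sniff_leading_bytes(data),
    }
}

fn main() {
    // A response with a header keeps its declared type.
    println!("{:?}", content_type(Some("text/html"), b"<html>"));
    // A file loaded from disk has no header, so its bytes are inspected.
    println!("{:?}", content_type(None, b"GIF89a\x01\x00"));
}
```

Returning `Option` leaves the "could not guess" case explicit, so the caller decides what to do with unrecognised data.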

What is Servo?

Servo<ref>https://www.mozilla.org/en-US/research/projects/</ref> is an experimental browser layout engine supported by Mozilla Research. It is being actively developed in the Rust programming language with the main purpose being to rethink what the modern browser layout engine should do, how it performs, and how well it renders pages.

The use of the Rust programming language, discussed later, allows the engine to break the problem of page layout into small tasks that run in parallel, for speed and efficiency, and are isolated from each other. With security features of Rust such as memory safety, an important goal for the Servo project is to build on top of a solid, secure, and safe computing base.

The best resource for learning more about Servo and the Servo Project is the GitHub project page: Servo Project.

Rust Programming Language

Rust<ref>https://www.mozilla.org/en-US/research/projects/</ref> is a research project supported by Mozilla Research. It is a systems programming language designed to take full advantage of modern hardware. What makes Rust special is its emphasis on security and parallelism. The language has syntax similar to C and C++, as can be seen in the snippet below.

fn main() {
    println!("hello, world");
}

The Rust programming language supports object-oriented programming, and its design has been refined through the development of Servo, mentioned previously. As noted above, it is the security features that set Rust apart. First, Rust is designed to be memory-safe, meaning that it does not allow null or dangling pointers. Second, Rust has an ownership system that exists entirely at compile time, adding safety without inhibiting run-time efficiency. Third, Rust assumes that variables are immutable unless specifically stated otherwise. As an example, below we create a variable x, store the integer value 5 in it, and attempt to reassign it the integer value 10:

let x = 5i;
x = 10i;

This would result in an error at compile time because you are not allowed to reassign the values of immutable variables. The beauty of this is that variables are only mutable when the programmer specifically states that they should be.
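For comparison, here is a minimal sketch of the mutable case in current Rust syntax. The function name `reassigned` is illustrative only, and the literals drop the `i` suffix used in the snippet above, which later versions of Rust no longer accept.

```rust
/// Declares a mutable variable, reassigns it, and returns the new value.
fn reassigned() -> i32 {
    // `mut` opts in to mutability; without it the assignment below
    // would be rejected at compile time.
    let mut x = 5;
    println!("before: {}", x);
    x = 10;
    x
}

fn main() {
    println!("after: {}", reassigned());
}
```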

The best resource for more about the Rust programming language is the documentation: Rust Documentation<ref>http://doc.rust-lang.org/guide.html</ref> or the GitHub project page: Rust language.

Implement MIME sniffing

Initial Step

As part of the initial step, we first built Servo and learned how to create tasks in Rust. We then spawned a new sniffer task in the load method in resource_task.rs, creating a new sender and receiver pair as well. This task reads all the available data from the new receiver and immediately sends it on via the original sender, while the new sender is handed off to the factory function that is executed. All web pages continued to load unchanged after we rebuilt our code.

Classes

As part of the OSS Servo project, we worked on the following source files.

Modify

  • components/net/resource_task.rs

Create

  • components/net/sniffer_task.rs

Delete

  • none

Step 1

  • For every resource request (for example file, http, data, or any other kind of request), the load method interposes a sniffer task that sniffs all the data, parses the headers if required, and returns the data. Below is the code snippet from the modified resource_task.rs file.
    fn load(&self, load_data: LoadData, start_chan: Sender<LoadResponse>) {
        let mut load_data = load_data;
        load_data.headers.user_agent = self.user_agent.clone();

        // Create new communication channel, create new sniffer task,
        // send all the data to the new sniffer task with the send
        // end of the pipe, receive all the data.

        let sniffer_task = sniffer_task::new_sniffer_task(start_chan.clone());

        let loader = match load_data.url.scheme.as_slice() {
            "file" => file_loader::factory,
            "http" | "https" => http_loader::factory,
            "data" => data_loader::factory,
            "about" => about_loader::factory,
            _ => {
                debug!("resource_task: no loader for scheme {:s}", load_data.url.scheme);
                start_sending(start_chan, Metadata::default(load_data.url))
                    .send(Done(Err("no loader for scheme".to_string())));
                return
            }
        };
        debug!("resource_task: loading url: {:s}", load_data.url.serialize());

        loader(load_data, sniffer_task);
    }
  • The function called in the load method is new_sniffer_task. It creates a channel and then constructs a SnifferManager struct that holds the channel's receiver. SnifferManager waits for input on the new receiver and passes it on to the old sender, while the new sender is handed off to the factory function. Below is the code snippet from the new sniffer_task.rs file.
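// Forwards every LoadResponse it receives to the next stage unchanged.
struct SnifferManager {
  // Receiving end of the channel that the loader writes to.
  data_receiver: Receiver<LoadResponse>,
}

impl SnifferManager {
  fn new(data_receiver: Receiver<LoadResponse>) -> SnifferManager {
    SnifferManager {
      data_receiver: data_receiver,
    }
  }
}

impl SnifferManager {
  // Runs the forwarding loop; next_rx is the sender for the next stage.
  fn start(&self, next_rx: Sender<LoadResponse>) {
    loop {
      // Block until the loader sends the next response, then pass it on.
      self.load(next_rx.clone(), self.data_receiver.recv());
    }
  }

  // Placeholder for sniffing: the data is currently sent on untouched.
  fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
    next_rx.send(snif_data);
  }
}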
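The snippets above use the Rust and Servo APIs of the time. As a rough, self-contained sketch of the same pass-through idea in current Rust, the following uses `std::sync::mpsc` and `std::thread`. The `LoadResponse` alias is a hypothetical stand-in for Servo's type, and the body of `new_sniffer_task` is an assumption about how such a task could look, not the code that was submitted.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for Servo's LoadResponse type.
type LoadResponse = Vec<u8>;

/// Spawn a pass-through sniffer and return the sender that a loader
/// should write to. Everything received is forwarded to `consumer`.
fn new_sniffer_task(consumer: Sender<LoadResponse>) -> Sender<LoadResponse> {
    let (sniffer_tx, sniffer_rx) = channel::<LoadResponse>();
    thread::spawn(move || {
        // The iterator ends once every clone of `sniffer_tx` is dropped,
        // so the thread exits instead of looping forever.
        for response in sniffer_rx {
            // A real sniffer would inspect `response` here.
            if consumer.send(response).is_err() {
                break; // the consumer went away
            }
        }
    });
    sniffer_tx
}

fn main() {
    let (consumer_tx, consumer_rx) = channel();
    let loader_tx = new_sniffer_task(consumer_tx);
    loader_tx.send(b"<html>".to_vec()).unwrap();
    drop(loader_tx); // signal end of data
    for chunk in consumer_rx {
        println!("received {} bytes", chunk.len());
    }
}
```

Iterating over the receiver, rather than calling `recv` in an unconditional loop, lets the sniffer thread shut down cleanly when the loader finishes.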

Design Patterns

We used the command design pattern in our project. As Wikipedia states, the command pattern is a design pattern in which an "object is used to represent and encapsulate all the information needed to call a method at a later time".<ref name="commandDesignPatternWiki">http://en.wikipedia.org/wiki/Command_pattern</ref>

In our specific case, the object is the Sniffer Task. A new Sniffer Task is created via the new_sniffer_task function. When the sniffer task is created, it is isolated and it encapsulates all the information needed to sniff the MIME type of the URI/file. Since it is isolated, the only way to communicate between the Sniffer Task and the main Servo process is through channels, using send and receive functions to pass data around. For example, in the load function of SnifferManager in sniffer_task.rs, we use send to pass the data on to the next stage of the processing pipeline.

fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
  next_rx.send(snif_data);
}

The command design pattern provides a few benefits. First, since the code block is encapsulated, it is independent and can be modified or swapped out easily, which makes the code easier to maintain. Second, because the code block has all the resources it needs to perform its task (in our case, sniffing the MIME type), we can easily take it and run it separately, in parallel with the main Servo process.
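As a generic illustration of the pattern (not taken from the Servo code base), a command in current Rust can be modelled as a boxed closure that owns its inputs and is invoked later on another thread. The names `Command`, `make_count_command`, and `run_in_background` are hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A command: a boxed closure that owns everything it needs to run later.
type Command = Box<dyn FnOnce() -> usize + Send>;

/// Build a command that captures its input now and counts bytes when invoked.
fn make_count_command(data: Vec<u8>) -> Command {
    Box::new(move || data.len())
}

/// Run a command on a separate thread and report its result over a channel.
fn run_in_background(command: Command, result_tx: Sender<usize>) {
    thread::spawn(move || {
        let result = command();
        // Ignore the error if nobody is listening for the result.
        let _ = result_tx.send(result);
    });
}

fn main() {
    let (tx, rx) = channel();
    run_in_background(make_count_command(b"GIF89a".to_vec()), tx);
    println!("command returned {}", rx.recv().unwrap());
}
```

Because the closure captures its data by value, the thread that runs it needs no access to the state of the thread that created it.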


Sniffer Task Project Build

  • After the code changes, we built the Servo project by running the command below.
 ./mach build
  • We tested the code by loading an HTML page.
 ./mach run tests/html/about-mozilla.html

Conclusion

As part of the OSS Servo project, we created a new sniffer_task.rs file and modified the load method in resource_task.rs. We then opened a merge request, "M1456-Implement MIME sniffing initial Step", and addressed the feedback we received from Josh. A few open issues will be addressed as part of the final OSS project.

Future Development Scope

Step 2

Move from a 1:1 sniffer:request task model to a shared sniffer task: create the sniffing task in ResourceTask::start, store the created sender in a field, and hand off clones of this sender to the factory function. The sniffer task will now also have to be sent the original sender, since it is not available at task creation.
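One possible shape for this step, sketched in current Rust under stated assumptions: `LoadResponse` is a hypothetical stand-in for Servo's type, and `SnifferMessage` and `start_shared_sniffer` are illustrative names, not part of the project. Each message carries the sender it should be forwarded to, because the shared task cannot know it in advance.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for Servo's LoadResponse type.
type LoadResponse = Vec<u8>;

/// Each message names the consumer it should be forwarded to, because the
/// shared task cannot know that sender when it is created.
struct SnifferMessage {
    response: LoadResponse,
    consumer: Sender<LoadResponse>,
}

/// Start one long-lived sniffer; callers keep clones of the returned sender.
fn start_shared_sniffer() -> Sender<SnifferMessage> {
    let (tx, rx) = channel::<SnifferMessage>();
    thread::spawn(move || {
        for msg in rx {
            // Sniffing would happen here; ignore consumers that hung up.
            let _ = msg.consumer.send(msg.response);
        }
    });
    tx
}

fn main() {
    let sniffer = start_shared_sniffer();
    let (a_tx, a_rx) = channel();
    let (b_tx, b_rx) = channel();
    // Two independent requests share the same sniffer through clones.
    sniffer
        .clone()
        .send(SnifferMessage { response: b"first".to_vec(), consumer: a_tx })
        .unwrap();
    sniffer
        .clone()
        .send(SnifferMessage { response: b"second".to_vec(), consumer: b_tx })
        .unwrap();
    println!("{:?} {:?}", a_rx.recv().unwrap(), b_rx.recv().unwrap());
}
```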

Step 3

When the headers are received, implement the sniffing heuristic algorithm.

Step 4

Implement the MIME type matching algorithm, one category at a time. Look for a good way to store the patterns and resulting types to minimize code duplication.
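One way to keep the patterns and resulting types together is a table of pattern, mask, and type, checked by a single generic matcher. The sketch below is an assumption about how this could be organised, not the project's implementation: `Signature`, `IMAGE_SIGNATURES`, `matches`, and `sniff_image` are hypothetical names, only a handful of image formats are listed, and details of the specification such as ignored leading bytes are left out.

```rust
/// One row of the table: a byte pattern, a mask of equal length, and the
/// MIME type reported when the masked input equals the pattern.
struct Signature {
    pattern: &'static [u8],
    mask: &'static [u8],
    mime: &'static str,
}

// Illustrative subset of image signatures; not a complete table.
const IMAGE_SIGNATURES: &[Signature] = &[
    Signature { pattern: b"GIF87a", mask: b"\xff\xff\xff\xff\xff\xff", mime: "image/gif" },
    Signature { pattern: b"GIF89a", mask: b"\xff\xff\xff\xff\xff\xff", mime: "image/gif" },
    Signature {
        pattern: b"\x89PNG\r\n\x1a\n",
        mask: b"\xff\xff\xff\xff\xff\xff\xff\xff",
        mime: "image/png",
    },
    Signature { pattern: b"\xff\xd8\xff", mask: b"\xff\xff\xff", mime: "image/jpeg" },
    Signature {
        // The four size bytes after "RIFF" are masked out.
        pattern: b"RIFF\x00\x00\x00\x00WEBPVP",
        mask: b"\xff\xff\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff",
        mime: "image/webp",
    },
];

/// True when the input is long enough and every masked byte equals the pattern.
fn matches(input: &[u8], sig: &Signature) -> bool {
    input.len() >= sig.pattern.len()
        && sig
            .pattern
            .iter()
            .zip(sig.mask)
            .zip(input)
            .all(|((p, m), b)| b & m == *p)
}

/// Return the MIME type of the first signature that matches, if any.
fn sniff_image(input: &[u8]) -> Option<&'static str> {
    IMAGE_SIGNATURES
        .iter()
        .find(|sig| matches(input, sig))
        .map(|sig| sig.mime)
}

fn main() {
    // Any value in the masked size field still matches the WebP row.
    let webp = b"RIFF\x24\x00\x00\x00WEBPVP8 ";
    println!("{:?}", sniff_image(webp));
    println!("{:?}", sniff_image(b"not an image"));
}
```

Adding a new category then means adding another table and reusing `matches`, rather than writing a new comparison for each type.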

Step 5

Ensure that the resulting MIME type is present in the load that the consumer receives.

Step 6

Write tests that demonstrate working sniffing (for example, loading images with no extension or the wrong extension).

Appendix

Setting up Servo

There are two main steps to set up the environment for this project. A Linux environment is preferred because setup there is simple and easy.

References

<references/>