CSC/ECE 517 Fall 2014/oss M1456 kdv: Difference between revisions

From Expertiza_Wiki


Latest revision as of 17:24, 29 October 2014

Implement MIME sniffing - spec

This article is about the open source project M 1456: Implement MIME<ref>https://mimesniff.spec.whatwg.org/</ref> sniffing - spec<ref>https://github.com/servo/servo/issues/3144</ref>, which is part of the Servo Parallel Browser Project. Servo is a prototype web browser engine that is currently under development and is written in the Rust language. This page gives brief background information on MIME sniffing in Servo and on the Rust programming language, and describes the initial step and the first step implemented in the project.


Background Information

When an HTTP response declares the type of its content (using the Content-Type header), it is easy to decide how to present it: if it is an image, decode it; if it is an HTML page, parse and display it. When no such header is present, Servo currently falters (for example, when loading an image or video file from disk, as in #3131, https://github.com/servo/servo/pull/3131). For this case we want to implement the "sniffing" specification, which looks at the leading bytes of the resource and guesses what kind of content is being received. For example, if the bytes look like an image, the image decoder can be used, and if they look like a video, the video renderer can be used.
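As a rough illustration of the idea (not Servo's implementation), the sketch below is written in present-day Rust. It trusts a declared Content-Type when there is one and otherwise guesses from the leading bytes. The function name `guess_kind`, the three signatures chosen, and the `application/octet-stream` fallback are assumptions made for this example; the authoritative rules are in the MIME Sniffing specification.

```rust
/// Hypothetical helper: choose a content type for a resource.
/// `content_type` is the declared header value, if any; `body` holds
/// the first bytes of the resource.
fn guess_kind<'a>(content_type: Option<&'a str>, body: &[u8]) -> &'a str {
    // In this sketch a declared type is trusted as-is.
    if let Some(declared) = content_type {
        return declared;
    }
    // No header: fall back to a few well-known file signatures.
    if body.starts_with(b"\x89PNG\r\n\x1a\n") {
        "image/png"
    } else if body.starts_with(b"GIF87a") || body.starts_with(b"GIF89a") {
        "image/gif"
    } else if body.starts_with(b"\xFF\xD8\xFF") {
        "image/jpeg"
    } else {
        // Assumed fallback for unrecognized binary data.
        "application/octet-stream"
    }
}
```

Calling `guess_kind(None, bytes)` on the start of a PNG file read from disk would yield `"image/png"`, which is the kind of answer the file loader currently has no way to produce.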

What is Servo?

Servo<ref>https://www.mozilla.org/en-US/research/projects/</ref> is an experimental browser layout engine supported by Mozilla Research. It is being actively developed in the Rust programming language with the main purpose being to rethink what the modern browser layout engine should do, how it performs, and how well it renders pages.

The use of the Rust programming language, discussed later, allows the engine to break the problem of page layout into small, isolated tasks that can run in parallel for speed and efficiency. An important goal of the Servo project is to use Rust's safety features, such as memory safety, to build on a solid, secure, and safe computing base.

The best resource for learning more about Servo and the Servo Project is the GitHub project page: https://github.com/servo/servo.

Rust Programming Language

Rust<ref>https://www.mozilla.org/en-US/research/projects/</ref> is a research project supported by Mozilla Research. It is a systems programming language designed to take full advantage of modern hardware. What makes Rust special is its emphasis on security and parallelism. The language has syntax similar to C and C++, as can be seen in the snippet below.

fn main() {
    println!("hello, world");
}

The Rust programming language supports object-oriented programming, and its design has been refined through the development of Servo. As noted above, it is the safety features that set Rust apart. First, Rust is designed to be memory-safe, meaning that it does not allow null or dangling pointers. Second, Rust has an ownership system that is checked entirely at compile time, adding safety without hurting run-time efficiency. Third, Rust treats variables as immutable unless stated otherwise. As an example, below we create a variable x holding the integer value 5 and attempt to reassign it the integer value 10:

let x = 5i;
x = 10i;

This results in an error at compile time, because immutable variables cannot be reassigned. The beauty of this is that variables are mutable only when the programmer explicitly declares them with the mut keyword.
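For contrast, here is a minimal sketch of the version that does compile, with the variable declared `mut`. It uses present-day Rust syntax, which no longer accepts the `5i` literal suffix shown above; the function name `reassign` is only for illustration.

```rust
/// Returns the value of `x` before and after reassignment.
fn reassign() -> (i32, i32) {
    let mut x = 5; // `mut` opts in to mutability
    let before = x;
    x = 10; // allowed because `x` was declared `mut`
    (before, x)
}
```

Removing the `mut` keyword from the first line brings back the compile-time error described above.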

The best resource for learning more about the Rust programming language is the documentation: Rust Documentation<ref>http://doc.rust-lang.org/guide.html</ref> or the GitHub project page: https://github.com/rust-lang/rust.

Implement MIME sniffing

Initial Step

As the initial step, we first built Servo and learned how to create tasks in Rust (http://doc.rust-lang.org/guide.html#tasks). We then spawned a new sniffer task in the load method in resource_task.rs, creating a new sender and receiver pair along with it. The task reads all available data from the new receiver and immediately forwards it via the original sender, while the new sender is handed off to the factory function that is executed. All web pages continued to load unchanged after rebuilding with our code.
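The pass-through arrangement described above can be sketched in present-day Rust, where the tasks of 2014-era Rust correspond roughly to threads communicating over `std::sync::mpsc` channels. This is a minimal sketch under that assumption, not the code from the pull request; the function name `new_passthrough` is made up for illustration.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Spawns a pass-through thread and returns the new sender.
/// Everything sent on the returned sender is forwarded, unchanged,
/// to `original`.
fn new_passthrough<T: Send + 'static>(original: Sender<T>) -> Sender<T> {
    let (new_tx, new_rx) = channel::<T>();
    thread::spawn(move || {
        // Iterating a Receiver ends once every matching Sender is dropped,
        // so the thread exits when the loader is done.
        for msg in new_rx {
            if original.send(msg).is_err() {
                break; // the consumer went away
            }
        }
    });
    new_tx
}
```

A loader would be handed the returned sender in place of the original one; because nothing is modified in transit, the consumer sees exactly what it saw before, which is why pages load unchanged.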

Classes

As part of the OSS Servo project, we worked on the following source files.

Modify

  • components/net/resource_task.rs

Create

  • components/net/sniffer_task.rs

Delete

  • none

Step 1

  • For every resource request (for example file, http, data, or any other kind of request), the load method interposes a sniffer task that sniffs all the data, parses the headers if required, and passes the data on. Below is the modified load method from resource_task.rs.
    fn load(&self, load_data: LoadData, start_chan: Sender<LoadResponse>) {
        let mut load_data = load_data;
        load_data.headers.user_agent = self.user_agent.clone();

        // Create new communication channel, create new sniffer task,
        // send all the data to the new sniffer task with the send
        // end of the pipe, receive all the data.

        let sniffer_task = sniffer_task::new_sniffer_task(start_chan.clone());

        let loader = match load_data.url.scheme.as_slice() {
            "file" => file_loader::factory,
            "http" | "https" => http_loader::factory,
            "data" => data_loader::factory,
            "about" => about_loader::factory,
            _ => {
                debug!("resource_task: no loader for scheme {:s}", load_data.url.scheme);
                start_sending(start_chan, Metadata::default(load_data.url))
                    .send(Done(Err("no loader for scheme".to_string())));
                return
            }
        };
        debug!("resource_task: loading url: {:s}", load_data.url.serialize());

        loader(load_data, sniffer_task);
    }
  • The function called in the load method is new_sniffer_task. It creates a channel and then constructs a SnifferManager struct that holds the receiving end of that channel. SnifferManager waits for input on the new receiver and passes it on to the old sender, while the new sender is handed off to the factory function. Below is a code snippet from the new sniffer_task.rs file.
struct SnifferManager {
  data_receiver: Receiver<LoadResponse>,
}

impl SnifferManager {
  fn new(data_receiver: Receiver<LoadResponse>) -> SnifferManager {
    SnifferManager {
      data_receiver: data_receiver,
    }
  }
}

impl SnifferManager {
  fn start(&self, next_rx: Sender<LoadResponse>) {
    loop {
      self.load(next_rx.clone(), self.data_receiver.recv());
    }
  }

  fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
    next_rx.send(snif_data);
  }
}

Design Patterns

We used the command design pattern in our project. As Wikipedia states, the command pattern is a design pattern in which an "object is used to represent and encapsulate all the information needed to call a method at a later time".<ref name="commandDesignPatternWiki">http://en.wikipedia.org/wiki/Command_pattern</ref>

In our specific case, the object is the sniffer task. A new sniffer task is created via the new_sniffer_task function. Once created, the task is isolated and encapsulates all the information needed to sniff the MIME type of the URI or file. Because it is isolated, the only way for the sniffer task and the rest of Servo to communicate is through channels, using the send and recv functions to pass data around. For example, the load function of SnifferManager in sniffer_task.rs uses send to pass the data on to the next stage of the processing pipeline.

fn load(&self, next_rx: Sender<LoadResponse>, snif_data: LoadResponse) {
  next_rx.send(snif_data);
}

The command design pattern provides a few benefits. First, since the code is encapsulated, it is independent and can be modified or swapped out easily, which makes the code easier to maintain. Second, because the task has all the resources it needs to perform its job (in our case, sniffing the MIME type), we can easily run it as a separate task in parallel with the rest of Servo.
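To make the pattern itself concrete, here is a minimal sketch in present-day Rust that is independent of Servo. The `Command` trait, the `CountBytes` command, and `run_commands` are hypothetical names invented for this example. Each command object carries its own data, so it can be handed to another thread and executed there later.

```rust
use std::sync::mpsc::Sender;
use std::thread;

/// A command bundles everything needed to run later, possibly elsewhere.
trait Command: Send {
    fn execute(&self) -> String;
}

/// Hypothetical command: report how many bytes a resource has.
struct CountBytes {
    data: Vec<u8>,
}

impl Command for CountBytes {
    fn execute(&self) -> String {
        format!("{} bytes", self.data.len())
    }
}

/// Runs each command on a worker thread and sends the results back in order.
fn run_commands(commands: Vec<Box<dyn Command>>, results: Sender<String>) {
    thread::spawn(move || {
        for command in commands {
            // Stop early if nobody is listening for results any more.
            if results.send(command.execute()).is_err() {
                break;
            }
        }
    });
}
```

The caller only needs the receiving end of the `results` channel; swapping `CountBytes` for a different command type requires no change to `run_commands`, which is the maintainability benefit mentioned above.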


Sniffer Task Project Build

  • After the code changes, we built the Servo project by running the command below.
./mach build
  • We then tested the code by loading an HTML page.
./mach run tests/html/about-mozilla.html

Conclusion

As part of the OSS Servo project, we created a new sniffer_task.rs and modified the load method in resource_task.rs. We then opened a pull request, M1456-Implement MIME sniffing initial Step (https://github.com/servo/servo/pull/3766), and received feedback from Josh, which we addressed. A few open issues will be addressed as part of the final OSS project.

Future Development Scope

Step 2

Move from a 1:1 sniffer:request task model to a shared sniffer task: create the sniffing task in ResourceTask::start, store the created sender in a field, and hand off clones of this sender to the factory function. The sniffer task will now also have to be sent the original sender, since it is not available at task creation.
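One possible shape for this shared model, sketched in present-day Rust with threads standing in for tasks: each message carries the original sender along with the payload, so a single long-lived sniffer can serve every request. The names `SniffRequest` and `start_shared_sniffer` are assumptions for illustration and do not come from the Servo code base.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// One message for the shared sniffer: the payload plus where to forward it.
struct SniffRequest<T> {
    reply_to: Sender<T>,
    payload: T,
}

/// Starts a single long-lived sniffer thread and returns its sender.
/// Clones of the returned sender can be handed to each loader.
fn start_shared_sniffer<T: Send + 'static>() -> Sender<SniffRequest<T>> {
    let (tx, rx) = channel::<SniffRequest<T>>();
    thread::spawn(move || {
        for request in rx {
            // A closed consumer is not fatal: other requests are unaffected.
            let _ = request.reply_to.send(request.payload);
        }
    });
    tx
}
```

Compared with the 1:1 model, only one thread is ever spawned, at the cost of a slightly larger message that includes the destination.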

Step 3

When the headers are received, implement the sniffing heuristic algorithm.

Step 4

Implement the MIME type matching algorithm, one category at a time. Look for a good way to store the patterns and resulting types to minimize code duplication.
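One way to keep the patterns and resulting types together is a single table that a generic matcher walks, sketched below in present-day Rust. The byte patterns and masks follow the image signatures listed in the MIME Sniffing specification, but the `Signature` struct, the table name, and the function names are assumptions for this example, and only a handful of rows are shown.

```rust
/// One row of the pattern table: bytes to match, a mask, and the resulting type.
struct Signature {
    pattern: &'static [u8],
    mask: &'static [u8],
    mime: &'static str,
}

/// A few image signatures. The mask lets a row ignore bytes that vary,
/// such as the four length bytes inside a RIFF/WebP header.
const IMAGE_SIGNATURES: &[Signature] = &[
    Signature { pattern: b"GIF87a", mask: b"\xFF\xFF\xFF\xFF\xFF\xFF", mime: "image/gif" },
    Signature { pattern: b"GIF89a", mask: b"\xFF\xFF\xFF\xFF\xFF\xFF", mime: "image/gif" },
    Signature {
        pattern: b"\x89PNG\r\n\x1a\n",
        mask: b"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
        mime: "image/png",
    },
    Signature { pattern: b"\xFF\xD8\xFF", mask: b"\xFF\xFF\xFF", mime: "image/jpeg" },
    Signature {
        pattern: b"RIFF\x00\x00\x00\x00WEBPVP",
        mask: b"\xFF\xFF\xFF\xFF\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF",
        mime: "image/webp",
    },
];

/// True when every input byte, after masking, equals the pattern byte.
fn matches(input: &[u8], sig: &Signature) -> bool {
    input.len() >= sig.pattern.len()
        && sig
            .pattern
            .iter()
            .zip(sig.mask)
            .zip(input)
            .all(|((p, m), byte)| (byte & m) == *p)
}

/// Returns the type of the first matching row, if any.
fn sniff_image(input: &[u8]) -> Option<&'static str> {
    IMAGE_SIGNATURES
        .iter()
        .find(|sig| matches(input, sig))
        .map(|sig| sig.mime)
}
```

Adding another category (audio, archives, and so on) then means adding rows or another table rather than writing new matching code.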

Step 5

Ensure that the resulting MIME type is present in the load that the consumer receives.

Step 6

Write tests that demonstrate working sniffing (for example, loading images with no extension or the wrong extension).

Appendix

Setting up Servo

There are two main steps to set up the environment for this project. A Linux environment is preferred because setup there is simple and easy.

References

<references/>