CSC/ECE 517 Fall 2019 - M1952. Missing DOM features project


Servo is a modern, high-performance browser engine designed for both application and embedded use. The current version of Servo lacks two DOM features. First, it cannot parse the srcdoc attribute of an iframe tag in HTML. Second, HTMLFormElement has no named getter, so form elements cannot be referenced by their id or name. The goal of this project is to implement both features in the current version of Servo.

Introduction

Servo

Servo is an experimental browser engine that seeks to create a highly parallel environment, in which components such as rendering, layout, HTML parsing, image decoding, etc. are handled by fine-grained, isolated tasks. It leverages the memory safety properties and concurrency features of the Rust programming language.

Rust

Rust is a multi-paradigm systems programming language developed primarily with the goal of making the browser safe and concurrent. Rust has been the "most loved programming language" in the Stack Overflow Developer Survey every year since 2016.

DOM

DOM, short for Document Object Model, is an interface that defines how programs treat web pages. It represents a web page as a structured tree so that programs can read and manipulate the page's content, structure, and style. When an HTML page is parsed, the browser builds what is called a DOM tree, which lists all the HTML tags as nodes while preserving the nesting under which these tags are defined.

bitsofcode has an excellent read on the basics of the DOM; here is a quick snapshot from it: [Left - HTML page content; Right - DOM tree]

Note to Reviewers

You will find that our code doesn't contain many comments. The current maintainer of the project advised us to remove comments that merely restate the code, so we removed them to follow Servo's formatting and style guidelines.

Setup

We need to compile and build Servo on our local machines to work on the code and check whether the tests pass. Servo's GitHub page has an excellent starting guide to set up the environment for Servo here. It also mentions the other dependencies that need to be installed specific to an operating system.

Final Project

Problem Statement

We are working on the subsequent steps listed on the project page, which cover the named getter issue. Currently, Servo is unable to submit forms on web pages since it is not able to fetch the form elements by their ID. In terms of DOM and HTML, HTMLFormElement is the interface to the <form> tag. Hence, we need to implement the named element getter function in the HTMLFormElement files.

Scope

The named getter issue makes up the subsequent steps of the project and involves the following work:

  • Uncomment the named getter from HTMLFormElement.webidl file.
  • The previous step yields two new methods that need to be implemented in htmlformelement.rs - SupportedPropertyNames() and NamedGetter().
  • SupportedPropertyNames() is used to get the list of all meaningful property names for an HTMLFormElement object.
  • NamedGetter() gets the value of a specific property name.
  • Both NamedGetter() and SupportedPropertyNames() are expected to read from the past names map, but only NamedGetter() is expected to modify it.
  • Implement a HashMap from a DOMString (which holds an element's id/name value) to Dom<Element>. This HashMap is the past names map. NamedGetter() uses it to fetch the form node that was previously found under the given name, and updates it whenever it finds a new node with this name. SupportedPropertyNames() also reads it to extract the names under which form elements were previously found.

Design Patterns

Design patterns are not applicable as our task involves the implementation of methods and modifying various files. However, the Implementation section below provides details of why certain steps were implemented the way they were.

Flowchart

Given below is a high-level flowchart for the proposed solution:

Implementation

We have worked on the subsequent steps mentioned on the project page here.

Step 1: Uncomment the named getter from HTMLFormElement.webidl

The NamedGetter method was already declared. We uncommented those lines in the file HTMLFormElement.webidl. The function takes the element's name as a parameter of type DOMString, which wraps a Rust String. It returns a RadioNodeList or an Element depending on the node(s) found.


Step 2: Add the missing NamedGetter and SupportedPropertyNames methods to htmlformelement.rs

We added the method definition for NamedGetter() based on the line uncommented in the previous step. Option<RadioNodeListOrElement> is how Servo expresses a function that returns either a RadioNodeList or an Element, or nothing at all.

We added the SupportedPropertyNames() method definition. It returns a vector of element names, each of type DOMString.

Step 3: Implement SupportedPropertyNames() per the specification

(1) We define an enum SourcedNameSource listing the sources from which entries are inserted into the vector. If the entry is made because we found the element's id attribute, the source is Id; if it is made because of the name attribute, the source is Name. The Past variant is used when the entry is fetched from the past names map.

Per the specification, we need to maintain an ordered list of tuples called sourced names, each of the form (string, element, source), where a past source also carries the age (duration) of the entry. We implement this as a vector of a struct called SourcedName, which stores the name, the element and its source.
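The data structures described in (1) might look roughly like the sketch below. This is a simplified, stand-alone illustration rather than the code from the pull request: the element field is a plain index standing in for Servo's DOM element handle, storing the age as a std::time::Duration is an assumption, and is_past() is a helper invented here.

```rust
use std::time::Duration;

/// Where a sourced name came from. `Past` carries the age of the
/// past-names-map entry (representing it as a Duration is an assumption).
#[derive(Clone, Debug, PartialEq, Eq)]
enum SourcedNameSource {
    Id,
    Name,
    Past(Duration),
}

impl SourcedNameSource {
    /// Hypothetical helper: true only for entries taken from the past names map.
    fn is_past(&self) -> bool {
        matches!(self, SourcedNameSource::Past(_))
    }
}

/// One (string, element, source) tuple from the specification.
/// `element` is a plain index standing in for a real DOM element handle.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SourcedName {
    name: String,
    element: usize,
    source: SourcedNameSource,
}
```

Keeping the age inside the Past variant means Id and Name entries cannot accidentally carry a duration.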


(2) We loop over the child elements of the form by calling controls.iter(). We first insert entries into the vector for children that are listed elements (non-image elements). For each child, we check whether it has an id attribute and whether it has a name attribute, and for each attribute present we create a new SourcedName object and push it to the vector. The values of id and name are fetched using the get_string_attribute() function.


(3) We repeat the same process as mentioned in (2) but now for the image elements in the form.


(4) We borrow a reference to the past names map and iterate over the HashMap. For each entry, we push a new SourcedName into sourcedNamesVec, with the key as the string, the stored HTML element as the element, and Past as the source together with the calculated duration.


(5) We sort the sourced names vector with the vector's sort_by() function, comparing elements in tree order and using the cmp() method that Rust's ordering traits provide. Entries for the same element are ordered by source: id first, then name, then Past, with older Past entries placed before newer ones.
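The ordering in (5) can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the code from the pull request: tree_index is a hypothetical stand-in for an element's position in tree order, and Source, rank() and compare_sourced_names() are names invented for this example.

```rust
use std::cmp::Ordering;
use std::time::Duration;

#[derive(Clone, Debug, PartialEq, Eq)]
enum Source {
    Id,
    Name,
    Past(Duration), // age of the past-names-map entry (assumed representation)
}

impl Source {
    /// Id sorts before Name, which sorts before Past.
    fn rank(&self) -> u8 {
        match self {
            Source::Id => 0,
            Source::Name => 1,
            Source::Past(_) => 2,
        }
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct SourcedName {
    name: String,
    tree_index: usize, // hypothetical stand-in for tree order
    source: Source,
}

fn compare_sourced_names(a: &SourcedName, b: &SourcedName) -> Ordering {
    a.tree_index
        .cmp(&b.tree_index)
        .then_with(|| a.source.rank().cmp(&b.source.rank()))
        .then_with(|| match (&a.source, &b.source) {
            // A larger age means an older entry, and older entries go first.
            (Source::Past(age_a), Source::Past(age_b)) => age_b.cmp(age_a),
            _ => Ordering::Equal,
        })
}

fn sort_sourced_names(names: &mut [SourcedName]) {
    names.sort_by(compare_sourced_names);
}
```

Chaining the comparisons with then_with() keeps each tie-breaking rule on its own line, in the same order as the rules are stated in the text.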


(6) As per the spec, we remove the elements which have the empty string as their name from the sourcedNamesVec. This is implemented by doing the inverse operation: retain only those elements which don't have empty string as their name.


(7-8) We remove the entries in sourcedNamesVec that have the same name as an earlier entry in the map. We return just the vector of element names. Since our sourcedNamesVec consists of the structure, we just extract the element names from the structure vector and push it to a new vector which stores the DOMStrings.
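Steps (6) to (8) amount to filtering and de-duplicating an already sorted list. A minimal sketch, assuming the names have already been extracted into plain Strings in sorted order (the function name supported_names is hypothetical and the real code works on DOMString values):

```rust
use std::collections::HashSet;

/// Drops empty names, then drops any name that already appeared earlier,
/// preserving the order of first occurrence.
fn supported_names(mut sorted_names: Vec<String>) -> Vec<String> {
    // Step (6): "remove empty names" expressed as "retain non-empty names".
    sorted_names.retain(|name| !name.is_empty());

    // Step (7): keep only the first entry for each name.
    // HashSet::insert returns false when the value was already present.
    let mut seen = HashSet::new();
    sorted_names.retain(|name| seen.insert(name.clone()));

    // Step (8): what remains is the list of names to return.
    sorted_names
}
```

Vec::retain visits elements in their original order, so the first occurrence of each name is the one that survives.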

Step 4: Implement NamedGetter() function per the specification

(1) We need candidates to be a live RadioNodeList. However, operations are better defined for a vector and hence we define a vector that stores the Node itself and will convert it to a RadioNodeList when we return from the function.


We iterate over the form children by borrowing the controls member and check if the child is a listed element (non-image element). If yes, we check whether the child has an id attribute or a name attribute equal to name passed as parameter. If yes, we push this child to the candidates vector.


(2) If the candidates vector is empty, we repeat what we did in (1), but now for image elements.


(3) If the candidates vector is still empty, we infer that the element we are seeking is not in the current form DOM tree and hence we return the element associated with name in the past names map, wrapped as an Element.


(4) If candidates vector contains more than 1 node, we return the candidates itself by formatting it as a RadioNodeList.


(5) At this point, candidates has exactly one node. We insert the (name, element) pair into past names map and update the entry with the same name if it exists.


(6) We return the single node in candidates, wrapped as an Element.
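The six steps above can be sketched end to end with plain data types. This is an illustration under stated assumptions rather than the merged Servo code: nodes are represented by usize indices, Control, Form, NamedItem and matching() are names invented here, and NamedItem stands in for Servo's RadioNodeListOrElement.

```rust
use std::collections::HashMap;

/// A form control, reduced to the fields the algorithm looks at.
#[derive(Clone, Debug, PartialEq)]
struct Control {
    node: usize, // stand-in for a DOM node handle
    id: Option<String>,
    name: Option<String>,
    is_image: bool,
}

/// Stand-in for the RadioNodeList-or-Element return value.
#[derive(Debug, PartialEq)]
enum NamedItem {
    Element(usize),
    RadioNodeList(Vec<usize>),
}

struct Form {
    controls: Vec<Control>,
    past_names: HashMap<String, usize>,
}

impl Form {
    /// Nodes whose id or name equals `name`, restricted to image or non-image controls.
    fn matching(&self, name: &str, images: bool) -> Vec<usize> {
        self.controls
            .iter()
            .filter(|c| c.is_image == images)
            .filter(|c| c.id.as_deref() == Some(name) || c.name.as_deref() == Some(name))
            .map(|c| c.node)
            .collect()
    }

    fn named_getter(&mut self, name: &str) -> Option<NamedItem> {
        // (1) Listed (non-image) elements first.
        let mut candidates = self.matching(name, false);
        // (2) Fall back to image elements.
        if candidates.is_empty() {
            candidates = self.matching(name, true);
        }
        // (3) Fall back to the past names map.
        if candidates.is_empty() {
            return self.past_names.get(name).copied().map(NamedItem::Element);
        }
        // (4) Several matches are returned as a list.
        if candidates.len() > 1 {
            return Some(NamedItem::RadioNodeList(candidates));
        }
        // (5) Exactly one match: remember it, replacing any older entry.
        let node = candidates[0];
        self.past_names.insert(name.to_string(), node);
        // (6) Return the single element.
        Some(NamedItem::Element(node))
    }
}
```

The past names map is what lets a lookup keep working after an element has been renamed: once a name has resolved to a single element, the form still returns that element for the old name.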

Test Plan

The tests for named getter issue have already been written. We need to check whether the modifications we make to the code can still pass these tests.

The tests will be run using the mach utility commands:

./mach test-wpt tests/wpt/web-platform-tests/html/semantics/forms/the-form-element/form-elements-nameditem-01.html
./mach test-wpt tests/wpt/web-platform-tests/html/semantics/forms/the-form-element/form-elements-nameditem-02.html
./mach test-wpt tests/wpt/web-platform-tests/html/semantics/forms/the-form-element/form-nameditem.html

Before our implementation, 6/17 tests were passing. After our implementation, 11/17 tests are passing:

Results - Before:


Results - After:

To test whether the code change works, follow the steps as outlined.

  1. Install the pre-requisites required for servo as mentioned here
  2. Clone our GitHub repo: git clone https://github.com/cagandhi/servo
  3. Navigate to servo's directory: cd servo
  4. Check out the git branch named-form-getter: git checkout named-form-getter
  5. Check if code follows style guidelines: ./mach test-tidy
  6. Check if code has no compilation errors: ./mach check
  7. Check if servo is built successfully: ./mach build --dev --verbose
  8. Check if tests pass, i.e. servo can process named form getter: ./mach test-wpt tests/wpt/web-platform-tests/html/semantics/forms/the-form-element/form-elements-nameditem-01.html

You will see that the Servo build is successful, but some of the tests may still fail; as noted above, 11 of the 17 tests currently pass.

Pull Request

Here is the link to our pull request. We have attached the code snippets for the changes made in files in the PR. The pull request has been merged into the master branch of Servo.

OSS Project

Problem Statement

We have worked on the initial steps of the project page, which cover the srcdoc iframe issue. In HTML, there is a tag called <iframe> which allows you to embed a web page into another web page. This tag has attributes such as src and srcdoc which can be used to embed web pages. However, the two attributes are used differently.

To embed a web page using src attribute, we need to provide a URL of the web page to be embedded. This works in Servo.

To embed a web page using srcdoc attribute, all we need to provide is just HTML content and it works even without adding <html> and <body> tags. This does not work in Servo. We have worked upon this issue for our OSS project.

Scope

The srcdoc iframe issue makes up the initial steps of the project and involves the following work:

  • Uncomment the srcdoc WebIDL attribute and implement the attribute getter.
  • Add a field to structure LoadData for storing the srcdoc contents when loading a srcdoc iframe.
  • Add a new method to script_thread.rs which loads the special about:srcdoc URL per the specification.
  • Call this new method from handle_new_layout when it's detected that a srcdoc iframe is being loaded.
  • In process_the_iframe_attributes, implement the srcdoc specification so that LoadData initiates a srcdoc load.
  • In attribute_mutated, ensure that changing the srcdoc attribute of an iframe element follows the specification.

Design Patterns

Design patterns are not applicable as our task involves the implementation of methods and modifying various files. However, the Implementation section below provides details of why certain steps were implemented the way they were.

Implementation

We have worked on the initial steps mentioned on the project page here.

Step 1: Uncomment srcdoc WebIDL attribute and implement the attribute getter

The srcdoc attribute was already declared. We simply uncommented those lines in the file HTMLIFrameElement.webidl.

We implemented the attribute getter in the file htmliframeelement.rs. It views the iframe as an Element, reads the srcdoc string stored in its attribute, and returns that value. In Rust, when the last line of a function is an expression without a trailing semicolon, the value of that expression is returned from the function.

Since this attribute getter is called at only one place in the entire codebase, in the process_the_iframe_attributes() function, it was suggested that we inline it, and we made that change on lines 245 and 246 in our latest commit.
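As a small illustration of the implicit-return rule mentioned above, consider the sketch below. The function srcdoc_value is hypothetical and is not the getter from the pull request; it only mimics the idea of reading an optional attribute value.

```rust
/// Returns the attribute's value, or an empty string when the attribute is absent.
fn srcdoc_value(attribute: Option<&str>) -> String {
    // No trailing semicolon: this expression is the function's return value.
    attribute.unwrap_or("").to_string()
}
```

Adding a semicolon after the last expression would turn it into a statement, and the function would then fail to compile because it no longer evaluates to a String.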

Step 2: Add a field to LoadData for storing the srcdoc contents when loading a srcdoc iframe

We added a public field srcdoc of type String on line 170 of the file lib.rs. We declared srcdoc with type DOMString in the WebIDL file and map it to this field in the Rust file. The DOMString data type is essentially a wrapper around a Rust String, as can be seen here.

Step 3: Add a new method to script_thread.rs which loads the special about:srcdoc URL per the specification

We defined a method page_load_about_srcdoc which is based on the method start_page_load_about_blank in the file script_thread.rs and handles the loading of iframe tag with srcdoc property.

Effectively, we parse the about:srcdoc URL and set it as the URL in the context of the response that we load. Response bodies are delivered in chunks, which is why we send the srcdoc content (an HTML string) as a chunk of the response.

Step 4: Call this new method from handle_new_layout when it's detected that a srcdoc iframe is being loaded

We already defined the method page_load_about_srcdoc in the above step. The function handle_new_layout is responsible for loading new data and routing the navigation to the relevant function based on the URL. If the LoadData structure has about:srcdoc as its url, we call page_load_about_srcdoc, passing in the new load and the srcdoc string stored in LoadData.
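The routing decision described above can be sketched as a match on the URL. This is a simplified illustration, not Servo's code: this LoadData has only two String fields, and LoadAction and choose_load_action are names invented for the example (the real handle_new_layout works with Servo's own URL and load types).

```rust
/// Reduced stand-in for Servo's LoadData: just the URL and the srcdoc contents.
#[derive(Clone, Debug, PartialEq)]
struct LoadData {
    url: String,
    srcdoc: String,
}

/// Hypothetical description of which loading path is taken.
#[derive(Debug, PartialEq)]
enum LoadAction {
    AboutBlank,
    AboutSrcdoc { body: String },
    Fetch { url: String },
}

fn choose_load_action(load_data: &LoadData) -> LoadAction {
    match load_data.url.as_str() {
        "about:blank" => LoadAction::AboutBlank,
        // The srcdoc contents travel with the load so they can become the response body.
        "about:srcdoc" => LoadAction::AboutSrcdoc {
            body: load_data.srcdoc.clone(),
        },
        other => LoadAction::Fetch {
            url: other.to_string(),
        },
    }
}
```

Carrying the srcdoc string inside LoadData is what makes this dispatch possible, since about:srcdoc itself says nothing about the content to display.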


Step 5: In process_the_iframe_attributes, implement the srcdoc specification so that LoadData initiates a srcdoc load

We added the processing of the srcdoc attribute to the process_the_iframe_attributes() function in the file htmliframeelement.rs by referring to the specification and with help from Josh.

We first check whether the HTML element has the srcdoc attribute. In our case, we are processing the iframe HTML element, so self.upcast::<Element>() gives us the iframe viewed as a generic Element, on which attributes can be queried. We fetch the document to be shown in the window and store the ID of the incomplete process which we are currently executing. This is required since the browser processes are highly parallel. Next, we define a new LoadData instance and set its srcdoc property to the value fetched by the attribute getter we implemented in Step 1. We then navigate the browsing context with the new attribute values.

Step 6: In attribute_mutated, ensure that changing the srcdoc attribute of an iframe element follows the specification

We added code in the file htmliframeelement.rs that calls the process_the_iframe_attributes method whenever the srcdoc attribute of an iframe element is changed.
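Steps 5 and 6 together can be sketched as below. This is a toy model under stated assumptions, not the code from the pull request: the Iframe struct and its navigations log are invented for illustration, attribute values are plain Strings, and the handling of src when srcdoc is present follows our reading of the HTML specification.

```rust
/// Toy iframe that records where it would navigate.
#[derive(Default)]
struct Iframe {
    src: Option<String>,
    srcdoc: Option<String>,
    navigations: Vec<String>, // hypothetical log of navigation targets
}

impl Iframe {
    /// srcdoc takes precedence over src; with neither, fall back to about:blank.
    fn process_the_iframe_attributes(&mut self) {
        let target = if self.srcdoc.is_some() {
            "about:srcdoc".to_string()
        } else {
            self.src.clone().unwrap_or_else(|| "about:blank".to_string())
        };
        self.navigations.push(target);
    }

    /// Called when an attribute is set, changed, or removed (value == None).
    fn attribute_mutated(&mut self, name: &str, value: Option<&str>) {
        match name {
            "srcdoc" => {
                self.srcdoc = value.map(str::to_string);
                self.process_the_iframe_attributes();
            }
            "src" => {
                self.src = value.map(str::to_string);
                // A src change only matters while no srcdoc attribute is present.
                if self.srcdoc.is_none() {
                    self.process_the_iframe_attributes();
                }
            }
            _ => {}
        }
    }
}
```

For example, removing srcdoc from an iframe that also has src re-runs the processing and falls back to the src URL.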


Test Plan

To test whether the engine is able to process an iframe tag with srcdoc, run: ./mach test-wpt tests/wpt/web-platform-tests/html/semantics/embedded-content/the-iframe-element/srcdoc_process_attributes.html

The result of the test is:

We have successfully completed all the initial steps and the tests pass. Our pull request has been merged into the Servo repo.

Pull Request

Here is the link to our pull request. We have attached the code snippets for the changes made in files in the PR. This issue is now solved and our code has been merged into the master branch of Servo.

References

[1] https://servo.org/
[2] https://bocoup.com/blog/third-party-javascript-development-future#iframe-srcdoc
[3] https://www.w3schools.com/tags/tag_iframe.asp
[4] https://html.spec.whatwg.org/multipage/forms.html#dom-form-nameditem
[5] https://en.wikipedia.org/wiki/Servo_(software)
[6] https://en.wikipedia.org/wiki/Rust_(programming_language)
[7] https://github.com/servo/servo/blob/master/README.md#setting-up-your-environment
[8] https://bitsofco.de/what-exactly-is-the-dom/
[9] https://developer.mozilla.org/en-US/docs/Web/API/HTMLFormElement

[10] https://github.com/servo/servo/wiki/Missing-DOM-features-project
[11] https://github.com/servo/servo/issues/16479
[12] https://github.com/servo/servo/issues/4767
[13] https://github.com/servo/servo/pull/24576
[14] https://github.com/servo/servo/pull/25070