CSC/ECE 517 Fall 2017/M1752 Implement the Microdata API: Difference between revisions

Latest revision as of 05:35, 7 November 2017

Introduction

HTML Specification

The WHATWG Microdata HTML specification lets authors annotate web content with machine-readable labels, so that machines can learn more about the data in a web page. A typical real-world use of Microdata is illustrated below.

Here is a simple HTML block that has some information about a student.

  My name is <span>Grad Student</span>, and I am a <span>student</span> at <span>NC State</span>
  I live in  <span><span>Raleigh</span>,<span>NC</span></span>

If a machine (a web parser, for example) were to read this block as it is, it could not directly tell which part of the sentence is a name or an address.

This is where Microdata shines. It attaches attributes to different parts of the HTML block. Below is the same information with Microdata:

<div itemscope itemtype="http://data-vocabulary.org/Person">
  My name is <span itemprop="name">Grad Student</span>, and I am a <span itemprop="title">student</span> at
  <a href="http://ncsu.edu" itemprop="affiliation">NC State</a>.
  I live in <span itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address"><span itemprop="locality">Raleigh</span>,<span itemprop="region">NC</span>
  </span>
</div>

As shown, the attributes itemprop and itemtype are used to enrich the data: the value title has been assigned to the word student, and the value locality has been assigned to the city, Raleigh. This way, any machine that accesses this HTML can understand the content better. More information about the Microdata specification is available at https://html.spec.whatwg.org/multipage/microdata.html. Some popular companies such as Google, Skype and Microsoft use the Microdata found on websites to provide additional insights. The number of websites that use Microdata is growing; currently about 13% of websites use Microdata (statistics courtesy w3techs.com). It should also be noted that the presence of Microdata does not change how the HTML block looks.
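To make "understanding the content" more concrete, here is a minimal Rust sketch of what a consumer might hold in memory after reading the student example. The types Value and Item and the functions student_example and text_property are hypothetical names invented for this illustration; they are not part of Servo or of the Microdata specification. The sketch covers only a subset of the properties and assumes the address span is treated as a nested item.

```rust
/// A property value is either plain text or a nested item.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Item(Item),
}

/// Hypothetical in-memory view of one microdata item.
#[derive(Debug, PartialEq)]
struct Item {
    item_type: String,
    properties: Vec<(String, Value)>,
}

/// Builds a subset of the item described by the annotated student example.
fn student_example() -> Item {
    let address = Item {
        item_type: "http://data-vocabulary.org/Address".to_string(),
        properties: vec![
            ("locality".to_string(), Value::Text("Raleigh".to_string())),
            ("region".to_string(), Value::Text("NC".to_string())),
        ],
    };
    Item {
        item_type: "http://data-vocabulary.org/Person".to_string(),
        properties: vec![
            ("name".to_string(), Value::Text("Grad Student".to_string())),
            ("title".to_string(), Value::Text("student".to_string())),
            ("address".to_string(), Value::Item(address)),
        ],
    }
}

/// Returns the first text value stored under `name`, if there is one.
/// Nested items are skipped because they are not plain text.
fn text_property<'a>(item: &'a Item, name: &str) -> Option<&'a str> {
    item.properties.iter().find_map(|(key, value)| match value {
        Value::Text(text) if key.as_str() == name => Some(text.as_str()),
        _ => None,
    })
}
```

With this structure, a program can ask for the "name" property directly instead of guessing which span holds the name.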

Servo

Servo is a modern, high-performance browser engine designed for both application and embedded use and written in the Rust programming language. It is currently developed on 64-bit OS X, 64-bit Linux, and Android.

Rust

Rust is a systems programming language that focuses on memory safety and concurrency. It is similar to C++ but guarantees memory safety while maintaining high performance.

More information about the Rust programming language is available at https://www.rust-lang.org

Scope

The scope of this project is to implement initial support for the Microdata API specification by allowing the Servo browser engine to read Microdata attributes from web pages and interpret them in the DOM. This should lay the groundwork for future improvements, such as features that create vCard and JSON data from Microdata on the ServoShell. Additional project information is available at https://github.com/servo/servo/wiki/Microdata-project

Design

In this initial stage of the project, the scope did not require any major changes to the engine design. We implemented a DOM method to handle the relevant attributes of the Microdata API. The diagram below (Design.png) gives an overview of the components involved in the design; the highlighted blocks are the ones that were modified.

Implementation

The implementation involved updates to the Web Interface Definition Language (WebIDL) files and their Rust implementation.

HTMLElement.WebIDL

/servo/components/script/dom/webidls/HTMLElement.webidl

Method Name | Return Type | Description
propertyNames() | String | Method definition only. The implementation is in htmlelement.rs.

htmlelement.rs

/servo/components/script/dom/htmlelement.rs

Method Name | Return Type | Description | Location
parse_plain_attributes() | AttrValue | Returns the value of an attribute associated with the HTML element. | traits Virtual_Methods
propertyNames() | Option<Vec<DOMString>> | Parses the space-separated values of the 'itemtype' attribute. | struct HTMLElement
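
As a rough illustration of the token handling that the propertyNames() row describes, the following Rust sketch splits an attribute value on whitespace and drops repeated tokens. It is not the code from htmlelement.rs: the function name property_names_sketch, the use of plain String in place of DOMString, and the choice to return None when the attribute is missing are all assumptions made for this example.

```rust
/// Hypothetical stand-in for the tokenising step; not Servo's actual code.
/// `attr` is the raw attribute value, or `None` when the attribute is absent.
fn property_names_sketch(attr: Option<&str>) -> Option<Vec<String>> {
    // Assumption: a missing attribute yields None rather than an empty list.
    let raw = attr?;
    let mut names: Vec<String> = Vec::new();
    // split_whitespace() skips leading, trailing and repeated whitespace.
    for token in raw.split_whitespace() {
        // Keep only the first occurrence of each token, preserving order.
        if !names.iter().any(|existing| existing == token) {
            names.push(token.to_string());
        }
    }
    Some(names)
}
```

Under these assumptions, an attribute value such as "name  title name" would produce the two tokens name and title, in that order, while an attribute that is present but empty would produce an empty list.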

Configuration

The preference [Pref="dom.microdata.testing.enabled"] was added to the resources.pref preferences list so that the experimental microdata methods can be toggled during development.

Build

6/11 - The pull request (https://github.com/servo/servo/pull/19038) was closed and the changes were merged upstream.

3/11 - Due to an issue with a third-party service (AppVeyor), our build is failing in continuous integration. We are working with the Servo team to resolve this.

3/11 - Members of the Servo project informed us that the AppVeyor build status does not determine the success of a pull request. We were advised to run a bors-servo build, and this succeeded. All build checks necessary for the PR to pass have therefore completed successfully. We are waiting for the changes to be merged upstream.


We have opened our pull request and are working on getting it merged based on the reviews received. Please use our forked repository until the pull request is merged.

Forked repository: https://github.com/CJ8664/servo

Clone command using git: git clone https://github.com/CJ8664/servo.git

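Once you have the forked repo, please follow the steps at https://github.com/servo/servo/blob/master/README.md#building to do a build.

Note that the build may take up to 30 minutes, depending on your system configuration. You can build on Janitor (https://janitor.technology/) to reduce the build time.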

Building on the cloud

This is the simplest and the fastest way to deploy and test an instance of servo. No configuration is required on your machine.

1. Go to http://janitor.technology

2. Click on New Container for Servo

3. Enter your email address to gain access to your container

4. Once logged in, go to Containers on the top right.

5. You will now see a container - Click on the IDE button to open your online IDE environment.

6. Change directory to /home/user and create a new directory, say servo_test

7. Go to this new directory and clone our repository as mentioned above

8. Upon cloning you should see a /servo directory within 'servo_test'

9. Go to /servo

10. It is now time to build - run the following command:

./mach build --dev

If all goes well you will see a success message - 'Build Completed'

Building locally

Local build instructions for Windows environments are given below -

1. Install Python for Windows (https://www.python.org/downloads/release/python-2714/).

The Windows x86-64 MSI installer is fine. You should change the installation to install the "Add python.exe to Path" feature.

2. Install virtualenv.

In a normal Windows Shell (cmd.exe or "Command Prompt" from the start menu), do:

pip install virtualenv

If this does not work, you may need to reboot for the changed PATH settings (by the python installer) to take effect.

3. Install Git for Windows (https://git-scm.com/download/win). DO allow it to add git.exe to the PATH (default settings for the installer are fine).

4. Install Visual Studio Community 2017 (https://www.visualstudio.com/vs/community/).

You MUST add "Visual C++" to the list of installed components. It is not on by default. Visual Studio 2017 MUST be installed to the default location or mach.bat will not find it.

If you encounter errors with the environment above, use the following workaround:

Download and install Build Tools for Visual Studio 2017.

Install Python 2.7 (x86-64) and virtualenv.

5. Run mach.bat build -d to build.

If the x64 prompt that mach.bat selects by default gives you trouble, you may need to choose and launch the prompt manually, such as the x86_x64 Cross Tools Command Prompt for VS 2017 in the Windows menu, and then run:

cd to/the/path/servo

python mach build -d

Build instructions for all other environments are available at https://github.com/servo/servo

Verifying a build

We can quickly verify that the Servo build is working by running the command

./mach run http://www.google.com

This will open a browser instance rendering the Google homepage.

This should be straightforward in any environment that has rendering support: Linux, Windows, macOS, Android.

If you are in the Janitor environment, its IDE does not provide rendering support. You might receive an error along the lines of 'No renderer found' upon executing the command.

Workaround: On the 'Container' page on janitor.technology click on VNC for your container. Click Connect on the new tab that opens up.

You should now have remote access to a UI with a command line. Simply run the above command and the web page should render.

Test Plan

Testing Approach

Since our implementation adds the Microdata attributes into the DOM, the testing approach is to query the DOM tree directly using JavaScript and detect the presence of Microdata on an HTML element within the DOM. This confirms whether the engine was able to parse these attributes and add them to the DOM. Also, as per the Microdata specification, the attributes can appear on any HTML element. Therefore, the test data consists of several HTML elements such as 'div', 'ul', 'li' and 'span', each with a microdata ('itemprop' and 'itemtype') attribute.

Test Framework

We have used the 'web-platform-tests' (WPT) suite for testing. It is an existing test suite used in the Servo project. It generally consists of two test types: JavaScript tests (to test DOM features, for example), written using the testharness.js library, and reference tests (which compare rendered output against what is expected to ensure that rendering is done properly), written using the W3C reftest format. Since the microdata attributes do not render anything on the page, only DOM testing is in scope.

testharness.js has been used to write the tests; it complements our testing approach because it can be called directly via JavaScript within an HTML page. It provides a convenient API for making common assertions and works for testing both synchronous and asynchronous DOM features in a way that promotes clear, robust tests.

testharness.js returns the result of the test directly from the HTML page, and WPT then uses it to interpret the result of the test.

Test Cases

In order to test our implementation, the following scenarios have been evaluated.

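Attribute with a single value should be stored properly

Input Data: [screenshot tc1.PNG]

Test Script: [screenshot e1.PNG]

Space separated values in the attributes should be stored as different values

Input Data: [screenshot tc2.PNG]

Test Script: [screenshot e2.PNG]

Duplicate occurrence of attributes should be ignored

Input Data: [screenshot tc3.PNG]

Test Script: [screenshot e3.PNG]

Extra whitespace in the attribute list should be ignored

Input Data: [screenshot tc4.PNG]

Test Script: [screenshot e4.PNG]

Attribute has not been set (null or empty)

Input Data: [screenshot tc5.PNG]

Test Script: [screenshot e5.PNG]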
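
The five scenarios can also be written down as a small table of inputs and expected outputs. The sketch below does this in Rust against a hypothetical tokenizer (split_unique, defined here only so that the example is self-contained). The real tests are testharness.js scripts that query the DOM; the sample attribute values and the treatment of an unset attribute as None are assumptions made for illustration.

```rust
/// Hypothetical helper: split on whitespace and drop repeated tokens.
fn split_unique(attr: Option<&str>) -> Option<Vec<&str>> {
    let mut out: Vec<&str> = Vec::new();
    // Assumption: an attribute that is not set maps to None.
    for tok in attr?.split_whitespace() {
        if !out.contains(&tok) {
            out.push(tok);
        }
    }
    Some(out)
}

/// One entry per scenario: (description, attribute value, expected tokens).
fn scenarios() -> Vec<(&'static str, Option<&'static str>, Option<Vec<&'static str>>)> {
    vec![
        ("single value", Some("name"), Some(vec!["name"])),
        ("space-separated values", Some("name title"), Some(vec!["name", "title"])),
        ("duplicates ignored", Some("name name title"), Some(vec!["name", "title"])),
        ("extra whitespace ignored", Some("  name   title "), Some(vec!["name", "title"])),
        ("attribute not set", None, None),
    ]
}

/// Returns the descriptions of scenarios whose output differs from the expectation.
fn failing_scenarios() -> Vec<&'static str> {
    scenarios()
        .into_iter()
        .filter(|(_, input, expected)| split_unique(*input) != *expected)
        .map(|(name, _, _)| name)
        .collect()
}
```

Keeping the scenarios in one table makes it easy to add a case later: a new row is enough, and failing_scenarios() reports it by name if the tokenizer disagrees.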


Testing Steps

Please complete the steps in the Build and Verifying a build sections before testing.

1) Run the command ./mach test-wpt tests/wpt/mozilla/tests/mozilla/microdata/

2) A webpage should render showing the status of the test.


Here is the output of test-wpt after the tests have been run successfully (screenshot Tests.PNG).

Dependencies

html5ever - HTML attribute names are fetched in Servo from a lookup file in the html5ever module. The html5ever module was augmented with the 'itemprop' and 'itemtype' attributes for use in Servo.

Pull Request

The pull request used to incorporate our changes upstream is available at https://github.com/servo/servo/pull/19038

References

http://html5doctor.com/microdata/

http://web-platform-tests.org/writing-tests/testharness-api.html

https://html.spec.whatwg.org/multipage/microdata.html

https://code.tutsplus.com/tutorials/html5-microdata-welcome-to-the-machine--net-12356

http://www.servo.org