CSC/ECE 517 Fall 2017/Semester Project - Implement the Microdata API

From Expertiza_Wiki

Introduction

HTML Specification

The WHATWG Microdata HTML specification lets authors annotate web content so that machines can learn more about the data on a web page. A typical real-world use of microdata is illustrated below.

Here is a simple HTML block that has some information about a student.

  My name is <span>Grad Student</span>, and I am a <span>student</span> at <span>NC State</span>.
  I live in <span><span>Raleigh</span>,<span>NC</span></span>

If a machine (a web parser, for instance) were to read this block as it is, it could not tell which part of the sentence is a name and which is an address.

This is where microdata shines: it attaches attributes to different parts of the HTML block. Below is the same information marked up with microdata:

<div itemscope itemtype="http://data-vocabulary.org/Person">
  My name is <span itemprop="name">Grad Student</span>, and I am a <span itemprop="title">student</span> at
  <a href="http://ncsu.edu" itemprop="affiliation">NC State</a>.
  I live in <span itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address"><span itemprop="locality">Raleigh</span>,<span itemprop="region">NC</span>
  </span>
</div>

As the markup shows, the attributes itemscope, itemtype and itemprop enrich the data: itemscope and itemtype declare an item and the vocabulary it uses, and itemprop names each property. For example, the property title is assigned to the word student, the property locality to Raleigh, and the property region to the state, NC. Any machine that accesses this HTML can therefore understand the content better. More information is available in the WHATWG Microdata specification listed under References. Companies such as Google, Skype and Microsoft use microdata from websites to provide additional insights. The number of websites using microdata is growing; at the time of writing, about 13% of websites used microdata (statistics courtesy of w3techs.com). Note that microdata does not change how the HTML block is rendered.
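The specification also fixes where each property's value comes from, and that matters for the example above: because affiliation sits on an a element, its value is the link's href rather than the text "NC State". The sketch below restates those rules as we read them in the WHATWG specification. It is written in Python for brevity (Servo itself is written in Rust); the function name and signature are our own, elements with itemscope (nested items) are left to the caller, and urljoin only approximates a browser's URL parser (a browser, for instance, would serialize http://ncsu.edu as http://ncsu.edu/).

```python
from urllib.parse import urljoin

# Elements whose microdata value is a URL taken from an attribute.
SRC_ELEMENTS = {"audio", "embed", "iframe", "img", "source", "track", "video"}
HREF_ELEMENTS = {"a", "area", "link"}


def property_value(tag, attrs, text_content, base_url=""):
    """Return the microdata value of an element that has itemprop but no itemscope.

    tag: lowercase element name; attrs: dict of the element's attributes;
    text_content: the element's text; base_url: the document URL.
    """
    def url_attr(name):
        # Resolve the attribute against the base URL; empty string if absent.
        # urljoin only approximates the WHATWG URL parser browsers use.
        return urljoin(base_url, attrs[name]) if name in attrs else ""

    if tag == "meta":
        return attrs.get("content", "")
    if tag in SRC_ELEMENTS:
        return url_attr("src")
    if tag in HREF_ELEMENTS:
        return url_attr("href")
    if tag == "object":
        return url_attr("data")
    if tag in ("data", "meter"):
        return attrs.get("value", "")
    if tag == "time":
        # A datetime attribute takes precedence over the visible text.
        return attrs.get("datetime", text_content)
    return text_content


# The affiliation link from the example yields its URL, not "NC State".
print(property_value("a", {"itemprop": "affiliation", "href": "http://ncsu.edu"}, "NC State"))
```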

Servo

Servo is a modern, high-performance browser engine designed for both application and embedded use and written in the Rust programming language. It is currently developed on 64-bit OS X, 64-bit Linux, and Android.

Rust

Rust is a systems programming language that focuses on memory safety and concurrency. It is similar to C++ but ensures memory safety while delivering high performance.

More information about the Rust programming language is available at https://www.rust-lang.org.

Background

This project is the second phase of the OSS project M1752, Implement the Microdata API. Please read the phase 1 documentation on the wiki before this document to become familiar with the components involved.

Design

The design consists of the following steps:

1) The DOM parser parses the HTML page and adds the microdata elements, along with the other HTML elements, to the DOM tree.

2) The JSON and vCard extraction algorithms are invoked based on the type of microdata present in the DOM.

3) These algorithms convert the microdata to the respective formats.

4) The notification algorithm sends a data structure to servoshell to notify it that microdata elements exist on the page.

5) Servoshell activates certain UI elements based on the notification. These elements allow the user to download the converted microdata as JSON or vCard.
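Steps 2 and 3 can be sketched for the JSON case. The WHATWG specification describes the JSON form as an object {"items": [...]}, in which each top-level item (an element with itemscope but no itemprop) becomes an object with optional "type" and "id" members and a "properties" object mapping each property name to an array of values; nested items appear as objects inside those arrays. The Python sketch below follows that shape over a hypothetical Node class rather than Servo's DOM, ignores itemref, and uses a simplified value rule (href for links, text otherwise).

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM element (Servo's real DOM types differ)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""  # simplification: text is stored on leaf elements only

    def text_content(self):
        return self.text + "".join(c.text_content() for c in self.children)


def item_properties(item):
    """Elements with itemprop that belong to `item`, in tree order.

    Descent stops at nested itemscope elements, so their inner properties
    belong to the nested item. itemref is not handled in this sketch.
    """
    found = []

    def walk(node):
        for child in node.children:
            if "itemprop" in child.attrs:
                found.append(child)
            if "itemscope" not in child.attrs:
                walk(child)

    walk(item)
    return found


def simple_value(elem):
    # Simplified: the spec has more element-specific rules (meta, img, time, ...).
    if elem.tag in ("a", "area", "link"):
        return elem.attrs.get("href", "")
    return elem.text_content()


def get_object(item):
    """Convert one item into the JSON shape described by the WHATWG spec."""
    result = {}
    if "itemtype" in item.attrs:
        result["type"] = item.attrs["itemtype"].split()
    if "itemid" in item.attrs:
        result["id"] = item.attrs["itemid"]
    properties = {}
    for elem in item_properties(item):
        value = get_object(elem) if "itemscope" in elem.attrs else simple_value(elem)
        for name in elem.attrs["itemprop"].split():
            properties.setdefault(name, []).append(value)
    result["properties"] = properties
    return result


def extract_json(root):
    """Serialize every top-level item (itemscope without itemprop) under root."""
    items = []

    def walk(node):
        for child in node.children:
            if "itemscope" in child.attrs and "itemprop" not in child.attrs:
                items.append(get_object(child))
            walk(child)

    walk(root)
    return json.dumps({"items": items}, indent=2)


# The Person example from the Introduction, as a hypothetical Node tree.
EXAMPLE_PAGE = Node("body", children=[
    Node("div", {"itemscope": "", "itemtype": "http://data-vocabulary.org/Person"}, [
        Node("span", {"itemprop": "name"}, text="Grad Student"),
        Node("span", {"itemprop": "title"}, text="student"),
        Node("a", {"href": "http://ncsu.edu", "itemprop": "affiliation"}, text="NC State"),
        Node("span", {"itemprop": "address", "itemscope": "",
                      "itemtype": "http://data-vocabulary.org/Address"}, [
            Node("span", {"itemprop": "locality"}, text="Raleigh"),
            Node("span", {"itemprop": "region"}, text="NC"),
        ]),
    ]),
])
```

Calling extract_json(EXAMPLE_PAGE) yields one Person item whose address property holds a nested Address item with locality and region.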

The diagram below shows the components involved in the process flow.


The diagram below outlines, bottom-up, the sequence of operations that lets servoshell interpret the microdata.

Test Plan

Testing Approach

The interaction between servo and servoshell will be tested by having servoshell populate certain UI elements when microdata is detected. For example, if a webpage contains microdata, the servoshell tab title (not the HTML title!) would say something along the lines of 'Microdata found'.
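This check depends on what servo tells servoshell in design steps 4 and 5. Below is a minimal Python sketch of such a message and of how the shell could derive its UI state from it. The message fields, the set of vCard-capable item types, and the UI keys are all hypothetical; they are not existing Servo or servoshell APIs.

```python
from dataclasses import dataclass

# Hypothetical: item types this sketch treats as exportable to vCard.
VCARD_TYPES = {
    "http://microformats.org/profile/hcard",
    "http://data-vocabulary.org/Person",
}


@dataclass(frozen=True)
class MicrodataNotification:
    """Hypothetical message from servo (engine) to servoshell (embedder)."""
    item_count: int
    item_types: tuple
    vcard_available: bool


def build_notification(types_per_item):
    """types_per_item: one list of itemtype URLs for each top-level item."""
    all_types = tuple(t for types in types_per_item for t in types)
    return MicrodataNotification(
        item_count=len(types_per_item),
        item_types=all_types,
        vcard_available=any(t in VCARD_TYPES for t in all_types),
    )


def ui_state(note):
    """UI servoshell might enable in response; the keys are illustrative."""
    found = note.item_count > 0
    return {
        "tab_label": "Microdata found" if found else None,
        "download_json": found,
        "download_vcard": found and note.vcard_available,
    }
```

Keeping the decision in a pure function like ui_state means the tab-title scenario can be asserted without driving the real UI.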

To verify the validity of the downloaded VCF file, we will import it into a tool that supports VCF and check whether the individual fields are populated correctly. Candidate VCF clients include Outlook, Windows Contacts, Android Contacts and the Contacts app on macOS.
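Deciding whether fields are "populated correctly" requires knowing the expected output. The Python sketch below derives a vCard 4.0 (RFC 6350) from the properties of the Person example. The mapping (name to FN, title to TITLE, address to ADR) is our own assumption for the data-vocabulary.org vocabulary; the vCard conversion described in the WHATWG specification targeted items of type http://microformats.org/profile/hcard instead.

```python
# Hypothetical mapping from data-vocabulary.org Address properties to the
# ADR components that follow PO box and extended address (RFC 6350, 6.3.1).
ADR_FIELDS = ["street-address", "locality", "region", "postal-code", "country-name"]


def escape_text(value):
    """Escape a vCard text value (RFC 6350, section 3.4)."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def first(properties, name):
    """First string value of a property, or "" when absent or a nested item."""
    values = properties.get(name) or [""]
    return values[0] if isinstance(values[0], str) else ""


def person_to_vcard(props):
    """props: the "properties" object of one Person item in the JSON form."""
    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + escape_text(first(props, "name"))]
    if first(props, "title"):
        lines.append("TITLE:" + escape_text(first(props, "title")))
    # affiliation is skipped: its microdata value is the link URL,
    # not an organization name suitable for ORG.
    address = (props.get("address") or [None])[0]
    if isinstance(address, dict):
        adr = address.get("properties", {})
        parts = ["", ""] + [escape_text(first(adr, f)) for f in ADR_FIELDS]
        lines.append("ADR:" + ";".join(parts))
    lines.append("END:VCARD")
    # vCard lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

For the example page this gives FN:Grad Student, TITLE:student and ADR:;;;Raleigh;NC;;, which are the fields to look for after importing.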

For JSON, we plan to build test scripts that query the JSON object with each microdata key and compare the returned value with the expected microdata property value in the DOM.
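A sketch of such a test script in Python, assuming the downloaded file follows the WHATWG JSON shape ({"items": [{"properties": {...}}]}) and that the expected DOM values are supplied as a dict. The dotted-path convention for nested items (address.locality) and the function names are our own; only the first value of each property is compared.

```python
import json


def lookup(properties, path):
    """First value at a dotted path such as "address.locality", or None."""
    head, _, rest = path.partition(".")
    values = properties.get(head)
    if not values:
        return None
    value = values[0]
    if not rest:
        return value
    if not isinstance(value, dict):
        return None
    return lookup(value.get("properties", {}), rest)


def compare(json_text, expected, item_index=0):
    """Check one item's properties against expected DOM values; list mismatches."""
    properties = json.loads(json_text)["items"][item_index]["properties"]
    mismatches = []
    for path, want in expected.items():
        got = lookup(properties, path)
        if got != want:
            mismatches.append(f"{path}: expected {want!r}, got {got!r}")
    return mismatches
```

An empty list means every queried key matched; otherwise each message names the key and both values.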

Test Data

1) Sample HTML pages containing a variety of microdata will be created.

2) Webpages across the internet containing microdata will also be used.

Test Scenarios

Testing the interaction between servo and servoshell

1. Open a webpage containing microdata

2. Verify that the servoshell tab title shows 'Microdata found'.

Testing the vCard file

1. Open a webpage containing vCard-related microdata.

2. Download the vCard using servoshell.

3. Import it into an external contacts application that supports VCF and note the results.
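Step 3 is manual. A small structural check run before the import can catch files that any client would reject. The Python sketch below tests only a few RFC 6350 (vCard 4.0) rules: CRLF line endings, BEGIN:VCARD first, VERSION immediately after it, an FN property, and END:VCARD last. The function names are our own, and the check is no substitute for importing into a real client.

```python
def unfold(text):
    """Undo RFC 6350 folding: CRLF followed by one space or tab continues a line."""
    return text.replace("\r\n ", "").replace("\r\n\t", "")


def check_vcard(text):
    """Return a list of structural problems found in a single-card VCF file."""
    problems = []
    lines = [line for line in unfold(text).split("\r\n") if line]
    if not lines or lines[0].upper() != "BEGIN:VCARD":
        problems.append("first line must be BEGIN:VCARD")
    if len(lines) < 2 or not lines[1].upper().startswith("VERSION:"):
        problems.append("VERSION must directly follow BEGIN:VCARD")
    if not lines or lines[-1].upper() != "END:VCARD":
        problems.append("last line must be END:VCARD")
    # Property name = text before the first ":" or ";" (groups not handled).
    names = {line.split(":", 1)[0].split(";", 1)[0].upper() for line in lines}
    if "FN" not in names:
        problems.append("missing FN property")
    return problems
```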

Testing the JSON file

1. Open a webpage containing microdata

2. Download the JSON using servoshell.

3. Verify the downloaded JSON using the test script.

References

Microdata Project

Phase 1 Wiki Page

Servoshell

http://html5doctor.com/microdata/

http://web-platform-tests.org/writing-tests/testharness-api.html

https://html.spec.whatwg.org/multipage/microdata.html

https://code.tutsplus.com/tutorials/html5-microdata-welcome-to-the-machine--net-12356

http://www.servo.org