CSC/ECE 517 Fall 2015 M1505 Add conformance tests to unicode-bidi and fix conformance bugs

From Expertiza_Wiki

Introduction

Web browsers are expected to support international text, and Servo is no exception. This project aims to improve an existing Rust library, unicode-bidi, that implements the Unicode Bidirectional Algorithm (UBA) for displaying mixed right-to-left and left-to-right text. The library has not yet achieved full conformance with the specification<ref name="servo">http://en.wikipedia.org/wiki/Servo_%28layout_engine%29</ref>.

Servo

Servo is an experimental project to build a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, we are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism. Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation<ref name = "servo"/>.

Rust

Rust is a new programming language for developing reliable and efficient systems. It is designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety. Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory<ref>http://www.rust-lang.org/</ref>.

Architecture

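  • tools/generate.py - This file is central to the code base, as it fetches the test data files BidiTest.txt<ref name="biditest">http://www.unicode.org/Public/UNIDATA/BidiTest.txt</ref> and BidiCharacterTest.txt<ref name="bidichartest">http://www.unicode.org/Public/UNIDATA/BidiCharacterTest.txt</ref>.
  • BidiTest.txt<ref name="biditest"/> - Contains conformance test cases expressed as sequences of bidirectional character types (such as L, R or AN) rather than actual characters.
  • BidiCharacterTest.txt<ref name="bidichartest"/> - Contains conformance test cases expressed as sequences of actual code points, together with the paragraph direction, the resolved levels and the expected visual order.
  • Cargo - Rust's build tool and package manager. The cargo test command compiles the crate and runs its tests.
  • lib.rs - The crate's main source file. It contains the core of the Unicode Bidirectional Algorithm implementation along with its unit tests.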
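The Rust sketch below parses one data line of BidiCharacterTest.txt, which makes the file's layout concrete. It assumes the five semicolon-separated fields described in the file's own header:

  • code points
  • paragraph direction
  • resolved paragraph level
  • resolved levels, with x for characters removed by rule X9
  • visual order

The `CharacterTestCase` struct and `parse_character_test` function are hypothetical names used for illustration. They are not part of the unicode-bidi crate.

```rust
/// One parsed data line of BidiCharacterTest.txt (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct CharacterTestCase {
    /// Input text decoded from the hexadecimal code points (field 0).
    pub text: String,
    /// Requested paragraph direction (field 1): 0 = LTR, 1 = RTL, 2 = auto.
    pub para_direction: u8,
    /// Resolved paragraph embedding level (field 2).
    pub para_level: u8,
    /// Resolved level per character (field 3); None stands for "x" (removed by X9).
    pub levels: Vec<Option<u8>>,
    /// Logical indices in visual order (field 4).
    pub order: Vec<usize>,
}

/// Parses one line; returns None for comments, blank lines or malformed input.
pub fn parse_character_test(line: &str) -> Option<CharacterTestCase> {
    // Drop comments and surrounding whitespace; skip blank lines.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let fields: Vec<&str> = data.split(';').collect();
    if fields.len() != 5 {
        return None;
    }
    // Field 0: space-separated hexadecimal code points.
    let mut text = String::new();
    for hex in fields[0].split_whitespace() {
        let cp = u32::from_str_radix(hex, 16).ok()?;
        text.push(char::from_u32(cp)?);
    }
    // Field 3: one level per character, or "x" if the character is removed by X9.
    let mut levels = Vec::new();
    for tok in fields[3].split_whitespace() {
        levels.push(if tok == "x" { None } else { Some(tok.parse::<u8>().ok()?) });
    }
    // Field 4: logical indices in visual order (removed characters are omitted).
    let mut order = Vec::new();
    for tok in fields[4].split_whitespace() {
        order.push(tok.parse::<usize>().ok()?);
    }
    Some(CharacterTestCase {
        text,
        para_direction: fields[1].trim().parse::<u8>().ok()?,
        para_level: fields[2].trim().parse::<u8>().ok()?,
        levels,
        order,
    })
}
```

A generator could turn each parsed case into a Rust test that runs the algorithm on `text` and compares the resulting levels and order.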

Project Description

Initial Steps:

  • Clone the unicode-bidi repository, compile it, and run the tests defined in src/lib.rs.

To run the tests, navigate to the unicode-bidi folder and run the following command. Rust (including Cargo) must be installed for this command to work.

cargo test

The fetch calls in tools/generate.py have been commented out for now, as they were not needed for the initial steps.

if __name__ == "__main__":
    os.chdir("../src/") # changing download path to /unicode-bidi/src/
    r = "tables.rs"
    # downloading the test case files
    # fetch("BidiTest.txt")
    # fetch("BidiCharacterTest.txt")

  • By hand, convert several test cases from the test data files into Rust tests that can be run automatically.
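Converting BidiTest.txt cases by hand means expanding each data line of the form "<classes>; <bitset>" into one test per paragraph direction. According to the file's header, the number after the semicolon is a bitset: 1 = auto, 2 = LTR, 4 = RTL. The expected levels and order come from the preceding @Levels and @Reorder lines, which this sketch does not parse. `ParaMode` and `parse_class_test` are hypothetical helpers, shown only to illustrate the format.

```rust
/// Paragraph direction selected by a BidiTest.txt bitset (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ParaMode {
    Auto, // bit 1: first strong character decides, LTR if there is none
    Ltr,  // bit 2: forced left-to-right
    Rtl,  // bit 4: forced right-to-left
}

/// Splits a data line such as "L LRE R; 7" into its bidi class names and the
/// paragraph modes to test. Returns None if either field is missing.
pub fn parse_class_test(line: &str) -> Option<(Vec<&str>, Vec<ParaMode>)> {
    let mut fields = line.split(';');
    let classes: Vec<&str> = fields.next()?.split_whitespace().collect();
    let bitset = fields.next()?.trim().parse::<u8>().ok()?;
    // Keep every mode whose bit is set.
    let modes: Vec<ParaMode> = [(1u8, ParaMode::Auto), (2, ParaMode::Ltr), (4, ParaMode::Rtl)]
        .iter()
        .filter(|&&(bit, _)| (bitset & bit) != 0)
        .map(|&(_, mode)| mode)
        .collect();
    Some((classes, modes))
}
```

For example, a bitset of 7 selects all three paragraph modes, so that single line yields three Rust tests.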

We have added tests for the following public functions:

    • reorder_line(): we added tests for this function. One failing test was found, which will be fixed in a subsequent step.
    • is_ltr()
    • is_rtl()
    • removed_by_x9()
    • not_removed_by_x9()

    #[test]
    fn test_reorder_line() {
        use super::{process_text, reorder_line};
        use std::borrow::Cow;
        // Runs the algorithm on `s` and returns its first paragraph in visual order
        fn reorder(s: &str) -> Cow<str> {
            let info = process_text(s, None);
            let para = &info.paragraphs[0];
            reorder_line(s, para.range.clone(), &info.levels)
        }
        assert_eq!(reorder("abc123"), "abc123");
        assert_eq!(reorder("1.-2"), "1.-2");
        assert_eq!(reorder("1-.2"), "1-.2");
        assert_eq!(reorder("abc אבג"), "abc גבא");
        // Digits are weak (EN) and skipped by rules P2/P3, so the first strong
        // character (Hebrew) makes this an RTL paragraph
        assert_eq!(reorder("123 אבג"), "גבא 123");
        // Testing an explicit RLE (U+202B) ... PDF (U+202C) embedding
        assert_eq!(reorder("\u{202B}abc אבג\u{202C}"), "\u{202B}\u{202C}גבא abc");
        // Neutrals between two strong RTL runs resolve to RTL (rule N1)
        assert_eq!(reorder("אבג? אבג"), "גבא ?גבא");
        // A trailing neutral between RTL text and the end of an LTR paragraph
        // takes the embedding direction (rule N2)
        assert_eq!(reorder("A אבג?"), "A גבא?");
        // Testing neutral characters followed by an implicit RTL mark.
        // This test highlights a possible non-conformance issue that may be fixed in a subsequent step.
        // Note: U+202F is NARROW NO-BREAK SPACE (class CS); the Right-to-Left Mark (RLM) is U+200F.
        //assert_eq!(reorder("A אבג?\u{202f}"), "A \u{202f}?גבא");
        assert_eq!(reorder("אבג abc"), "abc גבא");
        assert_eq!(reorder("abc\u{2067}.-\u{2069}ghi"),
                           "abc\u{2067}-.\u{2069}ghi");
        assert_eq!(reorder("Hello, \u{2068}\u{202E}world\u{202C}\u{2069}!"),
                           "Hello, \u{2068}\u{202E}\u{202C}dlrow\u{2069}!");
    }

    #[test]
    fn test_is_ltr() {
        use super::is_ltr;
        assert_eq!(is_ltr(10), true);
        assert_eq!(is_ltr(11), false);
        assert_eq!(is_ltr(20), true);
    }

    #[test]
    fn test_is_rtl() {
        use super::is_rtl;
        assert_eq!(is_rtl(13), true);
        assert_eq!(is_rtl(11), true);
        assert_eq!(is_rtl(20), false);
    }

    #[test]
    fn test_removed_by_x9() {
        use prepare::removed_by_x9;
        let rem_classes = &[RLE, LRE, RLO, LRO, PDF, BN];
        let not_classes = &[L, RLI, AL, LRI, PDI];
        for x in rem_classes {
            assert_eq!(removed_by_x9(*x), true);
        }
        for x in not_classes {
            assert_eq!(removed_by_x9(*x), false);
        }
    }

    #[test]
    fn test_not_removed_by_x9() {
        use prepare::not_removed_by_x9;
        let non_x9_classes = &[L, R, AL, EN, ES, ET, AN, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI];
        // `x` is already a reference (&BidiClass), so it is passed directly
        for x in non_x9_classes {
            assert!(not_removed_by_x9(x));
        }
    }
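The predicates tested above can be summarized in a short, self-contained sketch. So can the paragraph-level rule that explains why "123 אבג" is laid out right-to-left.

The sketch uses a local `BidiClass` enum that stands in for the crate's own. `paragraph_level` is a hypothetical helper implementing rules P2 and P3:

  • the first strong character decides the paragraph level;
  • text inside isolates is skipped;
  • the default is LTR.

None of this is copied from unicode-bidi.

```rust
/// Bidi_Class values from UAX #9 (local stand-in for the crate's enum).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BidiClass {
    L, R, AL, EN, ES, ET, AN, CS, NSM, BN, B, S, WS, ON,
    LRE, LRO, RLE, RLO, PDF, LRI, RLI, FSI, PDI,
}

/// Even embedding levels are left-to-right.
pub fn is_ltr(level: u8) -> bool {
    level % 2 == 0
}

/// Odd embedding levels are right-to-left.
pub fn is_rtl(level: u8) -> bool {
    level % 2 == 1
}

/// Rule X9: embedding/override controls, PDF and BN are removed;
/// isolate controls (LRI, RLI, FSI, PDI) are kept.
pub fn removed_by_x9(class: BidiClass) -> bool {
    matches!(
        class,
        BidiClass::RLE | BidiClass::LRE | BidiClass::RLO | BidiClass::LRO | BidiClass::PDF | BidiClass::BN
    )
}

/// Rules P2/P3: the first strong character outside any isolate decides the
/// paragraph level (L gives 0; R or AL gives 1); with none, the level is 0.
pub fn paragraph_level(classes: &[BidiClass]) -> u8 {
    let mut isolate_depth = 0usize;
    for &class in classes {
        match class {
            BidiClass::LRI | BidiClass::RLI | BidiClass::FSI => isolate_depth += 1,
            BidiClass::PDI => isolate_depth = isolate_depth.saturating_sub(1),
            BidiClass::L if isolate_depth == 0 => return 0,
            BidiClass::R | BidiClass::AL if isolate_depth == 0 => return 1,
            _ => {}
        }
    }
    0
}
```

With this rule, digits and spaces (EN, WS) never decide the direction, which is why "123 אבג" forms an RTL paragraph.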

Subsequent Steps:

  • Add functions to tools/generate.py that automatically convert the tests in the conformance suite into Rust tests that can be run automatically [1]
  • Implement the missing step L1 from the UBA [2]
  • Implement the missing step N0 from the UBA [3]
  • Solve the conformance problems related to the implementation of steps W1 to W7 [4]
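Rule L1, listed above as missing, resets certain characters on each line to the paragraph embedding level:

  • segment separators;
  • paragraph separators;
  • any run of whitespace or isolate formatting characters that precedes a separator or ends the line.

The rule is judged by the characters' original classes. A minimal sketch follows. The simplified `Class` enum and `apply_l1` are hypothetical. The sketch also ignores characters removed by rule X9 (BN and explicit embedding controls), which an implementation that keeps them in the text has to treat specially.

```rust
/// Original bidi classes relevant to rule L1 (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Class {
    S,       // segment separator, e.g. TAB
    B,       // paragraph separator
    WS,      // whitespace
    Isolate, // LRI, RLI, FSI or PDI
    Other,   // every other class
}

/// Applies rule L1 to one line. `classes` are the characters' original
/// classes and `levels` their resolved levels (same length).
pub fn apply_l1(classes: &[Class], levels: &mut [u8], para_level: u8) {
    assert_eq!(classes.len(), levels.len());
    // Scan backwards; `trailing` is true while every character seen since the
    // end of the line or the last separator has been whitespace or an isolate.
    let mut trailing = true;
    for i in (0..classes.len()).rev() {
        match classes[i] {
            Class::S | Class::B => {
                levels[i] = para_level;
                trailing = true;
            }
            Class::WS | Class::Isolate => {
                if trailing {
                    levels[i] = para_level;
                }
            }
            Class::Other => trailing = false,
        }
    }
}
```

Scanning from the end of the line lets a single pass detect both "before a separator" and "at the end of the line".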

Here is a flowchart that represents the sequence of activities that make up the logical flow of this project.

Design Principles

Two of our proposed design principles are:

1. Open-Closed principle:

We will be adding code to generate.py to convert the test cases in BidiTest.txt and BidiCharacterTest.txt into Rust test cases. This extends its behavior without modifying the existing code in generate.py.

2. Single Responsibility Principle:

The code to be implemented in generate.py and lib.rs will have separate, single responsibilities. generate.py deals with fetching files and loading and unloading data, whereas lib.rs deals with testing the existing methods and extending the functionality of the Unicode Bidirectional Algorithm implementation.

Design Pattern

The proposed design pattern for this project is the Command Pattern. The Command Pattern uses an object to encapsulate all information needed to perform an action or trigger an event at a later time. <ref>https://en.wikipedia.org/wiki/Command_pattern</ref>

The terms associated with this pattern are:

  • Command Object: A command object knows about the receiver and invokes a method of the receiver.
  • Command: Values for parameters of the receiver method are stored in the command.
  • Receiver: Does the work after receiving the command.
  • Invoker: An invoker object knows how to execute a command, and optionally does bookkeeping about the command execution.
  • Client: The client decides which commands to execute at which points.

In this project the terms given above can be interpreted as:

  • Command Object: generate.py
  • Command: BidiCharacterTest.txt/ BidiTest.txt
  • Receiver: lib.rs
  • Invoker: Cargo Test
  • Client: Cargo
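A generic sketch of the pattern's roles in Rust may make the mapping above easier to follow. Everything here is hypothetical and only illustrates the roles:

  • `ConvertTestFile` is the command object that stores its parameters.
  • `TestSuite` is the receiver that does the work.
  • `Invoker` queues and runs commands while counting them.
  • The calling code acts as the client.

It does not correspond to actual code in unicode-bidi or generate.py.

```rust
/// Receiver: does the actual work (here it only records generated test names).
#[derive(Debug, Default)]
pub struct TestSuite {
    pub generated: Vec<String>,
}

impl TestSuite {
    pub fn add_tests(&mut self, source: &str, count: usize) {
        for i in 0..count {
            self.generated.push(format!("{}_{}", source.replace('.', "_"), i));
        }
    }
}

/// Command: an object that can call the receiver later.
pub trait Command {
    fn execute(&self, receiver: &mut TestSuite);
}

/// Concrete command: stores the parameters for the receiver call.
pub struct ConvertTestFile {
    pub source: String,
    pub count: usize,
}

impl Command for ConvertTestFile {
    fn execute(&self, receiver: &mut TestSuite) {
        receiver.add_tests(&self.source, self.count);
    }
}

/// Invoker: queues commands, runs them and keeps simple bookkeeping.
#[derive(Default)]
pub struct Invoker {
    queue: Vec<Box<dyn Command>>,
    pub executed: usize,
}

impl Invoker {
    pub fn submit(&mut self, command: Box<dyn Command>) {
        self.queue.push(command);
    }

    pub fn run_all(&mut self, receiver: &mut TestSuite) {
        for command in self.queue.drain(..) {
            command.execute(receiver);
            self.executed += 1;
        }
    }
}
```

The client creates the commands and submits them, then decides when the invoker runs them. The receiver never needs to know which command triggered the work.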

UML Diagrams

Class Diagram

Here is a class diagram representing the different classes involved and their mutual interaction.

Test Cases

The project involved adding test cases derived from BidiCharacterTest.txt and BidiTest.txt to ensure that the implementation of the Unicode Bidirectional Algorithm always conforms to its specification. However, as part of the initial steps, we added a few manual test cases that check conformance with some of the major steps. Here are some of those test cases:

  • Check for LTR by passing the level number
  • Check for RTL by passing the level number
  • Check for removal of characters according to rule X9 of the algorithm
  • Check for non-removal of characters according to rule X9 of the algorithm
  • Check for reordering of characters with the following types of characters:
    • Weak types (European digits)
    • Strong LTR
    • Strong RTL
    • Neutral characters
    • Implicit Right-to-Left Mark characters (failing test case: the steps needed to handle this had not yet been implemented at that point)
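The reordering checks above ultimately exercise rule L2 of the algorithm. Starting at the highest level on the line and working down to the lowest odd level, it reverses every contiguous run of characters at that level or higher.

The sketch below applies that rule to a slice of per-character levels and returns the visual order. `visual_order` and `reorder_chars` are hypothetical helpers, not the crate's reorder_line, and mirroring (rule L4) is not handled.

```rust
/// Rule L2: from the highest level down to the lowest odd level on the line,
/// reverse every maximal run of characters whose level is at least that level.
/// `levels` holds one resolved level per character; the result lists logical
/// indices in visual (left-to-right display) order.
pub fn visual_order(levels: &[u8]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..levels.len()).collect();
    let (max, min) = match (levels.iter().max(), levels.iter().min()) {
        (Some(&max), Some(&min)) => (max, min),
        _ => return order, // empty line: nothing to reorder
    };
    // Round the lowest level up to an odd level (0 -> 1, 1 -> 1, 2 -> 3).
    let lowest_odd = min | 1;
    for level in (lowest_odd..=max).rev() {
        let mut i = 0;
        while i < order.len() {
            if levels[order[i]] >= level {
                // Find the end of this run and reverse it in place.
                let start = i;
                while i < order.len() && levels[order[i]] >= level {
                    i += 1;
                }
                order[start..i].reverse();
            } else {
                i += 1;
            }
        }
    }
    order
}

/// Applies `visual_order` to the characters of `text` (one level per char).
pub fn reorder_chars(text: &str, levels: &[u8]) -> String {
    let chars: Vec<char> = text.chars().collect();
    visual_order(levels).into_iter().map(|i| chars[i]).collect()
}
```

Consider "123 אבג" inside an RTL paragraph. The digits resolve to level 2 and the space and Hebrew letters to level 1. The digits are therefore reversed twice and stay in logical order, giving the "גבא 123" that test_reorder_line expects.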

Video

<TBA>

References

<references/>