CSC/ECE 517 Fall 2015 M1505 Add conformance tests to unicode-bidi and fix conformance bugs

Revision as of 08:50, 13 November 2015

Introduction

Web browsers are expected to support international text, and Servo is no exception. This project aims to improve an existing library, unicode-bidi, which implements the Unicode Bidirectional Algorithm for displaying mixed right-to-left and left-to-right text. The library has not yet achieved full conformance with the specification<ref name="servo">http://en.wikipedia.org/wiki/Servo_%28layout_engine%29</ref>.

Servo

Servo is an experimental project to build a Web browser engine for a new generation of hardware: mobile devices, multi-core processors and high-performance GPUs. With Servo, we are rethinking the browser at every level of the technology stack — from input parsing to page layout to graphics rendering — to optimize for power efficiency and maximum parallelism. Servo builds on top of Rust to provide a secure and reliable foundation. Memory safety at the core of the platform ensures a high degree of assurance in the browser’s trusted computing base. Rust’s lightweight task mechanism also promises to allow fine-grained isolation between browser components, such as tabs and extensions, without the need for expensive runtime protection schemes, like operating system process isolation<ref name = "servo"/>.

Rust

Rust is a new programming language for developing reliable and efficient systems. It is designed to support concurrency and parallelism in building platforms that take full advantage of modern hardware. Its static type system is safe and expressive, and it provides strong guarantees about isolation, concurrent execution and memory safety. Rust combines powerful and flexible modern programming constructs with a clear performance model to make program efficiency predictable and manageable. One important way it achieves this is by allowing fine-grained control over memory allocation through contiguous records and stack allocation. This control is balanced with the absolute requirement of safety: Rust’s type system and runtime guarantee the absence of data races, buffer overflow, stack overflow or access to uninitialized or deallocated memory<ref>http://www.rust-lang.org/</ref>.

Architecture

  • generate.py - Central to the code base: this script fetches the test data files BidiTest.txt<ref name="biditest">http://www.unicode.org/Public/UNIDATA/BidiTest.txt</ref> and BidiCharacterTest.txt<ref name="bidichartest">http://www.unicode.org/Public/UNIDATA/BidiCharacterTest.txt</ref>.
  • BidiTest.txt<ref name="biditest"/> - Contains conformance test cases expressed as sequences of bidirectional character classes.
  • BidiCharacterTest.txt<ref name="bidichartest"/> - Contains conformance test cases expressed as sequences of specific characters (code points).
  • Cargo - Rust's build tool and package manager, used to run the tests with cargo test.
  • lib.rs - Rust source file containing the library code and the Unicode Bidirectional Algorithm tests.

Project Description

Initial Steps:

  • Clone the unicode-bidi repository, compile it, and run the tests defined in src/lib.rs.

To run the tests, navigate to the unicode-bidi folder and use the following command. Rust must be installed for this command to work.

cargo test

The fetch commands have been commented out for now, as they were not required in the initial steps.

if __name__ == "__main__":
    os.chdir("../src/") # changing download path to /unicode-bidi/src/
    r = "tables.rs"
    # downloading the test case files
    # fetch("BidiTest.txt")
    # fetch("BidiCharacterTest.txt")
  • By hand, convert several test cases from the file into Rust tests that can be run automatically.

We have tested the following public methods:

  • reorder_line(): We added tests for this method. One failing test was found; it will be fixed in the subsequent steps.
  • is_ltr()
  • is_rtl()
  • removed_by_x9()
  • not_removed_by_x9()

    #[test]
    fn test_reorder_line() {
        use super::{process_text, reorder_line};
        use std::borrow::Cow;
        fn reorder(s: &str) -> Cow<str> {
            let info = process_text(s, None);
            let para = &info.paragraphs[0];
            reorder_line(s, para.range.clone(), &info.levels)
        }
        assert_eq!(reorder("abc123"), "abc123");
        assert_eq!(reorder("1.-2"), "1.-2");
        assert_eq!(reorder("1-.2"), "1-.2");
        assert_eq!(reorder("abc אבג"), "abc גבא");
        //Numbers being weak LTR characters, cannot reorder strong RTL
        assert_eq!(reorder("123 אבג"), "גבא 123");
        //Testing for RLE Character
        assert_eq!(reorder("\u{202B}abc אבג\u{202C}"), "\u{202B}\u{202C}גבא abc");
        //Testing neutral characters
        assert_eq!(reorder("אבג? אבג"), "גבא ?גבא");
        //Testing neutral characters with special case
        assert_eq!(reorder("A אבג?"), "A גבא?");
        //Testing neutral characters with Implicit RTL Marker
        //The given test highlights a possible non-conformance issue that will perhaps be fixed in the subsequent steps.
        //assert_eq!(reorder("A אבג?\u{202f}"), "A \u{202f}?גבא");
        assert_eq!(reorder("אבג abc"), "abc גבא");
        assert_eq!(reorder("abc\u{2067}.-\u{2069}ghi"),
                           "abc\u{2067}-.\u{2069}ghi");
        assert_eq!(reorder("Hello, \u{2068}\u{202E}world\u{202C}\u{2069}!"),
                           "Hello, \u{2068}\u{202E}\u{202C}dlrow\u{2069}!");
    }

    #[test]
    fn test_is_ltr() {
        use super::is_ltr;
        assert_eq!(is_ltr(10), true);
        assert_eq!(is_ltr(11), false);
        assert_eq!(is_ltr(20), true);
    }

    #[test]
    fn test_is_rtl() {
        use super::is_rtl;
        assert_eq!(is_rtl(13), true);
        assert_eq!(is_rtl(11), true);
        assert_eq!(is_rtl(20), false);
    }

    #[test]
    fn test_removed_by_x9() {
        use prepare::removed_by_x9;
        let rem_classes = &[RLE, LRE, RLO, LRO, PDF, BN];
        let not_classes = &[L, RLI, AL, LRI, PDI];
        for x in rem_classes {
            assert_eq!(removed_by_x9(*x), true);
        }
        for x in not_classes {
            assert_eq!(removed_by_x9(*x), false);
        }
    }

    #[test]
    fn test_not_removed_by_x9() {
        use prepare::not_removed_by_x9;
        let non_x9_classes = &[L, R, AL, EN, ES, ET, AN, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI];
        for x in non_x9_classes {
            assert_eq!(not_removed_by_x9(&x), true);
        }
    }

Subsequent Steps:

  • Add methods to tools/generate.py that automatically convert the tests in the conformance suite into Rust tests that can be run automatically [1]

BidiTest.txt and BidiCharacterTest.txt contain a comprehensive list of test cases for the bidirectional text that Servo must render. Our additions to generate.py will generate Rust test code that feeds these cases to the reorder_line() method in lib.rs.
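
As a rough illustration of that conversion, the Python sketch below parses one data line in the BidiCharacterTest.txt layout (code points; paragraph direction; resolved paragraph level; resolved levels, with x for characters removed by rule X9; visual order) and emits a Rust assertion in the style of the hand-written tests. The function names and the sample line are hypothetical and are not taken from generate.py. The sketch also assumes the reorder() helper from test_reorder_line(), so it only handles cases with automatic paragraph direction (field value 2) and no X9-removed characters.

```python
def parse_character_test(line):
    """Parse one data line in the BidiCharacterTest.txt layout (hypothetical helper)."""
    fields = line.split("#", 1)[0].strip().split(";")
    if len(fields) != 5:
        return None  # blank line, comment, or unexpected layout
    return {
        # Field 0: space-separated hexadecimal code points
        "text": "".join(chr(int(cp, 16)) for cp in fields[0].split()),
        # Field 1: paragraph direction (0 = LTR, 1 = RTL, 2 = auto)
        "direction": int(fields[1]),
        # Field 2: resolved paragraph embedding level
        "para_level": int(fields[2]),
        # Field 3: resolved levels, "x" marks a character removed by rule X9
        "levels": [None if lv == "x" else int(lv) for lv in fields[3].split()],
        # Field 4: logical indices listed in visual (left-to-right) order
        "order": [int(i) for i in fields[4].split()],
    }


def rust_escape(text):
    """Write every character as a Rust \\u{...} escape."""
    return "".join("\\u{%04X}" % ord(ch) for ch in text)


def to_rust_assert(case):
    """Emit a Rust assertion, or None for cases this sketch does not cover."""
    if case["direction"] != 2 or None in case["levels"]:
        return None  # assumption: reorder() uses auto direction and keeps all characters
    visual = "".join(case["text"][i] for i in case["order"])
    return 'assert_eq!(reorder("%s"), "%s");' % (
        rust_escape(case["text"]), rust_escape(visual))


# Illustrative line in the file's format (not copied from the real file).
sample = "0061 0062 05D0 05D1;2;0;0 0 1 1;0 1 3 2"
print(to_rust_assert(parse_character_test(sample)))
```

Cases with an explicit paragraph direction would need a different Rust helper that passes the level to process_text(), which this sketch leaves out.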

  • Implement the missing step L1 from the UBA [2]

The L1 step can be described as: On each line, reset the embedding level of the following characters to the paragraph embedding level:

  • Segment separators,
  • Paragraph separators,
  • Any sequence of whitespace characters and/or isolate formatting characters (FSI, LRI, RLI, and PDI) preceding a segment separator or paragraph separator, and
  • Any sequence of whitespace characters and/or isolate formatting characters (FSI, LRI, RLI, and PDI) at the end of the line.

In combination with the following rule, this means that trailing whitespace will appear at the visual end of the line (in the paragraph direction). Tabulation will always have a consistent direction within a paragraph. <ref>http://unicode.org/reports/tr9/#L1</ref>

This step will be implemented in the lib.rs file.
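
The rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the planned lib.rs code: the function name and the list-of-strings representation of bidi classes are assumptions, and the sketch ignores characters removed by rule X9. It scans the line from the end so that a run of whitespace or isolate characters is reset only when it precedes a separator or the end of the line.

```python
WHITESPACE_OR_ISOLATE = {"WS", "FSI", "LRI", "RLI", "PDI"}
SEPARATORS = {"S", "B"}  # segment separator, paragraph separator


def apply_l1(classes, levels, para_level):
    """Return a copy of `levels` with rule L1 applied to one line.

    `classes` holds the original bidi class of each character on the line.
    """
    result = list(levels)
    in_reset_run = True  # scanning backwards, we start at the end of the line
    for i in range(len(classes) - 1, -1, -1):
        cls = classes[i]
        if cls in SEPARATORS:
            result[i] = para_level
            in_reset_run = True   # whitespace before a separator is reset too
        elif cls in WHITESPACE_OR_ISOLATE and in_reset_run:
            result[i] = para_level
        else:
            in_reset_run = False  # any other character ends the run
    return result


# R WS S R WS in an LTR paragraph: the separator, the space before it,
# and the trailing space drop to level 0.
print(apply_l1(["R", "WS", "S", "R", "WS"], [1, 1, 1, 1, 1], 0))
```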

  • Implement the missing step N0 from the UBA [3]

Process bracket pairs in an isolating run sequence sequentially in the logical order of the text positions of the opening paired brackets. Identify the bracket pairs in the current isolating run sequence according to BD16. <ref>http://unicode.org/reports/tr9/#N0</ref>
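
The pair identification in BD16 is stack-based, and a simplified Python sketch may help make it concrete. The sketch assumes a tiny hard-coded bracket table and works on a plain string. A real implementation would use the Unicode Bidi_Paired_Bracket data, consider only characters whose current class is ON, and operate on an isolating run sequence. The function name is hypothetical.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}"}  # simplified stand-in for the Unicode table
CLOSERS = set(BRACKETS.values())
MAX_DEPTH = 63  # stack limit given in BD16


def find_bracket_pairs(text):
    """Return (open_index, close_index) pairs sorted by opening position."""
    stack = []   # entries are (expected closing bracket, position of opener)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in BRACKETS:
            if len(stack) == MAX_DEPTH:
                break  # no room left: stop processing the rest of the text
            stack.append((BRACKETS[ch], pos))
        elif ch in CLOSERS:
            # Search from the top of the stack for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    pairs.append((stack[depth][1], pos))
                    del stack[depth:]  # pop the match and everything above it
                    break
            # An unmatched closing bracket is simply skipped.
    return sorted(pairs)


print(find_bracket_pairs("a(b[c)d]"))
```

In "a(b[c)d]" only the parentheses form a pair: the opening square bracket is discarded when the closing parenthesis matches below it on the stack, so the later closing square bracket finds no partner.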

  • Solve the conformance problems related to the implementation of steps W1 to W7 [4]

We will have to improve the existing implementation of the steps W1 to W7.

Flowchart of Project Description

Design Principles

Two of our proposed design principles are:

1. Open-Closed principle:

In object-oriented programming, the open/closed principle states that "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification"; that is, such an entity can allow its behavior to be extended without modifying its source code. <ref>https://en.wikipedia.org/wiki/Open/closed_principle</ref>

In the subsequent steps we will be using this principle in generate.py. We will add code to generate.py to convert the test cases in BidiTest.txt and BidiCharacterTest.txt into Rust test cases. However, we won't be changing existing code in the generate.py file.

2. Single Responsibility Principle:

In object-oriented programming, the single responsibility principle states that every class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.<ref>https://en.wikipedia.org/wiki/Single_responsibility_principle</ref>

The code to be implemented in generate.py and lib.rs will have separate, single responsibilities. generate.py deals with fetching files and loading and unloading data, whereas lib.rs deals with testing the existing methods and extending the functionality of the unicode-bidi algorithm. Thus, this design principle is applied in the project.

Design Pattern

The proposed design pattern for this project is the Command Pattern. The Command Pattern uses an object to encapsulate all information needed to perform an action or trigger an event at a later time. <ref>https://en.wikipedia.org/wiki/Command_pattern</ref>

The terms associated with this pattern are:

  • Command Object: A command object knows about the receiver and invokes a method of the receiver.
  • Command: Values for the parameters of the receiver method are stored in the command.
  • Receiver: Does the work after receiving the command.
  • Invoker: An invoker object knows how to execute a command, and optionally does bookkeeping about the command execution.
  • Client: The client decides which commands to execute at which points.
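
A generic Python sketch of these roles follows. All class and method names are hypothetical and chosen only to illustrate the pattern; none of them come from the unicode-bidi code base.

```python
class TestRunner:
    """Receiver: does the actual work."""

    def __init__(self):
        self.log = []

    def run(self, test_name):
        self.log.append("ran " + test_name)


class RunTestCommand:
    """Command object: stores the receiver and the parameter values."""

    def __init__(self, receiver, test_name):
        self.receiver = receiver
        self.test_name = test_name

    def execute(self):
        self.receiver.run(self.test_name)


class Invoker:
    """Invoker: executes commands and keeps a history of what was run."""

    def __init__(self):
        self.history = []

    def submit(self, command):
        self.history.append(command)
        command.execute()


# Client: decides which commands to execute and when.
runner = TestRunner()
invoker = Invoker()
invoker.submit(RunTestCommand(runner, "test_is_ltr"))
invoker.submit(RunTestCommand(runner, "test_is_rtl"))
print(runner.log)
```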

In this project the terms given above can be interpreted as:

  • Command Object: generate.py
  • Command: BidiCharacterTest.txt/ BidiTest.txt
  • Receiver: lib.rs
  • Invoker: Cargo Test
  • Client: Cargo

UML Diagrams

Class Diagram

Here is a class diagram representing the different classes involved and their mutual interaction.

Test Cases

The project involves generating tests from BidiCharacterTest.txt and BidiTest.txt to ensure that the unicode-bidi implementation conforms to the Unicode Bidirectional Algorithm specification. As part of the initial steps, however, we added a few manual test cases that check conformance with some of the major steps. Here are some of those test cases:

  • Check for LTR by passing the level number
  • Check for RTL by passing the level number
  • Check for removal of characters according to the Rule X9 of the algorithm
  • Check for non removal of characters according to the Rule X9 of the algorithm
  • Check for reordering of characters in accordance with the following types of characters:
    • Weak LTR
    • Strong LTR
    • Strong RTL
    • Neutral characters
    • RTL (explicit right-to-left) markers (failing test case; the steps needed to handle it had not been implemented at that point)
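
The first four checks rest on two facts from the Unicode Bidirectional Algorithm: even embedding levels are left-to-right and odd levels are right-to-left, and rule X9 removes the RLE, LRE, RLO, LRO, PDF and BN classes. The Python sketch below restates them. The function names mirror the Rust helpers tested above, but this is an illustration with bidi classes written as strings, not the library code.

```python
REMOVED_BY_X9 = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def is_ltr(level):
    """Even embedding levels are left-to-right."""
    return level % 2 == 0


def is_rtl(level):
    """Odd embedding levels are right-to-left."""
    return level % 2 == 1


def removed_by_x9(bidi_class):
    """True for the classes that rule X9 removes."""
    return bidi_class in REMOVED_BY_X9


def not_removed_by_x9(bidi_class):
    return not removed_by_x9(bidi_class)


# The same values as the Rust tests: levels 10 and 20 are LTR, 11 and 13 are RTL.
print(is_ltr(10), is_ltr(11), is_rtl(13), removed_by_x9("RLE"), removed_by_x9("LRI"))
```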

Video

<TBA>

External Links

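[http://unicode.org/reports/tr9/ Unicode Bidi Documentation]

[http://www.w3.org/International/articles/inline-bidi-markup/uba-basics Unicode Bidi Basics]

[http://www.iamcal.com/understanding-bidirectional-text/ Understanding Bidirectional Text]

[http://servo.org/ Servo]

[https://www.rust-lang.org/ Rust]

[https://doc.rust-lang.org/book/testing.html Testing in Rust]

[https://www.youtube.com/watch?v=f_Gw-usX0z0 OSS Walkthrough Video]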

References

<references/>