 
254 Testing the Unicode Bidirectional Algorithm for Unicode 6.3
Date: 2013.07.26
Status: Closed
Originator: UTC
Informal Discussion: Unicode Mail List
Formal Feedback: Contact Form
Resolution: The bidi reference implementations will be posted to coincide with Unicode 6.3.
 

Description of Issue:

Unicode Standard Annex #9, Unicode Bidirectional Algorithm (UBA), has a major update slated for release in September 2013. This update is the most significant change in Unicode 6.3. The Unicode Technical Committee has already approved the changes to the algorithm and text, subject to final editorial review.

The Unicode Technical Committee is encouraging implementers to test their code against the new test files and the two reference implementations during July 2013. It is vital that the text of the specification in UAX #9 admit only one interpretation, and that at least two implementations thoroughly verify the values in the test data before release. Any change after release, even one that fixes a problem, can cause significant interoperability problems. Because the UBA is used to display all Arabic and Hebrew text on the web and in application programs, any change to the algorithm has far-reaching consequences.

The proposed update to UAX #9 substantially extends the UBA to support isolate runs, introducing new Bidi_Class property values (LRI, RLI, FSI, and PDI) and the corresponding formatting characters. Section 3.3.5, Resolving Neutral and Isolate Formatting Types, also changes so that paired punctuation marks are resolved as a unit. For details, see http://www.unicode.org/reports/tr9/tr9-28.html.
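To make the two extensions more concrete, here is a minimal Python sketch of two definitions they rely on: finding the matching PDI for each isolate initiator (BD9) and identifying bracket pairs (BD16). It is not one of the reference implementations, and the function names are hypothetical. It assumes the input is a single paragraph. Its three-entry bracket table is a hypothetical stand-in for the Bidi_Paired_Bracket data in the UCD file BidiBrackets.txt. It also skips the canonical-equivalence comparison, any check on a character's current Bidi_Class, and any limit on nesting depth. The bracket search uses a stack-based approach intended to satisfy the BD16 constraints (each bracket belongs to at most one pair, the earliest possible one, and pairs nest without overlapping). Indices are 0-based.

```python
from typing import Dict, List, Optional, Tuple

# Isolate formatting characters added in Unicode 6.3.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
ISOLATE_INITIATORS = {LRI, RLI, FSI}


def matching_pdis(paragraph: str) -> Dict[int, Optional[int]]:
    """Map each isolate initiator's index to its matching PDI index (BD9).

    A stack gives the same result as BD9's counter: the matching PDI is the
    first PDI that closes the initiator once nested isolates are accounted
    for. Initiators with no matching PDI in the paragraph map to None.
    """
    matches: Dict[int, Optional[int]] = {}
    open_isolates: List[int] = []
    for i, ch in enumerate(paragraph):
        if ch in ISOLATE_INITIATORS:
            open_isolates.append(i)
            matches[i] = None
        elif ch == PDI and open_isolates:
            matches[open_isolates.pop()] = i
        # A PDI with no open initiator matches nothing and is skipped.
    return matches


# Hypothetical subset of Bidi_Paired_Bracket (the real data is in BidiBrackets.txt).
BRACKETS = {"(": ")", "[": "]", "{": "}"}
CLOSING = set(BRACKETS.values())


def bracket_pairs(text: str) -> List[Tuple[int, int]]:
    """Return (opening, closing) index pairs sorted by opening index."""
    pairs: List[Tuple[int, int]] = []
    stack: List[Tuple[str, int]] = []  # (expected closing bracket, opener index)
    for i, ch in enumerate(text):
        if ch in BRACKETS:
            stack.append((BRACKETS[ch], i))
        elif ch in CLOSING:
            # Search from the top of the stack downward for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    pairs.append((stack[depth][1], i))
                    del stack[depth:]  # pop through the matched element
                    break
            # If nothing matched, the closer is ignored and the stack is kept.
    return sorted(pairs)
```

For example, `bracket_pairs("a(b[c)d]")` yields `[(1, 5)]`. The `)` closes the earlier `(` and pops the `[` off the stack, so the final `]` is left unpaired.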

The new conformance test files for UBA are:

The header of each file explains how to use it for testing. The UCD files for the release are available at either http://www.unicode.org/Public/6.3.0/ or ftp://www.unicode.org/Public/6.3.0/ucd/.

Two independent reference implementations of UBA 6.3 are available to assist in testing, both as source code: one implemented in C and the other in Java. Each comes with Readme documentation explaining how to use it. The UTC would also appreciate reports of any potential problems or errors in the reference implementations. The source code is available at:

We recommend testing implementations against the reference code with a "monkey test": generate random sequences of significant characters, submit each sequence to both the implementation and the reference code, and compare the results. This kind of testing exercises edge cases more thoroughly than the two conformance test files alone. The significant characters should include a few characters of each Bidi_Class, plus a set of paired bracket characters such as (){}[]. The brackets are important because the algorithm is now sensitive to paired brackets in the text.
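A minimal Python sketch of such a harness follows. It does not call either reference implementation. The character sample, the `BidiFunction` interface, and the names `random_text` and `monkey_test` are all hypothetical. The sketch assumes that each implementation can be wrapped as a function from a string to a comparable result, such as its list of resolved levels.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample of "significant" characters: a few per Bidi_Class,
# plus the paired brackets (){}[]. A real test should draw more from the UCD.
SIGNIFICANT_CHARS: List[str] = [
    "a", "b",                # L
    "\u05D0", "\u05D1",      # R   (Hebrew letters)
    "\u0627", "\u0628",      # AL  (Arabic letters)
    "1", "2",                # EN
    "\u0661", "\u0662",      # AN  (Arabic-Indic digits)
    "+", "-",                # ES
    "#", "$",                # ET
    ",", ":",                # CS
    "\u0300",                # NSM (combining grave accent)
    "\u200B",                # BN  (zero width space)
    "\u2029",                # B   (paragraph separator)
    "\t",                    # S
    " ",                     # WS
    "!", "&",                # ON
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE RLE PDF LRO RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI RLI FSI PDI
    "(", ")", "[", "]", "{", "}",                      # paired brackets (ON)
]

# Hypothetical interface: maps text to a comparable result, e.g. resolved levels.
BidiFunction = Callable[[str], Sequence[object]]


def random_text(rng: random.Random, max_len: int) -> str:
    """Build a random string of 1..max_len significant characters."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(SIGNIFICANT_CHARS) for _ in range(length))


def monkey_test(
    implementation: BidiFunction,
    reference: BidiFunction,
    trials: int = 10_000,
    max_len: int = 12,
    seed: int = 0,
) -> List[Tuple[str, List[object], List[object]]]:
    """Run both functions on the same random inputs; return every mismatch."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    mismatches = []
    for _ in range(trials):
        text = random_text(rng, max_len)
        got, expected = list(implementation(text)), list(reference(text))
        if got != expected:
            mismatches.append((text, got, expected))
    return mismatches
```

In practice, `reference` would wrap one of the reference implementations in whatever way its Readme describes, and `implementation` would wrap the code under test. Short inputs keep failures easy to read. Printing a failing string as code points, for example with `" ".join(f"{ord(c):04X}" for c in text)`, makes it easy to include in a bug report.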

For information about how to discuss this issue and how to supply formal feedback, please see the feedback and discussion instructions. The accumulated feedback received so far on this issue is shown below, or you can look at a full page view.

 
