Unicode 6.0.0

Unicode 6.0.0 is a major version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.0.0. In the discussion below, shortened references to "Unicode 6.0" or "Version 6.0" specifically refer to Version 6.0.0.

A. Summary

The Unicode Standard, Version 6.0 is the first major version of the Unicode Standard to be published solely in online format.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.0:

This version of the Unicode Standard is synchronized with the Second Edition of 10646: ISO/IEC 10646:2011. That Second Edition represents the republication of ISO/IEC 10646:2003 plus the rolled-up content additions from Amendments 1 through 8. The repertoire for Unicode Version 6.0 includes all the characters of the Second Edition, plus one additional character U+20B9 INDIAN RUPEE SIGN, which is still in the process of addition to 10646.

B. Version Information

Version 6.0 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

A complete specification of the contributory files for Unicode 6.0 is found on the page Components for 6.0.0. That page also provides the recommended reference format for Unicode Standard Annexes.

Code Charts

For Unicode 6.0.0 in particular, two additional sets of code chart pages are provided:

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.0, see the list of current Updates and Errata.

C. Stability Policy Update

In the Unicode 6.0 timeframe, the Property Alias Uniqueness stability policy has been updated to make it clear that uniqueness is defined specifically by the UAX44-LM3 matching rule, rather than by a generic reference to all of the UAX #44 matching rules. The UAX44-LM3 matching rule has also been clarified regarding the status of property aliases beginning with the character sequence "is" (or "Is" or "IS"). This clarification reflects the prevalence of implementations that expose Unicode character properties or property values through APIs prefixed with "Is", for example IsNumeric() for the Unicode Numeric property.
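As a rough illustration of loose matching in the spirit of UAX44-LM3 (ignore case, whitespace, underscores, hyphens, and an initial "is" prefix), the following Python sketch reduces an alias to a comparison key. The function names loose_key and same_alias are hypothetical, and the handling of an alias that consists only of "is" is an assumption of this sketch. UAX #44 remains the authoritative statement of the rule, including any special cases for particular aliases.

```python
import re

def loose_key(alias: str) -> str:
    """Reduce a property alias to a comparison key (sketch, not normative)."""
    # Ignore case, whitespace, underscores, and hyphens.
    key = re.sub(r"[\s_\-]+", "", alias).lower()
    # Ignore an initial "is" prefix. Assumption of this sketch: keep the
    # prefix when stripping it would leave an empty key.
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key

def same_alias(a: str, b: str) -> bool:
    """Two aliases are treated as the same if their loose keys are equal."""
    return loose_key(a) == loose_key(b)
```

Under this sketch, same_alias("Is_Numeric", "numeric") and same_alias("Line-Break", "linebreak") both hold, which is why two distinct aliases must not collapse to the same key.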

Another Property Value Stability constraint has been added, to make it clear that all decimal digits (Numeric_Type=Decimal) occur only in contiguous ranges of 10 characters, with ascending numeric values from 0 to 9.
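One practical consequence of this constraint is that the zero of a digit's range can be derived from any digit in it. The Python sketch below uses the standard unicodedata module; the helper names decimal_run and run_is_well_formed are hypothetical, and the results reflect whichever Unicode version the local Python build ships, which is an assumption to keep in mind.

```python
import unicodedata

def decimal_run(ch: str) -> str:
    """Return the ten-character range that a decimal digit is expected to sit in."""
    value = unicodedata.decimal(ch)   # raises ValueError if ch is not a decimal digit
    zero = ord(ch) - value            # code point of the digit zero in the same range
    return "".join(chr(zero + i) for i in range(10))

def run_is_well_formed(ch: str) -> bool:
    """Check that the range holds decimal digits with ascending values 0 to 9."""
    run = decimal_run(ch)
    return all(unicodedata.decimal(c, None) == i for i, c in enumerate(run))
```

For example, decimal_run("7") yields the ASCII digits, and decimal_run("\u0663") yields the ten Arabic-Indic digits starting at U+0660.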

D. Textual Changes and Character Additions

Character Assignment Overview

230 characters have been added to the BMP, while 1,858 characters have been added in the supplementary planes. For the first time in the history of the Unicode Standard, the majority of the regular encoded characters (graphic and format) are not in the BMP.

Most character additions are in new blocks, but there are also character additions to a number of existing blocks.

The following table shows the allocation of code points in Unicode 6.0, by character type. It highlights the numbers for the BMP and the supplementary planes separately. For more information on the specific characters newly assigned in Unicode 6.0, see the file DerivedAge.txt in the Unicode Character Database. For more details regarding character counts, see Appendix D, Changes from Previous Versions.
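The BMP and supplementary-plane split can be tallied from data in the shape of DerivedAge.txt, where each data line gives a code point or range, a semicolon, and an age. The Python sketch below is a minimal illustration: the function name count_by_plane and the SAMPLE lines are assumptions made for this example (SAMPLE is an illustrative excerpt, not a verbatim copy of the file), and the real file should be consulted for actual counts.

```python
from collections import Counter

BMP_LAST = 0xFFFF  # last code point of the Basic Multilingual Plane

def count_by_plane(lines, age="6.0"):
    """Count code points with the given age, split into BMP and supplementary."""
    counts = Counter()
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not data:
            continue
        code_field, age_field = (part.strip() for part in data.split(";", 1))
        if age_field != age:
            continue
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        for cp in range(start, end + 1):
            counts["BMP" if cp <= BMP_LAST else "supplementary"] += 1
    return counts

# Illustrative excerpt in the DerivedAge.txt line format (not the full file).
SAMPLE = """\
# sample lines for demonstration only
20B9          ; 6.0 #       INDIAN RUPEE SIGN
1F300..1F320  ; 6.0 #  [33] CYCLONE..SHOOTING STAR
0041..005A    ; 1.1 #  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
""".splitlines()
```

Calling count_by_plane(SAMPLE) counts one BMP code point and 33 supplementary code points for age 6.0. Run against the complete file, the two totals could be compared with the figures quoted above.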

* Format characters include U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.

New Blocks

Text Changes and Additions

Numbers indicate the chapter or section in the Unicode 6.0 core specification where there are some significant changes or additions. This list is not exhaustive. Select changes for Chapter 3, Conformance, are listed separately under E. Conformance Changes. Many figures have been updated or added throughout.

E. Conformance Changes

There are several changes to conformance requirements in Unicode 6.0 that impact implementations. The most important of these are:

F. Unicode Character Database Changes

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.0 can be found in UAX #44, Unicode Character Database. The changes listed there include a number of important property revisions to existing characters that will affect implementations.

G. Unicode Standard Annex Changes

In Version 6.0, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex	Changes
UAX #9 Unicode Bidirectional Algorithm	Added informative text on alternative ways of detecting paragraph direction and recommendations for conversion to plain text
UAX #11 East Asian Width	No significant changes in this version
UAX #14 Unicode Line Breaking Algorithm	Incorporated fix for Corrigendum #7; revised the description of the SHY character; removed Ideographic Space from the list of spaces that may be compressed or expanded
UAX #15 Unicode Normalization Forms	Restructured for better document flow; corrected definitions of classes of characters in the Composition Exclusion Table; corrected statement of the algorithm for guaranteeing process stability
UAX #24 Unicode Script Property	Added discussion of multiple script values; added documentation regarding the new provisional data file ScriptExtensions.txt
UAX #29 Unicode Text Segmentation	Updated the Default Grapheme Cluster Boundary specification for Thai and Lao, and added an informative note on tailoring aksaras
UAX #31 Unicode Identifier and Pattern Syntax	Added new scripts to the tables categorizing script usage; clarified text in Section 2.3, Layout and Format Control Characters; restructured tables; updated discussion of case folding
UAX #34 Unicode Named Character Sequences	Clarified the scope of use for character sequence notation and the format used in the data files
UAX #38 Unicode Han Database (Unihan)	Added a history section; clarified the status of on and kun Japanese readings; provided URI for interactive access to the contents of the Unihan database; updated the regular expressions and descriptions for various Unihan data fields
UAX #41 Common References for Unicode Standard Annexes	No significant changes in this version
UAX #42 Unicode Character Database in XML	Added attributes for new properties and values; updated the patterns for Unihan properties
UAX #44 Unicode Character Database	Added tables listing Deprecated and Stabilized properties; updated the Matching Rules; added documentation for new properties and data files; many other clarifications regarding particular properties

Released: 2010 October 11 (Announcement)