Unicode 3.0.0
Version 3.0.0 has been superseded by the latest version of the Unicode Standard.
Version 3.0.0 of the Unicode Standard consists of
the core specification (The Unicode Standard, Version 3.0),
the code charts for this version (currently available only
in hard copy),
five Unicode Technical Reports,
and the 3.0 Update of the Unicode Character Database (UCD).
The core specification gives the general principles,
requirements for conformance, and guidelines for implementers. The
code charts show representative glyphs for all the Unicode
characters. The Unicode Technical Reports supply detailed
information about particular aspects of the standard. The Unicode
Character Database supplies the normative and informative data that
implementers need to implement the Unicode Standard.
A complete specification of the contributory files for Unicode 3.0.0
is found on the page
Components for 3.0.0. That page also provides the recommended
reference format for this version of the Unicode Standard.
Online Edition
The text of The Unicode Standard, Version 3.0
(ISBN 0-201-61633-5) is available online via the navigation links
on this page, with the exception of the code charts and the Han
radical-stroke indices. A slightly modified
HTML version of Chapter 1 has also been provided. Printing from the PDF files has
been disabled. Normative references to the Unicode Standard, Version 3.0
should use the printed edition.
Overview
Unicode 3.0.0 is a major version
of the Unicode Standard and supersedes all previous versions. This page summarizes
the important changes in the Unicode Standard, Version 3.0.0. In the
discussion below, the shortened references "Unicode 3.0" and "Version 3.0"
refer specifically to Version 3.0.0.
The core specification, The Unicode Standard, Version 3.0,
contains descriptions and properties for many new characters. It is
synchronized with ISO/IEC 10646-1 second edition. The text of the standard
has been extensively rewritten to improve its structure and clarity.
Unicode 3.0 also includes enhanced implementation guidelines, and
has been reorganized to describe related scripts within separate
chapters. In addition to new characters, there are significant
clarifications or modifications to character semantics from Unicode
2.0 to Unicode 3.0.
The vast majority of implementations of earlier versions will
be conformant to Unicode 3.0.0 once the character properties for
their supported characters are updated to
Version 3.0.0 of the Unicode Character Database.
The most significant additions to the standard include the
following:
- Transformation Formats. The precise definitions of the common
Unicode Transformation Formats are provided, including UTF-8,
UTF-16, UTF-16BE, and UTF-16LE. The relations between abstract
characters, code points (scalar values) and code units (8, 16 or
32 bit) are clarified.
- Bidirectional properties. Bidirectional properties are now
more consistent with the general category property, and new
bidirectional properties were created. See
UTR #9: The
Bidirectional Algorithm.
- Case. Case properties have been extended for those situations
where there is a mapping to multiple characters and where case is
locale dependent.
- Combining classes. These were updated significantly to resolve
problems of normalization and decomposition for Indic scripts in
particular.
- Decomposition and Composition. Unicode character
decompositions have been significantly updated to fix errors in
the original assignments, to allow correct collation weighting,
and to make decompositions consistent for normalization. Certain
characters are excluded from composition, and the precise
algorithm for composition is provided. See
UTR #15: Unicode
Normalization Forms.
- General Category. A series of general category changes was
made to assist the convergence of the Unicode definition of
identifier with ISO TR 10176.
- Newlines. Line handling characteristics have been documented
more fully for Unicode environments. See
UTR #13: Unicode
Newline Guidelines.
- Quotation Marks. Two new punctuation categories, Pi and Pf,
were created for initial and final quotes with properties that
vary by language.
- Linebreak properties. Linebreaking properties (normative and
informative) are added to the standard to support consistent
linebreaking behavior over all Unicode characters. See
UTR #14: Line
Breaking Properties.
- East-Asian width properties. Properties for supporting correct
choice of full-width vs. half-width glyphs in an East Asian
context are provided. See
UTR #11: East Asian
Width.
- Specific Characters:
  - The use of the byte order mark with transformation formats is clarified.
  - Use of line and paragraph separators is clarified.
  - Capital letters with iota adscript. The representative
    glyphs, semantics, case mappings and decompositions have been
    revised to make their handling more consistent.
  - Consonant RA rules have been updated and expanded to cover Eyelash Ra.
  - U+2007 FIGURE SPACE is no longer treated like
    a numeric separator for purposes of bidirectional layout.
  - The description of layout controls was
    enhanced to include the behavior of U+00A0 NO-BREAK SPACE,
    U+00AD SOFT HYPHEN, and zero-width spaces.
  - The use of U+007E TILDE as a spacing clone of
    combining tilde and as a regular character is described more
    completely.
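As a rough illustration of the relation between a code point and its code units in the transformation formats named above, the sketch below serializes one character as UTF-8, UTF-16BE, and UTF-16LE. It relies on Python's built-in codecs, which is an assumption of this example and not something this page specifies. The helper name code_units and the sample character U+20AC EURO SIGN are hypothetical choices.

```python
import codecs


def code_units(text, encoding, unit_bytes):
    """Encode text and return its code units as uppercase hex strings.

    Each unit is shown in serialized byte order, so a 16-bit unit in
    UTF-16LE appears with its low-order byte first.
    """
    data = text.encode(encoding)
    return [data[i:i + unit_bytes].hex().upper()
            for i in range(0, len(data), unit_bytes)]


ch = "\u20AC"  # U+20AC EURO SIGN, an arbitrary example character

print(f"code point: U+{ord(ch):04X}")
print("UTF-8   :", code_units(ch, "utf-8", 1))      # ['E2', '82', 'AC']
print("UTF-16BE:", code_units(ch, "utf-16-be", 2))  # ['20AC']
print("UTF-16LE:", code_units(ch, "utf-16-le", 2))  # ['AC20']

# With the unmarked "utf-16" codec, a leading byte order mark tells the
# decoder which byte order follows; the mark itself is not returned.
marked = codecs.BOM_UTF16_BE + ch.encode("utf-16-be")
print(marked.decode("utf-16") == ch)  # True
```

The same code point yields three 8-bit code units in UTF-8 but a single 16-bit code unit in either UTF-16 form, and the two UTF-16 forms differ only in byte order.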
New Characters
The new characters added to Unicode 3.0 are summarized in the following
table:
Unicode 3.0 Summary
Category | V 2.1 | V 3.0
Alphabetics, Symbols | 6511 | 10236
CJK Ideographs | 21204 | 27786
Hangul Syllables | 11172 | 11172
Total assigned characters | 38887 | 49194
Private Use | 6400 | 6400
Surrogates | 2048 | 2048
Controls | 65 | 65
Not Characters | 2 | 2
Total assigned 16-bit code values | 47402 | 57709
Unassigned 16-bit code values | 18134 | 7827
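The totals in the summary table follow from its other rows by simple arithmetic. The sketch below recomputes them from the figures given above, assuming a 16-bit code space of 65536 values. The dictionary and function names are hypothetical.

```python
# Figures copied from the summary table as (V 2.1, V 3.0) pairs.
characters = {
    "Alphabetics, Symbols": (6511, 10236),
    "CJK Ideographs": (21204, 27786),
    "Hangul Syllables": (11172, 11172),
}
other_assignments = {
    "Private Use": (6400, 6400),
    "Surrogates": (2048, 2048),
    "Controls": (65, 65),
    "Not Characters": (2, 2),
}
CODE_SPACE = 2 ** 16  # number of 16-bit code values


def totals(column):
    """Return (assigned characters, assigned code values, unassigned code values)."""
    assigned_chars = sum(row[column] for row in characters.values())
    assigned_values = assigned_chars + sum(
        row[column] for row in other_assignments.values())
    return assigned_chars, assigned_values, CODE_SPACE - assigned_values


print(totals(0))  # V 2.1: (38887, 47402, 18134)
print(totals(1))  # V 3.0: (49194, 57709, 7827)
```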
Besides adding characters to existing blocks, Unicode 3.0 adds a
number of new blocks. They are listed below with the number of
code points allocated to each block. For a list of all the blocks in
Unicode 3.0, see
Blocks.txt.
New Blocks
Number | Block Name
80 | Syriac
64 | Thaana
128 | Sinhala
160 | Myanmar
384 | Ethiopic
96 | Cherokee
640 | Unified Canadian Aboriginal Syllabics
32 | Ogham
96 | Runic
128 | Khmer
176 | Mongolian
256 | Braille Patterns
128 | CJK Radicals Supplement
224 | Kangxi Radicals
16 | Ideographic Description Characters
32 | Bopomofo Extended
6582 | CJK Unified Ideographs Extension A
1168 | Yi Syllables
64 | Yi Radicals
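The number of code points in a block is its last code point minus its first, plus one. The sketch below derives a few of the counts above from block ranges. It assumes a semicolon-separated "start; end; name" line layout, which should be checked against the actual Blocks.txt. The sample ranges are supplied for illustration and are not taken from this page, and the function name block_sizes is hypothetical.

```python
# Sample lines in an assumed "start; end; name" layout; the ranges are
# supplied for this example and are not quoted from this page.
SAMPLE = """\
# start; end; block name
0700; 074F; Syriac
1680; 169F; Ogham
16A0; 16FF; Runic
2800; 28FF; Braille Patterns
A490; A4CF; Yi Radicals
"""


def block_sizes(text):
    """Map each block name to the number of code points in its range."""
    sizes = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        start, end, name = (field.strip() for field in line.split(";"))
        sizes[name] = int(end, 16) - int(start, 16) + 1
    return sizes


for name, size in block_sizes(SAMPLE).items():
    print(f"{size:5d}  {name}")
```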
Conformance Changes
Conformance clauses, definitions, and explanatory text were added
for handling Unicode Transformation Formats. The rules of the Unicode
Bidirectional Algorithm were clarified and expanded,
and new bidirectional character properties were documented. Other
normative character property values were changed; see the Unicode
Character Database files for more information.
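Property values of the kinds discussed on this page can be inspected programmatically. The sketch below is a minimal example using Python's standard unicodedata module, which is an assumption of this example. That module exposes the Unicode Character Database version bundled with the interpreter (reported by unicodedata.unidata_version), not Version 3.0.0, so values for some characters may differ from the 3.0 data files. The sample characters are arbitrary choices.

```python
import unicodedata

# The bundled UCD version is newer than 3.0.0 on current interpreters.
print("UCD version in use:", unicodedata.unidata_version)

# General category: Pi and Pf mark initial and final quotation marks.
print(unicodedata.category("\u00AB"))  # Pi  (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
print(unicodedata.category("\u00BB"))  # Pf  (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK)

# Bidirectional class, canonical combining class, East Asian width.
print(unicodedata.bidirectional("\u05D0"))     # R   (HEBREW LETTER ALEF)
print(unicodedata.combining("\u0301"))         # 230 (COMBINING ACUTE ACCENT)
print(unicodedata.east_asian_width("\uFF21"))  # F   (FULLWIDTH LATIN CAPITAL LETTER A)
print(unicodedata.east_asian_width("\uFF71"))  # H   (HALFWIDTH KATAKANA LETTER A)

# Decomposition and composition through the normalization forms.
decomposed = unicodedata.normalize("NFD", "\u00E9")  # e + combining acute
print([f"U+{ord(c):04X}" for c in decomposed])       # ['U+0065', 'U+0301']
print(unicodedata.normalize("NFC", decomposed) == "\u00E9")  # True

# A case mapping from one character to several.
print("\u00DF".upper())  # SS
```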
Unicode Technical Reports
The following technical reports are approved and considered part
of the Unicode Standard, Version 3.0. These reports may contain
either normative or informative material, or both. Any reference to
version 3.0 of the standard automatically includes these technical
reports.