<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<link rel="stylesheet" href="http://www.unicode.org/unicode.css" type="text/css">
<title>Unicode Character Database</title>
</head>

<body>

<h1>UNICODE CHARACTER DATABASE<br>
Version 3.0.0</h1>
<table border="1" cellspacing="2" cellpadding="0" height="87" width="100%">
  <tr>
    <td valign="TOP" width="144">Revision</td>
    <td valign="TOP">3.0.0</td>
  </tr>
  <tr>
    <td valign="TOP" width="144">Authors</td>
    <td valign="TOP">Mark Davis and Ken Whistler</td>
  </tr>
  <tr>
    <td valign="TOP" width="144">Date</td>
    <td valign="TOP">1999-09-11</td>
  </tr>
  <tr>
    <td valign="TOP" width="144">This Version</td>
    <td valign="TOP"><a href="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html</a></td>
  </tr>
  <tr>
    <td valign="TOP" width="144">Previous Version</td>
    <td valign="TOP">n/a</td>
  </tr>
  <tr>
    <td valign="TOP" width="144">Latest Version</td>
    <td valign="TOP"><a href="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html</a></td>
  </tr>
</table>
<p align="center">Copyright © 1995-1999 Unicode, Inc. All Rights reserved.</p>

<h2>Disclaimer</h2>
<p>The Unicode Character Database is provided as is by Unicode, Inc. No claims
are made as to fitness for any particular purpose. No warranties of any kind are
expressed or implied. The recipient agrees to determine applicability of
information provided. If this file has been purchased on magnetic or optical
media from Unicode, Inc., the sole remedy for any claim will be exchange of
defective media within 90 days of receipt.</p>
<p>This disclaimer is applicable for all other data files accompanying the
Unicode Character Database, some of which have been compiled by the Unicode
Consortium, and some of which have been supplied by other sources.</p>

<h2>Limitations on Rights to Redistribute This Data</h2>
<p>Recipient is granted the right to make copies in any form for internal
distribution and to freely use the information supplied in the creation of
products supporting the Unicode<sup>TM</sup> Standard. The files in the Unicode
Character Database can be redistributed to third parties or other organizations
(whether for profit or not) as long as this notice and the disclaimer notice are
retained. Information can be extracted from these files and used in
documentation or programs, as long as there is an accompanying notice indicating
the source.</p>

<h2>Introduction</h2>
<p>The Unicode Character Database is a set of files that define the Unicode
character properties and internal mappings. For more information about character
properties and mappings, see <i><a href="http://www.unicode.org/unicode/uni2book/u2.html">The
Unicode Standard</a></i>.</p>
<p>The Unicode Character Database has been updated to reflect Version 3.0 of the
Unicode Standard, with many characters added to those published in Version 2.0.
Corrections have also been made to case mappings and to other errors in the
database noted since the publication of Version 2.0. Normative bidirectional
properties have also been modified to reflect decisions of the Unicode Technical
Committee.</p>
<p>For more information on versions of the Unicode Standard and how to reference
them, see <a href="http://www.unicode.org/unicode/standard/versions/">http://www.unicode.org/unicode/standard/versions/</a>.</p>

<h2>Conformance</h2>
<p>Character properties may be either normative or informative. <i>Normative</i>
means that implementations that claim conformance to the Unicode Standard (at a
particular version) and that make use of a particular property or field must
follow the specifications of the standard for that property or field in order to
be conformant. The term <i>normative</i>, when applied to a property or field of
the Unicode Character Database, does <i>not</i> mean that the value of that
field will never change. Corrections and extensions to the standard in the
future may require minor changes to normative values, even though the Unicode
Technical Committee strives to minimize such changes. An <i>informative</i> property
or field is strongly recommended, but a conformant implementation is free to use
or change such values as it may require while still being conformant to the
standard. Particular implementations may choose to override the properties and
mappings that are not normative. In that case, it is up to the implementer to
establish a protocol to convey that information.</p>

<h2>Files</h2>
<p>The following summarizes the files in the Unicode Character Database. For
more information about these files, see the referenced technical report or
section of the Unicode Standard, Version 3.0.</p>

<p><b>UnicodeData.txt (Chapter 4)</b>
<ul>
  <li>The main file in the Unicode Character Database.</li>
  <li>For detailed information on the format, see <a href="UnicodeData.html">UnicodeData.html</a>.
    That document also identifies which properties are normative and which are
    informative.</li>
</ul>

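<p>As a minimal sketch, assuming the semicolon-separated record layout that
UnicodeData.html documents (code point first, then character name, then general
category), one record could be split into fields as shown here. The names
<code>CharRecord</code> and <code>parse_unicode_data_line</code> are
illustrative and are not part of the database; UnicodeData.html remains the
authoritative description of the fields.</p>

```python
from typing import NamedTuple


class CharRecord(NamedTuple):
    """First three fields of a UnicodeData.txt record (illustrative subset)."""
    code_point: int
    name: str
    general_category: str


def parse_unicode_data_line(line: str) -> CharRecord:
    # Assumption: fields are separated by ';' and the code point is hexadecimal.
    fields = line.rstrip("\n").split(";")
    return CharRecord(int(fields[0], 16), fields[1], fields[2])


# Sample record for U+0041, written out by hand for this sketch.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(sample)
print(record)
```

<p>The remaining fields are left unparsed here; a real reader would map each of
them according to UnicodeData.html.</p>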
<p><b>PropList.txt (Chapter 4)</b>
<ul>
  <li>List of additional informative properties: <i>Alphabetic, Ideographic,</i>
    and <i>Mathematical</i>, among others.</li>
</ul>

<p><b>SpecialCasing.txt (Chapter 4)</b>
<ul>
  <li>List of informative special casing properties, including one-to-many
    mappings such as SHARP S =&gt; &quot;SS&quot;, and locale-specific mappings,
    such as for Turkish <i>dotless i</i>.</li>
</ul>

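<p>The two kinds of mapping mentioned for SpecialCasing.txt can be sketched with
a small hand-written table. This is only an illustration: the table
<code>SPECIAL_UPPER</code>, the function <code>to_upper</code>, and the use of
a plain locale string such as <code>"tr"</code> are assumptions of this sketch,
and a real implementation would load its entries from SpecialCasing.txt.</p>

```python
# Hypothetical table: (character, locale or None) -> uppercase replacement.
SPECIAL_UPPER = {
    ("\u00df", None): "SS",   # LATIN SMALL LETTER SHARP S: one-to-many, any locale
    ("i", "tr"): "\u0130",    # Turkish: i -> LATIN CAPITAL LETTER I WITH DOT ABOVE
}


def to_upper(text, locale=None):
    """Uppercase text, preferring locale-specific entries over general ones."""
    out = []
    for ch in text:
        mapped = SPECIAL_UPPER.get((ch, locale))
        if mapped is None:
            mapped = SPECIAL_UPPER.get((ch, None))
        if mapped is None:
            # Fallback: Python's built-in, locale-independent mapping.
            mapped = ch.upper()
        out.append(mapped)
    return "".join(out)


print(to_upper("stra\u00dfe"))
print(to_upper("i", locale="tr"))
```

<p>Looking up the locale-specific entry first keeps the Turkish rule from
affecting text in other languages.</p>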
<p><b>Blocks.txt (Chapter 14)</b>
<ul>
  <li>List of normative block names.</li>
</ul>

<p><b>Jamo.txt (Chapter 4)</b>
<ul>
  <li>List of normative Jamo short names, used in deriving HANGUL SYLLABLE names
    algorithmically.</li>
</ul>

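<p>The algorithmic derivation of HANGUL SYLLABLE names can be sketched as
follows, assuming the arithmetic decomposition of a syllable into leading
consonant, vowel, and trailing consonant indices given in the Unicode Standard.
The short-name tables are hard-coded here for illustration and follow the names
as the author of this sketch understands them; in practice they would be read
from Jamo.txt, and individual values may differ between versions of the
database. The function name <code>hangul_syllable_name</code> is
hypothetical.</p>

```python
S_BASE = 0xAC00                      # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT          # syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT          # total precomposed syllables

# Jamo short names (assumed values; load from Jamo.txt in real code).
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(code_point):
    """Build a syllable name by concatenating the three Jamo short names."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index = s_index // N_COUNT
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    return "HANGUL SYLLABLE " + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]


print(hangul_syllable_name(0xD4DB))
```

<p>Because the names are computed, they need not be stored individually for
each syllable.</p>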
<p><b>ArabicShaping.txt (Section 8.2)</b>
<ul>
  <li>Basic Arabic and Syriac character shaping properties, such as initial,
    medial and final shapes. These properties are normative for minimal shaping
    of Arabic and Syriac.</li>
</ul>

<p><b>NamesList.txt (Chapter 14)</b>
<ul>
  <li>This file duplicates some of the material in the UnicodeData file, and
    adds informative annotations used in the character charts, as printed in the
    Unicode Standard.</li>
  <li><b>Note: </b>The information in the NamesList.txt and Index.txt files matches
    the appropriate version of the book. Changes in the Unicode Character
    Database since then may not be reflected in these files, since they are
    primarily of archival interest.</li>
</ul>

<p><b>Index.txt (Chapter 14)</b>
<ul>
  <li>Informative index to Unicode characters, as printed in the Unicode
    Standard.</li>
  <li><b>Note: </b>The information in the NamesList.txt and Index.txt files matches
    the appropriate version of the book. Changes in the Unicode Character
    Database since then may not be reflected in these files, since they are
    primarily of archival interest.</li>
</ul>

<p><b>CompositionExclusions.txt (<a href="http://www.unicode.org/unicode/reports/tr15/">UTR
#15: Unicode Normalization Forms</a>)</b>
<ul>
  <li>Normative properties for normalization.</li>
</ul>

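<p>The effect of a composition exclusion can be observed with Python's standard
<code>unicodedata</code> module, as sketched here. Note the assumption: that
module reflects whichever version of the Unicode Character Database ships with
the interpreter, not necessarily Version 3.0.0. The example relies on U+0958
DEVANAGARI LETTER QA being excluded from composition, so it stays decomposed
under NFC, whereas an ordinary precomposed letter recomposes.</p>

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA decomposes to U+0915 U+093C.
qa = "\u0958"
decomposed = unicodedata.normalize("NFD", qa)
# Because U+0958 is a composition exclusion, NFC does not put it back together.
recomposed = unicodedata.normalize("NFC", decomposed)

# For contrast, e + COMBINING ACUTE ACCENT does recompose to U+00E9.
e_acute = unicodedata.normalize("NFC", "e\u0301")

print([f"U+{ord(c):04X}" for c in recomposed])
print([f"U+{ord(c):04X}" for c in e_acute])
```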
<p><b>LineBreak.txt (<a href="http://www.unicode.org/unicode/reports/tr14/">UTR
#14: Line Breaking Properties</a>)</b>
<ul>
  <li>Normative and informative properties for line breaking. To see which
    properties are informative and which are normative, consult UTR #14.</li>
</ul>

<p><b>EastAsianWidth.txt (<a href="http://www.unicode.org/unicode/reports/tr11/">UTR
#11: East Asian Character Width</a>)</b>
<ul>
  <li>Informative properties for determining the choice of wide vs. narrow
    glyphs in East Asian contexts.</li>
</ul>

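<p>As a rough illustration of how the width property might be used, the sketch
below queries Python's standard <code>unicodedata</code> module, which reflects
the database version bundled with the interpreter rather than Version 3.0.0.
The rule in <code>display_columns</code> (two columns for Wide and Fullwidth
characters, one otherwise) is an assumed convention for this sketch, not
something the database prescribes; it ignores combining marks and ambiguous
widths.</p>

```python
import unicodedata


def display_columns(text):
    """Estimate columns: 2 for Wide (W) or Fullwidth (F), else 1 (assumed rule)."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Print the width class of a few sample characters.
for ch in ("A", "\u4e00", "\uff21", "\uff76"):
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch))

print(display_columns("A\u4e00"))
```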
<p><b>diffXvY.txt</b>
<ul>
  <li>Mechanically generated informative files containing accumulated
    differences between successive versions of UnicodeData.txt.</li>
</ul>

</body>

</html>