Unicode 6.0 Web Bookmarks

About this page

This page contains hyperlinks to The Unicode Standard, Version 6.0. The Unicode 6.0.0 page lists the contents with links to each PDF file.

Preface Why Unicode? What's New? Support for Languages and Symbol Sets Conformance Updates Important Property and Behavioral Updates Block Descriptions Data File Descriptions and Updates Stability Policy Updates CJK Industry Standards Detailed Change Information Organization of This Standard Concepts, Architecture, Conformance, and Guidelines Character Block Descriptions Code Charts Appendices References and Index Glossary and Character Index Unicode Standard Annexes The Unicode Character Database Unicode Code Charts Unicode Technical Standards and Unicode Technical Reports Updates and Errata Acknowledgements
1 Introduction Figure 1-1. Wide ASCII
1.1 Coverage Standards Coverage New Characters
1.2 Design Goals Figure 1-2. Unicode Compared to the 2022 Framework
1.3 Text Handling Characters and Glyphs Text Elements
2 General Structure
2.1 Architectural Context Basic Text Processes Text Elements, Characters, and Text Processes Figure 2-1. Text Elements and Characters Text Processes and Encoding Character Identity
2.2 Unicode Design Principles Table 2-1. The 10 Unicode Design Principles Universality Efficiency Characters, Not Glyphs Figure 2-2. Characters Versus Glyphs Table 2-2. User-Perceived Characters with Multiple Code Points Figure 2-3. Unicode Character Code to Rendered Glyphs Semantics Plain Text Logical Order Figure 2-4. Bidirectional Ordering Figure 2-5. Writing Direction and Numbers Unification Figure 2-6. Typeface Variation for the Bone Character Dynamic Composition Figure 2-7. Dynamic Composition Equivalent Sequences Stability Convertibility
2.3 Compatibility Characters Usage Allocation Compatibility Variants Compatibility Decomposable Characters Compatibility Character Versus Compatibility Decomposable Character
2.4 Code Points and Characters Figure 2-8. Abstract and Encoded Characters Types of Code Points Table 2-3. Types of Code Points Control Codes Noncharacters Private Use Surrogates Restricted Interchange Code Point Semantics
2.5 Encoding Forms Non-overlap Figure 2-9. Overlap in Legacy Mixed-Width Encodings Figure 2-10. Boundaries and Interpretation Conformance Examples Figure 2-11. Unicode Encoding Forms UTF-32 Fixed Width Preferred Usage UTF-16 Optimized for BMP Supplementary Characters and Surrogates Preferred Usage Origin Collation UTF-8 Byte-Oriented Variable Width ASCII Transparency Preferred Usage Self-synchronizing Comparison of the Advantages of UTF-32, UTF-16, and UTF-8 UTF-32 Versus UTF-16 Characters Versus Code Points UTF-8 Binary Sorting
2.6 Encoding Schemes Byte Order Table 2-4. The Seven Unicode Encoding Schemes Encoding Scheme Versus Encoding Form Examples Figure 2-12. Unicode Encoding Schemes
2.7 Unicode Strings
2.8 Unicode Allocation Planes Basic Multilingual Plane Supplementary Multilingual Plane Supplementary Ideographic Plane Supplementary Special-purpose Plane Private Use Planes Allocation Areas and Character Blocks Allocation Areas Blocks Allocation Order Assignment of Code Points
2.9 Details of Allocation Figure 2-13. Unicode Allocation Plane 0 (BMP) Figure 2-14. Allocation on the BMP ASCII and Latin-1 Compatibility Area General Scripts Area Punctuation and Symbols Area Supplementary General Scripts Area CJK Miscellaneous Area CJKV Ideographs Area General Scripts Area (Asia and Africa) Hangul Area Surrogates Area Private Use Area Compatibility and Specials Area Plane 1 (SMP) Figure 2-15. Allocation on Plane 1 General Scripts Areas General Scripts Areas (RTL) Cuneiform and Hieroglyphic Area Ideographic Scripts Area Symbols Areas Plane 2 (SIP) Other Planes
2.10 Writing Direction Figure 2-16. Writing Directions Bidirectional Vertical Boustrophedon Other Historical Directionalities
2.11 Combining Characters Combining Characters Diacritics Symbol Diacritics Enclosing Combining Marks Figure 2-17. Combining Enclosing Marks for Symbols Script-Specific Combining Characters Sequence of Base Characters and Diacritics Figure 2-18. Sequence of Base Characters and Diacritics Ordering Indic Vowel Signs Figure 2-19. Reordered Indic Vowel Signs Properties Figure 2-20. Properties and Combining Character Sequences Multiple Combining Characters Figure 2-21. Stacking Sequences Table 2-5. Interaction of Combining Characters Table 2-6. Nondefault Stacking Ligated Multiple Base Characters Figure 2-22. Ligated Multiple Base Characters Exhibiting Nonspacing Marks in Isolation "Characters" and Grapheme Clusters
2.12 Equivalent Sequences and Normalization Figure 2-23. Equivalent Sequences Normalization Figure 2-24. Canonical Ordering Decompositions Types of Decomposables Examples Figure 2-25. Types of Decomposables Non-decomposition of Overlaid Diacritics Security Issue
2.13 Special Characters and Noncharacters Special Noncharacter Code Points Byte Order Mark (BOM) Unicode Signature Layout and Format Control Characters The Replacement Character Control Codes
2.14 Conforming to the Unicode Standard Characteristics of Conformant Implementations Unacceptable Behavior Acceptable Behavior Supported Subsets
3 Conformance
3.1 Versions of the Unicode Standard Stability Version Numbering Major and Minor Versions Update Version Errata and Corrigenda Errata Corrigenda References to the Unicode Standard Precision in Version Citation References to Unicode Character Properties References to Unicode Algorithms
3.2 Conformance Requirements Code Points Unassigned to Abstract Characters Interpretation Modification Character Encoding Forms Character Encoding Schemes Bidirectional Text Normalization Forms Normative References Unicode Algorithms Default Casing Algorithms Unicode Standard Annexes
3.3 Semantics Definitions Character Identity and Semantics
3.4 Characters and Encoding Table 3-1. Named Unicode Algorithms
3.5 Properties Types of Properties Property Values Classification of Properties by Their Values Property Status Table 3-2. Normative Character Properties Table 3-3. Informative Character Properties Context Dependence Stability of Properties Simple and Derived Properties Property Aliases Private Use
3.6 Combination Combining Character Sequences Grapheme Clusters Application of Combining Marks Figure 3-1. Enclosing Marks Combining Marks and Korean Syllables
3.7 Decomposition Compatibility Decomposition Canonical Decomposition
3.8 Surrogates
3.9 Unicode Encoding Forms Table 3-4. Examples of Unicode Encoding Forms UTF-32 UTF-16 Table 3-5. UTF-16 Bit Distribution UTF-8 Table 3-6. UTF-8 Bit Distribution Table 3-7. Well-Formed UTF-8 Byte Sequences Encoding Form Conversion Constraints on Conversion Processes Best Practices for Using U+FFFD Table 3-8. Use of U+FFFD in UTF-8 Conversion
3.10 Unicode Encoding Schemes Table 3-9. Summary of UTF-16BE, UTF-16LE, and UTF-16 Table 3-10. Summary of UTF-32BE, UTF-32LE, and UTF-32
3.11 Normalization Forms Normalization Stability Combining Classes Specification of Unicode Normalization Forms Starters Table 3-11. Combining Marks and Starter Status Canonical Ordering Algorithm Table 3-12. Reorderable Pairs Canonical Composition Algorithm Definition of Normalization Forms
3.12 Conjoining Jamo Behavior Definitions Hangul Syllable Boundary Determination Table 3-13. Hangul Syllable No-Break Rules Standard Korean Syllables Transforming into Standard Korean Syllables Examples. Table 3-14. Korean Syllable Break Examples Hangul Syllable Composition Example Hangul Syllable Decomposition Example Hangul Syllable Name Generation
3.13 Default Case Algorithms Tailoring Definitions Table 3-15. Context Specification for Casing Default Case Conversion Default Case Folding Default Case Detection Table 3-16. Case Detection Examples Default Caseless Matching
4 Character Properties Status and Attributes Consistency of Properties
4.1 Unicode Character Database Unihan Database Stability Aliases UCD in XML Online Availability
4.2 Case Definitions of Case and Casing Table 4-1. Relationship of Casing Definitions Table 4-2. Case Function Values for Strings Case Mapping Table 4-3. Sources for Case Mapping Information
4.3 Combining Classes Figure 4-1. Positions of Common Combining Marks Reordrant, Split, and Subjoined Combining Marks Reordrant Class Zero Combining Marks Table 4-4. Class Zero Combining Marks—Reordrant Table 4-5. Thai, Lao, and Tai Viet Logical Order Exceptions Split Class Zero Combining Marks Table 4-6. Class Zero Combining Marks—Split Subjoined Class Zero Combining Marks Table 4-7. Class Zero Combining Marks—Subjoined Strikethrough Class Zero Combining Marks Table 4-8. Class Zero Combining Marks—Strikethrough
4.4 Directionality
4.5 General Category Table 4-9. General Category
4.6 Numeric Value Decimal Digits Script-Specific Digits Ideographic Numeric Values Table 4-10. Primary Numeric Ideographs Table 4-11. Ideographs Used as Accounting Numbers
4.7 Bidi Mirrored
4.8 Name Stability Character Name Syntax Names as Identifiers Character Name Matching Named Character Sequences Character Name Aliases Unicode Name Property Formal Definition of the Name Property Name Uniqueness Interpretation of Field 1 of UnicodeData.txt Control Codes Code Point Labels Table 4-12. Construction of Code Point Labels Use of Character Names in APIs and User Interfaces Use in APIs User Interfaces
4.9 Unicode 1.0 Names
4.10 Letters, Alphabetic, and Ideographic Letters and Syllables Alphabetic Ideographic
4.11 Properties Related to Text Boundaries
4.12 Characters with Unusual Properties Table 4-13. Unusual Properties
5 Implementation Guidelines
5.1 Data Structures for Character Conversion Issues Multistage Tables Flat Tables. Ranges Two-Stage Tables Figure 5-1. Two-Stage Tables Optimized Two-Stage Table Multistage Table Tuning
5.2 Programming Languages and Data Types Unicode Data Types for C ANSI/ISO C wchar_t
5.3 Unknown and Missing Characters Reserved and Private-Use Character Codes Interpretable but Unrenderable Characters Default Property Values Default Ignorable Code Points Interacting with Downlevel Systems
5.4 Handling Surrogate Pairs in UTF-16 Strategies for Surrogate Pair Support
5.5 Handling Numbers Figure 5-2. CJK Ideographic Numbers
5.6 Normalization Alternative Spellings Normalization Figure 5-3. Normalization
5.7 Compression
5.8 Newline Guidelines Definitions Table 5-1. Hex Values for Acronyms Encoding Notation EBCDIC Newline Function Table 5-2. NLF Platform Correlations Line Separator and Paragraph Separator Recommendations Converting from Other Character Code Sets Interpreting Characters in Text Converting to Other Character Code Sets Input and Output Page Separator
5.9 Regular Expressions
5.10 Language Information in Plain Text Requirements for Language Tagging Language Tags and Han Unification Typical Scenarios
5.11 Editing and Selection Consistent Text Elements Cluster Boundaries Figure 5-4. Consistent Character Boundaries Stacked Boundaries Atomic Character Boundaries. Linear Boundaries Nonlinear Boundaries
5.12 Strategies for Handling Nonspacing Marks Rendering Other Processes Keyboard Input Figure 5-5. Dead Keys Versus Handwriting Sequence Truncation Figure 5-6. Truncating Grapheme Clusters
5.13 Rendering Nonspacing Marks Figure 5-7. Inside-Out Rule Fallback Rendering Figure 5-8. Fallback Rendering Bidirectional Positioning Figure 5-9. Bidirectional Placement Justification Figure 5-10. Justification Canonical Equivalence Table 5-3. Typing Order Differing from Canonical Order Table 5-4. Permuting Combining Class Weights Positioning Methods Positioning with Ligatures Figure 5-11. Positioning with Ligatures Positioning with Contextual Forms Figure 5-12. Positioning with Contextual Forms Positioning with Enhanced Kerning Figure 5-13. Positioning with Enhanced Kerning
5.14 Locating Text Element Boundaries
5.15 Identifiers
5.16 Sorting and Searching Culturally Expected Sorting and Searching Language-Insensitive Sorting Searching Sublinear Searching Figure 5-14. Sublinear Searching
5.17 Binary Order UTF-8 in UTF-16 Order UTF-16 in UTF-8 Order
5.18 Case Mappings Titlecasing Complications for Case Mapping Change in Length Greek iota subscript Context-dependent Case Mappings Locale-dependent Case Mappings Figure 5-15. Uppercase Mapping for Turkish I Figure 5-16. Lowercase Mapping for Turkish I Caseless Characters German sharp s Figure 5-17. Casing of German Sharp S Reversibility Caseless Matching Stability Normalization and Casing Table 5-5. Casing and Normalization in Strings
5.19 Mapping Compatibility Variants Confusables
5.20 Unicode Security Alternate Encodings Spoofing
5.21 Default Ignorable Code Points Stateful Format Controls Table 5-6. Paired Stateful Controls Table 5-7. Paired Stateful Controls (Deprecated)
5.22 Best Practice for U+FFFD Substitution
6 Writing Systems and Punctuation Scripts and Blocks Scripts and Writing Systems Punctuation
6.1 Writing Systems Alphabets Abjads Syllabaries Abugidas Figure 6-1. Overriding Inherent Vowels Logosyllabaries Typology of Scripts in the Unicode Standard Table 6-1. Typology of Scripts in the Unicode Standard Notational Systems
6.2 General Punctuation Use and Interpretation Rendering Writing Direction Figure 6-2. Forms of CJK Punctuation Layout Controls Encoding Characters with Multiple Semantic Values Blocks Devoted to Punctuation Format Control Characters Space Characters Table 6-2. Unicode Space Characters Dashes and Hyphens Table 6-3. Unicode Dash Characters Soft Hyphen Tilde. Dictionary Abbreviation Symbols Paired Punctuation Mirroring of Paired Punctuation. Quotation Marks and Brackets Language-Based Usage of Quotation Marks European Usage Figure 6-3. European Quotation Marks East Asian Usage Table 6-4. East Asian Quotation Marks Glyph Variation. Figure 6-4. Asian Quotation Marks Table 6-5. Opening and Closing Forms Overloaded Character Codes Consequences for Semantics Apostrophes Letter Apostrophe Punctuation Apostrophe Other Punctuation Hyphenation Point Word Separator Middle Dot Fraction Slash Spacing Overscores and Underscores Doubled Punctuation Period or Full Stop Ellipsis Vertical Ellipsis Leader Dots Other Basic Latin Punctuation Marks Canonical Equivalence Issues for Greek Punctuation Bullets Paragraph Marks Numeric Separators. Commercial Minus At Sign Table 6-6. Names for the @ Archaic Punctuation and Editorial Marks Archaic Punctuation Editorial Marks New Testament Editorial Marks Ancient Greek Editorial Marks Figure 6-5. Examples of Ancient Greek Editorial Marks Figure 6-6. Use of Greek Paragraphos Double Oblique Hyphen Indic Punctuation Dandas Table 6-7. Unicode Danda Characters CJK Punctuation Figure 6-7. CJK Parentheses Sesame Dots Unknown or Unavailable Ideographs CJK Compatibility Forms Vertical Forms Styled Overscores and Underscores Small Form Variants Fullwidth and Halfwidth Variants
7 European Alphabetic Scripts
7.1 Latin Languages Diacritical Marks. Alternative Glyphs. Figure 7-1. Alternative Glyphs in Latin Variations in Diacritical Marks Latvian Cedilla Cedilla and Comma Below in Turkish and Romanian Exceptional Case Pairs Diacritics on i and j Figure 7-2. Diacritics on i and j Vietnamese Figure 7-3. Vietnamese Letters and Tone Marks Standards.
Related Characters Letters of Basic Latin: U+0041–U+007A Letters of the Latin-1 Supplement: U+00C0–U+00FF Languages Ordinals Latin Extended-A: U+0100–U+017F Compatibility Digraphs Languages Latin Extended-B: U+0180–U+024F Arrangement Croatian Digraphs Matching Serbian Cyrillic Letters Pinyin Diacritic–Vowel Combinations Case Pairs Caseless Letters Glottal Stop IPA Extensions: U+0250–U+02AF Standards Unifications IPA Alternates Case Pairs Typographic Variants Affricate Digraph Ligatures Arrangement Phonetic Extensions: U+1D00–U+1DBF Typographic Features of the UPA. Other Phonetic Extensions Digraph for th Latin Extended Additional: U+1E00–U+1EFF Capital Sharp S Vietnamese Vowel Plus Tone Mark Combinations Latin Extended-C: U+2C60–U+2C7F Uighur Claudian Letters Latin Extended-D: U+A720–U+A7FF Egyptological Transliteration Historic Mayan Letters European Medievalist Letters Insular and Celticist Letters Orthographic Letter Additions Latvian Letters Ancient Roman Epigraphic Letters Latin Ligatures: U+FB00–U+FB06
7.2 Greek Greek: U+0370–U+03FF Standards Polytonic Greek Nonspacing Marks Table 7-1. Nonspacing Marks Used with Greek Iota Variant Letterforms Figure 7-4. Variations in Greek Capital Letter Upsilon Representative Glyphs for Greek Phi Greek Letters as Symbols Symbols Versus Numbers Compatibility Punctuation Historic Letters Coptic-Unique Letters Related Characters Greek Extended: U+1F00–U+1FFF Spacing Diacritics Table 7-2. Greek Spacing and Nonspacing Pairs Ancient Greek Numbers: U+10140–U+1018F Acrophonic Numerals Other Numerical Symbols Symbol for Zero
7.3 Coptic Development of the Coptic Script Casing Font Styles Characters for Cryptogrammic Use Crossed Shei Supralineation Combining Diacritical Marks Punctuation Numerical Use of Letters Figure 7-5. Coptic Numerals
7.4 Cyrillic Historic Letterforms Glagolitic Cyrillic: U+0400–U+04FF Standards Extended Cyrillic Abkhasian Palochka Cyrillic Supplement: U+0500–U+052F Komi Kurdish Letters Cyrillic Extended-A: U+2DE0–U+2DFF Titlo Letters Cyrillic Extended-B: U+A640–U+A69F Numeric Enclosing Signs Old Abkhasian Letters
7.5 Glagolitic Glyph Forms Ordering Punctuation and Diacritics Numerical Use of Letters
7.6 Armenian Orthography User Community Punctuation Preferred Characters Ligatures
7.7 Georgian Script Forms Case Forms Mtavruli Style Figure 7-6. Georgian Scripts and Casing Punctuation Historic Punctuation
7.8 Modifier Letters Case and Modifier Letters General Category Blocks Names Spacing Modifier Letters: U+02B0–U+02FF Phonetic Usage Encoding Principles Superscript Letters Spacing Clones of Diacritics Rhotic Hook Tone Letters Figure 7-7. Tone Letters Modifier Tone Letters: U+A700–U+A71F
7.9 Combining Marks Sequence of Base Letters and Combining Marks Multiple Semantics Glyphic Variation Overlaid Diacritics Marks as Spacing Characters Spacing Clones of Diacritical Marks Relationship to ISO/IEC 8859-1 Diacritics Positioned Over Two Base Characters Figure 7-8. Double Diacritics Figure 7-9. Positioning of Double Diacritics Figure 7-10. Use of CGJ with Double Diacritics Combining Marks with Ligatures Figure 7-11. Interaction of Combining Marks with Ligatures Combining Diacritical Marks: U+0300–U+036F Standards Underlining and Overlining Combining Diacritical Marks Supplement: U+1DC0–U+1DFF Combining Marks for Symbols: U+20D0–U+20FF Figure 7-12. Use of Vertical Line Overlay for Negation Enclosing Marks Combining Half Marks: U+FE20–U+FE2F Figure 7-13. Double Diacritics and Half Marks Combining Marks in Other Blocks
8 Middle Eastern Scripts
8.1 Hebrew Hebrew: U+0590–U+05FF Directionality Cursive.
Standards Vowels and Other Marks of Pronunciation Shin and Sin Final (Contextual Variant) Letterforms Yiddish Digraphs Punctuation Cantillation Marks Positioning Meteg Atnah Hafukh and Qamats Qatan Holam Male and Holam Haser Puncta Extraordinaria Nun Hafukha Currency Symbol Alphabetic Presentation Forms: U+FB1D–U+FB4F Use of Wide Letters
8.2 Arabic Arabic: U+0600–U+06FF Figure 8-1. Directionality and Cursive Connection Directionality Standards Encoding Principles Punctuation The Non-joiner and the Joiner Figure 8-2. Using a Joiner Figure 8-3. Using a Non-joiner Figure 8-4. Combinations of Joiners and Non-joiners Harakat (Vowel) Nonspacing Marks Figure 8-5. Placement of Harakat Arabic-Indic Digits Table 8-1. Arabic Digit Names Table 8-2. Glyph Variation in Eastern Arabic-Indic Digits Extended Arabic Letters Koranic Annotation Signs Additional Vowel Marks Honorifics Arabic Mathematical Symbols Date Separator Full Stop Currency Symbols End of Ayah Other Signs Spanning Numbers Figure 8-6. Arabic Year Sign Poetic Verse Sign Arabic Cursive Joining Minimum Rendering Requirements Joining Types Table 8-3. Primary Arabic Joining Types Table 8-4. Derived Arabic Joining Types Joining Rules Table 8-5. Arabic Glyph Types Arabic Ligatures Ligature Classes Table 8-6. Arabic Obligatory Ligature Joining Groups Ligature Rules Table 8-7. Arabic Ligature Notation Optional Features Arabic Joining Groups Dual-Joining Table 8-8. Dual-Joining Arabic Characters Right-Joining Table 8-9. Right-Joining Arabic Characters Letter heh Letter yeh Table 8-10. Forms of the Arabic Letter yeh Combining Hamza Above Jawi Arabic Supplement: U+0750–U+077F Marwari Arabic Presentation Forms-A: U+FB50–U+FDFF Ornate Parentheses Nuktas Arabic Presentation Forms-B: U+FE70–U+FEFF Spacing and Tatweel Forms of Arabic Diacritics Zero Width No-Break Space
8.3 Syriac Syriac: U+0700–U+074F Syriac Language Languages Using the Syriac Script.
Shaping Directionality Syriac Type Styles Character Names Syriac Abbreviation Mark Figure 8-7. Syriac Abbreviation Figure 8-8. Use of SAM Ligatures and Combining Characters Diacritic Marks and Vowels Punctuation Digits Harklean Marks Dalath and Rish Semkath Vowel Marks Miscellaneous Diacritics. Table 8-11. Miscellaneous Syriac Diacritic Use Use of Characters of the Arabic Block Syriac Shaping Minimum Rendering Requirements Joining Types Table 8-12. Syriac Final Alaph Glyph Types Syriac Character Joining Groups Table 8-13. Dual-Joining Syriac Characters Table 8-14. Right-Joining Syriac Characters Table 8-15. Syriac Alaph Glyph Forms Ligature Classes Table 8-16. Syriac Ligatures 8.4 Samaritan Directionality Vowel Signs Consonant Modifiers Punctuation Table 8-17. Samaritan Performative Punctuation Marks 8.5 Thaana Directionality Vowels Table 8-18. Thaana Glyph Placement Numerals Punctuation Character Names and Arrangement 9 South Asian Scripts-I 9.1 Devanagari Devanagari: U+0900–U+097F Standards Encoding Principles Principles of the Devanagari Script Rendering Devanagari Characters Consonant Letters Independent Vowel Letters Dependent Vowel Signs (Matras) Vowel Letters Table 9-1. Devanagari Vowel Letters Virama (Halant) Figure 9-1. Dead Consonants in Devanagari Consonant Conjuncts Figure 9-2. Conjunct Formations in Devanagari Explicit Virama (Halant) Figure 9-3. Preventing Conjunct Forms in Devanagari Explicit Half-Consonants Figure 9-4. Half-Consonants in Devanagari Figure 9-5. Independent Half-Forms in Devanagari Figure 9-6. Half-Consonants in Oriya Consonant Forms Figure 9-7. Consonant Forms in Devanagari and Oriya Rendering Devanagari Rules for Rendering Notation Dead Consonant Rule Consonant RA Rules Modifier Mark Rules Ligature Rules Memory Representation and Rendering Order Figure 9-8. Rendering Order in Devanagari Sample Half-Forms Table 9-2. Sample Devanagari Half-Forms Sample Ligatures Table 9-3. 
Sample Devanagari Ligatures Sample Half-Ligature Forms Table 9-4. Sample Devanagari Half-Ligature Forms Language-Specific Allographs Figure 9-9. Marathi Allographs Combining Marks Devanagari Digits, Punctuation, and Symbols Digits Punctuation Other Symbols Extensions in the Main Devanagari Block Sindhi Letters Konkani Bodo, Dogri, and Maithili Figure 9-10. Use of Apostrophe in Bodo, Dogri and Maithili Figure 9-11. Use of Avagraha in Dogri Kashmiri Letters Prishthamatra Orthography Table 9-5. Prishthamatra Orthography Devanagari Extended: U+A8E0-U+A8FF Cantillation Marks for the SZmaveda Marks of Nasalization Editorial Marks Vedic Extensions: U+1CD0-U+1CFF Tone Marks Diacritics for the Visarga. Nasalization Marks Ardhavisarga 9.2 Bengali (Bangla) Virama (Hasant) Vowel Letters Table 9-6. Bengali Vowel Letters Two-Part Vowel Signs Special Characters Historic Characters Rendering Behavior Consonant-Vowel Ligatures Table 9-7. Bengali Consonant-Vowel Combinations Figure 9-12. Requesting Bengali Consonant-Vowel Ligature Figure 9-13. Blocking Bengali Consonant-Vowel Ligature Khiya Khanda Ta. Figure 9-14. Bengali Syllable tta Ya-phalaa Interaction of Repha and Ya-phalaa Punctuation Truncation Table 9-8. Use of Apostrophe in Bangla 9.3 Gurmukhi Encoding Principles Vowel Letters Table 9-9. Gurmukhi Vowel Letters Tones Ordering Rendering Behavior Table 9-10. Gurmukhi Conjuncts Table 9-11. Additional Pairin and Addha Forms in Gurmukhi Table 9-12. Use of Joiners in Gurmukhi Other Symbols Punctuation 9.4 Gujarati Vowel Letters Table 9-13. Gujarati Vowel Letters Rendering Behavior Table 9-14. Gujarati Conjuncts Punctuation 9.5 Oriya Special Characters Vowel Letters Table 9-15. Oriya Vowel Letters Rendering Behavior Table 9-16. Oriya Conjuncts Consonant Forms Vowels Table 9-17. Oriya Vowel Placement Oriya VA and WA. Punctuation and Symbols Fraction Characters 9.6 Tamil Tamil: U+0B80–U+0BFF Virama (Pulli) Figure 9-15. 
Kssa Ligature in Tamil Rendering of the Tamil Script Tamil Vowels Independent Versus Dependent Vowels Left-Side Vowels Table 9-18. Tamil Vowel Reordering Two-Part Vowels Figure 9-16. Tamil Two-Part Vowels Table 9-19. Tamil Vowel Splitting and Reordering Figure 9-17. Vowel Reordering Around a Tamil Conjunct Tamil Ligatures Ligatures with Vowel i Figure 9-18. Tamil Ligatures with i Ligatures with Vowel u Table 9-20. Tamil Ligatures with u Figure 9-19. Spacing Forms of Tamil u Ligatures with ra Figure 9-20. Tamil Ligatures with ra Ligatures with aa in Traditional Tamil Orthography Figure 9-21. Tamil Ligatures with aa Figure 9-22. Tamil Ligatures with o Ligatures with ai in Traditional Tamil Orthography Figure 9-23. Tamil Ligatures with ai Figure 9-24. Vowel ai in Modern Tamil Tamil aytham Punctuation Tamil Named Character Sequences Table 9-21. Tamil Vowels, Consonants, and Syllables 9.7 Telugu Vowel Letters Table 9-22. Telugu Vowel Letters Rendering Behavior Special Characters Fractions Punctuation 9.8 Kannada Kannada: U+0C80–U+0CFF Principles of the Kannada Script Vowel Letters Table 9-23. Kannada Vowel Letters Consonant Conjuncts Special Characters Kannada Letter LLLA Rendering Kannada Explicit Virama (Halant) Consonant Clusters Involving RA Modifier Mark Rules Avagraha Sign Punctuation 9.9 Malayalam Vowel Letters Table 9-24. Malayalam Vowel Letters Rendering Behavior Table 9-25. Malayalam Orthographic Reform Table 9-26. Malayalam Conjuncts Table 9-27. Candrakala Examples Chillu Characters Table 9-28. Atomic Encoding of Malayalam Chillus Special Cases Involving ra Table 9-29. Malayalam /rr/ and /tt/ Table 9-30. Malayalam /nr/ and /nt/ Dot Reph Historic Characters Special Characters Punctuation 10 South Asian Scripts-II 10.1 Sinhala Vowel Letters Table 10-1. Sinhala Vowel Letters Other Letters for Tamil. Historical Symbols. 10.2 Tibetan General Principles of the Tibetan Script Figure 10-1. 
Tibetan Syllable Structure Consonants Vowels Coding Order Allographical Considerations Head Position "ra" Full-Form "ra" in Head Position. Subjoined Position "wa", "ya", and "ra" Halanta (Srog-Med) Line Breaking Considerations Tibetan Punctuation Svasti Signs Other Characters Tibetan Half-Numbers Tibetan Transliteration and Transcription of Other Languages Other Signs Traditional Text Formatting and Line Justification Figure 10-2. Justifying Tibetan Tseks Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding 10.3 Lepcha Structure Vowels Medials Retroflex Consonants Ordering of Syllable Components Table 10-2. Lepcha Syllabic Structure Rendering Digits Punctuation 10.4 Phags-pa History Basic Structure Syllable Division Candrabindu Figure 10-3. Phags-pa Syllable Om Alternate Letters Numbers Punctuation Positional Variants Table 10-3. Phags-pa Positional Forms of I, U, E, and O Mirrored Variants Table 10-4. Contextual Glyph Mirroring in Phags-pa Table 10-5. Phags-pa Standardized Variants Figure 10-4. Phags-pa Reversed Shaping 10.5 Limbu Consonants Vowels Vowel Length Glottalization Collating Order Glyph Placement Table 10-6. Positions of Limbu Combining Characters Punctuation Digits 10.6 Syloti Nagri Virama and Conjuncts Digits Punctuation Poetry Marks 10.7 Kaithi Standards Styles Rendering Behavior Vowel Letters Consonant Conjuncts Ruled Lines Nukta Punctuation Digits 10.8 Saurashtra Glyph Placement Digits Punctuation Saurashtra Consonant Sign Haaru 10.9 Meetei Mayek Structure Vowel Letters Final Consonants Abbreviations Order Punctuation Digits 10.10 Ol Chiki Structure Digits Punctuation Modifier Letters Glottalization Aspiration Ligatures 10.11 Kharoshthi Kharoshthi: U+10A00–U+10A5F Figure 10-5. Geographical Extent of the Kharoshthi Script Directionality Diacritic Marks and Vowels Numerals Figure 10-6. Kharoshthi Number 1996 Punctuation Word Breaks, Line Breaks, and Hyphenation Sorting Rendering Kharoshthi Figure 10-7. 
Kharoshthi Rendering Example Combining Vowels Table 10-7. Kharoshthi Vowel Signs Combining Vowel Modifiers Table 10-8. Kharoshthi Vowel Modifiers Combining Consonant Modifiers Table 10-9. Kharoshthi Consonant Modifiers Virama Table 10-10. Examples of Kharoshthi Virama 10.12 Brahmi Brahmi: U+11000–U+1106F Encoding Model Vowel Letters Table 10-11. Brahmi Vowel Letters Rendering Behavior Figure 10-8. Consonant Ligatures in Brahmi Vowel Modifiers Old Tamil Brahmi Bhattiprolu Brahmi Punctuation Numerals Table 10-12. Brahmi Positional Digits 11 Southeast Asian Scripts 11.1 Thai Standards. Encoding Principles. Table 11-1. Glyph Positions in Thai Syllables Rendering of Thai Combining Marks Thai Punctuation Spacing Thai Transcription of Pali and Sanskrit 11.2 Lao Encoding Principles Punctuation Glyph Placement Table 11-2. Glyph Positions in Lao Syllables Additional Letters Rendering of Lao Combining Marks Lao Aspirated Nasals 11.3 Myanmar Myanmar: U+1000–U+109F Standards Encoding Principles Composite Characters Encoding Subranges Conjuncts Kinzi Medial Consonants Asat Contractions Great sa Tall aa Ordering of Syllable Components Table 11-3. Myanmar Syllabic Structure Spacing. Myanmar Extended-A: U+AA60–U+AA7F Khamti Shan Consonants Vowels Tones Table 11-4. Khamti Shan Tone Marks Digits Other Symbols Subjoined Characters Historical Khamti Shan Aiton and Phake Consonants Subjoined Consonants Vowels Ligatures Tones 11.4 Khmer Khmer: U+1780–U+17FF Principles of the Khmer Script Glottal Consonant Table 11-5. Independent Khmer Vowel Characters Subscript Consonants Subscript Independent Vowel Signs Consonant Registers Table 11-6. Two Registers of Khmer Consonants Encoding Principles Subscript Consonant Signs Table 11-7. Khmer Subscript Consonant Signs Dependent Vowel Signs Table 11-8. Khmer Composite Dependent Vowel Signs with Nikahit Independent Vowel Characters Subscript Independent Vowel Signs Table 11-9. 
Khmer Subscript Independent Vowel Signs Other Signs as Syllabic Components Ligatures Figure 11-1. Common Ligatures in Khmer Multiple Glyphs Figure 11-2. Common Multiple Forms in Khmer Characters Whose Use Is Discouraged Ordering of Syllable Components. Figure 11-3. Examples of Syllabic Order in Khmer Consonant Shifters Ligature Control Figure 11-4. Ligation in Muul Style in Khmer Spacing. Khmer Symbols: U+19E0–U+19FF Symbols 11.5 Tai Le Table 11-10. Tai Le Tone Marks Digits. Table 11-11. Myanmar Digits Punctuation. 11.6 New Tai Lue Syllabic Structure Table 11-12. New Tai Lue Vowel Placement Final Consonants Tones Table 11-13. New Tai Lue Registers and Tones Digits 11.7 Tai Tham Consonants Independent Vowels Dependent Consonant Signs Dependent Vowel Signs Tone Marks Other Combining Marks Digits Punctuation Collating Order Linebreaking 11.8 Tai Viet Structure Visual Order Tone Classes and Tone Marks Final Consonants Symbols and Punctuation Table 11-14. Tai Viet Symbols and Punctuation Word Spacing Collating Order 11.9 Kayah Li Structure Vowels Tones Digits Punctuation 11.10 Cham Structure Independent Vowel Letters Consonants Ordering of Syllable Components Table 11-15. Cham Syllabic Structure Digits Punctuation Line Breaking 11.11 Philippine Scripts Tagalog: U+1700–U+171F Hanun?o: U+1720–U+173F Buhid: U+1740–U+175F Tagbanwa: U+1760–U+177F Principles of the Philippine Scripts Consonant Letters. Independent Vowel Letters. Dependent Vowel Signs. Virama. Directionality. Rendering. Table 11-16. Hanunoo and Buhid Vowel Sign Combinations Punctuation. 11.12 Buginese Structure Ligature Figure 11-5. Buginese Ligature Order Punctuation Numerals 11.13 Balinese Structure Table 11-17. Balinese Base Consonants and Conjunct Forms Table 11-18. Sasak Extensions for Balinese Behavior of ra Figure 11-6. Writing dharma in Balinese Behavior of ra repa Rendering Table 11-19. 
Balinese Consonant Clusters with u and ū Nukta Ordering Punctuation Hyphenation Musical Symbols Modre Symbols 11.14 Javanese Consonants Independent Vowels Dependent Vowels Figure 11-7. Representation of Javanese Two-Part Vowels Consonant Signs Rendering Digits Punctuation Reduplication Ordering of Syllable Components Linebreaking 11.15 Rejang Structure Rendering Ordering Digits Punctuation 11.16 Batak Structure Rendering Punctuation Linebreaking 11.17 Sundanese Structure Consonant Additions Digits Punctuation Ordering Ordering of Syllable Components Table 11-20. Sundanese Syllabic Structure Rendering 12 East Asian Scripts 12.1 Han CJK Unified Ideographs CJK Standards Table 12-1. Sources for Unified Han Source Label Discrepancies in Version 6.0 Omission of Repertoire for Some Sources Blocks Containing Han Ideographs Table 12-2. Blocks Containing Han Ideographs Table 12-3. Small Extensions to the URO IICore General Characteristics of Han Ideographs Table 12-4. Common Han Characters Terminology Distinguishing Han Character Usage Between Languages Figure 12-1. Han Spelling Figure 12-2. Semantic Context for Han Characters Simplified and Traditional Chinese Dialects and Early Forms of Chinese Sorting Han Ideographs Character Glyphs Principles of Han Unification Three-Dimensional Conceptual Model Figure 12-3. Three-Dimensional Conceptual Model Unification Rules Figure 12-4. CJK Source Separation Table 12-5. Source Encoding for Sword Variants Figure 12-5. Not Cognates, Not Unified Abstract Shape Two-Level Classification Ideographic Component Structure Figure 12-6. Ideographic Component Structure Figure 12-7. The Most Superior Node of an Ideographic Component Ideograph Features Uniqueness or Unification Spatial Positioning Examples Table 12-6. Ideographs Not Unified Table 12-7. Ideographs Unified Han Ideograph Arrangement Table 12-8.
Han Ideograph Arrangement Radical-Stroke Indices Mappings for Han Ideographs CJK Unified Ideographs Extension B: U+20000–U+2A6D6 CJK Unified Ideographs Extension C: U+2A700–U+2B734 CJK Unified Ideographs Extension D: U+2B740–U+2B81D CJK Compatibility Ideographs: U+F900–U+FAFF CJK Compatibility Supplement: U+2F800–U+2FA1D Kanbun: U+3190–U+319F Symbols Derived from Han Ideographs CJK and KangXi Radicals: U+2E80–U+2FD5 Standards. Semantics. CJK Additions from HKSCS and GB 18030 CJK Strokes: U+31C0–U+31EF 12.2 Ideographic Description Characters Applicability to Other Scripts Ideographic Description Sequences Figure 12-8. Using the Ideographic Description Characters Equivalence. Interaction with the Ideographic Variation Mark. Rendering. Character Boundaries. Standards. 12.3 Bopomofo Standards Mandarin Tone Marks Table 12-9. Mandarin Tone Marks Standard Mandarin Bopomofo Extended Bopomofo Extended Bopomofo Tone Marks. Table 12-10. Minnan and Hakka Tone Marks Rendering of Bopomofo 12.4 Hiragana and Katakana Hiragana: U+3040–U+309F Standards Combining Marks Iteration Marks Vertical Text Digraph Katakana: U+30A0–U+30FF Standards Punctuation-like Characters Vertical Text Digraph Katakana Phonetic Extensions: U+31F0–U+31FF Standards Kana Supplement: U+1B000–U+1B0FF Figure 12-9. Japanese Historic Kana for e and ye 12.5 Halfwidth and Fullwidth Forms Unifications 12.6 Hangul Hangul Jamo: U+1100–U+11FF Hangul Jamo Extended-A: U+A960–U+A97F Hangul Jamo Extended-B: U+D7B0–U+D7FF Hangul Compatibility Jamo: U+3130–U+318F Standards Normalization Table 12-11. Separating Jamo Characters Hangul Syllables: U+AC00–U+D7A3 Standards Equivalence Hangul Syllable Composition Hangul Syllable Decomposition Hangul Syllable Name Hangul Syllable Representative Glyph Table 12-12.
Line-Based Placement of Jungseong Collation 12.7 Yi Traditional Yi Script Standardized Yi Script Standards Naming Conventions and Order Yi Syllable Iteration Mark Punctuation Rendering Yi Radicals 13 Additional Modern Scripts 13.1 Ethiopic Ethiopic: U+1200–U+137F Basic and Extended Ethiopic. Encoding Principles. Variant Glyph Forms Labialized Subseries Table 13-1. Labialized Forms in Ethiopic -WAA Table 13-2. Labialized Forms in Ethiopic -WE Keyboard Input. Syllable Names Encoding Order and Sorting Word Separators Section Mark Diacritical Marks Numbers Ethiopic Extensions 13.2 Mongolian History Directionality Encoding Principles Figure 13-1. Mongolian Glyph Convergence Cursive Joining Figure 13-2. Mongolian Consonant Ligation Figure 13-3. Mongolian Positional Forms Free Variation Selectors Figure 13-4. Mongolian Free Variation Selector Representative Glyphs Vowel Harmony Figure 13-5. Mongolian Gender Forms Narrow No-Break Space Mongolian Vowel Separator Figure 13-6. Mongolian Vowel Separator Numbers Punctuation Nirugu Syllable Boundary Marker 13.3 Osmanya Structure Ordering Names and Glyphs 13.4 Tifinagh History Source Standards Ordering Directionality Diacritical Marks. Contextual Shaping Figure 13-7. Tifinagh Contextual Shaping Bi-Consonants Figure 13-8. Tifinagh Consonant Joiner and Bi-consonants 13.5 N'Ko Structure Digits Diacritical Marks Table 13-3. N'Ko Tone Diacritics on Vowels Table 13-4. Other N'Ko Diacritic Usage Ordinal Numbers Figure 13-9. Examples of N'Ko Ordinals Punctuation Character Names and Block Name Ordering Rendering Table 13-5. N'Ko Letter Shaping 13.6 Vai Sources Basic Structure Historic Syllables Logograms Digits Punctuation Segmentation Ordering 13.7 Bamum Bamum: U+A6A0–U+A6FF Structure Diacritical Marks Punctuation Digits Bamum Supplement: U+16800–U+16A3F 13.8 Cherokee Tones. Case and Spelling. Numbers. Rendering and Input Punctuation. Standards.
13.9 Canadian Aboriginal Syllabics Canadian Aboriginal Syllabics: U+1400–U+167F Organization Arrangement Extensions Punctuation and Symbols Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF 13.10 Deseret Letter Names and Shapes. Structure. Sorting. Typographic Conventions. Figure 13-10. Short Words Equivalent to Deseret Letter Names Phonetics. Table 13-6. IPA Transcription of Deseret 13.11 Shavian Structure. Collation 13.12 Lisu Structure Tone Letters Table 13-7. Lisu Tone Letters Other Modifier Letters Digits and Separators Punctuation Table 13-8. Punctuation Adopted in Lisu Orthography Linebreaking Word Separation 14 Ancient and Historic Scripts 14.1 Ogham Structure. Rendering. Forfeda (Supplementary Characters) 14.2 Old Italic Directionality Punctuation Numerals Glyphs Figure 14-1. Distribution of Old Italic 14.3 Runic Historical Script Direction The Runic Alphabet Representative Glyphs Unifications Long-Branch and Short-Twig Staveless Runes Punctuation Marks Golden Numbers Encoding 14.4 Gothic Diacritics Numerals Punctuation 14.5 Old Turkic Structure Directionality Punctuation 14.6 Linear B Linear B Syllabary: U+10000–U+1007F Standards Linear B Ideograms: U+10080–U+100FF Aegean Numbers: U+10100–U+1013F 14.7 Cypriot Syllabary Table 14-1. Similar Characters in Linear B and Cypriot 14.8 Ancient Anatolian Alphabets Lycian: U+10280–U+1029F Carian: U+102A0–U+102DF Lydian: U+10920–U+1093F Lycian Carian Lydian 14.9 Old South Arabian Directionality Structure Segmentation Monograms Numbers Table 14-2. Old South Arabian Numeric Characters Table 14-3. Number Formation in Old South Arabian Names 14.10 Phoenician Directionality Punctuation Stylistic Variation Numerals Names 14.11 Imperial Aramaic Directionality Punctuation Numbers Table 14-4. Number Formation in Aramaic 14.12 Mandaic Structure Punctuation Directionality Shaping and Layout Behavior Table 14-5. Dual-Joining Mandaic Characters Table 14-6.
Right-Joining Mandaic Characters Linebreaking 14.13 Inscriptional Parthian and Inscriptional Pahlavi Directionality Shaping and Layout Behavior Table 14-7. Inscriptional Parthian Shaping Behavior Numbers Heterograms 14.14 Avestan Directionality Shaping Behavior Table 14-8. Avestan Shaping Behavior Punctuation 14.15 Ugaritic Variant Glyphs Ordering. Character Names 14.16 Old Persian Directionality Repertoire Numerals Variants 14.17 Sumero-Akkadian Cuneiform: U+12000–U+123FF Early History of Cuneiform Geographic Range Table 14-9. Cuneiform Script Usage Sources and Coverage Simple Signs Complex and Compound Signs Mergers and Splits Glyph Variants Acquiring Independent Semantic Status Formatting Ordering Other Standards Cuneiform Numbers and Punctuation: U+12400–U+1247F Cuneiform Punctuation Cuneiform Numerals 14.18 Egyptian Hieroglyphs Structure Directionality Rendering Table 14-10. Hieroglyphic Character Sequence Figure 14-2. Interpretation of Hieroglyphic Markup Hieratic Fonts Repertoire Character Names Sign Classification Enclosures Numerals 15 Symbols 15.1 Currency Symbols Unification Figure 15-1. Alternative Glyphs for Dollar Sign Fonts. Table 15-1. Currency Symbols Encoded in Other Blocks Lira Sign Yen and Yuan Euro Sign Indian Rupee Sign 15.2 Letterlike Symbols Letterlike Symbols: U+2100–U+214F Numero Sign Figure 15-2. Alternative Glyphs for Numero Sign Unit Symbols Compatibility Styles Standards Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF Words Used as Variables. Mathematical Alphabets Basic Set of Alphanumeric Characters Additional Characters Dotless Characters Figure 15-3. Wide Mathematical Accents Semantic Distinctions. Figure 15-4. Style Variants and Semantic Distinctions in Mathematics Mathematical Alphabets Table 15-2. Mathematical Alphanumeric Symbols Compatibility Decompositions Fonts Used for Mathematical Alphabets Fraktur Math Italics Figure 15-5.
Easily Confused Shapes for Mathematical Glyphs Hard-to-Distinguish Letters Font Support for Combining Diacritics Type Style for Script Characters Double-Struck Characters 15.3 Number Forms Number Forms: U+2150–U+218F Fractions Figure 15-6. Alternate Forms of Vulgar Fractions Roman Numerals Common Indic Number Forms: U+A830–U+A83F Rumi Numeral Forms: U+10E60–U+10E7E CJK Number Forms Chinese Counting-Rod Numerals Suzhou-Style Numerals Superscripts and Subscripts: U+2070–U+209F Parsing of Superscript and Subscript Digits Standards Superscripts and Subscripts in Other Blocks 15.4 Mathematical Symbols Semantics. Mathematical Property Mathematical Operators: U+2200–U+22FF Standards Encoding Principles Unifications Greek-Derived Symbols N-ary Operators Invisible Operators Minus Sign Delimiters Bidirectional Layout Other Elements of Mathematical Notation Supplements to Mathematical Symbols and Arrows Standards. Supplemental Mathematical Operators: U+2A00–U+2AFF Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF Mathematical Brackets. Long Division Miscellaneous Mathematical Symbols-B: U+2980–U+29FF Wiggly Fence. Miscellaneous Symbols and Arrows: U+2B00–U+2B7F Arrows: U+2190–U+21FF Bidirectional Layout Standards Unifications Supplemental Arrows Long Arrows. Standardized Variants of Mathematical Symbols Change in Representative Glyphs for U+2278 and U+2279 15.5 Invisible Mathematical Operators Invisible Separator Invisible Multiplication Invisible Plus Invisible Function Application 15.6 Technical Symbols Control Pictures: U+2400–U+243F Code Points for Pictures for Control Codes Pictures for ASCII Space Standards Miscellaneous Technical: U+2300–U+23FF Keytop Labels. Floor and Ceiling Crops and Quine Corners Figure 15-7. Usage of Crops and Quine Corners Angle Brackets. APL Functional Symbols Symbol Pieces. Table 15-3. Use of Mathematical Symbol Pieces Horizontal Brackets Terminal Graphics Characters. Decimal Exponent Symbol Figure 15-8.
Usage of the Decimal Exponent Symbol Dental Symbols. Metrical Symbols Electrotechnical Symbols User Interface Symbols Standards. Optical Character Recognition: U+2440–U+245F Standards 15.7 Geometrical Symbols Box Drawing and Block Elements Box Drawing Block Elements Standards Geometric Shapes: U+25A0–U+25FF Hatched Squares Lozenge Use in Mathematics Standards 15.8 Miscellaneous Symbols Rendering of Emoji Symbols Color Words in Unicode Character Names Miscellaneous Symbols: U+2600–U+26FF Miscellaneous Symbols and Pictographs: U+1F300–U+1F5FF Standards Weather Symbols Traffic Signs Dictionary and Map Symbols Plastic Bottle Material Code System. Recycling Symbol for Generic Materials Universal Recycling Symbol Paper Recycling Symbols Gender Symbols Genealogical Symbols Game Symbols Animal Symbols Cultural Symbols Miscellaneous Symbols in Other Blocks Emoticons: U+1F600–U+1F64F Transport and Map Symbols: U+1F680–U+1F6FF Dingbats: U+2700–U+27BF Unifications and Additions. Ornamental Brackets. Alchemical Symbols: U+1F700–U+1F77F Mahjong Tiles: U+1F000–U+1F02F Domino Tiles: U+1F030–U+1F09F Playing Cards: U+1F0A0–U+1F0FF Yijing Hexagram Symbols: U+4DC0–U+4DFF Tai Xuan Jing Symbols: U+1D300–U+1D356 Monograms Digrams Tetragrams Ancient Symbols: U+10190–U+101CF Phaistos Disc Symbols: U+101D0–U+101FF 15.9 Enclosed and Square Enclosed Symbols Square Symbols Source Standards Allocation Decomposition Casing Enclosed Alphanumerics: U+2460–U+24FF Enclosed CJK Letters and Months: U+3200–U+32FF CJK Compatibility: U+3300–U+33FF Japanese Era Names Table 15-4. Japanese Era Names Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF Regional Indicator Symbols Enclosed Ideographic Supplement: U+1F200–U+1F2FF 15.10 Braille Example Usage Model Imaging. Script 15.11 Western Musical Symbols Glyphs Symbols in Other Blocks Gregorian Processing. Input Methods. Directionality. Figure 15-9. Examples of Specialized Music Layout Format Characters. Precomposed Note Characters Figure 15-10.
Precomposed Note Characters Alternative Noteheads. Figure 15-11. Alternative Noteheads Augmentation Dots and Articulation Symbols Figure 15-12. Augmentation Dots and Articulation Symbols Ornamentation. Table 15-5. Examples of Ornamentation 15.12 Byzantine Musical Symbols Processing. 15.13 Ancient Greek Musical Notation Unification Table 15-6. Representation of Ancient Greek Vocal and Instrumental Notation Naming Conventions Font Combining Marks 16 Special Areas and Format Characters 16.1 Control Codes Representing Control Sequences Escape Sequences Specification of Control Code Semantics Table 16-1. Control Codes Specified in the Unicode Standard Newline Function 16.2 Layout Controls Line and Word Breaking No-Break Space Word Joiner Zero Width No-Break Space Zero Width Space Table 16-2. Letter Spacing Zero-Width Spaces and Joiner Characters Hyphenation Line and Paragraph Separator Cursive Connection and Ligatures Joiner Non-joiner Cursive Connection Figure 16-1. Prevention of Joining Figure 16-2. Exhibition of Joining Glyphs in Isolation Examples. Figure 16-3. Effect of Intervening Joiners Transparency Joiner and Non-joiner in Indic Scripts Implementation Notes. Filtering Joiner and Non-joiner Combining Grapheme Joiner Blocking Reordering CGJ and Collation Rendering CGJ and Joiner Characters Bidirectional Ordering Controls Table 16-3. Bidirectional Ordering Controls 16.3 Deprecated Format Characters Symmetric Swapping Character Shaping Selectors Numeric Shape Selectors 16.4 Variation Selectors Variation Sequence Mongolian 16.5 Private-Use Characters Properties. Normalization. Private Use Area: U+E000–U+F8FF Encoding Structure. Corporate Use Subarea End-User Subarea. Allocation of Subareas Supplementary Private Use Areas Encoding Structure. 16.6 Surrogates Area High-Surrogate Low-Surrogate Private-Use High-Surrogates 16.7 Noncharacters U+FFFF and U+10FFFF U+FFFE 16.8 Specials Byte Order Mark (BOM): U+FEFF Table 16-4. Unicode Encoding Scheme Signatures Table 16-5.
U+FEFF Signature in Other Charsets Specials: U+FFF0–U+FFF8 Annotation Characters: U+FFF9–U+FFFB Figure 16-4. Annotation Characters Conformance Use in Plain Text Lexical Restrictions Formatting Input Collation Bidirectional Text Replacement Characters: U+FFFC–U+FFFD U+FFFC U+FFFD 16.9 Deprecated Tag Characters Deprecated Tag Characters: U+E0000–U+E007F Syntax for Embedding Tags Tag Identification Tag Termination Language Tags. Tag Scope and Nesting Figure 16-5. Tag Characters Canceling Tag Values Working with Language Tags Avoiding Language Tags Higher-Level Protocols. Effect of Tags on Interpretation of Text Display Processing Range Checking for Tag Characters Editing and Modification Dangers of Incomplete Support Unicode Conformance Issues Formal Tag Syntax 17 About the Code Charts 17.1 Character Names List Images in the Code Charts and Character Lists Fonts Alternative Forms Orientation Special Characters and Code Points Combining Characters Dashed Box Convention Reserved Characters Noncharacters Deprecated Characters Character Names Informative Aliases Normative Aliases Cross References Explicit Inequality Other Linguistic Relationships Information About Languages Case Mappings Decompositions Subheads 17.2 CJK Unified and Compatibility Ideographs CJK Unified Ideographs Table 17-1. IRG Sources Chart for the Main CJK Block Figure 17-1. CJK Chart Format for the Main CJK Block Charts for CJK Extensions Figure 17-2. CJK Chart Format for CJK Extension A Figure 17-3. CJK Chart Format for CJK Extension B Compatibility Ideographs Figure 17-4. CJK Chart Format for Compatibility Ideographs 17.3 Hangul Syllables A Notational Conventions Code Points Character Names Character Blocks Sequences Rendering Figure A-1. Example of Rendering Properties and Property Values Miscellaneous Extended BNF Table A-1. Extended BNF Character Classes Table A-2. Character Class Examples Operators Table A-3.
Operators B Unicode Publications and Resources B.1 The Unicode Consortium The Unicode Technical Committee Other Activities B.2 Unicode Publications B.3 Unicode Technical Standards UTS #6: A Standard Compression Scheme for Unicode UTS #10: Unicode Collation Algorithm UTS #18: Unicode Regular Expressions UTS #22: Character Mapping Markup Language (CharMapML) UTS #35: Unicode Locale Data Markup Language (LDML) UTS #37: Unicode Ideographic Variation Database UTS #39: Unicode Security Mechanisms B.4 Unicode Technical Reports UTR #16: UTF-EBCDIC UTR #17: Unicode Character Encoding Model UTR #20: Unicode in XML and Other Markup Languages UTR #23: The Unicode Character Property Model UTR #25: Unicode Support for Mathematics UTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) UTR #33: Unicode Conformance Model UTR #36: Unicode Security Considerations UTR #45: U-Source Ideographs B.5 Unicode Technical Notes B.6 Other Unicode Online Resources Unicode Online Resources Unicode Web Site Unicode Anonymous FTP Site Charts Character Index Conferences E-mail Discussion List FAQ (Frequently Asked Questions) Glossary Online Unicode Character Database Online Unihan Database Policies Unicode Common Locale Data Repository (CLDR) Updates and Errata Versions Where Is My Character? How to Contact the Unicode Consortium C Relationship to ISO/IEC 10646 C.1 History Table C-1. Timeline C.2 Encoding Forms in ISO/IEC 10646 UCS-4 UCS-2 Zero Extending Table C-2. Zero Extending C.3 UTF-8 and UTF-16 UTF-8 UTF-16 C.4 Synchronization of the Standards C.5 Identification of Features for the Unicode Standard C.6 Character Names C.7 Character Functional Specifications D Changes from Previous Versions D.1 Versions of the Unicode Standard Table D-1. Versions of Unicode and ISO/IEC 10646-1 Table D-2. Allocation of Code Points by Type Table D-3. Allocation of Code Points by Type (Early Versions) D.2 Clause and Definition Updates Table D-4. Version 5.1 Clause and Definition Updates Table D-5. 
Version 5.2 Clause and Definition Updates Table D-6. Version 6.0 Clause and Definition Updates D.3 Changes from Version 5.2 to Version 6.0 D.4 Changes from Version 5.1 to Version 5.2 D.5 Changes from Version 5.0 to Version 5.1 Arabic Shaping Bidirectional Behavior General Category Named Character Sequences New Property Definitions and Values Other Updates Unihan Stability Policies Important Clarification of UTF-8 Conformance Updates to Definitions of Character Sequences Updates to Table of Named Unicode Algorithms Updates to Default Algorithms Updates to Stability of Properties Updates for Security E Han Unification History E.1 Development of the URO E.2 Ideographic Rapporteur Group F Documentation of CJK Strokes Table F-1. CJK Strokes R References R.1 Source Standards and Specifications R.2 Source Dictionaries for Han Unification R.3 Other Sources for the Unicode Standard R.4 Selected Resources: Technical R.5 Selected Resources: Other I General Index A B C D E F G H I J K L M N O P Q R S T U V W X Y Z