202 | Extensions to NameAliases.txt for Unicode 6.1.0 | 2011.10.24
Status: | Closed
Originator: | UTC
Resolution: | The UTC decided to update NameAliases.txt for Unicode 6.1 with additional names and abbreviations for control characters and some format characters. At a minimum, the additions will cover the names used in Perl 5 regular expressions, including name aliases for U+0080, U+0081, and U+0099. The revised file will be organized in code point order, rather than by type of name alias.
Informal Discussion: | Unicode Mail List (Join)
Formal Feedback: | Reporting Form
Description of Issue:
The UTC is planning to extend the format and content of the Unicode Character Database file NameAliases.txt for Unicode 6.1.0. The file currently defines formal name aliases for characters whose names contain serious mistakes. The intent is to also add various standard and de facto aliases for control characters, which have no names defined for them in the Unicode Standard, as well as various character abbreviations that are in widespread use.
NameAliases.txt is part of the input used to enforce name uniqueness in the Unicode character namespace. Adding aliases for control codes and commonly used character abbreviations to the file would therefore prevent future character names from accidentally colliding with the "names" that implementations such as regular expression engines already match.
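The uniqueness check described above can be sketched roughly as follows. This is a minimal illustration, not the tooling the UTC uses: the function name find_collisions and the sample data are hypothetical, and names are compared as exact strings, whereas a real check would presumably apply the loose matching rules for character names.

```python
def find_collisions(names, aliases):
    """Return aliases already claimed by a different code point.

    names:   dict mapping an existing character name to its code point
    aliases: iterable of (code point, alias) pairs to be registered
    """
    taken = dict(names)  # copy so the caller's table is not modified
    collisions = []
    for code_point, alias in aliases:
        # Register the alias if it is free; otherwise fetch its current owner.
        owner = taken.setdefault(alias, code_point)
        if owner != code_point:
            collisions.append((alias, owner, code_point))
    return collisions


# Illustrative data (an assumption for this sketch): U+1F514 is named BELL,
# so using BELL as an alias for the control code U+0007 would collide with it.
names = {"BELL": 0x1F514}
aliases = [(0x0007, "BELL"), (0x0007, "BEL")]
print(find_collisions(names, aliases))
```

Once an alias is recorded in the shared namespace, any later name or alias that matches it for a different code point is reported instead of being silently accepted.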
The current scope of NameAliases.txt in Unicode 6.0 can be seen in:
http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt
The proposed scope of extensions to NameAliases.txt can be seen in:
http://www.unicode.org/review/pri202/NameAliasesProv-6.1.0.txt
The UTC would like public feedback on this planned extension to NameAliases.txt. In particular, feedback on the following issues would be welcome:
Note that the proposed update to NameAliases.txt contains multiple entries for some code points. This might prove to be a challenge to some existing parsers and implementations, which may expect UCD data files to always contain no more than a single entry per code point.
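One way a parser could accommodate multiple entries per code point is to map each code point to a list of aliases rather than to a single value. The sketch below assumes the usual UCD conventions (semicolon-separated fields, code point in hexadecimal as the first field, "#" starting a comment) and treats a third field as an optional alias type. The function name parse_name_aliases and the sample lines are illustrative and are not copied from the provisional file.

```python
from collections import defaultdict


def parse_name_aliases(text):
    """Group alias entries by code point; a code point may occur on several lines."""
    aliases = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_point = int(fields[0], 16)
        alias = fields[1]
        # Assumption: the alias type, when present, is the third field.
        alias_type = fields[2] if len(fields) > 2 else None
        aliases[code_point].append((alias, alias_type))
    return dict(aliases)


# Hypothetical sample input for illustration only.
SAMPLE = """\
# code point;alias;type
0000;NULL;control
0000;NUL;abbreviation
01A2;LATIN CAPITAL LETTER GHA;correction
"""

table = parse_name_aliases(SAMPLE)
for code_point, entries in sorted(table.items()):
    print(f"U+{code_point:04X}", entries)
```

Keeping a list per code point means that a second line for the same code point extends the entry rather than overwriting it, which is the failure mode a one-entry-per-code-point parser would hit.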
For information about how to discuss this issue and how to supply formal feedback, please see the feedback and discussion instructions.