UnicodeIUC19
Unicode Standard Conference Board Past Conferences Call for Papers Sponsors Showcase
Registration Accommodation Travel Program Talks and Papers Next Conference
Abstract

Character Conversions and Mapping Tables

Markus Scherer, George Rhoten & Ram Viswanadha - ICU Team of IBM in Cupertino, CA

Intended Audience: Software Engineer, Systems Analyst
Session Level: Intermediate

This talk discusses character conversions to and from Unicode, presents problems that can cause the loss of text data, and shows pragmatic ways to avoid such problems.

Text data is widely exchanged among networked systems. While modern Internet protocols and applications use Unicode more and more directly, a lot of text is still exchanged and processed in legacy encodings. Character conversion is performed whenever text is exchanged and processed in different encodings.

Character conversions can cause the loss of some of the text data for a number of reasons. Obvious problems are an insufficient repertoire in the target encoding and the lack of support for an encoding altogether. Some more obscure and unexpected problems include mismatches in conversion behavior and conversion data. Similarly, encoding names are only loosely standardized and inconsistently interpreted.
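The three kinds of problems named above can be observed with a few lines of Python. This is only an illustrative sketch using Python's built-in codecs, not ICU; the helper name `lossy_characters` and the sample strings are assumptions made for this example.

```python
import codecs

def lossy_characters(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

# Insufficient repertoire: ISO-8859-1 has no euro sign, so it would be lost.
print(lossy_characters("price: 5 \u20ac", "iso-8859-1"))

# Mismatched conversion data: the same byte decodes to different characters
# under two tables that are frequently confused with each other.
byte = b"\x80"
print(repr(byte.decode("cp1252")), repr(byte.decode("iso-8859-1")))

# Loosely standardized names: different labels resolve to the same codec.
print(codecs.lookup("latin1").name, codecs.lookup("ISO-8859-1").name)
```

Checking for unrepresentable characters before converting, rather than silently substituting a replacement character, is one way to notice data loss before it happens.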

Members of the ICU team are working with the Unicode Technical Committee (UTC) and interested parties in the industry to collect and publish mapping data for character conversions to and from Unicode. This project includes assigning unique identifiers to encodings, collecting their aliases, and comparing their Unicode mappings.

Using these mapping tables, ICU and other libraries can precisely duplicate the conversion results of other systems.
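As a rough sketch of what table-driven conversion means, the example below decodes and encodes with an explicit mapping table. The tables `TABLE_A` and `TABLE_B` and the functions `to_unicode` and `from_unicode` are hypothetical names invented for this illustration; the two tables are loosely modeled on the difference between ASCII and JIS X 0201 Roman at bytes 0x5C and 0x7E, and do not reflect ICU's actual data format or API.

```python
# Hypothetical mapping tables: byte value -> Unicode code point.
# They stand in for two systems' variants of nominally "the same" encoding.
TABLE_A = {0x41: 0x0041, 0x5C: 0x005C, 0x7E: 0x007E}
TABLE_B = {0x41: 0x0041, 0x5C: 0x00A5, 0x7E: 0x203E}

def to_unicode(data, table):
    """Decode bytes with an explicit table; unmapped bytes become U+FFFD."""
    return "".join(chr(table.get(b, 0xFFFD)) for b in data)

def from_unicode(text, table):
    """Encode with the inverse table; fail loudly instead of dropping text."""
    inverse = {cp: b for b, cp in table.items()}
    out = bytearray()
    for ch in text:
        if ord(ch) not in inverse:
            raise ValueError("no mapping for U+%04X" % ord(ch))
        out.append(inverse[ord(ch)])
    return bytes(out)

data = b"A\\~"
# The same bytes yield different text depending on which table is applied.
print(ascii(to_unicode(data, TABLE_A)))
print(ascii(to_unicode(data, TABLE_B)))
# Using the peer system's own table in both directions round-trips exactly.
print(from_unicode(to_unicode(data, TABLE_B), TABLE_B) == data)
```

Selecting the table that matches the other system, rather than a similarly named one, is what allows a converter to reproduce that system's results byte for byte.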



International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

22 Jun 2001, Webmaster