A list of terms and definitions related to internationalization and localization in the web
environment. The list is created and maintained by
the W3C I18N GEO Task Force.
The glossary supports HTML Techniques (Draft).
If you would like to link to any of the definitions on this page, you can find the appropriate
fragment identifier (i.e., the link's anchor name) by
placing your mouse over the term. The identifier will appear in a tooltip. Append a hash mark ("#") and the identifier to the URL for this page:
www.i18nguy.com/markup/i18n-glossary.html.
W3C I18N GEO Task Force Home Page
Suggested additions: BDO, LRE, PDF, LRM, ZWNJ. Do we want to document Unicode control characters?
Term | Notes |
---|---|
abjad | A type of writing system where only consonants are generally written. |
abugida | A type of writing system whose basic characters denote consonants followed by a particular vowel, and in which diacritics denote the other vowels. |
ANSI | American National Standards Institute. Also Microsoft's collective name for all or any Windows code pages (as in "ANSI code page"). Sometimes used specifically for code page 1252, which is a superset of ISO/IEC 8859-1. |
ASCII | American Standard Code for Information Interchange. The U.S. national variant of ISO/IEC 646. |
bidi | Internationalization industry jargon. Abbreviation for bidirectional text. |
Bidirectional text | Also abbreviated as "bidi". Text that is primarily written from right to left and is often mixed with left-to-right text. Examples include text written in the Hebrew and Arabic scripts. |
Basic Multilingual Plane (BMP) | TBD |
BMP | Basic Multilingual Plane |
BOM | Byte Order Mark, U+FEFF. Also used as a Unicode character encoding signature. |
byte order mark | U+FEFF, also known as BOM and ZWNSP. Also used as a character encoding signature for Unicode encodings (UTF-8, UTF-16, et al.). |
character | A member of a set of elements used for the organization, control, or representation of data. For example, "LATIN CAPITAL LETTER A" names a character. |
character encoding | TBD |
character entity | TBD |
character set | TBD |
charset | TBD |
character encoding signature | TBD |
character escape | TBD |
character repertoire | A set of characters (in the mathematical sense) |
coded character set | TBD |
code point | TBD |
compatibility character | TBD |
complex script | TBD |
DBCS | Double-Byte Character Set. A specific type of MBCS: a character encoding in which each character is represented by either one or two bytes. "DBC" is sometimes used for "double-byte character". |
diacritic | TBD |
document character set | TBD |
escape | see "character escape" |
fragment | TBD |
GEO | W3C Abbreviation for Guidelines, Education, and Outreach. See www.w3.org/International/geo/ |
glyph | TBD |
goober | A type of consideration for the internationalization of software or Web applications due to local legal, regulatory, or other governmental requirements. See Web Services Internationalization Usage Scenarios, Section 4.15 Legal and Regulatory Goobers |
HTTP | HyperText Transfer Protocol |
HTTP header | TBD |
i18n | Abbreviation. See internationalization. Also see "Origin of the abbreviation i18n". |
IANA | Internet Assigned Numbers Authority www.iana.org |
IANA Charset Registry | Registry for character encodings used by MIME, Web standards, and others. See www.iana.org/assignments/character-sets |
internationalization | Designing software to be usable around the world. |
IRI | W3C acronym for Internationalized Resource Identifier, an internationalized form of URI. See www.w3.org/International/O-URL-and-ident. |
MBCS | Multi-Byte Character Set. A type of character encoding where characters are of varying byte length. In some encodings, for example, a character may be encoded as 1, 2, 3, or 4 bytes. |
MIME type | TBD |
mojibake (文字化け) | Japanese jargon for any of "garbage", "changed", "ghost" or "disguised" characters, or for what is shown when Japanese characters are not displayed correctly (various black boxes or other nonsense characters). Here are some examples that look like mojibake: █ █ (You should see some black boxes.) There can also be white boxes: █ █ or ǶǶǶ. In Japan, these are sometimes called "tofu". |
NCR | Numeric Character Reference. (See HTML specification.) |
NFC | Unicode acronym for Normalization Form C |
NLS | Software industry abbreviation for National Language Support. General term referring to the features, libraries, and related data that support internationalization within an operating system or product. Example usage: "NLS library". |
normalization | Unicode term normalization |
quirks mode | TBD |
PUA | Abbreviation for Unicode term: Private Use Area |
SBCS | Single-Byte Character Set. Some vendors refer to this as a code page. A character encoding where each character is represented by one 8-bit value. Sometimes SBC is used for single-byte character. |
standards mode | TBD |
supplementary character | TBD |
tofu (豆腐) | Japanese jargon for the white box character that is displayed by default for an unassigned or unknown character. For example: Ƕ. See mojibake. |
transcoding | TBD |
UCS | Abbreviation for the Unicode term Universal Character Set, which is specified by International Standard ISO/IEC 10646. Sometimes also used for Unicode Character Standard. |
Unicode | Unicode Character Standard (UCS), Universal Character Set. See Unicode Consortium. Also see ISO 10646. |
user agent (UA) | TBD |
UTF | Abbreviation, Unicode term for Unicode Transformation Format. Also see UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32, UTF-32BE, UTF-32LE |
virama | TBD |
W3C | Abbreviation for World Wide Web Consortium. See www.w3.org |
WAI | W3C abbreviation for Web Accessibility Initiative. See www.w3.org/WAI/ |
XML | eXtensible Markup Language |
XML declaration | TBD |
ZWNSP | Zero Width No-Break Space, U+FEFF. Use of this character as a no-break space is deprecated; the same code point also serves as the Byte Order Mark. |
Å, Å | The symbol for Ångström (U+212B) and the letter A-ring (U+00C5, or U+0041 and U+030A: A and Combining Ring Above). Scandinavian alphabets sort the letter A-ring after the letter Z. |
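
Two of the entries above (NFC and the byte order mark) can be illustrated with a short Python sketch. It is not part of the original glossary; the helper names `to_nfc` and `bom_bytes` are hypothetical and exist only for this example. It assumes Python 3 and uses only the standard `unicodedata` module and the built-in `str.encode`.

```python
import unicodedata

# The three spellings mentioned in the last table row.
ANGSTROM_SIGN = "\u212B"      # ANGSTROM SIGN
A_RING = "\u00C5"             # LATIN CAPITAL LETTER A WITH RING ABOVE
A_PLUS_RING = "\u0041\u030A"  # "A" followed by COMBINING RING ABOVE


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


def bom_bytes(encoding: str) -> bytes:
    """Return the bytes that encode U+FEFF in the given encoding
    (hypothetical helper). Use an endian-specific name such as
    "utf-16-le" so that Python does not prepend its own BOM."""
    return "\uFEFF".encode(encoding)


if __name__ == "__main__":
    # All three spellings collapse to the single code point U+00C5 under NFC.
    for spelling in (ANGSTROM_SIGN, A_RING, A_PLUS_RING):
        normalized = to_nfc(spelling)
        print([f"U+{ord(ch):04X}" for ch in normalized])

    # The same character, U+FEFF, yields a different signature per encoding.
    for name in ("utf-8", "utf-16-le", "utf-16-be"):
        print(name, bom_bytes(name).hex())
```

Comparing strings only after normalizing both sides is one common way to treat the three spellings of A-ring as equal. Whether that is appropriate depends on the application.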