Showing posts with label Unicode 14. Show all posts

Friday, April 8, 2022

ICU 71 Released

Unicode® ICU 71 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 71 updates to CLDR 41 locale data with various additions and corrections.

ICU 71 adds phrase-based line breaking for Japanese. Existing line breaking methods follow standards and conventions for body text but do not work well for short Japanese text, such as in titles and headings. This new feature is optimized for these use cases.

ICU 71 adds support for Hindi written in Latin letters (hi_Latn). The CLDR data for this increasingly popular locale has been significantly revised and expanded. Note that based on user expectations, hi_Latn incorporates a large amount of English, and can also be referred to as “Hinglish”.

ICU 71 and CLDR 41 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15, which is planned for September.) We are also working to re-establish continuous performance testing for ICU, and on development towards future versions.

ICU 71 updates to the time zone data version 2022a. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

For details, please see https://icu.unicode.org/download/71.


Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Tuesday, January 4, 2022

Unicode 14.0 Paperback Available

The Unicode 14.0 core specification is now available in paperback book form with an original cover design by Sophia Tai. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 14.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. Please visit the separate description pages for Volume 1 and Volume 2 to order each volume in the set. The cost for the pair is US $36.72, plus shipping and any applicable taxes.

These volumes do not include the Version 14.0 code charts, nor do they include the Version 14.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 14.0 - Core Specification Volume 1 and Volume 2.



Thursday, October 28, 2021

Unicode CLDR v40 now available!

Unicode CLDR version 40 is now available, with approximately 140,000 new or modified data fields.

In this release, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case.
  • Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv) for all units of measurement.
  • Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
  • Phase 3 (v41) will further expand the units.
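
As a quick sanity check on the phase lists above, the following minimal sketch confirms the stated counts and that no locale appears in both phases. The variable names are illustrative only and are not part of CLDR; the locale codes are copied from the two bullets.

```python
# Locale codes copied from the Phase 1 (v39) and Phase 2 (v40) bullets above.
PHASE_1 = "da de es fr hi it nl no pl pt ru sv".split()
PHASE_2 = (
    "am ar bn ca cs el fi gu he hr hu hy is kn lt lv ml mr nb pa "
    "ro si sk sl sr ta te uk ur"
).split()

# Locales listed in both phases (expected to be empty).
overlap = set(PHASE_1) & set(PHASE_2)

# All locales with grammatical feature data after Phase 2.
covered = sorted(set(PHASE_1) | set(PHASE_2))

print(len(PHASE_1), len(PHASE_2), len(covered))  # prints: 12 29 41
```

Under these lists, the two phases together cover 41 distinct locale codes.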

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.



Please see the CLDR v40 Release Note for details.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.



ICU 70 Released

Unicode® ICU 70 has just been released. ICU 70 incorporates updates to Unicode 14, including new characters, scripts, emoji, and corresponding API constants. ICU 70 adds support for emoji properties of strings. It also updates to CLDR 40 locale data with many additions and corrections. ICU 70 also includes many other bug fixes and enhancements, especially for measurement unit formatting, and it can now be built and used with C++20 compilers.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see https://icu.unicode.org/download/70.

Note: Our website has moved. Please adjust your bookmarks.



Tuesday, September 14, 2021

Announcing The Unicode® Standard, Version 14.0

Version 14.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 838 characters, for a total of 144,697 characters. These additions include five new scripts, for a total of 159 scripts, as well as 37 new emoji characters.

The new scripts and characters in Version 14.0 add support for modern language groups in Bosnia, India, Indonesia, Iran, Java, Malaysia, Mongolia, Myanmar, Pakistan, and the Philippines, plus other languages in Africa and North America, including:
  • Arabic script additions that include honorifics and additions for Quranic use, and characters used to write languages across Africa, the Balkans, and South and Southeast Asia
  • The Vithkuqi script historically used to write Albanian and currently undergoing a modern revival
  • The Tangsa script used to write the Tangsa language, spoken in India and Myanmar
  • The Toto script used to write the Toto language in northeast India
  • Many Latin script additions for extended IPA
Popular symbol additions include:
  • 37 emoji characters, including several new emoji for emotion and hand gestures (smileys, hands, animals and nature, food and drink, transport, and activities). For the full list of new emoji characters, see emoji additions for Unicode 14.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.
Other symbol and notational additions include:
  • The som currency sign used in the Kyrgyz Republic
  • Znamenny musical notation developed in Russia
Support for other modern languages and scholarly work extends worldwide, including:
  • Cypro-Minoan, historically used primarily on the island of Cyprus
  • Old Uyghur, historically used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
  • Ahom, Balinese, Brahmi, Canadian aboriginal languages, Glagolitic, Kaithi, Kannada, Mongolian, Tagalog, Takri, and Telugu
  • Arabic support for Hausa, Wolof, Hindko, and Punjabi, and Ethiopic support for Gurage
Important chart font updates, including:
  • Significant updates to the CJK auxiliary blocks and enclosed alphanumerics
Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 14.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 14.0:
Three important Unicode specifications updated for Version 14.0:
The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.



Wednesday, April 8, 2020

Unicode 14.0 Delayed for 6 Months

Due to COVID-19, the Unicode Consortium has decided to postpone the release of version 14.0 of the Unicode Standard by 6 months, from March to September of 2021. This delay will also impact related specifications and data, such as new emoji characters.

The Unicode Consortium relies heavily on the efforts of volunteers. “Under the current circumstances we’ve heard that our contributors have a lot on their plates at the moment and decided it was in the best interests of our volunteers and the organizations that depend on the standard to push out our release date,” said Mark Davis, President of the Consortium. “This year we simply can’t commit to the same schedule we’ve adhered to in the past.”

ICU and CLDR to stay on schedule

The two other main Unicode projects, ICU and CLDR, are maintaining their 6-month cycles for releases in the spring and fall, although the feature sets this year may be lighter. The CLDR project supplies language- and locale-specific data and specifications, while the ICU project supplies internationalization code libraries that allow operating systems and applications to use Unicode and CLDR data and specifications. These projects are impacted less by current conditions since they have always operated via virtual meetings and are more compartmentalized, meaning that it is easier to withhold a particular feature if it falls behind schedule without jeopardizing the whole release. Sub-projects of CLDR and ICU, such as the CLDR Message Formatting project, will also be little affected.

Emoji

This announcement does not affect the new emoji included in Unicode Standard version 13.0 announced on March 10, 2020.

Because of the lead time for developers to incorporate emoji into mobile phones, emoji that are finalized in January don’t appear on phones until the following September or so. For example, the emoji that were included in Release 13.0 in March 2020 won’t generally be on phones until the fall of 2020. With the delay of the release of Unicode 14.0, the deadline for submission of new emoji character proposals for Emoji 14.0 is also being postponed until September 2020.

The Consortium is considering whether it is feasible to release emoji sequences in an Emoji 13.1 release. These sequences make use of existing characters. An example from Emoji 13.0 is the black cat, which is internally a combination of the cat emoji and black large square emoji. Since sequences rely only on combinations of existing characters in the Unicode Standard, they can be implemented on a separate schedule, and don’t require a new version of Unicode or the encoding of new characters. Such an Emoji 13.1 release would be in time for release on mobile phones in 2021.
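
To make the idea of a sequence concrete, here is a minimal Python sketch of how the black cat is built from existing characters. It assumes the standard encoding of that emoji, in which the two characters are glued together with U+200D ZERO WIDTH JOINER (the joiner is not mentioned in the text above); the helper name `make_zwj_sequence` is hypothetical and used only for illustration.

```python
CAT = "\U0001F408"             # U+1F408 CAT
ZWJ = "\u200D"                 # U+200D ZERO WIDTH JOINER
BLACK_LARGE_SQUARE = "\u2B1B"  # U+2B1B BLACK LARGE SQUARE


def make_zwj_sequence(*parts):
    """Join existing characters with ZWJ to form an emoji sequence."""
    return ZWJ.join(parts)


# The black cat is a sequence of already-encoded characters,
# so no new code point is needed to represent it.
black_cat = make_zwj_sequence(CAT, BLACK_LARGE_SQUARE)

code_points = [f"U+{ord(ch):04X}" for ch in black_cat]
print(code_points)  # prints: ['U+1F408', 'U+200D', 'U+2B1B']
```

Whether the result displays as a single black cat glyph or as a cat followed by a square depends on the font and platform, which is why such sequences can ship on their own schedule.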

The Emoji Subcommittee will be accepting new emoji character proposals for Emoji 14.0 from June 15, 2020 until September 1, 2020. Any new emoji characters incorporated into Emoji 14.0 would appear on phones and other devices in 2022.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]