Change Management for the Unicode Collation Algorithm
As implementations of the Unicode Collation Algorithm
become more widespread,
the stability and reliability of the UCA data table have become
more important. To protect them, the UTC has approved some constraints on allowed
changes, and has established a more explicit process for tracking and implementing actual changes to
the Default Unicode Collation Element Table (DUCET) between releases of UCA.
Constraints on Changes to the Default Unicode Collation Element Table
1. Changes for characters which have been in the standard
for longer than two years should generally be disallowed. The UTC can
overrule this and mandate a change in a character weight
entry, but should only do so when it determines that there
is an egregious error or finds some other very strong
motivation for disturbing an established value. In less than
such extreme circumstances, solutions involving tailoring
should be preferred.
2. When a character weight has been published in UCA, but for
less than two years, any proposed change should be weighed
against the viability of a tailoring alternative, with
the presumption, all else being equal, that DUCET is not
changed. This presumption should be used to guard against "tidying up"
proposals that disturb the table but do not demonstrate
clear superiority to what already exists.
3. Exceptions to points 1 or 2 may be appropriate in order
to maintain synchronization with ISO/IEC 14651, but efforts
should be made in WG2 to ensure that destabilizing changes to
14651 are minimized as well.
4. The two-year limit for point 1 may be relaxed in cases where
changes are proposed for weights for symbols, punctuation, and
format controls, if substantial reasons are provided for such
changes. This is because such characters
are ignored in most collation, and there are few well-established
rules for their ordering; hence changes to their weights are less
likely to disturb the ordering of existing data or disrupt existing
tailorings. Such changes also do not destabilize ISO/IEC 14651,
because such characters are weighted in the 14651 Common Tailorable Template (CTT) table
as ignorables.
5. All reviewers should concentrate their efforts on the
review of beta tables for extensions to UCA before a new
version of the standard is published, to minimize the need for
after-the-fact fixes that might run counter to
points 1 or 2.
6. The beta UCA tables and UCD tables should, if possible, be
issued during the same period to allow for sufficient review
of weights provided for the new characters.
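As an illustration only, the default presumptions in points 1 through 4 can be read as a small decision procedure. The Python sketch below is one possible reading and not part of the UTC policy. The class labels, field names, and outcome names are hypothetical. The treatment of a weight published for exactly two years is an assumption, since the text does not address that case. The UTC's ability to overrule under point 1 is deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Presumption(Enum):
    DISALLOW = "generally disallowed; prefer a tailoring"          # point 1
    NO_CHANGE = "presume no change; weigh against a tailoring"     # point 2
    CONSIDER = "exception or relaxation may apply"                 # points 3, 4


# Hypothetical labels for the character classes named in point 4.
RELAXED_CLASSES = {"symbol", "punctuation", "format_control"}


@dataclass
class ProposedChange:
    years_published: float               # how long the weight has been published
    char_class: str                      # hypothetical label, e.g. "letter"
    substantial_reasons: bool = False    # point 4 requires these
    needed_for_14651_sync: bool = False  # point 3 exception


def presumption(change: ProposedChange) -> Presumption:
    """Return the default presumption only; the UTC may still decide otherwise."""
    if change.needed_for_14651_sync:
        return Presumption.CONSIDER
    if change.char_class in RELAXED_CLASSES and change.substantial_reasons:
        return Presumption.CONSIDER
    # Assumption: exactly two years is treated like "less than two years".
    if change.years_published > 2:
        return Presumption.DISALLOW
    return Presumption.NO_CHANGE
```

For example, under this reading a letter whose weight has been published for five years yields the point 1 presumption, while a symbol of the same age with substantial reasons given falls under the point 4 relaxation.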
Clarity in Specification of Changes to DUCET
1. Any proposed change to existing DUCET entries should be
specified in the tailoring syntax used by CLDR. In this
way they are more likely to be well-formed and unambiguous.
2. Any UTC-mandated change to DUCET will be reviewed by the
editorial committee during the process of implementing it
into the actual DUCET for the next revision of the UCA
standard. A proposed change may turn out not to be well-formed and
unambiguous, or may have ramifications
in the table that were not obvious when the change was
proposed (such as an oversight regarding parallel treatment
of a weight change in a related script). If the data is not final
(that is, there is an intervening UTC meeting before the UCA
release is to be made), the editorial committee is authorized
to make changes in the draft files that, in its judgment, would
be most consistent with the goals and decisions of the UTC,
but should report this issue both in the PRI text associated
with the public review of the change to the table, and in its
report back to the UTC.
3. In the case of problematic changes as noted in point 2, if
there is not sufficient time for a UTC decision before the
next mandated issuance of an update to UCA, the editorial
committee should complete the release of UCA without the
problematic change, so as not to hold up the release. The
presumption should be that such problematic changes need
further discussion and resolution by the UTC, and the default
action should then be to omit them until clarified, rather than
to incorporate into the table problematic changes that
may have to be retracted in a following release.
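Point 1 above asks that proposed changes be written in the CLDR tailoring syntax, in which a rule such as "&a < b << c" resets at "a", then orders "b" after it at the primary level and "c" after "b" at the secondary level. The Python sketch below shows, under simplifying assumptions, how a well-formedness check on such a rule might look. It handles only a small subset of the syntax (a reset followed by the relations <, <<, <<<, and = on unquoted operands) and is not the CLDR parser; the function and variable names are hypothetical.

```python
import re

# Relation operators in the simplified subset handled here.
RELATIONS = {"<": "primary", "<<": "secondary", "<<<": "tertiary", "=": "equal"}

# One operator followed by one unquoted operand; quoting, escapes, options,
# and the other features of the real syntax are intentionally out of scope.
_TOKEN = re.compile(r"\s*(&|<{1,3}|=)\s*([^\s&<=]+)")


def parse_rule(rule: str) -> list[tuple[str, str]]:
    """Split a simplified tailoring rule into (kind, operand) steps.

    Raises ValueError if the rule is not well-formed within this subset.
    """
    rule = rule.strip()
    steps = []
    pos = 0
    while pos < len(rule):
        match = _TOKEN.match(rule, pos)
        if match is None:
            raise ValueError(f"cannot parse rule at offset {pos}: {rule[pos:]!r}")
        op, operand = match.groups()
        steps.append(("reset" if op == "&" else RELATIONS[op], operand))
        pos = match.end()
    if not steps or steps[0][0] != "reset":
        raise ValueError("a rule must begin with a reset (&)")
    return steps
```

A proposal that fails a check of this kind, for example one with a relation that has no operand, is the sort of ill-formed change that point 2 expects the editorial committee to catch during implementation.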
Tracking Proposed Changes to DUCET
1. In order not to lose track of proposed changes to DUCET,
each significant proposed change to the existing table should
be tracked using the CLDR bug-tracking process. (This is
appropriate in part because changes to DUCET may result in
a requirement for further, cascading changes to existing
CLDR tailoring tables for collation.)
2. Before release of a new version of DUCET, the CLDR bug
database should then be reviewed to ensure that each currently
open bug on entries in DUCET is in one of the following states:
- Fixed in the table (and marked closed);
- Not fixed in the table (and marked closed); or
- Not fixed in the table (and postponed to a future resolution,
with an explicit indication for the bug that it is not fixed
in the current release of the table).
3. Independently, implementers may submit bugs on collation
tables to the CLDR-TC. In many instances, such bugs will simply
result in changes to existing language-specific collation
table tailorings. But if, in the judgment of the CLDR-TC,
such a bug reveals a problem in DUCET itself, the CLDR-TC
should file an appropriate bug in the CLDR bug database regarding
DUCET and bring the issue to the attention of the UTC for
resolution.
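The pre-release review described in point 2 above amounts to checking that every DUCET bug is in one of three permitted states. A minimal Python sketch of such a check follows. It assumes a simplified bug record; the field names, status values, and bug identifiers are hypothetical and do not reflect the actual CLDR bug database.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"
    POSTPONED = "postponed"


@dataclass
class DucetBug:
    bug_id: str                               # hypothetical identifier
    status: Status
    fixed_in_table: bool
    noted_not_fixed_in_release: bool = False  # explicit "not fixed" indication


def unresolved_before_release(bugs: list[DucetBug]) -> list[DucetBug]:
    """Return the bugs that are not yet in one of the three permitted states."""
    pending = []
    for bug in bugs:
        # Closed is acceptable whether or not the table was changed.
        closed = bug.status is Status.CLOSED
        # Postponed is acceptable only if not fixed and explicitly marked so.
        postponed_ok = (
            bug.status is Status.POSTPONED
            and not bug.fixed_in_table
            and bug.noted_not_fixed_in_release
        )
        if not (closed or postponed_ok):
            pending.append(bug)
    return pending
```

An empty result would indicate that the review condition is met; any bug returned still needs to be closed or explicitly postponed before the release.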