Network Working Group                                        K. Zeilenga
Request for Comments: 4518                           OpenLDAP Foundation
Category: Standards Track                                      June 2006


             Lightweight Directory Access Protocol (LDAP):
                 Internationalized String Preparation

Status of This Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

The previous Lightweight Directory Access Protocol (LDAP) technical
specifications did not precisely define how character string matching
is to be performed. This led to a number of usability and
interoperability problems. This document defines string preparation
algorithms for character-based matching rules defined for use in
LDAP.

1. Introduction

1.1. Background

A Lightweight Directory Access Protocol (LDAP) [RFC4510] matching
rule [RFC4517] defines an algorithm for determining whether a
presented value matches an attribute value in accordance with the
criteria defined for the rule. The proposition may be evaluated to
True, False, or Undefined.

   True      - the attribute contains a matching value,

   False     - the attribute contains no matching value,

   Undefined - it cannot be determined whether the attribute contains
               a matching value.

Zeilenga                    Standards Track                     [Page 1]

RFC 4518         LDAP: Internationalized String Preparation    June 2006

For instance, the caseIgnoreMatch matching rule may be used to
compare whether the commonName attribute contains a particular value
without regard for case and insignificant spaces.

1.2. X.500 String Matching Rules

"X.520: Selected attribute types" [X.520] provides (among other
things) value syntaxes and matching rules for comparing values
commonly used in the directory [X.500]. These specifications are
inadequate for strings composed of Unicode [Unicode] characters.

The caseIgnoreMatch matching rule [X.520], for example, is simply
defined as being a case-insensitive comparison where insignificant
spaces are ignored. For printableString, there is only one space
character and case mapping is bijective, hence this definition is
sufficient. However, for Unicode string types such as
universalString, this is not sufficient. For example, a
case-insensitive matching implementation that folded lowercase
characters to uppercase would yield different results than an
implementation that used uppercase to lowercase folding. Or one
implementation may view space as referring to only SPACE (U+0020), a
second implementation may view any character with the space separator
(Zs) property as a space, and another implementation may view any
character with the whitespace (WS) category as a space.

The lack of precise specification for character string matching has
led to significant interoperability problems. When used in
certificate chain validation, security vulnerabilities can arise. To
address these problems, this document defines precise algorithms for
preparing character strings for matching.
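
The two ambiguities described in this section can be observed directly.
The following is a minimal Python sketch, not part of the specification;
it assumes Python's built-in str.upper(), str.lower(), and str.isspace()
as stand-ins for "some implementation's" folding and whitespace rules,
and the names views, nbsp, and tab are illustrative only.

```python
import unicodedata

# Two folding directions disagree on LATIN SMALL LETTER DOTLESS I (U+0131).
a, b = "\u0131", "i"
equal_by_upper = a.upper() == b.upper()   # both fold to "I"
equal_by_lower = a.lower() == b.lower()   # U+0131 stays U+0131

def views(ch):
    """Report how three different notions of 'space' classify ch."""
    return {
        "is_u0020": ch == " ",                        # only SPACE (U+0020)
        "is_zs": unicodedata.category(ch) == "Zs",    # space separator
        "is_whitespace": ch.isspace(),                # Python's whitespace test
    }

nbsp = views("\u00a0")   # NO-BREAK SPACE
tab = views("\t")        # CHARACTER TABULATION
```

Under these assumptions the dotless i compares equal to "i" only when
folding to uppercase, and NO-BREAK SPACE and CHARACTER TABULATION are
each accepted by some of the three space definitions and rejected by
others.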

1.3. Relationship to "stringprep"

The character string preparation algorithms described in this
document are based upon the "stringprep" approach [RFC3454]. In
"stringprep", presented and stored values are first prepared for
comparison so that a character-by-character comparison yields the
"correct" result.

The approach used here is a refinement of the "stringprep" [RFC3454]
approach. Each algorithm involves two additional preparation steps.

a) Prior to applying the Unicode string preparation steps outlined in
   "stringprep", the string is transcoded to Unicode.

b) After applying the Unicode string preparation steps outlined in
   "stringprep", the string is modified to appropriately handle
   characters insignificant to the matching rule.

Hence, preparation of character strings for X.500 [X.500] matching
[X.501] involves the following steps:

   1) Transcode
   2) Map
   3) Normalize
   4) Prohibit
   5) Check Bidi (Bidirectional)
   6) Insignificant Character Handling

These steps are described in Section 2.

It is noted that while various tables of Unicode characters included
or referenced by this specification are derived from Unicode
[Unicode] data, these tables are to be considered definitive for the
purpose of implementing this specification.

1.4. Relationship to the LDAP Technical Specification

This document is an integral part of the LDAP technical specification
[RFC4510], which obsoletes the previously defined LDAP technical
specification [RFC3377] in its entirety.

This document details new LDAP internationalized character string
preparation algorithms used by [RFC4517] and possibly other technical
specifications defining LDAP syntaxes and/or matching rules.

1.5. Relationship to X.500

LDAP is defined [RFC4510] in X.500 terms as an X.500 access
mechanism. As such, there is a strong desire for alignment between
LDAP and X.500 syntax and semantics. The character string
preparation algorithms described in this document are based upon the
"Internationalized String Matching Rules for X.500" [XMATCH] proposal
to ITU/ISO Joint Study Group 2.

1.6. Conventions and Terms

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 [RFC2119].

Character names in this document use the notation for code points and
names from the Unicode Standard [Unicode]. For example, the letter
"a" may be represented as either <U+0061> or <LATIN SMALL LETTER A>.
In the lists of mappings and the prohibited characters, the "U+" is
left off to make the lists easier to read. The comments for
character ranges are shown in square brackets (such as "[CONTROL
CHARACTERS]") and do not come from the standard.

Note: a glossary of terms used in Unicode can be found in [Glossary].
Information on the Unicode character encoding model can be found in
[CharModel].

The term "combining mark", as used in this specification, refers to
any Unicode [Unicode] code point that has a mark property (Mn, Mc,
Me). Appendix A provides a definitive list of combining marks.

2. String Preparation

The following six-step process SHALL be applied to each presented and
attribute value in preparation for character string matching rule
evaluation.

   1) Transcode
   2) Map
   3) Normalize
   4) Prohibit
   5) Check bidi
   6) Insignificant Character Handling

Failure in any step causes the assertion to evaluate to Undefined.

The character repertoire of this process is Unicode 3.2 [Unicode].

Note that this six-step process specification is intended to describe
expected matching behavior. Implementations are free to use
alternative processes so long as the matching rule evaluation
behavior provided is consistent with the behavior described by this
specification.
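
One way to picture the rule that a failure in any step yields Undefined
is a small driver that runs the steps in order. The following Python
sketch is illustrative only: PreparationError, prepare, and
reject_replacement_char are hypothetical names, and None is used here
as an assumed representation of Undefined.

```python
from typing import Callable, Iterable, Optional

class PreparationError(Exception):
    """Raised by a step that fails (hypothetical name)."""

Step = Callable[[str], str]

def prepare(value: str, steps: Iterable[Step]) -> Optional[str]:
    """Apply each step in order; return None (Undefined) if any step fails."""
    try:
        for step in steps:
            value = step(value)
    except PreparationError:
        return None
    return value

def reject_replacement_char(s: str) -> str:
    # Stand-in for the Prohibit step: Section 2.4 prohibits U+FFFD.
    if "\ufffd" in s:
        raise PreparationError("prohibited code point")
    return s
```

A caller would pass one function per step, for example
prepare(raw, [map_fn, normalize_fn, prohibit_fn, spaces_fn]), and treat
a None result as an assertion that evaluates to Undefined.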

2.1. Transcode

Each non-Unicode string value is transcoded to Unicode.

PrintableString [X.680] values are transcoded directly to Unicode.

UniversalString, UTF8String, and bmpString [X.680] values need not be
transcoded as they are Unicode-based strings (in the case of
bmpString, a subset of Unicode).

TeletexString [X.680] values are transcoded to Unicode. As there is
no standard for mapping TeletexString values to Unicode, the mapping
is left as a local matter.

For these and other reasons, use of TeletexString is NOT RECOMMENDED.

The output is the transcoded string.
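
In an implementation that holds values as raw octets, the Transcode
step amounts to choosing a decoder per ASN.1 string type. The Python
sketch below assumes the input is the content octets of the value and
that BMPString and UniversalString use big-endian two-octet and
four-octet encodings; the names _CODECS and transcode are illustrative.
It does not validate the PrintableString character subset.

```python
# Decoder per ASN.1 string type. TeletexString is deliberately absent
# because its mapping to Unicode is a local matter.
_CODECS = {
    "PrintableString": "ascii",
    "UTF8String": "utf-8",
    "BMPString": "utf-16-be",        # two octets per character
    "UniversalString": "utf-32-be",  # four octets per character
}

def transcode(octets: bytes, asn1_type: str) -> str:
    """Decode the content octets of an ASN.1 string value to a Unicode str."""
    try:
        codec = _CODECS[asn1_type]
    except KeyError:
        raise ValueError(f"no standard Unicode mapping for {asn1_type}") from None
    return octets.decode(codec)
```

A deployment that must accept TeletexString would add its own locally
chosen mapping in place of the ValueError.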

2.2. Map

SOFT HYPHEN (U+00AD) and MONGOLIAN TODO SOFT HYPHEN (U+1806) code
points are mapped to nothing. COMBINING GRAPHEME JOINER (U+034F) and
VARIATION SELECTORs (U+180B-180D, FE00-FE0F) code points are also
mapped to nothing. The OBJECT REPLACEMENT CHARACTER (U+FFFC) is
mapped to nothing.

CHARACTER TABULATION (U+0009), LINE FEED (LF) (U+000A), LINE
TABULATION (U+000B), FORM FEED (FF) (U+000C), CARRIAGE RETURN (CR)
(U+000D), and NEXT LINE (NEL) (U+0085) are mapped to SPACE (U+0020).

All other control code (e.g., Cc) points or code points with a
control function (e.g., Cf) are mapped to nothing. The following is
a complete list of these code points: U+0000-0008, 000E-001F,
007F-0084, 0086-009F, 06DD, 070F, 180E, 200C-200F, 202A-202E,
2060-2063, 206A-206F, FEFF, FFF9-FFFB, 1D173-1D17A, E0001,
E0020-E007F.

ZERO WIDTH SPACE (U+200B) is mapped to nothing. All other code
points with Separator (space, line, or paragraph) property (e.g., Zs,
Zl, or Zp) are mapped to SPACE (U+0020). The following is a complete
list of these code points: U+0020, 00A0, 1680, 2000-200A, 2028-2029,
202F, 205F, 3000.

For case ignore, numeric, and stored prefix string matching rules,
characters are case folded per B.2 of [RFC3454].

The output is the mapped string.
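
The Map step can be sketched as two lookup sets built from the code
point lists in this section. The Python below is illustrative rather
than definitive: _ranges, TO_NOTHING, TO_SPACE, and map_step are
hypothetical names, the variation selector range is taken as
FE00-FE0F (the range listed in Appendix A), and case folding is
delegated to map_table_b2 from Python's standard stringprep module,
which implements Table B.2 of [RFC3454].

```python
import stringprep

def _ranges(spec):
    """Expand whitespace-separated hex tokens ('A-B' or 'A') into a set."""
    out = set()
    for tok in spec.split():
        lo, _, hi = tok.partition("-")
        out.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return out

# Code points mapped to nothing, as listed in this section.
TO_NOTHING = _ranges(
    "00AD 1806 034F 180B-180D FE00-FE0F FFFC "
    "0000-0008 000E-001F 007F-0084 0086-009F 06DD 070F 180E 200C-200F "
    "202A-202E 2060-2063 206A-206F FEFF FFF9-FFFB 1D173-1D17A E0001 "
    "E0020-E007F 200B"
)
# Code points mapped to SPACE (U+0020), as listed in this section.
TO_SPACE = _ranges(
    "0009-000D 0085 0020 00A0 1680 2000-200A 2028-2029 202F 205F 3000"
)

def map_step(s, case_fold=False):
    """Apply the Section 2.2 mappings; optionally case fold per B.2."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp in TO_NOTHING:
            continue                      # mapped to nothing
        if cp in TO_SPACE:
            out.append(" ")               # mapped to SPACE
        elif case_fold:
            out.append(stringprep.map_table_b2(ch))
        else:
            out.append(ch)
    return "".join(out)
```

The case_fold flag would be set only for the matching rules named in
the last paragraph of this section.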

2.3. Normalize

The input string is to be normalized to Unicode Form KC
(compatibility composed) as described in [UAX15]. The output is the
normalized string.
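
As a minimal sketch of this step, Python's standard unicodedata module
exposes a Unicode 3.2.0 database object, ucd_3_2_0, whose normalize
method can produce Form KC against the repertoire this document
requires. The function name normalize_step is illustrative.

```python
from unicodedata import ucd_3_2_0

def normalize_step(s: str) -> str:
    """Normalize to Form KC using the Unicode 3.2.0 character database."""
    return ucd_3_2_0.normalize("NFKC", s)
```

For example, the ligature U+FB01 becomes the two letters "fi", and "e"
followed by COMBINING ACUTE ACCENT (U+0301) composes to U+00E9.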

2.4. Prohibit

All Unassigned code points are prohibited. Unassigned code points
are listed in Table A.1 of [RFC3454].

Characters that, per Section 5.8 of [RFC3454], change display
properties or are deprecated are prohibited. These characters are
listed in Table C.8 of [RFC3454].

Private Use code points are prohibited. These characters are listed
in Table C.3 of [RFC3454].

All non-character code points are prohibited. These code points are
listed in Table C.4 of [RFC3454].

Surrogate codes are prohibited. These characters are listed in Table
C.5 of [RFC3454].

The REPLACEMENT CHARACTER (U+FFFD) code point is prohibited.

The step fails if the input string contains any prohibited code
point. Otherwise, the output is the input string.
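
The tables named in this section are available in Python's standard
stringprep module as membership tests, so the Prohibit step can be
sketched as below. The names _TABLES, Prohibited, and prohibit_step
are illustrative, and raising an exception is only one assumed way of
signalling that the step failed.

```python
import stringprep

# Membership tests for the [RFC3454] tables named in Section 2.4.
_TABLES = (
    stringprep.in_table_a1,  # unassigned code points in Unicode 3.2
    stringprep.in_table_c8,  # change display properties or deprecated
    stringprep.in_table_c3,  # private use
    stringprep.in_table_c4,  # non-character code points
    stringprep.in_table_c5,  # surrogate codes
)

class Prohibited(ValueError):
    """Signals that the Prohibit step failed (hypothetical name)."""

def prohibit_step(s):
    """Return s unchanged, or raise Prohibited on the first bad code point."""
    for ch in s:
        if ch == "\ufffd" or any(table(ch) for table in _TABLES):
            raise Prohibited(f"U+{ord(ch):04X}")
    return s
```

A caller would translate the exception into an Undefined result for
the assertion being evaluated.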

2.5. Check bidi

Bidirectional characters are ignored.

2.6. Insignificant Character Handling

In this step, the string is modified to ensure proper handling of
characters insignificant to the matching rule. This modification
differs from matching rule to matching rule.

Section 2.6.1 applies to case ignore and exact string matching.
Section 2.6.2 applies to numericString matching.
Section 2.6.3 applies to telephoneNumber matching.

2.6.1. Insignificant Space Handling

For the purposes of this section, a space is defined to be the SPACE
(U+0020) code point followed by no combining marks.

   NOTE - The previous steps ensure that the string cannot contain
          any code points in the separator class, other than SPACE
          (U+0020).

For input strings that are attribute values or non-substring
assertion values: If the input string contains no non-space
character, then the output is exactly two SPACEs. Otherwise (the
input string contains at least one non-space character), the string
is modified such that the string starts with exactly one SPACE
character, ends with exactly one SPACE character, and any inner
(non-empty) sequence of space characters is replaced with exactly two
SPACE characters. For instance, the input string
"foo<SPACE>bar<SPACE><SPACE>" results in the output
"<SPACE>foo<SPACE><SPACE>bar<SPACE>".
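
The handling for attribute values and non-substring assertion values
can be sketched in Python as follows. This is an illustration under
stated assumptions: the names _is_mark, _space_flags, and handle_spaces
are hypothetical, and the Unicode 3.2.0 general category (Mn, Mc, Me)
is used as a stand-in for the definitive table in Appendix A.

```python
from unicodedata import ucd_3_2_0

def _is_mark(ch):
    # Assumption: general category approximates the Appendix A table.
    return ucd_3_2_0.category(ch) in ("Mn", "Mc", "Me")

def _space_flags(s):
    """True at each index holding a SPACE not followed by a combining mark."""
    return [
        ch == " " and not (i + 1 < len(s) and _is_mark(s[i + 1]))
        for i, ch in enumerate(s)
    ]

def handle_spaces(s):
    """Insignificant space handling for a whole (non-substring) value."""
    flags = _space_flags(s)
    if all(flags):                       # no non-space character at all
        return "  "
    first = flags.index(False)
    last = len(flags) - 1 - flags[::-1].index(False)
    out = [" "]                          # exactly one leading SPACE
    in_run = False
    for i in range(first, last + 1):
        if flags[i]:
            if not in_run:
                out.append("  ")         # inner run becomes two SPACEs
                in_run = True
        else:
            out.append(s[i])
            in_run = False
    out.append(" ")                      # exactly one trailing SPACE
    return "".join(out)
```

Applied to the example in the text, handle_spaces("foo bar  ") yields
" foo  bar ". A SPACE followed by a combining mark is not treated as a
space and is copied through unchanged.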

For input strings that are substring assertion values: If the string
being prepared contains no non-space characters, then the output
string is exactly one SPACE. Otherwise, the following steps are
taken:

-  If the input string is an initial substring, it is modified to
   start with exactly one SPACE character;

-  If the input string is an initial or an any substring that ends in
   one or more space characters, it is modified to end with exactly
   one SPACE character;

-  If the input string is an any or a final substring that starts in
   one or more space characters, it is modified to start with exactly
   one SPACE character; and

-  If the input string is a final substring, it is modified to end
   with exactly one SPACE character.

For instance, for the input string "foo<SPACE>bar<SPACE><SPACE>" as
an initial substring, the output would be
"<SPACE>foo<SPACE><SPACE>bar<SPACE>". As an any or final substring,
the same input would result in "foo<SPACE>bar<SPACE>".

Appendix B discusses the rationale for the behavior.
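
The four rules for substring assertion values can be sketched as
follows. This Python illustration applies only the leading and
trailing adjustments stated in the four rules and leaves inner runs of
spaces untouched, which is an assumption about how to read them. The
names _is_mark, _space_flags, and prepare_substring are hypothetical,
and the general category again stands in for the Appendix A table.

```python
from unicodedata import ucd_3_2_0

def _is_mark(ch):
    # Assumption: general category approximates the Appendix A table.
    return ucd_3_2_0.category(ch) in ("Mn", "Mc", "Me")

def _space_flags(s):
    """True at each index holding a SPACE not followed by a combining mark."""
    return [
        ch == " " and not (i + 1 < len(s) and _is_mark(s[i + 1]))
        for i, ch in enumerate(s)
    ]

def prepare_substring(s, kind):
    """Apply the four substring rules; kind is 'initial', 'any', or 'final'."""
    if kind not in ("initial", "any", "final"):
        raise ValueError(kind)
    flags = _space_flags(s)
    if all(flags):                        # no non-space characters
        return " "
    first = flags.index(False)
    last = len(flags) - 1 - flags[::-1].index(False)
    core = s[first:last + 1]              # leading/trailing spaces removed
    had_leading = first > 0
    had_trailing = last < len(s) - 1
    start = " " if kind == "initial" or had_leading else ""
    end = " " if kind == "final" or had_trailing else ""
    return start + core + end
```

For the any and final cases of the example in the text,
prepare_substring("foo bar  ", "final") gives "foo bar ".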

2.6.2. numericString Insignificant Character Handling

For the purposes of this section, a space is defined to be the SPACE
(U+0020) code point followed by no combining marks.

All spaces are regarded as insignificant and are to be removed.

For example, removal of spaces from the Form KC string:
   "<SPACE><SPACE>123<SPACE><SPACE>456<SPACE><SPACE>"
would result in the output string:
   "123456"
and the Form KC string:
   "<SPACE><SPACE><SPACE>"
would result in the output string:
   "" (an empty string).

2.6.3. telephoneNumber Insignificant Character Handling

For the purposes of this section, a hyphen is defined to be a
HYPHEN-MINUS (U+002D), ARMENIAN HYPHEN (U+058A), HYPHEN (U+2010),
NON-BREAKING HYPHEN (U+2011), MINUS SIGN (U+2212), SMALL HYPHEN-MINUS
(U+FE63), or FULLWIDTH HYPHEN-MINUS (U+FF0D) code point followed by
no combining marks and a space is defined to be the SPACE (U+0020)
code point followed by no combining marks.

All hyphens and spaces are considered insignificant and are to be
removed.

For example, removal of hyphens and spaces from the Form KC string:
   "<SPACE><HYPHEN>123<SPACE><SPACE>456<SPACE><HYPHEN>"
would result in the output string:
   "123456"
and the Form KC string:
   "<HYPHEN><HYPHEN><HYPHEN>"
would result in the (empty) output string:
   "".
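
The numericString and telephoneNumber handling differ only in which
characters are insignificant, so both can be sketched with one helper.
In the Python below, HYPHENS, strip_insignificant, numeric_string, and
telephone_number are illustrative names, and the Unicode 3.2.0 general
category is assumed as a stand-in for the Appendix A table of
combining marks.

```python
from unicodedata import ucd_3_2_0

# The seven hyphen code points listed in Section 2.6.3.
HYPHENS = "\u002d\u058a\u2010\u2011\u2212\ufe63\uff0d"

def _is_mark(ch):
    # Assumption: general category approximates the Appendix A table.
    return ucd_3_2_0.category(ch) in ("Mn", "Mc", "Me")

def strip_insignificant(s, insignificant):
    """Remove each listed character unless a combining mark follows it."""
    out = []
    for i, ch in enumerate(s):
        followed_by_mark = i + 1 < len(s) and _is_mark(s[i + 1])
        if ch in insignificant and not followed_by_mark:
            continue
        out.append(ch)
    return "".join(out)

def numeric_string(s):
    """Section 2.6.2: all spaces are removed."""
    return strip_insignificant(s, " ")

def telephone_number(s):
    """Section 2.6.3: all hyphens and spaces are removed."""
    return strip_insignificant(s, " " + HYPHENS)
```

Both functions reproduce the examples in the text: the spaced and
hyphenated inputs reduce to "123456", and inputs consisting only of
insignificant characters reduce to the empty string.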

3. Security Considerations

"Preparation of Internationalized Strings ("stringprep")" [RFC3454]
security considerations generally apply to the algorithms described
here.

4. Acknowledgements

The approach used in this document is based upon design principles
and algorithms described in "Preparation of Internationalized Strings
('stringprep')" [RFC3454] by Paul Hoffman and Marc Blanchet. Some
additional guidance was drawn from Unicode Technical Standards,
Technical Reports, and Notes.

This document is a product of the IETF LDAP Revision (LDAPBIS)
Working Group.

5. References

5.1. Normative References

[RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
            Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3454]   Hoffman, P. and M. Blanchet, "Preparation of
            Internationalized Strings ("stringprep")", RFC 3454,
            December 2002.

[RFC4510]   Zeilenga, K., "Lightweight Directory Access Protocol
            (LDAP): Technical Specification Road Map", RFC 4510,
            June 2006.

[RFC4517]   Legg, S., Ed., "Lightweight Directory Access Protocol
            (LDAP): Syntaxes and Matching Rules", RFC 4517, June
            2006.

[Unicode]   The Unicode Consortium, "The Unicode Standard, Version
            3.2.0" is defined by "The Unicode Standard, Version
            3.0" (Reading, MA, Addison-Wesley, 2000. ISBN
            0-201-61633-5), as amended by the "Unicode Standard
            Annex #27: Unicode 3.1"
            (http://www.unicode.org/reports/tr27/) and by the
            "Unicode Standard Annex #28: Unicode 3.2"
            (http://www.unicode.org/reports/tr28/).

[UAX15]     Davis, M. and M. Duerst, "Unicode Standard Annex #15:
            Unicode Normalization Forms, Version 3.2.0".
            <http://www.unicode.org/unicode/reports/tr15/tr15-22.html>,
            March 2002.

[X.680]     International Telecommunication Union -
            Telecommunication Standardization Sector, "Abstract
            Syntax Notation One (ASN.1) - Specification of Basic
            Notation", X.680(2002) (also ISO/IEC 8824-1:2002).

5.2. Informative References

[X.500]     International Telecommunication Union -
            Telecommunication Standardization Sector, "The
            Directory -- Overview of concepts, models and
            services," X.500(1993) (also ISO/IEC 9594-1:1994).

[X.501]     International Telecommunication Union -
            Telecommunication Standardization Sector, "The
            Directory -- Models," X.501(1993) (also ISO/IEC
            9594-2:1994).

[X.520]     International Telecommunication Union -
            Telecommunication Standardization Sector, "The
            Directory: Selected Attribute Types", X.520(1993) (also
            ISO/IEC 9594-6:1994).

[Glossary]  The Unicode Consortium, "Unicode Glossary",
            <http://www.unicode.org/glossary/>.

[CharModel] Whistler, K. and M. Davis, "Unicode Technical Report
            #17, Character Encoding Model", UTR17,
            <http://www.unicode.org/unicode/reports/tr17/>, August
            2000.

[RFC3377]   Hodges, J. and R. Morgan, "Lightweight Directory Access
            Protocol (v3): Technical Specification", RFC 3377,
            September 2002.

[RFC4515]   Smith, M., Ed. and T. Howes, "Lightweight Directory
            Access Protocol (LDAP): String Representation of Search
            Filters", RFC 4515, June 2006.

[XMATCH]    Zeilenga, K., "Internationalized String Matching Rules
            for X.500", Work in Progress.

Appendix A. Combining Marks

This appendix is normative.

This table was derived from Unicode [Unicode] data files; it lists
all code points with the Mn, Mc, or Me properties. This table is to
be considered definitive for the purposes of implementation of this
specification.

   0300-034F 0360-036F 0483-0486 0488-0489 0591-05A1
   05A3-05B9 05BB-05BC 05BF 05C1-05C2 05C4 064B-0655 0670
   06D6-06DC 06DE-06E4 06E7-06E8 06EA-06ED 0711 0730-074A
   07A6-07B0 0901-0903 093C 093E-094F 0951-0954 0962-0963
   0981-0983 09BC 09BE-09C4 09C7-09C8 09CB-09CD 09D7
   09E2-09E3 0A02 0A3C 0A3E-0A42 0A47-0A48 0A4B-0A4D
   0A70-0A71 0A81-0A83 0ABC 0ABE-0AC5 0AC7-0AC9 0ACB-0ACD
   0B01-0B03 0B3C 0B3E-0B43 0B47-0B48 0B4B-0B4D 0B56-0B57
   0B82 0BBE-0BC2 0BC6-0BC8 0BCA-0BCD 0BD7 0C01-0C03
   0C3E-0C44 0C46-0C48 0C4A-0C4D 0C55-0C56 0C82-0C83
   0CBE-0CC4 0CC6-0CC8 0CCA-0CCD 0CD5-0CD6 0D02-0D03
   0D3E-0D43 0D46-0D48 0D4A-0D4D 0D57 0D82-0D83 0DCA
   0DCF-0DD4 0DD6 0DD8-0DDF 0DF2-0DF3 0E31 0E34-0E3A
   0E47-0E4E 0EB1 0EB4-0EB9 0EBB-0EBC 0EC8-0ECD 0F18-0F19
   0F35 0F37 0F39 0F3E-0F3F 0F71-0F84 0F86-0F87 0F90-0F97
   0F99-0FBC 0FC6 102C-1032 1036-1039 1056-1059 1712-1714
   1732-1734 1752-1753 1772-1773 17B4-17D3 180B-180D 18A9
   20D0-20EA 302A-302F 3099-309A FB1E FE00-FE0F FE20-FE23
   1D165-1D169 1D16D-1D172 1D17B-1D182 1D185-1D18B
   1D1AA-1D1AD

Appendix B. Substrings Matching

This appendix is non-normative.

In the absence of substrings matching, the insignificant space
handling for case ignore/exact matching could be simplified.
Specifically, the handling could be to require that all sequences of
one or more spaces be replaced with one space and, if the string
contains non-space characters, removal of all leading spaces and
trailing spaces.

In the presence of substrings matching, this simplified space
handling would lead to unexpected and undesirable matching behavior.
For instance:

   1) (CN=foo\20*\20bar) would match the CN value "foobar";

   2) (CN=*\20foobar\20*) would match "foobar", but
      (CN=*\20*foobar*\20*) would not.

Note to readers not familiar with LDAP substrings matching: the LDAP
filter [RFC4515] assertion (CN=A*B*C) says to "match any value (of
the attribute CN) that begins with A, contains B after A, ends with C
where C is also after B."

The first case illustrates that this simplified space handling would
cause leading and trailing spaces in substrings of the string to be
regarded as insignificant. However, only leading and trailing spaces
(as well as multiple consecutive spaces) of the string (as a whole)
are insignificant.

The second case illustrates that this simplified space handling would
cause sub-partitioning failures. That is, if a prepared any
substring matches a partition of the attribute value, then an
assertion constructed by subdividing that substring into multiple
substrings should also match.

In designing an appropriate approach for space handling for
substrings matching, one must study key aspects of X.500 case
exact/ignore matching. X.520 [X.520] says:

   The [substrings] rule returns TRUE if there is a partitioning of
   the attribute value (into portions) such that:

   -  the specified substrings (initial, any, final) match
      different portions of the value in the order of the strings
      sequence;

   -  initial, if present, matches the first portion of the value;

   -  final, if present, matches the last portion of the value;

   -  any, if present, matches some arbitrary portion of the
      value.

That is, the substrings assertion (CN=foo\20*\20bar) matches the
attribute value "foo<SPACE><SPACE>bar" as the value can be
partitioned into the portions "foo<SPACE>" and "<SPACE>bar" meeting
the above requirements.

X.520 also says:

   [T]he following spaces are regarded as not significant:

   -  leading spaces (i.e., those preceding the first character
      that is not a space);

   -  trailing spaces (i.e., those following the last character
      that is not a space);

   -  multiple consecutive spaces (these are taken as equivalent
      to a single space character).

This statement applies to the assertion values and attribute values
as whole strings, and not individually to substrings of an assertion
value. In particular, the statements should be taken to mean that if
an assertion value and attribute value match without any
consideration to insignificant characters, then that assertion value
should also match any attribute value that differs only by inclusion
or removal of insignificant characters.

Hence the assertion (CN=foo\20*\20bar) matches
"foo<SPACE><SPACE><SPACE>bar" and "foo<SPACE>bar" as these values
only differ from "foo<SPACE><SPACE>bar" by the inclusion or removal
of insignificant spaces.
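
The difference between the simplified handling and the handling of
Section 2.6.1 can be checked with a small Python sketch for the
assertion (CN=foo\20*\20bar). The sketch assumes ASCII-only input (so
combining marks are not considered), handles only an initial and a
final substring, and uses hypothetical names throughout.

```python
import re

def simplified(s):
    """The simplified handling: collapse runs of spaces, trim both ends."""
    return re.sub(" +", " ", s).strip(" ")

def value_prep(s):
    """Section 2.6.1 handling for a whole attribute value."""
    words = [w for w in s.split(" ") if w]
    return " " + "  ".join(words) + " " if words else "  "

def initial_prep(s):
    """Section 2.6.1 handling for an initial substring."""
    core = s.strip(" ")
    if not core:
        return " "
    return " " + core + (" " if s.endswith(" ") else "")

def final_prep(s):
    """Section 2.6.1 handling for a final substring."""
    core = s.strip(" ")
    if not core:
        return " "
    return (" " if s.startswith(" ") else "") + core + " "

def matches(value, initial, final):
    """initial matches the first portion, final the last, without overlap."""
    return value.startswith(initial) and value[len(initial):].endswith(final)

# Simplified handling: the assertion wrongly matches "foobar".
wrong = matches(simplified("foobar"), simplified("foo "), simplified(" bar"))

# Section 2.6.1 handling: "foobar" is rejected, spaced variants are accepted.
ini, fin = initial_prep("foo "), final_prep(" bar")
results = {v: matches(value_prep(v), ini, fin)
           for v in ("foobar", "foo bar", "foo  bar", "foo   bar")}
```

Doubling the inner spaces of the prepared value is what lets the value
be partitioned into " foo " and " bar " regardless of how many spaces
the stored value contained.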

Astute readers of this text will also note that there are special
cases where the specified space handling does not ignore spaces that
could be considered insignificant. For instance, the assertion
(CN=\20*\20*\20) does not match "<SPACE><SPACE><SPACE>"
(insignificant spaces present in value) or " " (insignificant spaces
not present in value). However, as these cases have no practical
application that cannot be met by simple assertions, e.g., (cn=\20),
and this minor anomaly can only be fully addressed by a preparation
algorithm to be used in conjunction with character-by-character
partitioning and matching, the anomaly is considered acceptable.

Author's Address

   Kurt D. Zeilenga
   OpenLDAP Foundation

   EMail: Kurt@OpenLDAP.org

Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).