1 |
|
---|
2 | /*
|
---|
3 | *@@sourcefile xmldefs.c:
|
---|
4 | * this file is just for xdoc and contains glossary items for
|
---|
5 | * XML. It is never compiled.
|
---|
6 | *
|
---|
7 | *@@added V0.9.6 (2000-10-29) [umoeller]
|
---|
8 | */
|
---|
9 |
|
---|
10 | /*
|
---|
11 | * Copyright (C) 2001 Ulrich Möller.
|
---|
12 | * This file is part of the "XWorkplace helpers" source package.
|
---|
13 | * This is free software; you can redistribute it and/or modify
|
---|
14 | * it under the terms of the GNU General Public License as published
|
---|
15 | * by the Free Software Foundation, in version 2 as it comes in the
|
---|
16 | * "COPYING" file of the XWorkplace main distribution.
|
---|
17 | * This program is distributed in the hope that it will be useful,
|
---|
18 | * but WITHOUT ANY WARRANTY; without even the implied warranty of
|
---|
19 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
---|
20 | * GNU General Public License for more details.
|
---|
21 | */
|
---|
22 |
|
---|
23 | /*
|
---|
24 | *@@category: Helpers\XML
|
---|
25 | * see xml.c.
|
---|
26 | */
|
---|
27 |
|
---|
28 | /*
|
---|
29 | *@@gloss: entities entities
|
---|
30 | * An "entity" is an XML storage unit. In the simplest case, an
|
---|
31 | * XML document has only one entity, which is an XML file.
|
---|
32 | * Except for the document entity (which is nameless), all
|
---|
33 | * entities are identified by their names.
|
---|
34 | *
|
---|
35 | * Entities are marked as either parsed or unparsed.
|
---|
36 | *
|
---|
37 | * The document entity serves as the root of the entity tree
|
---|
38 | * and a starting-point for an XML processor. Unlike other
|
---|
39 | * entities, the document entity has no name and might well
|
---|
40 | * appear on a processor input stream without any identification
|
---|
41 | * at all.
|
---|
42 | *
|
---|
43 | * Other than that, there are @internal_entities,
|
---|
44 | * @external_entities, and @parameter_entities.
|
---|
45 | *
|
---|
46 | * See @entity_references for how to reference entities.
|
---|
47 | */
|
---|
48 |
|
---|
49 | /*
|
---|
50 | *@@gloss: entity_references entity references
|
---|
51 | * An "entity reference" refers to the content of a named
|
---|
52 | * entity (see: @entities). It is enclosed between "&" and ";"
|
---|
53 | * characters.
|
---|
54 | *
|
---|
55 | * If you declare @internal_entities in the @DTD, referencing
|
---|
56 | * them allows for text replacements as in SGML:
|
---|
57 | *
|
---|
58 | + This document was prepared on &PrepDate;.
|
---|
59 | *
|
---|
60 | * The same works for @external_entities though. Assuming
|
---|
61 | * that "SecondFile" has been declared in the DTD to point
|
---|
62 | * to another file,
|
---|
63 | *
|
---|
64 | + See the following README: &SecondFile;
|
---|
65 | *
|
---|
66 | * would then insert the complete contents of the second
|
---|
67 | * file into the document. The XML processor will parse
|
---|
68 | * that file as if it were at that position in the original
|
---|
69 | * document.
|
---|
70 | *
|
---|
71 | * An entity is "included" when its replacement text
|
---|
72 | * is retrieved and processed, in place of the reference itself,
|
---|
73 | * as though it were part of the document at the location the
|
---|
74 | * reference was recognized.
|
---|
75 | * The replacement text may contain
|
---|
76 | * both @content and (except for @parameter_entities)
|
---|
77 | * @markup, which must be recognized in the usual way, except
|
---|
78 | * that the replacement text of entities used to escape markup
|
---|
79 | * delimiters (the entities amp, lt, gt, apos, quot) is always
|
---|
80 | * treated as data. (The string "AT&amp;T;" expands to "AT&T;"
|
---|
81 | * and the remaining ampersand is not recognized as an
|
---|
82 | * entity-reference delimiter.) A @character_reference is
|
---|
83 | * included when the indicated character is processed in
|
---|
84 | * place of the reference itself.
|
---|
85 | *
|
---|
86 | * The following are forbidden, and constitute fatal errors:
|
---|
87 | *
|
---|
88 | * -- the appearance of a reference to an unparsed entity;
|
---|
89 | *
|
---|
90 | * -- the appearance of any character or general-entity reference
|
---|
91 | * in the @DTD except within an EntityValue or AttValue;
|
---|
92 | *
|
---|
93 | * -- a reference to an external entity in an attribute value.
|
---|
94 | */
|
---|
95 |
|
---|
96 | /*
|
---|
97 | *@@gloss: internal_entities internal entities
|
---|
98 | * An "internal entity" has no separate physical storage.
|
---|
99 | * Its contents appear in the document's @DTD as an
|
---|
100 | * @entity_declaration, like this:
|
---|
101 | *
|
---|
102 | + <!ENTITY PrepDate "Feb 11, 2001">
|
---|
103 | *
|
---|
104 | * This can later be referenced with @entity_references
|
---|
105 | * and allows you to define shortcuts for frequently typed
|
---|
106 | * text or text that is expected to change, such as the
|
---|
107 | * revision status of a document.
|
---|
108 | *
|
---|
109 | * XML has five built-in internal entities:
|
---|
110 | *
|
---|
111 | * -- "&amp;" refers to the ampersand ("&") character,
|
---|
112 | * which normally introduces @markup and can therefore
|
---|
113 | * only be literally used in @comments, @processing_instructions,
|
---|
114 | * or @CDATA sections. This is also legal within the literal
|
---|
115 | * entity value of declarations of internal entities.
|
---|
116 | *
|
---|
117 | * -- "&lt;" and "&gt;" refer to the angle brackets
|
---|
118 | * ("<", ">") which normally introduce @elements.
|
---|
119 | * They must be escaped unless used in a @CDATA section.
|
---|
120 | *
|
---|
121 | * -- To allow values in an @attribute to contain both single and double
|
---|
122 | * quotes, the apostrophe or single-quote character (') may be
|
---|
123 | * represented as "&apos;", and the double-quote character
|
---|
124 | * (") as "&quot;".
|
---|
125 | *
|
---|
126 | * In addition, a @character_reference is a special case of an entity reference.
|
---|
127 | *
|
---|
128 | * An internal entity is always parsed.
|
---|
129 | *
|
---|
130 | * Also see @entities.
|
---|
131 | */
|
---|
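The five built-in entities described above can be illustrated with a small stand-alone sketch. The function name expand_predefined is hypothetical; it is not part of the XWorkplace helpers or of expat, and it deliberately ignores all other entity and character references. Each replacement is emitted once and never re-scanned, which mirrors the rule that the replacement text of these entities is always treated as data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration only):
 * copies src to dst, replacing the five predefined entity references
 * with the characters they stand for. Everything else, including
 * unknown references, is copied through unchanged.
 * dst must hold at least strlen(src) + 1 bytes; every replacement is
 * shorter than the reference it replaces. */
static void expand_predefined(const char *src, char *dst)
{
    static const struct { const char *ref; char ch; } map[] = {
        { "&amp;",  '&'  },
        { "&lt;",   '<'  },
        { "&gt;",   '>'  },
        { "&apos;", '\'' },
        { "&quot;", '"'  }
    };
    const size_t n = sizeof(map) / sizeof(map[0]);

    while (*src) {
        size_t i;
        for (i = 0; i < n; i++) {
            size_t len = strlen(map[i].ref);
            if (strncmp(src, map[i].ref, len) == 0) {
                *dst++ = map[i].ch;   /* emit the character ...          */
                src += len;           /* ... and skip the whole reference */
                break;
            }
        }
        if (i == n)                   /* no predefined reference here */
            *dst++ = *src++;
    }
    *dst = '\0';
}
```

Because the output is not scanned again, the input "&amp;lt;" yields the four characters "&lt;" rather than "<".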
132 |
|
---|
133 | /*
|
---|
134 | *@@gloss: parameter_entities parameter entities
|
---|
135 | * Parameter entities can only be referenced in the @DTD.
|
---|
136 | * A parameter entity is identified by placing "% " (percent-space)
|
---|
137 | * in front of its name in the declaration. The percent sign is
|
---|
138 | * also used in references to parameter entities, instead of the
|
---|
139 | * ampersand. Parameter entity references are immediately expanded
|
---|
140 | * in the DTD and their replacement text is
|
---|
141 | * part of the declaration, whereas normal @entity_references are not
|
---|
142 | * expanded.
|
---|
143 | */
|
---|
144 |
|
---|
145 | /*
|
---|
146 | *@@gloss: external_entities external entities
|
---|
147 | * As opposed to @internal_entities, "external entities" refer
|
---|
148 | * to different storage.
|
---|
149 | *
|
---|
150 | * They must have a "system ID" with the URI specifying where
|
---|
151 | * the entity can be retrieved. Those URIs may be absolute
|
---|
152 | * or relative. Unless otherwise provided (e.g. by a special
|
---|
153 | * XML element type defined by a particular @DTD, or
|
---|
154 | * @processing_instructions defined by a particular application
|
---|
155 | * specification), relative URIs are relative to the location
|
---|
156 | * of the resource within which the entity declaration occurs.
|
---|
157 | *
|
---|
158 | * Optionally, external entities may specify a "public ID"
|
---|
159 | * as well. An XML processor attempting to retrieve the entity's
|
---|
160 | * content may use the public identifier to try to generate an
|
---|
161 | * alternative URI. If the processor is unable to do so, it must
|
---|
162 | * use the URI specified in the system literal. Before a match
|
---|
163 | * is attempted, all strings of @whitespace in the public
|
---|
164 | * identifier must be normalized to single space characters (#x20),
|
---|
165 | * and leading and trailing white space must be removed.
|
---|
166 | *
|
---|
167 | * An external entity is not always parsed.
|
---|
168 | *
|
---|
169 | * External entities allow an XML document to refer to an external
|
---|
170 | * file. External entities contain either text or binary data. If
|
---|
171 | * they contain text, the content of the external file is inserted
|
---|
172 | * at the point of reference and parsed as part of the referring
|
---|
173 | * document. Binary data is not parsed and may only be referenced
|
---|
174 | * in an attribute that has been declared as ENTITY or ENTITIES.
|
---|
175 | * Binary data is used to reference figures and
|
---|
176 | * other non-XML content in the document.
|
---|
177 | *
|
---|
178 | * Examples of external entity declarations:
|
---|
179 | +
|
---|
180 | + <!ENTITY open-hatch
|
---|
181 | + SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
|
---|
182 | + <!ENTITY open-hatch
|
---|
183 | + PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
|
---|
184 | + "http://www.textuality.com/boilerplate/OpenHatch.xml">
|
---|
185 | + <!ENTITY hatch-pic
|
---|
186 | + SYSTEM "../grafix/OpenHatch.gif" NDATA gif >
|
---|
187 | *
|
---|
188 | * Character @encoding is processed on a per-external-entity basis.
|
---|
189 | * As a result, each external parsed entity in an XML document may
|
---|
190 | * use a different encoding for its characters.
|
---|
191 | *
|
---|
192 | * In the document entity, the encoding declaration is part of the XML
|
---|
193 | * @text_declaration.
|
---|
194 | *
|
---|
195 | * Also see @entities.
|
---|
196 | */
|
---|
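The public-ID normalization rule described above (collapse whitespace runs to a single #x20, strip leading and trailing whitespace) can be sketched as follows. The function names are hypothetical and chosen for this illustration only; a real processor may normalize differently or at a different stage.

```c
/* Hypothetical helpers (assumptions for illustration only). */

/* XML whitespace: space, tab, carriage return, line feed. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Normalizes a public identifier in place: every run of whitespace
 * becomes one space character, and leading and trailing whitespace
 * is removed. */
static void normalize_public_id(char *s)
{
    const char *in = s;
    char *out = s;
    int pending_space = 0;   /* whitespace seen after some real text */

    for (; *in; in++) {
        if (is_xml_space(*in)) {
            pending_space = (out != s);   /* ignore leading whitespace */
        } else {
            if (pending_space) {
                *out++ = ' ';
                pending_space = 0;
            }
            *out++ = *in;
        }
    }
    *out = '\0';   /* a trailing run is never flushed, so it is dropped */
}
```

Working in place is safe here because the output never grows: the write position never overtakes the read position.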
197 |
|
---|
198 | /*
|
---|
199 | *@@gloss: external_parsed_entities external parsed entities
|
---|
200 | * An external parsed entity is an external entity whose content
|
---|
201 | * is parsed as XML text, i.e. one that is declared without NDATA.
|
---|
202 | *
|
---|
203 | * See @external_entities.
|
---|
204 | */
|
---|
205 |
|
---|
206 | /*
|
---|
207 | *@@gloss: markup markup
|
---|
208 | * XML "markup" encodes a description of the @document's storage
|
---|
209 | * layout and logical structure.
|
---|
210 | *
|
---|
211 | * Markup consists of @elements, @entity_references, @comments, @CDATA
|
---|
212 | * section delimiters, @DTD's, and
|
---|
213 | * @processing_instructions.
|
---|
214 | *
|
---|
215 | * XML "text" consists of markup and @content.
|
---|
216 | */
|
---|
217 |
|
---|
218 | /*
|
---|
219 | *@@gloss: whitespace whitespace
|
---|
220 | * In XML, "whitespace" consists of one or more space (0x20)
|
---|
221 | * characters, carriage returns, line feeds, or tabs.
|
---|
222 | *
|
---|
223 | * Whitespace handling in XML can vary. In @markup, this is
|
---|
224 | * used to separate the various @entities of course. However,
|
---|
225 | * in @content (i.e. non-markup), an application may
|
---|
226 | * or may not be interested in white space. Whitespace
|
---|
227 | * can therefore be treated differently for each
|
---|
228 | * element with the use of the special "xml:space" @attribute.
|
---|
229 | */
|
---|
230 |
|
---|
231 | /*
|
---|
232 | *@@gloss: character_reference character reference
|
---|
233 | * Character references escape Unicode characters. They are
|
---|
234 | * a special case of @entity_references.
|
---|
235 | *
|
---|
236 | * They may be used to refer to a specific character in the
|
---|
237 | * ISO/IEC 10646 character set, for example one not directly
|
---|
238 | * accessible from available input devices.
|
---|
239 | *
|
---|
240 | * If the character reference begins with "&#x", the
|
---|
241 | * digits and letters up to the terminating ";" provide a
|
---|
242 | * hexadecimal representation of the character's code point in
|
---|
243 | * ISO/IEC 10646. If it begins just with "&#", the
|
---|
244 | * digits up to the terminating ";" provide a decimal
|
---|
245 | * representation of the character's code point.
|
---|
246 | */
|
---|
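The two forms of character reference described above (hexadecimal after "&#x", decimal after "&#") can be decoded with a few lines of C. This is a minimal sketch: decode_char_ref is a hypothetical name, and the function only checks the syntax and the upper bound of the code space, not whether the code point is a legal XML character.

```c
/* Hypothetical helper (an assumption for illustration only):
 * decodes one character reference such as "&#60;" or "&#x3C;".
 * Returns the code point, or -1 if ref is not a syntactically
 * complete character reference. */
static long decode_char_ref(const char *ref)
{
    long cp = 0;
    int base = 10;
    int ndigits = 0;

    if (ref[0] != '&' || ref[1] != '#')
        return -1;
    ref += 2;
    if (*ref == 'x') {          /* only a lowercase "x" selects hex */
        base = 16;
        ref++;
    }

    for (; *ref != ';'; ref++, ndigits++) {
        int d;
        if (*ref >= '0' && *ref <= '9')
            d = *ref - '0';
        else if (base == 16 && *ref >= 'a' && *ref <= 'f')
            d = *ref - 'a' + 10;
        else if (base == 16 && *ref >= 'A' && *ref <= 'F')
            d = *ref - 'A' + 10;
        else
            return -1;          /* bad digit, or string ended before ';' */
        cp = cp * base + d;
        if (cp > 0x10FFFFL)     /* beyond the ISO/IEC 10646 code space */
            return -1;
    }
    return ndigits > 0 ? cp : -1;   /* "&#;" and "&#x;" are rejected */
}
```

Both "&#60;" and "&#x3C;" decode to 60, the code point of "<".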
247 |
|
---|
248 | /*
|
---|
249 | *@@gloss: content content
|
---|
250 | * XML "text" consists of @markup and "content" (the XML spec
|
---|
251 | * calls this "character data"). Content is simply everything
|
---|
252 | * that is not markup.
|
---|
253 | *
|
---|
254 | * To access characters that would otherwise either be recognized
|
---|
255 | * as @markup or are difficult to reach via the keyboard, XML
|
---|
256 | * allows for using a @character_reference.
|
---|
257 | *
|
---|
258 | * Within @elements, content is any string of
|
---|
259 | * characters which does not contain the start-delimiter of
|
---|
260 | * any markup. In a @CDATA section, content is any
|
---|
261 | * string of characters not including the CDATA-section-close
|
---|
262 | * delimiter, "]]>".
|
---|
263 | *
|
---|
264 | * The character @encodings may vary between @external_parsed_entities.
|
---|
265 | */
|
---|
266 |
|
---|
267 | /*
|
---|
268 | *@@gloss: names names
|
---|
269 | * In XML, a "name" is a token beginning with a letter or one of a
|
---|
270 | * few punctuation characters, and continuing with letters,
|
---|
271 | * digits, hyphens, underscores, colons, or full stops,
|
---|
272 | * together known as name characters. The colon has a
|
---|
273 | * special meaning with XML namespaces.
|
---|
274 | */
|
---|
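The name rule described above can be approximated with a short check. This sketch is restricted to ASCII, which is an assumption made for brevity: XML also permits many non-ASCII letters in names. The function name is hypothetical, and the start characters accepted here are letters, underscore, and colon.

```c
#include <ctype.h>

/* Hypothetical helper (an assumption for illustration only):
 * returns 1 if s is a name according to the ASCII subset of the
 * XML name rule, 0 otherwise. */
static int is_ascii_xml_name(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    /* First character: a letter, an underscore, or a colon. */
    if (!(isalpha(*p) || *p == '_' || *p == ':'))
        return 0;

    /* Remaining name characters: letters, digits, hyphens,
     * underscores, colons, or full stops. */
    for (p++; *p; p++) {
        if (!(isalnum(*p) || *p == '_' || *p == ':' ||
              *p == '-' || *p == '.'))
            return 0;
    }
    return 1;
}
```

With this check, "xml:lang" and "open-hatch" are accepted, while "9lives", "-bad", and the empty string are rejected.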
275 |
|
---|
276 | /*
|
---|
277 | *@@gloss: elements elements
|
---|
278 | * Elements are the most common form of XML @markup.
|
---|
279 | * They are identified by their @names.
|
---|
280 | *
|
---|
281 | * As opposed to HTML, there are two types of elements:
|
---|
282 | *
|
---|
283 | * A non-empty element starts and ends with a start-tag
|
---|
284 | * and an end-tag:
|
---|
285 | *
|
---|
286 | + <LI>...</LI>
|
---|
287 | *
|
---|
288 | * As opposed to HTML, an empty element must have an
|
---|
289 | * empty-element tag:
|
---|
290 | *
|
---|
291 | + <P /> <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
|
---|
292 | *
|
---|
293 | * An @attribute contains an additional parameter to an element.
|
---|
294 | * If the element has attributes, they must be in the start-tag
|
---|
295 | * (or empty-element tag).
|
---|
296 | *
|
---|
297 | * For non-empty elements, the text between the start-tag
|
---|
298 | * and end-tag is called the element's content and may
|
---|
299 | * contain other elements, character data, an entity
|
---|
300 | * reference, a @CDATA section, a processing instruction,
|
---|
301 | * or a comment.
|
---|
302 | *
|
---|
303 | * The XML specs break this into "content particles".
|
---|
304 | *
|
---|
305 | * An element has "mixed content" when it may contain
|
---|
306 | * @content, optionally interspersed with child
|
---|
307 | * elements. In this case, the types of the child
|
---|
308 | * elements may be constrained by a document's @DTD, but
|
---|
309 | * not their order or their number of occurrences.
|
---|
310 | */
|
---|
311 |
|
---|
312 | /*
|
---|
313 | *@@gloss: attribute attribute
|
---|
314 | * "Attributes" are name-value pairs that have been associated
|
---|
315 | * with @elements. Attributes can only appear in start-tags
|
---|
316 | * or empty-element tags.
|
---|
317 | *
|
---|
318 | * Attributes are identified by their @names. Each such
|
---|
319 | * identifier may only appear once per element.
|
---|
320 | *
|
---|
321 | * As opposed to HTML, attribute values must be quoted (either
|
---|
322 | * in single or double quotes). You may use a @character_reference
|
---|
323 | * to escape quotes in attribute values.
|
---|
324 | *
|
---|
325 | * Example of an attribute:
|
---|
326 | *
|
---|
327 | + <IMG SRC="mypic.gif" />
|
---|
328 | *
|
---|
329 | * SRC="mypic.gif" is the attribute here.
|
---|
330 | *
|
---|
331 | * There are a few <B>special attributes</B> defined by XML.
|
---|
332 | * In @valid documents, these attributes, like any other,
|
---|
333 | * must be declared if they are used. These attributes are
|
---|
334 | * recursive, i.e. they are considered to apply to all elements
|
---|
335 | * within the content of the element where they are specified,
|
---|
336 | * unless overridden in a sub-element.
|
---|
337 | *
|
---|
338 | * -- "xml:space" may be attached to an element to signal
|
---|
339 | * that @whitespace should be preserved for this element.
|
---|
340 | *
|
---|
341 | * The value "default" signals that applications' default
|
---|
342 | * whitespace processing modes are acceptable for this
|
---|
343 | * element; the value "preserve" indicates the intent that
|
---|
344 | * applications preserve all the white space.
|
---|
345 | *
|
---|
346 | * -- "xml:lang" may be inserted in documents to specify the
|
---|
347 | * language used in the contents and attribute values of
|
---|
348 | * any element in an XML document.
|
---|
349 | *
|
---|
350 | * The value is either a two-letter language code (e.g. "en")
|
---|
351 | * or a combination of language and country code. Interestingly,
|
---|
352 | * the English W3C XML spec gives the following examples:
|
---|
353 | *
|
---|
354 | + <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
|
---|
355 | + <p xml:lang="en-GB">What colour is it?</p>
|
---|
356 | + <p xml:lang="en-US">What color is it?</p>
|
---|
357 | + <sp who="Faust" desc='leise' xml:lang="de">
|
---|
358 | + <l>Habe nun, ach! Philosophie,</l>
|
---|
359 | + <l>Juristerei, und Medizin</l>
|
---|
360 | + <l>und leider auch Theologie</l>
|
---|
361 | + <l>durchaus studiert mit heißem Bemüh'n.</l>
|
---|
362 | + </sp>
|
---|
363 | */
|
---|
364 |
|
---|
365 | /*
|
---|
366 | *@@gloss: comments comments
|
---|
367 | * Comments may appear anywhere in a document outside other
|
---|
368 | * markup; in addition, they may appear within the @DTD at
|
---|
369 | * places allowed by the grammar. They are not part of the
|
---|
370 | * document's @content; an XML processor may, but
|
---|
371 | * need not, make it possible for an application to retrieve
|
---|
372 | * the text of comments (expat has a handler for this).
|
---|
373 | *
|
---|
374 | * Comments may contain any text except "--" (double-hyphen).
|
---|
375 | *
|
---|
376 | * Example of a comment:
|
---|
377 | *
|
---|
378 | + <!-- declarations for <head> & <body> -->
|
---|
379 | */
|
---|
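The restriction on comment text can be checked mechanically. The sketch below uses a hypothetical function name. Besides the "--" rule stated above, it also rejects text that ends in a hyphen, because the XML grammar does not allow a comment to close with "--->".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration only):
 * returns 1 if text may be placed between "<!--" and "-->",
 * 0 otherwise. */
static int is_valid_comment_text(const char *text)
{
    size_t len = strlen(text);

    if (strstr(text, "--") != NULL)          /* double hyphen is forbidden */
        return 0;
    if (len > 0 && text[len - 1] == '-')     /* would produce "--->" */
        return 0;
    return 1;
}
```

Note that markup characters such as "<" and "&" are fine inside a comment, as in the example above; only the hyphen rule applies.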
380 |
|
---|
381 | /*
|
---|
382 | *@@gloss: CDATA CDATA
|
---|
383 | * CDATA sections can appear anywhere where @content
|
---|
384 | * is allowed. They are used to escape blocks of
|
---|
385 | * text containing characters which would otherwise be
|
---|
386 | * recognized as @markup.
|
---|
387 | *
|
---|
388 | * CDATA sections begin with the string <![CDATA[ and end
|
---|
389 | * with the string ]]>. Within a CDATA section, only the
|
---|
390 | * ]]> string is recognized as @markup, so that left angle
|
---|
391 | * brackets and ampersands may occur in their literal form.
|
---|
392 | * They need not (and cannot) be escaped using "&lt;" and
|
---|
393 | * "&amp;". (This implies that not even @comments are
|
---|
394 | * recognized).
|
---|
395 | *
|
---|
396 | * CDATA sections cannot nest.
|
---|
397 | *
|
---|
398 | * Examples:
|
---|
399 | *
|
---|
400 | + <![CDATA[<greeting>Hello, world!</greeting>]]>
|
---|
401 | +
|
---|
402 | + <![CDATA[
|
---|
403 | + *p = &q;
|
---|
404 | + b = (i <= 3);
|
---|
405 | + ]]>
|
---|
406 | */
|
---|
407 |
|
---|
408 | /*
|
---|
409 | *@@gloss: processing_instructions processing instructions
|
---|
410 | * "Processing instructions" (PIs) contain additional
|
---|
411 | * data for applications.
|
---|
412 | *
|
---|
413 | * Like @comments, they are not textually part of the XML
|
---|
414 | * document, but the XML processor is required to pass
|
---|
415 | * them to an application.
|
---|
416 | *
|
---|
417 | * PIs have the form:
|
---|
418 | *
|
---|
419 | + <?name pidata?>
|
---|
420 | *
|
---|
421 | *
|
---|
422 | * The "name", called the PI "target", identifies the PI to
|
---|
423 | * the application. Applications should process only the
|
---|
424 | * targets they recognize and ignore all other PIs. Any
|
---|
425 | * data that follows the PI target is optional; it is for
|
---|
426 | * the application that recognizes the target. The names
|
---|
427 | * used in PIs may be declared in a @notation_declaration in order to
|
---|
428 | * formally identify them.
|
---|
429 | *
|
---|
430 | * PI names beginning with "xml" are reserved.
|
---|
431 | */
|
---|
432 |
|
---|
433 | /*
|
---|
434 | *@@gloss: well-formed well-formed
|
---|
435 | * XML @documents (the sum of all @entities) are "well-formed"
|
---|
436 | * if the following conditions are met (among others):
|
---|
437 | *
|
---|
438 | * -- They contain one or more @elements.
|
---|
439 | *
|
---|
440 | * -- There is exactly one element, called the root, or document
|
---|
441 | * element, no part of which appears in the @content of any
|
---|
442 | * other element.
|
---|
443 | *
|
---|
444 | * -- For all other elements, if the start-tag is in the content
|
---|
445 | * of another element, the end-tag is in the content of the
|
---|
446 | * same element. More simply stated, the elements nest
|
---|
447 | * properly within each other. (This is unlike HTML.)
|
---|
448 | *
|
---|
449 | * -- Values of string @attributes cannot contain references to
|
---|
450 | * @external_entities.
|
---|
451 | *
|
---|
452 | * -- No attribute may appear more than once in the same element.
|
---|
453 | *
|
---|
454 | * -- All entities except the amp, lt, gt, apos, and quot must be
|
---|
455 | * declared before they are used. Binary @external_entities
|
---|
456 | * cannot be referenced in the flow of @content; they can only
|
---|
457 | * be used in an attribute declared as ENTITY or ENTITIES.
|
---|
458 | *
|
---|
459 | * -- Neither text nor @parameter_entities are allowed to be
|
---|
460 | * recursive, directly or indirectly.
|
---|
461 | */
|
---|
462 |
|
---|
463 | /*
|
---|
464 | *@@gloss: valid valid
|
---|
465 | * XML @documents are said to be "valid" if they have a @DTD
|
---|
466 | * associated and they conform to it.
|
---|
467 | *
|
---|
468 | * Validating processors must report violations of the constraints
|
---|
469 | * expressed by the declarations in the @DTD, and failures to
|
---|
470 | * fulfill the validity constraints given in the XML specification.
|
---|
471 | * To accomplish this, validating XML processors must read and
|
---|
472 | * process the entire DTD and all @external_parsed_entities
|
---|
473 | * referenced in the document.
|
---|
474 | *
|
---|
475 | * Non-validating processors are required to check only the
|
---|
476 | * document entity (see @entities), including the entire
|
---|
477 | * internal DTD subset, for whether it is @well-formed. While
|
---|
478 | * they are not required to check the document for validity,
|
---|
479 | * they are required to process all the declarations they
|
---|
480 | * read in the internal DTD subset and in any parameter entity
|
---|
481 | * that they read, up to the first reference to a parameter
|
---|
482 | * entity that they do not read; that is to say, they must
|
---|
483 | * use the information in those declarations to normalize
|
---|
484 | * @attribute values, include the replacement text of
|
---|
485 | * @internal_entities, and supply default attribute values.
|
---|
486 | * They must not process entity declarations or attribute-list
|
---|
487 | * declarations encountered after a reference to a
|
---|
488 | * parameter entity that is not read, since the entity may have
|
---|
489 | * contained overriding declarations.
|
---|
490 | */
|
---|
491 |
|
---|
492 | /*
|
---|
493 | *@@gloss: encodings encodings
|
---|
494 | * In an encoding declaration, the values "UTF-8", "UTF-16",
|
---|
495 | * "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used
|
---|
496 | * for the various encodings and transformations of Unicode /
|
---|
497 | * ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ...
|
---|
498 | * "ISO-8859-9" should be used for the parts of ISO 8859, and
|
---|
499 | * the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should
|
---|
500 | * be used for the various encoded forms of JIS X-0208-1997.
|
---|
501 | *
|
---|
502 | * All XML processors must be able to read @entities in either
|
---|
503 | * UTF-8 or UTF-16.
|
---|
504 | *
|
---|
505 | * Entities encoded in UTF-16 must begin with a Byte Order Mark (the ZERO WIDTH
|
---|
506 | * NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part
|
---|
507 | * of either the @markup or the @content of the XML @document.
|
---|
508 | * XML processors must be able to use this character to differentiate
|
---|
509 | * between UTF-8 and UTF-16 encoded documents.
|
---|
510 | *
|
---|
511 | * See XML_ParserCreate for the encodings directly supported
|
---|
512 | * by expat.
|
---|
513 | */
|
---|
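The use of #xFEFF as an encoding signature can be sketched as a check on the first two bytes of an entity. The type and function names below are hypothetical. The sketch only separates UTF-16 from everything else: it does not look at an encoding declaration, and it does not try to recognize four-byte encodings that happen to start with the same two bytes.

```c
#include <stddef.h>

/* Hypothetical result type (an assumption for illustration only). */
typedef enum {
    ENC_NO_UTF16_BOM,   /* no UTF-16 signature: UTF-8 or something else */
    ENC_UTF16_BE,       /* bytes FE FF: big-endian UTF-16    */
    ENC_UTF16_LE        /* bytes FF FE: little-endian UTF-16 */
} bom_guess;

/* Looks at the first two bytes of an entity and reports whether they
 * are the #xFEFF signature in either byte order. */
static bom_guess guess_encoding_from_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return ENC_UTF16_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return ENC_UTF16_LE;
    }
    return ENC_NO_UTF16_BOM;
}
```

A caller would typically skip the two signature bytes before handing the rest of the entity to the parser, since the signature is not part of the markup or the content.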
514 |
|
---|
515 | /*
|
---|
516 | *@@gloss: text_declaration text declaration
|
---|
517 | * XML @documents and @external_parsed_entities may (and
|
---|
518 | * should) start with the XML text declaration, exactly like
|
---|
519 | * this:
|
---|
520 | *
|
---|
521 | + <?xml version="1.0" encoding="enc"?>
|
---|
522 | *
|
---|
523 | * where "1.0" is the only currently defined XML version
|
---|
524 | * and "enc" must be the encoding of the document.
|
---|
525 | *
|
---|
526 | * External parsed entities may begin with a text declaration,
|
---|
527 | * which looks like an XML declaration with just an encoding
|
---|
528 | * declaration:
|
---|
529 | *
|
---|
530 | + <?xml encoding="Big5"?>
|
---|
531 | *
|
---|
532 | * See @encodings.
|
---|
533 | *
|
---|
534 | * Example:
|
---|
535 | *
|
---|
536 | + <?xml version="1.0" encoding="ISO-8859-1"?>
|
---|
537 | */
|
---|
538 |
|
---|
539 | /*
|
---|
540 | *@@gloss: documents documents
|
---|
541 | * XML documents are made up of storage units called @entities,
|
---|
542 | * which contain either parsed or unparsed data. Parsed data is
|
---|
543 | * made up of characters, some of which form @content,
|
---|
544 | * and some of which form @markup.
|
---|
545 | *
|
---|
546 | * XML documents should start with the XML @text_declaration.
|
---|
547 | *
|
---|
548 | * The function of the @markup in an XML document is to describe
|
---|
549 | * its storage and logical structure and to associate attribute-value
|
---|
550 | * pairs with its logical structures. XML provides a mechanism,
|
---|
551 | * the document type declaration (@DTD), to define constraints
|
---|
552 | * on the logical structure and to support the use of predefined
|
---|
553 | * storage units.
|
---|
554 | *
|
---|
555 | * A data object is an XML document if it is @well-formed.
|
---|
556 | * A well-formed XML document may in addition be @valid if it
|
---|
557 | * meets certain further constraints.
|
---|
558 | *
|
---|
559 | * A very simple XML document looks like this:
|
---|
560 | *
|
---|
561 | + <?xml version="1.0"?>
|
---|
562 | + <oldjoke>
|
---|
563 | + <burns>Say <quote>goodnight</quote>, Gracie.</burns>
|
---|
564 | + <allen><quote>Goodnight, Gracie.</quote></allen>
|
---|
565 | + <applause/>
|
---|
566 | + </oldjoke>
|
---|
567 | *
|
---|
568 | * This document is @well-formed, but not @valid (because it
|
---|
569 | * has no @DTD).
|
---|
570 | *
|
---|
571 | */
|
---|
572 |
|
---|
573 | /*
|
---|
574 | *@@gloss: element_declaration element declaration
|
---|
575 | * Element declarations identify the @names of elements and the
|
---|
576 | * nature of their content. They look like this:
|
---|
577 | +
|
---|
578 | + <!ELEMENT name contentmodel>
|
---|
579 | +
|
---|
580 | * The "name" of the element is obvious. The "contentmodel"
|
---|
581 | * is not. This specifies what may appear in the element
|
---|
582 | * and can be a list of:
|
---|
583 | *
|
---|
584 | * -- "#PCDATA", meaning "parsed character data" -- in
|
---|
585 | * other words, @content.
|
---|
586 | *
|
---|
587 | * -- Another element name with a specification about
|
---|
588 | * whether the element may or must appear once or
|
---|
589 | * more than once.
|
---|
590 | *
|
---|
591 | * -- "EMPTY" marks the element as being empty (i.e. no
|
---|
592 | * start- and end-tags, but a single tag only).
|
---|
593 | *
|
---|
594 | * The element specifier can be:
|
---|
595 | *
|
---|
596 | * -- None: the subelement _must_ appear exactly once.
|
---|
597 | *
|
---|
598 | * -- "+": the subelement _must_ appear at _least_ once.
|
---|
599 | *
|
---|
600 | * -- "?": the subelement _may_ appear exactly once.
|
---|
601 | *
|
---|
602 | * -- "*": the subelement _may_ appear once or more than
|
---|
603 | * once or not at all. Note that this must always be
|
---|
604 | * specified when "#PCDATA" is mixed with element names.
|
---|
605 | *
|
---|
606 | * The list items can be separated with:
|
---|
607 | *
|
---|
608 | * -- Commas (",") indicate that the elements must appear
|
---|
609 | * in the same order.
|
---|
610 | *
|
---|
611 | * -- Vertical bars ("|") specify that the elements may
|
---|
612 | * occur alternatively.
|
---|
613 | *
|
---|
614 | * Examples:
|
---|
615 | +
|
---|
616 | + <!ELEMENT oldjoke (burns+, allen, applause?)>
|
---|
617 | + <!ELEMENT burns (#PCDATA | quote)*>
|
---|
618 | + <!ELEMENT allen (#PCDATA | quote)*>
|
---|
619 | + <!ELEMENT quote (#PCDATA)*>
|
---|
620 | + <!ELEMENT applause EMPTY>
|
---|
621 | *
|
---|
622 | * This defines that the element "oldjoke" must contain
|
---|
623 | * "burns" and "allen" and may contain "applause".
|
---|
624 | * Only "burns" may appear more than once.
|
---|
625 | */
|
---|
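The meaning of the occurrence indicators listed above can be written down as a small predicate. The function name is hypothetical, and a NUL character is used here to stand for "no indicator"; that encoding is an assumption of this sketch, not something the DTD syntax prescribes.

```c
/* Hypothetical helper (an assumption for illustration only):
 * returns 1 if a subelement that occurs `count` times satisfies the
 * given occurrence indicator, 0 otherwise.
 * indicator is '\0' (none), '+', '?' or '*'. */
static int occurrence_ok(char indicator, unsigned count)
{
    switch (indicator) {
    case '\0': return count == 1;   /* none: exactly once     */
    case '+':  return count >= 1;   /* at least once          */
    case '?':  return count <= 1;   /* at most once           */
    case '*':  return 1;            /* any number, or none    */
    default:   return 0;            /* not a known indicator  */
    }
}
```

For the "oldjoke" declaration above, two "burns" elements satisfy '+', a second "allen" would violate the bare name, and a missing "applause" satisfies '?'.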
626 |
|
---|
627 | /*
|
---|
628 | *@@gloss: attribute_declaration attribute declaration
|
---|
629 | * Attribute declarations identify the @names of attributes
|
---|
630 | * of @elements and their possible values. They look like this:
|
---|
631 | *
|
---|
632 | + <!ATTLIST elementname
|
---|
633 | + attname atttype defaultvalue
|
---|
634 | + attname atttype defaultvalue
|
---|
635 | + ... >
|
---|
636 | *
|
---|
637 | * "elementname" is the element name for which the
|
---|
638 | * attributes are being defined.
|
---|
639 | *
|
---|
640 | * For each attribute, you must then specify three
|
---|
641 | * columns:
|
---|
642 | *
|
---|
643 | * -- "attname" is the attribute name.
|
---|
644 | *
|
---|
645 | * -- "atttype" is the attribute type (one of six values,
|
---|
646 | * see below).
|
---|
647 | *
|
---|
648 | * -- "defaultvalue" specifies the default value.
|
---|
649 | *
|
---|
650 | * The attribute type (specifying the value type) must be
|
---|
651 | * one of six:
|
---|
652 | *
|
---|
653 | * -- "CDATA" is any character data. (This has nothing to
|
---|
654 | * do with @CDATA sections.)
|
---|
655 | *
|
---|
656 | * -- "ID": the value must be a name (see @names) that is unique within the
|
---|
657 | * document. Only one such attribute is allowed per
|
---|
658 | * element.
|
---|
659 | *
|
---|
660 | * -- "IDREF" or "IDREFS": a reference to some other
|
---|
661 | * element which has an "ID" attribute with this value.
|
---|
662 | * "IDREFS" is the plural and may contain several of
|
---|
663 | * those separated by @whitespace.
|
---|
664 | *
|
---|
665 | * -- "ENTITY" or "ENTITIES": a reference to an
|
---|
666 | * external entity (see @external_entities).
|
---|
667 | * "ENTITIES" is the plural and may contain several of
|
---|
668 | * those separated by @whitespace.
|
---|
669 | *
|
---|
670 | * -- "NMTOKEN" or "NMTOKENS": a single-word string.
|
---|
671 | * This is not a reference though.
|
---|
672 | * "NMTOKENS" is the plural and may contain several of
|
---|
673 | * those separated by @whitespace.
|
---|
674 | *
|
---|
675 | * -- an enumeration: an explicit list of allowed
|
---|
676 | * values for this attribute. Additionally, you can specify
|
---|
677 | * that the names must match a particular @notation_declaration.
|
---|
678 | *
|
---|
679 | * The "defaultvalue" (third column) can be one of these:
|
---|
680 | *
|
---|
681 | * -- "#REQUIRED": the attribute may not be omitted.
|
---|
682 | *
|
---|
683 | * -- "#IMPLIED": the attribute is optional, and there's
|
---|
684 | * no default value.
|
---|
685 | *
|
---|
686 | * -- "'value'": the attribute is optional, and it has
|
---|
687 | * this default.
|
---|
688 | *
|
---|
689 | * -- "#FIXED 'value'": the attribute is optional, but if
|
---|
690 | * it appears, it must have this value.
|
---|
691 | *
|
---|
692 | * Example:
|
---|
693 | *
|
---|
694 | + <!ATTLIST oldjoke
|
---|
695 | + name ID #REQUIRED
|
---|
696 | + label CDATA #IMPLIED
|
---|
697 | + status ( funny | notfunny ) 'funny'>
|
---|
698 | */
|
---|
699 |
|
---|
700 | /*
|
---|
701 | *@@gloss: entity_declaration entity declaration
|
---|
702 | * Entity declarations define @entities.
|
---|
703 | *
|
---|
704 | * An example of @internal_entities:
|
---|
705 | *
|
---|
706 | + <!ENTITY ATI "ArborText, Inc.">
|
---|
707 | *
|
---|
708 | * Examples of @external_entities:
|
---|
709 | *
|
---|
710 | + <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml">
|
---|
711 | + <!ENTITY ATIlogo SYSTEM "/standard/logo.gif" NDATA GIF87A>
|
---|
712 | */
|
---|
713 |
|
---|
714 | /*
|
---|
715 | *@@gloss: notation_declaration notation declaration
|
---|
716 | * Notation declarations identify specific types of external
|
---|
717 | * binary data. This information is passed to the processing
|
---|
718 | * application, which may make whatever use of it it wishes.
|
---|
719 | *
|
---|
720 | * Example:
|
---|
721 | *
|
---|
722 | + <!NOTATION GIF87A SYSTEM "GIF">
|
---|
723 | */
|
---|
724 |
|
---|
725 | /*
|
---|
726 | *@@gloss: DTD DTD
|
---|
727 | * The XML document type declaration contains or points to
|
---|
728 | * markup declarations that provide a grammar for a class of @documents.
|
---|
729 | * This grammar is known as a Document Type Definition, or DTD.
|
---|
730 | *
|
---|
731 | * The DTD must look like the following:
|
---|
732 | *
|
---|
733 | + <!DOCTYPE name ... >
|
---|
734 | *
|
---|
735 | * "name" must match the document's root element.
|
---|
736 | *
|
---|
737 | * "..." can be the reference to an external subset (being a special
|
---|
738 | * case of @external_entities):
|
---|
739 | *
|
---|
740 | + <!DOCTYPE name SYSTEM "whatever.dtd">
|
---|
741 | *
|
---|
742 | * or an internal subset in brackets, which contains the markup
|
---|
743 | * directly:
|
---|
744 | *
|
---|
745 | + <!DOCTYPE name [
|
---|
746 | + <!ELEMENT greeting (#PCDATA)>
|
---|
747 | + ]>
|
---|
748 | *
|
---|
749 | * You can even mix both.
|
---|
750 | *
|
---|
751 | * A markup declaration is either an @element_declaration, an
|
---|
752 | * @attribute_declaration, an @entity_declaration,
|
---|
753 | * or a @notation_declaration. These declarations may be contained
|
---|
754 | * in whole or in part within @parameter_entities.
|
---|
755 | */
|
---|