Changeset 38 for trunk/src/helpers/xmldefs.c
- Timestamp: Feb 17, 2001, 3:03:14 PM
- File: 1 edited
trunk/src/helpers/xmldefs.c
 r36  r38
  22   22
  23   23  /*
  24        *@@category: Helpers\XML
  25        * see xml.c.
       24   *@@gloss: expat expat
       25   * Expat is one of the most well-known XML processors (parsers).
       26   * I (umoeller) have ported expat to the XWorkplace Helpers
       27   * library. See xmlparse.c for an introduction to expat. See
       28   * xml.c for an introduction to XML support in the XWorkplace
       29   * Helpers in general.
       30   */
       31
       32
       33  /*
       34   *@@gloss: XML XML
       35   * XML is the Extensible Markup Language, as defined by
       36   * the W3C. XML isn't really a language, but a meta-language
       37   * for describing markup languages. It is a simplified subset
       38   * of SGML.
       39   *
       40   * You should be familiar with the following:
       41   *
       42   * -- XML parsers operate on XML @documents.
       43   *
       44   * -- Each XML document has both a physical and a logical
       45   * structure.
       46   *
       47   * Physically, the document is composed of units called
       48   * @entities.
       49   *
       50   * Logically, the document is composed of @markup and
       51   * @content. Among other things, markup separates the content
       52   * into @elements.
       53   *
       54   * -- The logical and physical structures must nest properly (be
       55   * @well-formed) for each entity, which results in the entire
       56   * XML document being well-formed as well.
  26   57   */
  27   58
  28   59  /*
  29   60   *@@gloss: entities entities
  30        * An "entity" is an XML storage unit. In the simplest case, an
  31        * XML document has only one entity, which is an XML file.
  32        * Except for the document entity (which is nameless), all
  33        * entities are identified by their names.
  34        *
  35        * Entities are marked as either parsed or unparsed.
  36        *
       61   * An "entity" is an XML storage unit. It's a very abstract
       62   * concept, and the term doesn't make much sense, but it was
       63   * in SGML already, and XML chose to inherit it.
       64   *
       65   * In the simplest case, an XML document has only one entity,
       66   * which is an XML file (or memory buffer from wherever).
  37   67   * The document entity serves as the root of the entity tree
  38   68   * and a starting-point for an XML processor. Unlike other
   …    …
  40   70   * appear on a processor input stream without any identification
  41   71   * at all.
       72   *
       73   * Entities are defined to be either parsed or unparsed.
  42   74   *
  43   75   * Other than that, there are @internal_entities,
   …    …
 119  151   * They must be escaped unless used in a @CDATA section.
 120  152   *
 121        * -- To allow values in an @attribute to contain both single and double
      153   * -- To allow values in @attributes to contain both single and double
 122  154   * quotes, the apostrophe or single-quote character (') may be
 123  155   * represented as "&apos;", and the double-quote character
 124  156   * (") as "&quot;".
 125  157   *
 126        * In addition, a @character_reference is a special case of an entity reference.
      158   * A numeric @character_reference is a special case of an entity reference.
 127  159   *
 128  160   * An internal entity is always parsed.
   …    …
 210  242   *
 211  243   * Markup is either @elements, @entity_references, @comments, @CDATA
 212        * section delimiters, @DTD's, and
 213        * @processing_instructions.
      244   * section delimiters, @DTD's, or @processing_instructions.
 214  245   *
 215  246   * XML "text" consists of markup and @content.
   …    …
 226  257   * or may not be interested in white space. Whitespace
 227  258   * handling can therefore be handled differently for each
 228        * element with the use of the special "xml:space" @attribute.
      259   * element with the use of the special "xml:space" @attributes.
 229  260   */
 230  261
   …    …
 291  322   + <P /> <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
 292  323   *
 293        * An @attribute contains additional an parameter to an element.
      324   * In addition, @attributes contain extra parameters to elements.
 294  325   * If the element has attributes, they must be in the start-tag
 295  326   * (or empty-element tag).
   …    …
 311  342
 312  343  /*
 313        *@@gloss: attribute attribute
      344   *@@gloss: attributes attributes
 314  345   * "Attributes" are name-value pairs that have been associated
 315  346   * with @elements. Attributes can only appear in start-tags
   …    …
 370  401   * document's @content; an XML processor may, but
 371  402   * need not, make it possible for an application to retrieve
 372        * the text of comments (expat has a handler for this).
      403   * the text of comments (@expat has a handler for this).
 373  404   *
 374  405   * Comments may contain any text except "--" (double-hyphen).
   …    …
 464  495   *@@gloss: valid valid
 465  496   * XML @documents are said to be "valid" if they have a @DTD
 466        * associated and they conform to it.
      497   * associated and they conform to it. While XML documents
      498   * must always be @well-formed, validation and validity are up
      499   * to the implementation (i.e. at the option of the application).
 467  500   *
 468  501   * Validating processors must report violations of the constraints
   …    …
 473  506   * referenced in the document.
 474  507   *
 475        * Non-validating processors are required to check only the
 476        * document entity (see @entities), including the entire
 477        * internal DTD subset, for whether it is @well-formed. While
 478        * they are not required to check the document for validity,
      508   * Non-validating processors (such as @expat) are required to
      509   * check only the document entity (see @entities), including the
      510   * entire internal DTD subset, for whether it is @well-formed.
      511   *
      512   * While they are not required to check the document for validity,
 479  513   * they are required to process all the declarations they
 480  514   * read in the internal DTD subset and in any parameter entity
   …    …
 482  516   * entity that they do not read; that is to say, they must
 483  517   * use the information in those declarations to normalize
 484        * @attribute values, include the replacement text of
      518   * values of @attributes, include the replacement text of
 485  519   * @internal_entities, and supply default attribute values.
 486  520   * They must not process entity declarations or attribute-list
   …    …
 492  526  /*
 493  527   *@@gloss: encodings encodings
 494        * In an encoding declaration, the values "UTF-8", "UTF-16",
 495        * "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used
 496        * for the various encodings and transformations of Unicode /
 497        * ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ...
 498        * "ISO-8859-9" should be used for the parts of ISO 8859, and
 499        * the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should
 500        * be used for the various encoded forms of JIS X-0208-1997.
      528   * XML supports a wide variety of character encodings. These
      529   * must be specified in the XML @text_declaration.
      530   *
      531   * There are too many character encodings on the planet to
      532   * be listed here. The most common ones are:
      533   *
      534   * -- "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4"
      535   * should be used for the various encodings and transformations
      536   * of Unicode / ISO/IEC 10646.
      537   *
      538   * -- "ISO-8859-x" (with "x" being a number from 1 to 9) represent
      539   * the various ISO 8859 ("Latin") encodings.
      540   *
      541   * -- "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for
      542   * the various encoded forms of JIS X-0208-1997.
      543   *
      544   * Example of a @text_declaration:
      545   *
      546   + <?xml version="1.0" encoding="ISO-8859-2"?>
 501  547   *
 502  548   * All XML processors must be able to read @entities in either
 503        * UTF-8 or UTF-16.
      549   * UTF-8 or UTF-16. See XML_SetUnknownEncodingHandler for additional
      550   * encodings directly supported by @expat.
 504  551   *
 505  552   * Entities encoded in UTF-16 must begin with the ZERO WIDTH NO-BREAK
   …    …
 508  555   * XML processors must be able to use this character to differentiate
 509  556   * between UTF-8 and UTF-16 encoded documents.
 510        *
 511        * See XML_ParserCreate for the encodings directly supported
 512        * by expat.
 513  557   */
 514  558
   …    …
 576  620   * nature of their content. They look like this:
 577  621   +
 578        + <!ELEMENT name content model>
      622   + <!ELEMENT name contentspec>
 579  623   +
 580        * The "name" of the element is obvious. The "contentmodel"
      624   * No element may be declared more than once.
      625   *
      626   * The "name" of the element is obvious. The "contentspec"
 581  627   * is not. This specifies what may appear in the element
 582        * and can be a list of:
 583        *
 584        * -- "#PCDATA", meaning "parsed character data" -- in
 585        * other words, @content.
 586        *
 587        * -- Another element name with a specification about
 588        * whether the element may or must appear once or
 589        * more than once.
 590        *
 591        * -- "EMPTY" marks the element as being empty (i.e. no
 592        * start- and end-tags, but a single tag only).
 593        *
 594        * The element specifyer can be:
 595        *
 596        * -- None: the subelement _must_ appear exactly once.
 597        *
 598        * -- "+": the subelement _must_ appear at _least_ once.
 599        *
 600        * -- "?": the subelement _may_ appear exactly once.
 601        *
 602        * -- "*": the subelement _may_ appear once or more than
 603        * once or not at all. Note that this must always be
 604        * specified with "#PCDATA".
 605        *
 606        * The list items can be separated with:
      628   * and can be one of the following:
      629   *
      630   * -- "EMPTY" marks the element as being empty (i.e.
      631   * having no content at all).
      632   *
      633   * -- "ANY" does not impose any restrictions.
      634   *
      635   * -- (mixed): a "list" which declares the element to have
      636   * mixed content. See below.
      637   *
      638   * -- (children): a "list" which declares the element to
      639   * have child elements only, but no content. See below.
      640   *
      641   * <B>(mixed): content with elements</B>
      642   *
      643   * With the (mixed) contentspec, an element may either contain
      644   * @content only or @content with subelements.
      645   *
      646   * While the (children) contentspec allows you to define sequences
      647   * and orders, this is not possible with (mixed).
      648   *
      649   * "contentspec" must then be a pair of parentheses, optionally
      650   * followed by "*". Inside the parentheses, there must be at least the
      651   * keyword "#PCDATA", optionally followed by "|" and element
      652   * names. Note that if no #PCDATA appears, the (children) model
      653   * is assumed (see below).
      654   *
      655   * Examples:
      656   *
      657   + <!ELEMENT name (#PCDATA)* >
      658   + <!ELEMENT name (#PCDATA | subname1 | subname2)* >
      659   + <!ELEMENT name (#PCDATA) >
      660   *
      661   * Note that if you specify sub-element names, you must terminate
      662   * the contentspec with "*". Again, there's no way to specify
      663   * orders etc. with (mixed).
      664   *
      665   * <B>(children): Element content only</B>
      666   *
      667   * With the (children) contentspec, an element may contain
      668   * only other elements (and @whitespace), but no other @content.
      669   *
      670   * This can become fairly complicated. "contentspec" then must be
      671   * a "list" followed by a "repeater".
      672   *
      673   * A "repeater" can be:
      674   *
      675   * -- Nothing: the preceding item _must_ appear exactly once.
      676   *
      677   * -- "+": the preceding item _must_ appear at _least_ once.
      678   *
      679   * -- "?": the preceding item _may_ appear exactly once.
      680   *
      681   * -- "*": the preceding item _may_ appear once or more than
      682   * once or not at all.
      683   *
      684   * Here's the simplest example (assuming that "SUBELEMENT"
      685   * is a valid "list" here):
      686   *
      687   + <!ELEMENT name (SUBELEMENT)* >
      688   *
      689   * In other words, in (children) mode, "contentspec" must always
      690   * be in parentheses and is followed by a "repeater" (which can be
      691   * nothing).
      692   *
      693   * About "lists"... since these declarations may nest, this is
      694   * where the recursive definition of a "content particle" comes
      695   * in:
      696   *
      697   * -- A "content particle" is either a sub-element name or
      698   * a nested list, followed by a "repeater".
      699   *
      700   * -- A "list" is defined as an enumeration of content particles,
      701   * enclosed in parentheses, where the content particles are
      702   * separated by list separators.
      703   *
      704   * There are two types of list separators:
 607  705   *
 608  706   * -- Commas (",") indicate that the elements must appear
 609        * in the same order.
      707   * in the specified order ("sequence").
 610  708   *
 611  709   * -- Vertical bars ("|") specify that the elements may
 612        * occur alternatively.
 613        *
 614        * Examples:
 615        +
 616        + <!ELEMENT oldjoke (burns+, allen, applause?)>
 617        + <!ELEMENT burns (#PCDATA | quote)*>
 618        + <!ELEMENT allen (#PCDATA | quote)*>
 619        + <!ELEMENT quote (#PCDATA)*>
 620        + <!ELEMENT applause EMPTY>
 621        *
 622        * This defines that the element "oldjoke" must contain
 623        * "burns" and "allen" and may contain "applause".
 624        * Only "burns" may appear more than once.
      710   * occur alternatively ("choice").
      711   *
      712   * The list separators cannot be mixed; the list must be
      713   * either completely "sequence" or "choice".
      714   *
      715   * Examples of content particles:
      716   *
      717   + SUBELEMENT+
      718   + list*
      719   *
      720   * Examples of lists:
      721   *
      722   + ( cp | cp | cp | cp )
      723   + ( cp , cp , cp , cp )
      724   *
      725   * Full examples for (children):
      726   *
      727   + <!ELEMENT oldjoke ( burns+, allen, applause? ) >
      728   +                   | |       +cp-+          | |
      729   +                   | |                      | |
      730   +                   | +------- list ---------+ |
      731   +                   +-------contentspec--------+
      732   *
      733   * This specifies a "seqlist" for the "oldjoke" element. The
      734   * list is not nested, so the content particles are element
      735   * names only.
      736   *
      737   * Within "oldjoke", "burns" must appear first and can appear
      738   * once or several times.
      739   *
      740   * Next must be "allen", exactly once (since there's no repeater).
      741   *
      742   * Optionally ("?"), there can be "applause" at the end.
      743   *
      744   * Now, a nested example:
      745   *
      746   + <!ELEMENT WARPIN (REXX*, VARPROMPT*, MSG?, TITLE?, (GROUP | PCK)+, PAGE+) >
      747   *
 625  748   */
 626  749
   …    …
 754  877   * in whole or in part within @parameter_entities.
 755  878   */
      879
      880  /*
      881   *@@gloss: DOM DOM
      882   * DOM is the "Document Object Model", as defined by the W3C.
      883   *
      884   * The DOM is a programming interface for @XML @documents.
      885   * (XML is a metalanguage and describes the documents
      886   * themselves. DOM is a programming interface -- an API --
      887   * to access XML documents.)
      888   *
      889   * The W3C calls this "a platform- and language-neutral
      890   * interface that allows programs and scripts to dynamically
      891   * access and update the content, structure and style of
      892   * documents. The Document Object Model provides
      893   * a standard set of objects for representing HTML and XML
      894   * documents, a standard model of how these objects can
      895   * be combined, and a standard interface for accessing and
      896   * manipulating them. Vendors can support the DOM as an
      897   * interface to their proprietary data structures and APIs,
      898   * and content authors can write to the standard DOM
      899   * interfaces rather than product-specific APIs, thus
      900   * increasing interoperability on the Web."
      901   *
      902   * In short, DOM specifies that an XML document is broken
      903   * up into a tree of "nodes", representing the various parts
      904   * of an XML document. Such nodes represent @documents,
      905   * @elements, @attributes, @processing_instructions,
      906   * @comments, @content, and more.
      907   *
      908   * See xml.c for an introduction to XML and DOM support in
      909   * the XWorkplace helpers.
      910   *
      911   * Example: Take this HTML table definition:
      912   +
      913   + <TABLE>
      914   +     <TBODY>
      915   +         <TR>
      916   +             <TD>Column 1-1</TD>
      917   +             <TD>Column 1-2</TD>
      918   +         </TR>
      919   +         <TR>
      920   +             <TD>Column 2-1</TD>
      921   +             <TD>Column 2-2</TD>
      922   +         </TR>
      923   +     </TBODY>
      924   + </TABLE>
      925   *
      926   * In the DOM, this would be represented by a tree as follows:
      927   +
      928   +                   ┌────────────┐
      929   +                   │   TABLE    │    (only ELEMENT node in root DOCUMENT node)
      930   +                   └─────┬──────┘
      931   +                         │
      932   +                   ┌─────┴──────┐
      933   +                   │   TBODY    │    (only ELEMENT node in root "TABLE" node)
      934   +                   └─────┬──────┘
      935   +             ┌───────────┴───────────┐
      936   +       ┌─────┴──────┐          ┌─────┴──────┐
      937   +       │     TR     │          │     TR     │
      938   +       └─────┬──────┘          └─────┬──────┘
      939   +         ┌───┴──────┐            ┌───┴──────┐
      940   +     ┌───┴─┐     ┌──┴──┐     ┌───┴─┐     ┌──┴──┐
      941   +     │ TD  │     │ TD  │     │ TD  │     │ TD  │
      942   +     └──┬──┘     └──┬──┘     └───┬─┘     └──┬──┘
      943   +  ╔═════╩════╗ ╔════╩═════╗ ╔════╩═════╗ ╔══╩═══════╗
      944   +  ║Column 1-1║ ║Column 1-2║ ║Column 2-1║ ║Column 2-2║    (one TEXT node in each parent node)
      945   +  ╚══════════╝ ╚══════════╝ ╚══════════╝ ╚══════════╝
      946   */
      947
      948  /*
      949   *@@gloss: DOM_DOCUMENT DOCUMENT
      950   * representation of XML @documents in the @DOM.
      951   *
      952   * The xwphelpers implementation has the following differences
      953   * to the DOM specs:
      954   *
      955   * -- The "doctype" member points to the documents @DTD, or is NULL.
      956   * In our implementation, this is the pvExtra pointer, which points
      957   * to a _DOMDTD.
      958   *
      959   * -- The "implementation" member points to a DOMImplementation object.
      960   * This is not supported here.
      961   *
      962   * -- The "documentElement" member is a convenience pointer to the
      963   * document's root element. We don't supply this field; instead,
      964   * the llChildren list only contains a single ELEMENT node for the
      965   * root element.
      966   *
      967   * -- The "createElement" method is implemented by xmlCreateElementNode.
      968   *
      969   * -- The "createAttribute" method is implemented by xmlCreateAttributeNode.
      970   *
      971   * -- The "createTextNode" method is implemented by xmlCreateTextNode,
      972   * which has an extra parameter though.
      973   *
      974   * -- The "createComment" method is implemented by xmlCreateCommentNode.
      975   *
      976   * -- The "createProcessingInstruction" method is implemented by
      977   * xmlCreatePINode.
      978   *
      979   * -- The "createDocumentFragment", "createCDATASection", and
      980   * "createEntityReference" methods are not supported.
      981   */
      982
      983
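
The node tree described in the DOM glossary entry can be sketched in a few lines of plain C. This is only an illustration of the idea (a DOCUMENT node holding one root ELEMENT, with TEXT nodes as leaves), assuming a simple first-child/next-sibling layout. The names DemoNode, demo_new_node, demo_count, demo_free and demo_build_table are hypothetical and are not part of the XWorkplace helpers or of expat; the real helpers use their own node structures and the xmlCreate* functions named in the entry.

```c
#include <stdlib.h>

/* Hypothetical node kinds for this sketch only; not the xwphelpers types. */
typedef enum { DEMO_DOCUMENT, DEMO_ELEMENT, DEMO_TEXT } DemoNodeType;

/* Hypothetical tree node using a first-child / next-sibling layout. */
typedef struct DemoNode {
    DemoNodeType     type;
    const char      *name;          /* element name, or the text itself */
    struct DemoNode *first_child;
    struct DemoNode *last_child;
    struct DemoNode *next_sibling;
} DemoNode;

/* Allocates a node and appends it to parent's child list (if any). */
static DemoNode *demo_new_node(DemoNodeType type, const char *name,
                               DemoNode *parent)
{
    DemoNode *node = calloc(1, sizeof *node);
    if (node == NULL)
        abort();                    /* keep the sketch short: no recovery */
    node->type = type;
    node->name = name;
    if (parent != NULL) {
        if (parent->last_child != NULL)
            parent->last_child->next_sibling = node;
        else
            parent->first_child = node;
        parent->last_child = node;
    }
    return node;
}

/* Counts the nodes of the given type in the subtree rooted at node. */
static int demo_count(const DemoNode *node, DemoNodeType type)
{
    int count = (node->type == type) ? 1 : 0;
    const DemoNode *child;
    for (child = node->first_child; child != NULL; child = child->next_sibling)
        count += demo_count(child, type);
    return count;
}

/* Frees the subtree rooted at node. */
static void demo_free(DemoNode *node)
{
    DemoNode *child = node->first_child;
    while (child != NULL) {
        DemoNode *next = child->next_sibling;
        demo_free(child);
        child = next;
    }
    free(node);
}

/* Builds the TABLE / TBODY / TR / TD tree from the glossary example. */
static DemoNode *demo_build_table(void)
{
    static const char *const cells[2][2] = {
        { "Column 1-1", "Column 1-2" },
        { "Column 2-1", "Column 2-2" }
    };
    DemoNode *doc   = demo_new_node(DEMO_DOCUMENT, "#document", NULL);
    DemoNode *table = demo_new_node(DEMO_ELEMENT, "TABLE", doc);
    DemoNode *tbody = demo_new_node(DEMO_ELEMENT, "TBODY", table);
    int row, col;
    for (row = 0; row < 2; row++) {
        DemoNode *tr = demo_new_node(DEMO_ELEMENT, "TR", tbody);
        for (col = 0; col < 2; col++) {
            DemoNode *td = demo_new_node(DEMO_ELEMENT, "TD", tr);
            demo_new_node(DEMO_TEXT, cells[row][col], td);
        }
    }
    return doc;
}
```

A caller would build the tree with demo_build_table(), walk it (for example with demo_count(doc, DEMO_TEXT) to count the four TEXT leaves), and release it with demo_free(doc). The sibling-list layout is one common choice; a real DOM implementation would also carry attributes, parent pointers and owner-document links.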