It’s often convenient to divide long XML documents into multiple files. The classic example is a book, customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally this has been implemented via external entity references. For example,
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd"[
<!ENTITY chapter1 SYSTEM "malapropisms.xml">
<!ENTITY chapter2 SYSTEM "mispronunciations.xml">
<!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
<title>The Wit and Wisdom of George W. Bush</title>
&chapter1;
&chapter2;
&chapter3;
</book>
However, external entity references have a number of limitations. Among them:
- The individual component files cannot be treated in isolation. They often aren’t themselves full, well-formed XML documents. They cannot have document type declarations.
- The document must have a DTD, and the parser must read the DTD. Not all parsers do.
- If any of the pieces are missing, then the entire document is malformed. There’s no option for error recovery.
- Only entire files can be included. You can’t include just one paragraph from a document.
- There’s no way to include unparsed text such as an example Java program or XML document in a technical book. Only well-formed XML can be included, and all such XML is parsed. (SGML actually had this ability, but it was one of the features XML removed in the process of simplification.)
XInclude is an emerging W3C specification that aims to provide a mechanism for building large XML documents out of their component parts without these limitations. XInclude can combine multiple documents and parts thereof independently of validation. Each piece can be a complete XML document, a part of an XML document, or a non-XML text document like a Java program or an e-mail message.
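As a rough sketch of how the book above might look with XInclude instead of entity references, consider the following. It assumes the namespace URI and attribute names from the XInclude 1.0 Recommendation (earlier drafts used different namespace URIs). The files Example.java and notes.xml, the ID intro, and the chapter and example elements are hypothetical names invented for illustration.

```xml
<?xml version="1.0"?>
<!-- No DOCTYPE is needed; inclusion is independent of validation -->
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>

  <!-- Each chapter can be a complete, well-formed document of its own -->
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>

  <!-- xi:fallback supplies replacement content if the file cannot be retrieved -->
  <xi:include href="madeupwords.xml">
    <xi:fallback>
      <chapter><title>Made-up Words (not available)</title></chapter>
    </xi:fallback>
  </xi:include>

  <!-- parse="text" includes the resource as plain text rather than parsed XML -->
  <example><xi:include href="Example.java" parse="text"/></example>

  <!-- The xpointer attribute selects part of a document;
       here, the element whose ID is "intro" (hypothetical) -->
  <xi:include href="notes.xml" xpointer="element(intro)"/>
</book>
```

Each xi:include element here corresponds to one of the limitations listed above: standalone component documents, no DTD requirement, error recovery through a fallback, unparsed text inclusion, and inclusion of a fragment rather than a whole file. Whether a given processor supports every one of these features depends on the implementation.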