libxmllib.tex

Last change on this file was 3225, checked in by bird, 18 years ago
Python 2.5
1	\section{\module{xmllib} ---
2	A parser for XML documents}
3
4	\declaremodule{standard}{xmllib}
5	\modulesynopsis{A parser for XML documents.}
6	\moduleauthor{Sjoerd Mullender}{Sjoerd.Mullender@cwi.nl}
7	\sectionauthor{Sjoerd Mullender}{Sjoerd.Mullender@cwi.nl}
8
9
10	\index{XML}
11	\index{Extensible Markup Language}
12
13	\deprecated{2.0}{Use \refmodule{xml.sax} instead. The newer XML
14	package includes full support for XML 1.0.}
15
16	\versionchanged[Added namespace support]{1.5.2}
17
18	This module defines a class \class{XMLParser} which serves as the basis
19	for parsing text files formatted in XML (Extensible Markup Language).
20
21	\begin{classdesc}{XMLParser}{}
22	The \class{XMLParser} class must be instantiated without
23	arguments.\footnote{Actually, a number of keyword arguments are
24	recognized which influence the parser to accept certain non-standard
25	constructs. The following keyword arguments are currently
26	recognized. The default for each of these is \code{0} (false), except
27	for the last one, for which the default is \code{1} (true).
28	\var{accept_unquoted_attributes} (accept certain attribute values
29	without requiring quotes), \var{accept_missing_endtag_name} (accept
30	end tags that look like \code{</>}), \var{map_case} (map upper case to
31	lower case in tags and attributes), \var{accept_utf8} (allow UTF-8
32	characters in input; this is required according to the XML standard,
33	but Python does not as yet deal properly with these characters, so
34	this is not the default), \var{translate_attribute_references}
35	(translate character and entity references in attribute values).}
36	\end{classdesc}
37
38	This class provides the following interface methods and instance variables:
39
40	\begin{memberdesc}{attributes}
41	A mapping of element names to mappings. The latter mapping maps
42	attribute names that are valid for the element to the default value of
43	the attribute, or to \code{None} if there is no default. The default
44	value is the empty dictionary. This variable is meant to be
45	overridden, not extended, since the default is shared by all instances
46	of \class{XMLParser}.
47	\end{memberdesc}
48
49	\begin{memberdesc}{elements}
50	A mapping of element names to tuples. Each tuple contains a function
51	for handling the start tag and one for handling the end tag of the
52	element; either may be \code{None} if \method{unknown_starttag()} or
53	\method{unknown_endtag()} is to be called instead. The default value is
54	the empty dictionary. This variable is meant to be overridden, not
55	extended, since the default is shared by all instances of
56	\class{XMLParser}.
57	\end{memberdesc}
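
The reason for overriding rather than extending can be seen in a
minimal sketch. It does not use \module{xmllib}; \class{BaseParser},
\class{Careless} and \class{Careful} are hypothetical stand-ins that
only assume, as stated above, that the default mapping is a class
attribute shared by all instances.

```python
class BaseParser(object):
    # Stand-in for XMLParser: the default mapping is a class attribute,
    # so every instance and subclass sees the same dictionary object.
    elements = {}


class Careless(BaseParser):
    def __init__(self):
        # Extends the shared dictionary in place; the change leaks into
        # BaseParser and every other class that relies on the default.
        self.elements['A'] = (None, None)


class Careful(BaseParser):
    def __init__(self):
        # Overrides the attribute with a fresh dictionary that belongs
        # to this instance only; the shared default stays empty.
        self.elements = {'A': (None, None)}
```

Creating a \class{Careful} instance leaves \code{BaseParser.elements}
empty, whereas creating a \class{Careless} instance adds the
\code{'A'} entry to the dictionary seen by every parser.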
58
59	\begin{memberdesc}{entitydefs}
60	A mapping of entity names to their values. The default value contains
61	definitions for \code{'lt'}, \code{'gt'}, \code{'amp'}, \code{'quot'},
62	and \code{'apos'}.
63	\end{memberdesc}
64
65	\begin{methoddesc}{reset}{}
66	Reset the instance. All unprocessed data is lost. This method is
67	called implicitly at instantiation time.
68	\end{methoddesc}
69
70	\begin{methoddesc}{setnomoretags}{}
71	Stop processing tags. Treat all following input as literal input
72	(CDATA).
73	\end{methoddesc}
74
75	\begin{methoddesc}{setliteral}{}
76	Enter literal mode (CDATA mode). This mode is automatically exited
77	when the close tag matching the last unclosed open tag is encountered.
78	\end{methoddesc}
79
80	\begin{methoddesc}{feed}{data}
81	Feed some text to the parser. It is processed insofar as it consists
82	of complete tags; incomplete data is buffered until more data is
83	fed or \method{close()} is called.
84	\end{methoddesc}
85
86	\begin{methoddesc}{close}{}
87	Force processing of all buffered data as if it were followed by an
88	end-of-file mark. This method may be redefined by a derived class to
89	define additional processing at the end of the input, but the
90	redefined version should always call the base class's \method{close()}.
91	\end{methoddesc}
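
The buffering behaviour of \method{feed()} and \method{close()} can be
illustrated with a toy scanner. This is only a sketch of the idea, not
the algorithm used by \module{xmllib}: the class name
\class{ChunkedScanner}, its \code{events} list and the \code{'error'}
event are assumptions made for the example, and a tag is taken to be
anything between \code{<} and \code{>}.

```python
class ChunkedScanner(object):
    """Toy stand-in: split chunked input into data and tag events."""

    def __init__(self):
        self.buffer = ''    # input received but not yet processed
        self.events = []    # (kind, text) pairs in document order

    def feed(self, data):
        self.buffer += data
        self._scan(final=False)

    def close(self):
        # Process what is left as if end-of-file had been reached.
        self._scan(final=True)

    def _scan(self, final):
        buf = self.buffer
        pos = 0
        while pos < len(buf):
            lt = buf.find('<', pos)
            if lt < 0:
                # Only character data remains.
                self.events.append(('data', buf[pos:]))
                pos = len(buf)
                break
            if lt > pos:
                self.events.append(('data', buf[pos:lt]))
                pos = lt
            gt = buf.find('>', lt)
            if gt < 0:
                break       # incomplete tag: keep it buffered
            self.events.append(('tag', buf[lt:gt + 1]))
            pos = gt + 1
        rest = buf[pos:]
        if final and rest:
            # An unfinished tag at end of input cannot be completed.
            self.events.append(('error', rest))
            rest = ''
        self.buffer = rest
```

Feeding \code{'<a>he'}, \code{'llo</'} and \code{'a>'} in turn yields
the tag \code{<a>}, two pieces of character data, and the tag
\code{</a>}; the fragment \code{'</'} stays in the buffer until the
third call completes it.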
92
93	\begin{methoddesc}{translate_references}{data}
94	Translate all entity and character references in \var{data} and
95	return the translated string.
96	\end{methoddesc}
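
The effect of translating references can be approximated in a few
lines. The function below is a stand-alone sketch, not the
\module{xmllib} implementation: the regular expression, the plain
replacement values in \code{ENTITYDEFS}, and the decision to leave
unknown entities untouched are all assumptions made for illustration.

```python
import re

# Assumed plain-text values for the five predefined entity names.
ENTITYDEFS = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

# Matches &#xHEX;, &#DECIMAL; and &name; references.
_ref = re.compile(r'&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z_:][-A-Za-z0-9._:]*);')


def translate_references(data, entitydefs=ENTITYDEFS):
    """Return data with character and entity references replaced."""
    def replace(match):
        name = match.group(1)
        if name.startswith('#x'):
            return chr(int(name[2:], 16))   # hexadecimal character reference
        if name.startswith('#'):
            return chr(int(name[1:]))       # decimal character reference
        # Unknown entity names are left as they are in this sketch.
        return entitydefs.get(name, match.group(0))
    return _ref.sub(replace, data)
```

The text is scanned once, so the output of one replacement is not
scanned again: \code{'\&amp;lt;'} becomes \code{'\&lt;'}, not
\code{'<'}.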
97
98	\begin{methoddesc}{getnamespace}{}
99	Return a mapping of namespace abbreviations to namespace URIs that are
100	currently in effect.
101	\end{methoddesc}
102
103	\begin{methoddesc}{handle_xml}{encoding, standalone}
104	This method is called when the \samp{<?xml ...?>} tag is processed.
105	The arguments are the values of the encoding and standalone attributes
106	in the tag. Both encoding and standalone are optional. The values
107	passed to \method{handle_xml()} default to \code{None} and the string
108	\code{'no'} respectively.
109	\end{methoddesc}
110
111	\begin{methoddesc}{handle_doctype}{tag, pubid, syslit, data}
112	This\index{DOCTYPE declaration} method is called when the
113	\samp{<!DOCTYPE...>} declaration is processed. The arguments are the
114	tag name of the root element, the Formal Public\index{Formal Public
115	Identifier} Identifier (or \code{None} if not specified), the system
116	identifier, and the uninterpreted contents of the internal DTD subset
117	as a string (or \code{None} if not present).
118	\end{methoddesc}
119
120	\begin{methoddesc}{handle_starttag}{tag, method, attributes}
121	This method is called to handle start tags for which a start tag
122	handler is defined in the instance variable \member{elements}. The
123	\var{tag} argument is the name of the tag, and the
124	\var{method} argument is the function (method) which should be used to
125	support semantic interpretation of the start tag. The
126	\var{attributes} argument is a dictionary of attributes, the key being
127	the \var{name} and the value being the \var{value} of the attribute
128	found inside the tag's \code{<>} brackets. Character and entity
129	references in the \var{value} have been interpreted. For instance,
130	for the start tag \code{<A HREF="http://www.cwi.nl/">}, this method
131	would be called as \code{handle_starttag('A', self.elements['A'][0],
132	\{'HREF': 'http://www.cwi.nl/'\})}. The base implementation simply
133	calls \var{method} with \var{attributes} as the only argument.
134	\end{methoddesc}
135
136	\begin{methoddesc}{handle_endtag}{tag, method}
137	This method is called to handle end tags for which an end tag handler
138	is defined in the instance variable \member{elements}. The \var{tag}
139	argument is the name of the tag, and the \var{method} argument is the
140	function (method) which should be used to support semantic
141	interpretation of the end tag. For instance, for the end tag
142	\code{</A>}, this method would be called as \code{handle_endtag('A',
143	self.elements['A'][1])}. The base implementation simply calls
144	\var{method}.
145	\end{methoddesc}
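
The relationship between \member{elements},
\method{handle_starttag()}, \method{handle_endtag()} and the
\method{unknown_*()} methods can be sketched as follows. The classes
are self-contained stand-ins rather than \module{xmllib} code;
\method{dispatch_start()} and \method{dispatch_end()} are hypothetical
helpers that play the role of the parser's internal tag processing.

```python
class MiniDispatcher(object):
    """Stand-in that mimics the documented handler dispatch."""

    elements = {}

    def handle_starttag(self, tag, method, attributes):
        # Documented base behaviour: call method with the attributes.
        method(attributes)

    def handle_endtag(self, tag, method):
        # Documented base behaviour: call method without arguments.
        method()

    def unknown_starttag(self, tag, attributes):
        pass

    def unknown_endtag(self, tag):
        pass

    def dispatch_start(self, tag, attributes):
        # Hypothetical helper: look up the start tag handler, if any.
        method = self.elements.get(tag, (None, None))[0]
        if method is None:
            self.unknown_starttag(tag, attributes)
        else:
            self.handle_starttag(tag, method, attributes)

    def dispatch_end(self, tag):
        # Hypothetical helper: look up the end tag handler, if any.
        method = self.elements.get(tag, (None, None))[1]
        if method is None:
            self.unknown_endtag(tag)
        else:
            self.handle_endtag(tag, method)


class LinkCollector(MiniDispatcher):
    def __init__(self):
        self.links = []
        self.unknown = []
        # Override the mapping instead of extending the shared default.
        self.elements = {'A': (self.start_a, self.end_a)}

    def start_a(self, attributes):
        self.links.append(attributes.get('HREF'))

    def end_a(self):
        pass

    def unknown_starttag(self, tag, attributes):
        self.unknown.append(tag)
```

With this sketch, \code{dispatch_start('A', \{'HREF':
'http://www.cwi.nl/'\})} reaches \method{start_a()} through
\method{handle_starttag()}, while a tag without an entry in
\member{elements} ends up in \method{unknown_starttag()}.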
146
147	\begin{methoddesc}{handle_data}{data}
148	This method is called to process arbitrary data. It is intended to be
149	overridden by a derived class; the base class implementation does
150	nothing.
151	\end{methoddesc}
152
153	\begin{methoddesc}{handle_charref}{ref}
154	This method is called to process a character reference of the form
155	\samp{\&\#\var{ref};}. \var{ref} can either be a decimal number,
156	or a hexadecimal number when preceded by an \character{x}.
157	In the base implementation, \var{ref} must be a number in the
158	range 0--255. It converts the number to the corresponding character and
159	calls the method \method{handle_data()} with the character as argument. If
160	\var{ref} is invalid or out of range, the method
161	\code{unknown_charref(\var{ref})} is called to handle the error. A
162	subclass must override this method to provide support for character
163	references outside of this range.
164	\end{methoddesc}
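
The rule for interpreting \var{ref} can be written down as a small
stand-alone function. This is a sketch under the assumptions stated in
the description above (decimal by default, hexadecimal after a leading
\character{x}, range 0--255); the name \function{charref_to_char()} is
hypothetical, and returning \code{None} stands in for the call to
\method{unknown_charref()}.

```python
def charref_to_char(ref):
    """Return the character for ref, or None if it cannot be handled."""
    try:
        if ref[:1] == 'x':
            n = int(ref[1:], 16)    # hexadecimal form, e.g. 'x41'
        else:
            n = int(ref)            # decimal form, e.g. '65'
    except ValueError:
        return None                 # not a number at all
    if not 0 <= n <= 255:
        return None                 # outside the supported range
    return chr(n)
```

A subclass that needs a wider range would apply the same parsing but
relax the range check.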
165
166	\begin{methoddesc}{handle_comment}{comment}
167	This method is called when a comment is encountered. The
168	\var{comment} argument is a string containing the text between the
169	\samp{<!--} and \samp{-->} delimiters, but not the delimiters
170	themselves. For example, the comment \samp{<!--text-->} will
171	cause this method to be called with the argument \code{'text'}. The
172	default method does nothing.
173	\end{methoddesc}
174
175	\begin{methoddesc}{handle_cdata}{data}
176	This method is called when a CDATA section is encountered. The
177	\var{data} argument is a string containing the text between the
178	\samp{<![CDATA[} and \samp{]]>} delimiters, but not the delimiters
179	themselves. For example, the section \samp{<![CDATA[text]]>} will
180	cause this method to be called with the argument \code{'text'}. The
181	default method does nothing, and is intended to be overridden.
182	\end{methoddesc}
183
184	\begin{methoddesc}{handle_proc}{name, data}
185	This method is called when a processing instruction (PI) is
186	encountered. The \var{name} is the PI target, and the \var{data}
187	argument is a string containing the text between the PI target and the
188	closing delimiter, but not the delimiter itself. For example, the
189	instruction \samp{<?XML text?>} will cause this method to be called
190	with the arguments \code{'XML'} and \code{'text'}. The default method
191	does nothing. Note that if a document starts with \samp{<?xml
192	...?>}, \method{handle_xml()} is called to handle it.
193	\end{methoddesc}
194
195	\begin{methoddesc}{handle_special}{data}
196	This method is called when a declaration is encountered. The
197	\var{data} argument is a string containing the text between the
198	\samp{<!} and \samp{>} delimiters, but not the delimiters
199	themselves. For example, the \index{ENTITY declaration}entity
200	declaration \samp{<!ENTITY text>} will cause this method to be called
201	with the argument \code{'ENTITY text'}. The default method does
202	nothing. Note that \samp{<!DOCTYPE ...>} is handled separately if it
203	is located at the start of the document.
204	\end{methoddesc}
205
206	\begin{methoddesc}{syntax_error}{message}
207	This method is called when a syntax error is encountered. The
208	\var{message} is a description of what was wrong. The default method
209	raises a \exception{RuntimeError} exception. If this method is
210	overridden, it is permissible for it to return. This method is only
211	called when the error can be recovered from. Unrecoverable errors
212	raise a \exception{RuntimeError} without first calling
213	\method{syntax_error()}.
214	\end{methoddesc}
215
216	\begin{methoddesc}{unknown_starttag}{tag, attributes}
217	This method is called to process an unknown start tag. It is intended
218	to be overridden by a derived class; the base class implementation
219	does nothing.
220	\end{methoddesc}
221
222	\begin{methoddesc}{unknown_endtag}{tag}
223	This method is called to process an unknown end tag. It is intended
224	to be overridden by a derived class; the base class implementation
225	does nothing.
226	\end{methoddesc}
227
228	\begin{methoddesc}{unknown_charref}{ref}
229	This method is called to process unresolvable numeric character
230	references. It is intended to be overridden by a derived class; the
231	base class implementation does nothing.
232	\end{methoddesc}
233
234	\begin{methoddesc}{unknown_entityref}{ref}
235	This method is called to process an unknown entity reference. It is
236	intended to be overridden by a derived class; the base class
237	implementation calls \method{syntax_error()} to signal an error.
238	\end{methoddesc}
239
240
241	\begin{seealso}
242	\seetitle[http://www.w3.org/TR/REC-xml]{Extensible Markup Language
243	(XML) 1.0}{The XML specification, published by the World
244	Wide Web Consortium (W3C), defines the syntax and
245	processor requirements for XML. References to additional
246	material on XML, including translations of the
247	specification, are available at
248	\url{http://www.w3.org/XML/}.}
249
250	\seetitle[http://www.python.org/topics/xml/]{Python and XML
251	Processing}{The Python XML Topic Guide provides a great
252	deal of information on using XML from Python and links to
253	other sources of information on XML.}
254
255	\seetitle[http://www.python.org/sigs/xml-sig/]{SIG for XML
256	Processing in Python}{The Python XML Special Interest
257	Group is developing substantial support for processing XML
258	from Python.}
259	\end{seealso}
260
261
262	\subsection{XML Namespaces \label{xml-namespace}}
263
264	This module has support for XML namespaces as defined in the XML
265	Namespaces proposed recommendation.
266	\indexii{XML}{namespaces}
267
268	Tag and attribute names that are defined in an XML namespace are
269	handled as if the name of the tag or attribute consisted of the
270	namespace (the URL that defines the namespace) followed by a
271	space and the name of the tag or attribute. For instance, the tag
272	\code{<html xmlns='http://www.w3.org/TR/REC-html40'>} is treated as if
273	the tag name were \code{'http://www.w3.org/TR/REC-html40 html'}, and
274	the tag \code{<html:a href='http://frob.com'>} inside the
275	above-mentioned element is treated as if the tag name were
276	\code{'http://www.w3.org/TR/REC-html40 a'} and the attribute name as
277	if it were \code{'http://www.w3.org/TR/REC-html40 href'}.
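
The naming rule (namespace, a space, then the local name) can be
sketched as a small function. It is not taken from \module{xmllib}:
\function{qualify()} is a hypothetical helper, and using the empty
string as the key for the default namespace is an assumption of this
example.

```python
def qualify(name, namespaces):
    """Return 'URI local' if the prefix of name is mapped, else name."""
    if ':' in name:
        prefix, local = name.split(':', 1)
    else:
        prefix, local = '', name    # '' stands for the default namespace
    uri = namespaces.get(prefix)
    if uri is None:
        return name                 # no namespace in effect for this name
    return uri + ' ' + local
```

Given a mapping from \code{'html'} to
\code{'http://www.w3.org/TR/REC-html40'}, the name \code{'html:a'}
becomes \code{'http://www.w3.org/TR/REC-html40 a'}, which is the form
in which handlers would be looked up.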
278
279	An older draft of the XML Namespaces proposal is also recognized, but
280	triggers a warning.
281
282	\begin{seealso}
283	\seetitle[http://www.w3.org/TR/REC-xml-names/]{Namespaces in XML}{
284	This World Wide Web Consortium recommendation describes the
285	proper syntax and processing requirements for namespaces in
286	XML.}
287	\end{seealso}
