\section{\module{xml.parsers.expat} ---
         Fast XML parsing using Expat}

% Markup notes:
%
% Many of the attributes of the XMLParser objects are callbacks.
% Since signature information must be presented, these are described
% using the methoddesc environment. Since they are attributes which
% are set by client code, in-text references to these attributes
% should be marked using the \member macro and should not include the
% parentheses used when marking functions and methods.

\declaremodule{standard}{xml.parsers.expat}
\modulesynopsis{An interface to the Expat non-validating XML parser.}
\moduleauthor{Paul Prescod}{paul@prescod.net}

\versionadded{2.0}

The \module{xml.parsers.expat} module is a Python interface to the
Expat\index{Expat} non-validating XML parser.
The module provides a single extension type, \class{xmlparser}, that
represents the current state of an XML parser. After an
\class{xmlparser} object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to
the parser, the handler functions are called for the character data
and markup in the XML document.

This module uses the \module{pyexpat}\refbimodindex{pyexpat} module to
provide access to the Expat parser. Direct use of the
\module{pyexpat} module is deprecated.

This module provides one exception and one type object:

\begin{excdesc}{ExpatError}
  The exception raised when Expat reports an error. See section
  \ref{expaterror-objects}, ``ExpatError Exceptions,'' for more
  information on interpreting Expat errors.
\end{excdesc}

\begin{excdesc}{error}
  Alias for \exception{ExpatError}.
\end{excdesc}

\begin{datadesc}{XMLParserType}
  The type of the return values from the \function{ParserCreate()}
  function.
\end{datadesc}


The \module{xml.parsers.expat} module contains two functions:

\begin{funcdesc}{ErrorString}{errno}
Returns an explanatory string for a given error number \var{errno}.
\end{funcdesc}

\begin{funcdesc}{ParserCreate}{\optional{encoding\optional{,
                               namespace_separator}}}
Creates and returns a new \class{xmlparser} object.
\var{encoding}, if specified, must be a string naming the encoding
used by the XML data. Expat doesn't support as many encodings as
Python does, and its repertoire of encodings can't be extended; it
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
\var{encoding} is given it will override the implicit or explicit
encoding of the document.

Expat can optionally do XML namespace processing for you, enabled by
providing a value for \var{namespace_separator}. The value must be a
one-character string; a \exception{ValueError} will be raised if the
string has an illegal length (\code{None} is considered the same as
omission). When namespace processing is enabled, element type names
and attribute names that belong to a namespace will be expanded. The
element name passed to the element handlers
\member{StartElementHandler} and \member{EndElementHandler}
will be the concatenation of the namespace URI, the namespace
separator character, and the local part of the name. If the namespace
separator is a zero byte (\code{chr(0)}) then the namespace URI and
the local part will be concatenated without any separator.

For example, if \var{namespace_separator} is set to a space character
(\character{ }) and the following document is parsed:

\begin{verbatim}
<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>
\end{verbatim}

\member{StartElementHandler} will receive the following strings
for each element:

\begin{verbatim}
http://default-namespace.org/ root
http://www.python.org/ns/ elem1
elem2
\end{verbatim}
\end{funcdesc}

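The namespace expansion described for \function{ParserCreate()} can be
observed with a short script. This is only a sketch: the handler name
\code{record_start} is made up for illustration, and the code assumes a
Python version in which \code{print} may be written with parentheses.

\begin{verbatim}
import xml.parsers.expat

names = []

def record_start(name, attrs):
    # With namespace processing enabled, name is the namespace URI,
    # the separator, and the local name joined together.
    names.append(name)

p = xml.parsers.expat.ParserCreate(namespace_separator=' ')
p.StartElementHandler = record_start
p.Parse("""<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>""", 1)

for name in names:
    print(name)
\end{verbatim}

Elements that are in no namespace, such as \code{elem2} here, are
reported with their plain name and no separator.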

\begin{seealso}
  \seetitle[http://www.libexpat.org/]{The Expat XML Parser}
           {Home page of the Expat project.}
\end{seealso}


\subsection{XMLParser Objects \label{xmlparser-objects}}

\class{xmlparser} objects have the following methods:

\begin{methoddesc}[xmlparser]{Parse}{data\optional{, isfinal}}
Parses the contents of the string \var{data}, calling the appropriate
handler functions to process the parsed data. \var{isfinal} must be
true on the final call to this method. \var{data} can be the empty
string at any time.
\end{methoddesc}

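Because \method{Parse()} may be called repeatedly, a document can be fed
to the parser in pieces of any size. The following sketch illustrates
this under the assumption that the chunk boundaries are arbitrary (they
deliberately split tags here); the names \code{seen} and \code{chunks}
are illustrative.

\begin{verbatim}
import xml.parsers.expat

seen = []
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = lambda name, attrs: seen.append(name)

# Feed the document piece by piece; isfinal stays false for now.
chunks = ['<doc><it', 'em/><item', '/></doc>']
for chunk in chunks:
    p.Parse(chunk, 0)

# An empty string is acceptable data; isfinal=1 ends the document.
p.Parse('', 1)

print(len(seen))
\end{verbatim}

Handlers may not fire until enough input has arrived to complete a
token, so results should be inspected only after the final call.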
\begin{methoddesc}[xmlparser]{ParseFile}{file}
Parse XML data reading from the object \var{file}. \var{file} only
needs to provide the \method{read(\var{nbytes})} method, returning the
empty string when there's no more data.
\end{methoddesc}

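Any object with a suitable \method{read()} method will do for
\method{ParseFile()}. The sketch below uses an in-memory byte stream
from the \module{io} module as a stand-in for a file opened in binary
mode; that choice, and the name \code{pieces}, are assumptions made for
the example.

\begin{verbatim}
import io
import xml.parsers.expat

pieces = []
p = xml.parsers.expat.ParserCreate()
p.CharacterDataHandler = pieces.append

# io.BytesIO stands in for a real file opened in binary mode.
source = io.BytesIO(b'<title>Expat</title>')
p.ParseFile(source)

print(''.join(pieces))
\end{verbatim}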
\begin{methoddesc}[xmlparser]{SetBase}{base}
Sets the base to be used for resolving relative URIs in system
identifiers in declarations. Resolving relative identifiers is left
to the application: this value will be passed through as the
\var{base} argument to the \member{ExternalEntityRefHandler},
\member{NotationDeclHandler}, and
\member{UnparsedEntityDeclHandler} functions.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{GetBase}{}
Returns a string containing the base set by a previous call to
\method{SetBase()}, or \code{None} if
\method{SetBase()} hasn't been called.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{GetInputContext}{}
Returns the input data that generated the current event as a string.
The data is in the encoding of the entity which contains the text.
When called while an event handler is not active, the return value is
\code{None}.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityParserCreate}{context\optional{,
                                                          encoding}}
Create a ``child'' parser which can be used to parse an external
parsed entity referred to by content parsed by the parent parser. The
\var{context} parameter should be the string passed to the
\member{ExternalEntityRefHandler} handler function, described below.
The child parser is created with the \member{ordered_attributes},
\member{returns_unicode} and \member{specified_attributes} set to the
values of this parser.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{UseForeignDTD}{\optional{flag}}
Calling this with a true value for \var{flag} (the default) will cause
Expat to call the \member{ExternalEntityRefHandler} with
\constant{None} for all arguments to allow an alternate DTD to be
loaded. If the document does not contain a document type declaration,
the \member{ExternalEntityRefHandler} will still be called, but the
\member{StartDoctypeDeclHandler} and \member{EndDoctypeDeclHandler}
will not be called.

Passing a false value for \var{flag} will cancel a previous call that
passed a true value, but otherwise has no effect.

This method can only be called before the \method{Parse()} or
\method{ParseFile()} methods are called; calling it after either of
those has been called causes \exception{ExpatError} to be raised with
the \member{code} attribute set to
\constant{errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING}.

\versionadded{2.3}
\end{methoddesc}


\class{xmlparser} objects have the following attributes:

\begin{memberdesc}[xmlparser]{buffer_size}
The size of the buffer used when \member{buffer_text} is true. This
value cannot be changed at this time.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{buffer_text}
Setting this to true causes the \class{xmlparser} object to buffer
textual content returned by Expat to avoid multiple calls to the
\member{CharacterDataHandler} callback whenever possible. This can
improve performance substantially since Expat normally breaks
character data into chunks at every line ending. This attribute is
false by default, and may be changed at any time.
\versionadded{2.3}
\end{memberdesc}

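The effect of \member{buffer_text} can be seen by counting how often
the character data callback fires for the same document. This is a
sketch only; the helper \code{collect()} is not part of the module, and
the number of calls made without buffering depends on how Expat splits
the text.

\begin{verbatim}
import xml.parsers.expat

def collect(text, buffered):
    """Return the strings passed to CharacterDataHandler."""
    pieces = []
    p = xml.parsers.expat.ParserCreate()
    p.buffer_text = buffered
    p.CharacterDataHandler = pieces.append
    p.Parse(text, 1)
    return pieces

doc = '<doc>first line\nsecond line\nthird line</doc>'
unbuffered = collect(doc, False)
buffered = collect(doc, True)

print('calls without buffering: %d' % len(unbuffered))
print('calls with buffering:    %d' % len(buffered))
\end{verbatim}

In both cases joining the pieces yields the same text; only the number
of callbacks differs.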
\begin{memberdesc}[xmlparser]{buffer_used}
If \member{buffer_text} is enabled, the number of bytes stored in the
buffer. These bytes represent UTF-8 encoded text. This attribute has
no meaningful interpretation when \member{buffer_text} is false.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ordered_attributes}
Setting this attribute to a non-zero integer causes the attributes to
be reported as a list rather than a dictionary. The attributes are
presented in the order found in the document text. For each
attribute, two list entries are presented: the attribute name and the
attribute value. (Older versions of this module also used this
format.) By default, this attribute is false; it may be changed at
any time.
\versionadded{2.1}
\end{memberdesc}

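With \member{ordered_attributes} enabled, the handler has to pair up
names and values itself. One way to do that is sketched below; the
names \code{captured} and \code{start_element} are illustrative.

\begin{verbatim}
import xml.parsers.expat

captured = []

def start_element(name, attrs):
    # attrs is a flat list: [name1, value1, name2, value2, ...]
    captured.append(attrs)
    pairs = zip(attrs[0::2], attrs[1::2])
    for attr_name, attr_value in pairs:
        print('%s=%s' % (attr_name, attr_value))

p = xml.parsers.expat.ParserCreate()
p.ordered_attributes = 1
p.StartElementHandler = start_element
p.Parse('<point y="2" x="1" z="3"/>', 1)
\end{verbatim}

The pairs are produced in document order (\code{y}, \code{x},
\code{z}), which a dictionary would not necessarily preserve.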
\begin{memberdesc}[xmlparser]{returns_unicode}
If this attribute is set to a non-zero integer, the handler functions
will be passed Unicode strings. If \member{returns_unicode} is
\constant{False}, 8-bit strings containing UTF-8 encoded data will be
passed to the handlers. This is \constant{True} by default when
Python is built with Unicode support.
\versionchanged[Can be changed at any time to affect the result
                type]{1.6}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{specified_attributes}
If set to a non-zero integer, the parser will report only those
attributes which were specified in the document instance and not those
which were derived from attribute declarations. Applications which
set this need to be especially careful to use what additional
information is available from the declarations as needed to comply
with the standards for the behavior of XML processors. By default,
this attribute is false; it may be changed at any time.
\versionadded{2.1}
\end{memberdesc}

The following attributes contain values relating to the most recent
error encountered by an \class{xmlparser} object, and will only have
correct values once a call to \method{Parse()} or \method{ParseFile()}
has raised a \exception{xml.parsers.expat.ExpatError} exception.

\begin{memberdesc}[xmlparser]{ErrorByteIndex}
Byte index at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorCode}
Numeric code specifying the problem. This value can be passed to the
\function{ErrorString()} function, or compared to one of the constants
defined in the \code{errors} object.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorColumnNumber}
Column number at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorLineNumber}
Line number at which an error occurred.
\end{memberdesc}

The following attributes contain values relating to the current parse
location in an \class{xmlparser} object. During a callback reporting
a parse event they indicate the location of the first of the sequence
of characters that generated the event. When read outside of a
callback, the position indicated will be just past the last parse
event (regardless of whether there was an associated callback).
\versionadded{2.4}

\begin{memberdesc}[xmlparser]{CurrentByteIndex}
Current byte index in the parser input.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{CurrentColumnNumber}
Current column number in the parser input.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{CurrentLineNumber}
Current line number in the parser input.
\end{memberdesc}

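A handler can read the position attributes to record where each event
began. The following is a minimal sketch; the list \code{positions}
and the handler name are illustrative, and the handler reaches the
parser through the enclosing module-level name \code{p}.

\begin{verbatim}
import xml.parsers.expat

positions = []
p = xml.parsers.expat.ParserCreate()

def start_element(name, attrs):
    # Inside a handler these describe where the current event begins.
    positions.append((name, p.CurrentLineNumber, p.CurrentColumnNumber))

p.StartElementHandler = start_element
p.Parse('<a>\n  <b/>\n</a>', 1)

for name, line, column in positions:
    print('%s starts at line %d, column %d' % (name, line, column))
\end{verbatim}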
Here is the list of handlers that can be set. To set a handler on an
\class{xmlparser} object \var{o}, use
\code{\var{o}.\var{handlername} = \var{func}}. \var{handlername} must
be taken from the following list, and \var{func} must be a callable
object accepting the correct number of arguments. The arguments are
all strings, unless otherwise stated.

\begin{methoddesc}[xmlparser]{XmlDeclHandler}{version, encoding, standalone}
Called when the XML declaration is parsed. The XML declaration is the
(optional) declaration of the applicable version of the XML
recommendation, the encoding of the document text, and an optional
``standalone'' declaration. \var{version} and \var{encoding} will be
strings of the type dictated by the \member{returns_unicode}
attribute, and \var{standalone} will be \code{1} if the document is
declared standalone, \code{0} if it is declared not to be standalone,
or \code{-1} if the standalone clause was omitted.
This is only available with Expat version 1.95.0 or newer.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartDoctypeDeclHandler}{doctypeName,
                                                       systemId, publicId,
                                                       has_internal_subset}
Called when Expat begins parsing the document type declaration
(\code{<!DOCTYPE \ldots}). The \var{doctypeName} is provided exactly
as presented. The \var{systemId} and \var{publicId} parameters give
the system and public identifiers if specified, or \code{None} if
omitted. \var{has_internal_subset} will be true if the document
contains an internal document declaration subset.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndDoctypeDeclHandler}{}
Called when Expat is done parsing the document type declaration.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ElementDeclHandler}{name, model}
Called once for each element type declaration. \var{name} is the name
of the element type, and \var{model} is a representation of the
content model.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{AttlistDeclHandler}{elname, attname,
                                                  type, default, required}
Called for each declared attribute for an element type. If an
attribute list declaration declares three attributes, this handler is
called three times, once for each attribute. \var{elname} is the name
of the element to which the declaration applies and \var{attname} is
the name of the attribute declared. The attribute type is a string
passed as \var{type}; the possible values are \code{'CDATA'},
\code{'ID'}, \code{'IDREF'}, ...
\var{default} gives the default value for the attribute used when the
attribute is not specified by the document instance, or \code{None} if
there is no default value (\code{\#IMPLIED} values). If the attribute
is required to be given in the document instance, \var{required} will
be true.
This requires Expat version 1.95.0 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartElementHandler}{name, attributes}
Called for the start of every element. \var{name} is a string
containing the element name, and \var{attributes} is a dictionary
mapping attribute names to their values.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndElementHandler}{name}
Called for the end of every element.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ProcessingInstructionHandler}{target, data}
Called for every processing instruction.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{CharacterDataHandler}{data}
Called for character data. This will be called for normal character
data, CDATA marked content, and ignorable whitespace. Applications
which must distinguish these cases can use the
\member{StartCdataSectionHandler}, \member{EndCdataSectionHandler},
and \member{ElementDeclHandler} callbacks to collect the required
information.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{UnparsedEntityDeclHandler}{entityName, base,
                                                         systemId, publicId,
                                                         notationName}
Called for unparsed (NDATA) entity declarations. This is only present
for version 1.2 of the Expat library; for more recent versions, use
\member{EntityDeclHandler} instead. (The underlying function in the
Expat library has been declared obsolete.)
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EntityDeclHandler}{entityName,
                                                 is_parameter_entity, value,
                                                 base, systemId,
                                                 publicId,
                                                 notationName}
Called for all entity declarations. For parameter and internal
entities, \var{value} will be a string giving the declared contents
of the entity; this will be \code{None} for external entities. The
\var{notationName} parameter will be \code{None} for parsed entities,
and the name of the notation for unparsed entities.
\var{is_parameter_entity} will be true if the entity is a parameter
entity or false for general entities (most applications only need to
be concerned with general entities).
This is only available starting with version 1.95.0 of the Expat
library.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{NotationDeclHandler}{notationName, base,
                                                   systemId, publicId}
Called for notation declarations. \var{notationName}, \var{base},
\var{systemId}, and \var{publicId} are strings if given. If the
public identifier is omitted, \var{publicId} will be \code{None}.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartNamespaceDeclHandler}{prefix, uri}
Called when an element contains a namespace declaration. Namespace
declarations are processed before the \member{StartElementHandler} is
called for the element on which declarations are placed.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndNamespaceDeclHandler}{prefix}
Called when the closing tag is reached for an element
that contained a namespace declaration. This is called once for each
namespace declaration on the element, in the reverse of the order in
which the \member{StartNamespaceDeclHandler} was called to indicate
the start of each namespace declaration's scope. Calls to this
handler are made after the corresponding \member{EndElementHandler}
for the end of the element.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{CommentHandler}{data}
Called for comments. \var{data} is the text of the comment, excluding
the leading `\code{<!-}\code{-}' and trailing `\code{-}\code{->}'.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartCdataSectionHandler}{}
Called at the start of a CDATA section. This and
\member{EndCdataSectionHandler} are needed to be able to identify
the syntactical start and end for CDATA sections.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndCdataSectionHandler}{}
Called at the end of a CDATA section.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandler}{data}
Called for any characters in the XML document for
which no applicable handler has been specified. This means
characters that are part of a construct which could be reported, but
for which no handler has been supplied.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandlerExpand}{data}
This is the same as the \member{DefaultHandler},
but doesn't inhibit expansion of internal entities.
The entity reference will not be passed to the default handler.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{NotStandaloneHandler}{}
Called if the XML document hasn't been declared as being a standalone
document. This happens when there is an external subset or a reference
to a parameter entity, but the XML declaration does not set standalone
to \code{yes}. If this handler returns \code{0},
then the parser will raise an \constant{XML_ERROR_NOT_STANDALONE}
error. If this handler is not set, no exception is raised by the
parser for this condition.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityRefHandler}{context, base,
                                                        systemId, publicId}
Called for references to external entities. \var{base} is the current
base, as set by a previous call to \method{SetBase()}. The system and
public identifiers, \var{systemId} and \var{publicId}, are strings if
given; if the public identifier is not given, \var{publicId} will be
\code{None}. The \var{context} value is opaque and should only be
used as described below.

For external entities to be parsed, this handler must be implemented.
It is responsible for creating the sub-parser using
\code{ExternalEntityParserCreate(\var{context})}, initializing it with
the appropriate callbacks, and parsing the entity. This handler
should return an integer; if it returns \code{0}, the parser will
raise an \constant{XML_ERROR_EXTERNAL_ENTITY_HANDLING} error,
otherwise parsing will continue.

If this handler is not provided, external entities are reported by the
\member{DefaultHandler} callback, if provided.
\end{methoddesc}

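The steps required of an \member{ExternalEntityRefHandler} (create the
sub-parser, give it callbacks, parse the entity, return an integer) can
be sketched as follows. This is an illustration under simplifying
assumptions: the entity's replacement text is hard-coded rather than
fetched from \var{systemId}, and the names \code{text} and
\code{external_entity_ref} are invented for the example.

\begin{verbatim}
import xml.parsers.expat

text = []
parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = text.append

def external_entity_ref(context, base, system_id, public_id):
    # A real application would resolve system_id against base and
    # read that resource; the content is hard-coded here.
    child = parser.ExternalEntityParserCreate(context)
    child.CharacterDataHandler = text.append
    child.Parse('text from ' + system_id, 1)
    return 1    # non-zero: the parent parser continues

parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse('<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]>'
             '<doc>&ext;</doc>', 1)

print(''.join(text))
\end{verbatim}

Applications that load entities from untrusted sources should decide
carefully which system identifiers they are willing to resolve.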

\subsection{ExpatError Exceptions \label{expaterror-objects}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

\exception{ExpatError} exceptions have a number of interesting
attributes:

\begin{memberdesc}[ExpatError]{code}
  Expat's internal error number for the specific error. This will
  match one of the constants defined in the \code{errors} object from
  this module.
  \versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{lineno}
  Line number on which the error was detected. The first line is
  numbered \code{1}.
  \versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{offset}
  Character offset into the line where the error occurred. The first
  column is numbered \code{0}.
  \versionadded{2.1}
\end{memberdesc}

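These attributes are typically read in an \keyword{except} clause. The
sketch below parses a deliberately malformed document and turns the
error into a readable report using \function{ErrorString()}; it assumes
a Python version that accepts the \code{except ... as} syntax, and the
name \code{report} is illustrative.

\begin{verbatim}
import xml.parsers.expat

p = xml.parsers.expat.ParserCreate()
report = None
try:
    # The end tag does not match the open <b> element.
    p.Parse('<a><b></a>', 1)
except xml.parsers.expat.ExpatError as err:
    message = xml.parsers.expat.ErrorString(err.code)
    report = (message, err.lineno, err.offset)
    print('%s at line %d, column %d' % report)
\end{verbatim}

The same location is also available afterwards from the parser's
\member{ErrorLineNumber} and \member{ErrorColumnNumber} attributes.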

\subsection{Example \label{expat-example}}

The following program defines three handlers that just print out their
arguments.

\begin{verbatim}
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", 1)
\end{verbatim}

The output from this program is:

\begin{verbatim}
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
\end{verbatim}


\subsection{Content Model Descriptions \label{expat-content-models}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

Content models are described using nested tuples. Each tuple
contains four values: the type, the quantifier, the name, and a tuple
of children. Children are simply additional content model
descriptions.

The values of the first two fields are constants defined in the
\code{model} object of the \module{xml.parsers.expat} module. These
constants can be collected in two groups: the model type group and the
quantifier group.

The constants in the model type group are:

\begin{datadescni}{XML_CTYPE_ANY}
The element named by the model name was declared to have a content
model of \code{ANY}.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_CHOICE}
The named element allows a choice from a number of options; this is
used for content models such as \code{(A | B | C)}.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_EMPTY}
Elements which are declared to be \code{EMPTY} have this model type.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_MIXED}
\end{datadescni}

\begin{datadescni}{XML_CTYPE_NAME}
\end{datadescni}

\begin{datadescni}{XML_CTYPE_SEQ}
Models which represent a series of models which follow one after the
other are indicated with this model type. This is used for models
such as \code{(A, B, C)}.
\end{datadescni}


The constants in the quantifier group are:

\begin{datadescni}{XML_CQUANT_NONE}
No modifier is given, so it can appear exactly once, as for \code{A}.
\end{datadescni}

\begin{datadescni}{XML_CQUANT_OPT}
The model is optional: it can appear once or not at all, as for
\code{A?}.
\end{datadescni}

\begin{datadescni}{XML_CQUANT_PLUS}
The model must occur one or more times (like \code{A+}).
\end{datadescni}

\begin{datadescni}{XML_CQUANT_REP}
The model must occur zero or more times, as for \code{A*}.
\end{datadescni}

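The nested tuples are what an \member{ElementDeclHandler} receives as
its second argument. The following sketch collects the models declared
in a small internal DTD subset and inspects one of them; the DTD and
the names \code{declared} and \code{element_decl} are invented for the
example.

\begin{verbatim}
import xml.parsers.expat

model = xml.parsers.expat.model
declared = {}

def element_decl(name, content):
    # content is the nested (type, quantifier, name, children) tuple.
    declared[name] = content

p = xml.parsers.expat.ParserCreate()
p.ElementDeclHandler = element_decl
p.Parse('<!DOCTYPE doc [\n'
        '<!ELEMENT doc (head, item*)>\n'
        '<!ELEMENT head EMPTY>\n'
        '<!ELEMENT item ANY>\n'
        ']>\n'
        '<doc><head/></doc>', 1)

kind, quantifier, name, children = declared['doc']
print(kind == model.XML_CTYPE_SEQ)
for child in children:
    # Each child is itself a four-element model description.
    print('%s repeated: %s' % (child[2], child[1] == model.XML_CQUANT_REP))
\end{verbatim}

Comparing against the named constants, rather than their numeric
values, keeps such code readable.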
605 |
|
---|
606 | \subsection{Expat error constants \label{expat-errors}}
|
---|
607 |
|
---|
608 | The following constants are provided in the \code{errors} object of
|
---|
609 | the \refmodule{xml.parsers.expat} module. These constants are useful
|
---|
610 | in interpreting some of the attributes of the \exception{ExpatError}
|
---|
611 | exception objects raised when an error has occurred.
|
---|
612 |
|
---|
613 | The \code{errors} object has the following attributes:
|
---|
614 |
|
---|
615 | \begin{datadescni}{XML_ERROR_ASYNC_ENTITY}
|
---|
616 | \end{datadescni}
|
---|
617 |
|
---|
618 | \begin{datadescni}{XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
|
---|
619 | An entity reference in an attribute value referred to an external
|
---|
620 | entity instead of an internal entity.
|
---|
621 | \end{datadescni}
|
---|
622 |
|
---|
623 | \begin{datadescni}{XML_ERROR_BAD_CHAR_REF}
|
---|
624 | A character reference referred to a character which is illegal in XML
|
---|
625 | (for example, character \code{0}, or `\code{\&\#0;}').
|
---|
626 | \end{datadescni}
|
---|
627 |
|
---|
628 | \begin{datadescni}{XML_ERROR_BINARY_ENTITY_REF}
|
---|
629 | An entity reference referred to an entity which was declared with a
|
---|
630 | notation, so cannot be parsed.
|
---|
631 | \end{datadescni}
|
---|
632 |
|
---|
633 | \begin{datadescni}{XML_ERROR_DUPLICATE_ATTRIBUTE}
|
---|
634 | An attribute was used more than once in a start tag.
|
---|
635 | \end{datadescni}
|
---|
636 |
|
---|
637 | \begin{datadescni}{XML_ERROR_INCORRECT_ENCODING}
|
---|
638 | \end{datadescni}
|
---|
639 |
|
---|
640 | \begin{datadescni}{XML_ERROR_INVALID_TOKEN}
|
---|
641 | Raised when an input byte could not properly be assigned to a
|
---|
642 | character; for example, a NUL byte (value \code{0}) in a UTF-8 input
|
---|
643 | stream.
|
---|
644 | \end{datadescni}
|
---|
645 |
|
---|
646 | \begin{datadescni}{XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
|
---|
647 | Something other than whitespace occurred after the document element.
|
---|
648 | \end{datadescni}
|
---|
649 |
|
---|
650 | \begin{datadescni}{XML_ERROR_MISPLACED_XML_PI}
|
---|
651 | An XML declaration was found somewhere other than the start of the
|
---|
652 | input data.
|
---|
653 | \end{datadescni}
|
---|
654 |
|
---|
655 | \begin{datadescni}{XML_ERROR_NO_ELEMENTS}
|
---|
656 | The document contains no elements (XML requires all documents to
|
---|
657 | contain exactly one top-level element).
|
---|
658 | \end{datadescni}
|
---|
659 |
|
---|
660 | \begin{datadescni}{XML_ERROR_NO_MEMORY}
|
---|
661 | Expat was not able to allocate memory internally.
|
---|
662 | \end{datadescni}
|
---|
663 |
|
---|
664 | \begin{datadescni}{XML_ERROR_PARAM_ENTITY_REF}
|
---|
665 | A parameter entity reference was found where it was not allowed.
|
---|
666 | \end{datadescni}
|
---|
667 |
|
---|
668 | \begin{datadescni}{XML_ERROR_PARTIAL_CHAR}
|
---|
669 | An incomplete character was found in the input.
|
---|
670 | \end{datadescni}
|
---|
671 |
|
---|
672 | \begin{datadescni}{XML_ERROR_RECURSIVE_ENTITY_REF}
|
---|
673 | An entity reference contained another reference to the same entity,
|
---|
674 | possibly via a different name, and possibly indirectly.
|
---|
675 | \end{datadescni}
|
---|
676 |
|
---|
677 | \begin{datadescni}{XML_ERROR_SYNTAX}
|
---|
678 | Some unspecified syntax error was encountered.
|
---|
679 | \end{datadescni}
|
---|
680 |
|
---|
681 | \begin{datadescni}{XML_ERROR_TAG_MISMATCH}
|
---|
682 | An end tag did not match the innermost open start tag.
|
---|
683 | \end{datadescni}
|
---|
684 |
|
---|
685 | \begin{datadescni}{XML_ERROR_UNCLOSED_TOKEN}
|
---|
686 | Some token (such as a start tag) was not closed before the end of the
|
---|
687 | stream or the next token was encountered.
|
---|
688 | \end{datadescni}
|
---|
689 |
|
---|
690 | \begin{datadescni}{XML_ERROR_UNDEFINED_ENTITY}
|
---|
691 | A reference was made to an entity which was not defined.
|
---|
692 | \end{datadescni}
|
---|
693 |
|
---|
694 | \begin{datadescni}{XML_ERROR_UNKNOWN_ENCODING}
|
---|
695 | The document encoding is not supported by Expat.
|
---|
696 | \end{datadescni}
|
---|
697 |
|
---|
698 | \begin{datadescni}{XML_ERROR_UNCLOSED_CDATA_SECTION}
|
---|
699 | A CDATA marked section was not closed.
|
---|
700 | \end{datadescni}
|
---|
701 |
|
---|
702 | \begin{datadescni}{XML_ERROR_EXTERNAL_ENTITY_HANDLING}
|
---|
703 | \end{datadescni}
|
---|
704 |
|
---|
705 | \begin{datadescni}{XML_ERROR_NOT_STANDALONE}
|
---|
706 | The parser determined that the document was not ``standalone'' though
|
---|
707 | it declared itself to be in the XML declaration, and the
|
---|
708 | \member{NotStandaloneHandler} was set and returned \code{0}.
|
---|
709 | \end{datadescni}
|
---|
710 |
|
---|
711 | \begin{datadescni}{XML_ERROR_UNEXPECTED_STATE}
|
---|
712 | \end{datadescni}
|
---|
713 |
|
---|
714 | \begin{datadescni}{XML_ERROR_ENTITY_DECLARED_IN_PE}
|
---|
715 | \end{datadescni}
|
---|
716 |
|
---|
717 | \begin{datadescni}{XML_ERROR_FEATURE_REQUIRES_XML_DTD}
|
---|
718 | An operation was requested that requires DTD support to be compiled
|
---|
719 | in, but Expat was configured without DTD support. This should never
|
---|
720 | be reported by a standard build of the \module{xml.parsers.expat}
|
---|
721 | module.
|
---|
722 | \end{datadescni}
|
---|
723 |
|
---|
724 | \begin{datadescni}{XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING}
|
---|
725 | A behavioral change was requested after parsing started, but it can
|
---|
726 | only be made before parsing starts. This is (currently) only
|
---|
727 | raised by \method{UseForeignDTD()}.
|
---|
728 | \end{datadescni}
|
---|
729 |
|
---|
730 | \begin{datadescni}{XML_ERROR_UNBOUND_PREFIX}
|
---|
731 | An undeclared prefix was found when namespace processing was enabled.
|
---|
732 | \end{datadescni}
|
---|
733 |
|
---|
734 | \begin{datadescni}{XML_ERROR_UNDECLARING_PREFIX}
|
---|
735 | The document attempted to remove the namespace declaration associated
|
---|
736 | with a prefix.
|
---|
737 | \end{datadescni}
|
---|
738 |
|
---|
739 | \begin{datadescni}{XML_ERROR_INCOMPLETE_PE}
|
---|
740 | A parameter entity contained incomplete markup.
|
---|
741 | \end{datadescni}
|
---|
742 |
|
---|
743 | \begin{datadescni}{XML_ERROR_XML_DECL}
|
---|
744 | The XML declaration was not well-formed.
|
---|
745 | \end{datadescni}
|
---|
746 |
|
---|
747 | \begin{datadescni}{XML_ERROR_TEXT_DECL}
|
---|
748 | There was an error parsing a text declaration in an external entity.
|
---|
749 | \end{datadescni}
|
---|
750 |
|
---|
751 | \begin{datadescni}{XML_ERROR_PUBLICID}
|
---|
752 | Characters were found in the public id that are not allowed.
|
---|
753 | \end{datadescni}
|
---|
754 |
|
---|
755 | \begin{datadescni}{XML_ERROR_SUSPENDED}
|
---|
756 | The requested operation is not allowed while the parser is
|
---|
757 | suspended. This includes attempts to provide additional input or to
|
---|
758 | stop the parser.
|
---|
759 | \end{datadescni}
|
---|
760 |
|
---|
761 | \begin{datadescni}{XML_ERROR_NOT_SUSPENDED}
|
---|
762 | An attempt to resume the parser was made when the parser had not been
|
---|
763 | suspended.
|
---|
764 | \end{datadescni}
|
---|
765 |
|
---|
766 | \begin{datadescni}{XML_ERROR_ABORTED}
|
---|
767 | This should not be reported to Python applications.
|
---|
768 | \end{datadescni}
|
---|
769 |
|
---|
770 | \begin{datadescni}{XML_ERROR_FINISHED}
|
---|
771 | The requested operation is not allowed on a parser which has finished
|
---|
772 | parsing input. This includes attempts to provide
|
---|
773 | additional input or to stop the parser.
|
---|
774 | \end{datadescni}
|
---|
775 |
|
---|
776 | \begin{datadescni}{XML_ERROR_SUSPEND_PE}
|
---|
777 | \end{datadescni}
|
---|
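To illustrate a few of the conditions listed above, the following sketch
feeds some malformed inputs to fresh parsers and reports the name of the
matching constant. It assumes that the constants hold Expat's message
strings (as in current CPython builds) and that those strings are
distinct; \code{NAMES} and \code{error_name} are hypothetical helpers,
not part of the module.

```python
import xml.parsers.expat as expat

# Reverse lookup from message string to constant name (illustrative).
NAMES = {getattr(expat.errors, name): name
         for name in dir(expat.errors) if name.startswith("XML_ERROR_")}

def error_name(*documents):
    """Parse each string as final input; name the first error, if any."""
    parser = expat.ParserCreate()
    try:
        for document in documents:
            parser.Parse(document, True)
    except expat.ExpatError as err:
        return NAMES.get(expat.ErrorString(err.code))
    return None

results = {
    "empty input": error_name(""),
    "trailing text": error_name("<a/>junk"),
    "repeated attribute": error_name("<a x='1' x='2'/>"),
    "input after final parse": error_name("<a/>", "<b/>"),
}
for label, name in results.items():
    print(label, "->", name)
```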