:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.


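A minimal Python 3 sketch of this arrangement, assuming CPython's default driver module ``xml.sax.expatreader``: :func:`xml.sax.make_parser` and the driver's own ``create_parser`` function return equivalent reader objects.

```python
import xml.sax
import xml.sax.expatreader
from xml.sax.xmlreader import XMLReader

# make_parser() imports a driver module (CPython's default is
# xml.sax.expatreader) and calls its create_parser() function.
parser = xml.sax.make_parser()

# The same factory can be called directly on the driver module.
direct = xml.sax.expatreader.create_parser()

# Both objects implement the XMLReader interface documented here.
both_readers = isinstance(parser, XMLReader) and isinstance(direct, XMLReader)
```
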
.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but to
   feed chunks of the document as they become available. Note that the reader will
   normally not read the entire file at once either, but in chunks; still,
   :meth:`parse` won't return until the entire document is processed. So these
   interfaces should be used if the blocking behaviour of :meth:`parse` is not
   desirable.

   When the parser is instantiated it is ready to begin accepting data from the
   :meth:`feed` method immediately. After parsing has been finished with a call to
   :meth:`close`, the :meth:`reset` method must be called to make the parser ready
   to accept new data, either from :meth:`feed` or using the :meth:`parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface as a
   convenience to SAX 2.0 driver writers.


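The feed/close/reset cycle can be sketched as follows (Python 3). It assumes the reader returned by :func:`xml.sax.make_parser` is an :class:`IncrementalParser`, which holds for CPython's expat-based default driver; the ``ElementCollector`` handler and the chunk boundaries are made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCollector(ContentHandler):
    """Hypothetical handler that records the name of every start tag."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


handler = ElementCollector()
parser = xml.sax.make_parser()  # expat-based IncrementalParser in CPython
parser.setContentHandler(handler)

# Feed the document in arbitrary pieces, e.g. as they arrive from a socket.
for chunk in [b"<doc><it", b"em/><item", b"/></doc>"]:
    parser.feed(chunk)
parser.close()  # end of input: final well-formedness checks

# reset() prepares the same parser for another document.
parser.reset()
parser.feed(b"<other/>")
parser.close()

# handler.names is now ['doc', 'item', 'item', 'other']
```

The chunks deliberately split tags in the middle; the parser buffers incomplete input, so the handler sees the same events it would get from a single :meth:`parse` call. Depending on the expat version, some events may only be delivered once :meth:`close` is called, which is why the results are inspected afterwards.
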
.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator object
   will return valid results only during calls to
   :class:`~xml.sax.handler.ContentHandler` methods; at any other time, the results
   are unpredictable. If information is not available, methods may return ``None``.


.. class:: InputSource([systemId])

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity() <xml.sax.handler.EntityResolver.resolveEntity>`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.


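Readers normally create these objects themselves, but building one by hand is a quick way to see the interface (for instance when unit-testing a content handler). A minimal Python 3 sketch with made-up attribute names and values:

```python
from xml.sax.xmlreader import AttributesImpl

# Readers build this from a plain mapping of attribute name -> value.
attrs = AttributesImpl({"id": "a1", "lang": "en"})

# Dictionary-style access.
first = attrs["id"]                        # 'a1'
fallback = attrs.get("missing", "none")    # 'none'
names = sorted(attrs.keys())               # ['id', 'lang']

# Methods from the Attributes interface.
count = attrs.getLength()                  # 2
lang = attrs.getValue("lang")              # 'en'
kind = attrs.getType("id")                 # 'CDATA' (this class always reports CDATA)
```
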
.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*: *attrs* maps these pairs to attribute values, and *qnames* maps
   the same pairs to qualified names. In addition, it provides a number of methods
   expecting qualified names as they appear in the original document. This class
   implements the :class:`AttributesNS` interface (see section
   :ref:`attributes-ns-objects`).


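A corresponding sketch for :class:`AttributesNSImpl` (Python 3), assuming, as the CPython implementation does, that *attrs* and *qnames* are both keyed by ``(namespaceURI, localname)`` pairs, with ``None`` as the namespace of unqualified attributes. The XLink namespace and the attribute values are only examples.

```python
from xml.sax.xmlreader import AttributesNSImpl

XLINK = "http://www.w3.org/1999/xlink"

# Keys are (namespaceURI, localname) pairs; None marks "no namespace".
values = {(XLINK, "href"): "#intro", (None, "class"): "note"}
qnames = {(XLINK, "href"): "xlink:href", (None, "class"): "class"}
attrs = AttributesNSImpl(values, qnames)

href = attrs.getValue((XLINK, "href"))               # '#intro'
href_by_qname = attrs.getValueByQName("xlink:href")  # '#intro'
pair = attrs.getNameByQName("xlink:href")            # (XLINK, 'href')
qname = attrs.getQNameByName((None, "class"))        # 'class'
all_qnames = sorted(attrs.getQNames())               # ['class', 'xlink:href']
```
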
.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser object
   can be discarded or reset. As a limitation, the current implementation only
   accepts byte streams; processing of character streams is for further study.


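A minimal Python 3 sketch of two of the accepted *source* forms, a byte-oriented file-like object and an :class:`InputSource` wrapping one, both handed to the same parser. ``TextCollector`` and the system identifier ``second.xml`` are hypothetical; no file of that name is opened because a byte stream is supplied.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates all character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


collector = TextCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(collector)

# 1. A file-like object that yields bytes.
parser.parse(io.BytesIO(b"<greeting>hello</greeting>"))

# 2. An InputSource wrapping a byte stream. The same parser is reused:
#    the previous parse() call has returned, so that input is fully processed.
source = InputSource("second.xml")  # hypothetical system identifier
source.setByteStream(io.BytesIO(b"<greeting> world</greeting>"))
parser.parse(source)

text = "".join(collector.chunks)  # 'hello world'
```
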
.. method:: XMLReader.getContentHandler()

   Return the current :class:`~xml.sax.handler.ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`~xml.sax.handler.ContentHandler`. If no
   :class:`~xml.sax.handler.ContentHandler` is set, content events will be
   discarded.


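A short Python 3 sketch of installing and querying a content handler. In CPython, the :class:`XMLReader` constructor installs a do-nothing :class:`~xml.sax.handler.ContentHandler`, which is how events end up discarded until the application sets its own; ``EndCounter`` is a hypothetical example handler.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EndCounter(ContentHandler):
    """Hypothetical handler that counts end-element events."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def endElement(self, name):
        self.count += 1


parser = xml.sax.make_parser()

# Before the application installs a handler, CPython's reader holds a
# base ContentHandler whose methods do nothing.
default_handler = parser.getContentHandler()

counter = EndCounter()
parser.setContentHandler(counter)
parser.feed(b"<a><b/><c/></a>")
parser.close()
```
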
.. method:: XMLReader.getDTDHandler()

   Return the current :class:`~xml.sax.handler.DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`~xml.sax.handler.DTDHandler`. If no
   :class:`~xml.sax.handler.DTDHandler` is set, DTD events will be discarded.


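A minimal Python 3 sketch of a DTD handler that records notation and unparsed-entity declarations from an internal DTD subset. It assumes CPython's expat-based driver, which reports these two declaration types to the :class:`~xml.sax.handler.DTDHandler`; ``DeclarationRecorder`` and the declarations themselves are made up.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc/>
"""


class DeclarationRecorder(DTDHandler):
    """Hypothetical handler that keeps the declarations it is told about."""

    def __init__(self):
        self.notations = []
        self.unparsed = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed.append((name, systemId, ndata))


recorder = DeclarationRecorder()
parser = xml.sax.make_parser()
parser.setDTDHandler(recorder)
parser.parse(io.BytesIO(DOCUMENT))
```
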
.. method:: XMLReader.getEntityResolver()

   Return the current :class:`~xml.sax.handler.EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`~xml.sax.handler.EntityResolver`. If no
   :class:`~xml.sax.handler.EntityResolver` is set, attempts to resolve an external
   entity will result in opening the system identifier for the entity, and fail if
   it is not available.


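The following Python 3 sketch resolves an external entity from memory instead of opening its system identifier. It assumes CPython's expat-based driver, which consults the resolver only when the :data:`~xml.sax.handler.feature_external_ges` feature is enabled (it is off by default in recent Python 3 releases), so the sketch turns it on explicitly. ``InMemoryResolver``, ``ElementCollector`` and the ``chapter.xml`` entity are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

DOCUMENT = (b'<!DOCTYPE doc [<!ENTITY chapter SYSTEM "chapter.xml">]>'
            b"<doc>&chapter;</doc>")

# Hypothetical in-memory store standing in for files or URLs.
ENTITIES = {"chapter.xml": b"<chapter/>"}


class InMemoryResolver(EntityResolver):
    """Returns an InputSource with a byte stream instead of opening systemId."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(ENTITIES[systemId]))
        return source


class ElementCollector(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


resolver = InMemoryResolver()
collector = ElementCollector()
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # let the parser expand &chapter;
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOCUMENT))
```
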
| 146 |
|
---|
| 147 | .. method:: XMLReader.getErrorHandler()
|
---|
| 148 |
|
---|
[391] | 149 | Return the current :class:`~xml.sax.handler.ErrorHandler`.
|
---|
[2] | 150 |
|
---|
| 151 |
|
---|
| 152 | .. method:: XMLReader.setErrorHandler(handler)
|
---|
| 153 |
|
---|
[391] | 154 | Set the current error handler. If no :class:`~xml.sax.handler.ErrorHandler`
|
---|
| 155 | is set, errors will be raised as exceptions, and warnings will be printed.
|
---|
[2] | 156 |
|
---|
| 157 |
|
---|
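A Python 3 sketch of an error handler that records fatal errors before re-raising them, mirroring the default behaviour described above. ``RecordingErrorHandler`` is a hypothetical name and the malformed input is made up.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Keeps every fatal error, then re-raises it as the default handler does."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        self.fatal.append(exception)
        raise exception


errors = RecordingErrorHandler()
parser = xml.sax.make_parser()
parser.setErrorHandler(errors)

failed_line = None
try:
    parser.feed(b"<a>\n<b></a>")  # mismatched end tag on line 2
    parser.close()
except SAXParseException as exc:
    failed_line = exc.getLineNumber()
```
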
.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings; if
   they cannot support the requested locale, however, they must raise a SAX
   exception. Applications may request a locale change in the middle of a parse.


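CPython's expat-based reader does not localize its messages. As a minimal Python 3 sketch, assuming that driver, a locale request is refused with :exc:`SAXNotSupportedException` (the locale string is only an example):

```python
import xml.sax
from xml.sax import SAXNotSupportedException

parser = xml.sax.make_parser()

try:
    parser.setLocale("fr_FR")  # example locale name
    locale_supported = True
except SAXNotSupportedException:
    # The expat driver inherits XMLReader.setLocale, which always refuses.
    locale_supported = False
```
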
.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not
   supported by the parser, :exc:`SAXNotSupportedException` is raised.


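A minimal Python 3 sketch of querying and setting features, assuming CPython's expat-based driver: it recognizes :data:`~xml.sax.handler.feature_namespaces`, recognizes but refuses to enable :data:`~xml.sax.handler.feature_validation`, and rejects names it does not know. The unknown feature URI is made up.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_namespaces, feature_validation

parser = xml.sax.make_parser()

# A well-known feature the expat driver understands: namespace processing.
parser.setFeature(feature_namespaces, True)
namespaces_on = parser.getFeature(feature_namespaces)

# Recognized, but this setting is not supported: expat does not validate.
validation_refused = False
try:
    parser.setFeature(feature_validation, True)
except SAXNotSupportedException:
    validation_refused = True

# A feature name the parser does not know at all (made up for this example).
unknown_refused = False
try:
    parser.getFeature("http://example.com/features/no-such-feature")
except SAXNotRecognizedException:
    unknown_refused = True
```
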
.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is not
   recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.


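A minimal Python 3 sketch using the well-known lexical-handler property. It assumes CPython's expat-based driver, which calls the handler's ``comment``, ``startCDATA``, ``endCDATA``, ``startDTD`` and ``endDTD`` methods and treats the xml-string property as read-only. ``CommentCollector`` is a hypothetical helper; Python 3.10 and later also provide a ready-made :class:`~xml.sax.handler.LexicalHandler` base class.

```python
import xml.sax
from xml.sax import SAXNotSupportedException
from xml.sax.handler import property_lexical_handler, property_xml_string


class CommentCollector(object):
    """Hypothetical lexical handler; all methods the driver may call are defined."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


parser = xml.sax.make_parser()
lexical = CommentCollector()
parser.setProperty(property_lexical_handler, lexical)
same_object = parser.getProperty(property_lexical_handler) is lexical

parser.feed(b"<doc><!-- first --><!-- second --></doc>")
parser.close()

# The xml-string property is recognized but cannot be set.
xml_string_refused = False
try:
    parser.setProperty(property_xml_string, "<doc/>")
except SAXNotSupportedException:
    xml_string_refused = True
```
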
.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling :meth:`reset`
   are undefined.


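The division of labour between :meth:`feed` and :meth:`close` can be sketched in Python 3 as follows, assuming the expat-based reader returned by :func:`xml.sax.make_parser` and its default error handler, which raises fatal errors. A document missing its closing tag is accepted chunk by chunk; the error only surfaces when :meth:`close` declares the input complete.

```python
import xml.sax
from xml.sax import SAXParseException

parser = xml.sax.make_parser()

# Nothing is wrong yet: the parser cannot know whether more data will follow.
parser.feed(b"<doc><item/>")

# close() declares the document finished, so the missing </doc> is now an error.
try:
    parser.close()
    incomplete_detected = False
except SAXParseException:
    incomplete_detected = True

# reset() makes the same parser usable for a fresh document.
parser.reset()
parser.feed(b"<doc><item/></doc>")
parser.close()
```
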
.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event ends.


.. method:: Locator.getLineNumber()

   Return the line number where the current event ends.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.


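A minimal Python 3 sketch of reading a :class:`Locator` from inside a content handler. It assumes CPython's expat-based driver, which passes the locator to :meth:`~xml.sax.handler.ContentHandler.setDocumentLocator` when :meth:`XMLReader.parse` is used and reports 1-based line numbers; ``PositionRecorder`` and the ``catalog.xml`` system identifier are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class PositionRecorder(ContentHandler):
    """Hypothetical handler noting where each element occurs."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the parser before the document events; keep it for later.
        self.locator = locator

    def startElement(self, name, attrs):
        # Locator results are only valid during a callback, so read them now.
        self.positions.append(
            (name, self.locator.getSystemId(), self.locator.getLineNumber()))


source = InputSource("catalog.xml")  # hypothetical system identifier
source.setByteStream(io.BytesIO(b"<catalog>\n  <book/>\n  <book/>\n</catalog>"))

recorder = PositionRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(recorder)
parser.parse(source)
```
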
.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Sets the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Returns the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Sets the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Returns the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Sets the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream specified,
   but it will use a byte stream in preference to opening a URI connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this byte
   stream, or ``None`` if unknown.


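As a minimal Python 3 sketch, assuming CPython's expat-based driver (which passes the :class:`InputSource` encoding on to the underlying expat parser), the following parses ISO-8859-1 bytes that carry no XML declaration. ``TextCollector`` and ``word.xml`` are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource

# ISO-8859-1 bytes with no XML declaration; 0xE9 is "é" in that encoding.
LATIN1_BYTES = b"<word>caf\xe9</word>"


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates all character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


source = InputSource("word.xml")  # hypothetical system identifier
source.setByteStream(io.BytesIO(LATIN1_BYTES))
source.setEncoding("iso-8859-1")  # tell the parser how to decode the bytes

collector = TextCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(collector)
parser.parse(source)

word = "".join(collector.chunks)  # 'café'
```

Without the :meth:`setEncoding` call, the bytes would be decoded with the XML default of UTF-8, in which a lone ``0xE9`` byte is invalid, so the parse would fail.
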
.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a Python 1.6
   Unicode-wrapped file-like object that performs conversion to Unicode strings.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.


.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`~collections.Mapping.copy`,
:meth:`~collections.Mapping.get`, :meth:`~collections.Mapping.has_key`,
:meth:`~collections.Mapping.items`, :meth:`~collections.Mapping.keys`, and
:meth:`~collections.Mapping.values`. The following methods are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Returns the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.


.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`). All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.

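A Python 3 sketch of these methods on attributes delivered by a real parser rather than built by hand. It assumes CPython's expat-based driver with :data:`~xml.sax.handler.feature_namespaces` enabled, which passes an :class:`AttributesNSImpl` to :meth:`startElementNS`; ``AttributeInspector`` and the ``urn:example:x`` namespace are made up.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DOCUMENT = b'<root xmlns:x="urn:example:x" x:id="r1" plain="p"/>'


class AttributeInspector(ContentHandler):
    """Hypothetical handler capturing namespace-aware attribute details."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.details = None

    def startElementNS(self, name, qname, attrs):
        if self.details is None:  # only the root element is of interest
            self.details = {
                "qnames": sorted(attrs.getQNames()),
                "id_by_qname": attrs.getValueByQName("x:id"),
                "id_name": attrs.getNameByQName("x:id"),
                "plain_qname": attrs.getQNameByName((None, "plain")),
            }


inspector = AttributeInspector()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # deliver startElementNS events
parser.setContentHandler(inspector)
parser.parse(io.BytesIO(DOCUMENT))
```

The ``xmlns:x`` declaration itself does not appear among the attributes; with namespace processing enabled, expat reports it as a prefix mapping instead.
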