:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.

.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call the
   method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.

.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.

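The feature constants are meant to be passed to the ``setFeature()`` and
``getFeature()`` methods of an :class:`~xml.sax.xmlreader.XMLReader`. The
following is a minimal sketch, assuming the reader returned by
:func:`xml.sax.make_parser` (normally the expat reader) recognizes both
features used here; other readers may reject features they do not support.

```python
import xml.sax
from xml.sax.handler import (
    all_features,
    feature_external_ges,
    feature_namespaces,
)

parser = xml.sax.make_parser()

# Features are read/write only while the parser is not parsing,
# so they are configured before any call to parse().
parser.setFeature(feature_namespaces, True)
parser.setFeature(feature_external_ges, False)

# Read the settings back to confirm what the reader will do.
namespaces_on = parser.getFeature(feature_namespaces)
external_ges_on = parser.getFeature(feature_external_ges)

# The constants are plain strings collected in all_features.
known = feature_namespaces in all_features
```

Because the constants are ordinary URI strings, passing the literal URI would
work as well; the symbolic names mainly guard against typing mistakes.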
.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.

.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application. The following methods are called by the parser on the appropriate
events in the input document:

.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply
   a locator. A parser that does so must pass the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.

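As an illustration of the locator contract, here is a minimal sketch of a
handler that stores the locator and queries it only from inside an event
callback. ``PositionRecorder`` and its ``positions`` list are hypothetical
names chosen for this example; ``getLineNumber()`` is part of the
:class:`~xml.sax.xmlreader.Locator` interface.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler that notes the line reported for each element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Delivered before any other event, if the parser supplies a locator.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only meaningful while an event is being delivered.
        if self.locator is not None:
            self.positions.append((name, self.locator.getLineNumber()))


handler = PositionRecorder()
xml.sax.parseString(b"<root>\n  <child/>\n</root>", handler)
```

The guard against ``None`` reflects that supplying a locator is encouraged but
not required of a parser.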
.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in DTDHandler (except for :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.

.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled (it is
   disabled by default; see :data:`feature_namespaces`).

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   .. XXX This is not really the default, is it? MvL

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.

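The case of prefixes inside attribute values can be sketched as follows. The
handler keeps a per-prefix stack of URIs and uses it to expand a value such as
``type="t:int"``. The class name ``PrefixTracker``, the ``type`` attribute and
the ``urn:example:types`` URI are assumptions made for this illustration, and
the sketch assumes a reader that supports ``feature_namespaces``.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class PrefixTracker(ContentHandler):
    """Hypothetical handler that expands prefixes found in attribute values."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_scope = {}   # prefix -> stack of URIs currently in scope
        self.resolved = []   # (uri, local part) pairs found in "type" values

    def startPrefixMapping(self, prefix, uri):
        self.in_scope.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self.in_scope[prefix].pop()

    def startElementNS(self, name, qname, attrs):
        # Attributes without a namespace are keyed as (None, localname).
        value = attrs.get((None, "type"))
        if value and ":" in value:
            prefix, local = value.split(":", 1)
            uris = self.in_scope.get(prefix)
            if uris:
                self.resolved.append((uris[-1], local))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = PrefixTracker()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(
    b'<doc xmlns:t="urn:example:types"><item type="t:int"/></doc>'))
```

A stack per prefix is used because an inner element may rebind a prefix that
an outer element has already declared.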
.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes` interface (see
   :ref:`attributes-objects`) containing the attributes of the element. The
   object passed as *attrs* may be re-used by the parser; holding on to a
   reference to it is not a reliable way to keep a copy of the attributes. To
   keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.


.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.

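A minimal sketch of a handler for non-namespace mode follows. It tracks the
nesting depth with :meth:`startElement` and :meth:`endElement` and keeps the
attributes by calling :meth:`copy`, as recommended above. ``OutlineBuilder``
and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class OutlineBuilder(ContentHandler):
    """Hypothetical handler recording (depth, name, attributes) per element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.depth = 0
        self.outline = []

    def startElement(self, name, attrs):
        # copy() is used because the parser may re-use the attrs object.
        self.outline.append((self.depth, name, attrs.copy()))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1


handler = OutlineBuilder()
xml.sax.parseString(b'<list><item id="a"/><item id="b"/></list>', handler)

# Saved attribute copies can still be queried after parsing has finished.
ids = [saved.getValue("id") for depth, name, saved in handler.outline
       if name == "item"]
```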
.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`) containing the attributes of the element. If no
   namespace is associated with the element, the *uri* component of *name* will
   be ``None``. The object passed as *attrs* may be re-used by the parser;
   holding on to a reference to it is not a reliable way to keep a copy of the
   attributes. To keep a copy of the attributes, use the :meth:`copy` method of
   the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElementNS` method, likewise the *qname* parameter.

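The namespace-mode callbacks are only used once ``feature_namespaces`` has
been enabled on the reader. The sketch below, which assumes a reader that
supports that feature, records the ``(uri, localname)`` tuples it receives;
``NamespaceRecorder`` and the ``urn:example:a`` URI are made up for the
example. The *qname* argument is deliberately ignored because it may be
``None``.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Hypothetical handler collecting namespace-qualified element names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.started = []
        self.ended = []

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple; uri is None without a namespace.
        self.started.append(name)

    def endElementNS(self, name, qname):
        self.ended.append(name)


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = NamespaceRecorder()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b'<a:doc xmlns:a="urn:example:a"><plain/></a:doc>'))
```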
.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the locator provides useful
   information.

   *content* may be a Unicode string or a byte string; the ``expat`` reader module
   always produces Unicode strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.

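Because a parser is free to split contiguous text into several chunks, a
handler should not assume that one :meth:`characters` call delivers the whole
text of an element. A minimal sketch of the usual buffering pattern follows,
using a hypothetical ``TextCollector`` class; it only handles elements that
contain text and no child elements.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that joins character chunks per leaf element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self._chunks = []
        self.texts = {}

    def startElement(self, name, attrs):
        # Start a fresh buffer for every element.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts[name] = "".join(self._chunks)
        self._chunks = []


handler = TextCollector()
xml.sax.parseString(b"<msg><to>Ann &amp; Bob</to></msg>", handler)
```

Joining the chunks in :meth:`endElement` gives the same result regardless of
how the parser chose to split the text around the entity reference.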
.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10); non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the locator provides useful
   information.

.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The parser will invoke this method once for each processing instruction found.
   Note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.

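The following sketch collects processing instructions that appear before and
after the document element. ``PICollector`` and the sample document are
hypothetical; note that the XML declaration at the start of the input is not
expected to show up among the collected targets.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Hypothetical handler that records processing instructions."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


document = (b'<?xml version="1.0"?>'
            b'<?xml-stylesheet href="style.css"?>'
            b'<root/>'
            b'<?done?>')

handler = PICollector()
xml.sax.parseString(document, handler)
targets = [target for target, data in handler.instructions]
```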
.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and the ``feature_external_pes`` features.

.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.

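A minimal sketch of a :class:`DTDHandler` subclass is shown below. It assumes
the expat-based reader returned by :func:`xml.sax.make_parser`, which reports
declarations found in the internal DTD subset; ``DeclarationRecorder`` and the
sample declarations are invented for the example.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler


class DeclarationRecorder(DTDHandler):
    """Hypothetical handler that records notation and unparsed entity events."""

    def __init__(self):
        self.notations = []
        self.unparsed = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed.append((name, publicId, systemId, ndata))


document = b"""<!DOCTYPE doc [
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<doc/>"""

parser = xml.sax.make_parser()
handler = DeclarationRecorder()
parser.setDTDHandler(handler)
parser.parse(io.BytesIO(document))
```

The unparsed entity is only declared here, never fetched, so the example does
not touch the file system or the network.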
.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an InputSource to read from. The default
   implementation returns *systemId*.

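As a sketch of the second return option, the resolver below hands back an
:class:`~xml.sax.xmlreader.InputSource` backed by an in-memory byte stream, so
the parser never opens the system identifier itself. It assumes the expat
reader, and it enables ``feature_external_ges`` explicitly because a reader
may leave external general entities disabled by default. ``CannedResolver``,
``NameRecorder`` and the ``ext.xml`` identifier are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    EntityResolver,
    feature_external_ges,
)
from xml.sax.xmlreader import InputSource


class CannedResolver(EntityResolver):
    """Hypothetical resolver that serves fixed content for every entity."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append(systemId)
        source = InputSource(systemId)
        # Supplying a byte stream keeps the parser from opening systemId.
        source.setByteStream(io.BytesIO(b"<included/>"))
        return source


class NameRecorder(ContentHandler):
    """Hypothetical handler recording element names in document order."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


document = (b'<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]>'
            b'<doc>&ext;</doc>')

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)
resolver = CannedResolver()
recorder = NameRecorder()
parser.setEntityResolver(resolver)
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(document))
```

Resolving entities from untrusted input can be a security risk; a resolver
like this one is one way to keep full control over what gets included.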
.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`. If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser will call the methods in
your object to report all warnings and errors. There are three levels of
errors available: warnings, (possibly) recoverable errors, and unrecoverable
errors. All methods take a :exc:`SAXParseException` as the only parameter.
Errors and warnings may be converted to an exception by raising the passed-in
exception object.

.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.
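The three severity levels can be handled differently by one object. The
following is a minimal sketch, with the hypothetical class
``CollectingErrorHandler``, that stores warnings and recoverable errors for
later inspection and converts fatal errors into exceptions by raising the
passed-in object.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Hypothetical handler: collect minor problems, raise on fatal ones."""

    def __init__(self):
        self.warnings = []
        self.errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        # Not raising here lets the parser try to continue.
        self.errors.append(exception)

    def fatalError(self, exception):
        # Raising the passed-in SAXParseException ends the parse.
        raise exception


problems = CollectingErrorHandler()
failure = None
try:
    # The document is not well-formed: <unclosed> is never closed.
    xml.sax.parseString(b"<root><unclosed></root>", ContentHandler(), problems)
except xml.sax.SAXParseException as exc:
    failure = exc
```

The caught exception offers ``getLineNumber()`` and ``getMessage()``, which an
application can use to build its own diagnostic.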