.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>


Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data. If you need to parse untrusted or unauthenticated data, see
   :ref:`xml-vulnerabilities`.

It is important to note that modules in the :mod:`xml` package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the :mod:`xml.parsers.expat` module will always be
available.
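
Because Expat ships with Python, its binding can also be used on its own. The
following minimal sketch registers a start-element handler with
:mod:`xml.parsers.expat` and records element names in document order;
``element_names`` is a hypothetical helper, not part of the module:

.. code-block:: python

   import xml.parsers.expat


   def element_names(data):
       """Return the names of all elements in *data*, in document order."""
       names = []
       parser = xml.parsers.expat.ParserCreate()
       # Expat calls this handler once per start tag with the tag name
       # and a dict of attributes.
       parser.StartElementHandler = lambda name, attrs: names.append(name)
       parser.Parse(data, True)  # True marks this as the final chunk
       return names


   print(element_names("<root><child/><child/></root>"))
   # ['root', 'child', 'child']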

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a lightweight DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding
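
The submodules expose the same document through different APIs. As a minimal
sketch, using a made-up and trusted sample document, the following reads the
same attribute values once with :mod:`xml.etree.ElementTree` and once with
:mod:`xml.dom.minidom`:

.. code-block:: python

   import xml.etree.ElementTree as ET
   from xml.dom import minidom

   DOC = "<library><book title='Dune'/><book title='Emma'/></library>"

   # ElementTree: elements act like lists of children with an attrib dict.
   root = ET.fromstring(DOC)
   etree_titles = [book.get("title") for book in root.iter("book")]

   # minidom: a W3C DOM tree navigated with DOM methods.
   dom = minidom.parseString(DOC)
   dom_titles = [node.getAttribute("title")
                 for node in dom.getElementsByTagName("book")]

   print(etree_titles)  # ['Dune', 'Emma']
   print(dom_titles)    # ['Dune', 'Emma']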


.. _xml-vulnerabilities:

XML vulnerabilities
===================

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse these vulnerabilities to, for example, mount denial of
service attacks, access local files, generate network connections to other
machines, or circumvent firewalls. The attacks on XML abuse lesser-known
features like inline `DTD`_ (document type definition) with entities.

The following table gives an overview of the known attacks and whether the
various modules are vulnerable to them.

=========================  ========  =========  =========  ========  =========
kind                       sax       etree      minidom    pulldom   xmlrpc
=========================  ========  =========  =========  ========  =========
billion laughs             **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
quadratic blowup           **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
external entity expansion  **Yes**   No    (1)  No    (2)  **Yes**   No    (3)
DTD retrieval              **Yes**   No         No         **Yes**   No
decompression bomb         No        No         No         No        **Yes**
=========================  ========  =========  =========  ========  =========

1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`~xml.etree.ElementTree.ParseError` when it encounters an external
   entity reference.
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
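
Footnote 1 can be checked directly. The minimal sketch below declares an
external entity that points at a local file and shows that
:mod:`xml.etree.ElementTree` refuses to expand it; the entity name and the file
path are placeholders chosen for illustration:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # The SYSTEM identifier names a local file; a vulnerable parser would
   # read that file and splice its contents into the document.
   DOC = """<!DOCTYPE doc [
     <!ENTITY ext SYSTEM "file:///etc/passwd">
   ]>
   <doc>&ext;</doc>"""

   try:
       ET.fromstring(DOC)
       outcome = "expanded"
   except ET.ParseError as exc:
       # ElementTree raises ParseError instead of reading the file.
       outcome = "rejected: " + str(exc)

   print(outcome)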


billion laughs / exponential entity expansion
   The `Billion Laughs`_ attack, also known as exponential entity expansion,
   uses multiple levels of nested entities. Each entity refers to another entity
   several times, and the final entity definition contains a small string.
   Because every level multiplies the number of copies, the small string is
   eventually expanded to several gigabytes. The exponential expansion
   consumes lots of CPU time, too.
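
   The exponential growth can be observed safely with only a few levels. The
   minimal sketch below builds such documents with a hypothetical helper,
   ``nested_entity_payload``; each extra level multiplies the expanded text by
   the fan-out (10 here), while the input grows by only one short declaration:

   .. code-block:: python

      import xml.etree.ElementTree as ET


      def nested_entity_payload(levels, fanout=10):
          """Build a small 'billion laughs' style document (hypothetical helper).

          lol0 is the string "lol"; every higher entity references the one
          below it *fanout* times, so the root text expands to
          3 * fanout**levels characters.
          """
          decls = ['<!ENTITY lol0 "lol">']
          for i in range(1, levels + 1):
              refs = f"&lol{i - 1};" * fanout
              decls.append(f'<!ENTITY lol{i} "{refs}">')
          return ("<!DOCTYPE lolz [" + "".join(decls) + "]>"
                  + f"<lolz>&lol{levels};</lolz>")


      # Keep the depth tiny: the expanded size grows tenfold per level.
      for levels in range(1, 5):
          doc = nested_entity_payload(levels)
          expanded = ET.fromstring(doc).text
          print(levels, len(doc), len(expanded))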

quadratic blowup entity expansion
   A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
   entity expansion, too. Instead of nested entities it repeats one large entity
   with a couple of thousand characters over and over again. The attack isn't as
   efficient as the exponential case, but it avoids triggering parser
   countermeasures against heavily nested entities.
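
   The size relationship can be checked with a small, harmless document. In
   the minimal sketch below (``repeated_entity_payload`` is a hypothetical
   helper), the input holds one entity of *entity_size* characters plus
   *repeats* short references, so its length grows roughly like
   ``entity_size + 3 * repeats``, while the parsed text has
   ``entity_size * repeats`` characters. Doubling both parameters therefore
   roughly doubles the input but quadruples the output:

   .. code-block:: python

      import xml.etree.ElementTree as ET


      def repeated_entity_payload(entity_size, repeats):
          """One flat entity, referenced *repeats* times (hypothetical helper)."""
          big = "A" * entity_size
          refs = "&a;" * repeats
          return f'<!DOCTYPE doc [<!ENTITY a "{big}">]><doc>{refs}</doc>'


      doc = repeated_entity_payload(entity_size=1000, repeats=100)
      expanded = ET.fromstring(doc).text
      # No nesting is involved, yet the text holds 100 full copies of the entity.
      print(len(doc), len(expanded))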

external entity expansion
   Entity declarations can contain more than just text for replacement. They can
   also point to external resources by public identifiers or system identifiers.
   System identifiers are standard URIs or can refer to local files. A vulnerable
   XML parser retrieves the resource, for example with HTTP or FTP requests, and
   embeds the content into the XML document.
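
   A defensive sketch for :mod:`xml.sax`: the parser below explicitly disables
   external general entities through the standard ``feature_external_ges``
   feature, so the reference to a local file is skipped instead of being read.
   Setting the feature explicitly keeps the behaviour independent of the
   default in the Python version in use. ``TextCollector`` and
   ``parse_without_external_entities`` are hypothetical names chosen for this
   example:

   .. code-block:: python

      import io
      import os
      import tempfile
      import xml.sax
      from xml.sax.handler import ContentHandler, feature_external_ges


      class TextCollector(ContentHandler):
          """Hypothetical handler that records all character data."""

          def __init__(self):
              super().__init__()
              self.chunks = []

          def characters(self, content):
              self.chunks.append(content)


      def parse_without_external_entities(data):
          """Parse *data* (bytes) with external general entities turned off."""
          parser = xml.sax.make_parser()
          # Do not fetch or include external general entities.
          parser.setFeature(feature_external_ges, False)
          handler = TextCollector()
          parser.setContentHandler(handler)
          parser.parse(io.BytesIO(data))
          return "".join(handler.chunks)


      with tempfile.TemporaryDirectory() as tmp:
          # Stand-in for a sensitive local file an attacker might target.
          secret_path = os.path.join(tmp, "secret.txt")
          with open(secret_path, "w") as f:
              f.write("TOP-SECRET")
          doc = (f'<!DOCTYPE doc [<!ENTITY ext SYSTEM "{secret_path}">]>'
                 '<doc>before &ext; after</doc>').encode("utf-8")
          text = parse_without_external_entities(doc)

      print(repr(text))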

DTD retrieval
   Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
   definitions from remote or local locations. The feature has implications
   similar to those of the external entity expansion issue.
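
   A hedged sketch of a countermeasure for :mod:`xml.dom.pulldom`:
   :func:`xml.dom.pulldom.parseString` accepts a SAX parser as its second
   argument, so one configured with ``feature_external_ges`` switched off can
   be passed in. This assumes the default Expat-based SAX reader, which
   consults that feature before fetching an external DTD subset; the DTD URL is
   a placeholder in the reserved ``.invalid`` domain:

   .. code-block:: python

      import xml.sax
      from xml.dom import pulldom
      from xml.sax.handler import feature_external_ges

      # The DOCTYPE points at a remote DTD; a parser that retrieves DTDs
      # would try to fetch this (deliberately unresolvable) URL.
      DOC = ('<!DOCTYPE doc SYSTEM "http://example.invalid/doc.dtd">'
             '<doc><item/><item/></doc>')

      parser = xml.sax.make_parser()
      parser.setFeature(feature_external_ges, False)  # do not load the DTD

      events = pulldom.parseString(DOC, parser)
      tags = [node.tagName for event, node in events
              if event == pulldom.START_ELEMENT]
      print(tags)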

decompression bomb
   The issue of decompression bombs (also known as `ZIP bomb`_) applies to all
   XML libraries that can parse compressed XML streams such as gzipped HTTP
   streams or LZMA-compressed files. For an attacker, compression can reduce the
   amount of data that has to be transmitted by three orders of magnitude or
   more.
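
   When application code decompresses XML before handing it to a parser, the
   decompressed size can be capped at that point. A minimal sketch, assuming
   gzip input and a hypothetical ``bounded_gunzip`` helper built on
   :func:`zlib.decompressobj`, rejects oversized output without ever
   materialising all of it:

   .. code-block:: python

      import gzip
      import zlib
      import xml.etree.ElementTree as ET


      def bounded_gunzip(data, limit):
          """Decompress gzip *data*, refusing to produce more than *limit* bytes."""
          # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
          decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
          # Ask for at most limit + 1 bytes: receiving that many proves the
          # limit would be exceeded, without inflating the rest of the stream.
          out = decomp.decompress(data, limit + 1)
          if len(out) > limit:
              raise ValueError("decompressed data exceeds %d bytes" % limit)
          if not decomp.eof:
              raise ValueError("incomplete gzip stream")
          return out


      # About 10 MB of whitespace compresses to a few kilobytes.
      bomb = gzip.compress(b"<doc>" + b" " * 10_000_000 + b"</doc>")
      print(len(bomb))

      try:
          bounded_gunzip(bomb, limit=1_000_000)
      except ValueError as exc:
          print("rejected:", exc)

      small = bounded_gunzip(gzip.compress(b"<doc>ok</doc>"), limit=1_000_000)
      print(ET.fromstring(small).text)  # ok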

The documentation of `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.

Defused packages
----------------

These external packages are recommended for any code that parses
untrusted XML data.

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.

`defusedexpat`_ provides a modified libexpat and a patched replacement
:mod:`pyexpat` extension module with countermeasures against entity expansion
DoS attacks. Defusedexpat still allows a sane and configurable number of entity
expansions. The modifications will be merged into future releases of Python.

The workarounds and modifications are not included in patch releases as they
break backward compatibility. After all, inline DTD and entity expansion are
well-defined XML features.


.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
.. _Billion Laughs: http://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: http://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: http://en.wikipedia.org/wiki/Document_Type_Definition