20.4. XML Processing Modules

Source code: Lib/xml/


Python's interfaces for processing XML are grouped in the xml package.

Warning

The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data, see the XML vulnerabilities and The defusedxml and defusedexpat Packages sections.

It is important to note that the modules in the xml package require at least one SAX-compliant XML parser to be available. The Expat parser is included with Python, so the xml.parsers.expat module is always available.
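As a minimal sketch of using the bundled Expat parser directly, the following collects element names through a start-tag handler. The helper name element_names and the sample document are illustrative assumptions, and the input is a trusted, hard-coded string.

```python
import xml.parsers.expat


def element_names(xml_text):
    """Return the names of all start tags in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls this handler once for every opening tag.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


print(element_names("<root><a/><b/></root>"))
```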

The documentation for the xml.dom and xml.sax packages defines the Python bindings for the DOM and SAX interfaces.
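To illustrate the difference between the two bindings, here is a small sketch that counts the same elements once with a SAX handler and once with a DOM tree. The document contents, the ItemCounter class, and the helper names are assumptions made for the example. The data is hard-coded and trusted.

```python
import xml.dom.minidom
import xml.sax

DOC = b"<catalog><item id='1'/><item id='2'/></catalog>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX: react to events while the parser streams through the document."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items_sax(data):
    handler = ItemCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_items_dom(data):
    # DOM: build the whole tree in memory, then query it.
    dom = xml.dom.minidom.parseString(data)
    return len(dom.getElementsByTagName("item"))


print(count_items_sax(DOC), count_items_dom(DOC))
```

SAX keeps memory use flat for large inputs, while DOM is more convenient when the tree has to be navigated more than once.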

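The XML handling submodules include xml.etree.ElementTree, xml.dom, xml.dom.minidom, xml.dom.pulldom, xml.sax, and xml.parsers.expat.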

20.4.1. XML vulnerabilities

The XML processing modules are not secure against maliciously constructed data. An attacker can abuse XML features to carry out denial of service attacks, access local files, generate network connections to other machines, or circumvent firewalls.

The following table gives an overview of the known attacks and whether the various modules are vulnerable to them.

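kind                       sax         etree       minidom     pulldom     xmlrpc
billion laughs             Vulnerable  Vulnerable  Vulnerable  Vulnerable  Vulnerable
quadratic blowup           Vulnerable  Vulnerable  Vulnerable  Vulnerable  Vulnerable
external entity expansion  Vulnerable  Safe (1)    Safe (2)    Vulnerable  Safe (3)
DTD retrieval              Vulnerable  Safe        Safe        Vulnerable  Safe
decompression bomb         Safe        Safe        Safe        Safe        Vulnerable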
  1. xml.etree.ElementTree doesn't expand external entities and raises a ParseError when an entity occurs.
  2. xml.dom.minidom doesn't expand external entities and simply returns the unexpanded entity verbatim.
  3. xmlrpclib doesn't expand external entities and omits them.
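The ElementTree behavior described in note 1 can be observed with a small sketch like the one below. The entity name xxe and the file path are hypothetical and chosen only for illustration. The helper try_parse is likewise an assumption, not part of the library.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the entity name and the file path are made up.
XXE_DOC = (
    '<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///nonexistent/example.txt">]>'
    "<data>&xxe;</data>"
)


def try_parse(xml_text):
    """Return the root element's text, or None if ElementTree rejects it."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        # The external entity reference is not expanded; the parser
        # reports it as an error instead.
        return None


print(try_parse("<data>ok</data>"))
print(try_parse(XXE_DOC))
```

This only covers external entities. As the table shows, ElementTree is still listed as vulnerable to the entity expansion attacks.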
billion laughs / exponential entity expansion
The Billion Laughs attack – also known as exponential entity expansion – uses multiple levels of nested entities. Each entity refers to another entity several times, and the final entity definition contains a small string. The exponential expansion results in several gigabytes of text and consumes lots of memory and CPU time.
quadratic blowup entity expansion
A quadratic blowup attack is similar to a Billion Laughs attack; it abuses entity expansion, too. Instead of nested entities it repeats one large entity with a couple of thousand chars over and over again. The attack isn’t as efficient as the exponential case but it avoids triggering parser countermeasures that forbid deeply-nested entities.
external entity expansion
Entity declarations can contain more than just text for replacement. They can also point to external resources or local files. The XML parser accesses the resource and embeds the content into the XML document.
DTD retrieval
Some XML libraries, such as Python's xml.dom.pulldom, retrieve document type definitions from remote or local locations. The feature has similar implications to the external entity expansion issue.
decompression bomb
Decompression bombs (also known as ZIP bombs) apply to all XML libraries that can parse compressed XML streams, such as gzipped HTTP streams or LZMA-compressed files. For an attacker, compression can reduce the amount of transmitted data by three orders of magnitude or more.
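One common mitigation for decompression bombs is to cap the size of the decompressed output before it reaches the XML parser. The following is a sketch under stated assumptions: the helper bounded_decompress and the limits used are illustrative, and it relies on the eof attribute of zlib decompression objects, which assumes Python 3.3 or later.

```python
import zlib


def bounded_decompress(data, max_bytes):
    """Decompress zlib data, refusing to produce more than max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    decompressor = zlib.decompressobj()
    # max_length caps how much output this call may return.
    output = decompressor.decompress(data, max_bytes)
    if not decompressor.eof:
        # The stream did not finish within the limit (or is truncated).
        raise ValueError("stream exceeds %d bytes or is incomplete" % max_bytes)
    return output


raw = b"<a/>" * 250000          # about 1 MB of highly repetitive XML text
payload = zlib.compress(raw)    # what an attacker would actually transmit
ratio = len(raw) / float(len(payload))
print(len(raw), len(payload), round(ratio))
```

A stream whose output lands exactly on the limit may also be rejected by this check, so the limit is best set somewhat above the largest legitimate document.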

The documentation for defusedxml on PyPI has further information about all known attack vectors with examples and references.
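The growth behind the two entity expansion attacks described above can be sketched with simple arithmetic. The function names and the sample numbers are illustrative assumptions, and the sizes ignore the small overhead of the surrounding DTD and root element.

```python
def billion_laughs_size(levels, fanout, seed_len):
    """Expanded size when each of `levels` nested entities references the
    previous one `fanout` times, starting from a seed of seed_len chars."""
    return seed_len * fanout ** levels


def quadratic_blowup_size(entity_len, references):
    """Expanded size when one entity of entity_len chars is referenced
    `references` times in the document body."""
    return entity_len * references


def quadratic_blowup_document_size(entity_len, references, name="a"):
    """Approximate source size: one definition plus short references."""
    # Each reference costs only len("&a;") characters in the source.
    return entity_len + references * len("&%s;" % name)


# Nine levels, ten references each, three-character seed string.
print(billion_laughs_size(9, 10, 3))
# One 50,000-character entity referenced 50,000 times.
print(quadratic_blowup_size(50000, 50000))
print(quadratic_blowup_document_size(50000, 50000))
```

With these sample numbers, a document of a few hundred kilobytes or less expands to several gigabytes in both cases. The quadratic variant gets there with a single, non-nested entity.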

20.4.2. The defusedxml and defusedexpat Packages

defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. Use of this package is recommended for any server code that parses untrusted XML data. The package also ships with example exploits and extended documentation on more XML exploits such as XPath injection.
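A minimal sketch of using defusedxml as a drop-in parser for untrusted input follows. It assumes the third-party defusedxml package is installed from PyPI. The wrapper parse_untrusted is a hypothetical helper that refuses to fall back to the standard library when the package is missing.

```python
try:
    # Third-party package; assumed to be installed from PyPI.
    import defusedxml.ElementTree as SafeET
except ImportError:
    SafeET = None


def parse_untrusted(xml_text):
    """Parse untrusted XML with defusedxml, or refuse if it is missing."""
    if SafeET is None:
        raise RuntimeError("defusedxml is not installed")
    # defusedxml raises its own exceptions (ValueError subclasses) when the
    # document contains entity declarations or other forbidden constructs.
    return SafeET.fromstring(xml_text)
```

Failing closed when the package is unavailable is a design choice for this sketch. Silently falling back to xml.etree.ElementTree would reintroduce the vulnerabilities listed above.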

defusedexpat provides a modified libexpat and a patched pyexpat module that have countermeasures against entity expansion DoS attacks. The defusedexpat module still allows a sane and configurable amount of entity expansions. The modifications may be included in some future release of Python, but will not be included in any bugfix releases of Python because they break backward compatibility.