19. Structured Markup Processing Tools
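Python supports a variety of modules for working with various forms of structured data markup. This includes modules for the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces for working with the Extensible Markup Language (XML).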
It is important to note that modules in the xml package require that there be at least one SAX-compliant XML parser available. Starting with Python 2.3, the Expat parser is included with Python, so the xml.parsers.expat module will always be available. You may still want to be aware of the PyXML add-on package; that package provides an extended set of XML libraries for Python.
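Because Expat ships with Python, a document can be parsed without installing anything extra. The following is a minimal sketch of using xml.parsers.expat directly; the helper name `collect_element_names` and the sample document are made up for illustration and are not part of the library.

```python
import xml.parsers.expat


def collect_element_names(xml_text):
    """Return element names in document order (hypothetical helper)."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls StartElementHandler with the element name and a dict
    # of its attributes each time an opening tag is encountered.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return names


# Illustrative sample document, not taken from the documentation.
sample = "<root><item id='1'/><item id='2'/></root>"
element_names = collect_element_names(sample)
```

Most programs would not talk to Expat at this level; the higher-level xml.sax, xml.dom and xml.etree.ElementTree interfaces listed below are usually more convenient.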
The documentation for the xml.dom and xml.sax packages is the definition of the Python bindings for the DOM and SAX interfaces.
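To make the difference between the two interfaces concrete, here is a small sketch that extracts the same attribute values once through the DOM binding (via xml.dom.minidom, which builds the whole tree in memory) and once through the SAX binding (which reports events to a handler as it reads). The document content and the names `SAMPLE`, `TitleHandler`, `titles_via_dom` and `titles_via_sax` are assumptions made for this example.

```python
import xml.dom.minidom
import xml.sax

# Illustrative input; the element and attribute names are invented.
SAMPLE = b"<catalog><book title='A'/><book title='B'/></catalog>"


def titles_via_dom(data):
    """DOM style: build the full tree, then query it."""
    document = xml.dom.minidom.parseString(data)
    books = document.getElementsByTagName("book")
    return [book.getAttribute("title") for book in books]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to start-of-element events as they arrive."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs.getValue("title"))


def titles_via_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles
```

Both functions return the same list for this input. The DOM version is simpler to query afterwards, while the SAX version never holds the whole document in memory, which tends to matter for large inputs.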
- 19.1. HTMLParser: Simple HTML and XHTML parser
- 19.2. sgmllib: Simple SGML parser
- 19.3. htmllib: A parser for HTML documents
- 19.4. htmlentitydefs: Definitions of HTML general entities
- 19.5. XML Processing Modules
- 19.6. XML vulnerabilities
- 19.7. xml.etree.ElementTree: ElementTree XML API
- 19.8. xml.dom: The Document Object Model API
- 19.9. xml.dom.minidom: Minimal DOM implementation
- 19.10. xml.dom.pulldom: Support for building partial DOM trees
- 19.11. xml.sax: Support for SAX2 parsers
- 19.12. xml.sax.handler: Base classes for SAX handlers
- 19.13. xml.sax.saxutils: SAX Utilities
- 19.14. xml.sax.xmlreader: Interface for XML parsers
- 19.15. xml.parsers.expat: Fast XML parsing using Expat