Defining extension modules

A C extension for CPython is a shared library (for example, a .so file on Linux or a .pyd DLL on Windows) that is loadable into the Python process (for example, it is compiled with compatible compiler settings) and that exports an export hook function (or an old-style initialization function).

To be importable by default (that is, by importlib.machinery.ExtensionFileLoader), the shared library must be available on sys.path, and must be named after the module name plus an extension listed in importlib.machinery.EXTENSION_SUFFIXES.
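
As a rough illustration of the naming rule, the sketch below lists the file names under which the default loader could find a given module. The helper name candidate_filenames is hypothetical, and the actual suffixes depend on the platform and on how Python was built.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """Return file names under which an extension module could be found."""
    # For a submodule such as "pkg.spam", only the last component names the file.
    stem = module_name.rpartition(".")[2]
    return [stem + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


for filename in candidate_filenames("spam"):
    print(filename)
```

The output varies between platforms and builds, so it is best inspected on the target system rather than assumed.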

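Note

Building, packaging and distributing extension modules is best done with third-party tools, and is out of scope of this document. One suitable tool is Setuptools, whose documentation can be found at https://setuptools.pypa.io/en/latest/setuptools.html.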

Extension export hook

Added in version 3.15.0a2 (unreleased): Support for the PyModExport_<name> export hook was added in Python 3.15. The older way of defining modules is still available: consult either the PyInit function section or earlier versions of this documentation if you plan to support earlier Python versions.

The export hook must be an exported function with the following signature:

PyModuleDef_Slot *PyModExport_modulename(void)

For modules with ASCII-only names, the export hook must be named PyModExport_<name>, with <name> replaced by the module's name.

For non-ASCII module names, the export hook must instead be named PyModExportU_<name> (note the U), with <name> encoded using Python's punycode encoding with hyphens replaced by underscores. In Python:

def hook_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyModExport' + suffix
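
For instance, the function above could be exercised as follows. This is only a sketch: the function is repeated so that the snippet is self-contained, and the module names spam and späm are arbitrary examples.

```python
def hook_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyModExport' + suffix


# An ASCII-only name maps directly onto the symbol name.
print(hook_name('spam'))   # b'PyModExport_spam'

# A non-ASCII name gets the "U" marker and a Punycode-encoded suffix.
print(hook_name('späm'))
```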

The export hook returns an array of PyModuleDef_Slot entries, terminated by an entry with a slot ID of 0. These slots describe how the module should be created and initialized.

This array must remain valid and constant until interpreter shutdown. Typically, it should use static storage. Prefer using the Py_mod_create and Py_mod_exec slots for any dynamic behavior.

The export hook may return NULL with an exception set to signal failure.

It is recommended to define the export hook function using a helper macro:

PyMODEXPORT_FUNC
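Part of the Stable ABI since version 3.15.

Declare an extension module export hook. This macro:

  • specifies the PyModuleDef_Slot* return type,

  • adds any special linkage declarations required by the platform, and

  • for C++, declares the function as extern "C".

For example, a module called spam would be defined like this: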

PyABIInfo_VAR(abi_info);

static PyModuleDef_Slot spam_slots[] = {
    {Py_mod_abi, &abi_info},
    {Py_mod_name, "spam"},
    {Py_mod_exec, spam_init_function},
    ...
    {0, NULL},
};

PyMODEXPORT_FUNC
PyModExport_spam(void)
{
    return spam_slots;
}

The export hook is typically the only non-static item defined in the module's C source.

The hook should be kept short -- ideally, one line as above. If you do need to use Python C API in this function, it is recommended to call PyABIInfo_Check(&abi_info, "modulename") first to raise an exception, rather than crash, in common cases of ABI mismatch.

Note

It is possible to export multiple modules from a single shared library by defining multiple export hooks. However, importing them requires a custom importer or suitably named copies/links of the extension file, because Python's import machinery only finds the function corresponding to the filename. See the Multiple modules in one library section in PEP 489 for details.
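
A custom importer along these lines can be sketched with importlib, following the approach described in PEP 489. The function name load_extension and the path in the usage comment are hypothetical.

```python
import importlib.machinery
import importlib.util


def load_extension(fullname, path):
    """Load the module `fullname` from the shared library at `path`.

    The file name does not need to match the module name; the loader
    looks up the export hook that corresponds to `fullname`.
    """
    loader = importlib.machinery.ExtensionFileLoader(fullname, path)
    spec = importlib.util.spec_from_loader(fullname, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


# Hypothetical usage: load a second module, "eggs", from a library named after "spam".
# eggs = load_extension("eggs", "/path/to/spam.so")
```

Note that this sketch does not add the module to sys.modules; a caller that wants normal import statements to find it would need to do that explicitly.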

Multi-phase initialization

The process of creating an extension module follows several phases:

  • Python finds and calls the export hook to get information on how to create the module.

  • Before any substantial code is executed, Python can determine which capabilities the module supports, and it can adjust the environment or refuse loading an incompatible extension. Slots like Py_mod_abi, Py_mod_gil and Py_mod_multiple_interpreters influence this step.

  • By default, Python itself then creates the module object -- that is, it does the equivalent of calling __new__() when creating an object. This step can be overridden using the Py_mod_create slot.

  • Python sets initial module attributes like __package__ and __loader__, and inserts the module object into sys.modules.

  • Afterwards, the module object is initialized in an extension-specific way -- the equivalent of __init__() when creating an object, or of executing top-level code in a Python-language module. The behavior is specified using the Py_mod_exec slot.

This is called multi-phase initialization to distinguish it from the legacy (but still supported) single-phase initialization, where an initialization function returns a fully constructed module.
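
The phases above have rough pure-Python counterparts in the importlib loader protocol. The sketch below is an analogy only, not a description of how extension modules are implemented; the loader class SketchLoader, the module name phase_demo and the attribute answer are made up for illustration.

```python
import importlib.machinery
import importlib.util
import sys


class SketchLoader:
    def create_module(self, spec):
        # Returning None requests default module creation,
        # comparable to not providing a Py_mod_create slot.
        return None

    def exec_module(self, module):
        # Extension-specific initialization, comparable to Py_mod_exec.
        module.answer = 42


spec = importlib.machinery.ModuleSpec("phase_demo", SketchLoader())

# Create the module object and set import-related attributes.
module = importlib.util.module_from_spec(spec)

# Make the module available by name, then initialize it.
sys.modules[spec.name] = module
spec.loader.exec_module(module)

print(module.answer)  # 42
```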

Changed in version 3.5: Added support for multi-phase initialization (PEP 489).

Multiple module instances

By default, extension modules are not singletons. For example, if the sys.modules entry is removed and the module is re-imported, a new module object is created and, typically, populated with fresh method and type objects. The old module is subject to normal garbage collection. This mirrors the behavior of pure-Python modules.
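
The pure-Python behavior that this mirrors can be observed directly. The sketch below uses the standard-library module colorsys only because it is small and written in Python; any pure-Python module would serve.

```python
import sys

import colorsys as one

# Drop the cached entry so that the next import executes the module again.
del sys.modules['colorsys']
import colorsys as two

print(one is two)                        # False
print(one.rgb_to_hsv is two.rgb_to_hsv)  # False
```

An isolated multi-phase extension module is expected to behave the same way: each import after the entry is removed yields a separate module object with its own functions and types.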

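Additional module instances may be created in sub-interpreters or after Python runtime reinitialization (Py_Finalize() and Py_Initialize()). In these cases, sharing Python objects between module instances would likely cause crashes or undefined behavior.

To avoid such issues, each instance of an extension module should be isolated: changes to one instance should not implicitly affect the others, and all state owned by the module, including references to Python objects, should be specific to a particular module instance. See Isolating Extension Modules for more details and a practical guide.

A simpler way to avoid these issues is raising an error on repeated initialization.

All modules are expected to support sub-interpreters, or otherwise explicitly signal a lack of support. This is usually achieved by isolation or by blocking repeated initialization, as described above. A module may also be limited to the main interpreter using the Py_mod_multiple_interpreters slot.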

PyInit function

Deprecated since version 3.15.0a2 (unreleased): This functionality is soft deprecated. It will not get new features, but there are no plans to remove it.

Instead of PyModExport_modulename(), an extension module can define an older-style initialization function with the signature:

PyObject *PyInit_modulename(void)

Its name should be PyInit_<name>, with <name> replaced by the name of the module. For non-ASCII module names, use PyInitU_<name> instead, with <name> encoded in the same way as for the export hook (that is, using Punycode with underscores).

If a module exports both PyInit_<name> and PyModExport_<name>, the PyInit_<name> function is ignored.

Like with PyMODEXPORT_FUNC, it is recommended to define the initialization function using a helper macro:

PyMODINIT_FUNC

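Declare an extension module initialization function. This macro:

  • specifies the PyObject* return type,

  • adds any special linkage declarations required by the platform, and

  • for C++, declares the function as extern "C".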

Normally, the initialization function (PyInit_modulename) returns a PyModuleDef instance with non-NULL m_slots. This allows Python to use multi-phase initialization.

Before it is returned, the PyModuleDef instance must be initialized using the following function:

PyObject *PyModuleDef_Init(PyModuleDef *def)
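Part of the Stable ABI since version 3.5.

Ensure a module definition is a properly initialized Python object that correctly reports its type and a reference count.

Return def cast to PyObject*, or NULL if an error occurred.

Calling this function is required before returning a PyModuleDef from a module initialization function. It should not be used in other contexts.

Note that Python assumes that PyModuleDef structures are statically allocated. This function may return either a new reference or a borrowed one; this reference must not be released.

Added in version 3.5.

For example, a module called spam would be defined like this: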

static struct PyModuleDef spam_module = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "spam",
    ...
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModuleDef_Init(&spam_module);
}

Legacy single-phase initialization

Deprecated since version 3.15.0a2 (unreleased): Single-phase initialization is soft deprecated. It is a legacy mechanism to initialize extension modules, with known drawbacks and design flaws. Extension module authors are encouraged to use multi-phase initialization instead.

However, there are no plans to remove support for it.

In single-phase initialization, the old-style initialization function (PyInit_modulename) should create, populate and return a module object. This is typically done using PyModule_Create() and functions like PyModule_AddObjectRef().

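Single-phase initialization differs from the default (multi-phase initialization) in the following ways:

  • Single-phase modules are (or rather, contain) "singletons".

    When the module is first initialized, Python saves the contents of the module's __dict__ (that is, typically, the module's functions and types).

    For subsequent imports, Python does not call the initialization function again. Instead, it creates a new module object with a new __dict__, and copies the saved contents to it. For example, given a single-phase module _testsinglephase [1] that defines a function sum and an exception class error: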

    >>> import sys
    >>> import _testsinglephase as one
    >>> del sys.modules['_testsinglephase']
    >>> import _testsinglephase as two
    >>> one is two
    False
    >>> one.__dict__ is two.__dict__
    False
    >>> one.sum is two.sum
    True
    >>> one.error is two.error
    True
    

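    The exact behavior should be considered a CPython implementation detail.

  • To work around the fact that PyInit_modulename does not take a spec argument, the import machinery saves some state and applies it to the first suitable module object created during the PyInit_modulename call. Specifically, when a submodule is imported, this mechanism prepends the parent package name to the name of the module.

    A single-phase PyInit_modulename function should create "its" module object as soon as possible, before any other module objects are created.

  • Non-ASCII module names (PyInitU_modulename) are not supported.

  • Single-phase modules support module lookup functions like PyState_FindModule().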

  • The module's PyModuleDef.m_slots must be NULL.