2. 自定义扩展类型:教程

Python 允许编写 C 扩展模块定义可以从 Python 代码中操纵的新类型,这很像内置的 strlist 类型。所有扩展类型的代码都遵循一个模式,但是在您开始之前,您需要了解一些细节。这份文件是对这个主题介绍。

2.1. 基础

CPython 运行时会将所有 Python 对象都视为 PyObject* 类型的变量,这是所有 Python 对象的“基础类型”。 PyObject 结构体本身只包含对象的 reference count 和指向对象的“类型对象”的指针。 这是动作所针对的目标。 类型对象决定解释器要调用哪些 (C) 函数,例如,在对象上查找一个属性,调用一个方法,或者与另一个对象相乘等。 这些 C 函数被称为“类型方法”。

所以,如果你想要定义新的扩展类型,需要创建新的类型对象。

这类事情只能用例子解释,这里用一个最小化但完整的的模块,定义了新的类型叫做 Custom 在C扩展模块 custom 里。

注解

这里展示的方法是定义 static 扩展类型的传统方法。可以适合大部分用途。C API也可以定义在堆上分配的扩展类型,使用 PyType_FromSpec() 函数,但不在本入门里讨论。

#define PY_SSIZE_T_CLEAN
#include <Python.h>

typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} CustomObject;

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};

static PyModuleDef custommodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "custom",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&CustomType);
    if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(&CustomType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

这部分很容易理解,这是为了跟上一章能对接上。这个文件定义了三件事:

  1. Custom 类的对象 object 包含了: CustomObject 结构,这会为每个 Custom 实例分配一次。

  2. Custom type 的行为:这是 CustomType 结构体,其定义了一堆标识和函数指针,会指向解释器里请求的操作。

  3. 初始化 custom 模块: PyInit_custom 函数和对应的 custommodule 结构体。

结构的第一块是

typedef struct {
    PyObject_HEAD
} CustomObject;

这就是一个自定义对象将会包含的内容。 PyObject_HEAD 是强制要求放在每个对象结构体之前并定义一个名为 ob_basePyObject 类型的字段,其中包含一个指向类型对象和引用计数的指针(这两者可以分别使用宏 Py_TYPEPy_REFCNT 来区分)。 使用宏的理由是将布局抽象出来并在 调试编译版中 中启用附加字段。

注解

注意在宏 PyObject_HEAD 后没有分号。意外添加分号会导致编译器提示出错。

当然,对象除了在 PyObject_HEAD 存储数据外,还有额外数据;例如,如下定义了标准的Python浮点数:

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;

第二个位是类型对象的定义:

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_new = PyType_GenericNew,
};

注解

推荐使用如上C99风格的初始化,以避免列出所有的 PyTypeObject 字段,其中很多是你不需要关心的,这样也可以避免关注字段的定义顺序。

object.h 中实际定义的 PyTypeObject 具有比如上定义更多的 字段。 剩余的字段会由 C 编译器用零来填充,通常的做法是不显式地指定它们,除非你确实需要它们。

我们先挑选一部分,每次一个字段:

PyVarObject_HEAD_INIT(NULL, 0)

这一行是强制的样板,用以初始化如上提到的 ob_base 字段:

.tp_name = "custom.Custom",

我们的类型的名称。 这将出现在我们的对象的默认文本表示形式和某些错误消息中,例如:

>>> "" + custom.Custom()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "custom.Custom") to str

注意,名字是一个带点的名字,包括模块名称和模块中的类型名称。本例中的模块是 custom,类型是 Custom,所以我们将类型名设为 custom.Custom。使用真正的点状导入路径很重要,可以使你的类型与 pydocpickle 模块兼容。

.tp_basicsize = sizeof(CustomObject),
.tp_itemsize = 0,

This is so that Python knows how much memory to allocate when creating new Custom instances. tp_itemsize is only used for variable-sized objects and should otherwise be zero.

注解

If you want your type to be subclassable from Python, and your type has the same tp_basicsize as its base type, you may have problems with multiple inheritance. A Python subclass of your type will have to list your type first in its __bases__, or else it will not be able to call your type's __new__() method without getting an error. You can avoid this problem by ensuring that your type has a larger value for tp_basicsize than its base type does. Most of the time, this will be true anyway, because either your base type will be object, or else you will be adding data members to your base type, and therefore increasing its size.

我们将类旗标设为 Py_TPFLAGS_DEFAULT

.tp_flags = Py_TPFLAGS_DEFAULT,

所有类型都应当在它们的旗标中包括此常量。 该常量将启用至少在 Python 3.3 之前定义的全部成员。 如果你需要更多的成员,你将需要对相应的旗标进行 OR 运算。

我们为 tp_doc 类型提供一个文档字符串.

.tp_doc = PyDoc_STR("Custom objects"),

To enable object creation, we have to provide a tp_new handler. This is the equivalent of the Python method __new__(), but has to be specified explicitly. In this case, we can just use the default implementation provided by the API function PyType_GenericNew().

.tp_new = PyType_GenericNew,

Everything else in the file should be familiar, except for some code in PyInit_custom():

if (PyType_Ready(&CustomType) < 0)
    return;

This initializes the Custom type, filling in a number of members to the appropriate default values, including ob_type that we initially set to NULL.

Py_INCREF(&CustomType);
if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) {
    Py_DECREF(&CustomType);
    Py_DECREF(m);
    return NULL;
}

This adds the type to the module dictionary. This allows us to create Custom instances by calling the Custom class:

>>> import custom
>>> mycustom = custom.Custom()

好了! 接下来要做的就是编译它;将上述代码放到名为 custom.c 的文件中然后:

from distutils.core import setup, Extension
setup(name="custom", version="1.0",
      ext_modules=[Extension("custom", ["custom.c"])])

在名为 setup.py 的文件中;然后输入

$ python setup.py build

到 shell 中应当会在一个子目录中产生文件 custom.so;进入该目录并运行 Python --- 你应当能够执行 import custom 并尝试使用 Custom 对象。

这并不难,对吗?

当然,当前的自定义类型非常无趣。它没有数据,也不做任何事情。它甚至不能被子类化。

注解

While this documentation showcases the standard distutils module for building C extensions, it is recommended in real-world use cases to use the newer and better-maintained setuptools library. Documentation on how to do this is out of scope for this document and can be found in the Python Packaging User's Guide.

2.2. 向基本示例添加数据和方法

Let's extend the basic example to add some data and methods. Let's also make the type usable as a base class. We'll create a new module, custom2 that adds these capabilities:

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "structmember.h"

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_XDECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_XDECREF(tmp);
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"first", T_OBJECT_EX, offsetof(CustomObject, first), 0,
     "first name"},
    {"last", T_OBJECT_EX, offsetof(CustomObject, last), 0,
     "last name"},
    {"number", T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    if (self->first == NULL) {
        PyErr_SetString(PyExc_AttributeError, "first");
        return NULL;
    }
    if (self->last == NULL) {
        PyErr_SetString(PyExc_AttributeError, "last");
        return NULL;
    }
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"
    },
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom2.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
};

static PyModuleDef custommodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "custom2",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom2(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&CustomType);
    if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(&CustomType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

该模块的新版本包含多处修改。

我们已经添加呢一个外部导入:

#include <structmember.h>

这包括提供被我们用来处理属性的声明,正如我们稍后所描述的。

The Custom type now has three data attributes in its C struct, first, last, and number. The first and last variables are Python strings containing first and last names. The number attribute is a C integer.

对象的结构将被相应地更新:

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

因为现在我们有数据需要管理,我们必须更加小心地处理对象的分配和释放。 至少,我们需要有一个释放方法:

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

它会被赋值给 tp_dealloc 成员:

.tp_dealloc = (destructor) Custom_dealloc,

This method first clears the reference counts of the two Python attributes. Py_XDECREF() correctly handles the case where its argument is NULL (which might happen here if tp_new failed midway). It then calls the tp_free member of the object's type (computed by Py_TYPE(self)) to free the object's memory. Note that the object's type might not be CustomType, because the object may be an instance of a subclass.

注解

上面需要强制转换 destructor 是因为我们定义了 Custom_dealloc 接受一个 CustomObject * 参数,但 tp_dealloc 函数指针预期接受一个 PyObject * 参数。 如果不这样做,编译器将发出警告。 这是 C 语言中面向对象的多态性!

我们希望确保头一个和末一个名称被初始化为空字符串,因此我们提供了一个 tp_new 实现:

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

并在 tp_new 成员中安装它:

.tp_new = Custom_new,

The tp_new handler is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the __new__() method. It is not required to define a tp_new member, and indeed many extension types will simply reuse PyType_GenericNew() as done in the first version of the Custom type above. In this case, we use the tp_new handler to initialize the first and last attributes to non-NULL default values.

tp_new 将接受被实例化的类型(不要求为 CustomType,如果被实例化的是一个子类)以及在该类型被调用时传入的任何参数,并预期返回所创建的实例。 tp_new 处理器总是接受位置和关键字参数,但它们总是会忽略这些参数,而将参数处理留给初始化(即 C 中的 tp_init 或 Python 中的 __init__ 函数)方法来执行。

注解

tp_new 不应显式地调用 tp_init,因为解释器会自行调用它。

tp_new 实现会调用 tp_alloc 槽位来分配内存:

self = (CustomObject *) type->tp_alloc(type, 0);

由于内存分配可能会失败,我们必须在继续执行之前检查 tp_alloc 结果确认其不为 NULL

注解

我们没有自行填充 tp_alloc 槽位。 而是由 PyType_Ready() 通过从我们的基类继承来替我们填充它,其中默认为 object。 大部分类型都是使用默认的分配策略。

注解

If you are creating a co-operative tp_new (one that calls a base type's tp_new or __new__()), you must not try to determine what method to call using method resolution order at runtime. Always statically determine what type you are going to call, and call its tp_new directly, or via type->tp_base->tp_new. If you do not do this, Python subclasses of your type that also inherit from other Python-defined classes may not work correctly. (Specifically, you may not be able to create instances of such subclasses without getting a TypeError.)

我们还定义了一个接受参数来为我们的实例提供初始值的初始化函数:

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_XDECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_XDECREF(tmp);
    }
    return 0;
}

通过填充 tp_init 槽位。

.tp_init = (initproc) Custom_init,

The tp_init slot is exposed in Python as the __init__() method. It is used to initialize an object after it's created. Initializers always accept positional and keyword arguments, and they should return either 0 on success or -1 on error.

Unlike the tp_new handler, there is no guarantee that tp_init is called at all (for example, the pickle module by default doesn't call __init__() on unpickled instances). It can also be called multiple times. Anyone can call the __init__() method on our objects. For this reason, we have to be extra careful when assigning the new attribute values. We might be tempted, for example to assign the first member like this:

if (first) {
    Py_XDECREF(self->first);
    Py_INCREF(first);
    self->first = first;
}

但是这可能会有危险。 我们的类型没有限制 first 成员的类型,因此它可以是任何种类的对象。 它可以带有一个会执行尝试访问 first 成员的代码的析构器;或者该析构器可能会释放 全局解释器锁 并让任意代码在其他线程中运行来访问和修改我们的对象。

为了保持谨慎并使我们避免这种可能性,我们几乎总是要在减少成员的引用计数之前给它们重新赋值。 什么时候我们可以不必再这样做?

  • 当我们明确知道引用计数大于 1 的时候;

  • 当我们知道对象的释放 1 既不会释放 GIL 也不会导致任何对我们的类型的代码的回调的时候;

  • 当减少一个 tp_dealloc 处理器内不支持循环垃圾回收的类型的引用计数的时候 2.

我们可能会想将我们的实例变量暴露为属性。 有几种方式可以做到这一点。 最简单的方式是定义成员的定义:

static PyMemberDef Custom_members[] = {
    {"first", T_OBJECT_EX, offsetof(CustomObject, first), 0,
     "first name"},
    {"last", T_OBJECT_EX, offsetof(CustomObject, last), 0,
     "last name"},
    {"number", T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

并将定义放置到 tp_members 槽位中:

.tp_members = Custom_members,

每个成员的定义都有成员名称、类型、偏移量、访问旗标和文档字符串。 请参阅下面的 泛型属性管理 小节来了解详情。section below for details.

此方式的缺点之一是它没有提供限制可被赋值给 Python 属性的对象类型的办法。 我们预期 first 和 last 的名称为字符串,但它们可以被赋值为任意 Python 对象。 此外,这些属性还可以被删除,并将 C 指针设为 NULL。 即使我们可以保证这些成员被初始化为非 NULL 值,如果这些属性被删除这些成员仍可被设为 NULL

We define a single method, Custom.name(), that outputs the objects name as the concatenation of the first and last names.

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    if (self->first == NULL) {
        PyErr_SetString(PyExc_AttributeError, "first");
        return NULL;
    }
    if (self->last == NULL) {
        PyErr_SetString(PyExc_AttributeError, "last");
        return NULL;
    }
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

The method is implemented as a C function that takes a Custom (or Custom subclass) instance as the first argument. Methods always take an instance as the first argument. Methods often take positional and keyword arguments as well, but in this case we don't take any and don't need to accept a positional argument tuple or keyword argument dictionary. This method is equivalent to the Python method:

def name(self):
    return "%s %s" % (self.first, self.last)

Note that we have to check for the possibility that our first and last members are NULL. This is because they can be deleted, in which case they are set to NULL. It would be better to prevent deletion of these attributes and to restrict the attribute values to be strings. We'll see how to do that in the next section.

现在我们已经定义好了方法,我们需要创建一个方法定义数组:

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"
    },
    {NULL}  /* Sentinel */
};

(note that we used the METH_NOARGS flag to indicate that the method is expecting no arguments other than self)

并将其赋给 tp_methods 槽位:

.tp_methods = Custom_methods,

Finally, we'll make our type usable as a base class for subclassing. We've written our methods carefully so far so that they don't make any assumptions about the type of the object being created or used, so all we need to do is to add the Py_TPFLAGS_BASETYPE to our class flag definition:

.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,

We rename PyInit_custom() to PyInit_custom2(), update the module name in the PyModuleDef struct, and update the full class name in the PyTypeObject struct.

最后,我们更新我们的 setup.py 文件来生成新的模块。

from distutils.core import setup, Extension
setup(name="custom", version="1.0",
      ext_modules=[
         Extension("custom", ["custom.c"]),
         Extension("custom2", ["custom2.c"]),
         ])

2.3. 提供对于数据属性的更精细控制

In this section, we'll provide finer control over how the first and last attributes are set in the Custom example. In the previous version of our module, the instance variables first and last could be set to non-string values or even deleted. We want to make sure that these attributes always contain strings.

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "structmember.h"

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static void
Custom_dealloc(CustomObject *self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_DECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_DECREF(tmp);
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"number", T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    PyObject *tmp;
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    tmp = self->first;
    Py_INCREF(value);
    self->first = value;
    Py_DECREF(tmp);
    return 0;
}

static PyObject *
Custom_getlast(CustomObject *self, void *closure)
{
    Py_INCREF(self->last);
    return self->last;
}

static int
Custom_setlast(CustomObject *self, PyObject *value, void *closure)
{
    PyObject *tmp;
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The last attribute value must be a string");
        return -1;
    }
    tmp = self->last;
    Py_INCREF(value);
    self->last = value;
    Py_DECREF(tmp);
    return 0;
}

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"
    },
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom3.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
    .tp_getset = Custom_getsetters,
};

static PyModuleDef custommodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "custom3",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom3(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&CustomType);
    if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(&CustomType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

To provide greater control, over the first and last attributes, we'll use custom getter and setter functions. Here are the functions for getting and setting the first attribute:

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    PyObject *tmp;
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    tmp = self->first;
    Py_INCREF(value);
    self->first = value;
    Py_DECREF(tmp);
    return 0;
}

The getter function is passed a Custom object and a "closure", which is a void pointer. In this case, the closure is ignored. (The closure supports an advanced usage in which definition data is passed to the getter and setter. This could, for example, be used to allow a single set of getter and setter functions that decide the attribute to get or set based on data in the closure.)

The setter function is passed the Custom object, the new value, and the closure. The new value may be NULL, in which case the attribute is being deleted. In our setter, we raise an error if the attribute is deleted or if its new value is not a string.

我们创建一个 PyGetSetDef 结构体的数组:

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

并在 tp_getset 槽位中注册它:

.tp_getset = Custom_getsetters,

PyGetSetDef 结构体中的最后一项是上面提到的“闭包”。 在本例中,我们没有使用闭包,因此我们只传入 NULL

我们还移除了这些属性的成员定义:

static PyMemberDef Custom_members[] = {
    {"number", T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

我们还需要将 tp_init 处理器更新为只允许传入字符串 3:

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_DECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_DECREF(tmp);
    }
    return 0;
}

通过这些更改,我们能够确保 firstlast 成员一定不为 NULL 以便我们能在几乎所有情况下移除 NULL 值检查。 这意味着大部分 Py_XDECREF() 调用都可以被转换为 Py_DECREF() 调用。 我们不能更改这些调用的唯一场合是在 tp_dealloc 实现中,那里这些成员的初始化有可能在 tp_new 中失败。

我们还重命名了模块初始化函数和初始化函数中的模块名称,就像我们之前所做的一样,我们还向 setup.py 文件添加了一个额外的定义。

2.4. 支持循环垃圾回收

Python 具有一个可以标识不再需要的对象的 循环垃圾回收器 (GC) 即使它们的引用计数并不为零。 这种情况会在对象被循环引用时发生。 例如,设想:

>>> l = []
>>> l.append(l)
>>> del l

在这个例子中,我们创建了一个包含其自身的列表。 当我们删除它的时候,它将仍然具有一个来自其本身的引用。 它的引用计数并未降为零。 幸运的是,Python 的循环垃圾回收器将最终发现该列表是无用的垃圾并释放它。

In the second version of the Custom example, we allowed any kind of object to be stored in the first or last attributes 4. Besides, in the second and third versions, we allowed subclassing Custom, and subclasses may add arbitrary attributes. For any of those two reasons, Custom objects can participate in cycles:

>>> import custom3
>>> class Derived(custom3.Custom): pass
...
>>> n = Derived()
>>> n.some_attribute = n

To allow a Custom instance participating in a reference cycle to be properly detected and collected by the cyclic GC, our Custom type needs to fill two additional slots and to enable a flag that enables these slots:

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "structmember.h"

typedef struct {
    PyObject_HEAD
    PyObject *first; /* first name */
    PyObject *last;  /* last name */
    int number;
} CustomObject;

static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->first);
    Py_VISIT(self->last);
    return 0;
}

static int
Custom_clear(CustomObject *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}

static void
Custom_dealloc(CustomObject *self)
{
    PyObject_GC_UnTrack(self);
    Custom_clear(self);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

static PyObject *
Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    CustomObject *self;
    self = (CustomObject *) type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }
        self->number = 0;
    }
    return (PyObject *) self;
}

static int
Custom_init(CustomObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"first", "last", "number", NULL};
    PyObject *first = NULL, *last = NULL, *tmp;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist,
                                     &first, &last,
                                     &self->number))
        return -1;

    if (first) {
        tmp = self->first;
        Py_INCREF(first);
        self->first = first;
        Py_DECREF(tmp);
    }
    if (last) {
        tmp = self->last;
        Py_INCREF(last);
        self->last = last;
        Py_DECREF(tmp);
    }
    return 0;
}

static PyMemberDef Custom_members[] = {
    {"number", T_INT, offsetof(CustomObject, number), 0,
     "custom number"},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_getfirst(CustomObject *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Custom_setfirst(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }
    Py_INCREF(value);
    Py_CLEAR(self->first);
    self->first = value;
    return 0;
}

static PyObject *
Custom_getlast(CustomObject *self, void *closure)
{
    Py_INCREF(self->last);
    return self->last;
}

static int
Custom_setlast(CustomObject *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute");
        return -1;
    }
    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The last attribute value must be a string");
        return -1;
    }
    Py_INCREF(value);
    Py_CLEAR(self->last);
    self->last = value;
    return 0;
}

static PyGetSetDef Custom_getsetters[] = {
    {"first", (getter) Custom_getfirst, (setter) Custom_setfirst,
     "first name", NULL},
    {"last", (getter) Custom_getlast, (setter) Custom_setlast,
     "last name", NULL},
    {NULL}  /* Sentinel */
};

static PyObject *
Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored))
{
    return PyUnicode_FromFormat("%S %S", self->first, self->last);
}

static PyMethodDef Custom_methods[] = {
    {"name", (PyCFunction) Custom_name, METH_NOARGS,
     "Return the name, combining the first and last name"
    },
    {NULL}  /* Sentinel */
};

static PyTypeObject CustomType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "custom4.Custom",
    .tp_doc = PyDoc_STR("Custom objects"),
    .tp_basicsize = sizeof(CustomObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,
    .tp_new = Custom_new,
    .tp_init = (initproc) Custom_init,
    .tp_dealloc = (destructor) Custom_dealloc,
    .tp_traverse = (traverseproc) Custom_traverse,
    .tp_clear = (inquiry) Custom_clear,
    .tp_members = Custom_members,
    .tp_methods = Custom_methods,
    .tp_getset = Custom_getsetters,
};

static PyModuleDef custommodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "custom4",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_custom4(void)
{
    PyObject *m;
    if (PyType_Ready(&CustomType) < 0)
        return NULL;

    m = PyModule_Create(&custommodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&CustomType);
    if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) {
        Py_DECREF(&CustomType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

首先,遍历方法让循环 GC 知道能够参加循环的子对象:

static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
    int vret;
    if (self->first) {
        vret = visit(self->first, arg);
        if (vret != 0)
            return vret;
    }
    if (self->last) {
        vret = visit(self->last, arg);
        if (vret != 0)
            return vret;
    }
    return 0;
}

For each subobject that can participate in cycles, we need to call the visit() function, which is passed to the traversal method. The visit() function takes as arguments the subobject and the extra argument arg passed to the traversal method. It returns an integer value that must be returned if it is non-zero.

Python 提供了一个可自动调用 visit 函数的 Py_VISIT() 宏。 使用 Py_VISIT(),我们可以最小化 Custom_traverse 中的准备工作量:

static int
Custom_traverse(CustomObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->first);
    Py_VISIT(self->last);
    return 0;
}

注解

tp_traverse 实现必须将其参数准确命名为 visitarg 以便使用 Py_VISIT()

第二,我们需要提供一个方法用来清除任何可以参加循环的子对象:

static int
Custom_clear(CustomObject *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}

请注意 Py_CLEAR() 宏的使用。 它是清除任意类型的数据属性并减少其引用计数的推荐的且安全的方式。 如果你要选择在将属性设为 NULL 之间在属性上调用 Py_XDECREF(),则属性的析构器有可能会回调再次读取该属性的代码 (特别是 如果存在引用循环的话)。

注解

你可以通过以下写法来模拟 Py_CLEAR():

PyObject *tmp;
tmp = self->first;
self->first = NULL;
Py_XDECREF(tmp);

无论如何,在删除属性时始终使用Nevertheless, it is much easier and less error-prone to always use Py_CLEAR() 都是更简单且更不易出错的。 请不要尝试以健壮性为代价的微小优化!

释放器 Custom_dealloc 可能会在清除属性时调用任意代码。 这意味着循环 GC 可以在函数内部被触发。 由于 GC 预期引用计数不为零,我们需要通过调用 PyObject_GC_UnTrack() 来让 GC 停止追踪相关的对象。 下面是我们使用 PyObject_GC_UnTrack()Custom_clear 重新实现的释放器:

static void
Custom_dealloc(CustomObject *self)
{
    PyObject_GC_UnTrack(self);
    Custom_clear(self);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

Finally, we add the Py_TPFLAGS_HAVE_GC flag to the class flags:

.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC,

这样就差不多了。 如果我们编写了自定义的 tp_alloctp_free 处理器,则我们需要针对循环垃圾回收来修改它。 大多数扩展都将使用自动提供的版本。

2.5. 子类化其他类型

创建派生自现有类型的新类型是有可能的。 最容易的做法是从内置类型继承,因为扩展可以方便地使用它所需要的 PyTypeObject。 在不同扩展模块之间共享这些 PyTypeObject 结构体则是困难的。

In this example we will create a SubList type that inherits from the built-in list type. The new type will be completely compatible with regular lists, but will have an additional increment() method that increases an internal counter:

>>> import sublist
>>> s = sublist.SubList(range(3))
>>> s.extend(s)
>>> print(len(s))
6
>>> print(s.increment())
1
>>> print(s.increment())
2
#define PY_SSIZE_T_CLEAN
#include <Python.h>

typedef struct {
    PyListObject list;
    int state;
} SubListObject;

static PyObject *
SubList_increment(SubListObject *self, PyObject *unused)
{
    self->state++;
    return PyLong_FromLong(self->state);
}

static PyMethodDef SubList_methods[] = {
    {"increment", (PyCFunction) SubList_increment, METH_NOARGS,
     PyDoc_STR("increment state counter")},
    {NULL},
};

static int
SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
{
    if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}

static PyTypeObject SubListType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "sublist.SubList",
    .tp_doc = PyDoc_STR("SubList objects"),
    .tp_basicsize = sizeof(SubListObject),
    .tp_itemsize = 0,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_init = (initproc) SubList_init,
    .tp_methods = SubList_methods,
};

static PyModuleDef sublistmodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "sublist",
    .m_doc = "Example module that creates an extension type.",
    .m_size = -1,
};

PyMODINIT_FUNC
PyInit_sublist(void)
{
    PyObject *m;
    SubListType.tp_base = &PyList_Type;
    if (PyType_Ready(&SubListType) < 0)
        return NULL;

    m = PyModule_Create(&sublistmodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&SubListType);
    if (PyModule_AddObject(m, "SubList", (PyObject *) &SubListType) < 0) {
        Py_DECREF(&SubListType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

As you can see, the source code closely resembles the Custom examples in previous sections. We will break down the main differences between them.

typedef struct {
    PyListObject list;
    int state;
} SubListObject;

派生类型对象的主要差异在于基类型的对象结构体必须是第一个值。 基类型将已经在其结构体的开头包括了 PyObject_HEAD()

When a Python object is a SubList instance, its PyObject * pointer can be safely cast to both PyListObject * and SubListObject *:

static int
SubList_init(SubListObject *self, PyObject *args, PyObject *kwds)
{
    if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}

We see above how to call through to the __init__ method of the base type.

这个模式在编写具有自定义 tp_newtp_dealloc 成员的类型时很重要。 tp_new 处理器不应为具有 tp_alloc 的对象实际分配内存,而是让基类通过调用自己的 tp_new 来处理它。

PyTypeObject 支持用 tp_base 指定类型的实体基类。 由于跨平台编译器问题,你无法使用对 PyList_Type 的引用来直接填充该字段;它应当随后在模块初始化函数中完成:

PyMODINIT_FUNC
PyInit_sublist(void)
{
    PyObject* m;
    SubListType.tp_base = &PyList_Type;
    if (PyType_Ready(&SubListType) < 0)
        return NULL;

    m = PyModule_Create(&sublistmodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&SubListType);
    if (PyModule_AddObject(m, "SubList", (PyObject *) &SubListType) < 0) {
        Py_DECREF(&SubListType);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}

在调用 PyType_Ready() 之前,类型结构体必须已经填充 tp_base 槽位。 当我们从现有类型派生时,它不需要将 tp_alloc 槽位填充为 PyType_GenericNew() -- 来自基类型的分配函数将会被继承。

After that, calling PyType_Ready() and adding the type object to the module is the same as with the basic Custom examples.

备注

1

当我们知道该对象属于基本类型,如字符串或浮点数时情况就是如此。

2

在本示例中我们需要 tp_dealloc 处理器中的这一机制,因为我们的类型不支持垃圾回收。

3

现在我们知道 first 和 last 成员都是字符串,因此也许我们可以对减少它们的引用计数不必太过小心,但是,我们还接受字符串子类的实例。 即使释放普通字符串不会对我们的对象执行回调,我们也不能保证释放一个字符串子类的实例不会对我们的对象执行回调。

4

而且,即使是将我们的属性限制为字符串实例,用户还是可以传入任意 str 子类因而仍能造成引用循环。