Argument Clinic 的用法¶
- 作者
Larry Hastings
摘要
Argument Clinic 是 CPython 的一个 C 文件预处理器。旨在自动处理所有与“内置”参数解析有关的代码。本文展示了将 C 函数转换为配合 Argument Clinic 工作的做法,还介绍了一些关于 Argument Clinic 用法的进阶内容。
目前 Argument Clinic 视作仅供 CPython 内部使用。不支持在 CPython 之外的文件中使用,也不保证未来版本会向下兼容。换句话说:如果维护的是 CPython 的外部 C 语言扩展,欢迎在自己的代码中试用 Argument Clinic。但 Argument Clinic 与新版 CPython 中的版本 可能 完全不兼容,且会打乱全部代码。
Argument Clinic 的设计目标¶
Argument Clinic 的主要目标,是接管 CPython 中的所有参数解析代码。这意味着,如果要把某个函数转换为配合 Argument Clinic一起工作,则该函数不应再作任何参数解析工作——Argument Clinic 生成的代码应该是个“黑盒”,CPython 会在顶部发起调用,底部则调用自己的代码, PyObject *args
(也许还有 PyObject *kwargs
)会神奇地转换成所需的 C 变量和类型。
Argument Clinic 为了能完成主要目标,用起来必须方便。目前,使用 CPython 的参数解析库是一件苦差事,需要在很多地方维护冗余信息。如果使用 Argument Clinic,则不必再重复代码了。
显然,除非 Argument Clinic 解决了自身的问题,且没有产生新的问题,否则没有人会愿意用它。所以,Argument Clinic 最重要的事情就是生成正确的代码。如果能加速代码的运行当然更好,但至少不应引入明显的减速。(最终 Argument Clinic 应该 可以实现较大的速度提升——代码生成器可以重写一下,以产生量身定做的参数解析代码,而不是调用通用的 CPython 参数解析库。 这会让参数解析达到最佳速度!)
此外,Argument Clinic 必须足够灵活,能够与任何参数解析的方法一起工作。Python 有一些函数具备一些非常奇怪的解析行为;Argument Clinic 的目标是支持所有这些函数。
最后,Argument Clinic 的初衷是为 CPython 内置程序提供内省“签名”。以前如果传入一个内置函数,内省查询函数会抛出异常。有了 Argument Clinic,再不会发生这种问题了!
在与 Argument Clinic 合作时,应该牢记一个理念:给它的信息越多,它做得就会越好。诚然,Argument Clinic 现在还比较简单。但会演变得越来越复杂,应该能够利用给出的全部信息干很多聪明而有趣的事情。
基本概念和用法¶
Argument Clinic 与 CPython 一起提供,位于 Tools/clinic/clinic.py
。若要运行它,请指定一个 C 文件作为参数。
$ python3 Tools/clinic/clinic.py foo.c
Argument Clinic 会扫描 C 文件,精确查找以下代码:
/*[clinic input]
一旦找到一条后,就会读取所有内容,直至遇到以下代码:
[clinic start generated code]*/
这两行之间的所有内容都是 Argument Clinic 的输入。所有行,包括开始和结束的注释行,统称为 Argument Clinic “块”。
Argument Clinic 在解析某一块时,会生成输出信息。输出信息会紧跟着该块写入 C 文件中,后面还会跟着包含校验和的注释。现在 Argument Clinic 块看起来应如下所示:
/*[clinic input]
... clinic input goes here ...
[clinic start generated code]*/
... clinic output goes here ...
/*[clinic end generated code: checksum=...]*/
如果对同一文件第二次运行 Argument Clinic,则它会丢弃之前的输出信息,并写入带有新校验行的输出信息。不过如果输入没有变化,则输出也不会后变化。
不应去改动 Argument Clinic 块的输出部分。而应去修改输入,直到生成所需的输出信息。(这就是校验和的用途——检测是否有人改动了输出信息,因为在 Argument Clinic 下次写入新的输出时,这些改动都会丢失)。
为了清晰起见,下面列出了 Argument Clinic 用到的术语:
注释的第一行
/*[clinic input]
是 起始行 。注释(
[clinic start generated code]*/
)的最后一行是 结束行。最后一行(
/*[clinic end generated code: checksum=...]*/
)是 校验和行 。在起始行和结束行之间是 输入数据。
在结束行和校验和行之间是 输出数据 。
从开始行到校验和行的所有文本,都是 块。(Argument Clinic 尚未处理成功的块,没有输出或校验和行,但仍视作一个块)。
函数的转换¶
要想了解 Argument Clinic 是如何工作的,最好的方式就是转换一个函数与之合作。下面介绍需遵循的最基本步骤。请注意,若真的准备在 CPython 中进行检查,则应进行更深入的转换,使用一些本文后续会介绍到的高级概念(比如 “返回转换” 和 “自转换”)。但以下例子将维持简单,以供学习。
就此开始
请确保 CPython 是最新的已签出版本。
找到一个调用
PyArg_ParseTuple()
或PyArg_ParseTupleAndKeywords()
,且未被转换为采用 Argument Clinic 的 Python 内置程序。这里用了_pickle.Pickler.dump()
。如果对
PyArg_Parse
函数的调用采用了以下格式:O& O! es es# et et#
或者多次调用
PyArg_ParseTuple()
,则应再选一个函数。Argument Clinic 确实 支持上述这些状况。 但这些都是高阶内容——第一次使用就简单一些吧。此外,如果多次调用
PyArg_ParseTuple()
或PyArg_ParseTupleAndKeywords()
且同一参数需支持不同的类型,或者用到 PyArg_Parse 以外的函数来解析参数,则可能不适合转换为 Argument Clinic。 Argument Clinic 不支持通用函数或多态参数。在函数上方添加以下模板,创建块:
/*[clinic input] [clinic start generated code]*/
剪下文档字符串并粘贴到
[clinic]
行之间,去除所有的无用字符,使其成为一个正确引用的 C 字符串。最有应该只留下带有左侧缩进的文本,且行宽不大于 80 个字符。(参数 Clinic 将保留文档字符串中的缩进。)如果文档字符串的第一行看起来像是函数的签名,就把这一行去掉吧。((文档串不再需要用到它——将来对内置函数调用
help()
时,第一行将根据函数的签名自动建立。)示例:
/*[clinic input] Write a pickled representation of obj to the open file. [clinic start generated code]*/
如果文档字符串中没有“摘要”行,Argument Clinic 会报错。所以应确保带有摘要行。 “摘要”行应为在文档字符串开头的一个段落,由一个80列的单行构成。
(示例的文档字符串只包括一个摘要行,所以示例代码这一步不需改动)。
在文档字符串上方,输入函数的名称,后面是空行。这应是函数的 Python 名称,而且应是句点分隔的完整路径——以模块的名称开始,包含所有子模块名,若函数为类方法则还应包含类名。
示例:
/*[clinic input] _pickle.Pickler.dump Write a pickled representation of obj to the open file. [clinic start generated code]*/
如果是第一次在此 C 文件中用到 Argument Clinic 的模块或类,必须对其进行声明。清晰的 Argument Clinic 写法应于 C 文件顶部附近的某个单独块中声明这些,就像 include 文件和 statics 放在顶部一样。(在这里的示例代码中,将这两个块相邻给出。)
类和模块的名称应与暴露给 Python 的相同。请适时检查
PyModuleDef
或PyTypeObject
中定义的名称。在声明某个类时,还必须指定其 C 语言类型的两个部分:用于指向该类实例的指针的类型声明,和指向该类的
PyTypeObject
指针。示例:
/*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump Write a pickled representation of obj to the open file. [clinic start generated code]*/
声明函数的所有参数。每个参数都应另起一行。所有的参数行都应对齐函数名和文档字符串进行缩进。
这些参数行的常规形式如下:
name_of_parameter: converter
如果参数带有默认值,请加在转换器之后:
name_of_parameter: converter = default_value
Argument Clinic 对 “缺省值” 的支持方式相当复杂;更多信息请参见 关于缺省值的部分 。
在参数行下面添加一个空行。
What's a "converter"? It establishes both the type of the variable used in C, and the method to convert the Python value into a C value at runtime. For now you're going to use what's called a "legacy converter"—a convenience syntax intended to make porting old code into Argument Clinic easier.
每个参数都要从``PyArg_Parse()`` 格式参数中复制其 “格式单元”,并以带引号字符串的形式指定其转换器。(“格式单元”是
format
参数的1-3个字符的正式名称,用于让参数解析函数知晓该变量的类型及转换方法。关于格式单位的更多信息,请参阅 语句解释及变量编译 )。对于像
z#
这样的多字符格式单元,要使用2-3个字符组成的整个字符串。示例:
/*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' Write a pickled representation of obj to the open file. [clinic start generated code]*/
如果函数的格式字符串包含
|
,意味着有些参数带有缺省值,这可以忽略。Argument Clinic 根据参数是否有缺省值来推断哪些参数是可选的。如果函数的格式字符串中包含 $,意味着只接受关键字参数,请在第一个关键字参数之前单独给出一行
*
,缩进与参数行对齐。(
_pickle.Pickler.dump
两种格式字符串都没有,所以这里的示例不用改动。)如果 C 函数调用的是
PyArg_ParseTuple()
(而不是PyArg_ParseTupleAndKeywords()
),那么其所有参数均是仅限位置参数。若要在 Argument Clinic 中把所有参数都标记为只认位置,请在最后一个参数后面一行加入一个
/
,缩进程度与参数行对齐。目前这个标记是全体生效;要么所有参数都是只认位置,要么都不是。(以后 Argument Clinic 可能会放宽这一限制。)
示例:
/*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' / Write a pickled representation of obj to the open file. [clinic start generated code]*/
为每个参数都编写一个文档字符串,这很有意义。但这是可选项;可以跳过这一步。
下面介绍如何添加逐参数的文档字符串。逐参数文档字符串的第一行必须比参数定义多缩进一层。第一行的左边距即确定了所有逐参数文档字符串的左边距;所有文档字符串文本都要同等缩进。文本可以用多行编写。
示例:
/*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/
保存并关闭该文件,然后运行
Tools/clinic/clinic.py
。 运气好的话就万事大吉——程序块现在有了输出信息,并且生成了一个.c.h
文件!在文本编辑器中重新打开该文件,可以看到:/*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ static PyObject * _pickle_Pickler_dump(PicklerObject *self, PyObject *obj) /*[clinic end generated code: output=87ecad1261e02ac7 input=552eb1c0f52260d9]*/
显然,如果 Argument Clinic 未产生任何输出,那是因为在输入信息中发现了错误。继续修正错误并重试,直至 Argument Clinic 正确地处理好文件。
为了便于阅读,大部分“胶水”代码已写入
.c.h
文件中。需在原.c
文件中包含这个文件,通常是在 clinic 模块之后:#include "clinic/_pickle.c.h"
请仔细检查 Argument Clinic 生成的参数解析代码,是否与现有代码基本相同。
首先,确保两种代码使用相同的参数解析函数。已有代码必须调用
PyArg_ParseTuple()
或PyArg_ParseTupleAndKeywords()
;确保 Argument Clinic 生成的代码调用 完全 相同的函数。其次,传给
PyArg_ParseTuple()
或PyArg_ParseTupleAndKeywords()
的格式字符串应该 完全 与原有函数中的相同,直到冒号或分号为止。(Argument Clinic always generates its format strings with a
:
followed by the name of the function. If the existing code's format string ends with;
, to provide usage help, this change is harmless—don't worry about it.)Third, for parameters whose format units require two arguments (like a length variable, or an encoding string, or a pointer to a conversion function), ensure that the second argument is exactly the same between the two invocations.
Fourth, inside the output portion of the block you'll find a preprocessor macro defining the appropriate static
PyMethodDef
structure for this builtin:#define __PICKLE_PICKLER_DUMP_METHODDEF \ {"dump", (PyCFunction)__pickle_Pickler_dump, METH_O, __pickle_Pickler_dump__doc__},
This static structure should be exactly the same as the existing static
PyMethodDef
structure for this builtin.If any of these items differ in any way, adjust your Argument Clinic function specification and rerun
Tools/clinic/clinic.py
until they are the same.Notice that the last line of its output is the declaration of your "impl" function. This is where the builtin's implementation goes. Delete the existing prototype of the function you're modifying, but leave the opening curly brace. Now delete its argument parsing code and the declarations of all the variables it dumps the arguments into. Notice how the Python arguments are now arguments to this impl function; if the implementation used different names for these variables, fix it.
Let's reiterate, just because it's kind of weird. Your code should now look like this:
static return_type your_function_impl(...) /*[clinic end generated code: checksum=...]*/ { ...
Argument Clinic generated the checksum line and the function prototype just above it. You should write the opening (and closing) curly braces for the function, and the implementation inside.
示例:
/*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ PyDoc_STRVAR(__pickle_Pickler_dump__doc__, "Write a pickled representation of obj to the open file.\n" "\n" ... static PyObject * _pickle_Pickler_dump_impl(PicklerObject *self, PyObject *obj) /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ { /* Check whether the Pickler was initialized correctly (issue3664). Developers often forget to call __init__() in their subclasses, which would trigger a segfault without this check. */ if (self->write == NULL) { PyErr_Format(PicklingError, "Pickler.__init__() was not called by %s.__init__()", Py_TYPE(self)->tp_name); return NULL; } if (_Pickler_ClearBuffer(self) < 0) return NULL; ...
Remember the macro with the
PyMethodDef
structure for this function? Find the existingPyMethodDef
structure for this function and replace it with a reference to the macro. (If the builtin is at module scope, this will probably be very near the end of the file; if the builtin is a class method, this will probably be below but relatively near to the implementation.)Note that the body of the macro contains a trailing comma. So when you replace the existing static
PyMethodDef
structure with the macro, don't add a comma to the end.示例:
static struct PyMethodDef Pickler_methods[] = { __PICKLE_PICKLER_DUMP_METHODDEF __PICKLE_PICKLER_CLEAR_MEMO_METHODDEF {NULL, NULL} /* sentinel */ };
Compile, then run the relevant portions of the regression-test suite. This change should not introduce any new compile-time warnings or errors, and there should be no externally-visible change to Python's behavior.
Well, except for one difference:
inspect.signature()
run on your function should now provide a valid signature!Congratulations, you've ported your first function to work with Argument Clinic!
Advanced Topics¶
Now that you've had some experience working with Argument Clinic, it's time for some advanced topics.
Symbolic default values¶
The default value you provide for a parameter can't be any arbitrary expression. Currently the following are explicitly supported:
Numeric constants (integer and float)
字符串常量
True
,False
, andNone
Simple symbolic constants like
sys.maxsize
, which must start with the name of the module
In case you're curious, this is implemented in from_builtin()
in Lib/inspect.py
.
(In the future, this may need to get even more elaborate,
to allow full expressions like CONSTANT - 1
.)
Renaming the C functions and variables generated by Argument Clinic¶
Argument Clinic automatically names the functions it generates for you.
Occasionally this may cause a problem, if the generated name collides with
the name of an existing C function. There's an easy solution: override the names
used for the C functions. Just add the keyword "as"
to your function declaration line, followed by the function name you wish to use.
Argument Clinic will use that function name for the base (generated) function,
then add "_impl"
to the end and use that for the name of the impl function.
For example, if we wanted to rename the C function names generated for
pickle.Pickler.dump
, it'd look like this:
/*[clinic input]
pickle.Pickler.dump as pickler_dumper
...
The base function would now be named pickler_dumper()
,
and the impl function would now be named pickler_dumper_impl()
.
Similarly, you may have a problem where you want to give a parameter
a specific Python name, but that name may be inconvenient in C. Argument
Clinic allows you to give a parameter different names in Python and in C,
using the same "as"
syntax:
/*[clinic input]
pickle.Pickler.dump
obj: object
file as file_obj: object
protocol: object = NULL
*
fix_imports: bool = True
Here, the name used in Python (in the signature and the keywords
array) would be file
, but the C variable would be named file_obj
.
You can use this to rename the self
parameter too!
Converting functions using PyArg_UnpackTuple¶
To convert a function parsing its arguments with PyArg_UnpackTuple()
,
simply write out all the arguments, specifying each as an object
. You
may specify the type
argument to cast the type as appropriate. All
arguments should be marked positional-only (add a /
on a line by itself
after the last argument).
Currently the generated code will use PyArg_ParseTuple()
, but this
will change soon.
Optional Groups¶
Some legacy functions have a tricky approach to parsing their arguments:
they count the number of positional arguments, then use a switch
statement
to call one of several different PyArg_ParseTuple()
calls depending on
how many positional arguments there are. (These functions cannot accept
keyword-only arguments.) This approach was used to simulate optional
arguments back before PyArg_ParseTupleAndKeywords()
was created.
While functions using this approach can often be converted to
use PyArg_ParseTupleAndKeywords()
, optional arguments, and default values,
it's not always possible. Some of these legacy functions have
behaviors PyArg_ParseTupleAndKeywords()
doesn't directly support.
The most obvious example is the builtin function range()
, which has
an optional argument on the left side of its required argument!
Another example is curses.window.addch()
, which has a group of two
arguments that must always be specified together. (The arguments are
called x
and y
; if you call the function passing in x
,
you must also pass in y
—and if you don't pass in x
you may not
pass in y
either.)
In any case, the goal of Argument Clinic is to support argument parsing for all existing CPython builtins without changing their semantics. Therefore Argument Clinic supports this alternate approach to parsing, using what are called optional groups. Optional groups are groups of arguments that must all be passed in together. They can be to the left or the right of the required arguments. They can only be used with positional-only parameters.
注解
Optional groups are only intended for use when converting
functions that make multiple calls to PyArg_ParseTuple()
!
Functions that use any other approach for parsing arguments
should almost never be converted to Argument Clinic using
optional groups. Functions using optional groups currently
cannot have accurate signatures in Python, because Python just
doesn't understand the concept. Please avoid using optional
groups wherever possible.
To specify an optional group, add a [
on a line by itself before
the parameters you wish to group together, and a ]
on a line by itself
after these parameters. As an example, here's how curses.window.addch
uses optional groups to make the first two parameters and the last
parameter optional:
/*[clinic input]
curses.window.addch
[
x: int
X-coordinate.
y: int
Y-coordinate.
]
ch: object
Character to add.
[
attr: long
Attributes for the character.
]
/
...
注释:
For every optional group, one additional parameter will be passed into the impl function representing the group. The parameter will be an int named
group_{direction}_{number}
, where{direction}
is eitherright
orleft
depending on whether the group is before or after the required parameters, and{number}
is a monotonically increasing number (starting at 1) indicating how far away the group is from the required parameters. When the impl is called, this parameter will be set to zero if this group was unused, and set to non-zero if this group was used. (By used or unused, I mean whether or not the parameters received arguments in this invocation.)If there are no required arguments, the optional groups will behave as if they're to the right of the required arguments.
In the case of ambiguity, the argument parsing code favors parameters on the left (before the required parameters).
Optional groups can only contain positional-only parameters.
Optional groups are only intended for legacy code. Please do not use optional groups for new code.
Using real Argument Clinic converters, instead of "legacy converters"¶
To save time, and to minimize how much you need to learn to achieve your first port to Argument Clinic, the walkthrough above tells you to use "legacy converters". "Legacy converters" are a convenience, designed explicitly to make porting existing code to Argument Clinic easier. And to be clear, their use is acceptable when porting code for Python 3.4.
However, in the long term we probably want all our blocks to use Argument Clinic's real syntax for converters. Why? A couple reasons:
The proper converters are far easier to read and clearer in their intent.
There are some format units that are unsupported as "legacy converters", because they require arguments, and the legacy converter syntax doesn't support specifying arguments.
In the future we may have a new argument parsing library that isn't restricted to what
PyArg_ParseTuple()
supports; this flexibility won't be available to parameters using legacy converters.
Therefore, if you don't mind a little extra effort, please use the normal converters instead of legacy converters.
In a nutshell, the syntax for Argument Clinic (non-legacy) converters
looks like a Python function call. However, if there are no explicit
arguments to the function (all functions take their default values),
you may omit the parentheses. Thus bool
and bool()
are exactly
the same converters.
All arguments to Argument Clinic converters are keyword-only. All Argument Clinic converters accept the following arguments:
c_default
The default value for this parameter when defined in C. Specifically, this will be the initializer for the variable declared in the "parse function". See the section on default values for how to use this. Specified as a string.
annotation
The annotation value for this parameter. Not currently supported, because PEP 8 mandates that the Python library may not use annotations.
In addition, some converters accept additional arguments. Here is a list of these arguments, along with their meanings:
accept
A set of Python types (and possibly pseudo-types); this restricts the allowable Python argument to values of these types. (This is not a general-purpose facility; as a rule it only supports specific lists of types as shown in the legacy converter table.)
To accept
None
, addNoneType
to this set.bitwise
Only supported for unsigned integers. The native integer value of this Python argument will be written to the parameter without any range checking, even for negative values.
converter
Only supported by the
object
converter. Specifies the name of a C "converter function" to use to convert this object to a native type.encoding
Only supported for strings. Specifies the encoding to use when converting this string from a Python str (Unicode) value into a C
char *
value.subclass_of
Only supported for the
object
converter. Requires that the Python value be a subclass of a Python type, as expressed in C.type
Only supported for the
object
andself
converters. Specifies the C type that will be used to declare the variable. Default value is"PyObject *"
.zeroes
Only supported for strings. If true, embedded NUL bytes (
'\\0'
) are permitted inside the value. The length of the string will be passed in to the impl function, just after the string parameter, as a parameter named<parameter_name>_length
.
Please note, not every possible combination of arguments will work.
Usually these arguments are implemented by specific PyArg_ParseTuple
format units, with specific behavior. For example, currently you cannot
call unsigned_short
without also specifying bitwise=True
.
Although it's perfectly reasonable to think this would work, these semantics don't
map to any existing format unit. So Argument Clinic doesn't support it. (Or, at
least, not yet.)
下表显示了传统转换器映射到实参转换器的情况。左边是传统转换器,右边是要替换它的文本。
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As an example, here's our sample pickle.Pickler.dump
using the proper
converter:
/*[clinic input]
pickle.Pickler.dump
obj: object
The object to be pickled.
/
Write a pickled representation of obj to the open file.
[clinic start generated code]*/
One advantage of real converters is that they're more flexible than legacy
converters. For example, the unsigned_int
converter (and all the
unsigned_
converters) can be specified without bitwise=True
. Their
default behavior performs range checking on the value, and they won't accept
negative numbers. You just can't do that with a legacy converter!
Argument Clinic will show you all the converters it has
available. For each converter it'll show you all the parameters
it accepts, along with the default value for each parameter.
Just run Tools/clinic/clinic.py --converters
to see the full list.
Py_buffer¶
When using the Py_buffer
converter
(or the 's*'
, 'w*'
, '*y'
, or 'z*'
legacy converters),
you must not call PyBuffer_Release()
on the provided buffer.
Argument Clinic generates code that does it for you (in the parsing function).
Advanced converters¶
Remember those format units you skipped for your first time because they were advanced? Here's how to handle those too.
The trick is, all those format units take arguments—either
conversion functions, or types, or strings specifying an encoding.
(But "legacy converters" don't support arguments. That's why we
skipped them for your first function.) The argument you specified
to the format unit is now an argument to the converter; this
argument is either converter
(for O&
), subclass_of
(for O!
),
or encoding
(for all the format units that start with e
).
When using subclass_of
, you may also want to use the other
custom argument for object()
: type
, which lets you set the type
actually used for the parameter. For example, if you want to ensure
that the object is a subclass of PyUnicode_Type
, you probably want
to use the converter object(type='PyUnicodeObject *', subclass_of='&PyUnicode_Type')
.
One possible problem with using Argument Clinic: it takes away some possible
flexibility for the format units starting with e
. When writing a
PyArg_Parse
call by hand, you could theoretically decide at runtime what
encoding string to pass in to PyArg_ParseTuple()
. But now this string must
be hard-coded at Argument-Clinic-preprocessing-time. This limitation is deliberate;
it made supporting this format unit much easier, and may allow for future optimizations.
This restriction doesn't seem unreasonable; CPython itself always passes in static
hard-coded encoding strings for parameters whose format units start with e
.
Parameter default values¶
Default values for parameters can be any of a number of values. At their simplest, they can be string, int, or float literals:
foo: str = "abc"
bar: int = 123
bat: float = 45.6
They can also use any of Python's built-in constants:
yep: bool = True
nope: bool = False
nada: object = None
There's also special support for a default value of NULL
, and
for simple expressions, documented in the following sections.
The NULL
default value¶
For string and object parameters, you can set them to None
to indicate
that there's no default. However, that means the C variable will be
initialized to Py_None
. For convenience's sakes, there's a special
value called NULL
for just this reason: from Python's perspective it
behaves like a default value of None
, but the C variable is initialized
with NULL
.
Expressions specified as default values¶
The default value for a parameter can be more than just a literal value. It can be an entire expression, using math operators and looking up attributes on objects. However, this support isn't exactly simple, because of some non-obvious semantics.
Consider the following example:
foo: Py_ssize_t = sys.maxsize - 1
sys.maxsize
can have different values on different platforms. Therefore
Argument Clinic can't simply evaluate that expression locally and hard-code it
in C. So it stores the default in such a way that it will get evaluated at
runtime, when the user asks for the function's signature.
What namespace is available when the expression is evaluated? It's evaluated
in the context of the module the builtin came from. So, if your module has an
attribute called "max_widgets
", you may simply use it:
foo: Py_ssize_t = max_widgets
If the symbol isn't found in the current module, it fails over to looking in
sys.modules
. That's how it can find sys.maxsize
for example. (Since you
don't know in advance what modules the user will load into their interpreter,
it's best to restrict yourself to modules that are preloaded by Python itself.)
Evaluating default values only at runtime means Argument Clinic can't compute
the correct equivalent C default value. So you need to tell it explicitly.
When you use an expression, you must also specify the equivalent expression
in C, using the c_default
parameter to the converter:
foo: Py_ssize_t(c_default="PY_SSIZE_T_MAX - 1") = sys.maxsize - 1
Another complication: Argument Clinic can't know in advance whether or not the expression you supply is valid. It parses it to make sure it looks legal, but it can't actually know. You must be very careful when using expressions to specify values that are guaranteed to be valid at runtime!
Finally, because expressions must be representable as static C values, there are many restrictions on legal expressions. Here's a list of Python features you're not permitted to use:
Function calls.
Inline if statements (
3 if foo else 5
).Automatic sequence unpacking (
*[1, 2, 3]
).List/set/dict comprehensions and generator expressions.
Tuple/list/set/dict literals.
Using a return converter¶
By default the impl function Argument Clinic generates for you returns PyObject *
.
But your C function often computes some C type, then converts it into the PyObject *
at the last moment. Argument Clinic handles converting your inputs from Python types
into native C types—why not have it convert your return value from a native C type
into a Python type too?
That's what a "return converter" does. It changes your impl function to return
some C type, then adds code to the generated (non-impl) function to handle converting
that value into the appropriate PyObject *
.
The syntax for return converters is similar to that of parameter converters. You specify the return converter like it was a return annotation on the function itself. Return converters behave much the same as parameter converters; they take arguments, the arguments are all keyword-only, and if you're not changing any of the default arguments you can omit the parentheses.
(If you use both "as"
and a return converter for your function,
the "as"
should come before the return converter.)
There's one additional complication when using return converters: how do you
indicate an error has occurred? Normally, a function returns a valid (non-NULL
)
pointer for success, and NULL
for failure. But if you use an integer return converter,
all integers are valid. How can Argument Clinic detect an error? Its solution: each return
converter implicitly looks for a special value that indicates an error. If you return
that value, and an error has been set (PyErr_Occurred()
returns a true
value), then the generated code will propagate the error. Otherwise it will
encode the value you return like normal.
Currently Argument Clinic supports only a few return converters:
bool
int
unsigned int
long
unsigned int
size_t
Py_ssize_t
float
double
DecodeFSDefault
None of these take parameters. For the first three, return -1 to indicate
error. For DecodeFSDefault
, the return type is const char *
; return a NULL
pointer to indicate an error.
(There's also an experimental NoneType
converter, which lets you
return Py_None
on success or NULL
on failure, without having
to increment the reference count on Py_None
. I'm not sure it adds
enough clarity to be worth using.)
To see all the return converters Argument Clinic supports, along with
their parameters (if any),
just run Tools/clinic/clinic.py --converters
for the full list.
Cloning existing functions¶
If you have a number of functions that look similar, you may be able to use Clinic's "clone" feature. When you clone an existing function, you reuse:
its parameters, including
their names,
their converters, with all parameters,
their default values,
their per-parameter docstrings,
their kind (whether they're positional only, positional or keyword, or keyword only), and
its return converter.
The only thing not copied from the original function is its docstring; the syntax allows you to specify a new docstring.
Here's the syntax for cloning a function:
/*[clinic input]
module.class.new_function [as c_basename] = module.class.existing_function
Docstring for new_function goes here.
[clinic start generated code]*/
(The functions can be in different modules or classes. I wrote
module.class
in the sample just to illustrate that you must
use the full path to both functions.)
Sorry, there's no syntax for partially-cloning a function, or cloning a function then modifying it. Cloning is an all-or nothing proposition.
Also, the function you are cloning from must have been previously defined in the current file.
Calling Python code¶
The rest of the advanced topics require you to write Python code which lives inside your C file and modifies Argument Clinic's runtime state. This is simple: you simply define a Python block.
A Python block uses different delimiter lines than an Argument Clinic function block. It looks like this:
/*[python input]
# python code goes here
[python start generated code]*/
All the code inside the Python block is executed at the time it's parsed. All text written to stdout inside the block is redirected into the "output" after the block.
As an example, here's a Python block that adds a static integer variable to the C code:
/*[python input]
print('static int __ignored_unused_variable__ = 0;')
[python start generated code]*/
static int __ignored_unused_variable__ = 0;
/*[python checksum:...]*/
Using a "self converter"¶
Argument Clinic automatically adds a "self" parameter for you
using a default converter. It automatically sets the type
of this parameter to the "pointer to an instance" you specified
when you declared the type. However, you can override
Argument Clinic's converter and specify one yourself.
Just add your own self
parameter as the first parameter in a
block, and ensure that its converter is an instance of
self_converter
or a subclass thereof.
What's the point? This lets you override the type of self
,
or give it a different default name.
How do you specify the custom type you want to cast self
to?
If you only have one or two functions with the same type for self
,
you can directly use Argument Clinic's existing self
converter,
passing in the type you want to use as the type
parameter:
/*[clinic input]
_pickle.Pickler.dump
self: self(type="PicklerObject *")
obj: object
/
Write a pickled representation of the given object to the open file.
[clinic start generated code]*/
On the other hand, if you have a lot of functions that will use the same
type for self
, it's best to create your own converter, subclassing
self_converter
but overwriting the type
member:
/*[python input]
class PicklerObject_converter(self_converter):
type = "PicklerObject *"
[python start generated code]*/
/*[clinic input]
_pickle.Pickler.dump
self: PicklerObject
obj: object
/
Write a pickled representation of the given object to the open file.
[clinic start generated code]*/
Writing a custom converter¶
As we hinted at in the previous section... you can write your own converters!
A converter is simply a Python class that inherits from CConverter
.
The main purpose of a custom converter is if you have a parameter using
the O&
format unit—parsing this parameter means calling
a PyArg_ParseTuple()
"converter function".
Your converter class should be named *something*_converter
.
If the name follows this convention, then your converter class
will be automatically registered with Argument Clinic; its name
will be the name of your class with the _converter
suffix
stripped off. (This is accomplished with a metaclass.)
You shouldn't subclass CConverter.__init__
. Instead, you should
write a converter_init()
function. converter_init()
always accepts a self
parameter; after that, all additional
parameters must be keyword-only. Any arguments passed in to
the converter in Argument Clinic will be passed along to your
converter_init()
.
There are some additional members of CConverter
you may wish
to specify in your subclass. Here's the current list:
type
The C type to use for this variable.
type
should be a Python string specifying the type, e.g.int
. If this is a pointer type, the type string should end with' *'
.default
The Python default value for this parameter, as a Python value. Or the magic value
unspecified
if there is no default.py_default
default
as it should appear in Python code, as a string. OrNone
if there is no default.c_default
default
as it should appear in C code, as a string. OrNone
if there is no default.c_ignored_default
The default value used to initialize the C variable when there is no default, but not specifying a default may result in an "uninitialized variable" warning. This can easily happen when using option groups—although properly-written code will never actually use this value, the variable does get passed in to the impl, and the C compiler will complain about the "use" of the uninitialized value. This value should always be a non-empty string.
converter
The name of the C converter function, as a string.
impl_by_reference
A boolean value. If true, Argument Clinic will add a
&
in front of the name of the variable when passing it into the impl function.parse_by_reference
A boolean value. If true, Argument Clinic will add a
&
in front of the name of the variable when passing it intoPyArg_ParseTuple()
.
Here's the simplest example of a custom converter, from Modules/zlibmodule.c
:
/*[python input]
class ssize_t_converter(CConverter):
type = 'Py_ssize_t'
converter = 'ssize_t_converter'
[python start generated code]*/
/*[python end generated code: output=da39a3ee5e6b4b0d input=35521e4e733823c7]*/
This block adds a converter to Argument Clinic named ssize_t
. Parameters
declared as ssize_t
will be declared as type Py_ssize_t
, and will
be parsed by the 'O&'
format unit, which will call the
ssize_t_converter
converter function. ssize_t
variables
automatically support default values.
More sophisticated custom converters can insert custom C code to
handle initialization and cleanup.
You can see more examples of custom converters in the CPython
source tree; grep the C files for the string CConverter
.
Writing a custom return converter¶
Writing a custom return converter is much like writing a custom converter. Except it's somewhat simpler, because return converters are themselves much simpler.
Return converters must subclass CReturnConverter
.
There are no examples yet of custom return converters,
because they are not widely used yet. If you wish to
write your own return converter, please read Tools/clinic/clinic.py
,
specifically the implementation of CReturnConverter
and
all its subclasses.
METH_O and METH_NOARGS¶
To convert a function using METH_O
, make sure the function's
single argument is using the object
converter, and mark the
arguments as positional-only:
/*[clinic input]
meth_o_sample
argument: object
/
[clinic start generated code]*/
To convert a function using METH_NOARGS
, just don't specify
any arguments.
You can still use a self converter, a return converter, and specify
a type
argument to the object converter for METH_O
.
tp_new and tp_init functions¶
You can convert tp_new
and tp_init
functions. Just name
them __new__
or __init__
as appropriate. Notes:
The function name generated for
__new__
doesn't end in__new__
like it would by default. It's just the name of the class, converted into a valid C identifier.No
PyMethodDef
#define
is generated for these functions.__init__
functions returnint
, notPyObject *
.Use the docstring as the class docstring.
Although
__new__
and__init__
functions must always accept both theargs
andkwargs
objects, when converting you may specify any signature for these functions that you like. (If your function doesn't support keywords, the parsing function generated will throw an exception if it receives any.)
Changing and redirecting Clinic's output¶
It can be inconvenient to have Clinic's output interspersed with your conventional hand-edited C code. Luckily, Clinic is configurable: you can buffer up its output for printing later (or earlier!), or write its output to a separate file. You can also add a prefix or suffix to every line of Clinic's generated output.
While changing Clinic's output in this manner can be a boon to readability, it may result in Clinic code using types before they are defined, or your code attempting to use Clinic-generated code before it is defined. These problems can be easily solved by rearranging the declarations in your file, or moving where Clinic's generated code goes. (This is why the default behavior of Clinic is to output everything into the current block; while many people consider this hampers readability, it will never require rearranging your code to fix definition-before-use problems.)
Let's start with defining some terminology:
- field
A field, in this context, is a subsection of Clinic's output. For example, the
#define
for thePyMethodDef
structure is a field, calledmethoddef_define
. Clinic has seven different fields it can output per function definition:docstring_prototype docstring_definition methoddef_define impl_prototype parser_prototype parser_definition impl_definition
All the names are of the form
"<a>_<b>"
, where"<a>"
is the semantic object represented (the parsing function, the impl function, the docstring, or the methoddef structure) and"<b>"
represents what kind of statement the field is. Field names that end in"_prototype"
represent forward declarations of that thing, without the actual body/data of the thing; field names that end in"_definition"
represent the actual definition of the thing, with the body/data of the thing. ("methoddef"
is special, it's the only one that ends with"_define"
, representing that it's a preprocessor #define.)- destination
A destination is a place Clinic can write output to. There are five built-in destinations:
block
The default destination: printed in the output section of the current Clinic block.
buffer
A text buffer where you can save text for later. Text sent here is appended to the end of any existing text. It's an error to have any text left in the buffer when Clinic finishes processing a file.
file
A separate "clinic file" that will be created automatically by Clinic. The filename chosen for the file is
{basename}.clinic{extension}
, wherebasename
andextension
were assigned the output fromos.path.splitext()
run on the current file. (Example: thefile
destination for_pickle.c
would be written to_pickle.clinic.c
.)Important: When using a
file
destination, you must check in the generated file!two-pass
A buffer like
buffer
. However, a two-pass buffer can only be dumped once, and it prints out all text sent to it during all processing, even from Clinic blocks after the dumping point.suppress
The text is suppressed—thrown away.
Clinic defines five new directives that let you reconfigure its output.
The first new directive is dump
:
dump <destination>
This dumps the current contents of the named destination into the output of
the current block, and empties it. This only works with buffer
and
two-pass
destinations.
The second new directive is output
. The most basic form of output
is like this:
output <field> <destination>
This tells Clinic to output field to destination. output
also
supports a special meta-destination, called everything
, which tells
Clinic to output all fields to that destination.
output
has a number of other functions:
output push
output pop
output preset <preset>
output push
and output pop
allow you to push and pop
configurations on an internal configuration stack, so that you
can temporarily modify the output configuration, then easily restore
the previous configuration. Simply push before your change to save
the current configuration, then pop when you wish to restore the
previous configuration.
output preset
sets Clinic's output to one of several built-in
preset configurations, as follows:
block
Clinic's original starting configuration. Writes everything immediately after the input block.
Suppress the
parser_prototype
anddocstring_prototype
, write everything else toblock
.file
Designed to write everything to the "clinic file" that it can. You then
#include
this file near the top of your file. You may need to rearrange your file to make this work, though usually this just means creating forward declarations for varioustypedef
andPyTypeObject
definitions.Suppress the
parser_prototype
anddocstring_prototype
, write theimpl_definition
toblock
, and write everything else tofile
.The default filename is
"{dirname}/clinic/{basename}.h"
.buffer
Save up most of the output from Clinic, to be written into your file near the end. For Python files implementing modules or builtin types, it's recommended that you dump the buffer just above the static structures for your module or builtin type; these are normally very near the end. Using
buffer
may require even more editing thanfile
, if your file has staticPyMethodDef
arrays defined in the middle of the file.Suppress the
parser_prototype
,impl_prototype
, anddocstring_prototype
, write theimpl_definition
toblock
, and write everything else tofile
.two-pass
Similar to the
buffer
preset, but writes forward declarations to thetwo-pass
buffer, and definitions to thebuffer
. This is similar to thebuffer
preset, but may require less editing thanbuffer
. Dump thetwo-pass
buffer near the top of your file, and dump thebuffer
near the end just like you would when using thebuffer
preset.Suppresses the
impl_prototype
, write theimpl_definition
toblock
, writedocstring_prototype
,methoddef_define
, andparser_prototype
totwo-pass
, write everything else tobuffer
.partial-buffer
Similar to the
buffer
preset, but writes more things toblock
, only writing the really big chunks of generated code tobuffer
. This avoids the definition-before-use problem ofbuffer
completely, at the small cost of having slightly more stuff in the block's output. Dump thebuffer
near the end, just like you would when using thebuffer
preset.Suppresses the
impl_prototype
, write thedocstring_definition
andparser_definition
tobuffer
, write everything else toblock
.
The third new directive is destination
:
destination <name> <command> [...]
This performs an operation on the destination named name
.
There are two defined subcommands: new
and clear
.
The new
subcommand works like this:
destination <name> new <type>
This creates a new destination with name <name>
and type <type>
.
There are five destination types:
suppress
Throws the text away.
block
Writes the text to the current block. This is what Clinic originally did.
buffer
A simple text buffer, like the "buffer" builtin destination above.
file
A text file. The file destination takes an extra argument, a template to use for building the filename, like so:
destination <name> new <type> <file_template>
The template can use three strings internally that will be replaced by bits of the filename:
- {path}
The full path to the file, including directory and full filename.
- {dirname}
The name of the directory the file is in.
- {basename}
Just the name of the file, not including the directory.
- {basename_root}
Basename with the extension clipped off (everything up to but not including the last '.').
- {basename_extension}
The last '.' and everything after it. If the basename does not contain a period, this will be the empty string.
If there are no periods in the filename, {basename} and {filename} are the same, and {extension} is empty. "{basename}{extension}" is always exactly the same as "{filename}"."
two-pass
A two-pass buffer, like the "two-pass" builtin destination above.
The clear
subcommand works like this:
destination <name> clear
It removes all the accumulated text up to this point in the destination. (I don't know what you'd need this for, but I thought maybe it'd be useful while someone's experimenting.)
The fourth new directive is set
:
set line_prefix "string"
set line_suffix "string"
set
lets you set two internal variables in Clinic.
line_prefix
is a string that will be prepended to every line of Clinic's output;
line_suffix
is a string that will be appended to every line of Clinic's output.
Both of these support two format strings:
{block comment start}
Turns into the string
/*
, the start-comment text sequence for C files.{block comment end}
Turns into the string
*/
, the end-comment text sequence for C files.
The final new directive is one you shouldn't need to use directly,
called preserve
:
preserve
This tells Clinic that the current contents of the output should be kept, unmodified.
This is used internally by Clinic when dumping output into file
files; wrapping
it in a Clinic block lets Clinic use its existing checksum functionality to ensure
the file was not modified by hand before it gets overwritten.
The #ifdef trick¶
If you're converting a function that isn't available on all platforms, there's a trick you can use to make life a little easier. The existing code probably looks like this:
#ifdef HAVE_FUNCTIONNAME
static module_functionname(...)
{
...
}
#endif /* HAVE_FUNCTIONNAME */
And then in the PyMethodDef
structure at the bottom the existing code
will have:
#ifdef HAVE_FUNCTIONNAME
{'functionname', ... },
#endif /* HAVE_FUNCTIONNAME */
In this scenario, you should enclose the body of your impl function inside the #ifdef
,
like so:
#ifdef HAVE_FUNCTIONNAME
/*[clinic input]
module.functionname
...
[clinic start generated code]*/
static module_functionname(...)
{
...
}
#endif /* HAVE_FUNCTIONNAME */
Then, remove those three lines from the PyMethodDef
structure,
replacing them with the macro Argument Clinic generated:
MODULE_FUNCTIONNAME_METHODDEF
(You can find the real name for this macro inside the generated code.
Or you can calculate it yourself: it's the name of your function as defined
on the first line of your block, but with periods changed to underscores,
uppercased, and "_METHODDEF"
added to the end.)
Perhaps you're wondering: what if HAVE_FUNCTIONNAME
isn't defined?
The MODULE_FUNCTIONNAME_METHODDEF
macro won't be defined either!
Here's where Argument Clinic gets very clever. It actually detects that the
Argument Clinic block might be deactivated by the #ifdef
. When that
happens, it generates a little extra code that looks like this:
#ifndef MODULE_FUNCTIONNAME_METHODDEF
#define MODULE_FUNCTIONNAME_METHODDEF
#endif /* !defined(MODULE_FUNCTIONNAME_METHODDEF) */
That means the macro always works. If the function is defined, this turns into the correct structure, including the trailing comma. If the function is undefined, this turns into nothing.
However, this causes one ticklish problem: where should Argument Clinic put this
extra code when using the "block" output preset? It can't go in the output block,
because that could be deactivated by the #ifdef
. (That's the whole point!)
In this situation, Argument Clinic writes the extra code to the "buffer" destination. This may mean that you get a complaint from Argument Clinic:
Warning in file "Modules/posixmodule.c" on line 12357:
Destination buffer 'buffer' not empty at end of file, emptying.
When this happens, just open your file, find the dump buffer
block that
Argument Clinic added to your file (it'll be at the very bottom), then
move it above the PyMethodDef
structure where that macro is used.
Using Argument Clinic in Python files¶
It's actually possible to use Argument Clinic to preprocess Python files. There's no point to using Argument Clinic blocks, of course, as the output wouldn't make any sense to the Python interpreter. But using Argument Clinic to run Python blocks lets you use Python as a Python preprocessor!
Since Python comments are different from C comments, Argument Clinic blocks embedded in Python files look slightly different. They look like this:
#/*[python input]
#print("def foo(): pass")
#[python start generated code]*/
def foo(): pass
#/*[python checksum:...]*/