481 lines
19 KiB
Plaintext
481 lines
19 KiB
Plaintext
|
PEP: 436
|
|||
|
Title: The Argument Clinic DSL
|
|||
|
Version: $Revision$
|
|||
|
Last-Modified: $Date$
|
|||
|
Author: Larry Hastings <larry@hastings.org>
|
|||
|
Discussions-To: Python-Dev <python-dev@python.org>
|
|||
|
Status: Draft
|
|||
|
Type: Standards Track
|
|||
|
Content-Type: text/x-rst
|
|||
|
Created: 22-Feb-2013
|
|||
|
|
|||
|
|
|||
|
Abstract
|
|||
|
========
|
|||
|
|
|||
|
This document proposes "Argument Clinic", a DSL designed to facilitate
|
|||
|
argument processing for built-in functions in the implementation of
|
|||
|
CPython.
|
|||
|
|
|||
|
|
|||
|
Rationale and Goals
|
|||
|
===================
|
|||
|
|
|||
|
The primary implementation of Python, "CPython", is written in a
|
|||
|
mixture of Python and C. One of the implementation details of CPython
|
|||
|
is what are called "built-in" functions -- functions available to
|
|||
|
Python programs but written in C. When a Python program calls a
|
|||
|
built-in function and passes in arguments, those arguments must be
|
|||
|
translated from Python values into C values. This process is called
|
|||
|
"parsing arguments".
|
|||
|
|
|||
|
As of CPython 3.3, arguments to functions are primarily parsed with
|
|||
|
one of two functions: the original ``PyArg_ParseTuple()``, [1]_ and
|
|||
|
the more modern ``PyArg_ParseTupleAndKeywords()``. [2]_ The former
|
|||
|
function only handles positional parameters; the latter also
|
|||
|
accommodates keyword and keyword-only parameters, and is preferred for
|
|||
|
new code.
|
|||
|
|
|||
|
``PyArg_ParseTuple()`` was a reasonable approach when it was first
|
|||
|
conceived. The programmer specified the translation for the arguments
|
|||
|
in a "format string": [3]_ each parameter matched to a "format unit",
|
|||
|
a one-or-two character sequence telling ``PyArg_ParseTuple()`` what
|
|||
|
Python types to accept and how to translate them into the appropriate
|
|||
|
C value for that parameter. There were only a dozen or so of these
|
|||
|
"format units", and each one was distinct and easy to understand.
|
|||
|
|
|||
|
Over the years the ``PyArg_Parse`` interface has been extended in
|
|||
|
numerous ways. The modern API is quite complex, to the point that it
|
|||
|
is somewhat painful to use. Consider:
|
|||
|
|
|||
|
* There are now forty different "format units"; a few are even three
|
|||
|
characters long. This makes it difficult to understand what the
|
|||
|
format string says without constantly cross-indexing it with the
|
|||
|
documentation.
|
|||
|
* There are also six meta-format units that may be buried in the
|
|||
|
format string. (They are: ``"()|$:;"``.)
|
|||
|
* The more format units are added, the less likely it is the
|
|||
|
implementer can pick an easy-to-use mnemonic for the format unit,
|
|||
|
because the character of choice is probably already in use. In
|
|||
|
other words, the more format units we have, the more obtuse the
|
|||
|
format units become.
|
|||
|
* Several format units are nearly identical to others, having only
|
|||
|
subtle differences. This makes understanding the exact semantics
|
|||
|
of the format string even harder.
|
|||
|
* The docstring is specified as a static C string, which is mildly
|
|||
|
bothersome to read and edit.
|
|||
|
* When adding a new parameter to a function using
|
|||
|
``PyArg_ParseTupleAndKeywords()``, it's necessary to touch six
|
|||
|
different places in the code: [4]_
|
|||
|
|
|||
|
* Declaring the variable to store the argument.
|
|||
|
* Passing in a pointer to that variable in the correct spot in
|
|||
|
``PyArg_ParseTupleAndKeywords()``, also passing in any
|
|||
|
"length" or "converter" arguments in the correct order.
|
|||
|
* Adding the name of the argument in the correct spot of the
|
|||
|
"keywords" array passed in to
|
|||
|
``PyArg_ParseTupleAndKeywords()``.
|
|||
|
* Adding the format unit to the correct spot in the format
|
|||
|
string.
|
|||
|
* Adding the parameter to the prototype in the docstring.
|
|||
|
* Documenting the parameter in the docstring.
|
|||
|
|
|||
|
* There is currently no mechanism for builtin functions to provide
|
|||
|
their "signature" information (see ``inspect.getfullargspec`` and
|
|||
|
``inspect.Signature``). Adding this information using a mechanism
|
|||
|
similar to the existing ``PyArg_Parse`` functions would require
|
|||
|
repeating ourselves yet again.
|
|||
|
|
|||
|
The goal of Argument Clinic is to replace this API with a mechanism
|
|||
|
inheriting none of these downsides:
|
|||
|
|
|||
|
* You need specify each parameter only once.
|
|||
|
* All information about a parameter is kept together in one place.
|
|||
|
* For each parameter, you specify its type in C; Argument Clinic
|
|||
|
handles the translation from Python value into C value for you.
|
|||
|
* Argument Clinic also allows for fine-tuning of argument processing
|
|||
|
behavior with highly-readable "flags", both per-parameter and
|
|||
|
applying across the whole function.
|
|||
|
* Docstrings are written in plain text.
|
|||
|
* From this, Argument Clinic generates for you all the mundane,
|
|||
|
repetitious code and data structures CPython needs internally.
|
|||
|
Once you've specified the interface, the next step is simply to
|
|||
|
write your implementation using native C types. Every detail of
|
|||
|
argument parsing is handled for you.
|
|||
|
|
|||
|
Future goals of Argument Clinic include:
|
|||
|
|
|||
|
* providing signature information for builtins, and
|
|||
|
* speed improvements to the generated code.
|
|||
|
|
|||
|
|
|||
|
DSL Syntax Summary
|
|||
|
==================
|
|||
|
|
|||
|
The Argument Clinic DSL is specified as a comment embedded in a C
|
|||
|
file, as follows. The "Example" column on the right shows you sample
|
|||
|
input to the Argument Clinic DSL, and the "Section" column on the left
|
|||
|
specifies what each line represents in turn.
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
+-----------------------+-----------------------------------------------------+
|
|||
|
| Section | Example |
|
|||
|
+-----------------------+-----------------------------------------------------+
|
|||
|
| Clinic DSL start | /*[clinic] |
|
|||
|
| Function declaration | module.function_name -> return_annotation |
|
|||
|
| Function flags | flag flag2 flag3=value |
|
|||
|
| Parameter declaration | type name = default |
|
|||
|
| Parameter flags | flag flag2 flag3=value |
|
|||
|
| Parameter docstring | Lorem ipsum dolor sit amet, consectetur |
|
|||
|
| | adipisicing elit, sed do eiusmod tempor |
|
|||
|
| Function docstring | Lorem ipsum dolor sit amet, consectetur adipisicing |
|
|||
|
| | elit, sed do eiusmod tempor incididunt ut labore et |
|
|||
|
| Clinic DSL end | [clinic]*/ |
|
|||
|
| Clinic output | ... |
|
|||
|
| Clinic output end | /*[clinic end output:<checksum>]*/ |
|
|||
|
+-----------------------+-----------------------------------------------------+
|
|||
|
|
|||
|
|
|||
|
General Behavior Of the Argument Clinic DSL
|
|||
|
-------------------------------------------
|
|||
|
|
|||
|
All lines support ``#`` as a line comment delimiter *except*
|
|||
|
docstrings. Blank lines are always ignored.
|
|||
|
|
|||
|
Like Python itself, leading whitespace is significant in the Argument
|
|||
|
Clinic DSL. The first line of the "function" section is the
|
|||
|
declaration; all subsequent lines at the same indent are function
|
|||
|
flags. Once you indent, the first line is a parameter declaration;
|
|||
|
subsequent lines at that indent are parameter flags. Indent one more
|
|||
|
time for the lines of the parameter docstring. Finally, dedent back
|
|||
|
to the same level as the function declaration for the function
|
|||
|
docstring.
|
|||
|
|
|||
|
|
|||
|
Function Declaration
|
|||
|
--------------------
|
|||
|
|
|||
|
The return annotation is optional. If skipped, the arrow ("``->``")
|
|||
|
must also be omitted.
|
|||
|
|
|||
|
|
|||
|
Parameter Declaration
|
|||
|
---------------------
|
|||
|
|
|||
|
The "type" is a C type. If it's a pointer type, you must specify a
|
|||
|
single space between the type and the "``*``", and zero spaces between
|
|||
|
the "``*``" and the name. (e.g. "``PyObject *foo``", not "``PyObject*
|
|||
|
foo``")
|
|||
|
|
|||
|
The "name" must be a legal C identifier.
|
|||
|
|
|||
|
The "default" is a Python value. Default values are optional; if not
|
|||
|
specified you must omit the equals sign too. Parameters which don't
|
|||
|
have a default are implicitly required. The default value is
|
|||
|
dynamically assigned, "live" in the generated C code, and although
|
|||
|
it's specified as a Python value, it's translated into a native C
|
|||
|
value in the generated C code.
|
|||
|
|
|||
|
It's explicitly permitted to end the parameter declaration line with a
|
|||
|
semicolon, though the semicolon is optional. This is intended to
|
|||
|
allow directly cutting and pasting in declarations from C code.
|
|||
|
However, the preferred style is without the semicolon.
|
|||
|
|
|||
|
|
|||
|
Flags
|
|||
|
-----
|
|||
|
|
|||
|
"Flags" are like "``make -D``" arguments. They're unordered. Flags
|
|||
|
lines are parsed much like the shell (specifically, using
|
|||
|
``shlex.split()`` [5]_ ). You can have as many flag lines as you
|
|||
|
like. Specifying a flag twice is currently an error.
|
|||
|
|
|||
|
Supported flags for functions:
|
|||
|
|
|||
|
``basename``
|
|||
|
The basename to use for the generated C functions. By default this
|
|||
|
is the name of the function from the DSL, only with periods replaced
|
|||
|
by underscores.
|
|||
|
|
|||
|
``positional-only``
|
|||
|
This function only supports positional parameters, not keyword
|
|||
|
parameters. See `Functions With Positional-Only Parameters`_ below.
|
|||
|
|
|||
|
Supported flags for parameters:
|
|||
|
|
|||
|
``bitwise``
|
|||
|
If the Python integer passed in is signed, copy the bits directly
|
|||
|
even if it is negative. Only valid for unsigned integer types.
|
|||
|
|
|||
|
``converter``
|
|||
|
Backwards-compatibility support for parameter "converter"
|
|||
|
functions. [6]_ The value should be the name of the converter
|
|||
|
function in C. Only valid when the type of the parameter is
|
|||
|
``void *``.
|
|||
|
|
|||
|
``default``
|
|||
|
The Python value to use in place of the parameter's actual default
|
|||
|
in Python contexts. Specifically, when specified, this value will
|
|||
|
be used for the parameter's default in the docstring, and in the
|
|||
|
``Signature``. (TBD: If the string is a valid Python expression
|
|||
|
which can be rendered into a Python value using ``eval()``, then the
|
|||
|
result of ``eval()`` on it will be used as the default in the
|
|||
|
``Signature``.) Ignored if there is no default.
|
|||
|
|
|||
|
``encoding``
|
|||
|
Encoding to use when encoding a Unicode string to a ``char *``.
|
|||
|
Only valid when the type of the parameter is ``char *``.
|
|||
|
|
|||
|
``group=``
|
|||
|
This parameter is part of a group of options that must either all be
|
|||
|
specified or none specified. Parameters in the same "group" must be
|
|||
|
contiguous. The value of the group flag is the name used for the
|
|||
|
group variable, and therefore must be legal as a C identifier. Only
|
|||
|
valid for functions marked "``positional-only``"; see `Functions
|
|||
|
With Positional-Only Parameters`_ below.
|
|||
|
|
|||
|
``immutable``
|
|||
|
Only accept immutable values.
|
|||
|
|
|||
|
``keyword-only``
|
|||
|
This parameter (and all subsequent parameters) is keyword-only.
|
|||
|
Keyword-only parameters must also be optional parameters. Not valid
|
|||
|
for positional-only functions.
|
|||
|
|
|||
|
``length``
|
|||
|
This is an iterable type, and we also want its length. The DSL will
|
|||
|
generate a second ``Py_ssize_t`` variable; its name will be this
|
|||
|
parameter's name appended with "``_length``".
|
|||
|
|
|||
|
``nullable``
|
|||
|
``None`` is a legal argument for this parameter. If ``None`` is
|
|||
|
supplied on the Python side, the equivalent C argument will be
|
|||
|
``NULL``. Only valid for pointer types.
|
|||
|
|
|||
|
``required``
|
|||
|
Normally any parameter that has a default value is automatically
|
|||
|
optional. A parameter that has "required" set will be considered
|
|||
|
required (non-optional) even if it has a default value. The
|
|||
|
generated documentation will also not show any default value.
|
|||
|
|
|||
|
``types``
|
|||
|
Space-separated list of acceptable Python types for this object.
|
|||
|
There are also four special-case types which represent Python
|
|||
|
protocols:
|
|||
|
|
|||
|
* buffer
|
|||
|
* mapping
|
|||
|
* number
|
|||
|
* sequence
|
|||
|
|
|||
|
``zeroes``
|
|||
|
This parameter is a string type, and its value should be allowed to
|
|||
|
have embedded zeroes. Not valid for all varieties of string
|
|||
|
parameters.
|
|||
|
|
|||
|
|
|||
|
Python Code
|
|||
|
-----------
|
|||
|
|
|||
|
Argument Clinic also permits embedding Python code inside C files,
|
|||
|
which is executed in-place when Argument Clinic processes the file.
|
|||
|
Embedded code looks like this:
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
/*[python]
|
|||
|
|
|||
|
# this is python code!
|
|||
|
print("/" + "* Hello world! *" + "/")
|
|||
|
|
|||
|
[python]*/
|
|||
|
|
|||
|
Any Python code is valid. Python code sections in Argument Clinic can
|
|||
|
also be used to modify Clinic's behavior at runtime; for example, see
|
|||
|
`Extending Argument Clinic`_.
|
|||
|
|
|||
|
|
|||
|
Output
|
|||
|
======
|
|||
|
|
|||
|
Argument Clinic writes its output in-line in the C file, immediately
|
|||
|
after the section of Clinic code. For "python" sections, the output
|
|||
|
is everything printed using ``builtins.print``. For "clinic"
|
|||
|
sections, the output is valid C code, including:
|
|||
|
|
|||
|
* a ``#define`` providing the correct ``methoddef`` structure for the
|
|||
|
function
|
|||
|
* a prototype for the "impl" function -- this is what you'll write
|
|||
|
to implement this function
|
|||
|
* a function that handles all argument processing, which calls your
|
|||
|
"impl" function
|
|||
|
* the definition line of the "impl" function
|
|||
|
* and a comment indicating the end of output.
|
|||
|
|
|||
|
The intention is that you will write the body of your impl function
|
|||
|
immediately after the output -- as in, you write a left-curly-brace
|
|||
|
immediately after the end-of-output comment and write the
|
|||
|
implementation of the builtin in the body there. (It's a bit strange
|
|||
|
at first, but oddly convenient.)
|
|||
|
|
|||
|
Argument Clinic will define the parameters of the impl function for
|
|||
|
you. The function will take the "self" parameter passed in
|
|||
|
originally, all the parameters you define, and possibly some extra
|
|||
|
generated parameters ("length" parameters; also "group" parameters,
|
|||
|
see next section).
|
|||
|
|
|||
|
Argument Clinic also writes a checksum for the output section. This
|
|||
|
is a valuable safety feature: if you modify the output by hand, Clinic
|
|||
|
will notice that the checksum doesn't match, and will refuse to
|
|||
|
overwrite the file. (You can force Clinic to overwrite with the
|
|||
|
"``-f``" command-line argument; Clinic will also ignore the checksums
|
|||
|
when using the "``-o``" command-line argument.)
|
|||
|
|
|||
|
|
|||
|
Functions With Positional-Only Parameters
|
|||
|
=========================================
|
|||
|
|
|||
|
A significant fraction of Python builtins implemented in C use the
|
|||
|
older positional-only API for processing arguments
|
|||
|
(``PyArg_ParseTuple()``). In some instances, these builtins parse
|
|||
|
their arguments differently based on how many arguments were passed
|
|||
|
in. This can provide some bewildering flexibility: there may be
|
|||
|
groups of optional parameters, which must either all be specified or
|
|||
|
none specified. And occasionally these groups are on the *left!* (For
|
|||
|
example: ``curses.window.addch()``.)
|
|||
|
|
|||
|
Argument Clinic supports these legacy use-cases with a special set of
|
|||
|
flags. First, set the flag "``positional-only``" on the entire
|
|||
|
function. Then, for every group of parameters that is collectively
|
|||
|
optional, add a "``group=``" flag with a unique string to all the
|
|||
|
parameters in that group. Note that these groups are permitted on the
|
|||
|
right *or left* of any required parameters! However, all groups
|
|||
|
(including the group of required parameters) must be contiguous.
|
|||
|
|
|||
|
The impl function generated by Clinic will add an extra parameter for
|
|||
|
every group, "``int <group>_group``". This argument will be nonzero
|
|||
|
if the group was specified on this call, and zero if it was not.
|
|||
|
|
|||
|
Note that when operating in this mode, you cannot specify default
|
|||
|
arguments. You can simulate defaults by putting parameters in
|
|||
|
individual groups and detecting whether or not they were specified;
|
|||
|
generally speaking it's better to simply not use "positional-only"
|
|||
|
where it isn't absolutely necessary. (TBD: It might be possible to
|
|||
|
relax this restriction. But adding default arguments into the mix of
|
|||
|
groups would seemingly make calculating which groups are active a good
|
|||
|
deal harder.)
|
|||
|
|
|||
|
Also, note that it's possible to specify a set of groups to a function
|
|||
|
such that there are several valid mappings from the number of
|
|||
|
arguments to a valid set of groups. If this happens, Clinic will exit
|
|||
|
with an error message. This should not be a problem, as
|
|||
|
positional-only operation is only intended for legacy use cases, and
|
|||
|
all the legacy functions using this quirky behavior should have
|
|||
|
unambiguous mappings.
|
|||
|
|
|||
|
|
|||
|
Current Status
|
|||
|
==============
|
|||
|
|
|||
|
As of this writing, there is a working prototype implementation of
|
|||
|
Argument Clinic available online. [7]_ The prototype implements the
|
|||
|
syntax above, and generates code using the existing ``PyArg_Parse``
|
|||
|
APIs. It supports translating to all current format units except
|
|||
|
``"w*"``. Sample functions using Argument Clinic exercise all major
|
|||
|
features, including positional-only argument parsing.
|
|||
|
|
|||
|
|
|||
|
Extending Argument Clinic
|
|||
|
-------------------------
|
|||
|
|
|||
|
The prototype also currently provides an experimental extension
|
|||
|
mechanism, allowing adding support for new types on-the-fly. See
|
|||
|
``Modules/posixmodule.c`` in the prototype for an example of its use.
|
|||
|
|
|||
|
|
|||
|
Notes / TBD
|
|||
|
===========
|
|||
|
|
|||
|
* Guido proposed having the "function docstring" be hand-written inline,
|
|||
|
in the middle of the output, something like this:
|
|||
|
|
|||
|
::
|
|||
|
|
|||
|
/*[clinic]
|
|||
|
... prototype and parameters (including parameter docstrings) go here
|
|||
|
[clinic]*/
|
|||
|
... some output ...
|
|||
|
/*[clinic docstring start]*/
|
|||
|
... hand-edited function docstring goes here <-- you edit this by hand!
|
|||
|
/*[clinic docstring end]*/
|
|||
|
... more output
|
|||
|
/*[clinic output end]*/
|
|||
|
|
|||
|
I tried it this way and don't like it -- I think it's clumsy. I
|
|||
|
prefer that everything you write goes in one place, rather than
|
|||
|
having an island of hand-edited stuff in the middle of the DSL
|
|||
|
output.
|
|||
|
|
|||
|
* Do we need to support tuple unpacking? (The "``(OOO)``" style
|
|||
|
format string.) Boy I sure hope not.
|
|||
|
|
|||
|
* What about Python functions that take no arguments? This syntax
|
|||
|
doesn't provide for that. Perhaps a lone indented "None" should
|
|||
|
mean "no arguments"?
|
|||
|
|
|||
|
* This approach removes some dynamism / flexibility. With the
|
|||
|
existing syntax one could theoretically pass in different encodings
|
|||
|
at runtime for the "``es``"/"``et``" format units. AFAICT CPython
|
|||
|
doesn't do this itself, however it's possible external users might
|
|||
|
do this. (Trivia: there are no uses of "``es``" exercised by
|
|||
|
regrtest, and all the uses of "``et``" exercised are in
|
|||
|
socketmodule.c, except for one in _ssl.c. They're all static,
|
|||
|
specifying the encoding ``"idna"``.)
|
|||
|
|
|||
|
* Right now the "basename" flag on a function changes the ``#define
|
|||
|
methoddef`` name too. Should it, or should the #define'd methoddef
|
|||
|
name always be ``{module_name}_{function_name}`` ?
|
|||
|
|
|||
|
|
|||
|
References
|
|||
|
==========
|
|||
|
|
|||
|
.. [1] ``PyArg_ParseTuple()``:
|
|||
|
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple
|
|||
|
|
|||
|
.. [2] ``PyArg_ParseTupleAndKeywords()``:
|
|||
|
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords
|
|||
|
|
|||
|
.. [3] ``PyArg_`` format units:
|
|||
|
http://docs.python.org/3/c-api/arg.html#strings-and-buffers
|
|||
|
|
|||
|
.. [4] Keyword parameters for extension functions:
|
|||
|
http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions
|
|||
|
|
|||
|
.. [5] ``shlex.split()``:
|
|||
|
http://docs.python.org/3/library/shlex.html#shlex.split
|
|||
|
|
|||
|
.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
|
|||
|
http://docs.python.org/3/c-api/arg.html#other-objects
|
|||
|
|
|||
|
.. [7] Argument Clinic prototype:
|
|||
|
https://bitbucket.org/larry/python-clinic/
|
|||
|
|
|||
|
|
|||
|
Copyright
|
|||
|
=========
|
|||
|
|
|||
|
This document has been placed in the public domain.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
..
|
|||
|
Local Variables:
|
|||
|
mode: indented-text
|
|||
|
indent-tabs-mode: nil
|
|||
|
sentence-end-double-space: t
|
|||
|
fill-column: 70
|
|||
|
coding: utf-8
|
|||
|
End:
|