Add PEP 445: The Argument Clinic DSL
This commit is contained in:
parent
72af6a15bc
commit
0176d4bc59
|
@ -0,0 +1,481 @@
PEP: 445
Title: The Argument Clinic DSL
Version: $Revision$
Last-Modified: $Date$
Author: Larry Hastings <larry@hastings.org>
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2013


Abstract
========

This document proposes "Argument Clinic", a DSL designed
to facilitate argument processing for built-in functions
in the implementation of CPython.

Rationale and Goals
===================

The primary implementation of Python, "CPython", is written in
a mixture of Python and C. One of the implementation details
of CPython is what are called "built-in" functions--functions
available to Python programs but written in C. When a
Python program calls a built-in function and passes in
arguments, those arguments must be translated from Python
values into C values. This process is called "parsing arguments".

As of CPython 3.3, arguments to functions are primarily
parsed with one of two functions: the original
``PyArg_ParseTuple()``, [1]_ and the more modern
``PyArg_ParseTupleAndKeywords()``. [2]_
The former function only handles positional parameters; the
latter also accommodates keyword and keyword-only parameters,
and is preferred for new code.

``PyArg_ParseTuple()`` was a reasonable approach when it was
first conceived. The programmer specified the translation for
the arguments in a "format string": [3]_ each parameter matched to
a "format unit", a one-or-two character sequence telling
``PyArg_ParseTuple()`` what Python types to accept and how
to translate them into the appropriate C value for that
parameter. There were only a dozen or so of these "format
units", and each one was distinct and easy to understand.

Over the years the ``PyArg_Parse`` interface has been extended in
numerous ways. The modern API is quite complex, to the point
that it is somewhat painful to use. Consider:

* There are now forty different "format units"; a few are
  even three characters long.
  This overload of symbology makes it difficult to understand
  what the format string says without constantly cross-indexing
  it with the documentation.
* There are also six meta-format units that may be buried
  in the format string. (They are: ``"()|$:;"``.)
* The more format units are added, the less likely it is the
  implementor can pick an easy-to-use mnemonic for the format
  unit, because the character of choice is probably already in
  use. In other words, the more format units we have, the more
  obscure the format units become.
* Several format units are nearly identical to others, having
  only subtle differences. This makes understanding the exact
  semantics of the format string even harder.
* The docstring is specified as a static C string,
  which is mildly bothersome to read and edit.
* When adding a new parameter to a function using
  ``PyArg_ParseTupleAndKeywords()``, it's necessary to
  touch six different places in the code: [4]_

  * Declaring the variable to store the argument.
  * Passing in a pointer to that variable in the correct
    spot in ``PyArg_ParseTupleAndKeywords()``, also passing
    in any "length" or "converter" arguments in the correct
    order.
  * Adding the name of the argument in the correct spot
    of the "keywords" array passed in to
    ``PyArg_ParseTupleAndKeywords()``.
  * Adding the format unit to the correct spot in the
    format string.
  * Adding the parameter to the prototype in the
    docstring.
  * Documenting the parameter in the docstring.

* There is currently no mechanism for built-in functions
  to provide their "signature" information (see
  ``inspect.getfullargspec`` and ``inspect.Signature``).
  Adding this information using a mechanism similar to
  the existing ``PyArg_Parse`` functions would require
  repeating ourselves yet again.

The goal of Argument Clinic is to replace this API with a
mechanism inheriting none of these downsides:

* You need specify each parameter only once.
* All information about a parameter is kept together in one place.
* For each parameter, you specify its type in C;
  Argument Clinic handles the translation from
  Python value into C value for you.
* Argument Clinic also allows for fine-tuning
  of argument processing behavior with
  highly-readable "flags", both per-parameter
  and applying across the whole function.
* Docstrings are written in plain text.
* From this, Argument Clinic generates for you all
  the mundane, repetitious code and data structures
  CPython needs internally. Once you've specified
  the interface, the next step is simply to write your
  implementation using native C types. Every detail
  of argument parsing is handled for you.

Future goals of Argument Clinic include:

* providing signature information for builtins, and
* speed improvements to the generated code.

DSL Syntax Summary
==================

The Argument Clinic DSL is specified as a comment
embedded in a C file, as follows. The "Example" column on the
right shows you sample input to the Argument Clinic DSL,
and the "Section" column on the left specifies what each line
represents in turn.

::

    +-----------------------+-----------------------------------------------------+
    | Section               | Example                                             |
    +-----------------------+-----------------------------------------------------+
    | Clinic DSL start      | /*[clinic]                                          |
    | Function declaration  | module.function_name -> return_annotation           |
    | Function flags        | flag flag2 flag3=value                              |
    | Parameter declaration |       type name = default                           |
    | Parameter flags       |       flag flag2 flag3=value                        |
    | Parameter docstring   |           Lorem ipsum dolor sit amet, consectetur   |
    |                       |           adipisicing elit, sed do eiusmod tempor   |
    | Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing |
    |                       | elit, sed do eiusmod tempor incididunt ut labore et |
    | Clinic DSL end        | [clinic]*/                                          |
    | Clinic output         | ...                                                 |
    | Clinic output end     | /*[clinic end output:<checksum>]*/                  |
    +-----------------------+-----------------------------------------------------+


General Behavior Of the Argument Clinic DSL
-------------------------------------------

All lines support ``#`` as a line comment delimiter *except* docstrings.
Blank lines are always ignored.

Like Python itself, leading whitespace is significant in the Argument Clinic
DSL. The first line of the "function" section is the declaration;
all subsequent lines at the same indent are function flags. Once you indent,
the first line is a parameter declaration; subsequent lines at that indent
are parameter flags. Indent one more time for the lines of the parameter
docstring. Finally, outdent back to the same level as the function
declaration for the function docstring.
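The indentation rules above can be sketched in a few lines of Python.
This is an illustration only, not the prototype's parser: the function
name ``classify_clinic_lines`` is hypothetical, comment stripping is
left out, and the sketch assumes well-formed input in which a new
parameter declaration appears only as the first parameter or right
after a parameter docstring.

```python
def classify_clinic_lines(text):
    """Label each non-blank DSL line with the section it belongs to.

    Hypothetical helper: it relies purely on indentation depth.
    """
    labels = []
    seen_declaration = False
    seen_parameter = False
    param_indent = None
    expect_param_decl = True
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines are always ignored
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 0:
            if not seen_declaration:
                kind = "function declaration"
                seen_declaration = True
            elif not seen_parameter:
                kind = "function flags"
            else:
                kind = "function docstring"
        elif param_indent is None or indent == param_indent:
            # First indent level: a declaration, then that parameter's flags.
            param_indent = indent
            seen_parameter = True
            kind = "parameter declaration" if expect_param_decl else "parameter flags"
            expect_param_decl = False
        else:
            # Anything deeper is treated as the parameter docstring
            # (assumption: input is well-formed).
            kind = "parameter docstring"
            expect_param_decl = True
        labels.append((kind, raw.strip()))
    return labels
```

Feeding it a declaration, a flag line, two parameters and a closing
docstring yields one label per line, in the order the "Section" column
of the table lists them.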
|
||||
|
||||
Function Declaration
--------------------

The return annotation is optional. If skipped, the arrow ("``->``") must also be omitted.

Parameter Declaration
---------------------

The "type" is a C type. If it's a pointer type, you must specify
a single space between the type and the "``*``", and zero spaces between
the "``*``" and the name. (For example: "``PyObject *foo``", not
"``PyObject* foo``".)

The "name" must be a legal C identifier.

The "default" is a Python value. Default values are optional;
if not specified you must omit the equals sign too. Parameters
which don't have a default are implicitly required. The default
value is dynamically assigned, "live" in the generated C code,
and although it's specified as a Python value, it's translated
into a native C value in the generated C code.

It's explicitly permitted to end the parameter declaration line
with a semicolon, though the semicolon is optional. This is
intended to allow directly cutting and pasting in declarations
from C code. However, the preferred style is without the semicolon.
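As a rough illustration of the rules in this section, the following
Python sketch splits a parameter declaration line into its type, name
and default. The helper name ``parse_parameter_declaration`` and the
regular expression are assumptions made for this example; the prototype
may parse these lines quite differently. The default is kept as the raw
text after the equals sign rather than being evaluated.

```python
import re

# C type words separated by single spaces, exactly one space, any "*"
# glued to the name, an optional "= default", and an optional ";".
_PARAMETER = re.compile(r"""
    ^(?P<type>[A-Za-z_]\w*(?:[ ][A-Za-z_]\w*)*?)
    [ ](?P<stars>\**)
    (?P<name>[A-Za-z_]\w*)
    (?:\s*=\s*(?P<default>.+?))?
    \s*;?\s*$
""", re.VERBOSE)


def parse_parameter_declaration(line):
    """Hypothetical parser for a 'type name = default' line."""
    match = _PARAMETER.match(line.strip())
    if match is None:
        raise ValueError(f"malformed parameter declaration: {line!r}")
    c_type = match["type"]
    if match["stars"]:
        c_type += " " + match["stars"]
    return {
        "type": c_type,
        "name": match["name"],
        "default": match["default"],  # None means the parameter is required
    }
```

With this pattern, ``PyObject *foo`` is accepted while ``PyObject* foo``
raises ``ValueError``, and a trailing semicolon is silently dropped.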
|
||||
|
||||
|
||||
Flags
-----

"Flags" are like "``make -D``" arguments. They're unordered. Flags lines
are parsed much like the shell (specifically, using ``shlex.split()`` [5]_).
You can have as many flag lines as you like. Specifying a flag twice
is currently an error.
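A minimal sketch of that flag handling, assuming a hypothetical helper
named ``parse_flag_lines``: each line is tokenized with
``shlex.split()``, ``name=value`` tokens are split at the first equals
sign, bare names are stored as ``True``, and a repeated flag raises an
error. Passing ``comments=True`` lets ``shlex`` drop ``#`` comments,
which matches the comment rule described earlier.

```python
import shlex


def parse_flag_lines(lines):
    """Collect flags from one or more flag lines into a dict (illustrative)."""
    flags = {}
    for line in lines:
        for token in shlex.split(line, comments=True):
            name, has_value, value = token.partition("=")
            if name in flags:
                # Specifying a flag twice is an error.
                raise ValueError(f"flag specified twice: {name!r}")
            flags[name] = value if has_value else True
    return flags


flags = parse_flag_lines([
    "positional-only",
    'basename=os_stat  # comment text is ignored',
    'encoding="utf-8"',
])
```

Because the tokens are shell-like, quoting works as it would on a
command line: ``encoding="utf-8"`` is stored as the value ``utf-8``.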
|
||||
|
||||
Supported flags for functions:

``basename``
    The basename to use for the generated C functions.
    By default this is the name of the function from
    the DSL, only with periods replaced by underscores.

``positional-only``
    This function only supports positional parameters,
    not keyword parameters. See `Functions With
    Positional-Only Parameters`_ below.

Supported flags for parameters:

``bitwise``
    If the Python integer passed in is signed, copy the
    bits directly even if it is negative. Only valid
    for unsigned integer types.

``converter``
    Backwards-compatibility support for parameter "converter"
    functions. [6]_ The value should be the name of the converter
    function in C. Only valid when the type of the parameter
    is ``void *``.

``default``
    The Python value to use in place of the parameter's actual
    default in Python contexts. Specifically, when specified,
    this value will be used for the parameter's default in the
    docstring, and in the ``Signature``. (TBD: If the string is a
    valid Python expression, renderable into a Python value
    using ``eval()``, then the result of ``eval()`` on it will be used
    as the default in the ``Signature``.) Ignored if there is no
    default.

``encoding``
    Encoding to use when encoding a Unicode string to a ``char *``.
    Only valid when the type of the parameter is ``char *``.

``group=``
    This parameter is part of a group of options that must either
    all be specified or none specified. Parameters in the same
    "group" must be contiguous. The value of the group flag
    is the name used for the group variable, and therefore must
    be legal as a C identifier. Only valid for functions
    marked "``positional-only``"; see `Functions With
    Positional-Only Parameters`_ below.

``immutable``
    Only accept immutable values.

``keyword-only``
    This parameter (and all subsequent parameters) is
    keyword-only. Keyword-only parameters must also be
    optional parameters. Not valid for positional-only functions.

``length``
    This is an iterable type, and we also want its length. The
    DSL will generate a second ``Py_ssize_t`` variable;
    its name will be this parameter's name appended with
    "``_length``".

``nullable``
    ``None`` is a legal argument for this parameter. If ``None`` is
    supplied on the Python side, the equivalent C argument will be
    ``NULL``. Only valid for pointer types.

``required``
    Normally any parameter that has a default value is
    automatically optional. A parameter that has "required"
    set will be considered required (non-optional) even if
    it has a default value. The generated documentation
    will also not show any default value.

``types``
    Space-separated list of acceptable Python types for this
    object. There are also four special-case types which
    represent Python protocols:

    * buffer
    * mapping
    * number
    * sequence

``zeroes``
    This parameter is a string type, and its value should be
    allowed to have embedded zeroes. Not valid for all
    varieties of string parameters.


Python Code
-----------

Argument Clinic also permits embedding Python code inside C files,
which is executed in-place when Argument Clinic processes the file.
Embedded code looks like this:

::

    /*[python]

    # this is python code!
    print("/" + "* Hello world! *" + "/")

    [python]*/

Any Python code is valid. Python code sections in Argument Clinic
can also be used to modify Clinic's behavior at runtime; for example,
see `Extending Argument Clinic`_.
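One way to picture how such a block could be processed is to execute
the embedded source and capture whatever it prints. The sketch below is
an assumption about the mechanism, not the prototype's code: the helper
name ``run_python_block`` is hypothetical, and it uses
``contextlib.redirect_stdout`` from the standard library to collect the
output.

```python
import contextlib
import io


def run_python_block(source):
    """Execute embedded Python source and return everything it printed."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        # A fresh namespace per block is an assumption of this sketch.
        exec(source, {"__name__": "__clinic__"})
    return captured.getvalue()


block = '# this is python code!\nprint("/" + "* Hello world! *" + "/")\n'
output = run_python_block(block)
```

For the example block shown earlier, the captured text is a C comment,
which is why the string concatenation is needed: writing the comment
delimiters literally would end the enclosing ``/*[python] ... [python]*/``
comment early.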
|
||||
|
||||
|
||||
Output
======

Argument Clinic writes its output in-line in the C file, immediately after
the section of Clinic code. For "python" sections, the output is
everything printed using ``builtins.print``. For "clinic" sections, the
output is valid C code, including:

* a ``#define`` providing the correct ``methoddef`` structure for the
  function
* a prototype for the "impl" function--this is what you'll write to
  implement this function
* a function that handles all argument processing, which calls your
  "impl" function
* the definition line of the "impl" function
* and a comment indicating the end of output.

The intention is that you will write the body of your impl function
immediately after the output--as in, you write a left-curly-brace
immediately after the end-of-output comment and write the implementation
of the builtin in the body there. (It's a bit strange at first--but oddly
convenient.)

Argument Clinic will define the parameters of the impl function for you.
The function will take the "self" parameter passed in originally, all
the parameters you define, and possibly some extra generated parameters
("length" parameters; also "group" parameters, see next section).

Argument Clinic also writes a checksum for the output section. This
is a valuable safety feature: if you modify the output by hand, Clinic
will notice that the checksum doesn't match, and will refuse to
overwrite the file. (You can force Clinic to overwrite with the "``-f``"
command-line argument; Clinic will also ignore the checksums when
using the "``-o``" command-line argument.)
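The checksum guard can be illustrated with a short Python sketch. The
text above does not say which checksum algorithm Clinic uses, so SHA-1
here is purely an assumption, as are the helper names
``output_checksum`` and ``may_overwrite``; the ``force`` argument stands
in for the effect of the "``-f``" option.

```python
import hashlib


def output_checksum(output_text):
    """Checksum of a generated output section (SHA-1 is an assumption)."""
    return hashlib.sha1(output_text.encode("utf-8")).hexdigest()


def may_overwrite(current_output, recorded_checksum, force=False):
    """Allow regeneration only if the output was not edited by hand."""
    if force:
        return True  # mirrors forcing an overwrite from the command line
    return output_checksum(current_output) == recorded_checksum


generated = "static PyObject *\nmodule_function_impl(PyObject *self)\n"
recorded = output_checksum(generated)
```

If someone later edits the generated text, ``may_overwrite`` returns
``False`` for the stale checksum unless ``force`` is set.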
|
||||
|
||||
|
||||
Functions With Positional-Only Parameters
=========================================

A significant fraction of Python builtins implemented in C use the
older positional-only API for processing arguments (``PyArg_ParseTuple()``).
In some instances, these builtins parse their arguments differently
based on how many arguments were passed in. This can provide some
bewildering flexibility: there may be groups of optional parameters,
which must either all be specified or none specified. And occasionally
these groups are on the *left!* (For example: ``curses.window.addch()``.)

Argument Clinic supports these legacy use-cases with a special set
of flags. First, set the flag "``positional-only``" on the entire
function. Then, for every group of parameters that is collectively
optional, add a "``group=``" flag with a unique string to all the
parameters in that group. Note that these groups are permitted on
the right *or left* of any required parameters! However, all groups
(including the group of required parameters) must be contiguous.

The impl function generated by Clinic will add an extra parameter for
every group, "``int <group>_group``". This argument will be nonzero if
the group was specified on this call, and zero if it was not.

Note that when operating in this mode, you cannot specify default
arguments. You can simulate defaults by putting parameters in
individual groups and detecting whether or not they were
specified--but generally speaking it's better to simply not
use "positional-only" where it isn't absolutely necessary. (TBD: It
might be possible to relax this restriction. But adding default
arguments into the mix of groups would seemingly make calculating which
groups are active a good deal harder.)

Also, note that it's possible--even easy--to specify a set of groups
to a function such that there are several valid mappings from the number
of arguments to a valid set of groups. If this happens, Clinic will exit
with an error message. This should not be a problem, as positional-only
operation is only intended for legacy use cases, and all the legacy
functions using this quirky behavior should have unambiguous mappings.
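The ambiguity check described above can be sketched by enumerating
every combination of optional groups and recording how many arguments
each combination consumes. The helper names ``group_mappings`` and
``resolve_groups`` and the group names in the example are hypothetical;
the sketch only counts arguments and does not model where each group
sits relative to the required parameters.

```python
from itertools import combinations


def group_mappings(required_count, group_sizes):
    """Map each possible argument count to the group sets that produce it."""
    by_count = {}
    names = sorted(group_sizes)
    for size in range(len(names) + 1):
        for active in combinations(names, size):
            total = required_count + sum(group_sizes[name] for name in active)
            by_count.setdefault(total, []).append(frozenset(active))
    return by_count


def resolve_groups(required_count, group_sizes):
    """Return {argument count: active groups}; raise if a count is ambiguous."""
    resolved = {}
    for count, candidates in group_mappings(required_count, group_sizes).items():
        if len(candidates) > 1:
            raise ValueError(f"{count} arguments could mean any of {candidates}")
        resolved[count] = candidates[0]
    return resolved


# curses.window.addch([y, x,] ch[, attr]): one required parameter, a
# two-parameter group on the left, a one-parameter group on the right.
addch = resolve_groups(1, {"yx": 2, "attr": 1})
```

The ``addch`` layout resolves cleanly because every argument count from
one to four corresponds to exactly one set of groups. By contrast, a
function with one required parameter and two single-parameter groups is
rejected, since two arguments could activate either group.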
|
||||
|
||||
|
||||
Current Status
==============

As of this writing, there is a working prototype implementation of
Argument Clinic available online. [7]_ The prototype implements
the syntax above, and generates code using the existing ``PyArg_Parse``
APIs. It supports translating to all current format units except ``"w*"``.
Sample functions using Argument Clinic exercise all major features,
including positional-only argument parsing.

Extending Argument Clinic
-------------------------

The prototype also currently provides an experimental extension mechanism,
allowing adding support for new types on-the-fly. See ``Modules/posixmodule.c``
in the prototype for an example of its use.


Notes / TBD
===========

* Guido proposed having the "function docstring" be hand-written inline,
  in the middle of the output, something like this:

  ::

      /*[clinic]
        ... prototype and parameters (including parameter docstrings) go here
      [clinic]*/
      ... some output ...
      /*[clinic docstring start]*/
      ... hand-edited function docstring goes here   <-- you edit this by hand!
      /*[clinic docstring end]*/
      ... more output
      /*[clinic output end]*/

  I tried it this way and don't like it--I think it's clumsy. I prefer that
  everything you write goes in one place, rather than having an island of
  hand-edited stuff in the middle of the DSL output.

* Do we need to support tuple unpacking? (The "``(OOO)``" style format string.)
  Boy I sure hope not.

* What about Python functions that take no arguments? This syntax doesn't
  provide for that. Perhaps a lone indented "None" should mean "no arguments"?

* This approach removes some dynamism / flexibility. With the existing
  syntax one could theoretically pass in different encodings at runtime for
  the "``es``"/"``et``" format units. AFAICT CPython doesn't do this itself;
  however, it's possible external users might do this. (Trivia: there are no
  uses of "``es``" exercised by regrtest, and all the uses of "``et``"
  exercised are in socketmodule.c, except for one in _ssl.c. They're all
  static, specifying the encoding ``"idna"``.)

* Right now the "basename" flag on a function changes the ``#define methoddef`` name
  too. Should it, or should the #define'd methoddef name always be
  ``{module_name}_{function_name}``?


References
==========

.. [1] ``PyArg_ParseTuple()``:
   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple

.. [2] ``PyArg_ParseTupleAndKeywords()``:
   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords

.. [3] ``PyArg_`` format units:
   http://docs.python.org/3/c-api/arg.html#strings-and-buffers

.. [4] Keyword parameters for extension functions:
   http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions

.. [5] ``shlex.split()``:
   http://docs.python.org/3/library/shlex.html#shlex.split

.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
   http://docs.python.org/3/c-api/arg.html#other-objects

.. [7] Argument Clinic prototype:
   https://bitbucket.org/larry/python-clinic/

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: