Add PEP 445: The Argument Clinic DSL

Brett Cannon 2013-02-25 11:39:56 -05:00
parent 72af6a15bc
commit 0176d4bc59
1 changed files with 481 additions and 0 deletions

pep-0445.txt Normal file

@ -0,0 +1,481 @@
PEP: 445
Title: The Argument Clinic DSL
Version: $Revision$
Last-Modified: $Date$
Author: Larry Hastings <larry@hastings.org>
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2013
Abstract
========
This document proposes "Argument Clinic", a DSL designed
to facilitate argument processing for built-in functions
in the implementation of CPython.
Rationale and Goals
===================
The primary implementation of Python, "CPython", is written in
a mixture of Python and C. One of the implementation details
of CPython is what are called "built-in" functions--functions
available to Python programs but written in C. When a
Python program calls a built-in function and passes in
arguments, those arguments must be translated from Python
values into C values. This process is called "parsing arguments".
As of CPython 3.3, arguments to functions are primarily
parsed with one of two functions: the original
``PyArg_ParseTuple()``, [1]_ and the more modern
``PyArg_ParseTupleAndKeywords()``. [2]_
The former function only handles positional parameters; the
latter also accommodates keyword and keyword-only parameters,
and is preferred for new code.
``PyArg_ParseTuple()`` was a reasonable approach when it was
first conceived. The programmer specified the translation for
the arguments in a "format string": [3]_ each parameter matched to
a "format unit", a one-or-two character sequence telling
``PyArg_ParseTuple()`` what Python types to accept and how
to translate them into the appropriate C value for that
parameter. There were only a dozen or so of these "format
units", and each one was distinct and easy to understand.
Over the years the ``PyArg_Parse`` interface has been extended in
numerous ways. The modern API is quite complex, to the point
that it is somewhat painful to use. Consider:
* There are now forty different "format units"; a few are
even three characters long.
This overload of symbology makes it difficult to understand
what the format string says without constantly cross-indexing
it with the documentation.
* There are also six meta-format units that may be buried
in the format string. (They are: ``"()|$:;"``.)
* The more format units are added, the less likely it is that the
implementor can pick an easy-to-use mnemonic for the format
unit, because the character of choice is probably already in
use. In other words, the more format units we have, the more
obscure the format units become.
* Several format units are nearly identical to others, having
only subtle differences. This makes understanding the exact
semantics of the format string even harder.
* The docstring is specified as a static C string,
which is mildly bothersome to read and edit.
* When adding a new parameter to a function using
  ``PyArg_ParseTupleAndKeywords()``, it's necessary to
  touch six different places in the code: [4]_

  * Declaring the variable to store the argument.
  * Passing in a pointer to that variable in the correct
    spot in ``PyArg_ParseTupleAndKeywords()``, also passing
    in any "length" or "converter" arguments in the correct
    order.
  * Adding the name of the argument in the correct spot
    of the "keywords" array passed in to
    ``PyArg_ParseTupleAndKeywords()``.
  * Adding the format unit to the correct spot in the
    format string.
  * Adding the parameter to the prototype in the
    docstring.
  * Documenting the parameter in the docstring.

* There is currently no mechanism for builtin functions
to provide their "signature" information (see
``inspect.getfullargspec`` and ``inspect.Signature``).
Adding this information using a mechanism similar to
the existing ``PyArg_Parse`` functions would require
repeating ourselves yet again.
The goal of Argument Clinic is to replace this API with a
mechanism inheriting none of these downsides:
* You need specify each parameter only once.
* All information about a parameter is kept together in one place.
* For each parameter, you specify its type in C;
Argument Clinic handles the translation from
Python value into C value for you.
* Argument Clinic also allows for fine-tuning
of argument processing behavior with
highly-readable "flags", both per-parameter
and applying across the whole function.
* Docstrings are written in plain text.
* From this, Argument Clinic generates for you all
the mundane, repetitious code and data structures
CPython needs internally. Once you've specified
the interface, the next step is simply to write your
implementation using native C types. Every detail
of argument parsing is handled for you.
Future goals of Argument Clinic include:
* providing signature information for builtins, and
* speed improvements to the generated code.
DSL Syntax Summary
==================
The Argument Clinic DSL is specified as a comment
embedded in a C file, as follows. The "Example" column on the
right shows you sample input to the Argument Clinic DSL,
and the "Section" column on the left specifies what each line
represents in turn.
::

    +-----------------------+-----------------------------------------------------+
    | Section               | Example                                             |
    +-----------------------+-----------------------------------------------------+
    | Clinic DSL start      | /*[clinic]                                          |
    | Function declaration  | module.function_name -> return_annotation           |
    | Function flags        | flag flag2 flag3=value                              |
    | Parameter declaration |     type name = default                             |
    | Parameter flags       |     flag flag2 flag3=value                          |
    | Parameter docstring   |         Lorem ipsum dolor sit amet, consectetur     |
    |                       |         adipisicing elit, sed do eiusmod tempor     |
    | Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing |
    |                       | elit, sed do eiusmod tempor incididunt ut labore et |
    | Clinic DSL end        | [clinic]*/                                          |
    | Clinic output         | ...                                                 |
    | Clinic output end     | /*[clinic end output:<checksum>]*/                  |
    +-----------------------+-----------------------------------------------------+

General Behavior Of the Argument Clinic DSL
-------------------------------------------
All lines support ``#`` as a line comment delimiter *except* docstrings.
Blank lines are always ignored.
Like Python itself, leading whitespace is significant in the Argument Clinic
DSL. The first line of the "function" section is the declaration;
all subsequent lines at the same indent are function flags. Once you indent,
the first line is a parameter declaration; subsequent lines at that indent
are parameter flags. Indent one more time for the lines of the parameter
docstring. Finally, outdent back to the same level as the function
declaration for the function docstring.
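The indentation rules above can be pictured with a small classifier. This is only an illustrative sketch, not the prototype's parser: the function ``classify`` and the names in it are hypothetical, the sample text is taken from the syntax summary table, and the sketch assumes that a line returning to the parameter indent after a parameter docstring starts a new parameter.

```python
# Hypothetical sketch: label each line of a Clinic block by its indent level.
SAMPLE = """\
module.function_name -> return_annotation
flag flag2 flag3=value
    type name = default
    flag flag2 flag3=value
        Lorem ipsum dolor sit amet, consectetur
        adipisicing elit, sed do eiusmod tempor
Lorem ipsum dolor sit amet, consectetur adipisicing
elit, sed do eiusmod tempor incididunt ut labore et
"""


def classify(block):
    """Return a list of (role, text) pairs for the non-blank lines."""
    roles = []
    seen_declaration = False
    param_indent = None    # indent of parameter declarations, once seen
    in_param_doc = False   # True while reading a parameter docstring
    for raw in block.splitlines():
        if not raw.strip():
            continue  # blank lines are always ignored
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 0:
            if not seen_declaration:
                role = "function declaration"
                seen_declaration = True
            elif param_indent is None:
                role = "function flags"
            else:
                role = "function docstring"
        elif param_indent is None or (indent == param_indent and in_param_doc):
            # Assumption: returning to the parameter indent after a
            # parameter docstring begins the next parameter.
            role = "parameter declaration"
            param_indent = indent
            in_param_doc = False
        elif indent == param_indent:
            role = "parameter flags"
        else:
            role = "parameter docstring"
            in_param_doc = True
        roles.append((role, raw.strip()))
    return roles


for role, text in classify(SAMPLE):
    print(f"{role:22} | {text}")
```

Running it on the sample prints one role per line, in the same order as the "Section" column of the summary table.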
Function Declaration
--------------------
The return annotation is optional. If skipped, the arrow ("``->``") must also be omitted.
Parameter Declaration
---------------------
The "type" is a C type. If it's a pointer type, you must specify
a single space between the type and the "``*``", and zero spaces between
the "``*``" and the name. (For example: "``PyObject *foo``", not "``PyObject* foo``".)
The "name" must be a legal C identifier.
The "default" is a Python value. Default values are optional;
if not specified you must omit the equals sign too. Parameters
which don't have a default are implicitly required. The default
value is dynamically assigned, "live" in the generated C code,
and although it's specified as a Python value, it's translated
into a native C value in the generated C code.
It's explicitly permitted to end the parameter declaration line
with a semicolon, though the semicolon is optional. This is
intended to allow directly cutting and pasting in declarations
from C code. However, the preferred style is without the semicolon.
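As a rough illustration of the rules in this section, a parameter declaration line could be split as follows. This is a sketch under stated assumptions, not the prototype's code: ``parse_parameter`` is a hypothetical helper, the default is kept as source text rather than evaluated, and the pointer-spacing check is one possible reading of the rule above.

```python
import re


def parse_parameter(line):
    """Split 'type name = default' into (c_type, name, default_source).

    Hypothetical helper; the default is returned as unevaluated text,
    or None when the declaration has no default.
    """
    line = line.strip()
    if line.endswith(";"):
        line = line[:-1].rstrip()  # the trailing semicolon is optional
    decl, eq, default = line.partition("=")
    # The name is the trailing identifier; everything before it is the type.
    match = re.fullmatch(r"(.*?[\s*])([A-Za-z_]\w*)", decl.strip())
    if match is None:
        raise ValueError(f"cannot parse declaration: {line!r}")
    c_type, name = match.groups()
    if "*" in c_type:
        # Pointer style: one space before the "*", none between "*" and name.
        if re.fullmatch(r".*\S \*+", c_type) is None:
            raise ValueError(f"bad pointer spacing: {line!r}")
    else:
        c_type = c_type.rstrip()
    if eq and not default.strip():
        raise ValueError("omit the equals sign when there is no default")
    return c_type, name, (default.strip() if eq else None)


print(parse_parameter("PyObject *foo"))
print(parse_parameter("int x = 5;"))
```

With this sketch, ``PyObject *foo`` is accepted while ``PyObject* foo`` raises ``ValueError``.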
Flags
-----
"Flags" are like "``make -D``" arguments. They're unordered. Flags lines
are parsed much like the shell (specifically, using ``shlex.split()`` [5]_ ).
You can have as many flag lines as you like. Specifying a flag twice
is currently an error.
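A minimal sketch of this behavior, using ``shlex.split()`` as described: the helper ``parse_flags`` is hypothetical, and storing a bare flag as ``True`` and enabling ``comments=True`` for ``#`` comments are assumptions of this sketch rather than documented behavior of the prototype.

```python
import shlex


def parse_flags(lines):
    """Collect flags from any number of flag lines into one dict.

    Hypothetical helper: 'name' becomes True, 'name=value' becomes a
    string, and a repeated flag is an error.
    """
    flags = {}
    for line in lines:
        # comments=True makes shlex drop everything after a "#".
        for token in shlex.split(line, comments=True):
            name, eq, value = token.partition("=")
            if name in flags:
                raise ValueError(f"flag {name!r} specified twice")
            flags[name] = value if eq else True
    return flags


print(parse_flags(["positional-only basename=my_func",
                   'encoding="utf-8"  # quoted like a shell word']))
```

Because the flags are unordered, a dictionary is a natural container, and the duplicate check falls out of a simple membership test.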
Supported flags for functions:
``basename``
The basename to use for the generated C functions.
By default this is the name of the function from
the DSL, only with periods replaced by underscores.
``positional-only``
This function only supports positional parameters,
not keyword parameters. See `Functions With
Positional-Only Parameters`_ below.
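The default for ``basename`` amounts to a one-line transformation. The helper name ``default_basename`` below is hypothetical and is shown only to make the rule concrete.

```python
def default_basename(dsl_name):
    """Default basename: the DSL function name with periods replaced by underscores."""
    return dsl_name.replace(".", "_")


print(default_basename("module.function_name"))
```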
Supported flags for parameters:
``bitwise``
If the Python integer passed in is signed, copy the
bits directly even if it is negative. Only valid
for unsigned integer types.
``converter``
Backwards-compatibility support for parameter "converter"
functions. [6]_ The value should be the name of the converter
function in C. Only valid when the type of the parameter
is ``void *``.
``default``
The Python value to use in place of the parameter's actual
default in Python contexts. Specifically, when specified,
this value will be used for the parameter's default in the
docstring, and in the ``Signature``. (TBD: If the string is a
valid Python expression, renderable into a Python value
using ``eval()``, then the result of ``eval()`` on it will be used
as the default in the ``Signature``.) Ignored if there is no
default.
``encoding``
Encoding to use when encoding a Unicode string to a ``char *``.
Only valid when the type of the parameter is ``char *``.
``group=``
This parameter is part of a group of options that must either
all be specified or none specified. Parameters in the same
"group" must be contiguous. The value of the group flag
is the name used for the group variable, and therefore must
be legal as a C identifier. Only valid for functions
marked "``positional-only``"; see `Functions With
Positional-Only Parameters`_ below.
``immutable``
Only accept immutable values.
``keyword-only``
This parameter (and all subsequent parameters) is
keyword-only. Keyword-only parameters must also be
optional parameters. Not valid for positional-only functions.
``length``
This is an iterable type, and we also want its length. The
DSL will generate a second ``Py_ssize_t`` variable;
its name will be this parameter's name appended with
"``_length``".
``nullable``
``None`` is a legal argument for this parameter. If ``None`` is
supplied on the Python side, the equivalent C argument will be
``NULL``. Only valid for pointer types.
``required``
Normally any parameter that has a default value is
automatically optional. A parameter that has "required"
set will be considered required (non-optional) even if
it has a default value. The generated documentation
will also not show any default value.
``types``
Space-separated list of acceptable Python types for this
object. There are also four special-case types which
represent Python protocols:
* buffer
* mapping
* number
* sequence
``zeroes``
This parameter is a string type, and its value should be
allowed to have embedded zeroes. Not valid for all
varieties of string parameters.
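One way to picture the effect of the ``bitwise`` flag described above is as masking the Python integer down to the width of the C type. This is an illustration only: ``bitwise_convert`` is a hypothetical helper, and the sketch assumes a two's-complement representation and a caller-supplied bit width.

```python
def bitwise_convert(value, bits=32):
    """Return the low `bits` bits of `value` as a non-negative integer.

    Assumes two's complement; `bits` is the width of the unsigned C type.
    """
    return value & ((1 << bits) - 1)


print(hex(bitwise_convert(-1)))       # all 32 bits set
print(bitwise_convert(-2, bits=8))    # same bit pattern as the signed byte -2
```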
Python Code
-----------
Argument Clinic also permits embedding Python code inside C files,
which is executed in-place when Argument Clinic processes the file.
Embedded code looks like this:
::

    /*[python]
    # this is python code!
    print("/" + "* Hello world! *" + "/")
    [python]*/

Any Python code is valid. Python code sections in Argument Clinic
can also be used to modify Clinic's behavior at runtime; for example,
see `Extending Argument Clinic`_.
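As a sketch of how such a section could be processed, the embedded code can be executed while capturing everything it prints. ``run_python_section`` is a hypothetical name and this is not the prototype's implementation. Note that the example above builds the comment delimiters by concatenation, presumably because a literal ``*/`` inside the section would end the enclosing C comment.

```python
import contextlib
import io


def run_python_section(source):
    """Execute embedded Python code and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh namespace per section is an assumption of this sketch.
        exec(source, {"__name__": "__clinic_sketch__"})
    return buffer.getvalue()


section = 'print("/" + "* Hello world! *" + "/")'
print(run_python_section(section), end="")
```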
Output
======
Argument Clinic writes its output in-line in the C file, immediately after
the section of Clinic code. For "python" sections, the output is
everything printed using ``builtins.print``. For "clinic" sections, the
output is valid C code, including:
* a ``#define`` providing the correct ``methoddef`` structure for the
function
* a prototype for the "impl" function--this is what you'll write to
implement this function
* a function that handles all argument processing, which calls your
"impl" function
* the definition line of the "impl" function
* and a comment indicating the end of output.
The intention is that you will write the body of your impl function
immediately after the output--as in, you write a left-curly-brace
immediately after the end-of-output comment and write the implementation
of the builtin in the body there. (It's a bit strange at first--but oddly
convenient.)
Argument Clinic will define the parameters of the impl function for you.
The function will take the "self" parameter passed in originally, all
the parameters you define, and possibly some extra generated parameters
("length" parameters; also "group" parameters, see next section).
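The parameter list of the impl function can be sketched as a simple derivation from the declared parameters. The helper ``impl_parameters`` is hypothetical, and both the type of ``self`` and the ordering of the generated parameters are assumptions made for illustration; only the ``_length`` and ``_group`` naming comes from this document.

```python
def impl_parameters(params, groups=()):
    """Build the (c_type, name) list for the impl function.

    params: iterable of (c_type, name, flags) where flags is a set of names.
    groups: names used with the "group=" flag, in declaration order.
    """
    result = [("PyObject *", "self")]  # assumed type for "self"
    for c_type, name, flags in params:
        result.append((c_type, name))
        if "length" in flags:
            # The "length" flag adds a Py_ssize_t named <name>_length.
            result.append(("Py_ssize_t", name + "_length"))
    for group in groups:
        # Each group adds an int named <group>_group (placement assumed).
        result.append(("int", group + "_group"))
    return result


for c_type, name in impl_parameters(
        [("char *", "data", {"length"}), ("int", "mode", set())],
        groups=["extra"]):
    print(c_type, name)
```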
Argument Clinic also writes a checksum for the output section. This
is a valuable safety feature: if you modify the output by hand, Clinic
will notice that the checksum doesn't match, and will refuse to
overwrite the file. (You can force Clinic to overwrite with the "``-f``"
command-line argument; Clinic will also ignore the checksums when
using the "``-o``" command-line argument.)
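The checksum check can be sketched as follows. The hash algorithm is not specified in this document, so SHA-1 is used purely as a stand-in, and the helper names are hypothetical; only the shape of the end-of-output marker comes from the syntax summary.

```python
import hashlib


def checksum(output):
    """Checksum of a block of generated output (SHA-1 is an assumption)."""
    return hashlib.sha1(output.encode("utf-8")).hexdigest()


def end_marker(output):
    """Build the end-of-output comment for a block of generated output."""
    return f"/*[clinic end output:{checksum(output)}]*/"


def output_is_unmodified(output, marker):
    """True if `output` still matches the checksum recorded in `marker`."""
    return marker == end_marker(output)


generated = "static PyObject *example(void);\n"
marker = end_marker(generated)
print(output_is_unmodified(generated, marker))
print(output_is_unmodified(generated + "/* hand edit */\n", marker))
```

A tool following this scheme would refuse to overwrite the file whenever the second kind of check fails, unless forced.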
Functions With Positional-Only Parameters
=========================================
A significant fraction of Python builtins implemented in C use the
older positional-only API for processing arguments (``PyArg_ParseTuple()``).
In some instances, these builtins parse their arguments differently
based on how many arguments were passed in. This can provide some
bewildering flexibility: there may be groups of optional parameters,
which must either all be specified or none specified. And occasionally
these groups are on the *left!* (For example: ``curses.window.addch()``.)
Argument Clinic supports these legacy use-cases with a special set
of flags. First, set the flag "``positional-only``" on the entire
function. Then, for every group of parameters that is collectively
optional, add a "``group=``" flag with a unique string to all the
parameters in that group. Note that these groups are permitted on
the right *or left* of any required parameters! However, all groups
(including the group of required parameters) must be contiguous.
The impl function generated by Clinic will add an extra parameter for
every group, "``int <group>_group``". This argument will be nonzero if
the group was specified on this call, and zero if it was not.
Note that when operating in this mode, you cannot specify default
arguments. You can simulate defaults by putting parameters in
individual groups and detecting whether or not they were
specified--but generally speaking it's better to simply not
use "positional-only" where it isn't absolutely necessary. (TBD: It
might be possible to relax this restriction. But adding default
arguments into the mix of groups would seemingly make calculating which
groups are active a good deal harder.)
Also, note that it's possible--even easy--to specify a set of groups
to a function such that there are several valid mappings from the number
of arguments to a valid set of groups. If this happens, Clinic will exit
with an error message. This should not be a problem, as positional-only
operation is only intended for legacy use cases, and all the legacy
functions using this quirky behavior should have unambiguous mappings.
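The ambiguity described above can be detected by enumerating which sets of groups produce each argument count. This is a sketch, not the prototype's algorithm: the helper names are hypothetical, and it assumes that any subset of the optional groups may be active on a given call. The group names ``yx`` and ``attr`` are made up to mirror the shape of ``curses.window.addch()``.

```python
from itertools import combinations


def argument_count_map(required, groups):
    """Map each possible argument count to the group sets that produce it.

    required: number of required parameters.
    groups: dict of group name -> number of parameters in that group.
    """
    mapping = {}
    names = list(groups)
    for size in range(len(names) + 1):
        for active in combinations(names, size):
            count = required + sum(groups[name] for name in active)
            mapping.setdefault(count, []).append(active)
    return mapping


def ambiguous_counts(required, groups):
    """Argument counts that map to more than one set of active groups."""
    mapping = argument_count_map(required, groups)
    return sorted(count for count, sets in mapping.items() if len(sets) > 1)


# One required parameter, a two-parameter group and a one-parameter group.
print(ambiguous_counts(1, {"yx": 2, "attr": 1}))
# Two single-parameter groups cannot be told apart by count alone.
print(ambiguous_counts(1, {"a": 1, "b": 1}))
```

An empty result means every argument count selects exactly one set of groups; a non-empty result is the situation in which Clinic would exit with an error.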
Current Status
==============
As of this writing, there is a working prototype implementation of
Argument Clinic available online. [7]_ The prototype implements
the syntax above, and generates code using the existing ``PyArg_Parse``
APIs. It supports translating to all current format units except ``"w*"``.
Sample functions using Argument Clinic exercise all major features,
including positional-only argument parsing.
Extending Argument Clinic
-------------------------
The prototype also currently provides an experimental extension mechanism,
allowing adding support for new types on-the-fly. See ``Modules/posixmodule.c``
in the prototype for an example of its use.
Notes / TBD
===========
* Guido proposed having the "function docstring" be hand-written inline,
in the middle of the output, something like this:
::

    /*[clinic]
    ... prototype and parameters (including parameter docstrings) go here
    [clinic]*/
    ... some output ...
    /*[clinic docstring start]*/
    ... hand-edited function docstring goes here <-- you edit this by hand!
    /*[clinic docstring end]*/
    ... more output
    /*[clinic output end]*/

I tried it this way and don't like it--I think it's clumsy. I prefer that
everything you write goes in one place, rather than having an island of
hand-edited stuff in the middle of the DSL output.
* Do we need to support tuple unpacking? (The "``(OOO)``" style format string.)
Boy I sure hope not.
* What about Python functions that take no arguments? This syntax doesn't
provide for that. Perhaps a lone indented "None" should mean "no arguments"?
* This approach removes some dynamism / flexibility. With the existing
syntax one could theoretically pass in different encodings at runtime for
the "``es``"/"``et``" format units. AFAICT CPython doesn't do this itself,
however it's possible external users might do this. (Trivia: there are no
uses of "``es``" exercised by regrtest, and all the uses of "``et``"
exercised are in ``socketmodule.c``, except for one in ``_ssl.c``. They're all
static, specifying the encoding ``"idna"``.)
* Right now the "basename" flag on a function changes the ``#define methoddef`` name
too. Should it, or should the #define'd methoddef name always be
``{module_name}_{function_name}`` ?
References
==========
.. [1] ``PyArg_ParseTuple()``:
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple
.. [2] ``PyArg_ParseTupleAndKeywords()``:
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords
.. [3] ``PyArg_`` format units:
http://docs.python.org/3/c-api/arg.html#strings-and-buffers
.. [4] Keyword parameters for extension functions:
http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions
.. [5] ``shlex.split()``:
http://docs.python.org/3/library/shlex.html#shlex.split
.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
http://docs.python.org/3/c-api/arg.html#other-objects
.. [7] Argument Clinic prototype:
https://bitbucket.org/larry/python-clinic/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: