diff --git a/pep-0436.txt b/pep-0436.txt
new file mode 100644
index 000000000..a9534b424
--- /dev/null
+++ b/pep-0436.txt
@@ -0,0 +1,480 @@
+PEP: 436
+Title: The Argument Clinic DSL
+Version: $Revision$
+Last-Modified: $Date$
+Author: Larry Hastings
+Discussions-To: Python-Dev
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 22-Feb-2013
+
+
+Abstract
+========
+
+This document proposes "Argument Clinic", a DSL designed to facilitate
+argument processing for built-in functions in the implementation of
+CPython.
+
+
+Rationale and Goals
+===================
+
+The primary implementation of Python, "CPython", is written in a
+mixture of Python and C. One of the implementation details of CPython
+is what are called "built-in" functions -- functions available to
+Python programs but written in C. When a Python program calls a
+built-in function and passes in arguments, those arguments must be
+translated from Python values into C values. This process is called
+"parsing arguments".
+
+As of CPython 3.3, arguments to functions are primarily parsed with
+one of two functions: the original ``PyArg_ParseTuple()``, [1]_ and
+the more modern ``PyArg_ParseTupleAndKeywords()``. [2]_ The former
+function only handles positional parameters; the latter also
+accommodates keyword and keyword-only parameters, and is preferred for
+new code.
+
+``PyArg_ParseTuple()`` was a reasonable approach when it was first
+conceived. The programmer specified the translation for the arguments
+in a "format string": [3]_ each parameter matched to a "format unit",
+a one-or-two character sequence telling ``PyArg_ParseTuple()`` what
+Python types to accept and how to translate them into the appropriate
+C value for that parameter. There were only a dozen or so of these
+"format units", and each one was distinct and easy to understand.
+
+Over the years the ``PyArg_Parse`` interface has been extended in
+numerous ways.
+The modern API is quite complex, to the point that it
+is somewhat painful to use. Consider:
+
+  * There are now forty different "format units"; a few are even three
+    characters long. This makes it difficult to understand what the
+    format string says without constantly cross-indexing it with the
+    documentation.
+  * There are also six meta-format units that may be buried in the
+    format string. (They are: ``"()|$:;"``.)
+  * The more format units are added, the less likely it is the
+    implementer can pick an easy-to-use mnemonic for the format unit,
+    because the character of choice is probably already in use. In
+    other words, the more format units we have, the more obtuse the
+    format units become.
+  * Several format units are nearly identical to others, having only
+    subtle differences. This makes understanding the exact semantics
+    of the format string even harder.
+  * The docstring is specified as a static C string, which is mildly
+    bothersome to read and edit.
+  * When adding a new parameter to a function using
+    ``PyArg_ParseTupleAndKeywords()``, it's necessary to touch six
+    different places in the code: [4]_
+
+      * Declaring the variable to store the argument.
+      * Passing in a pointer to that variable in the correct spot in
+        ``PyArg_ParseTupleAndKeywords()``, also passing in any
+        "length" or "converter" arguments in the correct order.
+      * Adding the name of the argument in the correct spot of the
+        "keywords" array passed in to
+        ``PyArg_ParseTupleAndKeywords()``.
+      * Adding the format unit to the correct spot in the format
+        string.
+      * Adding the parameter to the prototype in the docstring.
+      * Documenting the parameter in the docstring.
+
+  * There is currently no mechanism for builtin functions to provide
+    their "signature" information (see ``inspect.getfullargspec`` and
+    ``inspect.Signature``). Adding this information using a mechanism
+    similar to the existing ``PyArg_Parse`` functions would require
+    repeating ourselves yet again.
+
+The goal of Argument Clinic is to replace this API with a mechanism
+inheriting none of these downsides:
+
+  * You need specify each parameter only once.
+  * All information about a parameter is kept together in one place.
+  * For each parameter, you specify its type in C; Argument Clinic
+    handles the translation from Python value into C value for you.
+  * Argument Clinic also allows for fine-tuning of argument processing
+    behavior with highly-readable "flags", both per-parameter and
+    applying across the whole function.
+  * Docstrings are written in plain text.
+  * From this, Argument Clinic generates for you all the mundane,
+    repetitious code and data structures CPython needs internally.
+    Once you've specified the interface, the next step is simply to
+    write your implementation using native C types. Every detail of
+    argument parsing is handled for you.
+
+Future goals of Argument Clinic include:
+
+  * providing signature information for builtins, and
+  * speed improvements to the generated code.
+
+
+DSL Syntax Summary
+==================
+
+The Argument Clinic DSL is specified as a comment embedded in a C
+file, as follows. The "Example" column on the right shows you sample
+input to the Argument Clinic DSL, and the "Section" column on the left
+specifies what each line represents in turn.
+
+::
+
+  +-----------------------+-----------------------------------------------------+
+  | Section               | Example                                             |
+  +-----------------------+-----------------------------------------------------+
+  | Clinic DSL start      | /*[clinic]                                          |
+  | Function declaration  | module.function_name -> return_annotation           |
+  | Function flags        | flag flag2 flag3=value                              |
+  | Parameter declaration |       type name = default                           |
+  | Parameter flags       |       flag flag2 flag3=value                        |
+  | Parameter docstring   |           Lorem ipsum dolor sit amet, consectetur   |
+  |                       |           adipisicing elit, sed do eiusmod tempor   |
+  | Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing |
+  |                       | elit, sed do eiusmod tempor incididunt ut labore et |
+  | Clinic DSL end        | [clinic]*/                                          |
+  | Clinic output         | ...                                                 |
+  | Clinic output end     | /*[clinic end output:]*/                            |
+  +-----------------------+-----------------------------------------------------+
+
+
+General Behavior Of the Argument Clinic DSL
+-------------------------------------------
+
+All lines support ``#`` as a line comment delimiter *except*
+docstrings. Blank lines are always ignored.
+
+Like Python itself, leading whitespace is significant in the Argument
+Clinic DSL. The first line of the "function" section is the
+declaration; all subsequent lines at the same indent are function
+flags. Once you indent, the first line is a parameter declaration;
+subsequent lines at that indent are parameter flags. Indent one more
+time for the lines of the parameter docstring. Finally, dedent back
+to the same level as the function declaration for the function
+docstring.
+
+
+Function Declaration
+--------------------
+
+The return annotation is optional. If skipped, the arrow ("``->``")
+must also be omitted.
+
+
+Parameter Declaration
+---------------------
+
+The "type" is a C type. If it's a pointer type, you must specify a
+single space between the type and the "``*``", and zero spaces between
+the "``*``" and the name. (e.g. "``PyObject *foo``", not "``PyObject*
+foo``")
+
+The "name" must be a legal C identifier.
+
+The "default" is a Python value. Default values are optional; if not
+specified you must omit the equals sign too. Parameters which don't
+have a default are implicitly required. The default value is
+dynamically assigned, "live" in the generated C code, and although
+it's specified as a Python value, it's translated into a native C
+value in the generated C code.
+
+It's explicitly permitted to end the parameter declaration line with a
+semicolon, though the semicolon is optional. This is intended to
+allow directly cutting and pasting in declarations from C code.
+However, the preferred style is without the semicolon.
+
+
+Flags
+-----
+
+"Flags" are like "``make -D``" arguments. They're unordered. Flags
+lines are parsed much like the shell (specifically, using
+``shlex.split()`` [5]_ ). You can have as many flag lines as you
+like. Specifying a flag twice is currently an error.
+
+Supported flags for functions:
+
+``basename``
+  The basename to use for the generated C functions. By default this
+  is the name of the function from the DSL, only with periods replaced
+  by underscores.
+
+``positional-only``
+  This function only supports positional parameters, not keyword
+  parameters. See `Functions With Positional-Only Parameters`_ below.
+
+Supported flags for parameters:
+
+``bitwise``
+  If the Python integer passed in is signed, copy the bits directly
+  even if it is negative. Only valid for unsigned integer types.
+
+``converter``
+  Backwards-compatibility support for parameter "converter"
+  functions. [6]_ The value should be the name of the converter
+  function in C. Only valid when the type of the parameter is
+  ``void *``.
+
+``default``
+  The Python value to use in place of the parameter's actual default
+  in Python contexts. Specifically, when specified, this value will
+  be used for the parameter's default in the docstring, and in the
+  ``Signature``.
+  (TBD: If the string is a valid Python expression
+  which can be rendered into a Python value using ``eval()``, then the
+  result of ``eval()`` on it will be used as the default in the
+  ``Signature``.) Ignored if there is no default.
+
+``encoding``
+  Encoding to use when encoding a Unicode string to a ``char *``.
+  Only valid when the type of the parameter is ``char *``.
+
+``group=``
+  This parameter is part of a group of options that must either all be
+  specified or none specified. Parameters in the same "group" must be
+  contiguous. The value of the group flag is the name used for the
+  group variable, and therefore must be legal as a C identifier. Only
+  valid for functions marked "``positional-only``"; see `Functions
+  With Positional-Only Parameters`_ below.
+
+``immutable``
+  Only accept immutable values.
+
+``keyword-only``
+  This parameter (and all subsequent parameters) is keyword-only.
+  Keyword-only parameters must also be optional parameters. Not valid
+  for positional-only functions.
+
+``length``
+  This is an iterable type, and we also want its length. The DSL will
+  generate a second ``Py_ssize_t`` variable; its name will be this
+  parameter's name appended with "``_length``".
+
+``nullable``
+  ``None`` is a legal argument for this parameter. If ``None`` is
+  supplied on the Python side, the equivalent C argument will be
+  ``NULL``. Only valid for pointer types.
+
+``required``
+  Normally any parameter that has a default value is automatically
+  optional. A parameter that has "required" set will be considered
+  required (non-optional) even if it has a default value. The
+  generated documentation will also not show any default value.
+
+``types``
+  Space-separated list of acceptable Python types for this object.
+  There are also four special-case types which represent Python
+  protocols:
+
+  * buffer
+  * mapping
+  * number
+  * sequence
+
+``zeroes``
+  This parameter is a string type, and its value should be allowed to
+  have embedded zeroes.
+  Not valid for all varieties of string
+  parameters.
+
+
+Python Code
+-----------
+
+Argument Clinic also permits embedding Python code inside C files,
+which is executed in-place when Argument Clinic processes the file.
+Embedded code looks like this:
+
+::
+
+    /*[python]
+
+    # this is python code!
+    print("/" + "* Hello world! *" + "/")
+
+    [python]*/
+
+Any Python code is valid. Python code sections in Argument Clinic can
+also be used to modify Clinic's behavior at runtime; for example, see
+`Extending Argument Clinic`_.
+
+
+Output
+======
+
+Argument Clinic writes its output in-line in the C file, immediately
+after the section of Clinic code. For "python" sections, the output
+is everything printed using ``builtins.print``. For "clinic"
+sections, the output is valid C code, including:
+
+  * a ``#define`` providing the correct ``methoddef`` structure for the
+    function
+  * a prototype for the "impl" function -- this is what you'll write
+    to implement this function
+  * a function that handles all argument processing, which calls your
+    "impl" function
+  * the definition line of the "impl" function
+  * and a comment indicating the end of output.
+
+The intention is that you will write the body of your impl function
+immediately after the output -- as in, you write a left-curly-brace
+immediately after the end-of-output comment and write the
+implementation of the builtin in the body there. (It's a bit strange
+at first, but oddly convenient.)
+
+Argument Clinic will define the parameters of the impl function for
+you. The function will take the "self" parameter passed in
+originally, all the parameters you define, and possibly some extra
+generated parameters ("length" parameters; also "group" parameters,
+see next section).
+
+Argument Clinic also writes a checksum for the output section. This
+is a valuable safety feature: if you modify the output by hand, Clinic
+will notice that the checksum doesn't match, and will refuse to
+overwrite the file.
+(You can force Clinic to overwrite with the
+"``-f``" command-line argument; Clinic will also ignore the checksums
+when using the "``-o``" command-line argument.)
+
+
+Functions With Positional-Only Parameters
+=========================================
+
+A significant fraction of Python builtins implemented in C use the
+older positional-only API for processing arguments
+(``PyArg_ParseTuple()``). In some instances, these builtins parse
+their arguments differently based on how many arguments were passed
+in. This can provide some bewildering flexibility: there may be
+groups of optional parameters, which must either all be specified or
+none specified. And occasionally these groups are on the *left!* (For
+example: ``curses.window.addch()``.)
+
+Argument Clinic supports these legacy use-cases with a special set of
+flags. First, set the flag "``positional-only``" on the entire
+function. Then, for every group of parameters that is collectively
+optional, add a "``group=``" flag with a unique string to all the
+parameters in that group. Note that these groups are permitted on the
+right *or left* of any required parameters! However, all groups
+(including the group of required parameters) must be contiguous.
+
+The impl function generated by Clinic will add an extra parameter for
+every group, "``int _group``". This argument will be nonzero
+if the group was specified on this call, and zero if it was not.
+
+Note that when operating in this mode, you cannot specify default
+arguments. You can simulate defaults by putting parameters in
+individual groups and detecting whether or not they were specified;
+generally speaking it's better to simply not use "positional-only"
+where it isn't absolutely necessary. (TBD: It might be possible to
+relax this restriction. But adding default arguments into the mix of
+groups would seemingly make calculating which groups are active a good
+deal harder.)
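
As a rough illustration of the group mechanism described above, the
following Python sketch works out which optional groups were specified
from nothing but the number of arguments passed. It is not taken from
the prototype: the helper names ``group_mappings`` and
``active_groups``, the group names ``yx`` and ``attr``, and the rule
that any subset of optional groups may be active are all assumptions
made for the example.

```python
from itertools import combinations


def group_mappings(required, optional_groups):
    """Map each possible argument count to the group sets that produce it.

    required: number of required parameters (hypothetical input).
    optional_groups: dict of group name -> number of parameters in it.
    """
    mappings = {}
    names = list(optional_groups)
    # Assumption: any subset of the optional groups may be active.
    for size in range(len(names) + 1):
        for active in combinations(names, size):
            count = required + sum(optional_groups[name] for name in active)
            mappings.setdefault(count, []).append(set(active))
    return mappings


def active_groups(nargs, required, optional_groups):
    """Return the single set of groups implied by nargs, or raise."""
    candidates = group_mappings(required, optional_groups).get(nargs, [])
    if not candidates:
        raise TypeError("no valid grouping for %d arguments" % nargs)
    if len(candidates) > 1:
        # Two different group sets need the same number of arguments.
        raise ValueError("ambiguous grouping for %d arguments" % nargs)
    return candidates[0]


# Shaped like curses.window.addch(): an optional (y, x) group on the
# left, one required parameter, and an optional attr group on the right.
addch = {"yx": 2, "attr": 1}
print(sorted(active_groups(3, 1, addch)))
```

With three arguments only the left-hand ``yx`` group can be active,
so the impl function would see a nonzero flag for that group and zero
for ``attr``. Two optional groups of the same size, such as
``{"a": 1, "b": 1}``, make the two-argument call ambiguous, and the
sketch raises ``ValueError`` for it.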
+
+Also, note that it's possible to specify a set of groups to a function
+such that there are several valid mappings from the number of
+arguments to a valid set of groups. If this happens, Clinic will exit
+with an error message. This should not be a problem, as
+positional-only operation is only intended for legacy use cases, and
+all the legacy functions using this quirky behavior should have
+unambiguous mappings.
+
+
+Current Status
+==============
+
+As of this writing, there is a working prototype implementation of
+Argument Clinic available online. [7]_ The prototype implements the
+syntax above, and generates code using the existing ``PyArg_Parse``
+APIs. It supports translating to all current format units except
+``"w*"``. Sample functions using Argument Clinic exercise all major
+features, including positional-only argument parsing.
+
+
+Extending Argument Clinic
+-------------------------
+
+The prototype also currently provides an experimental extension
+mechanism, allowing adding support for new types on-the-fly. See
+``Modules/posixmodule.c`` in the prototype for an example of its use.
+
+
+Notes / TBD
+===========
+
+* Guido proposed having the "function docstring" be hand-written inline,
+  in the middle of the output, something like this:
+
+  ::
+
+    /*[clinic]
+    ... prototype and parameters (including parameter docstrings) go here
+    [clinic]*/
+    ... some output ...
+    /*[clinic docstring start]*/
+    ... hand-edited function docstring goes here <-- you edit this by hand!
+    /*[clinic docstring end]*/
+    ... more output
+    /*[clinic output end]*/
+
+  I tried it this way and don't like it -- I think it's clumsy. I
+  prefer that everything you write goes in one place, rather than
+  having an island of hand-edited stuff in the middle of the DSL
+  output.
+
+* Do we need to support tuple unpacking? (The "``(OOO)``" style
+  format string.) Boy I sure hope not.
+
+* What about Python functions that take no arguments? This syntax
+  doesn't provide for that.
+  Perhaps a lone indented "None" should
+  mean "no arguments"?
+
+* This approach removes some dynamism / flexibility. With the
+  existing syntax one could theoretically pass in different encodings
+  at runtime for the "``es``"/"``et``" format units. AFAICT CPython
+  doesn't do this itself, however it's possible external users might
+  do this. (Trivia: there are no uses of "``es``" exercised by
+  regrtest, and all the uses of "``et``" exercised are in
+  socketmodule.c, except for one in _ssl.c. They're all static,
+  specifying the encoding ``"idna"``.)
+
+* Right now the "basename" flag on a function changes the ``#define
+  methoddef`` name too. Should it, or should the #define'd methoddef
+  name always be ``{module_name}_{function_name}`` ?
+
+
+References
+==========
+
+.. [1] ``PyArg_ParseTuple()``:
+   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple
+
+.. [2] ``PyArg_ParseTupleAndKeywords()``:
+   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords
+
+.. [3] ``PyArg_`` format units:
+   http://docs.python.org/3/c-api/arg.html#strings-and-buffers
+
+.. [4] Keyword parameters for extension functions:
+   http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions
+
+.. [5] ``shlex.split()``:
+   http://docs.python.org/3/library/shlex.html#shlex.split
+
+.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
+   http://docs.python.org/3/c-api/arg.html#other-objects
+
+.. [7] Argument Clinic prototype:
+   https://bitbucket.org/larry/python-clinic/
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: