Add alternative argument clinic DSL PEP

Nick Coghlan 2013-03-13 22:54:21 -07:00
parent 4ffae15866
commit bb22bceeae
1 changed files with 359 additions and 0 deletions

pep-0437.txt Normal file

@ -0,0 +1,359 @@
PEP: 0437
Title: A DSL for specifying signatures, annotations and argument converters
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Krah <skrah@bytereef.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-03-11
Python-Version: 3.4
Post-History:
Resolution:
Abstract
========
The Python C-API currently has no mechanism for specifying and auto-generating
function signatures, annotations or custom argument converters.
There are several possible approaches to the problem. Cython uses *cdef*
definitions in *.pyx* files to generate the required information. However,
CPython's C-API functions often require additional initialization and
cleanup snippets that would be hard to specify in a *cdef*.
PEP 436 proposes a domain-specific language (DSL) enclosed in C comments
that largely resembles a per-parameter configuration file. A preprocessor
reads the comment and emits an argument parsing function, docstrings and
a header for the function that utilizes the results of the parsing step.
The latter function is subsequently referred to as the *implementation
function*.
Rationale
=========
Opinions differ regarding the suitability of the PEP 436 DSL in the context
of a C file. This PEP proposes an alternative DSL. The specific issues with
PEP 436 that spurred the counter proposal will be explained in the final
section of this PEP.
Scope
=====
The PEP focuses exclusively on the DSL. Topics like the location of docstrings
are outside the scope of this PEP. It is, however, vital that the DSL is suitable
for generating custom argument parsers, a feature that is already implemented
in Cython. Therefore, one of the goals of this PEP is to keep the DSL close
to existing solutions, thus facilitating a possible inclusion of the relevant
parts of Cython into the CPython source tree.
DSL overview
============
Type safety and annotations
---------------------------
A conversion from a Python to a C value is fully defined by the type of
the converter function. The PyArg_Parse* family of functions accepts
custom converters in addition to the well-known default converters "i",
"f", etc.
This PEP views the default converters as abstract functions, regardless
of how they are actually implemented.
Include/converters.h
--------------------
Converter functions must be forward-declared. All converter functions
shall be entered into the file Include/converters.h. The file is read
by the preprocessor prior to translating .c files. This is an excerpt::

    /*[converter]

    ##### Default converters #####
    "s": str -> const char *res;
    "s*": [str, bytes, bytearray, rw_buffer] -> Py_buffer &res;
    [...]
    "es#": str -> (const char *res_encoding, char **res, Py_ssize_t *res_length);
    [...]

    ##### Custom converters #####
    path_converter: [str, bytes, int] -> path_t &res;
    OS_STAT_DIR_FD_CONVERTER: [int, None] -> int res;

    [converter_end]*/

Converters are specified by their name, Python input type(s) and C output
type(s). Default converters must have quoted names; custom converters
must have regular names. A Python type is given by its name. If a function
accepts multiple Python types, the set is written in list form.
Since the default converters may have multiple implicit return values,
the C output type(s) are written according to the following convention:
The main return value must be named *res*. This is a placeholder for
the actual variable name given later in the DSL. Additional implicit
return values must be prefixed by *res_*.
By default the variables are passed by value to the implementation function.
If the address should be passed instead, *res* must be prefixed with an
ampersand.
Additional declarations may be placed into .c files. Duplicate declarations
are allowed as long as the function types are identical.
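
To make the declaration syntax concrete, here is a minimal Python sketch that
parses a single converter declaration of the form shown in the excerpt. It is
an illustration only, not the reference implementation (which is written in
Standard ML). The names ``Converter``, ``Output`` and ``parse_converter`` are
hypothetical, and the check that exactly one output is named *res* is an
assumed reading of the naming convention above.

```python
import re
from typing import NamedTuple


class Output(NamedTuple):
    c_type: str       # C type text, e.g. "const char *"
    name: str         # "res" or "res_<suffix>"
    by_address: bool  # True when the declaration says "&res"


class Converter(NamedTuple):
    name: str         # converter name without quotes
    is_default: bool  # quoted names denote default converters
    py_types: list    # accepted Python type names
    outputs: list     # list of Output


# name ":" (single type | [list of types]) "->" outputs ";"
_DECL = re.compile(
    r'^\s*(?P<name>"[^"]+"|\w+)\s*:\s*(?P<inp>\[[^\]]*\]|\w+)'
    r'\s*->\s*(?P<out>.+?)\s*;\s*$'
)
# C type, optional ampersand, then "res" or "res_<suffix>"
_OUT = re.compile(r'^(?P<ctype>.*?)(?P<amp>&?)(?P<var>res(?:_\w+)?)$')


def parse_converter(line):
    m = _DECL.match(line)
    if m is None:
        raise ValueError(f"not a converter declaration: {line!r}")
    raw_name = m["name"]
    py_types = [t.strip() for t in m["inp"].strip("[]").split(",")]
    out = m["out"]
    if out.startswith("(") and out.endswith(")"):
        out = out[1:-1]  # several implicit return values
    outputs = []
    for part in out.split(","):
        om = _OUT.match(part.strip())
        if om is None:
            raise ValueError(f"bad output declaration: {part!r}")
        outputs.append(
            Output(om["ctype"].strip(), om["var"], om["amp"] == "&"))
    # Assumed rule: exactly one output is the main value "res".
    if sum(o.name == "res" for o in outputs) != 1:
        raise ValueError("exactly one output must be named 'res'")
    return Converter(raw_name.strip('"'), raw_name.startswith('"'),
                     py_types, outputs)


buf = parse_converter(
    '"s*": [str, bytes, bytearray, rw_buffer] -> Py_buffer &res;')
enc = parse_converter(
    '"es#": str -> (const char *res_encoding, char **res, '
    'Py_ssize_t *res_length);')
```

With such a table in hand, a preprocessor could look up each annotation in a
function specification and compare the declared C variables against the
converter's output types.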
TBD: Make a list of fantasy types like *rw_buffer*.
Function specifications
-----------------------
Keyword arguments
^^^^^^^^^^^^^^^^^
This example contains the definition of os.stat. The individual sections
will be explained in detail. Grammatically, the whole define block consists
of a function specification and an output section. The function specification
in turn consists of a declaration section, a C-declaration section and a
cleanup code section. Sections within the function specification are
separated in yacc style by '%%'::

    /*[define posix_stat]
    def os.stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True) -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/

Define block
~~~~~~~~~~~~
The function specification block starts with a ``/*[define`` token, followed
by an optional C function name, followed by a right bracket. If the C function
name is not given, it is generated from the declaration name. In the example,
omitting the name *posix_stat* would result in a C function name of *os_stat*.
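
The naming rule can be sketched in a few lines of Python. The helper names
``explicit_c_name`` and ``c_function_name`` are hypothetical, and replacing
dots with underscores is an assumption inferred from the single
*os.stat* to *os_stat* example above.

```python
import re

# Matches "/*[define]" or "/*[define <c_name>]".
_DEFINE = re.compile(r'^/\*\[define(?:\s+(?P<cname>\w+))?\s*\]\s*$')


def explicit_c_name(header_line):
    """Return the C function name from the header, or None if omitted."""
    m = _DEFINE.match(header_line.strip())
    if m is None:
        raise ValueError(f"not a define header: {header_line!r}")
    return m["cname"]


def c_function_name(header_line, declared_name):
    """Pick the explicit name, else derive one from the declaration name."""
    explicit = explicit_c_name(header_line)
    if explicit is not None:
        return explicit
    # Assumption: dots in the Python path become underscores.
    return declared_name.replace(".", "_")


with_name = c_function_name("/*[define posix_stat]", "os.stat")
without_name = c_function_name("/*[define]", "os.stat")
```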
Declaration
~~~~~~~~~~~
The required declaration is (almost) a valid Python function definition. The
'def' keyword and the function body are redundant, but the author of this PEP
finds the definition more readable if they are present.
The function name may be a path instead of a plain identifier. Each argument
is annotated with the name of the converter function that will be applied to it.
Default values are given in the usual Python manner and may be any valid
Python expression.
The return value may be any Python expression. Usually it will be the name
of an object, but alternative return values could be specified in list form.
C-declarations
~~~~~~~~~~~~~~
This section contains C variable declarations. Since the converter functions
have been declared beforehand, the preprocessor can type-check the declarations.
Cleanup
~~~~~~~
The cleanup section contains literal C code that will be inserted unmodified
after the implementation function.
Output
~~~~~~
The output section contains the code emitted by the preprocessor.
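
The sectioning described above can be sketched as a small Python function
that splits the text of a function specification at the '%%' separators.
This is a hedged illustration only: ``split_specification`` is a hypothetical
name, and treating the cleanup section as optional is an assumption based on
the examples in this PEP that contain only one separator.

```python
def split_specification(spec):
    """Split the text between '/*[define ...]' and '[define_end]*/'.

    Returns a (declaration, c_declarations, cleanup) triple of strings.
    """
    sections, current = [], []
    for line in spec.splitlines():
        if line.strip() == "%%":
            # A separator closes the section collected so far.
            sections.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    sections.append("\n".join(current).strip())

    if not 2 <= len(sections) <= 3:
        raise ValueError("expected one or two '%%' separators")
    declaration, c_declarations = sections[0], sections[1]
    # Assumption: the cleanup section may be omitted.
    cleanup = sections[2] if len(sections) == 3 else ""
    return declaration, c_declarations, cleanup


SPEC = '''\
def os.stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
            follow_symlinks: "p" = True) -> os.stat_result: pass
%%
path_t path = PATH_T_INITIALIZE("stat", 0, 1);
int dir_fd = DEFAULT_DIR_FD;
int follow_symlinks = 1;
%%
path_cleanup(&path);
'''

declaration, c_declarations, cleanup = split_specification(SPEC)
```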
Positional-only arguments
^^^^^^^^^^^^^^^^^^^^^^^^^
Functions that do not take keyword arguments are indicated by the presence
of the *slash* special parameter::

    /*[define stat_float_times]
    def os.stat_float_times(/, newval: "i") -> os.stat_result: pass
    %%
    int newval = -1;
    [define_end]*/

The preprocessor translates this definition to a PyArg_ParseTuple() call.
All arguments to the right of the slash are optional arguments.
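
As a rough illustration of that translation, the following Python sketch
builds the text of a ``PyArg_ParseTuple()`` call from the converter
annotations. The ``|`` marker for optional arguments and the ``:name`` suffix
are standard features of the format string; the helper name
``parse_tuple_call``, the choice of function name in the suffix, and the
restriction to single-output default converters are assumptions made for
this example.

```python
def parse_tuple_call(func_name, required, optional):
    """Return the text of a PyArg_ParseTuple() call.

    required and optional are lists of (variable name, format unit) pairs,
    for the arguments left and right of the slash respectively.
    """
    fmt = "".join(unit for _, unit in required)
    if optional:
        # '|' marks the start of the optional arguments.
        fmt += "|" + "".join(unit for _, unit in optional)
    # ':name' sets the function name used in error messages.
    fmt += ":" + func_name
    addresses = "".join(f", &{name}" for name, _ in required + optional)
    return f'PyArg_ParseTuple(args, "{fmt}"{addresses})'


call = parse_tuple_call("stat_float_times", [], [("newval", "i")])
print(call)
```

For the definition above this prints
``PyArg_ParseTuple(args, "|i:stat_float_times", &newval)``.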
Left and right optional arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some legacy functions contain optional argument groups both to the left and
right of a central parameter. It is debatable whether a new tool should support
such functions. For completeness' sake, this is the proposed syntax::

    /*[define]
    def curses.window.addch(y: "i", x: "i", ch: "O", attr: "l") -> None
    where groups = [[ch], [ch, attr], [y, x, ch], [y, x, ch, attr]]
    %%
    int newval = -1;
    [define_end]*/

Here *ch* is the central parameter, *attr* can optionally be added on the
right, and the group [y, x] can optionally be added on the left.
Essentially the rule is that all ordered combinations of the central
parameter and the optional groups must be possible such that no two
combinations have the same length.
This is concisely expressed by putting the central parameter first in
the list and subsequently adding the optional argument groups to the
left and right.
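
The uniqueness rule is what makes dispatch possible: since no two
combinations have the same length, the number of positional arguments alone
selects the group. A minimal Python sketch of this idea follows; the function
names ``check_groups`` and ``bind`` are hypothetical, and the error types are
an arbitrary choice for the example.

```python
def check_groups(groups):
    """Map each group length to its group; reject duplicate lengths."""
    by_length = {}
    for group in groups:
        if len(group) in by_length:
            raise ValueError(
                f"groups {by_length[len(group)]} and {group} "
                "have the same length")
        by_length[len(group)] = group
    return by_length


def bind(groups, args):
    """Assign positional args to parameter names by argument count."""
    by_length = check_groups(groups)
    try:
        names = by_length[len(args)]
    except KeyError:
        raise TypeError(f"no signature takes {len(args)} arguments") from None
    return dict(zip(names, args))


# The groups from the curses.window.addch example.
GROUPS = [["ch"], ["ch", "attr"], ["y", "x", "ch"], ["y", "x", "ch", "attr"]]

one = bind(GROUPS, ("a",))
three = bind(GROUPS, (1, 2, "a"))
```

Here a single argument binds to *ch*, while three arguments bind to *y*, *x*
and *ch*.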
Flexibility in formatting
=========================
If the above os.stat example is considered too compact, it can easily be
formatted this way::

    /*[define posix_stat]
    def os.stat(path: path_converter,
                *,
                dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True)
    -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/

Easy validation of the definition
=================================
How can an inexperienced user validate a definition like os.stat? Simply
by changing os.stat to os_stat, defining missing converters and pasting
the definition into the Python interactive interpreter!
In fact, a converters.py module could be auto-generated from converters.h.
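
The validation step might look like the following Python session, written
here as a script. The two converter stubs stand in for what a hypothetical
auto-generated converters.py could provide; their bodies are placeholders
and not part of this proposal.

```python
import inspect
import os


# Placeholder stubs for the custom converters named in the definition.
def path_converter(value):
    raise NotImplementedError


def OS_STAT_DIR_FD_CONVERTER(value):
    raise NotImplementedError


# The os.stat declaration with "os.stat" changed to "os_stat".
def os_stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
            follow_symlinks: "p" = True) -> os.stat_result: pass


# If the definition compiles, its structure can be inspected directly.
sig = inspect.signature(os_stat)
for name, param in sig.parameters.items():
    print(name, param.kind.name, param.annotation, param.default)
```

A syntax error in the declaration, or a reference to an undeclared converter,
surfaces immediately as a ``SyntaxError`` or ``NameError``.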
Reference implementation
========================
A reference implementation is available at `issue 16612`_. Since this PEP
was written under time constraints and the author is unfamiliar with the
PLY toolchain, the software is written in Standard ML and utilizes the
ml-yacc/ml-lex toolchain.
The grammar is conflict-free and available in ml-yacc readable BNF form.
Two tools are available:
* *printsemant* reads a converter header and a .c file and dumps
the semantically checked parse tree to stdout.
* *preprocess* reads a converter header and a .c file and dumps
the preprocessed .c file to stdout.
Known deficiencies:
* The Python 'test' expression is not semantically checked. The syntax,
however, is checked since it is part of the grammar.
* The lexer does not handle triple-quoted strings.
* The *preprocess* tool does not emit code for the left-and-right optional
arguments case. The *printsemant* tool can deal with this case.
* Since the *preprocess* tool generates the output from the parse
tree, the original indentation of the define block is lost.
Grammar
=======
TBD: The grammar exists in ml-yacc readable form, but should probably be
included here in EBNF notation.
Comparison with PEP 436
=======================
The author of this PEP has the following concerns about the DSL proposed
in PEP 436:
* The whitespace-sensitive, configuration-file-like syntax looks out
of place in a C file.
* The structure of the function definition gets lost in the per-parameter
specifications. Keywords like positional-only, required and keyword-only
are scattered across too many different places.
By contrast, in the alternative DSL the structure of the function
definition can be understood at a single glance.
* The PEP 436 DSL has 14 documented flags and at least one undocumented
(allow_fd) flag. Figuring out which of the 2**15 possible combinations
are valid places an unnecessary burden on the user.
Experience with the PEP-3118 buffer flags has shown that sorting out
(and exhaustively testing!) valid combinations is an extremely tedious
task. The PEP-3118 flags are still not well understood by many people.
By contrast, the alternative DSL has a central file Include/converters.h
that can be quickly searched for the desired converter. Many of the
converters are already known, perhaps even memorized by people (due
to frequent use).
* The PEP 436 DSL allows too much freedom. Types can apparently be omitted;
the preprocessor accepts (and ignores) unknown keywords; and sometimes adding
white space after a docstring results in an assertion error.
The alternative DSL on the other hand allows no such freedoms. Omitting
converter or return value annotations is plainly a syntax error. The
LALR(1) grammar is unambiguous and specified for the complete translation
unit.
Copyright
=========
This document is licensed under the `Open Publication License`_.
References and Footnotes
========================
.. _issue 16612: http://bugs.python.org/issue16612
.. _Open Publication License: http://www.opencontent.org/openpub/