PEP: 437
Title: A DSL for specifying signatures, annotations and argument converters
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Krah <skrah@bytereef.org>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2013
Python-Version: 3.4
Post-History:
Resolution: https://mail.python.org/pipermail/python-dev/2013-May/126117.html


Abstract
========

The Python C-API currently has no mechanism for specifying and auto-generating
function signatures, annotations or custom argument converters.

There are several possible approaches to the problem. Cython uses *cdef*
definitions in *.pyx* files to generate the required information. However,
CPython's C-API functions often require additional initialization and
cleanup snippets that would be hard to specify in a *cdef*.

:pep:`436` proposes a domain-specific language (DSL) enclosed in C comments
that largely resembles a per-parameter configuration file. A preprocessor
reads the comment and emits an argument parsing function, docstrings and
a header for the function that utilizes the results of the parsing step.

The latter function is subsequently referred to as the *implementation
function*.


Rejection Notice
================

This PEP was rejected by Guido van Rossum at PyCon US 2013. However, several
of the specific issues raised by this PEP were taken into account when
designing the `second iteration of the PEP 436 DSL`_.


Rationale
=========

Opinions differ regarding the suitability of the :pep:`436` DSL in the context
of a C file. This PEP proposes an alternative DSL. The specific issues with
:pep:`436` that spurred the counter proposal will be explained in the final
section of this PEP.


Scope
=====

The PEP focuses exclusively on the DSL. Topics like the output locations of
docstrings or the generated code are outside the scope of this PEP.

It is, however, vital that the DSL is suitable for generating custom argument
parsers, a feature that is already implemented in Cython. Therefore, one of
the goals of this PEP is to keep the DSL close to existing solutions, thus
facilitating a possible inclusion of the relevant parts of Cython into the
CPython source tree.


DSL overview
============

Type safety and annotations
---------------------------

A conversion from a Python to a C value is fully defined by the type of
the converter function. The PyArg_Parse* family of functions accepts
custom converters in addition to the well-known default converters "i",
"f", etc.

This PEP views the default converters as abstract functions, regardless
of how they are actually implemented.


Include/converters.h
--------------------

Converter functions must be forward-declared. All converter functions
shall be entered into the file Include/converters.h. The file is read
by the preprocessor prior to translating .c files. This is an excerpt::

    /*[converter]
    ##### Default converters #####
    "s": str -> const char *res;
    "s*": [str, bytes, bytearray, rw_buffer] -> Py_buffer &res;
    [...]
    "es#": str -> (const char *res_encoding, char **res, Py_ssize_t *res_length);
    [...]
    ##### Custom converters #####
    path_converter: [str, bytes, int] -> path_t &res;
    OS_STAT_DIR_FD_CONVERTER: [int, None] -> int res;
    [converter_end]*/


Converters are specified by their name, Python input type(s) and C output
type(s). Default converters must have quoted names, custom converters must
have regular names. A Python type is given by its name. If a function accepts
multiple Python types, the set is written in list form.

Since the default converters may have multiple implicit return values,
the C output type(s) are written according to the following convention:

The main return value must be named *res*. This is a placeholder for
the actual variable name given later in the DSL. Additional implicit
return values must be prefixed by *res_*.

By default the variables are passed by value to the implementation function.
If the address should be passed instead, *res* must be prefixed with an
ampersand.
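
The declaration convention above is regular enough to be read mechanically.
The following Python sketch is an illustration only and is not the reference
implementation (which is written in Standard ML). The names
``parse_converter`` and ``Converter`` and the regular expression are
assumptions made for this example; they do not appear in the PEP.

```python
import re
from typing import List, NamedTuple


class Converter(NamedTuple):
    name: str             # converter name without the surrounding quotes
    is_default: bool      # True for quoted (default) converters
    py_types: List[str]   # accepted Python type names
    c_outputs: List[str]  # C output declarations in source order
    by_address: bool      # True if the main value is written as "&res"


# Hypothetical pattern: name ':' python-type(s) '->' c-output(s) ';'
_DECL = re.compile(
    r'^\s*("[^"]+"|\w+)\s*:\s*(\[[^\]]*\]|\w+)\s*->\s*(.+?)\s*;\s*$'
)


def parse_converter(line: str) -> Converter:
    """Parse one declaration line from a converter block."""
    match = _DECL.match(line)
    if match is None:
        raise ValueError("not a converter declaration: %r" % line)
    name, py_part, c_part = match.groups()

    py_types = [t.strip() for t in py_part.strip("[]").split(",")]
    if c_part.startswith("(") and c_part.endswith(")"):
        c_part = c_part[1:-1]
    c_outputs = [d.strip() for d in c_part.split(",")]

    # The last identifier of each C declaration is the placeholder name.
    names = []
    for decl in c_outputs:
        ident = re.search(r"(\w+)$", decl)
        if ident is None:
            raise ValueError("no placeholder name in %r" % decl)
        names.append(ident.group(1))

    # Enforce the convention: one "res", everything else "res_*".
    if names.count("res") != 1:
        raise ValueError("exactly one output must be named 'res'")
    if any(n != "res" and not n.startswith("res_") for n in names):
        raise ValueError("additional outputs must be prefixed by 'res_'")

    main = c_outputs[names.index("res")]
    return Converter(name.strip('"'), name.startswith('"'), py_types,
                     c_outputs, "&res" in main)
```

Applied to the ``"es#"`` line of the excerpt, the sketch yields three C
outputs, of which ``char **res`` is the main return value passed by value.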


Additional declarations may be placed into .c files. Duplicate declarations
are allowed as long as the function types are identical.

Authors are encouraged to declare custom converter types a second time right
above the converter function definition. The preprocessor will then catch
any mismatch between the declarations.

In order to keep the converter complexity manageable, PY_SSIZE_T_CLEAN will
be deprecated and Py_ssize_t will be assumed for all length arguments.

TBD: Make a list of fantasy types like *rw_buffer*.


Function specifications
-----------------------

Keyword arguments
^^^^^^^^^^^^^^^^^

This example contains the definition of os.stat. The individual sections will
be explained in detail. Grammatically, the whole define block consists of a
function specification and an output section. The function specification in
turn consists of a declaration section, an optional C-declaration section and
an optional cleanup code section. Sections within the function specification
are separated in yacc style by '%%'::

    /*[define posix_stat]
    def os.stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True) -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/


Define block
~~~~~~~~~~~~

The function specification block starts with a ``/*[define`` token, followed
by an optional C function name, followed by a right bracket. If the C function
name is not given, it is generated from the declaration name. In the example,
omitting the name *posix_stat* would result in a C function name of *os_stat*.
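
The naming rule can be sketched in a few lines of Python. The PEP only
states the single example (``os.stat`` becomes ``os_stat``); replacing every
dot with an underscore is an assumption that generalizes from it, and the
helper name ``c_function_name`` is hypothetical.

```python
import re

# Matches the opening token with an optional explicit C function name.
_DEFINE = re.compile(r"/\*\[define(?:\s+(\w+))?\]")


def c_function_name(define_token: str, declared_name: str) -> str:
    """Return the explicit C name if given, else derive one from the path."""
    match = _DEFINE.fullmatch(define_token.strip())
    if match is None:
        raise ValueError("not a define token: %r" % define_token)
    explicit = match.group(1)
    if explicit is not None:
        return explicit
    # Assumed rule: dots in the declaration path become underscores.
    return declared_name.replace(".", "_")
```

With this sketch, ``c_function_name("/*[define posix_stat]", "os.stat")``
returns the explicit name and ``c_function_name("/*[define]", "os.stat")``
falls back to the derived one.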


Declaration
~~~~~~~~~~~

The required declaration is (almost) a valid Python function definition. The
'def' keyword and the function body are redundant, but the author of this PEP
finds the definition more readable if they are present.

The function name may be a path instead of a plain identifier. Each argument
is annotated with the name of the converter function that will be applied to it.

Default values are given in the usual Python manner and may be any valid
Python expression.

The return value may be any Python expression. Usually it will be the name
of an object, but alternative return values could be specified in list form.


C-declarations
~~~~~~~~~~~~~~

This optional section contains C variable declarations. Since the converter
functions have been declared beforehand, the preprocessor can type-check
the declarations.
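
A rough idea of such a type check, written as a Python sketch under stated
assumptions: the table ``CONVERTER_C_TYPES`` is hand-written here (a real
tool would derive it from Include/converters.h), the mapping of ``"p"`` to
``int`` is inferred from the os.stat example, and the declaration parser is
deliberately primitive and handles only ``type name = initializer;`` lines.

```python
# Hypothetical table: converter name as written in the DSL -> C type.
CONVERTER_C_TYPES = {
    "path_converter": "path_t",
    "OS_STAT_DIR_FD_CONVERTER": "int",
    '"p"': "int",
}


def parse_c_declaration(line):
    """Split 'type name = init;' into (type, name, init). Primitive on purpose."""
    decl, _, init = line.strip().rstrip(";").partition("=")
    ctype, _, name = decl.strip().rpartition(" ")
    return ctype.strip(), name, init.strip()


def check_declarations(annotations, c_lines):
    """Compare each declared C type with the type its converter produces.

    annotations maps parameter names to converter names.
    """
    for line in c_lines:
        ctype, name, _ = parse_c_declaration(line)
        if name not in annotations:
            raise TypeError("%r is not a declared parameter" % name)
        expected = CONVERTER_C_TYPES[annotations[name]]
        if ctype != expected:
            raise TypeError("%s: declared %r, converter yields %r"
                            % (name, ctype, expected))
```

For the os.stat example, all three declarations pass; changing
``int dir_fd`` to ``long dir_fd`` would raise ``TypeError``.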


Cleanup
~~~~~~~

The optional cleanup section contains literal C code that will be inserted
unmodified after the implementation function.


Output
~~~~~~

The output section contains the code emitted by the preprocessor.
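
To make the section layout concrete, the following Python sketch splits the
text of a function specification at the '%%' separators. It is an
illustration, not the ml-yacc grammar of the reference implementation. The
function name ``split_specification`` is hypothetical, and treating a lone
second section as the C-declaration section is an assumption based on the
examples in this PEP.

```python
def split_specification(spec):
    """Split a function specification into its three sections.

    Returns (declaration, c_declarations, cleanup); the optional
    sections are returned as empty strings when absent.
    """
    sections = [[]]
    for line in spec.splitlines():
        if line.strip() == "%%":
            sections.append([])      # a separator starts the next section
        else:
            sections[-1].append(line)
    if len(sections) > 3:
        raise ValueError("at most two '%%' separators are allowed")

    parts = ["\n".join(lines).strip() for lines in sections]
    parts += [""] * (3 - len(parts))  # pad the missing optional sections
    declaration, c_declarations, cleanup = parts
    if not declaration:
        raise ValueError("the declaration section is required")
    return declaration, c_declarations, cleanup
```

Fed the body of the os.stat define block, the sketch returns the ``def``
line(s), the three C variable declarations, and ``path_cleanup(&path);``.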


Positional-only arguments
^^^^^^^^^^^^^^^^^^^^^^^^^

Functions that do not take keyword arguments are indicated by the presence
of the *slash* special parameter::

    /*[define stat_float_times]
    def os.stat_float_times(/, newval: "i") -> os.stat_result: pass
    %%
    int newval = -1;
    [define_end]*/

The preprocessor translates this definition to a PyArg_ParseTuple() call.
All arguments to the right of the slash are optional arguments.
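
How such a definition could map onto a PyArg_ParseTuple() format string is
sketched below in Python. The sketch relies on two documented features of
the format syntax: ``|`` marks the remaining arguments as optional and
``:`` introduces the function name used in error messages. Emitting ``O&``
for custom converters, the helper name ``parse_tuple_format`` and the input
representation are assumptions; the actual output of the *preprocess* tool
may differ.

```python
def parse_tuple_format(params, func_name):
    """Build a PyArg_ParseTuple format string for a positional-only function.

    params is a list of (name, converter) pairs plus one "/" marker;
    converters are given as written in the DSL, e.g. '"i"' or 'path_converter'.
    """
    required, optional = [], []
    seen_slash = False
    for param in params:
        if param == "/":
            if seen_slash:
                raise ValueError("only one slash is allowed")
            seen_slash = True
            continue
        _, converter = param
        # Quoted names are default format units; others are custom converters.
        unit = converter.strip('"') if converter.startswith('"') else "O&"
        (optional if seen_slash else required).append(unit)
    if not seen_slash:
        raise ValueError("a positional-only definition needs a slash")

    fmt = "".join(required)
    if optional:
        fmt += "|" + "".join(optional)
    return fmt + ":" + func_name
```

For the example above, ``parse_tuple_format(["/", ("newval", '"i"')],
"stat_float_times")`` produces a format string in which ``newval`` is
optional.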


Left and right optional arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some legacy functions contain optional argument groups both to the left and
right of a central parameter. It is debatable whether a new tool should support
such functions. For completeness' sake, this is the proposed syntax::

    /*[define]
    def curses.window.addch(y: "i", x: "i", ch: "O", attr: "l") -> None: pass
    where groups = [[ch], [ch, attr], [y, x, ch], [y, x, ch, attr]]
    [define_end]*/

Here *ch* is the central parameter, *attr* can optionally be added on the
right, and the group [y, x] can optionally be added on the left.

Essentially the rule is that all ordered combinations of the central
parameter and the optional groups must be possible such that no two
combinations have the same length.

This is concisely expressed by putting the central parameter first in
the list and subsequently adding the optional argument groups to the
left and right.
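
Because no two groups share a length, the number of positional arguments
alone selects the group. The Python sketch below checks the rule and then
binds arguments by length. It is illustrative only; ``check_groups`` and
``bind_by_length`` are hypothetical names, and the reading that every group
must be an order-preserving selection of the declared parameters is an
assumption drawn from the example.

```python
def check_groups(params, groups):
    """Validate the groups and return a table mapping length -> group."""
    central = groups[0]              # the central parameter comes first
    table = {}
    for group in groups:
        if len(group) in table:
            raise ValueError("two groups have the same length")
        # Each group must keep the declared parameter order.
        remaining = iter(params)
        if not all(name in remaining for name in group):
            raise ValueError("group %r is not in declaration order" % group)
        if not all(name in group for name in central):
            raise ValueError("group %r lacks the central parameter" % group)
        table[len(group)] = group
    return table


def bind_by_length(params, groups, args):
    """Pick the group whose length matches len(args) and bind the values."""
    table = check_groups(params, groups)
    if len(args) not in table:
        raise TypeError("no group accepts %d arguments" % len(args))
    return dict(zip(table[len(args)], args))
```

With the ``addch`` groups, three arguments bind to *y*, *x* and *ch*, while
two arguments bind to *ch* and *attr*.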


Flexibility in formatting
=========================

If the above os.stat example is considered too compact, it can easily be
formatted this way::

    /*[define posix_stat]
    def os.stat(path: path_converter,
                *,
                dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True)
    -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/


Benefits of a compact notation
==============================

The advantages of a concise notation are especially obvious when a large
number of parameters is involved. The argument parsing part of
``_posixsubprocess.fork_exec`` is fully specified by this definition::

    /*[define subprocess_fork_exec]
    def _posixsubprocess.fork_exec(
        process_args: "O", executable_list: "O",
        close_fds: "p", py_fds_to_keep: "O",
        cwd_obj: "O", env_list: "O",
        p2cread: "i", p2cwrite: "i", c2pread: "i", c2pwrite: "i",
        errread: "i", errwrite: "i", errpipe_read: "i", errpipe_write: "i",
        restore_signals: "i", call_setsid: "i", preexec_fn: "O", /) -> int: pass
    [define_end]*/


Note that the *preprocess* tool currently emits a redundant C-declaration
section for this example, so the output is longer than necessary.


Easy validation of the definition
=================================

How can an inexperienced user validate a definition like os.stat? Simply
by changing os.stat to os_stat, defining missing converters and pasting
the definition into the Python interactive interpreter!

In fact, a converters.py module could be auto-generated from converters.h.
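
The procedure can be tried directly. In the Python sketch below the two
converter functions are empty stand-ins written by hand for this example
(they are what a generated converters.py might provide), and ``os`` is
imported only so that the ``os.stat_result`` annotation resolves.

```python
import inspect
import os


# Stand-ins for the custom converters; only their names matter here.
def path_converter(obj):
    pass


def OS_STAT_DIR_FD_CONVERTER(obj):
    pass


# The declaration from the define block, with os.stat renamed to os_stat.
def os_stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
            follow_symlinks: "p" = True) -> os.stat_result: pass


signature = inspect.signature(os_stat)
keyword_only = [name for name, param in signature.parameters.items()
                if param.kind is inspect.Parameter.KEYWORD_ONLY]
```

If the definition contained a syntax error or referred to an undefined
converter, the interpreter would reject it immediately. ``signature`` and
``keyword_only`` can then be inspected to confirm that *dir_fd* and
*follow_symlinks* are keyword-only.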


Reference implementation
========================

A reference implementation is available at `issue 16612`_. Since this PEP
was written under time constraints and the author is unfamiliar with the
PLY toolchain, the software is written in Standard ML and utilizes the
ml-yacc/ml-lex toolchain.

The grammar is conflict-free and available in ml-yacc readable BNF form.

Two tools are available:

* *printsemant* reads a converter header and a .c file and dumps
  the semantically checked parse tree to stdout.

* *preprocess* reads a converter header and a .c file and dumps
  the preprocessed .c file to stdout.


Known deficiencies:

* The Python 'test' expression is not semantically checked. The syntax,
  however, is checked since it is part of the grammar.

* The lexer does not handle triple-quoted strings.

* C declarations are parsed in a primitive way. The final implementation
  should utilize 'declarator' and 'init-declarator' from the C grammar.

* The *preprocess* tool does not emit code for the left-and-right optional
  arguments case. The *printsemant* tool can deal with this case.

* Since the *preprocess* tool generates the output from the parse
  tree, the original indentation of the define block is lost.


Grammar
=======

TBD: The grammar exists in ml-yacc readable form, but should probably be
included here in EBNF notation.


Comparison with PEP 436
=======================

The author of this PEP has the following concerns about the DSL proposed
in :pep:`436`:

* The whitespace-sensitive, configuration-file-like syntax looks out
  of place in a C file.

* The structure of the function definition gets lost in the per-parameter
  specifications. Keywords like positional-only, required and keyword-only
  are scattered across too many different places.

  By contrast, in the alternative DSL the structure of the function
  definition can be understood at a single glance.

* The :pep:`436` DSL has 14 documented flags and at least one undocumented
  (allow_fd) flag. Figuring out which of the 2**15 possible combinations
  are valid places an unnecessary burden on the user.

  Experience with the :pep:`3118` buffer flags has shown that sorting out
  (and exhaustively testing!) valid combinations is an extremely tedious
  task. The :pep:`3118` flags are still not well understood by many people.

  By contrast, the alternative DSL has a central file Include/converters.h
  that can be quickly searched for the desired converter. Many of the
  converters are already known, perhaps even memorized by people (due
  to frequent use).

* The :pep:`436` DSL allows too much freedom. Types can apparently be omitted;
  the preprocessor accepts (and ignores) unknown keywords; and adding
  white space after a docstring sometimes results in an assertion error.

  The alternative DSL on the other hand allows no such freedoms. Omitting
  converter or return value annotations is plainly a syntax error. The
  LALR(1) grammar is unambiguous and specified for the complete translation
  unit.


Copyright
=========

This document is licensed under the `Open Publication License`_.


References and Footnotes
========================

.. _issue 16612: http://bugs.python.org/issue16612

.. _Open Publication License: http://www.opencontent.org/openpub/

.. _second iteration of the PEP 436 DSL:
   http://hg.python.org/peps/rev/a2fa10b2424b


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: