587 lines
18 KiB
ReStructuredText
587 lines
18 KiB
ReStructuredText
PEP: 638
|
|
Title: Syntactic Macros
|
|
Author: Mark Shannon <mark@hotpy.org>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 24-Sep-2020
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP adds support for syntactic macros to Python.
|
|
A macro is a compile-time function that transforms
|
|
a part of the program to allow functionality that cannot be
|
|
expressed cleanly in normal library code.
|
|
|
|
The term "syntactic" means that this sort of macro operates on the program's
|
|
syntax tree. This reduces the chance of mistranslation that can happen
|
|
with text-based substitution macros, and allows the implementation
|
|
of `hygienic macros`__.
|
|
|
|
__ https://en.wikipedia.org/wiki/Hygienic_macro
|
|
|
|
Syntactic macros allow libraries to modify the abstract syntax tree during compilation,
|
|
providing the ability to extend the language for specific domains without
|
|
adding to complexity to the language as a whole.
|
|
|
|
Motivation
|
|
==========
|
|
|
|
New language features can be controversial, disruptive and sometimes divisive.
|
|
Python is now sufficiently powerful and complex, that many proposed additions
|
|
are a net loss for the language due to the additional complexity.
|
|
|
|
Although a language change may make certain patterns easy to express,
|
|
it will have a cost. Each new feature makes the language larger,
|
|
harder to learn and harder to understand.
|
|
Python was once described as `Python Fits Your Brain`__,
|
|
but that becomes less and less true as more and more features are added.
|
|
|
|
Because of the high cost of adding a new feature,
|
|
it is very difficult or impossible to add a feature that would benefit only
|
|
some users, regardless of how many users, or how beneficial that feature would
|
|
be to them.
|
|
|
|
The use of Python in data science and machine learning has grown very rapidly
|
|
over the last few years.
|
|
However, most of the core developers of Python do not have a background in
|
|
data science or machine learning.
|
|
This makes it extremely difficult for the core developers to determine if a
|
|
language extension for machine learning is worthwhile.
|
|
|
|
By allowing language extensions to be modular and distributable, like libraries,
|
|
domain specific extensions can be implemented without negatively impacting
|
|
users outside of that domain.
|
|
A web developer is likely to want a very different set of extensions from
|
|
a data scientist.
|
|
We need to let the community develop their own extensions.
|
|
|
|
Without some form of user defined language extensions,
|
|
there will be a constant battle between those wanting to keep the
|
|
language compact and fitting their brains, and those wanting a new feature
|
|
that suits their domain or programming style.
|
|
|
|
__ https://www.linuxjournal.com/article/4731
|
|
|
|
|
|
Improving the expressiveness of libraries for specific domains
|
|
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
|
|
|
|
Many domains see repeated patterns that are difficult or impossible
|
|
to express as a library.
|
|
Macros can allow those patterns to be expressed in a more concise and less error
|
|
prone way.
|
|
|
|
Trialing new language features
|
|
''''''''''''''''''''''''''''''
|
|
|
|
It is possible to demonstrate potential language extensions using macros.
|
|
For example, macros would have enabled the ``with`` statement and
|
|
``yield from`` expression to have been trialed.
|
|
Doing so might well have lead to a higher quality implementation
|
|
at first release, by allowing more testing
|
|
before those features were included in the language.
|
|
|
|
It is nearly impossible to make sure that a new feature is completely reliable
|
|
before it is released; bugs relating to the ``with`` and ``yield from``
|
|
features were still being fixed many years after they were released.
|
|
|
|
Long term stability for the bytecode interpreter
|
|
''''''''''''''''''''''''''''''''''''''''''''''''
|
|
|
|
Historically new language features have been implemented by naive compilation
|
|
of the AST into new, complex bytecode instructions.
|
|
Those bytecodes have often had their own internal flow-control, performing
|
|
operations that that could, and should, have been done in the compiler.
|
|
|
|
For example,
|
|
until recently flow control within the ``try``-``finally`` and ``with``
|
|
statements was managed by complicated bytecodes with context dependent semantics.
|
|
The control flow within those statements is now implemented in the compiler, making
|
|
the interpreter simpler and faster.
|
|
|
|
By implementing new features as AST transformations, the existing compiler can
|
|
generate the bytecode for a feature without having to modify the interpreter.
|
|
|
|
A stable interpreter is necessary if we are to improve the performance and
|
|
portability of the CPython VM.
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Python is both expressive and easy to learn;
|
|
it is widely recognized as the easiest to learn widely-used programming language.
|
|
However, it is not the most flexible. That title belongs to lisp.
|
|
|
|
Because lisp is homoiconic, meaning that lisp programs are lisp data-structures,
|
|
lisp programs can be manipulated by lisp programs.
|
|
Thus much of the language can be defined in itself.
|
|
|
|
We would like that ability in Python,
|
|
without the many parentheses that characterize lisp.
|
|
Fortunately, homoiconicity is not needed for a language to be able to
|
|
manipulate itself, all that is needed is the ability to manipulate programs
|
|
after parsing, but before translation to an executable form.
|
|
|
|
Python already has the components needed.
|
|
The syntax tree of Python is available through the ``ast`` module.
|
|
All that is needed is a marker to tell the compiler that a macro is present,
|
|
and the ability for the compiler to callback into user code to manipulate the AST.
|
|
|
|
Specification
|
|
=============
|
|
|
|
Syntax
|
|
''''''
|
|
|
|
Lexical analysis
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
Any sequence of identifier characters followed by an exclamation point
|
|
(exclamation mark, UK English) will be tokenized as a ``MACRO_NAME``.
|
|
|
|
Statement form
|
|
~~~~~~~~~~~~~~
|
|
|
|
::
|
|
|
|
macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as" NAME ] [ ":" NEWLINE suite ]
|
|
|
|
Expression form
|
|
~~~~~~~~~~~~~~~
|
|
|
|
::
|
|
|
|
macro_expr = MACRO_NAME "(" testlist ")"
|
|
|
|
Resolving ambiguity
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
The statement form of a macro takes precedence, so that the code
|
|
``macro_name!(x)`` will be parsed as a macro statement,
|
|
not as an expression statement containing a macro expression.
|
|
|
|
Semantics
|
|
'''''''''
|
|
|
|
Compilation
|
|
~~~~~~~~~~~
|
|
|
|
Upon encountering a ``macro`` during translation to bytecode,
|
|
the code generator will look up the macro processor registered for the macro,
|
|
and pass the AST, rooted at the macro to the processor function.
|
|
The returned AST will then be substituted for the original tree.
|
|
|
|
For macros with multiple names,
|
|
several trees will be passed to the macro processor,
|
|
but only one will be returned and substituted,
|
|
shorting the enclosing block of statements.
|
|
|
|
This process can be repeated,
|
|
to enable macros to return AST nodes including other macros.
|
|
|
|
The compiler will not look up a macro processor until that macro is reached,
|
|
so that inner macros do not need to have processors registered.
|
|
For example, in a ``switch`` macro, the ``case`` and ``default`` macros wouldn't
|
|
need processors registered as they would be eliminated by the ``switch`` processor.
|
|
|
|
To enable definition of macros to be imported,
|
|
the macros ``import!`` and ``from!`` are predefined.
|
|
They support the following syntax:
|
|
|
|
::
|
|
|
|
"import!" dotted_name "as" name
|
|
|
|
"from!" dotted_name "import" name [ "as" name ]
|
|
|
|
The ``import!`` macro performs a compile time import of ``dotted_name``
|
|
to find the macro processor, then registers it under ``name``
|
|
for the scope currently being compiled.
|
|
|
|
The ``from!`` macro performs a compile time import of ``dotted_name.name``
|
|
to find the macro processor, then registers it under ``name``
|
|
(using the ``name`` following "as", if present)
|
|
for the scope currently being compiled.
|
|
|
|
Note that, since ``import!`` and ``from!`` only define the macro for the
|
|
scope in which the import is present, all uses of a macro must be preceded by
|
|
an explicit ``import!`` or ``from!`` to improve clarity.
|
|
|
|
For example, to import the macro "compile" from "my.compiler":
|
|
|
|
::
|
|
|
|
from! my.compiler import compile
|
|
|
|
|
|
Defining macro processors
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A macro processor is defined by a four-tuple, consisting of
|
|
``(func, kind, version, additional_names)``
|
|
|
|
* ``func`` must be a callable that takes ``len(additional_names)+1`` arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
|
|
* ``kind`` must be one of the following:
|
|
|
|
* ``macros.STMT_MACRO`` A statement macro where the body of the macro is indented. This is the only form which is allowed to have additional names.
|
|
* ``macros.SIBLING_MACRO`` A statement macro where the body of the macro is the next statement is the same block. The following statement is moved into the macro as its body.
|
|
* ``macros.EXPR_MACRO`` An expression macro.
|
|
|
|
* ``version`` is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
|
|
* ``additional_names`` are the names of the additional parts of the macro, and must be a tuple of strings.
|
|
|
|
::
|
|
|
|
# (func, _ast.STMT_MACRO, VERSION, ())
|
|
stmt_macro!:
|
|
multi_statement_body
|
|
|
|
# (func, _ast.SIBLING_MACRO, VERSION, ())
|
|
sibling_macro!
|
|
single_statement_body
|
|
|
|
# (func, _ast.EXPR_MACRO, VERSION, ())
|
|
x = expr_macro!(...)
|
|
|
|
# (func, _ast.STMT_MACRO, VERSION, ("subsequent_macro_part",))
|
|
multi_part_macro!:
|
|
multi_statement_body
|
|
subsequent_macro_part!:
|
|
multi_statement_body
|
|
|
|
The compiler will check that the syntax used matches the declared kind.
|
|
|
|
For convenience, the decorator ``macro_processor`` is provided in the ``macros`` module to mark a function as a macro processor:
|
|
|
|
::
|
|
|
|
def macro_processor(kind, version, *additional_names):
|
|
def deco(func):
|
|
return func, kind, version, additional_names
|
|
return deco
|
|
|
|
Which can be used to help declare macro processors, for example:
|
|
|
|
::
|
|
|
|
@macros.macro_processor(macros.STMT_MACRO, 1_08)
|
|
def switch(astnode):
|
|
...
|
|
|
|
|
|
AST extensions
|
|
~~~~~~~~~~~~~~
|
|
|
|
Two new AST nodes will be needed to express macros, ``macro_stmt`` and ``macro_expr``.
|
|
|
|
::
|
|
|
|
class macro_stmt(_ast.stmt):
|
|
|
|
_fields = "name", "args", "importname", "asname", "body"
|
|
|
|
class macro_expr(_ast.expr):
|
|
|
|
_fields = "name", "args"
|
|
|
|
In addition, macro processors will needs a means to express control flow or side effecting code, that produces a value.
|
|
To support this, a new ast node, called ``stmt_expr``, that combines a statement and expression will be added.
|
|
This new ast node will be a subtype of ``expr``, but include a statement to allow side effects.
|
|
It will be compiled to bytecode by compiling the statement, then compiling the value.
|
|
|
|
::
|
|
|
|
class stmt_expr(_ast.expr):
|
|
|
|
_fields = "stmt", "value"
|
|
|
|
Hygiene and debugging
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Macros processors will often need to create new variables.
|
|
Those variables need to named in such as way as to avoid contaminating the original code and other macros.
|
|
No rules for naming will be enforced, but to ensure hygiene and help debugging, the following naming scheme is recommended:
|
|
|
|
* All generated variable names should start with a ``$``
|
|
* Purely artificial variable names should start ``$$mname`` where ``mname`` is the name of the macro.
|
|
* Variables derived from real variables should start ``$vname`` where ``vname`` is the name of the variable.
|
|
* All variable names should include the line number and the column offset, separated by an underscore.
|
|
|
|
Examples:
|
|
|
|
* Purely generated name: ``$$macro_17_0``
|
|
* Name derived from a variable for an expression macro: ``$var_12_5``
|
|
|
|
|
|
Examples
|
|
''''''''
|
|
|
|
Compile time checked data structures
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
It is common to encode tables of data in Python as large dictionaries.
|
|
However, these can be hard to maintain and error prone.
|
|
Macros allow this data to be written in a more readable format.
|
|
Then, at compile time, it can be verified and converted to an efficient format.
|
|
|
|
For example, suppose we have a two dictionary literals mapping codes to names,
|
|
and vice versa.
|
|
This is error prone, as the dictionaries may have duplicate keys,
|
|
or one table may not be the inverse of the other.
|
|
A macro could generate the two mappings from a single table and,
|
|
at the same time, verify that no duplicates are present.
|
|
|
|
::
|
|
|
|
color_to_code = {
|
|
"red": 1,
|
|
"blue": 2,
|
|
"green": 3,
|
|
}
|
|
|
|
code_to_color = {
|
|
1: "red",
|
|
2: "blue",
|
|
3: "yellow", # error
|
|
}
|
|
|
|
would become:
|
|
::
|
|
|
|
bijection! color_to_code, code_to_color:
|
|
"red" = 1
|
|
"blue" = 2
|
|
"green" = 3
|
|
|
|
Domain specific extensions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Where I see macros having real value is in specific domains, not in general purpose language features.
|
|
|
|
For example, parsers.
|
|
Here's part of a parser definition for Python, using macros:
|
|
|
|
::
|
|
|
|
choice! single_input:
|
|
NEWLINE
|
|
simple_stmt
|
|
sequence!:
|
|
compound_stmt
|
|
NEWLINE
|
|
|
|
Compilers
|
|
~~~~~~~~~
|
|
|
|
Runtime compilers, such as ``numba`` have to reconstitute the Python source, or attempt to analyze the bytecode.
|
|
It would be simpler and more reliable for them to get the AST directly:
|
|
|
|
::
|
|
|
|
from! my.jit.library import jit
|
|
|
|
jit!
|
|
def func():
|
|
...
|
|
|
|
Matching symbolic expressions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
When matching something representing syntax, such a Python ``ast`` node, or a ``sympy`` expression,
|
|
it is convenient to match against the actual syntax, not the data structure representing it.
|
|
For example, a calculator could be implemented using a domain specific macro for matching syntax:
|
|
|
|
::
|
|
|
|
from! ast_matcher import match
|
|
|
|
def calculate(node):
|
|
if isinstance(node, Num):
|
|
return node.n
|
|
match! node:
|
|
case! a + b:
|
|
return calculate(a) + calculate(b)
|
|
case! a - b:
|
|
return calculate(a) - calculate(b)
|
|
case! a * b:
|
|
return calculate(a) * calculate(b)
|
|
case! a / b:
|
|
return calculate(a) / calculate(b)
|
|
|
|
Which could be converted to:
|
|
|
|
::
|
|
|
|
def calculate(node):
|
|
if isinstance(node, Num):
|
|
return node.n
|
|
$$match_4_0 = node
|
|
if isinstance($$match_4_0, _ast.Add):
|
|
a, b = $$match_4_0.left, $$match_4_0.right
|
|
return calculate(a) + calculate(b)
|
|
elif isinstance($$match_4_0, _ast.Sub):
|
|
a, b = $$match_4_0.left, $$match_4_0.right
|
|
return calculate(a) - calculate(b)
|
|
elif isinstance($$match_4_0, _ast.Mul):
|
|
a, b = $$match_4_0.left, $$match_4_0.right
|
|
return calculate(a) * calculate(b)
|
|
elif isinstance($$match_4_0, _ast.Div):
|
|
a, b = $$match_4_0.left, $$match_4_0.right
|
|
return calculate(a) / calculate(b)
|
|
|
|
Zero cost markers and annotations
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Annotations, either decorators or PEP 3107 function annotations, have a runtime cost
|
|
even if they serve only as markers for checkers or as documentation.
|
|
|
|
::
|
|
|
|
@do_nothing_marker
|
|
def foo(...):
|
|
...
|
|
|
|
can be replaced with the zero cost macro:
|
|
|
|
::
|
|
|
|
do_nothing_marker!:
|
|
def foo(...):
|
|
...
|
|
|
|
Protyping language extensions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Although macros would be most valuable for domain specific extensions, it is possible to
|
|
demonstrate possible language extensions using macros.
|
|
|
|
f-strings:
|
|
..........
|
|
|
|
The f-string ``f"..."`` could be implemented as macro as ``f!("...")``.
|
|
Which is not quite as nice to read, but would still be useful for experimenting with.
|
|
|
|
Try finally statement:
|
|
......................
|
|
|
|
::
|
|
|
|
try_!:
|
|
body
|
|
finally!:
|
|
closing
|
|
|
|
Would be translated roughly as:
|
|
|
|
::
|
|
|
|
try:
|
|
body
|
|
except:
|
|
closing
|
|
else:
|
|
closing
|
|
|
|
Note:
|
|
Care must be taken to handle returns, breaks and continues correctly.
|
|
The above code is merely illustrative.
|
|
|
|
With statement:
|
|
...............
|
|
|
|
::
|
|
|
|
with! open(filename) as fd:
|
|
return fd.read()
|
|
|
|
The above would require handling ``open`` specially.
|
|
An alternative that would be more explicit, would be:
|
|
|
|
::
|
|
|
|
with! open!(filename) as fd:
|
|
return fd.read()
|
|
|
|
Macro definition macros
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Languages that have syntactic macros usually provide a macro for defining macros.
|
|
This PEP intentionally does not do that, as it is not yet clear what a good design
|
|
would be, and we want to allow the community to define their own macros.
|
|
|
|
One possible form could be:
|
|
|
|
::
|
|
|
|
macro_def! name:
|
|
input:
|
|
... # input pattern, defining meta-variables
|
|
output:
|
|
... # output pattern, using meta-variables
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
This PEP is fully backwards compatible.
|
|
|
|
Performance Implications
|
|
========================
|
|
|
|
For code that doesn't use macros, there will be no effect on performance.
|
|
|
|
For code that does use macros and has already been compiled to bytecode,
|
|
there will be some slight overhead to check that the version
|
|
of macros used to compile the code match the imported macro processors.
|
|
|
|
For code that has not been compiled, or compiled with different versions
|
|
of the macro processors, then there would be the usual overhead of bytecode
|
|
compilation, plus any additional overhead of macro processing.
|
|
|
|
It is worth noting that the speed of source to bytecode compilation
|
|
is largely irrelevant for Python performance.
|
|
|
|
Implementation
|
|
==============
|
|
|
|
In order to allow transformation of the AST at compile time by Python code,
|
|
all AST nodes in the compiler will have to be Python objects.
|
|
|
|
To do that efficiently, will mean making all the nodes in the ``_ast`` module
|
|
immutable, so as not degrade performance by much.
|
|
They will need to be immutable to guarantee that the AST remains a *tree*
|
|
to avoid having to support cyclic GC.
|
|
Making them immutable means they will not have a
|
|
``__dict__`` attribute, making them compact.
|
|
|
|
AST nodes in the ``ast`` module will remain mutable.
|
|
|
|
Currently, all AST nodes are allocated using an arena allocator.
|
|
Changing to use the standard allocator might slow compilation down a little,
|
|
but has advantages in terms of maintenance, as much code can be deleted.
|
|
|
|
Reference Implementation
|
|
''''''''''''''''''''''''
|
|
|
|
None as yet.
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|
|
|