PEP: 638
Title: Syntactic Macros
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Sep-2020

Abstract
========

This PEP adds support for syntactic macros to Python.
A macro is a compile-time function that transforms
a part of the program to allow functionality that cannot be
expressed cleanly in normal library code.

The term "syntactic" means that this sort of macro operates on the program's
syntax tree. This reduces the chance of mistranslation that can happen
with text-based substitution macros, and allows the implementation
of `hygienic macros`__.

__ https://en.wikipedia.org/wiki/Hygienic_macro

Syntactic macros allow libraries to modify the abstract syntax tree during compilation,
providing the ability to extend the language for specific domains without
adding complexity to the language as a whole.

Motivation
==========

New language features can be controversial, disruptive and sometimes divisive.
Python is now sufficiently powerful and complex that many proposed additions
are a net loss for the language due to the additional complexity.

Although a language change may make certain patterns easy to express,
it will have a cost. Each new feature makes the language larger,
harder to learn and harder to understand.
Python was once described as `Python Fits Your Brain`__,
but that becomes less and less true as more and more features are added.

Because of the high cost of adding a new feature,
it is very difficult or impossible to add a feature that would benefit only
some users, regardless of how many users, or how beneficial that feature would
be to them.

The use of Python in data science and machine learning has grown very rapidly
over the last few years.
However, most of the core developers of Python do not have a background in
data science or machine learning.
This makes it extremely difficult for the core developers to determine if a
language extension for machine learning is worthwhile.

By allowing language extensions to be modular and distributable, like libraries,
domain specific extensions can be implemented without negatively impacting
users outside of that domain.
A web developer is likely to want a very different set of extensions from
a data scientist.
We need to let the community develop its own extensions.

Without some form of user-defined language extensions,
there will be a constant battle between those wanting to keep the
language compact and fitting their brains, and those wanting a new feature
that suits their domain or programming style.

__ https://www.linuxjournal.com/article/4731


Improving the expressiveness of libraries for specific domains
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Many domains see repeated patterns that are difficult or impossible
to express as a library.
Macros can allow those patterns to be expressed in a more concise and less
error-prone way.

Trialing new language features
''''''''''''''''''''''''''''''

It is possible to demonstrate potential language extensions using macros.
For example, macros would have enabled the ``with`` statement and
``yield from`` expression to have been trialed.
Doing so might well have led to a higher quality implementation
at first release, by allowing more testing
before those features were included in the language.

It is nearly impossible to make sure that a new feature is completely reliable
before it is released; bugs relating to the ``with`` and ``yield from``
features were still being fixed many years after they were released.

Long term stability for the bytecode interpreter
''''''''''''''''''''''''''''''''''''''''''''''''

Historically, new language features have been implemented by naive compilation
of the AST into new, complex bytecode instructions.
Those bytecodes have often had their own internal flow-control, performing
operations that could, and should, have been done in the compiler.

For example,
until recently flow control within the ``try``-``finally`` and ``with``
statements was managed by complicated bytecodes with context-dependent semantics.
The control flow within those statements is now implemented in the compiler, making
the interpreter simpler and faster.

By implementing new features as AST transformations, the existing compiler can
generate the bytecode for a feature without having to modify the interpreter.

A stable interpreter is necessary if we are to improve the performance and
portability of the CPython VM.

Rationale
=========

Python is both expressive and easy to learn;
it is widely recognized as the easiest to learn of the widely used programming languages.
However, it is not the most flexible. That title belongs to Lisp.

Because Lisp is homoiconic, meaning that Lisp programs are Lisp data structures,
Lisp programs can be manipulated by Lisp programs.
Thus, much of the language can be defined in itself.

We would like that ability in Python,
without the many parentheses that characterize Lisp.
Fortunately, homoiconicity is not needed for a language to be able to
manipulate itself; all that is needed is the ability to manipulate programs
after parsing, but before translation to an executable form.

Python already has the components needed.
The syntax tree of Python is available through the ``ast`` module.
All that is needed is a marker to tell the compiler that a macro is present,
and the ability for the compiler to call back into user code to manipulate the AST.

Specification
=============

Syntax
''''''

Lexical analysis
~~~~~~~~~~~~~~~~

Any sequence of identifier characters followed by an exclamation point
(exclamation mark, UK English) will be tokenized as a ``MACRO_NAME``.

Statement form
~~~~~~~~~~~~~~

::

    macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as" NAME ] [ ":" NEWLINE suite ]

Expression form
~~~~~~~~~~~~~~~

::

    macro_expr = MACRO_NAME "(" testlist ")"

Resolving ambiguity
~~~~~~~~~~~~~~~~~~~

The statement form of a macro takes precedence, so that the code
``macro_name!(x)`` will be parsed as a macro statement,
not as an expression statement containing a macro expression.

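For illustration, the two grammar productions above cover forms such as the
following, all of which are taken from later sections of this PEP:

::

    # statement form (macro_stmt)
    from! my.compiler import compile
    with! open(filename) as fd:
        return fd.read()

    # expression form (macro_expr)
    x = expr_macro!(...)
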
Semantics
'''''''''

Compilation
~~~~~~~~~~~

Upon encountering a macro during translation to bytecode,
the code generator will look up the macro processor registered for the macro,
and pass the AST rooted at the macro to the processor function.
The returned AST will then be substituted for the original tree.

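The effect of this substitution step can be approximated today with the
``ast`` module. The following sketch is purely illustrative: the names
``MACRO_REGISTRY``, ``expand_macro`` and ``double_constants`` are invented for
the example and are not part of this PEP, and the real lookup would happen
inside the compiler, keyed by the ``MACRO_NAME`` token:

::

    import ast

    def double_constants(tree):
        """An example processor: replace every integer constant n with n * 2."""
        class Doubler(ast.NodeTransformer):
            def visit_Constant(self, node):
                if isinstance(node.value, int):
                    return ast.copy_location(ast.Constant(node.value * 2), node)
                return node
        return ast.fix_missing_locations(Doubler().visit(tree))

    # The compiler would find the processor registered for the macro ...
    MACRO_REGISTRY = {"double": double_constants}

    def expand_macro(name, tree):
        # ... pass it the AST rooted at the macro ...
        new_tree = MACRO_REGISTRY[name](tree)
        # ... and substitute the returned AST for the original tree.
        return new_tree

    print(ast.unparse(expand_macro("double", ast.parse("x = 1 + 2"))))  # x = 2 + 4
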
For macros with multiple names,
several trees will be passed to the macro processor,
but only one will be returned and substituted,
shortening the enclosing block of statements.

This process can be repeated,
to enable macros to return AST nodes that include other macros.

The compiler will not look up a macro processor until that macro is reached,
so that inner macros do not need to have processors registered.
For example, in a ``switch`` macro, the ``case`` and ``default`` macros wouldn't
need processors registered, as they would be eliminated by the ``switch`` processor.

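As a purely illustrative sketch (this PEP does not define a ``switch`` macro),
such a macro might look like:

::

    switch! x:
        case! 1:
            handle_one()
        default!:
            handle_other()

The ``switch`` processor would receive the whole tree, eliminate the inner
``case!`` and ``default!`` nodes, and return a single replacement tree, such as
an ``if``/``elif``/``else`` chain.
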
To enable macro definitions to be imported,
the macros ``import!`` and ``from!`` are predefined.
They support the following syntax:

::

    "import!" dotted_name "as" name

    "from!" dotted_name "import" name [ "as" name ]

The ``import!`` macro performs a compile-time import of ``dotted_name``
to find the macro processor, then registers it under ``name``
for the scope currently being compiled.

The ``from!`` macro performs a compile-time import of ``dotted_name.name``
to find the macro processor, then registers it under ``name``
(using the ``name`` following "as", if present)
for the scope currently being compiled.

Note that, since ``import!`` and ``from!`` only define the macro for the
scope in which the import is present, all uses of a macro must be preceded by
an explicit ``import!`` or ``from!`` to improve clarity.

For example, to import the macro "compile" from "my.compiler":

::

    from! my.compiler import compile


Defining macro processors
~~~~~~~~~~~~~~~~~~~~~~~~~

A macro processor is defined by a four-tuple, consisting of
``(func, kind, version, additional_names)``:

* ``func`` must be a callable that takes ``len(additional_names)+1`` arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
* ``kind`` must be one of the following:

  * ``macros.STMT_MACRO``: A statement macro where the body of the macro is indented. This is the only form which is allowed to have additional names.
  * ``macros.SIBLING_MACRO``: A statement macro where the body of the macro is the next statement in the same block. The following statement is moved into the macro as its body.
  * ``macros.EXPR_MACRO``: An expression macro.

* ``version`` is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
* ``additional_names`` are the names of the additional parts of the macro, and must be a tuple of strings.

::

    # (func, _ast.STMT_MACRO, VERSION, ())
    stmt_macro!:
        multi_statement_body

    # (func, _ast.SIBLING_MACRO, VERSION, ())
    sibling_macro!
    single_statement_body

    # (func, _ast.EXPR_MACRO, VERSION, ())
    x = expr_macro!(...)

    # (func, _ast.STMT_MACRO, VERSION, ("subsequent_macro_part",))
    multi_part_macro!:
        multi_statement_body
    subsequent_macro_part!:
        multi_statement_body

The compiler will check that the syntax used matches the declared kind.

For convenience, the decorator ``macro_processor`` is provided in the ``macros`` module to mark a function as a macro processor:

::

    def macro_processor(kind, version, *additional_names):
        def deco(func):
            return func, kind, version, additional_names
        return deco

This can be used to help declare macro processors, for example:

::

    @macros.macro_processor(macros.STMT_MACRO, 1_08)
    def switch(astnode):
        ...

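As a concrete, though illustrative, sketch of the shape of a processor, here is
a function that could back a hypothetical ``const!`` expression macro, folding
a literal arithmetic expression into a single constant at compile time. The
``const`` function and the way it is exercised below are invented for this
example; only the ``(func, kind, version, additional_names)`` shape and the
decorator come from the proposal above:

::

    import ast

    # Would be registered as (const, macros.EXPR_MACRO, 1, ()), or declared with
    # @macros.macro_processor(macros.EXPR_MACRO, 1) once the macros module exists.
    def const(node):
        """Replace the expression tree ``node`` with its computed constant value."""
        # A real processor would validate that ``node`` contains only literals.
        value = eval(compile(ast.fix_missing_locations(ast.Expression(body=node)),
                             "<const!>", "eval"), {})
        return ast.copy_location(ast.Constant(value), node)

    # Exercising the processor directly on an ordinary expression tree:
    tree = ast.parse("60 * 60 * 24", mode="eval").body
    print(ast.dump(const(tree)))    # Constant(value=86400)
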
AST extensions
~~~~~~~~~~~~~~

Two new AST nodes will be needed to express macros, ``macro_stmt`` and ``macro_expr``.

::

    class macro_stmt(_ast.stmt):

        _fields = "name", "args", "importname", "asname", "body"

    class macro_expr(_ast.expr):

        _fields = "name", "args"

In addition, macro processors will need a means to express control flow or
side-effecting code that produces a value.
To support this, a new AST node called ``stmt_expr``, which combines a statement
and an expression, will be added.
This new AST node will be a subtype of ``expr``, but will include a statement to
allow side effects.
It will be compiled to bytecode by compiling the statement, then compiling the value.

::

    class stmt_expr(_ast.expr):

        _fields = "stmt", "value"

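For illustration of how a processor might use it, the following sketch builds a
``stmt_expr`` by hand, using a stand-in class. The ``m!`` macro, the stand-in
class and the generated name are invented for the example; only the ``_fields``
layout comes from the definition above:

::

    import ast

    class stmt_expr(ast.expr):             # stand-in for the proposed node
        _fields = ("stmt", "value")

    # Expanding ``x = m!(f())`` could produce "run ``$$m_3_4 = f()``, then use
    # ``$$m_3_4`` as the value of the expression":
    tmp_store = ast.Name(id="$$m_3_4", ctx=ast.Store())
    call = ast.Call(func=ast.Name(id="f", ctx=ast.Load()), args=[], keywords=[])
    node = stmt_expr(
        stmt=ast.Assign(targets=[tmp_store], value=call),
        value=ast.Name(id="$$m_3_4", ctx=ast.Load()),
    )
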
Hygiene and debugging
~~~~~~~~~~~~~~~~~~~~~

Macro processors will often need to create new variables.
Those variables need to be named in such a way as to avoid contaminating the original code and other macros.
No rules for naming will be enforced, but to ensure hygiene and help debugging, the following naming scheme is recommended:

* All generated variable names should start with a ``$``.
* Purely artificial variable names should start ``$$mname`` where ``mname`` is the name of the macro.
* Variables derived from real variables should start ``$vname`` where ``vname`` is the name of the variable.
* All variable names should include the line number and the column offset, separated by an underscore.

Examples:

* Purely generated name: ``$$macro_17_0``
* Name derived from a variable for an expression macro: ``$var_12_5``

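A small helper in this style (the name ``hygienic_name`` is illustrative, not
part of this PEP) could be shared by macro processors:

::

    def hygienic_name(macro_name, node, derived_from=None):
        """Build a generated variable name following the scheme above."""
        if derived_from is None:
            stem = "$$" + macro_name        # purely artificial name
        else:
            stem = "$" + derived_from       # derived from a real variable
        return f"{stem}_{node.lineno}_{node.col_offset}"

    # hygienic_name("macro", node)      ->  "$$macro_17_0"
    # hygienic_name("m", node, "var")   ->  "$var_12_5"
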
Examples
''''''''

Compile time checked data structures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is common to encode tables of data in Python as large dictionaries.
However, these can be hard to maintain and error-prone.
Macros allow this data to be written in a more readable format.
Then, at compile time, it can be verified and converted to an efficient format.

For example, suppose we have two dictionary literals mapping codes to names,
and vice versa.
This is error-prone, as the dictionaries may have duplicate keys,
or one table may not be the inverse of the other.
A macro could generate the two mappings from a single table and,
at the same time, verify that no duplicates are present.

::

    color_to_code = {
        "red": 1,
        "blue": 2,
        "green": 3,
    }

    code_to_color = {
        1: "red",
        2: "blue",
        3: "yellow", # error
    }

would become::

    bijection! color_to_code, code_to_color:
        "red" = 1
        "blue" = 2
        "green" = 3

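The ``bijection!`` processor is not specified here, but the checks and the
generated mappings it would produce correspond to ordinary Python along these
lines (a runtime sketch of the compile-time work, with invented variable names):

::

    pairs = [("red", 1), ("blue", 2), ("green", 3)]
    names = [name for name, _ in pairs]
    codes = [code for _, code in pairs]
    if len(set(names)) != len(names) or len(set(codes)) != len(codes):
        raise SyntaxError("bijection! table contains duplicate entries")
    color_to_code = dict(pairs)
    code_to_color = {code: name for name, code in pairs}
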
Domain specific extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Where I see macros having real value is in specific domains, not in general purpose language features.

For example, parsers.
Here's part of a parser definition for Python, using macros:

::

    choice! single_input:
        NEWLINE
        simple_stmt
        sequence!:
            compound_stmt
            NEWLINE

Compilers
~~~~~~~~~

Runtime compilers, such as ``numba``, have to reconstitute the Python source, or attempt to analyze the bytecode.
It would be simpler and more reliable for them to get the AST directly:

::

    from! my.jit.library import jit

    jit!
    def func():
        ...

Matching symbolic expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When matching something representing syntax, such as a Python ``ast`` node or a ``sympy`` expression,
it is convenient to match against the actual syntax, not the data structure representing it.
For example, a calculator could be implemented using a domain specific macro for matching syntax:

::

    from! ast_matcher import match

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        match! node:
            case! a + b:
                return calculate(a) + calculate(b)
            case! a - b:
                return calculate(a) - calculate(b)
            case! a * b:
                return calculate(a) * calculate(b)
            case! a / b:
                return calculate(a) / calculate(b)

This could be converted to:

::

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        $$match_4_0 = node
        if isinstance($$match_4_0, _ast.Add):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) + calculate(b)
        elif isinstance($$match_4_0, _ast.Sub):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) - calculate(b)
        elif isinstance($$match_4_0, _ast.Mul):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) * calculate(b)
        elif isinstance($$match_4_0, _ast.Div):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) / calculate(b)

Zero cost markers and annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Annotations, either decorators or PEP 3107 function annotations, have a runtime cost
even if they serve only as markers for checkers or as documentation.

::

    @do_nothing_marker
    def foo(...):
        ...

can be replaced with the zero cost macro:

::

    do_nothing_marker!:
        def foo(...):
            ...

Prototyping language extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Although macros would be most valuable for domain specific extensions, it is possible to
demonstrate potential language extensions using macros.

f-strings:
..........

The f-string ``f"..."`` could be implemented as a macro, ``f!("...")``.
This is not quite as nice to read, but would still be useful for experimenting with.

Try finally statement:
......................

::

    try_!:
        body
    finally!:
        closing

Would be translated roughly as:

::

    try:
        body
    except:
        closing
    else:
        closing

Note:
    Care must be taken to handle returns, breaks and continues correctly.
    The above code is merely illustrative.

With statement:
...............

::

    with! open(filename) as fd:
        return fd.read()

The above would require handling ``open`` specially.
A more explicit alternative would be:

::

    with! open!(filename) as fd:
        return fd.read()

Macro definition macros
~~~~~~~~~~~~~~~~~~~~~~~

Languages that have syntactic macros usually provide a macro for defining macros.
This PEP intentionally does not do that, as it is not yet clear what a good design
would be, and we want to allow the community to define their own macros.

One possible form could be:

::

    macro_def! name:
        input:
            ... # input pattern, defining meta-variables
        output:
            ... # output pattern, using meta-variables


Backwards Compatibility
=======================

This PEP is fully backwards compatible.

Performance Implications
========================

For code that doesn't use macros, there will be no effect on performance.

For code that does use macros and has already been compiled to bytecode,
there will be some slight overhead to check that the version
of the macros used to compile the code matches the imported macro processors.

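A sketch of the kind of check involved (the function and the mapping format are
invented for illustration; the real check would live in the bytecode caching
machinery):

::

    def macros_up_to_date(recorded, current):
        """recorded and current both map macro name -> integer version."""
        return all(current.get(name) == version
                   for name, version in recorded.items())
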
For code that has not been compiled, or was compiled with different versions
of the macro processors, there would be the usual overhead of bytecode
compilation, plus any additional overhead of macro processing.

It is worth noting that the speed of source-to-bytecode compilation
is largely irrelevant for Python performance.

Implementation
==============

In order to allow transformation of the AST at compile time by Python code,
all AST nodes in the compiler will have to be Python objects.

Doing that efficiently will mean making all the nodes in the ``_ast`` module
immutable, so as not to degrade performance too much.
They will need to be immutable to guarantee that the AST remains a *tree*,
in order to avoid having to support cyclic GC.
Making them immutable means they will not have a
``__dict__`` attribute, making them compact.

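As a rough analogy in pure Python (the real nodes would be built-in types, and
the ``Name`` class here is only an illustration), an immutable node without a
per-instance ``__dict__`` looks like:

::

    from typing import NamedTuple

    class Name(NamedTuple):     # immutable, compact, no per-instance __dict__
        id: str
        lineno: int
        col_offset: int
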
AST nodes in the ``ast`` module will remain mutable.

Currently, all AST nodes are allocated using an arena allocator.
Changing to use the standard allocator might slow compilation down a little,
but has advantages in terms of maintenance, as much code can be deleted.

Reference Implementation
''''''''''''''''''''''''

None as yet.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: