The control flow within those statements is now implemented in the compiler, making
the interpreter simpler and faster.
By implementing new features as AST transformations, the existing compiler can
generate the bytecode for a feature without having to modify the interpreter.
A stable interpreter is necessary if we are to improve the performance and
portability of the CPython VM.
Rationale
=========
Python is both expressive and easy to learn;
it is widely recognized as the easiest-to-learn, widely used programming language.
However, it is not the most flexible. That title belongs to Lisp.
Because Lisp is homoiconic, meaning that Lisp programs are Lisp data structures,
Lisp programs can be manipulated by Lisp programs.
Thus much of the language can be defined in itself.
We would like that ability in Python,
without the many parentheses that characterize Lisp.
Fortunately, homoiconicity is not needed for a language to be able to
manipulate itself. All that is needed is the ability to manipulate programs
after parsing, but before translation to an executable form.
Python already has the components needed.
The syntax tree of Python is available through the ``ast`` module.
All that is needed is a marker to tell the compiler that a macro is present,
and the ability for the compiler to call back into user code to manipulate the AST.
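The idea of manipulating a program after parsing but before compilation can already be tried by hand with the existing ``ast`` module. The following is a minimal sketch, not part of the proposal: the class ``DoubleConstants`` and the function ``transform_and_run`` are hypothetical names chosen for illustration, and the transformation itself (doubling integer literals) is arbitrary.

```python
import ast


class DoubleConstants(ast.NodeTransformer):
    """Illustrative transformation: double every integer literal."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            new_node = ast.Constant(value=node.value * 2)
            return ast.copy_location(new_node, node)
        return node


def transform_and_run(source):
    """Parse, rewrite the tree, then compile and execute the result."""
    tree = ast.parse(source)                  # after parsing ...
    tree = DoubleConstants().visit(tree)      # ... manipulate the AST ...
    ast.fix_missing_locations(tree)
    code = compile(tree, "<transformed>", "exec")  # ... before translation
    namespace = {}
    exec(code, namespace)
    return namespace
```

Here the caller drives the transformation explicitly. What the proposal adds is the marker and the compiler callback, so that the same kind of rewrite can be triggered from within the source being compiled.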
Specification
=============
Syntax
''''''
Lexical analysis
~~~~~~~~~~~~~~~~
Any sequence of identifier characters followed by an exclamation point
(exclamation mark, UK English) will be tokenized as a ``MACRO_NAME``.
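As a rough illustration of this rule, the pattern can be approximated with a regular expression. This is only a sketch: it assumes that "identifier characters" can be approximated by the regex class ``\w``, and the names ``MACRO_NAME`` and ``find_macro_names`` are hypothetical. It does not attempt to model how a real tokenizer would interact with existing tokens such as ``!=``.

```python
import re

# Assumption: "identifier characters" is approximated by the regex class \w.
MACRO_NAME = re.compile(r"\w+!")


def find_macro_names(source):
    """Return every substring that this sketch would treat as a MACRO_NAME."""
    return MACRO_NAME.findall(source)
```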
Statement form
~~~~~~~~~~~~~~
::

    macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as" NAME ] [ ":" NEWLINE suite ]
Expression form
~~~~~~~~~~~~~~~
::

    macro_expr = MACRO_NAME "(" testlist ")"
Resolving ambiguity
~~~~~~~~~~~~~~~~~~~
The statement form of a macro takes precedence, so that the code
``macro_name!(x)`` will be parsed as a macro statement,
not as an expression statement containing a macro expression.
Semantics
'''''''''
Compilation
~~~~~~~~~~~
Upon encountering a ``macro`` during translation to bytecode,
the code generator will look up the macro processor registered for the macro,
and pass the AST rooted at the macro to the processor function.
The returned AST will then be substituted for the original tree.
For macros with multiple names,
several trees will be passed to the macro processor,
but only one will be returned and substituted,
shortening the enclosing block of statements.
This process can be repeated,
to enable macros to return AST nodes including other macros.
The compiler will not look up a macro processor until that macro is reached,
so that inner macros do not need to have processors registered.
For example, in a ``switch`` macro, the ``case`` and ``default`` macros wouldn't
need processors registered as they would be eliminated by the ``switch`` processor.
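The lookup-and-substitute loop described above can be modelled in a few lines. This is a toy sketch under stated assumptions, not the compiler's implementation: the ``Macro`` class is a hypothetical stand-in for a macro node in the tree, plain strings stand in for ordinary AST nodes, and ``expand`` and ``switch_processor`` are names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Macro:
    """Hypothetical stand-in for a macro node in the syntax tree."""
    name: str
    body: list = field(default_factory=list)


def expand(node, registry):
    """Expand macros outermost first, repeating while a macro is returned."""
    while isinstance(node, Macro):
        # The processor is looked up only when the macro is reached.
        processor = registry[node.name]
        node = processor(node)
    if isinstance(node, list):
        return [expand(child, registry) for child in node]
    return node


def switch_processor(node):
    """Toy processor: consumes the inner macros and returns ordinary nodes."""
    return [f"branch:{child.name}" for child in node.body]
```

With only ``switch`` registered, expanding ``Macro("switch", [Macro("case"), Macro("default")])`` succeeds, because the inner ``case`` and ``default`` nodes are eliminated by ``switch_processor`` before the expander ever reaches them. An unregistered macro that is actually reached fails at lookup, here with a ``KeyError``.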
To allow macro definitions to be imported,
the macros ``import!`` and ``from!`` are predefined.
They support the following syntax:
::

    "import!" dotted_name "as" name

    "from!" dotted_name "import" name [ "as" name ]
The ``import!`` macro performs a compile-time import of ``dotted_name``
to find the macro processor, then registers it under ``name``
for the scope currently being compiled.
The ``from!`` macro performs a compile-time import of ``dotted_name.name``
to find the macro processor, then registers it under ``name``
(using the ``name`` following "as", if present)
for the scope currently being compiled.
Note that ``import!`` and ``from!`` define the macro only for the
scope in which the import is present. All uses of a macro must therefore be
preceded by an explicit ``import!`` or ``from!``, which improves clarity.
For example, to import the macro "compile" from "my.compiler":
::

    from! my.compiler import compile
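The registration step performed by ``from!`` can be approximated at run time with ``importlib``. The sketch below is an assumption-laden model, not the proposed mechanism: ``register_from`` and the per-scope dictionary ``scope_registry`` are hypothetical, and any importable attribute can stand in for a macro processor.

```python
import importlib


def register_from(scope_registry, dotted_name, name, as_name=None):
    """Model ``from! dotted_name import name [as as_name]`` for one scope."""
    module = importlib.import_module(dotted_name)  # the "compile-time" import
    processor = getattr(module, name)
    # Register under the name following "as" if present, else under ``name``.
    scope_registry[as_name or name] = processor
    return processor
```

Because each scope has its own registry in this model, a registration made for one scope is invisible to any other, which mirrors the scoping rule stated above.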
Defining macro processors
~~~~~~~~~~~~~~~~~~~~~~~~~
A macro processor is defined by a four-tuple, consisting of
``(func, kind, version, additional_names)``:

* ``func`` must be a callable that takes ``len(additional_names)+1`` arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
* ``kind`` must be one of the following:

  * ``macros.STMT_MACRO``: A statement macro where the body of the macro is indented. This is the only form which is allowed to have additional names.
  * ``macros.SIBLING_MACRO``: A statement macro where the body of the macro is the next statement in the same block. The following statement is moved into the macro as its body.
  * ``macros.EXPR_MACRO``: An expression macro.

* ``version`` is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
* ``additional_names`` are the names of the additional parts of the macro, and must be a tuple of strings.
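To make the shape of the four-tuple concrete, here is a hedged sketch of an expression macro processor. The ``macros`` module does not exist today, so the three constants below are local stand-ins with arbitrary values. The names ``double_processor``, ``double_macro`` and ``validate`` are hypothetical and serve only to illustrate the constraints listed above.

```python
import ast

# Stand-ins for the proposed macros.* constants (values are arbitrary).
STMT_MACRO, SIBLING_MACRO, EXPR_MACRO = "stmt", "sibling", "expr"


def double_processor(tree):
    """Takes one AST (no additional names) and returns a single AST."""
    return ast.BinOp(left=tree, op=ast.Mult(), right=ast.Constant(value=2))


# (func, kind, version, additional_names)
double_macro = (double_processor, EXPR_MACRO, 1, ())


def validate(processor):
    """Check a four-tuple against the constraints described in the text."""
    func, kind, version, additional_names = processor
    if not callable(func):
        raise ValueError("func must be callable")
    if kind not in (STMT_MACRO, SIBLING_MACRO, EXPR_MACRO):
        raise ValueError("unknown kind")
    if not isinstance(version, int):
        raise ValueError("version must be an integer")
    if not (isinstance(additional_names, tuple)
            and all(isinstance(n, str) for n in additional_names)):
        raise ValueError("additional_names must be a tuple of strings")
    if additional_names and kind != STMT_MACRO:
        raise ValueError("only STMT_MACRO may have additional names")
    return True
```

The processor can be exercised by hand on a tree obtained from ``ast.parse(source, mode="eval").body``. The returned node has no source locations, so ``ast.fix_missing_locations`` should be called before passing the result to ``compile``.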