PEP: 638
Title: Syntactic Macros
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Sep-2020

Abstract
========

This PEP adds support for syntactic macros to Python.
A macro is a compile-time function that transforms
a part of the program to allow functionality that cannot be
expressed cleanly in normal library code.

The term "syntactic" means that this sort of macro operates on the program's
syntax tree. This reduces the chance of mistranslation that can happen
with text-based substitution macros, and allows the implementation
of `hygienic macros`__.

__ https://en.wikipedia.org/wiki/Hygienic_macro

Syntactic macros allow libraries to modify the abstract syntax tree during compilation,
providing the ability to extend the language for specific domains without
adding complexity to the language as a whole.

Motivation
==========

New language features can be controversial, disruptive and sometimes divisive.
Python is now sufficiently powerful and complex that many proposed additions
are a net loss for the language due to the additional complexity.

Although a language change may make certain patterns easy to express,
it will have a cost. Each new feature makes the language larger,
harder to learn and harder to understand.
Python was once described as `Python Fits Your Brain`__,
but that becomes less and less true as more and more features are added.

Because of the high cost of adding a new feature,
it is very difficult or impossible to add a feature that would benefit only
some users, regardless of how many users, or how beneficial that feature would
be to them.

The use of Python in data science and machine learning has grown very rapidly
over the last few years.
However, most of the core developers of Python do not have a background in
data science or machine learning.
This makes it extremely difficult for the core developers to determine if a
language extension for machine learning is worthwhile.

By allowing language extensions to be modular and distributable, like libraries,
domain specific extensions can be implemented without negatively impacting
users outside of that domain.
A web developer is likely to want a very different set of extensions from
a data scientist.
We need to let the community develop its own extensions.

Without some form of user-defined language extensions,
there will be a constant battle between those wanting to keep the
language compact and fitting their brains, and those wanting a new feature
that suits their domain or programming style.

__ https://www.linuxjournal.com/article/4731


Improving the expressiveness of libraries for specific domains
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Many domains see repeated patterns that are difficult or impossible
to express as a library.
Macros can allow those patterns to be expressed in a more concise and less
error-prone way.

Trialing new language features
''''''''''''''''''''''''''''''

It is possible to demonstrate potential language extensions using macros.
For example, macros would have enabled the ``with`` statement and
``yield from`` expression to have been trialed.
Doing so might well have led to a higher quality implementation
at first release, by allowing more testing
before those features were included in the language.

It is nearly impossible to make sure that a new feature is completely reliable
before it is released; bugs relating to the ``with`` and ``yield from``
features were still being fixed many years after they were released.

Long term stability for the bytecode interpreter
''''''''''''''''''''''''''''''''''''''''''''''''

Historically, new language features have been implemented by naive compilation
of the AST into new, complex bytecode instructions.
Those bytecodes have often had their own internal flow-control, performing
operations that could, and should, have been done in the compiler.

For example,
until recently flow control within the ``try``-``finally`` and ``with``
statements was managed by complicated bytecodes with context-dependent semantics.
The control flow within those statements is now implemented in the compiler, making
the interpreter simpler and faster.

By implementing new features as AST transformations, the existing compiler can
generate the bytecode for a feature without having to modify the interpreter.

A stable interpreter is necessary if we are to improve the performance and
portability of the CPython VM.

Rationale
=========

Python is both expressive and easy to learn;
it is widely recognized as the easiest to learn of the widely used programming languages.
However, it is not the most flexible. That title belongs to Lisp.

Because Lisp is homoiconic, meaning that Lisp programs are Lisp data structures,
Lisp programs can be manipulated by Lisp programs.
Thus, much of the language can be defined in itself.

We would like that ability in Python,
without the many parentheses that characterize Lisp.
Fortunately, homoiconicity is not needed for a language to be able to
manipulate itself; all that is needed is the ability to manipulate programs
after parsing, but before translation to an executable form.

Python already has the components needed.
The syntax tree of Python is available through the ``ast`` module.
All that is needed is a marker to tell the compiler that a macro is present,
and the ability for the compiler to call back into user code to manipulate the AST.

Specification
=============

Syntax
''''''

Lexical analysis
~~~~~~~~~~~~~~~~

Any sequence of identifier characters followed by an exclamation point
(exclamation mark, UK English) will be tokenized as a ``MACRO_NAME``.

Statement form
~~~~~~~~~~~~~~

::

    macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as" NAME ] [ ":" NEWLINE suite ]

Expression form
~~~~~~~~~~~~~~~

::

    macro_expr = MACRO_NAME "(" testlist ")"

Resolving ambiguity
~~~~~~~~~~~~~~~~~~~

The statement form of a macro takes precedence, so that the code
``macro_name!(x)`` will be parsed as a macro statement,
not as an expression statement containing a macro expression.

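For illustration, the two grammar productions above cover forms such as the
following, all of which are taken from later sections of this PEP:

::

    # statement form (macro_stmt)
    from! my.compiler import compile
    with! open(filename) as fd:
        return fd.read()

    # expression form (macro_expr)
    x = expr_macro!(...)
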
Semantics
'''''''''

Compilation
~~~~~~~~~~~

Upon encountering a macro during translation to bytecode,
the code generator will look up the macro processor registered for the macro,
and pass the AST rooted at the macro to the processor function.
The returned AST will then be substituted for the original tree.

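The effect of this substitution step can be approximated today with the
``ast`` module. The following sketch is purely illustrative: the names
``MACRO_REGISTRY``, ``expand_macro`` and ``double_constants`` are invented for
the example and are not part of this PEP, and the real lookup would happen
inside the compiler, keyed by the ``MACRO_NAME`` token:

::

    import ast

    def double_constants(tree):
        """An example processor: replace every integer constant n with n * 2."""
        class Doubler(ast.NodeTransformer):
            def visit_Constant(self, node):
                if isinstance(node.value, int):
                    return ast.copy_location(ast.Constant(node.value * 2), node)
                return node
        return ast.fix_missing_locations(Doubler().visit(tree))

    # The compiler would find the processor registered for the macro ...
    MACRO_REGISTRY = {"double": double_constants}

    def expand_macro(name, tree):
        # ... pass it the AST rooted at the macro ...
        new_tree = MACRO_REGISTRY[name](tree)
        # ... and substitute the returned AST for the original tree.
        return new_tree

    print(ast.unparse(expand_macro("double", ast.parse("x = 1 + 2"))))  # x = 2 + 4
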
For macros with multiple names,
several trees will be passed to the macro processor,
but only one will be returned and substituted,
shortening the enclosing block of statements.

This process can be repeated,
to enable macros to return AST nodes that include other macros.

The compiler will not look up a macro processor until that macro is reached,
so that inner macros do not need to have processors registered.
For example, in a ``switch`` macro, the ``case`` and ``default`` macros wouldn't
need processors registered, as they would be eliminated by the ``switch`` processor.

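As a purely illustrative sketch (this PEP does not define a ``switch`` macro),
such a macro might look like:

::

    switch! x:
        case! 1:
            handle_one()
        default!:
            handle_other()

The ``switch`` processor would receive the whole tree, eliminate the inner
``case!`` and ``default!`` nodes, and return a single replacement tree, such as
an ``if``/``elif``/``else`` chain.
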
To enable macro definitions to be imported,
the macros ``import!`` and ``from!`` are predefined.
They support the following syntax:

::

    "import!" dotted_name "as" name

    "from!" dotted_name "import" name [ "as" name ]

The ``import!`` macro performs a compile-time import of ``dotted_name``
to find the macro processor, then registers it under ``name``
for the scope currently being compiled.

The ``from!`` macro performs a compile-time import of ``dotted_name.name``
to find the macro processor, then registers it under ``name``
(using the ``name`` following "as", if present)
for the scope currently being compiled.

Note that, since ``import!`` and ``from!`` only define the macro for the
scope in which the import is present, all uses of a macro must be preceded by
an explicit ``import!`` or ``from!`` to improve clarity.

For example, to import the macro "compile" from "my.compiler":

::

    from! my.compiler import compile


Defining macro processors
~~~~~~~~~~~~~~~~~~~~~~~~~

A macro processor is defined by a four-tuple, consisting of
``(func, kind, version, additional_names)``:

* ``func`` must be a callable that takes ``len(additional_names)+1`` arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
* ``kind`` must be one of the following:

  * ``macros.STMT_MACRO``: A statement macro where the body of the macro is indented. This is the only form which is allowed to have additional names.
  * ``macros.SIBLING_MACRO``: A statement macro where the body of the macro is the next statement in the same block. The following statement is moved into the macro as its body.
  * ``macros.EXPR_MACRO``: An expression macro.

* ``version`` is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
* ``additional_names`` are the names of the additional parts of the macro, and must be a tuple of strings.

::

    # (func, _ast.STMT_MACRO, VERSION, ())
    stmt_macro!:
        multi_statement_body

    # (func, _ast.SIBLING_MACRO, VERSION, ())
    sibling_macro!
    single_statement_body

    # (func, _ast.EXPR_MACRO, VERSION, ())
    x = expr_macro!(...)

    # (func, _ast.STMT_MACRO, VERSION, ("subsequent_macro_part",))
    multi_part_macro!:
        multi_statement_body
    subsequent_macro_part!:
        multi_statement_body

The compiler will check that the syntax used matches the declared kind.

For convenience, the decorator ``macro_processor`` is provided in the ``macros`` module to mark a function as a macro processor:

::

    def macro_processor(kind, version, *additional_names):
        def deco(func):
            return func, kind, version, additional_names
        return deco

This can be used to help declare macro processors, for example:

::

    @macros.macro_processor(macros.STMT_MACRO, 1_08)
    def switch(astnode):
        ...

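As a concrete, though illustrative, sketch of the shape of a processor, here is
a function that could back a hypothetical ``const!`` expression macro, folding
a literal arithmetic expression into a single constant at compile time. The
``const`` function and the way it is exercised below are invented for this
example; only the ``(func, kind, version, additional_names)`` shape and the
decorator come from the proposal above:

::

    import ast

    # Would be registered as (const, macros.EXPR_MACRO, 1, ()), or declared with
    # @macros.macro_processor(macros.EXPR_MACRO, 1) once the macros module exists.
    def const(node):
        """Replace the expression tree ``node`` with its computed constant value."""
        # A real processor would validate that ``node`` contains only literals.
        value = eval(compile(ast.fix_missing_locations(ast.Expression(body=node)),
                             "<const!>", "eval"), {})
        return ast.copy_location(ast.Constant(value), node)

    # Exercising the processor directly on an ordinary expression tree:
    tree = ast.parse("60 * 60 * 24", mode="eval").body
    print(ast.dump(const(tree)))    # Constant(value=86400)
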
AST extensions
~~~~~~~~~~~~~~

Two new AST nodes will be needed to express macros, ``macro_stmt`` and ``macro_expr``.

::

    class macro_stmt(_ast.stmt):

        _fields = "name", "args", "importname", "asname", "body"

    class macro_expr(_ast.expr):

        _fields = "name", "args"

In addition, macro processors will need a means to express control flow or
side-effecting code that produces a value.
To support this, a new AST node called ``stmt_expr``, which combines a statement
and an expression, will be added.
This new AST node will be a subtype of ``expr``, but will include a statement to
allow side effects.
It will be compiled to bytecode by compiling the statement, then compiling the value.

::

    class stmt_expr(_ast.expr):

        _fields = "stmt", "value"

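For illustration of how a processor might use it, the following sketch builds a
``stmt_expr`` by hand, using a stand-in class. The ``m!`` macro, the stand-in
class and the generated name are invented for the example; only the ``_fields``
layout comes from the definition above:

::

    import ast

    class stmt_expr(ast.expr):             # stand-in for the proposed node
        _fields = ("stmt", "value")

    # Expanding ``x = m!(f())`` could produce "run ``$$m_3_4 = f()``, then use
    # ``$$m_3_4`` as the value of the expression":
    tmp_store = ast.Name(id="$$m_3_4", ctx=ast.Store())
    call = ast.Call(func=ast.Name(id="f", ctx=ast.Load()), args=[], keywords=[])
    node = stmt_expr(
        stmt=ast.Assign(targets=[tmp_store], value=call),
        value=ast.Name(id="$$m_3_4", ctx=ast.Load()),
    )
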
Hygiene and debugging
~~~~~~~~~~~~~~~~~~~~~

Macro processors will often need to create new variables.
Those variables need to be named in such a way as to avoid contaminating the original code and other macros.
No rules for naming will be enforced, but to ensure hygiene and help debugging, the following naming scheme is recommended:

* All generated variable names should start with a ``$``.
* Purely artificial variable names should start ``$$mname`` where ``mname`` is the name of the macro.
* Variables derived from real variables should start ``$vname`` where ``vname`` is the name of the variable.
* All variable names should include the line number and the column offset, separated by an underscore.

Examples:

* Purely generated name: ``$$macro_17_0``
* Name derived from a variable for an expression macro: ``$var_12_5``

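A small helper in this style (the name ``hygienic_name`` is illustrative, not
part of this PEP) could be shared by macro processors:

::

    def hygienic_name(macro_name, node, derived_from=None):
        """Build a generated variable name following the scheme above."""
        if derived_from is None:
            stem = "$$" + macro_name        # purely artificial name
        else:
            stem = "$" + derived_from       # derived from a real variable
        return f"{stem}_{node.lineno}_{node.col_offset}"

    # hygienic_name("macro", node)      ->  "$$macro_17_0"
    # hygienic_name("m", node, "var")   ->  "$var_12_5"
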
Examples
''''''''

Compile time checked data structures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is common to encode tables of data in Python as large dictionaries.
However, these can be hard to maintain and error-prone.
Macros allow this data to be written in a more readable format.
Then, at compile time, it can be verified and converted to an efficient format.

For example, suppose we have two dictionary literals mapping codes to names,
and vice versa.
This is error-prone, as the dictionaries may have duplicate keys,
or one table may not be the inverse of the other.
A macro could generate the two mappings from a single table and,
at the same time, verify that no duplicates are present.

::

    color_to_code = {
        "red": 1,
        "blue": 2,
        "green": 3,
    }

    code_to_color = {
        1: "red",
        2: "blue",
        3: "yellow", # error
    }

would become::

    bijection! color_to_code, code_to_color:
        "red" = 1
        "blue" = 2
        "green" = 3

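The ``bijection!`` processor is not specified here, but the checks and the
generated mappings it would produce correspond to ordinary Python along these
lines (a runtime sketch of the compile-time work, with invented variable names):

::

    pairs = [("red", 1), ("blue", 2), ("green", 3)]
    names = [name for name, _ in pairs]
    codes = [code for _, code in pairs]
    if len(set(names)) != len(names) or len(set(codes)) != len(codes):
        raise SyntaxError("bijection! table contains duplicate entries")
    color_to_code = dict(pairs)
    code_to_color = {code: name for name, code in pairs}
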
Domain specific extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Where I see macros having real value is in specific domains, not in general purpose language features.

For example, parsers.
Here's part of a parser definition for Python, using macros:

::

    choice! single_input:
        NEWLINE
        simple_stmt
        sequence!:
            compound_stmt
            NEWLINE

Compilers
~~~~~~~~~

Runtime compilers, such as ``numba``, have to reconstitute the Python source, or attempt to analyze the bytecode.
It would be simpler and more reliable for them to get the AST directly:

::

    from! my.jit.library import jit

    jit!
    def func():
        ...

Matching symbolic expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When matching something representing syntax, such as a Python ``ast`` node or a ``sympy`` expression,
it is convenient to match against the actual syntax, not the data structure representing it.
For example, a calculator could be implemented using a domain specific macro for matching syntax:

::

    from! ast_matcher import match

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        match! node:
            case! a + b:
                return calculate(a) + calculate(b)
            case! a - b:
                return calculate(a) - calculate(b)
            case! a * b:
                return calculate(a) * calculate(b)
            case! a / b:
                return calculate(a) / calculate(b)

This could be converted to:

::

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        $$match_4_0 = node
        if isinstance($$match_4_0, _ast.Add):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) + calculate(b)
        elif isinstance($$match_4_0, _ast.Sub):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) - calculate(b)
        elif isinstance($$match_4_0, _ast.Mul):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) * calculate(b)
        elif isinstance($$match_4_0, _ast.Div):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) / calculate(b)

Zero cost markers and annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Annotations, either decorators or PEP 3107 function annotations, have a runtime cost
even if they serve only as markers for checkers or as documentation.

::

    @do_nothing_marker
    def foo(...):
        ...

can be replaced with the zero cost macro:

::

    do_nothing_marker!:
        def foo(...):
            ...

Prototyping language extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Although macros would be most valuable for domain specific extensions, it is possible to
demonstrate potential language extensions using macros.

f-strings:
..........

The f-string ``f"..."`` could be implemented as a macro, ``f!("...")``.
This is not quite as nice to read, but would still be useful for experimenting with.

Try finally statement:
......................

::

    try_!:
        body
    finally!:
        closing

Would be translated roughly as:

::

    try:
        body
    except:
        closing
    else:
        closing

Note:
    Care must be taken to handle returns, breaks and continues correctly.
    The above code is merely illustrative.

With statement:
...............

::

    with! open(filename) as fd:
        return fd.read()

The above would require handling ``open`` specially.
A more explicit alternative would be:

::

    with! open!(filename) as fd:
        return fd.read()

Macro definition macros
~~~~~~~~~~~~~~~~~~~~~~~

Languages that have syntactic macros usually provide a macro for defining macros.
This PEP intentionally does not do that, as it is not yet clear what a good design
would be, and we want to allow the community to define their own macros.

One possible form could be:

::

    macro_def! name:
        input:
            ... # input pattern, defining meta-variables
        output:
            ... # output pattern, using meta-variables


Backwards Compatibility
=======================

This PEP is fully backwards compatible.

Performance Implications
========================

For code that doesn't use macros, there will be no effect on performance.

For code that does use macros and has already been compiled to bytecode,
there will be some slight overhead to check that the version
of the macros used to compile the code matches the imported macro processors.

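A sketch of the kind of check involved (the function and the mapping format are
invented for illustration; the real check would live in the bytecode caching
machinery):

::

    def macros_up_to_date(recorded, current):
        """recorded and current both map macro name -> integer version."""
        return all(current.get(name) == version
                   for name, version in recorded.items())
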
For code that has not been compiled, or was compiled with different versions
of the macro processors, there would be the usual overhead of bytecode
compilation, plus any additional overhead of macro processing.

It is worth noting that the speed of source-to-bytecode compilation
is largely irrelevant for Python performance.

Implementation
==============

In order to allow transformation of the AST at compile time by Python code,
all AST nodes in the compiler will have to be Python objects.

Doing that efficiently will mean making all the nodes in the ``_ast`` module
immutable, so as not to degrade performance too much.
They will need to be immutable to guarantee that the AST remains a *tree*,
in order to avoid having to support cyclic GC.
Making them immutable means they will not have a
``__dict__`` attribute, making them compact.

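As a rough analogy in pure Python (the real nodes would be built-in types, and
the ``Name`` class here is only an illustration), an immutable node without a
per-instance ``__dict__`` looks like:

::

    from typing import NamedTuple

    class Name(NamedTuple):     # immutable, compact, no per-instance __dict__
        id: str
        lineno: int
        col_offset: int
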
AST nodes in the ``ast`` module will remain mutable.

Currently, all AST nodes are allocated using an arena allocator.
Changing to use the standard allocator might slow compilation down a little,
but has advantages in terms of maintenance, as much code can be deleted.

Reference Implementation
''''''''''''''''''''''''

None as yet.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: