diff --git a/pep-0617.rst b/pep-0617.rst
index 59d436ed2..c7e1ef448 100644
--- a/pep-0617.rst
+++ b/pep-0617.rst
@@ -545,15 +545,25 @@ A subexpression can be named by preceding it with an identifier and an
 ---------------
 Grammar actions
 ---------------
-
 To avoid the intermediate steps that obscure the relationship between the
-grammar and the AST generation the proposed PEG parser allows directly generating
-AST nodes for a rule via grammar actions. Grammar actions are C expressions that
-are evaluated when a grammar rule is successfully parsed. This allows to directly
-describe how the AST is composed in the grammar itself, making it more clear and
-maintainable. This AST generation process is supported by the use of some helper
-functions that factor out common AST object manipulations and some other required
-operations that are not directly related to the grammar.
+grammar and the AST generation the proposed PEG parser allows directly
+generating AST nodes for a rule via grammar actions. Grammar actions are
+language-specific expressions that are evaluated when a grammar rule is
+successfully parsed. These expressions can be written in Python or C
+depending on the desired output of the parser generator. This means that if
+one would want to generate a parser in Python and another in C, two grammar
+files should be written, each one with a different set of actions, keeping
+everything else apart from said actions identical in both files. As an
+example of a grammar with Python actions, the piece of the parser generator
+that parses grammar files is bootstrapped from a meta-grammar file with
+Python actions that generate the grammar tree as a result of the parsing.
+
+In the specific case of the new proposed PEG grammar for the Python, having
+actions allows to directly describe how the AST is composed in the grammar
+itself, making it more clear and maintainable. This AST generation process is
+supported by the use of some helper functions that factor out common AST
+object manipulations and some other required operations that are not directly
+related to the grammar.
 
 To indicate these actions each alternative can be followed by the action
 code inside curly-braces, which specifies the return value of the alternative:::
@@ -571,35 +581,62 @@ different possibilities:
 If the action is omitted and Python code is being generated, then a list
 with all the parsed expressions get returned (this is meant for debugging).
 
-As an illustrative example this simple grammar file allows to directly generate a full
-parser that can parse simple aritmetic expressions and that returns a valid Python AST:
+As an illustrative example this simple grammar file allows to directly
+generate a full parser that can parse simple arithmetic expressions and that
+returns a valid C-based Python AST:
 
 ::
 
-    start[mod_ty]: a=stmt* $ { Module(a, NULL, p->arena) }
-    stmt[stmt_ty]: a=expr_stmt { a }
-    expr_stmt[stmt_ty]: a=expression NEWLINE { _Py_Expr(a, EXTRA) }
-    expression[expr_ty]: ( l=expression '+' r=term { _Py_BinOp(l, Add, r, EXTRA) }
-                         | l=expression '-' r=term { _Py_BinOp(l, Sub, r, EXTRA) }
-                         | t=term { t }
-                         )
-    term[expr_ty]: ( l=term '*' r=factor { _Py_BinOp(l, Mult, r, EXTRA }
-                   | l=term '/' r=factor { _Py_BinOp(l, Div, r, EXTRA) }
-                   | f=factor { f }
-                   )
-    factor[expr_ty]: ('(' e=expression ')' { e }
-                     | a=atom { a }
-                     )
-    atom[expr_ty]: ( n=NAME { n }
-                   | n=NUMBER { n }
-                   | s=STRING { s }
-                   )
+    start[mod_ty]: a=expr_stmt* $ { Module(a, NULL, p->arena) }
+    expr_stmt[stmt_ty]: a=expr NEWLINE { _Py_Expr(a, EXTRA) }
+    expr[expr_ty]:
+        | l=expr '+' r=term { _Py_BinOp(l, Add, r, EXTRA) }
+        | l=expr '-' r=term { _Py_BinOp(l, Sub, r, EXTRA) }
+        | t=term { t }
+
+    term[expr_ty]:
+        | l=term '*' r=factor { _Py_BinOp(l, Mult, r, EXTRA) }
+        | l=term '/' r=factor { _Py_BinOp(l, Div, r, EXTRA) }
+        | f=factor { f }
+
+    factor[expr_ty]:
+        | '(' e=expr ')' { e }
+        | a=atom { a }
+
+    atom[expr_ty]:
+        | n=NAME { n }
+        | n=NUMBER { n }
+        | s=STRING { s }
 
 Here ``EXTRA`` is a macro that expands to ``start_lineno, start_col_offset,
 end_lineno, end_col_offset, p->arena``, those being variables automatically
 injected by the parser; ``p`` points to an object that holds on to all state
 for the parser.
 
+A similar grammar written to target Python AST objects:
+
+::
+
+    start: expr NEWLINE? ENDMARKER { ast.Expression(expr) }
+    expr:
+        | expr '+' term { ast.BinOp(expr, ast.Add(), term) }
+        | expr '-' term { ast.BinOp(expr, ast.Sub(), term) }
+        | term { term }
+
+    term:
+        | l=term '*' r=factor { ast.BinOp(l, ast.Mult(), r) }
+        | term '/' factor { ast.BinOp(term, ast.Div(), factor) }
+        | factor { factor }
+
+    factor:
+        | '(' expr ')' { expr }
+        | atom { atom }
+
+    atom:
+        | NAME { ast.Name(id=name.string, ctx=ast.Load()) }
+        | NUMBER { ast.Constant(value=ast.literal_eval(number.string)) }
+
+
 ==============
 Migration plan
 ==============