From 71f65547f6e88001bf83f725d21dec4cee26369c Mon Sep 17 00:00:00 2001
From: Barry Warsaw
Date: Fri, 7 Sep 2001 22:35:39 +0000
Subject: [PATCH] PEP 269, Pgen Module for Python, Jonathan Riehl

---
 pep-0269.txt | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 pep-0269.txt

diff --git a/pep-0269.txt b/pep-0269.txt
new file mode 100644
index 000000000..58e181237
--- /dev/null
+++ b/pep-0269.txt
@@ -0,0 +1,184 @@
+PEP: 269
+Title: Pgen Module for Python
+Version: $Revision$
+Last-Modified: $Date$
+Author: jriehl@spaceship.com (Jonathan Riehl)
+Status: Draft
+Type: Standards Track
+Created: 24-Aug-2001
+Python-Version: 2.2
+Post-History:
+
+
+Abstract
+
+    Much like the parser module exposes the Python parser, this PEP
+    proposes that the parser generator used to create the Python
+    parser, pgen, be exposed as a module in Python.
+
+
+Rationale
+
+    Through the course of Pythonic history, there have been numerous
+    discussions about the creation of a Python compiler [1]. These
+    have resulted in several implementations of Python parsers, most
+    notably the parser module currently provided in the Python
+    standard library [2] and Jeremy Hylton's compiler module [3].
+    However, while multiple language changes have been proposed
+    [4][5], experimentation with the Python syntax has lacked the
+    benefit of a Python binding to the actual parser generator used to
+    build Python.
+
+    By providing a Python wrapper analogous to Fred Drake Jr.'s parser
+    wrapper, but targeted at the pgen library, the following
+    assertions are made:
+
+    1. Reference implementations of syntax changes will be easier to
+       develop. Currently, a reference implementation of a syntax
+       change would require the developer to use the pgen tool from
+       the command line. The resulting parser data structure would
+       then either have to be reworked to interface with a custom
+       CPython implementation, or wrapped as a C extension module.
+
+    2. Reference implementations of syntax changes will be easier to
+       distribute. Since the parser generator will be available in
+       Python, it should follow that the resulting parser will be
+       accessible from Python. Therefore, reference implementations
+       should be available as pure Python code, versus using custom
+       versions of the existing CPython distribution, or as compilable
+       extension modules.
+
+    3. Reference implementations of syntax changes will be easier to
+       discuss with a larger audience. This somewhat falls out of the
+       second assertion, since the community of Python users is most
+       likely larger than the community of CPython developers.
+
+    4. Development of small languages in Python will be further
+       enhanced, since the additional module will be a fully
+       functional LL(1) parser generator.
+
+
+Specification
+
+    The proposed module will be called pgen. The pgen module will
+    contain the following functions:
+
+    parseGrammarFile (fileName) -> AST
+        The parseGrammarFile() function will read the file pointed to
+        by fileName and create an AST object. The AST nodes will
+        contain the nonterminal, numeric values of the parser
+        generator meta-grammar. The output AST will be an instance of
+        the AST extension class as provided by the parser module.
+        Syntax errors in the input file will cause the SyntaxError
+        exception to be raised.
+
+    parseGrammarString (text) -> AST
+        The parseGrammarString() function will follow the semantics of
+        parseGrammarFile(), but accept the grammar text as a
+        string for input, as opposed to the file name.
+
+    buildParser (grammarAst) -> DFA
+        The buildParser() function will accept an AST object for input
+        and return a DFA (deterministic finite automaton) data
+        structure. The DFA data structure will be a C extension
+        class, much like the AST structure is provided in the parser
+        module. If the input AST does not conform to the nonterminal
+        codes defined for the pgen meta-grammar, buildParser() will
+        raise a ValueError exception.
+
+    parseFile (fileName, dfa, start) -> AST
+        The parseFile() function will essentially be a wrapper for the
+        PyParser_ParseFile() C API function. The wrapper code will
+        accept the DFA C extension class, and the file name. An AST
+        instance that conforms to the lexical values in the token
+        module and the nonterminal values contained in the DFA will be
+        output.
+
+    parseString (text, dfa, start) -> AST
+        The parseString() function will operate in a similar fashion
+        to the parseFile() function, but accept the parse text as an
+        argument. Much like parseFile() will wrap the
+        PyParser_ParseFile() C API function, parseString() will wrap
+        the PyParser_ParseString() function.
+
+    symbolToStringMap (dfa) -> dict
+        The symbolToStringMap() function will accept a DFA instance
+        and return a dictionary object that maps from the DFA's
+        numeric values for its nonterminals to the string names of the
+        nonterminals as found in the original grammar specification
+        for the DFA.
+
+    stringToSymbolMap (dfa) -> dict
+        The stringToSymbolMap() function will output a dictionary
+        mapping the nonterminal names of the input DFA to their
+        corresponding numeric values.
+
+    Extra credit will be awarded if the map generation functions and
+    parsing functions are also methods of the DFA extension class.
+
+
+Implementation Plan
+
+    A cunning plan has been devised to accomplish this enhancement:
+
+    1. Rename the pgen functions to conform to the CPython naming
+       standards. This action may involve adding some header files to
+       the Include subdirectory.
+
+    2. Move the pgen C modules in the Makefile.pre.in from unique pgen
+       elements to the Python C library.
+
+    3. Make any needed changes to the parser module so the AST
+       extension class understands that there are AST types it may not
+       understand. Cursory examination of the AST extension class
+       shows that it keeps track of whether the tree is a suite or an
+       expression.
+
+    4. Code an additional C module in the Modules directory. The C
+       extension module will implement the DFA extension class and the
+       functions outlined in the previous section.
+
+    5. Add the new module to the build process. Black magic, indeed.
+
+
+Limitations
+
+    Under this proposal, would-be designers of Python 3000 will still
+    be constrained to Python's lexical conventions. The addition,
+    subtraction or modification of the Python lexer is outside the
+    scope of this PEP.
+
+
+Reference Implementation
+
+    No reference implementation is currently provided.
+
+
+References
+
+    [1] The (defunct) Python Compiler-SIG
+        http://python.org/sigs/compiler-sig/
+
+    [2] Parser Module Documentation
+        http://python.org/doc/current/lib/module-parser.html
+
+    [3] Hylton, Jeremy. FIXME: Reference Compiler Document
+
+    [4] Pelletier, Michel. "Python Interface Syntax", PEP-245.
+        http://python.sourceforge.net/peps/pep-0245.html
+
+    [5] The Python Types-SIG
+        http://python.org/sigs/types-sig/
+
+
+Copyright
+
+    This document has been placed in the public domain.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+fill-column: 70
+End: