PEP: 269
Title: Pgen Module for Python
Version: $Revision$
Last-Modified: $Date$
Author: jriehl@spaceship.com (Jonathan Riehl)
Status: Deferred
Type: Standards Track
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:


Abstract

    Much like the parser module exposes the Python parser, this PEP
    proposes that the parser generator used to create the Python
    parser, pgen, be exposed as a module in Python.


Rationale

    Through the course of Pythonic history, there have been numerous
    discussions about the creation of a Python compiler [1]. These
    have resulted in several implementations of Python parsers, most
    notably the parser module currently provided in the Python
    standard library [2] and Jeremy Hylton's compiler module [3].
    However, while multiple language changes have been proposed
    [4][5], experimentation with the Python syntax has lacked the
    benefit of a Python binding to the actual parser generator used to
    build Python.

    Given a Python wrapper analogous to Fred Drake Jr.'s parser
    wrapper, but targeted at the pgen library, the following
    assertions are made:

    1. Reference implementations of syntax changes will be easier to
       develop. Currently, a reference implementation of a syntax
       change would require the developer to use the pgen tool from
       the command line. The resulting parser data structure would
       then either have to be reworked to interface with a custom
       CPython implementation, or wrapped as a C extension module.

    2. Reference implementations of syntax changes will be easier to
       distribute. Since the parser generator will be available in
       Python, it should follow that the resulting parser will be
       accessible from Python. Therefore, reference implementations
       should be available as pure Python code, rather than as custom
       versions of the existing CPython distribution or as compilable
       extension modules.

    3. Reference implementations of syntax changes will be easier to
       discuss with a larger audience. This somewhat falls out of the
       second assertion, since the community of Python users is most
       likely larger than the community of CPython developers.

    4. Development of small languages in Python will be further
       enhanced, since the additional module will be a fully
       functional LL(1) parser generator.


Specification

    The proposed module will be called pgen. The pgen module will
    contain the following functions:

    parseGrammarFile (fileName) -> AST

        The parseGrammarFile() function will read the file named by
        fileName and create an AST object. The AST nodes will
        contain the numeric nonterminal values of the parser
        generator meta-grammar. The output AST will be an instance of
        the AST extension class as provided by the parser module.
        Syntax errors in the input file will cause the SyntaxError
        exception to be raised.

    parseGrammarString (text) -> AST

        The parseGrammarString() function will follow the semantics of
        parseGrammarFile(), but accept the grammar text as a
        string for input, as opposed to the file name.

    buildParser (grammarAst) -> DFA

        The buildParser() function will accept an AST object for input
        and return a DFA (deterministic finite automaton) data
        structure. The DFA data structure will be a C extension
        class, much like the AST structure is provided in the parser
        module. If the input AST does not conform to the nonterminal
        codes defined for the pgen meta-grammar, buildParser() will
        raise a ValueError exception.

    parseFile (fileName, dfa, start) -> AST

        The parseFile() function will essentially be a wrapper for the
        PyParser_ParseFile() C API function. The wrapper code will
        accept the DFA C extension class and the file name. An AST
        instance that conforms to the lexical values in the token
        module and the nonterminal values contained in the DFA will be
        output.

    parseString (text, dfa, start) -> AST

        The parseString() function will operate in a similar fashion
        to the parseFile() function, but accept the parse text as an
        argument. Much like parseFile() will wrap the
        PyParser_ParseFile() C API function, parseString() will wrap
        the PyParser_ParseString() function.

    symbolToStringMap (dfa) -> dict

        The symbolToStringMap() function will accept a DFA instance
        and return a dictionary object that maps from the DFA's
        numeric values for its nonterminals to the string names of the
        nonterminals as found in the original grammar specification
        for the DFA.

    stringToSymbolMap (dfa) -> dict

        The stringToSymbolMap() function will output a dictionary
        mapping the nonterminal names of the input DFA to their
        corresponding numeric values.

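    The two map functions describe inverse views of the same table.
    The sketch below is plain Python rather than the proposed C
    extension: the SYMBOL_NAMES table and the invert_map() helper are
    hypothetical stand-ins, and the numbering from 256 upward only
    follows CPython's convention of placing nonterminals above the
    token range.

```python
# Hypothetical symbol table for a toy grammar (an assumption, not pgen output).
# CPython numbers nonterminals from 256 upward so they never collide with
# token codes.
SYMBOL_NAMES = {256: "start", 257: "expr", 258: "term"}


def invert_map(mapping):
    """Return a dict mapping each value of `mapping` back to its key."""
    inverted = {value: key for key, value in mapping.items()}
    if len(inverted) != len(mapping):
        # Two keys shared a value, so no inverse mapping exists.
        raise ValueError("mapping is not one-to-one")
    return inverted


# What symbolToStringMap(dfa) and stringToSymbolMap(dfa) might return
# for a DFA built from the toy grammar.
symbol_to_string = dict(SYMBOL_NAMES)
string_to_symbol = invert_map(symbol_to_string)
```

    Inverting either dictionary yields the other, so an
    implementation would presumably need to store only one of them.
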
    Extra credit will be awarded if the map generation functions and
    parsing functions are also methods of the DFA extension class.


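    Taken together, the functions above suggest a three-step
    workflow: parse the grammar text, build a DFA from the resulting
    AST, then parse input with that DFA. The sketch below chains the
    calls using the names from this specification. Since no pgen
    module ships with Python, the module object is passed in as a
    parameter, and fake_pgen is a hypothetical stand-in that only
    records what it was given. The grammar string and the string used
    for start are placeholders as well; this PEP does not say what
    type start takes.

```python
from types import SimpleNamespace


def parse_with_grammar(pgen, grammar_text, source_text, start):
    """Chain the proposed steps: grammar text -> AST -> DFA -> parse tree."""
    grammar_ast = pgen.parseGrammarString(grammar_text)
    dfa = pgen.buildParser(grammar_ast)
    return pgen.parseString(source_text, dfa, start)


# Stand-in object so the sketch runs without the proposed module. Each
# fake function only tags its input; no grammar analysis or parsing happens.
fake_pgen = SimpleNamespace(
    parseGrammarString=lambda text: ("grammar-ast", text),
    buildParser=lambda grammar_ast: ("dfa", grammar_ast),
    parseString=lambda text, dfa, start: ("ast", text, dfa, start),
)

result = parse_with_grammar(
    fake_pgen, "start: NAME NEWLINE\n", "spam\n", "start"
)
```

    With a real implementation, the same helper would return an AST
    object instead of the nested tuples produced by the stand-in.

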
Implementation Plan

    A cunning plan has been devised to accomplish this enhancement:

    1. Rename the pgen functions to conform to the CPython naming
       standards. This action may involve adding some header files to
       the Include subdirectory.

    2. Move the pgen C modules in the Makefile.pre.in from unique pgen
       elements to the Python C library.

    3. Make any needed changes to the parser module so the AST
       extension class understands that there are AST types it may not
       understand. Cursory examination of the AST extension class
       shows that it keeps track of whether the tree is a suite or an
       expression.

    4. Code an additional C module in the Modules directory. The C
       extension module will implement the DFA extension class and the
       functions outlined in the previous section.

    5. Add the new module to the build process. Black magic, indeed.


Limitations

    Under this proposal, would-be designers of Python 3000 will still
    be constrained to Python's lexical conventions. The addition,
    subtraction or modification of the Python lexer is outside the
    scope of this PEP.


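    A concrete case of this constraint is shown below, using the
    standard library's tokenize module as a stand-in for the C lexer.
    That substitution is an assumption made for illustration, and
    lex() is a hypothetical helper. The input uses a "<-" operator
    that Python does not define.

```python
import io
import tokenize


def lex(source):
    """Return (token type name, token text) pairs for `source`."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# "<-" is not a Python operator, so the lexer splits it into "<" and "-"
# before any grammar, stock or custom, gets to see it.
tokens = lex("a <- b\n")
```

    A grammar written for the proposed module could still give
    meaning to a "<" token followed by a "-" token, but it could not
    introduce "<-" as a single token.

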
Reference Implementation

    No reference implementation is currently provided. A patch
    was provided at some point in
    http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470
    but that patch is no longer maintained.


References

    [1] The (defunct) Python Compiler-SIG
        http://www.python.org/sigs/compiler-sig/

    [2] Parser Module Documentation
        http://docs.python.org/library/parser.html

    [3] Hylton, Jeremy.
        http://docs.python.org/library/compiler.html

    [4] Pelletier, Michel. "Python Interface Syntax", PEP-245.
        http://www.python.org/dev/peps/pep-0245/

    [5] The Python Types-SIG
        http://www.python.org/sigs/types-sig/


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: