PEP: 269
Title: Pgen Module for Python
Version: $Revision$
Last-Modified: $Date$
Author: jriehl@spaceship.com (Jonathan Riehl)
Status: Draft
Type: Standards Track
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:


Abstract

Much like the parser module exposes the Python parser, this PEP
proposes that the parser generator used to create the Python
parser, pgen, be exposed as a module in Python.


Rationale

Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler [1]. These
have resulted in several implementations of Python parsers, most
notably the parser module currently provided in the Python
standard library [2] and Jeremy Hylton's compiler module [3].
However, while multiple language changes have been proposed
[4][5], experimentation with the Python syntax has lacked the
benefit of a Python binding to the actual parser generator used to
build Python.

By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the pgen library, the following
assertions are made:

1. Reference implementations of syntax changes will be easier to
   develop. Currently, a reference implementation of a syntax
   change would require the developer to use the pgen tool from
   the command line. The resulting parser data structure would
   then either have to be reworked to interface with a custom
   CPython implementation, or wrapped as a C extension module.

2. Reference implementations of syntax changes will be easier to
   distribute. Since the parser generator will be available in
   Python, it should follow that the resulting parser will be
   accessible from Python. Therefore, reference implementations
   should be available as pure Python code, rather than as custom
   versions of the existing CPython distribution or as compilable
   extension modules.

3. Reference implementations of syntax changes will be easier to
   discuss with a larger audience. This follows in part from the
   second assertion, since the community of Python users is most
   likely larger than the community of CPython developers.

4. Development of small languages in Python will be further
   enhanced, since the additional module will be a fully
   functional LL(1) parser generator.
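
To make the LL(1) restriction in the fourth assertion concrete, the
following toy checker may help. It is only a sketch and is not how
pgen is implemented: the grammar representation, the function names
and the sample grammar are all invented for illustration, and empty
alternatives are assumed not to occur, so comparing the sets of
leading tokens is enough to spot a conflict.

```python
# Toy grammar: nonterminal -> list of alternatives (lists of symbols).
# Any symbol that is not a key of the dict is treated as a terminal.
# Assumption: no alternative is empty, so FIRST sets alone decide
# whether one token of lookahead is enough.
GRAMMAR = {
    "stmt": [["NAME", "=", "expr"], ["print", "expr"]],
    "expr": [["NUMBER"], ["(", "expr", ")"]],
}


def first(symbol, grammar, seen=None):
    """Return the set of terminals that can begin `symbol`."""
    if symbol not in grammar:
        return {symbol}
    seen = set() if seen is None else seen
    if symbol in seen:  # already expanded; also guards left recursion
        return set()
    seen.add(symbol)
    result = set()
    for alternative in grammar[symbol]:
        result |= first(alternative[0], grammar, seen)
    return result


def ll1_conflicts(grammar):
    """List (nonterminal, terminal) pairs where two alternatives clash."""
    conflicts = []
    for name, alternatives in grammar.items():
        claimed = set()
        for alternative in alternatives:
            starts = first(alternative[0], grammar)
            for token in sorted(starts & claimed):
                conflicts.append((name, token))
            claimed |= starts
    return conflicts
```

An empty result from ll1_conflicts(GRAMMAR) means every rule can
choose its alternative by looking at a single token, which is the
property an LL(1) parser generator relies on.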


Specification

The proposed module will be called pgen. The pgen module will
contain the following functions:

parseGrammarFile (fileName) -> AST

    The parseGrammarFile() function will read the file named by
    fileName and create an AST object. The AST nodes will contain
    the numeric nonterminal values of the parser generator
    meta-grammar. The output AST will be an instance of the AST
    extension class as provided by the parser module. Syntax
    errors in the input file will cause the SyntaxError exception
    to be raised.

parseGrammarString (text) -> AST

    The parseGrammarString() function will follow the semantics of
    parseGrammarFile(), but accept the grammar text as a string
    for input, as opposed to the file name.

buildParser (grammarAst) -> DFA

    The buildParser() function will accept an AST object for input
    and return a DFA (deterministic finite automaton) data
    structure. The DFA data structure will be a C extension
    class, much like the AST structure provided in the parser
    module. If the input AST does not conform to the nonterminal
    codes defined for the pgen meta-grammar, buildParser() will
    raise a ValueError exception.

parseFile (fileName, dfa, start) -> AST

    The parseFile() function will essentially be a wrapper for the
    PyParser_ParseFile() C API function. The wrapper code will
    accept the DFA C extension class, and the file name. An AST
    instance that conforms to the lexical values in the token
    module and the nonterminal values contained in the DFA will be
    output.

parseString (text, dfa, start) -> AST

    The parseString() function will operate in a similar fashion
    to the parseFile() function, but accept the parse text as an
    argument. Much like parseFile() will wrap the
    PyParser_ParseFile() C API function, parseString() will wrap
    the PyParser_ParseString() function.

symbolToStringMap (dfa) -> dict

    The symbolToStringMap() function will accept a DFA instance
    and return a dictionary object that maps from the DFA's
    numeric values for its nonterminals to the string names of the
    nonterminals as found in the original grammar specification
    for the DFA.

stringToSymbolMap (dfa) -> dict

    The stringToSymbolMap() function will output a dictionary
    mapping the nonterminal names of the input DFA to their
    corresponding numeric values.
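
The relationship between the two map functions can be sketched in
pure Python. The dictionary below is a hand-written stand-in for
what symbolToStringMap() might return; the rule names and numbers
are invented for illustration, and invert() is a hypothetical
helper rather than part of the proposed module.

```python
# Hypothetical output of symbolToStringMap(dfa) for a two-rule grammar.
# The numbers are illustrative; in CPython, nonterminal numbers start
# at 256 so that they do not collide with token numbers.
symbol_to_string = {256: "stmt", 257: "expr"}


def invert(mapping):
    """Build the reverse mapping, refusing duplicate values."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError("duplicate value: %r" % (value,))
        inverse[value] = key
    return inverse


# The shape stringToSymbolMap(dfa) would be expected to have.
string_to_symbol = invert(symbol_to_string)
```

Since each nonterminal has exactly one name and one number, either
map can be derived from the other, and inverting twice returns the
original dictionary.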

Extra credit will be awarded if the map generation functions and
parsing functions are also methods of the DFA extension class.
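
Taken together, the functions above suggest a grammar-to-parse-tree
pipeline. The sketch below shows one way the calls might be chained.
Because no pgen module ships with Python, the module is passed in as
a parameter, so any object exposing the proposed functions will do.
The helper name parse_with_grammar is invented here, and the start
argument is simply passed through, since this PEP does not define it
(it is assumed to identify the start symbol).

```python
def parse_with_grammar(pgen, grammar_text, source_text, start):
    """Chain the proposed functions: grammar text -> DFA -> parse tree.

    `pgen` is any object exposing the functions proposed in this PEP.
    It is a parameter because the module is only a proposal.
    """
    # Parse the grammar itself; the PEP says SyntaxError is raised
    # for a malformed grammar.
    grammar_ast = pgen.parseGrammarString(grammar_text)

    # Turn the grammar AST into a DFA; the PEP says ValueError is
    # raised if the AST does not match the pgen meta-grammar.
    dfa = pgen.buildParser(grammar_ast)

    # Map numeric nonterminal codes back to rule names so the
    # resulting tree can be read by a human.
    names = pgen.symbolToStringMap(dfa)

    # Parse the actual input with the freshly built DFA.
    tree = pgen.parseString(source_text, dfa, start)
    return tree, names
```

Passing the module as an argument keeps the sketch runnable against
a stand-in object, and a real implementation could be dropped in
later without changing the helper.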


Implementation Plan

A cunning plan has been devised to accomplish this enhancement:

1. Rename the pgen functions to conform to the CPython naming
   standards. This action may involve adding some header files to
   the Include subdirectory.

2. Move the pgen C modules in the Makefile.pre.in from unique pgen
   elements to the Python C library.

3. Make any needed changes to the parser module so the AST
   extension class understands that there are AST types it may not
   understand. Cursory examination of the AST extension class
   shows that it keeps track of whether the tree is a suite or an
   expression.

4. Code an additional C module in the Modules directory. The C
   extension module will implement the DFA extension class and the
   functions outlined in the previous section.

5. Add the new module to the build process. Black magic, indeed.


Limitations

Under this proposal, would-be designers of Python 3000 will still
be constrained to Python's lexical conventions. Adding to,
removing from, or otherwise modifying the Python lexer is outside
the scope of this PEP.


Reference Implementation

No reference implementation is currently provided.


References

[1] The (defunct) Python Compiler-SIG
    http://www.python.org/sigs/compiler-sig/

[2] Parser Module Documentation
    http://www.python.org/doc/current/lib/module-parser.html

[3] Hylton, Jeremy.
    http://www.python.org/doc/current/lib/compiler.html

[4] Pelletier, Michel. "Python Interface Syntax", PEP-245.
    http://www.python.org/peps/pep-0245.html

[5] The Python Types-SIG
    http://www.python.org/sigs/types-sig/


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: