PEP 269, Pgen Module for Python, Jonathan Riehl
parent a4d67b7878
commit 71f65547f6

@@ -0,0 +1,184 @@
PEP: 269
Title: Pgen Module for Python
Version: $Revision$
Last-Modified: $Date$
Author: jriehl@spaceship.com (Jonathan Riehl)
Status: Draft
Type: Standards Track
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:


Abstract

Much like the parser module exposes the Python parser, this PEP
proposes that the parser generator used to create the Python
parser, pgen, be exposed as a module in Python.


Rationale

Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler [1]. These
have resulted in several implementations of Python parsers, most
notably the parser module currently provided in the Python
standard library [2] and Jeremy Hylton's compiler module [3].
However, while multiple language changes have been proposed
[4][5], experimentation with the Python syntax has lacked the
benefit of a Python binding to the actual parser generator used to
build Python.

By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the pgen library, the following
assertions are made:

1. Reference implementations of syntax changes will be easier to
   develop. Currently, a reference implementation of a syntax
   change would require the developer to use the pgen tool from
   the command line. The resulting parser data structure would
   then either have to be reworked to interface with a custom
   CPython implementation, or wrapped as a C extension module.

2. Reference implementations of syntax changes will be easier to
   distribute. Since the parser generator will be available in
   Python, it should follow that the resulting parser will be
   accessible from Python. Therefore, reference implementations
   should be available as pure Python code, rather than as custom
   versions of the existing CPython distribution or as compilable
   extension modules.

3. Reference implementations of syntax changes will be easier to
   discuss with a larger audience. This somewhat falls out of the
   second assertion, since the community of Python users is most
   likely larger than the community of CPython developers.

4. Development of small languages in Python will be further
   enhanced, since the additional module will be a fully
   functional LL(1) parser generator.
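
As an illustration of what "LL(1)" means in assertion 4, the
following is a minimal sketch of a table-driven LL(1) recognizer
for a toy grammar. It is not pgen's algorithm (pgen compiles
grammar rules into DFAs) and nothing in it is part of the proposed
module: the grammar, TABLE, tokenize() and recognize() are all
hypothetical names chosen for this example.

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   expr      : term expr_tail
#   expr_tail : '+' term expr_tail | <empty>
#   term      : NUMBER | '(' expr ')'
# Table keys are (nonterminal, lookahead token kind); values are the
# right-hand side to expand. A single token of lookahead is always
# enough to pick the production, which is the LL(1) property.
TABLE = {
    ("expr", "NUMBER"): ["term", "expr_tail"],
    ("expr", "("): ["term", "expr_tail"],
    ("expr_tail", "+"): ["+", "term", "expr_tail"],
    ("expr_tail", ")"): [],
    ("expr_tail", "EOF"): [],
    ("term", "NUMBER"): ["NUMBER"],
    ("term", "("): ["(", "expr", ")"],
}
NONTERMINALS = {"expr", "expr_tail", "term"}
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([+()]))")


def tokenize(text):
    """Return a list of token kinds: NUMBER, '+', '(', ')', then EOF."""
    text = text.rstrip()
    kinds = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        kinds.append("NUMBER" if match.group(1) else match.group(2))
        pos = match.end()
    kinds.append("EOF")
    return kinds


def recognize(text):
    """Return True if text matches 'expr'; raise SyntaxError otherwise."""
    tokens = tokenize(text)
    stack = ["EOF", "expr"]  # the top of the stack is the end of the list
    index = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[index]
        if top in NONTERMINALS:
            production = TABLE.get((top, lookahead))
            if production is None:
                raise SyntaxError(
                    "unexpected %s while parsing %s" % (lookahead, top))
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(production))
        elif top == lookahead:
            index += 1  # terminal matched; consume the token
        else:
            raise SyntaxError("expected %s, found %s" % (top, lookahead))
    return True
```

Calling recognize("1 + (2 + 3)") succeeds, while recognize("1 +")
raises SyntaxError because no table entry exists for a term that
starts at the end of the input.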


Specification

The proposed module will be called pgen. The pgen module will
contain the following functions:

parseGrammarFile (fileName) -> AST
    The parseGrammarFile() function will read the file named by
    fileName and create an AST object. The AST nodes will
    contain the numeric nonterminal values of the parser
    generator meta-grammar. The output AST will be an instance of
    the AST extension class as provided by the parser module.
    Syntax errors in the input file will cause the SyntaxError
    exception to be raised.

parseGrammarString (text) -> AST
    The parseGrammarString() function will follow the semantics of
    parseGrammarFile(), but accept the grammar text as a
    string for input, as opposed to the file name.
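
The relationship between the two grammar-reading functions can be
sketched in pure Python as follows. This is a stand-in written
under loud assumptions, not the proposed C implementation: the
names parse_grammar_string() and parse_grammar_file() are
hypothetical, the stand-in accepts only one "name: rhs" rule per
line, and it returns a list of (name, rhs) tuples rather than a
parser-module AST. It only shows the file variant delegating to
the string variant and a SyntaxError being raised on bad input.

```python
import os
import tempfile


def parse_grammar_string(text):
    """Hypothetical stand-in: parse 'name: rhs' rules into (name, rhs) pairs.

    The proposal returns a parser-module AST; a list of tuples is used
    here only to keep the sketch self-contained.
    """
    rules = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        name, separator, rhs = line.partition(":")
        name, rhs = name.strip(), rhs.strip()
        if not separator or not name.isidentifier() or not rhs:
            raise SyntaxError("bad grammar rule on line %d" % line_number)
        rules.append((name, rhs))
    return rules


def parse_grammar_file(file_name):
    """Read file_name and delegate to parse_grammar_string()."""
    with open(file_name, encoding="utf-8") as handle:
        return parse_grammar_string(handle.read())


# Usage: both entry points give the same result for the same grammar.
GRAMMAR_TEXT = "expr: term ('+' term)*\nterm: NUMBER\n"
with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(GRAMMAR_TEXT)
try:
    assert parse_grammar_file(tmp.name) == parse_grammar_string(GRAMMAR_TEXT)
finally:
    os.remove(tmp.name)
```

Keeping the file function as a thin wrapper means that error
reporting and tree construction live in one place.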

buildParser (grammarAst) -> DFA
    The buildParser() function will accept an AST object for input
    and return a DFA (deterministic finite automaton) data
    structure. The DFA data structure will be a C extension
    class, much as the AST structure is provided in the parser
    module. If the input AST does not conform to the nonterminal
    codes defined for the pgen meta-grammar, buildParser() will
    raise a ValueError exception.
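
The conformance check described for buildParser() might look
roughly like the sketch below. Everything here is an assumption
made for illustration: the tree is modelled as nested tuples
similar to the parser module's tuple form, the META_NONTERMINALS
names and numbers are hypothetical stand-ins for the meta-grammar
codes, and check_grammar_tree() is not a function proposed by this
PEP.

```python
NT_OFFSET = 256  # codes at or above this are treated as nonterminals

# Hypothetical nonterminal codes for the pgen meta-grammar.
META_NONTERMINALS = {
    256: "MSTART",
    257: "RULE",
    258: "RHS",
    259: "ALT",
    260: "ITEM",
    261: "ATOM",
}


def check_grammar_tree(node):
    """Raise ValueError if a nonterminal code is not in META_NONTERMINALS.

    A node is a tuple: (code, child, ...) for nonterminals and
    (code, text) for tokens.
    """
    code = node[0]
    if code < NT_OFFSET:
        return  # token leaf; nothing to validate in this sketch
    if code not in META_NONTERMINALS:
        raise ValueError("unknown meta-grammar nonterminal code: %d" % code)
    for child in node[1:]:
        check_grammar_tree(child)


# A small tree for the rule "expr: term". Token codes are illustrative.
GOOD_TREE = (
    256,
    (257,
     (1, "expr"),
     (11, ":"),
     (258, (259, (260, (261, (1, "term"))))),
     (4, "")),
    (0, ""),
)
check_grammar_tree(GOOD_TREE)  # passes silently
```

A tree such as (300, (1, "x")) would be rejected with ValueError,
since 300 is not one of the assumed meta-grammar codes.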

parseFile (fileName, dfa, start) -> AST
    The parseFile() function will essentially be a wrapper for the
    PyParser_ParseFile() C API function. The wrapper code will
    accept the DFA C extension class and the file name. An AST
    instance that conforms to the lexical values in the token
    module and the nonterminal values contained in the DFA will be
    output.

parseString (text, dfa, start) -> AST
    The parseString() function will operate in a similar fashion
    to the parseFile() function, but accept the parse text as an
    argument. Much as parseFile() will wrap the
    PyParser_ParseFile() C API function, parseString() will wrap
    the PyParser_ParseString() function.

symbolToStringMap (dfa) -> dict
    The symbolToStringMap() function will accept a DFA instance
    and return a dictionary object that maps from the DFA's
    numeric values for its nonterminals to the string names of the
    nonterminals as found in the original grammar specification
    for the DFA.

stringToSymbolMap (dfa) -> dict
    The stringToSymbolMap() function will output a dictionary mapping
    the nonterminal names of the input DFA to their corresponding
    numeric values.
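
The two maps are inverses of each other, which the following
sketch makes concrete. It is only an assumption-laden model: the
proposed functions take a DFA instance, whereas these hypothetical
helpers take a plain list of nonterminal names and number them
from an assumed offset of 256.

```python
def symbol_to_string_map(nonterminal_names, offset=256):
    """Map numeric nonterminal codes to names, numbering from offset."""
    return {offset + index: name
            for index, name in enumerate(nonterminal_names)}


def string_to_symbol_map(nonterminal_names, offset=256):
    """Inverse mapping: nonterminal names to numeric codes."""
    forward = symbol_to_string_map(nonterminal_names, offset)
    return {name: code for code, name in forward.items()}


NAMES = ["expr", "term"]
BY_CODE = symbol_to_string_map(NAMES)
BY_NAME = string_to_symbol_map(NAMES)

# Round trip: looking a name up by code and back yields the same code.
for code, name in BY_CODE.items():
    assert BY_NAME[name] == code
```

The round trip only holds when nonterminal names are unique, which
is assumed here.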

Extra credit will be awarded if the map generation functions and
parsing functions are also methods of the DFA extension class.


Implementation Plan

A cunning plan has been devised to accomplish this enhancement:

1. Rename the pgen functions to conform to the CPython naming
   standards. This action may involve adding some header files to
   the Include subdirectory.

2. Move the pgen C modules in the Makefile.pre.in from unique pgen
   elements to the Python C library.

3. Make any needed changes to the parser module so the AST
   extension class understands that there are AST types it may not
   understand. Cursory examination of the AST extension class
   shows that it keeps track of whether the tree is a suite or an
   expression.

4. Code an additional C module in the Modules directory. The C
   extension module will implement the DFA extension class and the
   functions outlined in the previous section.

5. Add the new module to the build process. Black magic, indeed.


Limitations

Under this proposal, would-be designers of Python 3000 will still
be constrained to Python's lexical conventions. The addition,
subtraction or modification of the Python lexer is outside the
scope of this PEP.


Reference Implementation

No reference implementation is currently provided.


References

[1] The (defunct) Python Compiler-SIG
    http://python.org/sigs/compiler-sig/

[2] Parser Module Documentation
    http://python.org/doc/current/lib/module-parser.html

[3] Hylton, Jeremy. FIXME: Reference Compiler Document

[4] Pelletier, Michel. "Python Interface Syntax", PEP-245.
    http://python.sourceforge.net/peps/pep-0245.html

[5] The Python Types-SIG
    http://python.org/sigs/types-sig/


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: