PEP: 269
Title: Pgen Module for Python
Author: Jonathan Riehl <jriehl@spaceship.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:


Abstract
========

Much like the ``parser`` module exposes the Python parser, this PEP
proposes that the parser generator used to create the Python
parser, ``pgen``, be exposed as a module in Python.


Rationale
=========

Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler [1]_. These
have resulted in several implementations of Python parsers, most
notably the ``parser`` module currently provided in the Python
standard library [2]_ and Jeremy Hylton's ``compiler`` module [3]_.
However, while multiple language changes have been proposed
[4]_ [5]_, experimentation with the Python syntax has lacked the
benefit of a Python binding to the actual parser generator used to
build Python.

By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the ``pgen`` library, the following
assertions are made:

1. Reference implementations of syntax changes will be easier to
   develop. Currently, a reference implementation of a syntax
   change would require the developer to use the ``pgen`` tool from
   the command line. The resulting parser data structure would
   then either have to be reworked to interface with a custom
   CPython implementation, or wrapped as a C extension module.

2. Reference implementations of syntax changes will be easier to
   distribute. Since the parser generator will be available in
   Python, it should follow that the resulting parser will be
   accessible from Python. Therefore, reference implementations
   should be available as pure Python code, rather than as custom
   versions of the existing CPython distribution or as compilable
   extension modules.

3. Reference implementations of syntax changes will be easier to
   discuss with a larger audience. This somewhat falls out of the
   second assertion, since the community of Python users is most
   likely larger than the community of CPython developers.

4. Development of small languages in Python will be further
   enhanced, since the additional module will be a fully
   functional LL(1) parser generator.


Specification
=============

The proposed module will be called ``pgen``. The ``pgen`` module will
contain the following functions:


``parseGrammarFile (fileName) -> AST``
--------------------------------------

The ``parseGrammarFile()`` function will read the file pointed to
by ``fileName`` and create an AST object. The AST nodes will
contain the nonterminal, numeric values of the parser
generator meta-grammar. The output AST will be an instance of
the AST extension class as provided by the ``parser`` module.
Syntax errors in the input file will cause the ``SyntaxError``
exception to be raised.


``parseGrammarString (text) -> AST``
------------------------------------

The ``parseGrammarString()`` function will follow the semantics of
``parseGrammarFile()``, but accept the grammar text as a string
for input, as opposed to the file name.


``buildParser (grammarAst) -> DFA``
-----------------------------------

The ``buildParser()`` function will accept an AST object for input
and return a DFA (deterministic finite automaton) data
structure. The DFA data structure will be a C extension
class, much like the AST structure provided in the ``parser``
module. If the input AST does not conform to the nonterminal
codes defined for the ``pgen`` meta-grammar, ``buildParser()`` will
raise a ``ValueError`` exception.


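
To make the notion of a DFA concrete, the following minimal sketch
models the automaton for a single grammar rule as a plain Python
dictionary and checks whether a sequence of token labels is accepted.
The rule, the state numbering, and the ``accepts()`` helper are
illustrative assumptions only; they do not describe the layout of the
proposed DFA extension class.

.. code-block:: python

   # Hypothetical DFA for the rule:  arglist: NAME (',' NAME)*
   # Each state maps a token label to the state reached on that label.
   ARGLIST_DFA = {
       0: {"NAME": 1},
       1: {",": 2},
       2: {"NAME": 1},
   }
   ACCEPTING = {1}  # states in which the rule may legally end


   def accepts(tokens, dfa=ARGLIST_DFA, accepting=ACCEPTING, start=0):
       """Return True if the token labels drive the DFA to an accepting state."""
       state = start
       for label in tokens:
           arcs = dfa.get(state, {})
           if label not in arcs:
               return False  # no arc for this label: the input is rejected
           state = arcs[label]
       return state in accepting

Because every state has at most one arc per label, the automaton never
needs to backtrack, which is the property an LL(1) parser relies on.
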
``parseFile (fileName, dfa, start) -> AST``
-------------------------------------------

The ``parseFile()`` function will essentially be a wrapper for the
``PyParser_ParseFile()`` C API function. The wrapper code will
accept the DFA C extension class, and the file name. An AST
instance that conforms to the lexical values in the ``token``
module and the nonterminal values contained in the DFA will be
output.


``parseString (text, dfa, start) -> AST``
-----------------------------------------

The ``parseString()`` function will operate in a similar fashion
to the ``parseFile()`` function, but accept the parse text as an
argument. Much like ``parseFile()`` will wrap the
``PyParser_ParseFile()`` C API function, ``parseString()`` will wrap
the ``PyParser_ParseString()`` function.


``symbolToStringMap (dfa) -> dict``
-----------------------------------

The ``symbolToStringMap()`` function will accept a DFA instance
and return a dictionary object that maps from the DFA's
numeric values for its nonterminals to the string names of the
nonterminals as found in the original grammar specification
for the DFA.


``stringToSymbolMap (dfa) -> dict``
-----------------------------------

The ``stringToSymbolMap()`` function will output a dictionary mapping
the nonterminal names of the input DFA to their corresponding
numeric values.


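
The two maps are presumably inverses of one another. The sketch below
shows that relationship with plain dictionaries; the symbol numbers and
names are made up for illustration, and ``invert()`` is a hypothetical
helper, not part of the proposed module.

.. code-block:: python

   # Illustrative values only: a real DFA would supply its own numbering.
   symbol_to_string = {
       256: "single_input",
       257: "file_input",
       258: "eval_input",
   }


   def invert(mapping):
       """Swap keys and values; assumes the values are unique and hashable."""
       inverted = {value: key for key, value in mapping.items()}
       if len(inverted) != len(mapping):
           raise ValueError("mapping is not one-to-one")
       return inverted


   # The shape of result stringToSymbolMap() would be expected to return.
   string_to_symbol = invert(symbol_to_string)
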
Extra credit will be awarded if the map generation functions and
parsing functions are also methods of the DFA extension class.


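
Taken together, the functions above suggest a workflow along the
following lines. Since no ``pgen`` module exists, this sketch takes the
module as a parameter so that any object offering the proposed functions
can be passed in. It also assumes that ``start`` is the numeric value of
the start nonterminal, which this PEP does not state explicitly.

.. code-block:: python

   def parse_with_grammar(pgen, grammar_text, source_text, start_name):
       """Parse source_text with a parser built from grammar_text.

       ``pgen`` is any object exposing the functions proposed in this PEP.
       """
       # Parse the grammar definition into an AST of meta-grammar nodes.
       grammar_ast = pgen.parseGrammarString(grammar_text)
       # Build the DFA that drives the parser for that grammar.
       dfa = pgen.buildParser(grammar_ast)
       # Assumption: ``start`` is the number of the start nonterminal.
       start = pgen.stringToSymbolMap(dfa)[start_name]
       # Parse the input text with the freshly generated parser.
       return pgen.parseString(source_text, dfa, start)
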
Implementation Plan
===================

A cunning plan has been devised to accomplish this enhancement:

1. Rename the ``pgen`` functions to conform to the CPython naming
   standards. This action may involve adding some header files to
   the ``Include`` subdirectory.

2. Move the ``pgen`` C modules in ``Makefile.pre.in`` from unique
   ``pgen`` elements to the Python C library.

3. Make any needed changes to the ``parser`` module so that the AST
   extension class can cope with AST types it does not recognize.
   Cursory examination of the AST extension class shows that it
   keeps track of whether the tree is a suite or an expression.

4. Code an additional C module in the ``Modules`` directory. The C
   extension module will implement the DFA extension class and the
   functions outlined in the previous section.

5. Add the new module to the build process. Black magic, indeed.


Limitations
===========

Under this proposal, would-be designers of Python 3000 will still
be constrained to Python's lexical conventions. The addition,
subtraction, or modification of the Python lexer is outside the
scope of this PEP.


Reference Implementation
========================

No reference implementation is currently provided. A patch
was provided at some point in
http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470
but that patch is no longer maintained.


References
==========

.. [1] The (defunct) Python Compiler-SIG
   http://www.python.org/sigs/compiler-sig/

.. [2] Parser Module Documentation
   http://docs.python.org/library/parser.html

.. [3] Hylton, Jeremy.
   http://docs.python.org/library/compiler.html

.. [4] Pelletier, Michel. "Python Interface Syntax", :pep:`245`

.. [5] The Python Types-SIG
   http://www.python.org/sigs/types-sig/


Copyright
=========

This document has been placed in the public domain.