python-peps/peps/pep-0269.rst

209 lines
6.9 KiB
ReStructuredText

PEP: 269
Title: Pgen Module for Python
Author: Jonathan Riehl <jriehl@spaceship.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:
Abstract
========
Much like the ``parser`` module exposes the Python parser, this PEP
proposes that the parser generator used to create the Python
parser, ``pgen``, be exposed as a module in Python.
Rationale
=========
Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler [1]_. These
have resulted in several implementations of Python parsers, most
notably the ``parser`` module currently provided in the Python
standard library [2]_ and Jeremy Hylton's ``compiler`` module [3]_.
However, while multiple language changes have been proposed
[4]_ [5]_, experimentation with the Python syntax has lacked the
benefit of a Python binding to the actual parser generator used to
build Python.
By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the ``pgen`` library, the following
assertions are made:
1. Reference implementations of syntax changes will be easier to
develop. Currently, a reference implementation of a syntax
change would require the developer to use the ``pgen`` tool from
the command line. The resulting parser data structure would
then either have to be reworked to interface with a custom
CPython implementation, or wrapped as a C extension module.
2. Reference implementations of syntax changes will be easier to
distribute. Since the parser generator will be available in
Python, it should follow that the resulting parser will
accessible from Python. Therefore, reference implementations
should be available as pure Python code, versus using custom
versions of the existing CPython distribution, or as compilable
extension modules.
3. Reference implementations of syntax changes will be easier to
discuss with a larger audience. This somewhat falls out of the
second assertion, since the community of Python users is most
likely larger than the community of CPython developers.
4. Development of small languages in Python will be further
enhanced, since the additional module will be a fully
functional LL(1) parser generator.
Specification
=============
The proposed module will be called ``pgen``. The ``pgen`` module will
contain the following functions:
``parseGrammarFile (fileName) -> AST``
--------------------------------------
The ``parseGrammarFile()`` function will read the file pointed to
by fileName and create an AST object. The AST nodes will
contain the nonterminal, numeric values of the parser
generator meta-grammar. The output AST will be an instance of
the AST extension class as provided by the ``parser`` module.
Syntax errors in the input file will cause the SyntaxError
exception to be raised.
``parseGrammarString (text) -> AST``
------------------------------------
The ``parseGrammarString()`` function will follow the semantics of
the ``parseGrammarFile()``, but accept the grammar text as a
string for input, as opposed to the file name.
``buildParser (grammarAst) -> DFA``
-----------------------------------
The ``buildParser()`` function will accept an AST object for input
and return a DFA (deterministic finite automaton) data
structure. The DFA data structure will be a C extension
class, much like the AST structure is provided in the ``parser``
module. If the input AST does not conform to the nonterminal
codes defined for the ``pgen`` meta-grammar, ``buildParser()`` will
throw a ``ValueError`` exception.
``parseFile (fileName, dfa, start) -> AST``
-------------------------------------------
The ``parseFile()`` function will essentially be a wrapper for the
``PyParser_ParseFile()`` C API function. The wrapper code will
accept the DFA C extension class, and the file name. An AST
instance that conforms to the lexical values in the ``token``
module and the nonterminal values contained in the DFA will be
output.
``parseString (text, dfa, start) -> AST``
-----------------------------------------
The ``parseString()`` function will operate in a similar fashion
to the ``parseFile()`` function, but accept the parse text as an
argument. Much like ``parseFile()`` will wrap the
``PyParser_ParseFile()`` C API function, ``parseString()`` will wrap
the ``PyParser_ParseString()`` function.
``symbolToStringMap (dfa) -> dict``
-----------------------------------
The ``symbolToStringMap()`` function will accept a DFA instance
and return a dictionary object that maps from the DFA's
numeric values for its nonterminals to the string names of the
nonterminals as found in the original grammar specification
for the DFA.
``stringToSymbolMap (dfa) -> dict``
-----------------------------------
The ``stringToSymbolMap()`` function output a dictionary mapping
the nonterminal names of the input DFA to their corresponding
numeric values.
Extra credit will be awarded if the map generation functions and
parsing functions are also methods of the DFA extension class.
Implementation Plan
===================
A cunning plan has been devised to accomplish this enhancement:
1. Rename the ``pgen`` functions to conform to the CPython naming
standards. This action may involve adding some header files to
the ``Include`` subdirectory.
2. Move the ``pgen`` C modules in the Makefile.pre.in from unique ``pgen``
elements to the Python C library.
3. Make any needed changes to the ``parser`` module so the AST
extension class understands that there are AST types it may not
understand. Cursory examination of the AST extension class
shows that it keeps track of whether the tree is a suite or an
expression.
3. Code an additional C module in the ``Modules`` directory. The C
extension module will implement the DFA extension class and the
functions outlined in the previous section.
4. Add the new module to the build process. Black magic, indeed.
Limitations
===========
Under this proposal, would be designers of Python 3000 will still
be constrained to Python's lexical conventions. The addition,
subtraction or modification of the Python lexer is outside the
scope of this PEP.
Reference Implementation
========================
No reference implementation is currently provided. A patch
was provided at some point in
http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470
but that patch is no longer maintained.
References
==========
.. [1] The (defunct) Python Compiler-SIG
http://www.python.org/sigs/compiler-sig/
.. [2] Parser Module Documentation
http://docs.python.org/library/parser.html
.. [3] Hylton, Jeremy.
http://docs.python.org/library/compiler.html
.. [4] Pelletier, Michel. "Python Interface Syntax", :pep:`245`
.. [5] The Python Types-SIG
http://www.python.org/sigs/types-sig/
Copyright
=========
This document has been placed in the public domain.