<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 269 Pgen Module for Python | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0269/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 269 Pgen Module for Python | peps.python.org'>
<meta property="og:description" content="Much like the parser module exposes the Python parser, this PEP proposes that the parser generator used to create the Python parser, pgen, be exposed as a module in Python.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0269/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="Much like the parser module exposes the Python parser, this PEP proposes that the parser generator used to create the Python parser, pgen, be exposed as a module in Python.">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 269</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 269 Pgen Module for Python</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Jonathan Riehl &lt;jriehl&#32;&#97;t&#32;spaceship.com&gt;</dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Inactive draft that may be taken up again at a later time">Deferred</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">24-Aug-2001</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">2.2</dd>
<dt class="field-even">Post-History<span class="colon">:</span></dt>
<dd class="field-even"><p></p></dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#specification">Specification</a><ul>
<li><a class="reference internal" href="#parsegrammarfile-filename-ast"><code class="docutils literal notranslate"><span class="pre">parseGrammarFile</span> <span class="pre">(fileName)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#parsegrammarstring-text-ast"><code class="docutils literal notranslate"><span class="pre">parseGrammarString</span> <span class="pre">(text)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#buildparser-grammarast-dfa"><code class="docutils literal notranslate"><span class="pre">buildParser</span> <span class="pre">(grammarAst)</span> <span class="pre">-&gt;</span> <span class="pre">DFA</span></code></a></li>
<li><a class="reference internal" href="#parsefile-filename-dfa-start-ast"><code class="docutils literal notranslate"><span class="pre">parseFile</span> <span class="pre">(fileName,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#parsestring-text-dfa-start-ast"><code class="docutils literal notranslate"><span class="pre">parseString</span> <span class="pre">(text,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#symboltostringmap-dfa-dict"><code class="docutils literal notranslate"><span class="pre">symbolToStringMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></li>
<li><a class="reference internal" href="#stringtosymbolmap-dfa-dict"><code class="docutils literal notranslate"><span class="pre">stringToSymbolMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation-plan">Implementation Plan</a></li>
<li><a class="reference internal" href="#limitations">Limitations</a></li>
<li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>Much like the <code class="docutils literal notranslate"><span class="pre">parser</span></code> module exposes the Python parser, this PEP
proposes that the parser generator used to create the Python
parser, <code class="docutils literal notranslate"><span class="pre">pgen</span></code>, be exposed as a module in Python.</p>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<p>Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler <a class="footnote-reference brackets" href="#id6" id="id1">[1]</a>. These
have resulted in several implementations of Python parsers, most
notably the <code class="docutils literal notranslate"><span class="pre">parser</span></code> module currently provided in the Python
standard library <a class="footnote-reference brackets" href="#id7" id="id2">[2]</a> and Jeremy Hylton's <code class="docutils literal notranslate"><span class="pre">compiler</span></code> module <a class="footnote-reference brackets" href="#id8" id="id3">[3]</a>.
However, while multiple language changes have been proposed
<a class="footnote-reference brackets" href="#id9" id="id4">[4]</a> <a class="footnote-reference brackets" href="#id10" id="id5">[5]</a>, experimentation with the Python syntax has lacked the
benefit of a Python binding to the actual parser generator used to
build Python.</p>
<p>By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the <code class="docutils literal notranslate"><span class="pre">pgen</span></code> library, the following
assertions are made:</p>
<ol class="arabic simple">
<li>Reference implementations of syntax changes will be easier to
develop. Currently, a reference implementation of a syntax
change would require the developer to use the <code class="docutils literal notranslate"><span class="pre">pgen</span></code> tool from
the command line. The resulting parser data structure would
then either have to be reworked to interface with a custom
CPython implementation, or wrapped as a C extension module.</li>
<li>Reference implementations of syntax changes will be easier to
distribute. Since the parser generator will be available in
Python, it should follow that the resulting parser will be
accessible from Python. Therefore, reference implementations
should be available as pure Python code, versus using custom
versions of the existing CPython distribution, or as compilable
extension modules.</li>
<li>Reference implementations of syntax changes will be easier to
discuss with a larger audience. This somewhat falls out of the
second assertion, since the community of Python users is most
likely larger than the community of CPython developers.</li>
<li>Development of small languages in Python will be further
enhanced, since the additional module will be a fully
functional LL(1) parser generator.</li>
</ol>
</section>
<section id="specification">
<h2><a class="toc-backref" href="#specification" role="doc-backlink">Specification</a></h2>
<p>The proposed module will be called <code class="docutils literal notranslate"><span class="pre">pgen</span></code>. The <code class="docutils literal notranslate"><span class="pre">pgen</span></code> module will
contain the following functions:</p>
<section id="parsegrammarfile-filename-ast">
<h3><a class="toc-backref" href="#parsegrammarfile-filename-ast" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">parseGrammarFile</span> <span class="pre">(fileName)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">parseGrammarFile()</span></code> function will read the file pointed to
by <code class="docutils literal notranslate"><span class="pre">fileName</span></code> and create an AST object. The AST nodes will
contain the numeric nonterminal values of the parser
generator meta-grammar. The output AST will be an instance of
the AST extension class as provided by the <code class="docutils literal notranslate"><span class="pre">parser</span></code> module.
Syntax errors in the input file will cause the SyntaxError
exception to be raised.</p>
</section>
<section id="parsegrammarstring-text-ast">
<h3><a class="toc-backref" href="#parsegrammarstring-text-ast" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">parseGrammarString</span> <span class="pre">(text)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">parseGrammarString()</span></code> function will follow the semantics of
the <code class="docutils literal notranslate"><span class="pre">parseGrammarFile()</span></code>, but accept the grammar text as a
string for input, as opposed to the file name.</p>
</section>
<section id="buildparser-grammarast-dfa">
<h3><a class="toc-backref" href="#buildparser-grammarast-dfa" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">buildParser</span> <span class="pre">(grammarAst)</span> <span class="pre">-&gt;</span> <span class="pre">DFA</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">buildParser()</span></code> function will accept an AST object for input
and return a DFA (deterministic finite automaton) data
structure. The DFA data structure will be a C extension
class, much like the AST structure is provided in the <code class="docutils literal notranslate"><span class="pre">parser</span></code>
module. If the input AST does not conform to the nonterminal
codes defined for the <code class="docutils literal notranslate"><span class="pre">pgen</span></code> meta-grammar, <code class="docutils literal notranslate"><span class="pre">buildParser()</span></code> will
raise a <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> exception.</p>
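<p>As a rough illustration of the conformance check described above, the
sketch below walks a tree written as nested tuples and raises
<code class="docutils literal notranslate"><span class="pre">ValueError</span></code> when it meets an unrecognised nonterminal code. The
tuple layout, the helper name <code class="docutils literal notranslate"><span class="pre">check_grammar_tree</span></code>, and the table of
codes are assumptions made for this example; the PEP does not define them.</p>

```python
# Hypothetical table of meta-grammar nonterminal codes (illustrative only).
META_NONTERMINALS = {
    256: "MSTART",
    257: "RULE",
    258: "RHS",
    259: "ALT",
    260: "ITEM",
    261: "ATOM",
}

# Assumption: codes at or above this value denote nonterminals,
# and anything below it is a terminal (token) code.
NT_OFFSET = 256


def check_grammar_tree(node):
    """Raise ValueError if the tree contains an unknown nonterminal code.

    A node is assumed to be a tuple: (code, child, child, ...) for a
    nonterminal, or (code, text) for a terminal.
    """
    code = node[0]
    if code >= NT_OFFSET:
        if code not in META_NONTERMINALS:
            raise ValueError(f"unknown nonterminal code {code}")
        # Only nonterminal nodes have child nodes to descend into.
        for child in node[1:]:
            check_grammar_tree(child)
```

<p>A real implementation would perform this kind of validation in C against
the DFA of the meta-grammar; the Python version only shows the shape of
the check.</p>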
</section>
<section id="parsefile-filename-dfa-start-ast">
<h3><a class="toc-backref" href="#parsefile-filename-dfa-start-ast" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">parseFile</span> <span class="pre">(fileName,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">parseFile()</span></code> function will essentially be a wrapper for the
<code class="docutils literal notranslate"><span class="pre">PyParser_ParseFile()</span></code> C API function. The wrapper code will
accept an instance of the DFA C extension class and the file name. An AST
instance that conforms to the lexical values in the <code class="docutils literal notranslate"><span class="pre">token</span></code>
module and the nonterminal values contained in the DFA will be
output.</p>
</section>
<section id="parsestring-text-dfa-start-ast">
<h3><a class="toc-backref" href="#parsestring-text-dfa-start-ast" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">parseString</span> <span class="pre">(text,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">parseString()</span></code> function will operate in a similar fashion
to the <code class="docutils literal notranslate"><span class="pre">parseFile()</span></code> function, but accept the parse text as an
argument. Much like <code class="docutils literal notranslate"><span class="pre">parseFile()</span></code> will wrap the
<code class="docutils literal notranslate"><span class="pre">PyParser_ParseFile()</span></code> C API function, <code class="docutils literal notranslate"><span class="pre">parseString()</span></code> will wrap
the <code class="docutils literal notranslate"><span class="pre">PyParser_ParseString()</span></code> function.</p>
</section>
<section id="symboltostringmap-dfa-dict">
<h3><a class="toc-backref" href="#symboltostringmap-dfa-dict" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">symbolToStringMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">symbolToStringMap()</span></code> function will accept a DFA instance
and return a dictionary object that maps from the DFA's
numeric values for its nonterminals to the string names of the
nonterminals as found in the original grammar specification
for the DFA.</p>
</section>
<section id="stringtosymbolmap-dfa-dict">
<h3><a class="toc-backref" href="#stringtosymbolmap-dfa-dict" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">stringToSymbolMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">stringToSymbolMap()</span></code> function will output a dictionary mapping
the nonterminal names of the input DFA to their corresponding
numeric values.</p>
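<p>The two maps are inverses of each other. A minimal sketch, assuming that
nonterminal names are unique within a grammar; the helper
<code class="docutils literal notranslate"><span class="pre">invert_symbol_map</span></code> and the sample numbers and names are hypothetical
and stand in for what a DFA instance would supply:</p>

```python
def invert_symbol_map(symbol_to_string):
    """Build a name-to-number map from a number-to-name map."""
    string_to_symbol = {
        name: number for number, name in symbol_to_string.items()
    }
    # If two numbers shared a name, the inversion would silently drop one.
    if len(string_to_symbol) != len(symbol_to_string):
        raise ValueError("nonterminal names are not unique")
    return string_to_symbol


# Sample data standing in for the output of symbolToStringMap(dfa).
symbol_to_string = {256: "start", 257: "expr", 258: "term"}
string_to_symbol = invert_symbol_map(symbol_to_string)
```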
<p>Extra credit will be awarded if the map generation functions and
parsing functions are also methods of the DFA extension class.</p>
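<p>Taken together, the functions in this section might be combined as in the
sketch below. No <code class="docutils literal notranslate"><span class="pre">pgen</span></code> module ships with Python, so the module object
is passed in by the caller. The helper name <code class="docutils literal notranslate"><span class="pre">parse_with_grammar</span></code> is
hypothetical, and treating <code class="docutils literal notranslate"><span class="pre">start</span></code> as the numeric code of the start
nonterminal is an assumption, since the PEP does not describe that argument.</p>

```python
def parse_with_grammar(pgen, grammar_text, source_text, start_name):
    """Build a parser from grammar_text and use it to parse source_text.

    `pgen` is any object that offers the functions proposed in this
    section; it is a parameter because no such module exists today.
    """
    # Grammar text to grammar AST; the PEP says this raises SyntaxError
    # on malformed grammar input.
    grammar_ast = pgen.parseGrammarString(grammar_text)

    # Grammar AST to DFA; the PEP says this raises ValueError when the
    # AST does not conform to the meta-grammar.
    dfa = pgen.buildParser(grammar_ast)

    # Assumption: `start` is the numeric code of the start nonterminal,
    # looked up here by its name in the grammar.
    start = pgen.stringToSymbolMap(dfa)[start_name]

    tree = pgen.parseString(source_text, dfa, start)

    # Return the number-to-name map as well, so callers can label nodes.
    return tree, pgen.symbolToStringMap(dfa)
```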
</section>
</section>
<section id="implementation-plan">
<h2><a class="toc-backref" href="#implementation-plan" role="doc-backlink">Implementation Plan</a></h2>
<p>A cunning plan has been devised to accomplish this enhancement:</p>
<ol class="arabic simple">
<li>Rename the <code class="docutils literal notranslate"><span class="pre">pgen</span></code> functions to conform to the CPython naming
standards. This action may involve adding some header files to
the <code class="docutils literal notranslate"><span class="pre">Include</span></code> subdirectory.</li>
<li>Move the <code class="docutils literal notranslate"><span class="pre">pgen</span></code> C modules in the Makefile.pre.in from unique <code class="docutils literal notranslate"><span class="pre">pgen</span></code>
elements to the Python C library.</li>
<li>Make any needed changes to the <code class="docutils literal notranslate"><span class="pre">parser</span></code> module so the AST
extension class understands that there are AST types it may not
understand. Cursory examination of the AST extension class
shows that it keeps track of whether the tree is a suite or an
expression.</li>
</ol>
<ol class="arabic simple" start="4">
<li>Code an additional C module in the <code class="docutils literal notranslate"><span class="pre">Modules</span></code> directory. The C
extension module will implement the DFA extension class and the
functions outlined in the previous section.</li>
<li>Add the new module to the build process. Black magic, indeed.</li>
</ol>
</section>
<section id="limitations">
<h2><a class="toc-backref" href="#limitations" role="doc-backlink">Limitations</a></h2>
<p>Under this proposal, would-be designers of Python 3000 will still
be constrained to Python's lexical conventions. The addition,
subtraction or modification of the Python lexer is outside the
scope of this PEP.</p>
</section>
<section id="reference-implementation">
<h2><a class="toc-backref" href="#reference-implementation" role="doc-backlink">Reference Implementation</a></h2>
<p>No reference implementation is currently provided. A patch
was provided at some point in
<a class="reference external" href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=599331&amp;group_id=5470&amp;atid=305470">http://sourceforge.net/tracker/index.php?func=detail&amp;aid=599331&amp;group_id=5470&amp;atid=305470</a>
but that patch is no longer maintained.</p>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="id6" role="doc-footnote">
<dt class="label" id="id6">[<a href="#id1">1</a>]</dt>
<dd>The (defunct) Python Compiler-SIG
<a class="reference external" href="http://www.python.org/sigs/compiler-sig/">http://www.python.org/sigs/compiler-sig/</a></aside>
<aside class="footnote brackets" id="id7" role="doc-footnote">
<dt class="label" id="id7">[<a href="#id2">2</a>]</dt>
<dd>Parser Module Documentation
<a class="reference external" href="http://docs.python.org/library/parser.html">http://docs.python.org/library/parser.html</a></aside>
<aside class="footnote brackets" id="id8" role="doc-footnote">
<dt class="label" id="id8">[<a href="#id3">3</a>]</dt>
<dd>Hylton, Jeremy.
<a class="reference external" href="http://docs.python.org/library/compiler.html">http://docs.python.org/library/compiler.html</a></aside>
<aside class="footnote brackets" id="id9" role="doc-footnote">
<dt class="label" id="id9">[<a href="#id4">4</a>]</dt>
<dd>Pelletier, Michel. “Python Interface Syntax”, <a class="pep reference internal" href="../pep-0245/" title="PEP 245 Python Interface Syntax">PEP 245</a></aside>
<aside class="footnote brackets" id="id10" role="doc-footnote">
<dt class="label" id="id10">[<a href="#id5">5</a>]</dt>
<dd>The Python Types-SIG
<a class="reference external" href="http://www.python.org/sigs/types-sig/">http://www.python.org/sigs/types-sig/</a></aside>
</aside>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0269.rst">https://github.com/python/peps/blob/main/peps/pep-0269.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0269.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#specification">Specification</a><ul>
<li><a class="reference internal" href="#parsegrammarfile-filename-ast"><code class="docutils literal notranslate"><span class="pre">parseGrammarFile</span> <span class="pre">(fileName)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#parsegrammarstring-text-ast"><code class="docutils literal notranslate"><span class="pre">parseGrammarString</span> <span class="pre">(text)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#buildparser-grammarast-dfa"><code class="docutils literal notranslate"><span class="pre">buildParser</span> <span class="pre">(grammarAst)</span> <span class="pre">-&gt;</span> <span class="pre">DFA</span></code></a></li>
<li><a class="reference internal" href="#parsefile-filename-dfa-start-ast"><code class="docutils literal notranslate"><span class="pre">parseFile</span> <span class="pre">(fileName,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#parsestring-text-dfa-start-ast"><code class="docutils literal notranslate"><span class="pre">parseString</span> <span class="pre">(text,</span> <span class="pre">dfa,</span> <span class="pre">start)</span> <span class="pre">-&gt;</span> <span class="pre">AST</span></code></a></li>
<li><a class="reference internal" href="#symboltostringmap-dfa-dict"><code class="docutils literal notranslate"><span class="pre">symbolToStringMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></li>
<li><a class="reference internal" href="#stringtosymbolmap-dfa-dict"><code class="docutils literal notranslate"><span class="pre">stringToSymbolMap</span> <span class="pre">(dfa)</span> <span class="pre">-&gt;</span> <span class="pre">dict</span></code></a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation-plan">Implementation Plan</a></li>
<li><a class="reference internal" href="#limitations">Limitations</a></li>
<li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
<br>
<a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0269.rst">Page Source (GitHub)</a>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>