diff --git a/pep-0256.txt b/pep-0256.txt new file mode 100644 index 000000000..f0f652ca4 --- /dev/null +++ b/pep-0256.txt @@ -0,0 +1,312 @@ +PEP: 256 +Title: Docstring Processing System Framework +Version: $Revision$ +Last-Modified: $Date$ +Author: dgoodger@bigfoot.com (David Goodger) +Discussions-To: doc-sig@python.org +Status: Draft +Type: Standards Track +Requires: PEP 257 Docstring Conventions + PEP 258 DPS Generic Implementation Details +Created: 01-Jun-2001 +Post-History: + + +Abstract + + Python modules, classes and functions have a string attribute + called __doc__. If the first expression inside the definition is + a literal string, that string is assigned to the __doc__ + attribute, called a documentation string or docstring. It is + often used to summarize the interface of the module, class or + function. + + There is no standard format (markup) for docstrings, nor are there + standard tools for extracting docstrings and transforming them + into useful structured formats (e.g., HTML, DocBook, TeX). Those + tools that do exist are for the most part unmaintained and unused. + The issues surrounding docstring processing have been contentious + and difficult to resolve. + + This PEP proposes a Docstring Processing System (DPS) framework. + It separates out the components (program and conceptual), enabling + the resolution of individual issues either through consensus (one + solution) or through divergence (many). It promotes standard + interfaces which will allow a variety of plug-in components (e.g., + input parsers and output formatters) to be used. + + This PEP presents the concepts of a DPS framework independently of + implementation details. + + +Rationale + + Python lends itself to inline documentation. With its built-in + docstring syntax, a limited form of Literate Programming [2] is + easy to do in Python. However, there are no satisfactory standard + tools for extracting and processing Python docstrings. 
The lack + of a standard toolset is a significant gap in Python's + infrastructure; this PEP aims to fill the gap. + + There are standard inline documentation systems for some other + languages. For example, Perl has POD (Plain Old Documentation) + and Java has Javadoc, but neither of these meshes with the Pythonic + way. POD is very explicit, but takes after Perl in terms of + readability. Javadoc is HTML-centric; except for '@field' tags, + raw HTML is used for markup. There are also general tools such as + Autoduck and Web (Tangle & Weave), useful for multiple languages. + + There have been many attempts to write autodocumentation systems + for Python (not an exhaustive list): + + - Marc-Andre Lemburg's doc.py [3] + + - Daniel Larsson's pythondoc & gendoc [4] + + - Doug Hellmann's HappyDoc [5] + + - Laurence Tratt's Crystal [6] + + - Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python + standard library; see below) + + - Tony Ibbs' docutils [8] + + These systems, each with different goals, have had varying degrees + of success. A problem with many of the above systems was + over-ambition. They provided a self-contained set of components: a + docstring extraction system, an input parser, an internal + processing system and one or more output formatters. Inevitably, + one or more components had serious shortcomings, preventing the + system from being adopted as a standard tool. + + Throughout the existence of the Python Documentation Special + Interest Group (Doc-SIG) [9], consensus on a single standard + docstring format has never been reached. A lightweight, implicit + markup has been sought, for the following reasons (among others): + + 1. Docstrings written within Python code are available from within + the interactive interpreter, and can be 'print'ed. Plaintext is + therefore used for easy readability. + + 2. Programmers want to add structure to their docstrings, without + sacrificing raw docstring readability. 
Unadorned plaintext + cannot be transformed ('up-translated') into useful structured + formats. + + 3. Explicit markup (like XML or TeX) has been widely considered + unreadable by the uninitiated. + + 4. Implicit markup is aesthetically compatible with the clean and + minimalist Python syntax. + + Early on, variants of Setext (Structure Enhanced Text) [10], + including Digital Creations' StructuredText [11], were proposed + for Python docstring formatting. Hereafter we will collectively + call these variants 'STexts'. Although used by some (including in + most of the above-listed autodocumentation tools), these markup + schemes have failed to become standard because: + + - STexts have been incomplete: lacking 'essential' constructs that + people want to use in their docstrings, STexts are rendered less + than ideal. Note that these 'essential' constructs are not + universal; everyone has their own requirements. + + - STexts have sometimes been surprising: bits of text are marked + up unexpectedly, leading to user frustration. + + - SText implementations have been buggy. + + - Some STexts have had no formal specification except for the + implementation itself. A buggy implementation meant a buggy + spec, and vice-versa. + + - There has been no mechanism to get around the SText markup rules + when a markup character is used in a non-markup context. + + Recognizing the deficiencies of STexts, some people have proposed + using explicit markup of some kind. There have been proposals for + using XML, HTML, TeX, POD, and Javadoc at one time or another. + Proponents of STexts have vigorously opposed these proposals, and + the debates have continued off and on for at least five years. + + It has become clear (to this author, at least) that the "all or + nothing" approach cannot succeed, since no all-encompassing + proposal could possibly be agreed upon by all interested parties. 
+ A modular component approach, where components may be multiply + implemented, is the only chance at success. By separating out the + issues, we can form consensus more easily (smaller fights ;-), and + accept divergence more readily. + + Each of the components of a docstring processing system should be + developed independently. A 'best of breed' system should be + chosen and/or developed and eventually included in Python's + standard library. + + +Pydoc & Other Existing Systems + + Pydoc is part of the Python 2.1 standard library. It extracts and + displays docstrings from within the Python interactive + interpreter, from the shell command line, and from a GUI window + into a web browser (HTML). In the case of GUI/HTML, except for + some heuristic hyperlinking of identifier names, no formatting of + the docstrings is done. They are presented within

+ tags to avoid unwanted line wrapping. Unfortunately, the result + is not pretty. + + The functionality proposed in this PEP could be added to or used + by pydoc when serving HTML pages. However, the proposed docstring + processing system's functionality is much more than pydoc needs + (in its current form). Either an independent tool will be + developed (which pydoc may or may not use), or pydoc could be + expanded to encompass this functionality and *become* the + docstring processing system (or one such system). That decision + is beyond the scope of this PEP. + + Similarly for other existing docstring processing systems, their + authors may or may not choose compatibility with this framework. + However, if this framework is accepted and adopted as the Python + standard, compatibility will become an important consideration in + these systems' future. + + +Specification + + The docstring processing system framework consists of components, + as follows:: + + 1. Docstring conventions. Documents issues such as: + + - What should be documented where. + + - First line is a one-line synopsis. + + PEP 257, "Docstring Conventions" [12], documents these issues. + + 2. Docstring processing system generic implementation details. + Documents issues such as: + + - High-level spec: what a DPS does. + + - Command-line interface for executable script. + + - System Python API + + - Docstring extraction rules. + + - Input parser API. + + - Intermediate internal data structure: output from input parser, + input to output formatter. + + - Output formatter API. + + - Output management. + + These issues are applicable to any docstring processing system + implementation. PEP 258, "DPS Generic Implementation Details" + [13], documents these issues. + + 3. Docstring processing system implementation. + + 4. Input markup specifications: docstring syntax. + + 5. Input parser implementations. + + 6. Output formats (HTML, XML, TeX, DocBook, info, etc.). + + 7. 
Output formatter implementations. + + Components 1, 2, and 3 will be the subject of individual companion + PEPs, although they may be merged into this PEP once consensus is + reached. If there is only one implementation, PEPs for components + 2 & 3 can be combined. Multiple PEPs will be necessary for each + of components 4, 5, 6, and 7. An alternative to the PEP mechanism + may be used instead, since these are not directly related to the + Python language. + + The following diagram shows an overview of the framework. + Interfaces are indicated by double-borders. The ASCII diagram is + very wide; please turn off line wrapping to view it: + + + +========================+ + | Command-Line Interface | + +========================+ + | Executable Script | + +------------------------+ + | + | calls + v + +===========================================+ returns +---------+ + | System Python API |==========>| output | + +--------+ +===========================================+ | objects | + _ writes | Python | reads | Docstring Processing System | +---------+ + / \ ==============>| module |<===========| | + \_/ +--------+ | input | transformation, | output | +--------+ + | +-------------+ follows | docstring | integration, | object | writes | output | + --+-- consults | docstring |<-----------| extraction | linking | management |===========>| files | + | --------->| conventions | +============+=====+=====+=====+============+ +--------+ + / \ +-------------+ | parser API | | formatter API | + / \ +-------------+ +===========+======+ +======+===========+ +--------+ + author consults | markup | implements | input | intermediate | output | implements | output | + --------->| syntax spec |<-----------| parser | data structure | formatter |----------->| format | + +-------------+ +-----------+-------------------+-----------+ +--------+ + + +Project Web Site + + A SourceForge project has been set up for this work at + http://docstring.sf.net. 
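As a rough illustration of the framework described in the Specification section (and not part of this PEP's proposal), the sketch below wires together docstring extraction, an input parser, an intermediate data structure and an output formatter. All of the names (extract, parse_plaintext, format_html, process, greet) are hypothetical, the "intermediate data structure" is assumed to be nothing more than a list of paragraph strings, and the parser and formatter are passed in as plain callables to stand in for the parser API and formatter API. It is written for a current Python 3 interpreter.

```python
import html
import inspect


def extract(obj):
    """Docstring extraction: return the cleaned docstring of obj, or ''."""
    return inspect.getdoc(obj) or ""


def parse_plaintext(docstring):
    """Input parser (hypothetical): split on blank lines into paragraphs.

    The returned list of strings plays the role of the intermediate
    data structure between parser and formatter.
    """
    blocks = docstring.split("\n\n")
    return [" ".join(block.split()) for block in blocks if block.strip()]


def format_html(paragraphs):
    """Output formatter (hypothetical): one escaped <p> element per paragraph."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def process(obj, parser=parse_plaintext, formatter=format_html):
    """Run extraction -> input parser -> output formatter for one object."""
    return formatter(parser(extract(obj)))


def greet(name):
    """Return a greeting.

    The name is inserted verbatim & is not validated.
    """
    return "Hello, " + name


print(process(greet))
```

Passing a different parser or formatter callable to process() changes the input markup or the output format without touching extraction, which is the kind of component separation the framework argues for.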
+ + +References and Footnotes + + [1] http://python.sf.net/peps/pep-0216.html + + [2] http://www.literateprogramming.com/ + + [3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py + + [4] http://starship.python.net/crew/danilo/pythondoc/ + + [5] http://happydoc.sf.net/ + + [6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html + + [7] http://www.lfw.org/python/ + + [8] http://homepage.ntlworld.com/tibsnjoan/docutils/ + + [9] http://www.python.org/sigs/doc-sig/ + + [10] http://www.bsdi.com/setext/ + + [11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/ + + [12] http://python.sf.net/peps/pep-0257.html + + [13] http://python.sf.net/peps/pep-0258.html + + +Copyright + + This document has been placed in the public domain. + + +Acknowledgements + + This document borrows text from PEP 216 "Docstring Format" by + Moshe Zadka [1]. It is intended as a reorganization of PEP 216 + and its approach. + + This document also borrows ideas from the archives of the Python + Doc-SIG. Thanks to all members past & present. + + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +End: