PEP: 256
Title: Docstring Processing System Framework
Version: $Revision$
Last-Modified: $Date$
Author: dgoodger@bigfoot.com (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Standards Track
Requires: PEP 257 Docstring Conventions
          PEP 258 DPS Generic Implementation Details
Created: 01-Jun-2001
Post-History:


Abstract

Python modules, classes and functions have a string attribute
called __doc__. If the first expression inside the definition is
a literal string, that string, called a documentation string or
docstring, is assigned to the __doc__ attribute. It is often
used to summarize the interface of the module, class or function.
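
As a small illustration of the mechanism just described, the
following sketch defines a function and a class and then reads their
__doc__ attributes. The names area and Shape are hypothetical and
exist only for this example:

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height


class Shape:
    """Base class for drawable shapes."""

    def draw(self):
        # A comment is not a string literal, so this method has no docstring.
        pass


# The literal string that opens each definition becomes its __doc__.
print(area.__doc__)
print(Shape.__doc__)

# Without such a string, __doc__ is None.
print(Shape.draw.__doc__)
```

Only a string literal in first position counts; comments and strings
appearing later in the body are not assigned to __doc__.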

There is no standard format (markup) for docstrings, nor are there
standard tools for extracting docstrings and transforming them
into useful structured formats (e.g., HTML, DocBook, TeX). Those
tools that do exist are for the most part unmaintained and unused.
The issues surrounding docstring processing have been contentious
and difficult to resolve.

This PEP proposes a Docstring Processing System (DPS) framework.
It separates out the components (program and conceptual), enabling
the resolution of individual issues either through consensus (one
solution) or through divergence (many). It promotes standard
interfaces which will allow a variety of plug-in components (e.g.,
input parsers and output formatters) to be used.

This PEP presents the concepts of a DPS framework independently of
implementation details.


Rationale

Python lends itself to inline documentation. With its built-in
docstring syntax, a limited form of Literate Programming [2] is
easy to do in Python. However, there are no satisfactory standard
tools for extracting and processing Python docstrings. The lack
of a standard toolset is a significant gap in Python's
infrastructure; this PEP aims to fill the gap.

There are standard inline documentation systems for some other
languages. For example, Perl has POD (plain old documentation)
and Java has Javadoc, but neither meshes with the Pythonic way.
POD is very explicit, but takes after Perl in terms of
readability. Javadoc is HTML-centric; except for '@field' tags,
raw HTML is used for markup. There are also general tools such as
Autoduck and Web (Tangle & Weave), useful for multiple languages.

There have been many attempts to write autodocumentation systems
for Python (not an exhaustive list):

- Marc-Andre Lemburg's doc.py [3]

- Daniel Larsson's pythondoc & gendoc [4]

- Doug Hellmann's HappyDoc [5]

- Laurence Tratt's Crystal [6]

- Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python
  standard library; see below)

- Tony Ibbs' docutils [8]

These systems, each with different goals, have had varying degrees
of success. A problem with many of the above systems was
over-ambition. They provided a self-contained set of components: a
docstring extraction system, an input parser, an internal
processing system and one or more output formatters. Inevitably,
one or more components had serious shortcomings, preventing the
system from being adopted as a standard tool.

Throughout the existence of the Python Documentation Special
Interest Group (Doc-SIG) [9], consensus on a single standard
docstring format has never been reached. A lightweight, implicit
markup has been sought, for the following reasons (among others):

1. Docstrings written within Python code are available from within
   the interactive interpreter, and can be 'print'ed, hence the
   use of plaintext for easy readability.

2. Programmers want to add structure to their docstrings, without
   sacrificing raw docstring readability. Unadorned plaintext
   cannot be transformed ('up-translated') into useful structured
   formats.

3. Explicit markup (like XML or TeX) has been widely considered
   unreadable by the uninitiated.

4. Implicit markup is aesthetically compatible with the clean and
   minimalist Python syntax.

Early on, variants of Setext (Structure Enhanced Text) [10],
including Digital Creations' StructuredText [11], were proposed
for Python docstring formatting. Hereafter we will collectively
call these variants 'STexts'. Although used by some (including in
most of the above-listed autodocumentation tools), these markup
schemes have failed to become standard because:

- STexts have been incomplete: lacking 'essential' constructs that
  people want to use in their docstrings, STexts are rendered less
  than ideal. Note that these 'essential' constructs are not
  universal; everyone has their own requirements.

- STexts have sometimes been surprising: bits of text are marked
  up unexpectedly, leading to user frustration.

- SText implementations have been buggy.

- Some STexts have had no formal specification except for the
  implementation itself. A buggy implementation meant a buggy
  spec, and vice-versa.

- There has been no mechanism to get around the SText markup rules
  when a markup character is used in a non-markup context.
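
To make the "surprising" and "no escape mechanism" points concrete,
here is a minimal sketch of an implicit markup rule misfiring. The
function naive_emphasis and its single regular-expression rule are
hypothetical; they are not taken from any actual SText
implementation:

```python
import re


def naive_emphasis(text):
    """Toy implicit-markup rule: render *word* as <em>word</em>.

    Hypothetical; real STexts have more rules, but they can
    misfire in the same way.
    """
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


# Intended use: asterisks mark emphasis.
print(naive_emphasis("Use *args* to collect arguments."))

# Unintended: asterisks meant as multiplication are also consumed,
# and this toy rule offers no way to escape them.
print(naive_emphasis("Returns x*y + z*w."))
```

In the second call the text between the two multiplication signs is
silently emphasized, which is the kind of unexpected markup that the
list above describes.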

Recognizing the deficiencies of STexts, some people have proposed
using explicit markup of some kind. There have been proposals for
using XML, HTML, TeX, POD, and Javadoc at one time or another.
Proponents of STexts have vigorously opposed these proposals, and
the debates have continued off and on for at least five years.

It has become clear (to this author, at least) that the "all or
nothing" approach cannot succeed, since no all-encompassing
proposal could possibly be agreed upon by all interested parties.
A modular component approach, where components may be multiply
implemented, is the only chance at success. By separating out the
issues, we can form consensus more easily (smaller fights ;-), and
accept divergence more readily.

Each of the components of a docstring processing system should be
developed independently. A 'best of breed' system should be
chosen and/or developed and eventually included in Python's
standard library.


Pydoc & Other Existing Systems

Pydoc is part of the Python 2.1 standard library. It extracts and
displays docstrings from within the Python interactive
interpreter, from the shell command line, and from a GUI window
into a web browser (HTML). In the case of GUI/HTML, except for
some heuristic hyperlinking of identifier names, no formatting of
the docstrings is done. They are presented within <p><small><tt>
tags to avoid unwanted line wrapping. Unfortunately, the result
is not pretty.
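
As a rough illustration of docstring extraction and display, the
sketch below uses the pydoc module as shipped with current Python 3
releases, which differs in detail from the Python 2.1 version
discussed here. The function greet is a hypothetical example:

```python
import pydoc


def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name


# getdoc() returns the docstring with indentation cleaned up.
print(pydoc.getdoc(greet))

# render_doc() builds the text page that help() would display; the
# plaintext renderer avoids terminal bold (overstrike) characters.
page = pydoc.render_doc(greet, renderer=pydoc.plaintext)
print(page)
```

In both cases the docstring text itself is passed through verbatim;
no markup inside it is interpreted.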

The functionality proposed in this PEP could be added to or used
by pydoc when serving HTML pages. However, the proposed docstring
processing system's functionality is much more than pydoc needs
(in its current form). Either an independent tool will be
developed (which pydoc may or may not use), or pydoc could be
expanded to encompass this functionality and *become* the
docstring processing system (or one such system). That decision
is beyond the scope of this PEP.

Similarly, the authors of other existing docstring processing
systems may or may not choose compatibility with this framework.
However, if this framework is accepted and adopted as the Python
standard, compatibility will become an important consideration in
these systems' future.


Specification

The docstring processing system framework consists of components,
as follows:

1. Docstring conventions. Documents issues such as:

   - What should be documented where.

   - First line is a one-line synopsis.

   PEP 257, "Docstring Conventions" [12], documents these issues.

2. Docstring processing system generic implementation details.
   Documents issues such as:

   - High-level spec: what a DPS does.

   - Command-line interface for executable script.

   - System Python API.

   - Docstring extraction rules.

   - Input parser API.

   - Intermediate internal data structure: output from input parser,
     input to output formatter.

   - Output formatter API.

   - Output management.

   These issues are applicable to any docstring processing system
   implementation. PEP 258, "DPS Generic Implementation Details"
   [13], documents these issues.

3. Docstring processing system implementation.

4. Input markup specifications: docstring syntax.

5. Input parser implementations.

6. Output formats (HTML, XML, TeX, DocBook, info, etc.).

7. Output formatter implementations.

Components 1, 2, and 3 will be the subject of individual companion
PEPs, although they may be merged into this PEP once consensus is
reached. If there is only one implementation, PEPs for components
2 & 3 can be combined. Multiple PEPs will be necessary for each
of components 4, 5, 6, and 7. An alternative to the PEP mechanism
may be used instead, since these are not directly related to the
Python language.

The following diagram shows an overview of the framework.
Interfaces are indicated by double-borders. The ASCII diagram is
very wide; please turn off line wrapping to view it:


                                                      +========================+
                                                      | Command-Line Interface |
                                                      +========================+
                                                      |   Executable Script    |
                                                      +------------------------+
                                                                   |
                                                                   | calls
                                                                   v
                                             +===========================================+  returns  +---------+
                                             |             System Python API             |==========>| output  |
                       +--------+            +===========================================+           | objects |
   _        writes     | Python |   reads    |        Docstring Processing System        |           +---------+
  / \   ==============>| module |<===========|                                           |
  \_/                  +--------+            |   input    | transformation, |   output   |            +--------+
   |              +-------------+  follows   | docstring  |  integration,   |   object   |   writes   | output |
 --+--  consults  | docstring   |<-----------| extraction |     linking     | management |===========>| files  |
   |    --------->| conventions |            +============+=====+=====+=====+============+            +--------+
  / \             +-------------+            |    parser API    |     |  formatter API   |
 /   \            +-------------+            +===========+======+     +======+===========+            +--------+
author  consults  |   markup    | implements |   input   |   intermediate    |  output   | implements | output |
        --------->| syntax spec |<-----------|  parser   |  data structure   | formatter |----------->| format |
                  +-------------+            +-----------+-------------------+-----------+            +--------+
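
The data flow in the diagram (docstring extraction, input parser,
intermediate data structure, output formatter) can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions, not the API that PEP 258 specifies: the names Node,
InputParser, OutputFormatter, ParagraphParser, HTMLFormatter,
PlainTextFormatter, process and sample are all hypothetical.

```python
import html
import inspect
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Node:
    """Hypothetical intermediate data structure: a simple tree node."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class InputParser(Protocol):
    """Parser API (assumed): docstring text in, node tree out."""
    def parse(self, docstring: str) -> Node: ...


class OutputFormatter(Protocol):
    """Formatter API (assumed): node tree in, output text out."""
    def format(self, tree: Node) -> str: ...


class ParagraphParser:
    """Toy input parser: blank lines separate paragraphs."""
    def parse(self, docstring: str) -> Node:
        blocks = docstring.strip().split("\n\n")
        paragraphs = [Node("paragraph", " ".join(block.split()))
                      for block in blocks if block.strip()]
        return Node("document", children=paragraphs)


class HTMLFormatter:
    """Toy output formatter: one <p> element per paragraph."""
    def format(self, tree: Node) -> str:
        return "\n".join("<p>%s</p>" % html.escape(child.text)
                         for child in tree.children)


class PlainTextFormatter:
    """A second formatter, to show that components can be swapped."""
    def format(self, tree: Node) -> str:
        return "\n\n".join(child.text for child in tree.children)


def process(obj, parser: InputParser, formatter: OutputFormatter) -> str:
    """Extract obj's docstring, parse it, and format the result."""
    docstring = inspect.getdoc(obj) or ""
    return formatter.format(parser.parse(docstring))


def sample():
    """Add two numbers.

    Both arguments must be
    integers < 100.
    """


print(process(sample, ParagraphParser(), HTMLFormatter()))
print(process(sample, ParagraphParser(), PlainTextFormatter()))
```

Because the parser and the formatter communicate only through the
intermediate tree, either one can be replaced without touching the
other, which is the property the framework relies on.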


Project Web Site

A SourceForge project has been set up for this work at
http://docstring.sf.net.


References and Footnotes

[1] http://python.sf.net/peps/pep-0216.html

[2] http://www.literateprogramming.com/

[3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py

[4] http://starship.python.net/crew/danilo/pythondoc/

[5] http://happydoc.sf.net/

[6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html

[7] http://www.lfw.org/python/

[8] http://homepage.ntlworld.com/tibsnjoan/docutils/

[9] http://www.python.org/sigs/doc-sig/

[10] http://www.bsdi.com/setext/

[11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/

[12] http://python.sf.net/peps/pep-0257.html

[13] http://python.sf.net/peps/pep-0258.html


Copyright

This document has been placed in the public domain.


Acknowledgements

This document borrows text from PEP 216 "Docstring Format" by
Moshe Zadka [1]. It is intended as a reorganization of PEP 216
and its approach.

This document also borrows ideas from the archives of the Python
Doc-SIG. Thanks to all members past & present.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: