PEP: 287
Title: reStructuredText Standard Docstring Format
Version: $Revision$
Last-Modified: $Date$
Author: goodger@users.sourceforge.net (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Informational
Created: 25-Mar-2002
Post-History:
Replaces: 216

Abstract

This PEP proposes that the reStructuredText [1]_ markup be adopted
as the standard markup format for plaintext documentation in
Python docstrings, and (optionally) for PEPs and ancillary
documents as well. reStructuredText is a rich and extensible yet
easy-to-read, what-you-see-is-what-you-get plaintext markup
syntax.
Only the low-level syntax of docstrings is addressed here. This
PEP is not concerned with docstring semantics or processing at
all.

Goals

These are the generally accepted goals for a docstring format, as
discussed in the Python Documentation Special Interest Group
(Doc-SIG) [2]_:
1. It must be easy to type with any standard text editor.
2. It must be readable to the casual observer.
3. It must not need to contain information which can be deduced
from parsing the module.
4. It must contain sufficient information (structure) so it can be
converted to any reasonable markup format.
5. It must be possible to write a module's entire documentation in
docstrings, without feeling hampered by the markup language.
[[Are these in fact the goals of the Doc-SIG members? Anything to
add?]]
reStructuredText meets and exceeds all of these goals, and sets
its own, even more stringent, goals as well. See "Features" below.
The goals of this PEP are as follows:
1. To establish a standard docstring format by attaining
"accepted" status (Python community consensus; BDFL
pronouncement). Once reStructuredText is a Python standard,
all effort can be focused on tools instead of arguing for a
standard. Python needs a standard set of documentation tools.
2. To address any related concerns raised by the Python community.
3. To encourage community support. As long as multiple competing
markups are out there, the development community remains
fractured. Once a standard exists, people will start to use
it, and momentum will inevitably gather.
4. To consolidate efforts from related auto-documentation
projects. It is hoped that interested developers will join
forces and work on a joint/merged/common implementation.
5. (Optional.) To adopt reStructuredText as the standard markup
for PEPs. One or both of the following strategies may be
applied:
a) Keep the existing PEP section structure constructs (one-line
section headers, indented body text). Subsections can
either be forbidden or supported with underlined headers in
the indented body text.
b) Replace the PEP section structure constructs with the
reStructuredText syntax. Section headers will require
underlines, subsections will be supported out of the box,
and body text need not be indented (except for block
quotes).
Support for RFC 2822 headers will be added to the
reStructuredText parser (unambiguous given a specific context:
the first contiguous block of a PEP document). It may be
desired to concretely specify what over/underline styles are
allowed for PEP section headers, for uniformity.
6. (Optional.) To adopt reStructuredText as the standard markup
for README-type files and other standalone documents in the
Python distribution.
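
As a rough illustration of the "first contiguous block" context mentioned in goal 5, the sketch below reads the RFC 2822 header block of a PEP with the standard library's ``email`` package. It is not the planned reStructuredText parser support; the helper name ``parse_pep_header`` and the abbreviated sample text are assumptions made for this example.

```python
from email import message_from_string

# Abbreviated copy of this PEP's header, followed by a blank line and
# the start of the body (sample text assembled for this sketch).
SAMPLE = """\
PEP: 287
Title: reStructuredText Standard Docstring Format
Author: goodger@users.sourceforge.net (David Goodger)
Status: Draft
Type: Informational
Created: 25-Mar-2002
Replaces: 216

Abstract
"""


def parse_pep_header(text):
    """Return the leading RFC 2822 header block as a dict (hypothetical helper)."""
    message = message_from_string(text)
    return dict(message.items())


header = parse_pep_header(SAMPLE)
print(header["Title"])
```

Because the header block is recognized only at the very start of the document, a line such as "Note: ..." later in the body is never mistaken for a field.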

Rationale

The __doc__ attribute is called a documentation string, or
docstring. It is often used to summarize the interface of the
module, class or function. The lack of a standard syntax for
docstrings has hampered the development of standard tools for
extracting docstrings and transforming them into documentation in
standard formats (e.g., HTML, DocBook, TeX). There have been a
number of proposed markup formats and variations, and many tools
tied to these proposals, but without a standard docstring format
they have failed to gain a strong following and/or floundered
half-finished.
The adoption of a standard will, at the very least, benefit
docstring processing tools by preventing further "reinventing the
wheel".
Throughout the existence of the Doc-SIG, consensus on a single
standard docstring format has never been reached. A lightweight,
implicit markup has been sought, for the following reasons (among
others):
1. Docstrings written within Python code are available from within
the interactive interpreter, and can be 'print'ed. Thus the
use of plaintext for easy readability.
2. Programmers want to add structure to their docstrings, without
sacrificing raw docstring readability. Unadorned plaintext
cannot be transformed ('up-translated') into useful structured
formats.
3. Explicit markup (like XML or TeX) is widely considered
unreadable by the uninitiated.
4. Implicit markup is aesthetically compatible with the clean and
minimalist Python syntax.
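
To make the first point concrete, here is a minimal sketch of how a docstring is reached at run time. The function ``storedata`` is a made-up example, not part of any library.

```python
import inspect


def storedata(data):
    """
    Store `data` for later retrieval.

    The docstring is ordinary text, so it reads the same whether it is
    printed raw or processed by a tool.
    """


raw = storedata.__doc__               # the docstring as stored on the function
cleaned = inspect.getdoc(storedata)   # same text with indentation normalized
print(cleaned)
```

Whatever markup is chosen, this printed text is what the programmer sees at the interactive prompt, which is why raw readability matters.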
Proposed alternatives have included:
- XML [3]_, SGML [4]_, DocBook [5]_, HTML [6]_, XHTML [7]_
XML and SGML are explicit, well-formed meta-languages suitable
for all kinds of documentation. XML is a variant of SGML. They
are best used behind the scenes, because they are verbose,
difficult to type, and too cluttered to read comfortably as
source. DocBook, HTML, and XHTML are all applications of SGML
and/or XML, and all share the same basic syntax and the same
shortcomings.
- TeX [8]_
TeX is similar to XML/SGML in that it's explicit, not very easy
to write, and not easy for the uninitiated to read.
- Perl POD [9]_
Most Perl modules are documented in a format called POD -- Plain
Old Documentation. This is an easy-to-type, very low level
format with strong integration with the Perl parser. Many tools
exist to turn POD documentation into other formats: info, HTML
and man pages, among others. However, the POD syntax takes
after Perl itself in terms of readability.
- JavaDoc [10]_
Special comments before Java classes and functions serve to
document the code. A program to extract these, and turn them
into HTML documentation is called javadoc, and is part of the
standard Java distribution. However, the only output format
that is supported is HTML, and JavaDoc has a very intimate
relationship with HTML, using HTML tags for most markup. Thus
it shares the readability problems of HTML.
- Setext [11]_, StructuredText [12]_
Early on, variants of Setext (Structure Enhanced Text),
including Zope Corp's StructuredText, were proposed for Python
docstring formatting. Hereafter these variants will
collectively be called 'STexts'. STexts have the advantage of
being easy to read without special knowledge, and relatively
easy to write.
Although used by some (including in most existing Python
auto-documentation tools), until now STexts have failed to
become standard because:
- STexts have been incomplete. Lacking "essential" constructs
that people want to use in their docstrings, STexts are
rendered less than ideal. Note that these "essential"
constructs are not universal; everyone has their own
requirements.
- STexts have been sometimes surprising. Bits of text are
marked up unexpectedly, leading to user frustration.
- SText implementations have been buggy.
- Most STexts have had no formal specification except for
the implementation itself. A buggy implementation meant a
buggy spec, and vice-versa.
- There has been no mechanism to get around the SText markup
rules when a markup character is used in a non-markup context.
Proponents of implicit STexts have vigorously opposed proposals
for explicit markup (XML, HTML, TeX, POD, etc.), and the debates
have continued off and on since 1996 or earlier.
reStructuredText is a complete revision and reinterpretation of
the SText idea, addressing all of the problems listed above.

Features

The extensive reStructuredText spec is not repeated or summarized
here; please read the originals available from
http://structuredtext.sourceforge.net/spec/ (.txt & .html files).
Reading the documents in the following order is recommended:
- An Introduction to reStructuredText [13]_
- Problems With StructuredText [14]_ (optional, if you've used
StructuredText; it explains many markup decisions made)
- reStructuredText Markup Specification [15]_
- A Record of reStructuredText Syntax Alternatives [16]_ (explains
markup decisions made independently of StructuredText)
- reStructuredText Directives [17]_
There is also a "Quick reStructuredText" user reference [18]_.
A summary of features addressing often-raised docstring markup
concerns follows:
- A markup escaping mechanism.
Backslashes (``\``) are used to escape markup characters when
needed for non-markup purposes. However, the inline markup
recognition rules have been constructed in order to minimize the
need for backslash-escapes. For example, although asterisks are
used for *emphasis*, in non-markup contexts such as "*" or "(*)"
or "x * y", the asterisks are not interpreted as markup and are
left unchanged. For many non-markup uses of backslashes (e.g.,
describing regular expressions), inline literals or literal
blocks are applicable; see the next item.
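
The recognition rules described in this item can be approximated with a regular expression. The sketch below is a deliberately simplified subset written for illustration (the names ``EMPHASIS`` and ``find_emphasis`` are assumptions); it is not the parser's actual implementation and ignores many cases the specification covers.

```python
import re

# Start-string: "*" at the start of the text or after whitespace or
# opening punctuation, and not followed by whitespace, another "*", or
# a closing quote/bracket (simplified rule).
_START = r"""(?<![^\s'"(\[{<\-/:])\*(?![\s*'")\]}>])"""

# End-string: "*" preceded by non-whitespace and followed by the end of
# the text, whitespace, or closing punctuation.
_END = r"""(?<!\s)\*(?![^\s'")\]}>\-/:.,;!?])"""

EMPHASIS = re.compile(_START + r"([^*]+?)" + _END)


def find_emphasis(text):
    """Return the emphasized phrases found in text."""
    return [match.group(1) for match in EMPHASIS.finditer(text)]


print(find_emphasis('used for *emphasis*, but not in "*" or "(*)" or "x * y"'))
```

A backslash directly before an asterisk also prevents recognition here, because a backslash is not one of the characters allowed before a start-string.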
- Markup to include Python source code and Python interactive
sessions: inline literals, literal blocks, and doctest blocks.
Inline literals use ``double-backquotes`` to indicate program
I/O or code snippets. No markup interpretation (including
backslash-escape [``\``] interpretation) is done within inline
literals.
Literal blocks (block-level literal text, such as code excerpts
or ASCII graphics) are indented, and indicated with a
double-colon ("::") at the end of the preceding paragraph (right
here -->)::

    if literal_block:
        text = 'is left as-is'
        spaces_and_linebreaks = 'are preserved'
        markup_processing = None

Doctest blocks begin with ">>> " and end with a blank line.
Neither indentation nor literal block double-colons are
required. For example::

    Here's a doctest block:

    >>> print('Python-specific usage examples; begun with ">>>"')
    Python-specific usage examples; begun with ">>>"
    >>> print('(cut and pasted from interactive sessions)')
    (cut and pasted from interactive sessions)

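
Doctest blocks are useful beyond display, since the interactive sessions they record can be re-run. The following sketch uses the standard ``doctest`` module to execute the examples found in one docstring; ``square`` and ``run_examples`` are hypothetical names chosen for this illustration.

```python
import doctest


def square(x):
    """
    Return ``x`` squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x


def run_examples(func):
    """Run the doctest blocks in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_examples(square))
```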
- Markup that isolates a Python identifier: interpreted text.
Text enclosed in single backquotes is recognized as "interpreted
text", whose interpretation is application-dependent. In the
context of a Python docstring, the default interpretation of
interpreted text is as Python identifiers. The text will be
marked up with a hyperlink connected to the documentation for
the identifier given. Lookup rules are the same as in Python
itself: LGB namespace lookups (local, global, builtin). The
"role" of the interpreted text (identifying a class, module,
function, etc.) is determined implicitly from the namespace
lookup. For example::

    class Keeper(Storer):

        """
        Extend `Storer`. Class attribute `instances` keeps track
        of the number of `Keeper` objects instantiated.
        """

        instances = 0
        """How many `Keeper` objects are there?"""

        def __init__(self):
            """
            Extend `Storer.__init__()` to keep track of
            instances. Keep count in `Keeper.instances` and data
            in `self.data`.
            """
            Storer.__init__(self)
            # Increment the class attribute so the count is shared.
            Keeper.instances += 1

            self.data = []
            """Store data in a list, most recent last."""

        def storedata(self, data):
            """
            Extend `Storer.storedata()`; append new `data` to a
            list (in `self.data`).
            """
            self.data.append(data)

Each piece of interpreted text is looked up according to the
local namespace of the block containing its docstring.
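
The lookup described in this item might be sketched as follows. This is an illustration under stated assumptions, not the Python source reader itself: the regular expression, the helper names, and the role names returned by ``implicit_role`` are all invented for the example, and the namespaces are passed in explicitly.

```python
import builtins
import inspect
import re

# Single-backquoted text; double-backquoted inline literals are skipped.
INTERPRETED = re.compile(r"(?<!`)`([^`\s][^`]*)`(?!`)")


def resolve(name, local_ns, global_ns):
    """Look up a dotted name: local, then global, then builtin namespace."""
    head, *rest = name.rstrip("()").split(".")
    for namespace in (local_ns, global_ns, vars(builtins)):
        if head in namespace:
            obj = namespace[head]
            break
    else:
        return None
    for attr in rest:
        obj = getattr(obj, attr, None)
        if obj is None:
            return None
    return obj


def implicit_role(obj):
    """Guess a role from the resolved object (hypothetical role names)."""
    if obj is None:
        return "unresolved"
    if inspect.isclass(obj):
        return "class"
    if inspect.ismodule(obj):
        return "module"
    if callable(obj):
        return "function"
    return "attribute"


class Storer:
    def storedata(self, data):
        pass


module_ns = {"Storer": Storer}   # stands in for the module's globals
doc = "Extend `Storer`. See `Storer.storedata()`, `len` and ``literal``."
found = {name: implicit_role(resolve(name, {}, module_ns))
         for name in INTERPRETED.findall(doc)}
print(found)
```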
- Markup that isolates a Python identifier and specifies its type:
interpreted text with roles.
Although the Python source context reader is designed not to
require explicit roles, they may be used. To classify
identifiers explicitly, the role is given along with the
identifier in either prefix or suffix form::

    Use :method:`Keeper.storedata` to store the object's data in
    `Keeper.data`:instance_attribute:.

The syntax chosen for roles is verbose, but necessarily so (if
anyone has a better alternative, please post it to the Doc-SIG).
The intention of the markup is that there should be little need
to use explicit roles; their use is to be kept to an absolute
minimum.
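
A small sketch of how the prefix and suffix role forms could be picked out of running text. The pattern and the function ``parse_interpreted`` are assumptions for illustration only, not the parser's real grammar.

```python
import re

# Optional ":role:" before or after a piece of single-backquoted text.
ROLE_TEXT = re.compile(
    r"(?::(?P<prefix>[A-Za-z_][\w-]*):)?"
    r"`(?P<text>[^`]+)`"
    r"(?::(?P<suffix>[A-Za-z_][\w-]*):)?"
)


def parse_interpreted(source):
    """Return (text, role) pairs; role is None when no explicit role is given."""
    pairs = []
    for match in ROLE_TEXT.finditer(source):
        role = match.group("prefix") or match.group("suffix")
        pairs.append((match.group("text"), role))
    return pairs


sample = ("Use :method:`Keeper.storedata` to store the object's data in "
          "`Keeper.data`:instance_attribute:.")
print(parse_interpreted(sample))
```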
- Markup for "tagged lists" or "label lists": field lists.
Field lists represent a mapping from field name to field body.
These are mostly used for extension syntax, such as
"bibliographic field lists" (representing document metadata such
as author, date, and version) and extension attributes for
directives (see below). They may be used to implement docstring
semantics, such as identifying parameters, exceptions raised,
etc.; such usage is beyond the scope of this PEP.
A modified RFC 2822 syntax is used, with a colon *before* as
well as *after* the field name. Field bodies are more versatile
as well; they may contain multiple body elements (even nested
field lists). For example::

    :Date: 2002-03-22
    :Version: 1
    :Authors:
        - Me
        - Myself
        - I

Standard RFC 2822 header syntax cannot be used for this
construct because it is ambiguous. A word followed by a colon
at the beginning of a line is common in written text. However,
with the addition of a well-defined context, such as when a
field list invariably occurs at the beginning of a document
(e.g., PEPs and email messages), standard RFC 2822 header syntax
can be used.
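
The mapping from field name to field body can be illustrated with a toy reader. The sketch below assumes each field starts at the left margin and that body lines are indented beneath it; ``parse_field_list`` is a hypothetical helper and handles far less than the real syntax.

```python
import re

# A field marker: colon, field name, colon, then an optional first body line.
FIELD = re.compile(r"^:(?P<name>[^:\n]+):(?:\s+(?P<body>.*))?$")


def parse_field_list(text):
    """Map each field name to the list of its (stripped) body lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            current = match.group("name")
            body = match.group("body")
            fields[current] = [body] if body else []
        elif current is not None and line.strip():
            # Indented continuation line belonging to the current field.
            fields[current].append(line.strip())
    return fields


sample = """\
:Date: 2002-03-22
:Version: 1
:Authors:
    - Me
    - Myself
    - I
"""
print(parse_field_list(sample))
```

The leading colon is what removes the ambiguity: a prose line such as "Note: this is common" does not match the field pattern and is ignored.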
- Markup extensibility: directives and substitutions.
Directives are used as an extension mechanism for
reStructuredText, a way of adding support for new block-level
constructs without adding new syntax. Directives for images,
admonitions (note, caution, etc.), and tables of contents
generation (among others) have been implemented. For example,
here's how to place an image::

    .. image:: mylogo.png

Substitution definitions allow the power and flexibility of
block-level directives to be shared by inline text. For
example::

    The |biohazard| symbol must be used on containers used to
    dispose of medical waste.

    .. |biohazard| image:: biohazard.png

- Section structure markup.
Section headers in reStructuredText use adornment via underlines
(and possibly overlines) rather than indentation. For example::

    This is a Section Title
    =======================

    This is a Subsection Title
    --------------------------

    This paragraph is in the subsection.

    This is Another Section Title
    =============================

    This paragraph is in the second section.
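
One way to picture how underlines convey structure: section levels follow the order in which adornment styles are first encountered. The sketch below handles underlines only (no overlines) and uses hypothetical helper names; it is an illustration, not the parser.

```python
def is_adornment(line):
    """True for a line made of one repeated punctuation character."""
    return (len(line) > 0 and len(set(line)) == 1
            and not line[0].isalnum() and not line[0].isspace())


def parse_sections(text):
    """Return (level, title) pairs for underlined section titles."""
    lines = text.splitlines()
    styles = []      # adornment characters, in order of first appearance
    sections = []
    for title, underline in zip(lines, lines[1:]):
        if (title.strip() and not is_adornment(title)
                and is_adornment(underline)
                and len(underline) >= len(title)):
            if underline[0] not in styles:
                styles.append(underline[0])
            sections.append((styles.index(underline[0]) + 1, title))
    return sections
```

Applied to the example above, the two "=" titles come out at level 1 and the "-" title at level 2, with no indentation involved.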

Questions & Answers

Q: Is reStructuredText rich enough?
A: Yes, it is for most people. If it lacks some construct that is
required for a specific application, it can be added via the
directive mechanism. If a common construct has been
overlooked and a suitably readable syntax can be found, it can
be added to the specification and parser.
Q: Is reStructuredText *too* rich?
A: No.
Since the very beginning, whenever a markup syntax has been
proposed on the Doc-SIG, someone has complained about the lack
of support for some construct or other. The reply was often
something like, "These are docstrings we're talking about, and
docstrings shouldn't have complex markup." The problem is that
a construct that seems superfluous to one person may be
absolutely essential to another.
reStructuredText takes the opposite approach: it provides a
rich set of implicit markup constructs (plus a generic
extension mechanism for explicit markup), allowing for all
kinds of documents. If the set of constructs is too rich for a
particular application, the unused constructs can either be
removed from the parser (via application-specific overrides) or
simply omitted by convention.
Q: Why not use indentation for section structure, like
StructuredText does? Isn't it more "Pythonic"?
A: Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG
post:

    I still think that using indentation to indicate sectioning
    is wrong. If you look at how real books and other print
    publications are laid out, you'll notice that indentation
    is used frequently, but mostly at the intra-section level.
    Indentation can be used to offset lists, tables,
    quotations, examples, and the like. (The argument that
    docstrings are different because they are input for a text
    formatter is wrong: the whole point is that they are also
    readable without processing.)

    I reject the argument that using indentation is Pythonic:
    text is not code, and different traditions and conventions
    hold. People have been presenting text for readability for
    over 30 centuries. Let's not innovate needlessly.

See "Section Structure via Indentation" in "Problems With
StructuredText" [14]_ for further elaboration.
Q: Why use reStructuredText for PEPs? What's wrong with the
existing standard?
A: The existing standard for PEPs is very limited in terms of
general expressibility, and referencing is especially lacking
for such a reference-rich document type. PEPs are currently
converted into HTML, but the results (mostly monospaced text)
are less than attractive, and most of the value-added potential
of HTML is untapped.
Making reStructuredText the standard markup for PEPs will
enable much richer expression, including support for section
structure, inline markup, graphics, and tables. In several
PEPs there are ASCII graphics diagrams, which are all that
plaintext documents can support. Since PEPs are made available
in HTML form, the ability to include proper diagrams would be
immediately useful.
Current PEP practices allow for reference markers in the form
"[1]" in the text, and the footnotes/references themselves are
listed in a section toward the end of the document. There is
currently no hyperlinking between the reference marker and the
footnote/reference itself (it would be possible to add this to
pep2html.py, but the "markup" as it stands is ambiguous and
mistakes would be inevitable). A PEP with many references
(such as this one ;-) requires a lot of flipping back and
forth. When revising a PEP, often new references are added or
unused references deleted. It is painful to renumber the
references, since it has to be done in two places and can have
a cascading effect (insert a single new reference 1, and every
other reference has to be renumbered; always adding new
references to the end is suboptimal). It is easy for
references to go out of sync.
PEPs use references for two purposes: simple URL references and
footnotes. reStructuredText differentiates between the two. A
PEP might contain references like this::

    Abstract

        This PEP proposes adding frungible doodads [1] to the
        core. It extends PEP 9876 [2] via the BCA [3]
        mechanism.

    References and Footnotes

        [1] http://www.doodads.org/frungible.html

        [2] PEP 9876, Let's Hope We Never Get Here
            http://www.python.org/peps/pep-9876.html

        [3] "Bogus Complexity Addition"

Reference 1 is a simple URL reference. Reference 2 is a
footnote containing text and a URL. Reference 3 is a footnote
containing text only. Rewritten using reStructuredText, this
PEP could look like this::

    Abstract
    ========

    This PEP proposes adding `frungible doodads`_ to the
    core. It extends PEP 9876 [#pep9876]_ via the BCA [#]_
    mechanism.

    .. _frungible doodads:
       http://www.doodads.org/frungible.html

    .. [#pep9876] `PEP 9876`__, Let's Hope We Never Get Here

    __ http://www.python.org/peps/pep-9876.html

    .. [#] "Bogus Complexity Addition"

URLs and footnotes can be defined close to their references if
desired, making them easier to read in the source text, and
making the PEPs easier to revise. The "References and
Footnotes" section can be auto-generated with a document tree
transform. Footnotes from throughout the PEP would be gathered
and displayed under a standard header. If URL references
should likewise be written out explicitly (in citation form),
another tree transform could be used.
URL references can be named ("frungible doodads"), and can be
referenced from multiple places in the document without
additional definitions. When converted to HTML, references
will be replaced with inline hyperlinks (HTML <A> tags). The
two footnotes are automatically numbered, so they will always
stay in sync. The first footnote also contains an internal
reference name, "pep9876", so it's easier to see the connection
between reference and footnote in the source text. Named
footnotes can be referenced multiple times, maintaining
consistent numbering.
The "#pep9876" footnote could also be written in the form of a
citation::

    It extends PEP 9876 [PEP9876]_ ...

    .. [PEP9876] `PEP 9876`_, Let's Hope We Never Get Here

Footnotes are numbered, whereas citations use text for their
references.
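
The benefit of automatic numbering described in this answer can be sketched in a few lines. The function below numbers footnote references in order of first appearance, which is a simplification assumed for this example (the real numbering rules are defined by the specification); ``number_footnotes`` is a hypothetical name.

```python
import re

# Auto-numbered footnote references: "[#]_" or "[#label]_".
FOOTNOTE_REF = re.compile(r"\[#([A-Za-z0-9_-]*)\]_")


def number_footnotes(text):
    """Replace auto-numbered references with "[n]"; return (text, label map)."""
    numbers = {}
    counter = 0

    def replace(match):
        nonlocal counter
        label = match.group(1)
        if label and label in numbers:
            number = numbers[label]       # repeated named reference
        else:
            counter += 1
            number = counter
            if label:
                numbers[label] = number
        return "[%d]" % number

    return FOOTNOTE_REF.sub(replace, text), numbers


source = ("It extends PEP 9876 [#pep9876]_ via the BCA [#]_ mechanism; "
          "see [#pep9876]_ again.")
print(number_footnotes(source)[0])
```

Inserting a new reference earlier in the text changes only the computed numbers, so nothing in the source has to be renumbered by hand.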
Q: Wouldn't it be better to keep the docstring and PEP proposals
separate?
A: The PEP markup proposal is an optional part of this PEP. It may be
removed if it is deemed that there is no need for PEP markup.
The PEP markup proposal could be made into a separate PEP if
necessary. If accepted, PEP 1, PEP Purpose and Guidelines [19]_,
and PEP 9, Sample PEP Template [20]_ will be updated.
It seems natural to adopt a single consistent markup standard
for all uses of plaintext in Python.
Q: The existing pep2html.py script converts the existing PEP
format to HTML. How will the new-format PEPs be converted to
HTML?
A: One of the deliverables of the Docutils project [21]_ will be a
new version of pep2html.py with integrated reStructuredText
parsing. The Docutils project will support PEPs with a "PEP
Reader" component, including all functionality currently in
pep2html.py (auto-recognition of PEP & RFC references).
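
For readers unfamiliar with auto-recognition of PEP and RFC references, the idea can be sketched as below. This is not the code of pep2html.py or of the planned PEP Reader: the PEP URL pattern is the one used in this document's own references, while the RFC URL, the regular expression, and the function name are assumptions for illustration.

```python
import re

# URL pattern used for PEPs in this document's reference list.
PEP_URL = "http://www.python.org/peps/pep-%04d.html"
# Assumed location for RFCs; not taken from pep2html.py.
RFC_URL = "https://www.rfc-editor.org/rfc/rfc%d"

REFERENCE = re.compile(r"\b(PEP|RFC)\s+(\d+)\b")


def link_references(text):
    """Wrap "PEP n" and "RFC n" in HTML hyperlinks."""
    def to_link(match):
        kind, number = match.group(1), int(match.group(2))
        url = (PEP_URL if kind == "PEP" else RFC_URL) % number
        return '<a href="%s">%s</a>' % (url, match.group(0))
    return REFERENCE.sub(to_link, text)


print(link_references("See PEP 216 and RFC 2822."))
```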
Q: Who's going to convert the existing PEPs to reStructuredText?
A: A call for volunteers will be put out to the Doc-SIG and
greater Python communities. If insufficient volunteers are
forthcoming, I (David Goodger) will convert the documents
myself, perhaps with some level of automation. A transitional
system whereby both old and new standards can coexist will be
easy to implement (and I pledge to implement it if necessary).
Q: Why use reStructuredText for README and other ancillary files?
A: The same reasoning used for PEPs above applies to README and
other ancillary files. By adopting a standard markup, these
files can be converted to attractive cross-referenced HTML and
put up on python.org. Developers of Python projects can also
take advantage of this facility for their own documentation.

References and Footnotes

[1] http://structuredtext.sourceforge.net/
[2] http://www.python.org/sigs/doc-sig/
[3] http://www.w3.org/XML/
[4] http://www.oasis-open.org/cover/general.html
[5] http://docbook.org/tdg/en/html/docbook.html
[6] http://www.w3.org/MarkUp/
[7] http://www.w3.org/MarkUp/#xhtml1
[8] http://www.tug.org/interest.html
[9] http://www.perldoc.com/perl5.6/pod/perlpod.html
[10] http://java.sun.com/j2se/javadoc/
[11] http://docutils.sourceforge.net/mirror/setext.html
[12] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
[13] An Introduction to reStructuredText
http://structuredtext.sourceforge.net/spec/introduction.txt
[14] Problems with StructuredText
http://structuredtext.sourceforge.net/spec/problems.txt
[15] reStructuredText Markup Specification
http://structuredtext.sourceforge.net/spec/reStructuredText.txt
[16] A Record of reStructuredText Syntax Alternatives
http://structuredtext.sourceforge.net/spec/alternatives.txt
[17] reStructuredText Directives
http://structuredtext.sourceforge.net/spec/directives.txt
[18] Quick reStructuredText
http://structuredtext.sourceforge.net/docs/quickref.html
[19] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/peps/pep-0001.html
[20] PEP 9, Sample PEP Template, Warsaw
http://www.python.org/peps/pep-0009.html
[21] http://docutils.sourceforge.net/
[22] PEP 216, Docstring Format, Zadka
http://www.python.org/peps/pep-0216.html

Copyright

This document has been placed in the public domain.

Acknowledgements

Some text is borrowed from PEP 216, Docstring Format, by Moshe
Zadka [22]_.
Special thanks to all members past & present of the Python Doc-SIG.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: