PEP: 258
Title: Docutils Design Specification
Version: $Revision$
Last-Modified: $Date$
Author: David Goodger <goodger@users.sourceforge.net>
Discussions-To: <doc-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 256, 257
Created: 31-May-2001
Post-History: 13-Jun-2001


==========
 Abstract
==========

This PEP documents design issues and implementation details for
Docutils, a Python Docstring Processing System (DPS). The rationale
and high-level concepts of a DPS are documented in PEP 256, "Docstring
Processing System Framework" [#PEP-256]_. Also see PEP 256 for a
"Roadmap to the Docstring PEPs".

Docutils is being designed modularly so that any of its components can
be replaced easily. In addition, Docutils is not limited to the
processing of Python docstrings; it processes standalone documents as
well, in several contexts.

No changes to the core Python language are required by this PEP. Its
deliverables consist of a package for the standard library and its
documentation.


===============
 Specification
===============

Docutils Project Model
======================

::

                      +--------------------------+
                      |        Docutils:         |
                      | docutils.core.Publisher, |
                      | docutils.core.publish()  |
                      +--------------------------+
                       /                        \
                      /                          \
             1,3,5,7 /                            \ 8,10
            +--------+                            +--------+
            | READER | =========================> | WRITER |
            +--------+                            +--------+
             /  ||  \                              /      \
            /   ||   \                            /        \
         2 /  4 ||    \ 6                      9 /          \ 11
    +-----+ +--------+   +-------------+ +------------+ +-----+
    | I/O | | PARSER |...| reader      | | writer     | | I/O |
    +-----+ +--------+   | transforms  | | transforms | +-----+
                         |             | |            |
                         | - docinfo   | | - system   |
                         | - titles    | |   messages |
                         | - linking   | | - final    |
                         | - lookups   | |   checks   |
                         | - reader-   | | - writer-  |
                         |   specific  | |   specific |
                         | - parser-   | | - etc.     |
                         |   specific  | +------------+
                         | - layout    |
                         |   (stylist) |
                         | - etc.      |
                         +-------------+

The numbers indicate the path a document's data takes through the
code. Double-width lines between reader & parser and between reader &
writer indicate that data sent along these paths should be standard
(pure & unextended) Docutils doc trees. Single-width lines signify
that internal tree extensions or completely unrelated representations
are possible, but they must be supported at both ends.


Publisher
---------

The ``docutils.core`` module contains a "Publisher" facade class and
"publish" convenience function. Publisher encapsulates the high-level
logic of a Docutils system. The ``Publisher.publish()`` method first
calls its Reader, which reads data from its source I/O, parses and
transforms the data, and returns it. ``Publisher.publish()`` then
passes the resulting document tree to its Writer, which further
transforms the document before translating it to the final output
format and writing the formatted data to its destination I/O.

Calling the "publish" function (or instantiating a "Publisher" object)
with component names will result in default behavior. For custom
behavior (setting component options), create custom component objects
first, and pass *them* to publish/Publisher.

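The control flow described above can be sketched with toy stand-ins.
This is only an illustration under stated assumptions, not the Docutils
API: the classes ``ToyParser``, ``ToyReader``, ``ToyWriter``, and
``ToyPublisher`` and their signatures are hypothetical, and the
"document tree" is reduced to a plain list of paragraph strings.

```python
class ToyParser:
    """Hypothetical parser: splits text into paragraphs (the 'doctree')."""

    def parse(self, text, document):
        for block in text.split("\n\n"):
            block = " ".join(block.split())
            if block:
                document.append(block)


class ToyReader:
    """Hypothetical reader: gets input, calls the parser, runs transforms."""

    def __init__(self, parser):
        self.parser = parser

    def read(self, source):
        document = []                      # fresh doctree root
        self.parser.parse(source, document)
        # Reader-side "transform": number the paragraphs in place.
        document[:] = ["%d. %s" % (i, p) for i, p in enumerate(document, 1)]
        return document


class ToyWriter:
    """Hypothetical writer: translates the doctree to an output format."""

    def write(self, document, destination):
        destination.extend("<p>%s</p>" % p for p in document)


class ToyPublisher:
    """Facade tying the components together, in the spirit of Publisher."""

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer

    def publish(self, source, destination):
        document = self.reader.read(source)        # reader side of the diagram
        self.writer.write(document, destination)   # writer side of the diagram
        return destination


output = ToyPublisher(ToyReader(ToyParser()), ToyWriter()).publish(
    "First\nparagraph.\n\nSecond paragraph.", [])
```

Passing ready-made component objects to the facade mirrors the "create
custom component objects first" route; a lookup by component name would
sit in front of the constructor calls.
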

Readers
-------

Readers understand the input context (where the data is coming from),
send the whole input or discrete "chunks" to the parser, and provide
the context to bind the chunks together back into a cohesive whole.
Using transforms_, Readers also resolve references and footnote
numbers, process interpreted text, and handle anything else that
requires context-sensitive computation.

Each reader is a module or package exporting a "Reader" class with a
"read" method. The base "Reader" class can be found in the
``docutils/readers/__init__.py`` module.

Most Readers will have to be told what parser to use. So far (see the
list of examples below), only the Python Source Reader ("PySource";
still incomplete) will be able to determine the parser on its own.

Responsibilities:

- Get input text from the source I/O.

- Pass the input text to the parser, along with a fresh doctree root.

- Run transforms over the doctree(s).

Examples:

- Standalone (Raw/Plain): Just read a text file and process it.
  The reader needs to be told which parser to use.

  The "Standalone Reader" has been implemented in module
  ``docutils.readers.standalone``.

- Python Source: See `Python Source Reader`_ below. This Reader is
  currently in development in the Docutils sandbox.

- Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.

- PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to URIs.
  Either interpret PEPs' indented sections or convert existing PEPs to
  reStructuredText (or both?).

  The "PEP Reader" is being implemented in module
  ``docutils.readers.pep``.

- Wiki: Global reference lookups of "wiki links" incorporated into
  transforms. (CamelCase only or unrestricted?) Lazy
  indentation?

- Web Page: As standalone, but recognize meta fields as meta tags.
  Support for templates of some sort? (After ``<body>``, before
  ``</body>``?)

- FAQ: Structured "question & answer(s)" constructs.

- Compound document: Merge chapters into a book. Master TOC file?


Parsers
-------

Parsers analyze their input and produce a Docutils `document tree`_.
They don't know or care anything about the source or destination of
the data.

Each input parser is a module or package exporting a "Parser" class
with a "parse" method. The base "Parser" class can be found in the
``docutils/parsers/__init__.py`` module.

Responsibilities: Given raw input text and a doctree root node,
populate the doctree by parsing the input text.

Example: The only parser implemented so far is for the
reStructuredText markup. It is implemented in the
``docutils/parsers/rst/`` package.


Transforms
----------

Transforms change the document tree from one form to another, add to
the tree, or prune it. Transforms are run by Reader and Writer
objects. Some transforms are Reader-specific, some are
Parser-specific, and others are Writer-specific. The choice and order
of transforms is specified in the Reader and Writer objects.

Each transform is a class in a module in the ``docutils/transforms/``
package, a subclass of ``docutils.transforms.Transform``.

Responsibilities:

- Modify a doctree in-place, either purely transforming one structure
  into another, or adding new structures based on the doctree and/or
  external data.

Examples (in the ``docutils/transforms/`` package):

- frontmatter.DocInfo: Conversion of document metadata (bibliographic
  information).

- references.Hyperlinks: Resolution of hyperlinks.

- parts.Contents: Generates a table of contents for a document.

- document.Merger: Combining multiple populated doctrees into one (not
  yet implemented or fully understood).

- document.Splitter: Splits a document into a tree-structure of
  subdocuments, perhaps by section. It will have to transform
  references appropriately. (Neither implemented nor remotely
  understood.)

- universal.Pending: Handles transforms that must be executed at
  specific stages of processing.

- components.Filter: Includes or excludes elements which depend on a
  specific Docutils component (triggered by the universal.Pending
  transform).

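As a rough illustration of "modify a doctree in-place", the sketch
below adds a contents node built from the section titles already in the
tree. The ``Node``, ``Transform``, and ``ToyContents`` classes are
hypothetical stand-ins written for this example; they are not the
``docutils.nodes`` or ``docutils.transforms`` classes.

```python
class Node:
    """Hypothetical tree node: a tag name, optional text, and children."""

    def __init__(self, tag, text="", children=None):
        self.tag = tag
        self.text = text
        self.children = list(children or [])

    def walk(self):
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


class Transform:
    """Hypothetical base class: a transform edits `document` in place."""

    def __init__(self, document):
        self.document = document

    def apply(self):
        raise NotImplementedError


class ToyContents(Transform):
    """Add a 'contents' node listing every section title in the tree."""

    def apply(self):
        titles = [n.text for n in self.document.walk() if n.tag == "title"]
        entries = [Node("entry", t) for t in titles]
        # Insert the generated structure at the top of the document.
        self.document.children.insert(0, Node("contents", children=entries))


document = Node("document", children=[
    Node("section", children=[Node("title", "Readers")]),
    Node("section", children=[Node("title", "Writers")]),
])
ToyContents(document).apply()
contents = document.children[0]
```

The transform returns nothing; its whole effect is the change to the
tree it was given, which is what lets a Reader or Writer run a list of
transforms one after another over the same document.
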

Writers
-------

Writers produce the final output (HTML, XML, TeX, etc.). Writers
translate the internal document tree structure into the final data
format, possibly running Writer-specific transforms_ first.

Each writer is a module or package exporting a "Writer" class with a
"write" method. The base "Writer" class can be found in the
``docutils/writers/__init__.py`` module.

Responsibilities:

- Run transforms over the doctree(s).

- Translate doctree(s) into specific output formats.

  - Transform references into format-native forms.

- Write the translated output to the destination I/O.

Examples:

- XML: Various forms, such as:

  - DocBook (being implemented in the Docutils sandbox).

  - Raw doctree XML (accessible via "``doctree.asdom().toxml()``"; no
    Writer component implemented yet).

- HTML (XHTML implemented as ``docutils.writers.html4css1``).

- PDF (a ReportLab interface is being developed in the Docutils
  sandbox).

- TeX

- Docutils-native pseudo-XML (implemented as
  ``docutils.writers.pseudoxml``, used for testing).

- Plain text

- reStructuredText?


Input/Output
------------

I/O classes provide a uniform API for low-level input and output.
Subclasses will exist for a variety of input/output mechanisms.

I/O classes are currently in the preliminary stages; there's a lot of
work yet to be done. Issues:

- Looking at the list of writers, it seems that only HTML would
  require anything other than monolithic output. Perhaps "Writer"
  variants, one for each output distribution type?

- How to represent a multi-file document (files & directories) in the
  API?

Responsibilities:

- Read data from the input source and/or write data to the output
  destination.

Examples of input sources:

- A single file on disk or a stream (implemented as
  ``docutils.io.FileInput``).

- Multiple files on disk (``MultiFileInput``?).

- Python source files: modules and packages.

- Python strings, as received from a client application
  (implemented as ``docutils.io.StringInput``).

Examples of output destinations:

- A single file on disk or a stream (implemented as
  ``docutils.io.FileOutput``).

- A tree of directories and files on disk.

- A Python string, returned to a client application (implemented as
  ``docutils.io.StringOutput``).

- A single tree-shaped data structure in memory.

- Some other set of data structures in memory.


Docutils Package Structure
==========================

- Package "docutils".

  - Class "Component" is a base class for Docutils components.

  - Module "docutils.core" contains facade class "Publisher" and
    convenience function "publish()". See `Publisher`_ above.

  - Module "docutils.frontend" provides command-line and option
    processing for Docutils front-end tools.

  - Module "docutils.io" provides a uniform API for low-level input
    and output. See `Input/Output`_ above.

  - Module "docutils.nodes" contains the Docutils document tree
    element class library plus Visitor pattern base classes. See
    `Document Tree`_ below.

  - Module "docutils.optik" provides option parsing and command-line
    help; from Greg Ward's http://optik.sf.net/ project, included for
    convenience.

  - Module "docutils.roman" contains Roman numeral conversion
    routines.

  - Module "docutils.statemachine" contains a finite state machine
    specialized for regular-expression-based text filters. The
    reStructuredText parser implementation is based on this module.

  - Module "docutils.urischemes" contains a mapping of known URI
    schemes ("http", "ftp", "mailto", etc.).

  - Module "docutils.utils" contains utility functions and classes,
    including a logger class ("Reporter"; see `Error Handling`_
    below).

  - Package "docutils.parsers": markup parsers_.

    - Function "get_parser_class(parser_name)" returns a parser class
      by name. Class "Parser" is the base class of specific parsers.
      (``docutils/parsers/__init__.py``)

    - Package "docutils.parsers.rst": the reStructuredText parser.

    - Alternate markup parsers may be added.

    See `Parsers`_ above.

  - Package "docutils.readers": context-aware input readers.

    - Function "get_reader_class(reader_name)" returns a reader class
      by name or alias. Class "Reader" is the base class of specific
      readers. (``docutils/readers/__init__.py``)

    - Module "docutils.readers.standalone" reads independent document
      files.

    - Module "docutils.readers.pep" reads PEPs (Python Enhancement
      Proposals).

    - Readers to be added for: Python source code (structure &
      docstrings), email, FAQ, and perhaps Wiki and others.

    See `Readers`_ above.

  - Package "docutils.writers": output format writers.

    - Function "get_writer_class(writer_name)" returns a writer class
      by name. Class "Writer" is the base class of specific writers.
      (``docutils/writers/__init__.py``)

    - Module "docutils.writers.pseudoxml" is a simple internal
      document tree writer; it writes indented pseudo-XML.

    - Module "docutils.writers.html4css1" is a simple HyperText Markup
      Language document tree writer for HTML 4.01 and CSS1.

    - Writers to be added: HTML 3.2 or 4.01-loose, XML (various forms,
      such as DocBook and the raw internal doctree), PDF, TeX,
      plaintext, reStructuredText, and perhaps others.

    See `Writers`_ above.

  - Package "docutils.transforms": tree transform classes.

    - Class "Transform" is the base class of specific transforms.
      (``docutils/transforms/__init__.py``)

    - Each module contains related transform classes.

    See `Transforms`_ above.

  - Package "docutils.languages": Language modules contain
    language-dependent strings and mappings. They are named for their
    language identifier (as defined in `Choice of Docstring Format`_
    below), converting dashes to underscores.

    - Function "get_language(language_code)" returns the matching
      language module. (``docutils/languages/__init__.py``)

    - Module "docutils.languages.en" (English).

    - Other languages to be added.

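The naming rule for language modules can be illustrated with a small
helper. The function ``language_module_name`` is hypothetical, written
only for this example, and lowercasing the identifier is an assumption
of the sketch (the text above only specifies the dash-to-underscore
conversion).

```python
def language_module_name(language_code):
    """Map a language identifier to a module name (hypothetical helper).

    Dashes become underscores so the result is a valid Python module
    name; lowercasing is an assumption made for this sketch.
    """
    return "docutils.languages." + language_code.lower().replace("-", "_")


name = language_module_name("en-GB")
```

A "get_language()"-style function could then import the module under
that name; how it should behave for an unknown language is not covered
here.
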

Front-End Tools
===============

See `Docutils Front-End Tools`_.

.. _Docutils Front-End Tools: http://docutils.sf.net/docs/tools.html


Document Tree
=============

A single intermediate data structure is used internally by Docutils,
in the interfaces between components; it is defined in the
``docutils.nodes`` module. It is not required that this data structure
be used *internally* by any of the components, just *between*
components.

Custom node types are allowed, provided that either (a) a transform
converts them to standard Docutils nodes before they reach the Writer
proper, or (b) the custom node is explicitly supported by certain
Writers, and is wrapped in a filtered "pending" node. An example of
condition A is the `Python Source Reader`_ (see below), where a
"stylist" transform converts custom nodes. The HTML ``<meta>`` tag is
an example of condition B; it is supported by the HTML Writer but not
by others. The reStructuredText "meta" directive creates a "pending"
node, which contains knowledge that the embedded "meta" node can only
be handled by HTML-compatible writers. The "pending" node is resolved
by the "transforms.components.Filter" transform, which checks that the
calling writer supports HTML; if it doesn't, the "meta" node is
removed from the document.

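Condition B can be pictured as follows. This is a simplified sketch,
not the ``transforms.components.Filter`` implementation: the
``Pending`` class, the ``filter_pending`` function, and the use of
plain strings as nodes are all assumptions made for illustration.

```python
class Pending:
    """Hypothetical 'pending' wrapper around a writer-specific node."""

    def __init__(self, node, required_format):
        self.node = node
        self.required_format = required_format


def filter_pending(children, supported_formats):
    """Resolve pending nodes for a writer supporting `supported_formats`.

    Supported custom nodes are unwrapped; unsupported ones are dropped.
    Ordinary nodes pass through unchanged.
    """
    result = []
    for child in children:
        if isinstance(child, Pending):
            if child.required_format in supported_formats:
                result.append(child.node)
        else:
            result.append(child)
    return result


tree = ["title", Pending("meta", "html"), "paragraph"]
for_html = filter_pending(tree, {"html"})
for_tex = filter_pending(tree, {"latex"})
```

With an HTML-capable writer the "meta" node survives; with any other
writer it disappears before translation starts, so the writer never
sees a node type it cannot handle.
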
The document tree data structure is similar to a DOM tree, but with
specific node names (classes) instead of DOM's generic nodes. The
schema is documented in an XML DTD (eXtensible Markup Language
Document Type Definition), which comes in two parts:

- the Docutils Generic DTD, docutils.dtd_, and

- the OASIS Exchange Table Model, soextbl.dtd_.

The DTD defines a rich set of elements, suitable for many input and
output formats. The DTD retains all information necessary to
reconstruct the original input text, or a reasonable facsimile
thereof.

See `The Docutils Document Tree`_ for details (incomplete).


Error Handling
==============

When the parser encounters an error in markup, it inserts a system
message (DTD element "system_message"). There are five levels of
system messages:

- Level-0, "DEBUG": an internal reporting issue. There is no effect
  on the processing. Level-0 system messages are handled separately
  from the others.

- Level-1, "INFO": a minor issue that can be ignored. There is little
  or no effect on the processing. Typically level-1 system messages
  are not reported.

- Level-2, "WARNING": an issue that should be addressed. If ignored,
  there may be minor problems with the output. Typically level-2
  system messages are reported but do not halt processing.

- Level-3, "ERROR": a major issue that should be addressed. If
  ignored, the output will contain unpredictable errors. Typically
  level-3 system messages are reported but do not halt processing.

- Level-4, "SEVERE": a critical error that must be addressed. If
  ignored, the output will contain severe errors. Typically level-4
  system messages are turned into exceptions which halt processing.

Although the initial message levels were devised independently, they
have a strong correspondence to `VMS error condition severity
levels`_; the names in quotes for levels 1 through 4 were borrowed
from VMS. Error handling has since been influenced by the `log4j
project`_.

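The typical handling of each level can be sketched as a pair of
thresholds. The ``ToyReporter`` class below is hypothetical and is not
the "Reporter" class in ``docutils.utils``; the default thresholds
(report from level 2, halt at level 4) follow the "typically" wording
above, and the separate handling of level-0 messages is not modelled.

```python
LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "SEVERE")


class SystemMessage(Exception):
    """Raised when a message reaches the halt threshold."""


class ToyReporter:
    """Hypothetical reporter with report and halt thresholds."""

    def __init__(self, report_level=2, halt_level=4):
        self.report_level = report_level
        self.halt_level = halt_level
        self.reported = []

    def system_message(self, level, text):
        message = "%s/%d: %s" % (LEVELS[level], level, text)
        if level >= self.halt_level:
            raise SystemMessage(message)      # halts processing
        if level >= self.report_level:
            self.reported.append(message)     # reported, processing continues
        return message


reporter = ToyReporter()
reporter.system_message(1, "blank line missing")      # below report level
reporter.system_message(2, "unknown target name")     # reported
try:
    reporter.system_message(4, "unexpected section title")
    halted = False
except SystemMessage:
    halted = True
```

Keeping both thresholds as parameters leaves room for a stricter run
(for example, halting on warnings) without changing the code that
emits the messages.
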

Python Source Reader
====================

The Python Source Reader ("PySource") is the Docutils component that
reads Python source files, extracts docstrings in context, then
parses, links, and assembles the docstrings into a cohesive whole. It
is a major and non-trivial component, currently under experimental
development in the Docutils sandbox. High-level design issues are
presented here.


Processing Model
----------------

This model will evolve over time, incorporating experience and
discoveries.

1. The PySource Reader uses an I/O class to read in some Python
   packages and modules, into a tree of strings.

2. The Python modules are parsed, converting the tree of strings into
   a tree of abstract syntax trees.

3. The abstract syntax trees are converted into an internal
   representation of the packages/modules. Docstrings are extracted,
   as well as code structure details. See `AST Mining`_ below.
   Namespaces are constructed for lookup in step 6.

4. One at a time, the docstrings are parsed, producing standard
   Docutils doctrees.

5. PySource assembles all the individual docstrings' doctrees into a
   Python-specific custom Docutils tree paralleling the
   package/module/class structure; this is a custom Reader-specific
   internal representation (see the `Docutils Python Source DTD`_).
   Namespaces must be merged: Python identifiers, hyperlink targets.

6. Cross-references from docstrings (interpreted text) to Python
   identifiers are resolved according to the Python namespace lookup
   rules. See `Identifier Cross-References`_ below.

7. A "Stylist" transform is applied to the custom doctree, custom
   nodes are rendered using standard nodes as primitives, and a
   standard document tree is emitted. See `Stylist Transforms`_
   below.

8. Other transforms are applied to the standard doctree.

9. The standard doctree is sent to a Writer, which translates the
   document into a concrete format (HTML, PDF, etc.).

10. The Writer uses an I/O class to write the resulting data to its
    destination (disk file, directories and files, etc.).


AST Mining
----------

Abstract Syntax Tree mining code will be written that scans a parsed
Python module, and returns an ordered tree containing the names,
docstrings (including attribute and additional docstrings; see below),
and additional info (in parentheses below) of all of the following
objects:

- packages
- modules
- module attributes (+ initial values)
- classes (+ inheritance)
- class attributes (+ initial values)
- instance attributes (+ initial values)
- methods (+ parameters & defaults)
- functions (+ parameters & defaults)

(Extract comments too? For example, comments at the start of a module
would be a good place for bibliographic field lists.)

In order to evaluate interpreted text cross-references, namespaces for
each of the above will also be required.

See python-dev/docstring-develop thread "AST mining", started on
2001-08-14.


Docstring Extraction Rules
--------------------------

1. What to examine:

   a) If the "``__all__``" variable is present in the module being
      documented, only identifiers listed in "``__all__``" are
      examined for docstrings.

   b) In the absence of "``__all__``", all identifiers are examined,
      except those whose names are private (names begin with "_" but
      don't begin and end with "__").

   c) 1a and 1b can be overridden by a parameter or command-line
      option.

2. Where:

   Docstrings are string literal expressions, and are recognized in
   the following places within Python modules:

   a) At the beginning of a module, function definition, class
      definition, or method definition, after any comments. This is
      the standard for Python ``__doc__`` attributes.

   b) Immediately following a simple assignment at the top level of a
      module, class definition, or ``__init__`` method definition,
      after any comments. See `Attribute Docstrings`_ below.

   c) Additional string literals found immediately after the
      docstrings in (a) and (b) will be recognized, extracted, and
      concatenated. See `Additional Docstrings`_ below.

   d) @@@ 2.2-style "properties" with attribute docstrings?

3. How:

   Whenever possible, Python modules should be parsed by Docutils, not
   imported. There are several reasons:

   - Importing untrusted code is inherently insecure.

   - Information from the source is lost when using introspection to
     examine an imported module, such as comments and the order of
     definitions.

   - Docstrings are to be recognized in places where the bytecode
     compiler ignores string literal expressions (2b and 2c above),
     meaning importing the module will lose these docstrings.

   Of course, standard Python parsing tools such as the "parser"
   library module should be used.

   When the Python source code for a module is not available
   (i.e. only the ``.pyc`` file exists) or for C extension modules, to
   access docstrings the module can only be imported, and any
   limitations must be lived with.

Since attribute docstrings and additional docstrings are ignored by
the Python bytecode compiler, no namespace pollution or runtime bloat
will result from their use. They are not assigned to ``__doc__`` or
to any other attribute. The initial parsing of a module may take a
slight performance hit.

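Rules 1a and 1b, together with the "parse, don't import" principle of
rule 3, can be sketched as below. The example uses the standard
``ast`` module as a stand-in for the "parser" module named above; the
helper names ``is_private`` and ``names_to_examine`` are hypothetical,
only top-level functions, classes, and simple assignments are
considered, and ``__all__`` is assumed to be a literal list.

```python
import ast


def is_private(name):
    """True for names starting with "_" that are not "__dunder__" names."""
    return name.startswith("_") and not (
        name.startswith("__") and name.endswith("__"))


def names_to_examine(source):
    """Return the top-level names whose docstrings should be examined."""
    tree = ast.parse(source)            # parse only; nothing is imported
    defined, exported = [], None
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defined.append(target.id)
                    if target.id == "__all__":
                        # Rule 1a: a literal __all__ list takes precedence.
                        exported = list(ast.literal_eval(node.value))
    if exported is not None:
        return exported
    return [name for name in defined if not is_private(name)]   # rule 1b


SOURCE = '''
limit = 10
_cache = {}
def public(): pass
def _helper(): pass
'''
without_all = names_to_examine(SOURCE)
with_all = names_to_examine(SOURCE + "__all__ = ['public']\n")
```

The override of rule 1c would amount to an extra parameter that skips
both filters; it is left out to keep the sketch short.
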

Attribute Docstrings
''''''''''''''''''''

(This is a simplified version of PEP 224 [#PEP-224]_.)

A string literal immediately following an assignment statement is
interpreted by the docstring extraction machinery as the docstring of
the target of the assignment statement, under the following
conditions:

1. The assignment must be in one of the following contexts:

   a) At the top level of a module (i.e., not nested inside a compound
      statement such as a loop or conditional): a module attribute.

   b) At the top level of a class definition: a class attribute.

   c) At the top level of the "``__init__``" method definition of a
      class: an instance attribute.

   Since each of the above contexts is at the top level (i.e., in the
   outermost suite of a definition), it may be necessary to place
   dummy assignments for attributes assigned conditionally or in a
   loop.

2. The assignment must be to a single target, not to a list or a tuple
   of targets.

3. The form of the target:

   a) For contexts 1a and 1b above, the target must be a simple
      identifier (not a dotted identifier, a subscripted expression,
      or a sliced expression).

   b) For context 1c above, the target must be of the form
      "``self.attrib``", where "``self``" matches the "``__init__``"
      method's first parameter (the instance parameter) and "attrib"
      is a simple identifier as in 3a.

Blank lines may be used after attribute docstrings to emphasize the
connection between the assignment and the docstring.

Examples::

    g = 'module attribute (module-global variable)'
    """This is g's docstring."""

    class AClass:

        c = 'class attribute'
        """This is AClass.c's docstring."""

        def __init__(self):
            self.i = 'instance attribute'
            """This is self.i's docstring."""

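A minimal sketch of how such docstrings might be recognized, again
using the standard ``ast`` module as a stand-in parser. The function
``attribute_docstrings`` is hypothetical; it covers module and class
attributes (contexts 1a and 1b) and leaves out the ``self.attrib``
case of context 1c.

```python
import ast


def attribute_docstrings(body, prefix=""):
    """Map simple assignment targets in `body` to the string that follows."""
    found = {}
    for stmt, following in zip(body, body[1:]):
        is_string = (isinstance(following, ast.Expr)
                     and isinstance(following.value, ast.Constant)
                     and isinstance(following.value.value, str))
        if not (isinstance(stmt, ast.Assign) and is_string):
            continue
        if len(stmt.targets) != 1:          # condition 2: single target
            continue
        target = stmt.targets[0]
        if isinstance(target, ast.Name):    # condition 3a: simple identifier
            found[prefix + target.id] = following.value.value
    return found


SOURCE = '''
g = 'module attribute'
"""This is g's docstring."""

class AClass:
    c = 'class attribute'
    """This is AClass.c's docstring."""
'''
module = ast.parse(SOURCE)
docs = attribute_docstrings(module.body)
for node in module.body:
    if isinstance(node, ast.ClassDef):
        docs.update(attribute_docstrings(node.body, node.name + "."))
```

Because only the statement list of a module or class body is scanned,
assignments nested inside loops or conditionals are never considered,
which matches the "top level" requirement of condition 1.
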

Additional Docstrings
'''''''''''''''''''''

(This idea was adapted from PEP 216 [#PEP-216]_.)

Many programmers would like to make extensive use of docstrings for
API documentation. However, docstrings do take up space in the
running program, so some of these programmers are reluctant to "bloat
up" their code. Also, not all API documentation is applicable to
interactive environments, where ``__doc__`` would be displayed.

The docstring processing system's extraction tools will concatenate
all string literal expressions which appear at the beginning of a
definition or after a simple assignment. Only the first strings in
definitions will be available as ``__doc__``, and can be used for
brief usage text suitable for interactive sessions; subsequent string
literals and all attribute docstrings are ignored by the Python
bytecode compiler and may contain more extensive API information.

Example::

    def function(arg):
        """This is __doc__, function's docstring."""
        """
        This is an additional docstring, ignored by the bytecode
        compiler, but extracted by Docutils.
        """
        pass

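The concatenation step might look like the following sketch, which
collects every string literal at the start of a definition's body. It
uses the standard ``ast`` module; the helper ``leading_strings`` is
hypothetical, and joining the parts with a blank line is an assumption
of this example rather than something specified above.

```python
import ast


def leading_strings(node):
    """Return every string literal at the start of a definition's body."""
    strings = []
    for stmt in node.body:
        if (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)):
            strings.append(stmt.value.value.strip())
        else:
            break                   # first non-string statement ends the run
    return strings


SOURCE = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring.
    """
    pass
'''
func = ast.parse(SOURCE).body[0]
parts = leading_strings(func)
combined = "\n\n".join(parts)           # separator is an assumption
runtime_doc = ast.get_docstring(func)   # what __doc__ would hold
```

Only the first element of ``parts`` corresponds to what the running
program keeps as ``__doc__``; the rest exists solely in the source
text.
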
.. topic:: Issue: ``from __future__ import``

   This would break "``from __future__ import``" statements introduced
   in Python 2.1 for multiple module docstrings (main docstring plus
   additional docstring(s)). The Python Reference Manual specifies:

       A future statement must appear near the top of the module. The
       only lines that can appear before a future statement are:

       * the module docstring (if any),
       * comments,
       * blank lines, and
       * other future statements.

   Resolution?

   1. Should we search for docstrings after a ``__future__``
      statement? Very ugly.

   2. Redefine ``__future__`` statements to allow multiple preceding
      string literals?

   3. Or should we not even worry about this? There probably
      shouldn't be ``__future__`` statements in production code, after
      all. Will modules with ``__future__`` statements simply have to
      put up with the single-docstring limitation?


Choice of Docstring Format
--------------------------

Rather than force everyone to use a single docstring format, multiple
input formats are allowed by the processing system. A special
variable, ``__docformat__``, may appear at the top level of a module
before any function or class definitions. Over time or through
decree, a standard format or set of formats should emerge.

The ``__docformat__`` variable is a string containing the name of the
format being used, a case-insensitive string matching the input
parser's module or package name (i.e., the same name as required to
"import" the module or package), or a registered alias. If no
``__docformat__`` is specified, the default format is "plaintext" for
now; this may be changed to the standard format once determined.

The ``__docformat__`` string may contain an optional second field,
separated from the format name (first field) by a single space: a
case-insensitive language identifier as defined in RFC 1766. A
typical language identifier consists of a 2-letter language code from
`ISO 639`_ (3-letter codes used only if no 2-letter code exists; RFC
1766 is currently being revised to allow 3-letter codes). If no
language identifier is specified, the default is "en" for English.
The language identifier is passed to the parser and can be used for
language-dependent markup features.

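Splitting a ``__docformat__`` value into its two fields could be done
along these lines. The function ``parse_docformat`` is hypothetical;
lowercasing both fields is simply one way to honor the
case-insensitivity described above, and alias lookup is not modelled.

```python
def parse_docformat(value=None):
    """Split a __docformat__ string into (format, language).

    Both fields are case-insensitive; they are lowercased here for
    comparison purposes (an assumption of this sketch).
    """
    if not value:
        return ("plaintext", "en")          # defaults stated in the text
    fields = value.split(" ", 1)            # single-space separator
    name = fields[0].lower()
    language = fields[1].lower() if len(fields) > 1 else "en"
    return (name, language)


default = parse_docformat()
english = parse_docformat("reStructuredText")
british = parse_docformat("restructuredtext en-GB")
```

The first field would then select the parser module or alias, and the
second would be handed to that parser for language-dependent markup.
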

Identifier Cross-References
---------------------------

In Python docstrings, interpreted text is used to classify and mark up
program identifiers, such as the names of variables, functions,
classes, and modules. If the identifier alone is given, its role is
inferred implicitly according to the Python namespace lookup rules.
For functions and methods (even when dynamically assigned),
parentheses ('()') may be included::

    This function uses `another()` to do its work.

For class, instance and module attributes, dotted identifiers are used
when necessary. For example (using reStructuredText markup)::

    class Keeper(Storer):

        """
        Extend `Storer`. Class attribute `instances` keeps track
        of the number of `Keeper` objects instantiated.
        """

        instances = 0
        """How many `Keeper` objects are there?"""

        def __init__(self):
            """
            Extend `Storer.__init__()` to keep track of instances.

            Keep count in `Keeper.instances`, data in `self.data`.
            """
            Storer.__init__(self)
            Keeper.instances += 1

            self.data = []
            """Store data in a list, most recent last."""

        def storedata(self, data):
            """
            Extend `Storer.storedata()`; append new `data` to a
            list (in `self.data`).
            """
            self.data.append(data)

Each of the identifiers quoted with backquotes ("`") will become a
reference to the definition of that identifier.

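A much-simplified sketch of the resolution step follows. It finds
interpreted text with a regular expression and looks each identifier
up in a list of namespaces, innermost first. The names ``resolve`` and
``cross_references`` and the dictionary-based namespaces are
hypothetical, dotted identifiers are not handled, and the search order
is only an approximation of Python's actual lookup rules.

```python
import re

INTERPRETED = re.compile(r"`([^`]+)`")


def resolve(identifier, namespaces):
    """Look `identifier` up in `namespaces`, innermost first."""
    # Trailing "()" marks a callable and is not part of the name.
    name = identifier[:-2] if identifier.endswith("()") else identifier
    for namespace in namespaces:
        if name in namespace:
            return namespace[name]
    return None


def cross_references(docstring, namespaces):
    """Map each interpreted-text identifier to its resolved target."""
    return {match: resolve(match, namespaces)
            for match in INTERPRETED.findall(docstring)}


class_scope = {"instances": "Keeper.instances",
               "storedata": "Keeper.storedata"}
module_scope = {"Keeper": "module.Keeper", "Storer": "module.Storer"}
refs = cross_references(
    "Extend `Storer`. Uses `storedata()`; see also `missing`.",
    [class_scope, module_scope])
```

Identifiers that cannot be resolved map to ``None`` here; a real
reader would more likely emit a system message for them.
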

Stylist Transforms
------------------

Stylist transforms are specialized transforms specific to a Reader.
The PySource Reader doesn't have to make any decisions as to style; it
just produces a logically constructed document tree, parsed and
linked, including custom node types. Stylist transforms understand
the custom nodes created by the Reader and convert them into standard
Docutils nodes.

Multiple Stylist transforms may be implemented and one can be chosen
at runtime (through a "--style" or "--stylist" command-line option).
Each Stylist transform implements a different layout or style; thus
the name. They decouple the context-understanding part of the Reader
from the layout-generating part of processing, resulting in a more
flexible and robust system. This also serves to "separate style from
content", the SGML/XML ideal.

By keeping the piece of code that does the styling small and modular,
it becomes much easier for people to roll their own styles. The
"barrier to entry" is too high with existing tools; extracting the
stylist code will lower the barrier considerably.


==========================
 References and Footnotes
==========================

.. [#PEP-256] PEP 256, Docstring Processing System Framework, Goodger
   (http://www.python.org/peps/pep-0256.html)

.. [#PEP-224] PEP 224, Attribute Docstrings, Lemburg
   (http://www.python.org/peps/pep-0224.html)

.. [#PEP-216] PEP 216, Docstring Format, Zadka
   (http://www.python.org/peps/pep-0216.html)

.. _docutils.dtd: http://docutils.sourceforge.net/spec/docutils.dtd

.. _soextbl.dtd: http://docutils.sourceforge.net/spec/soextblx.dtd

.. _The Docutils Document Tree:
   http://docutils.sourceforge.net/spec/doctree.html

.. _VMS error condition severity levels:
   http://www.openvms.compaq.com:8000/73final/5841/841pro_027.html
   #error_cond_severity

.. _log4j project: http://jakarta.apache.org/log4j/

.. _Docutils Python Source DTD:
   http://docutils.sourceforge.net/spec/pysource.dtd

.. _ISO 639: http://lcweb.loc.gov/standards/iso639-2/englangn.html

.. _Python Doc-SIG: http://www.python.org/sigs/doc-sig/



==================
 Project Web Site
==================

A SourceForge project has been set up for this work at
http://docutils.sourceforge.net/.


===========
 Copyright
===========

This document has been placed in the public domain.


==================
 Acknowledgements
==================

This document borrows ideas from the archives of the `Python
Doc-SIG`_. Thanks to all members past & present.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: