Updated
This commit is contained in:
parent
4a04782272
commit
a80082ba80
|
@ -34,8 +34,8 @@ The concepts of a DPS framework are presented independently of
|
|||
implementation details.
|
||||
|
||||
|
||||
Roadmap to the Doctring PEPs
|
||||
============================
|
||||
Road Map to the Docstring PEPs
|
||||
==============================
|
||||
|
||||
There are many aspects to docstring processing. The "Docstring PEPs"
|
||||
have broken up the issues in order to deal with each of them in
|
||||
|
@ -184,7 +184,7 @@ The docstring processing system framework is broken up as follows:
|
|||
|
||||
- Docstring extraction rules.
|
||||
|
||||
- Readers, which encapsulate the input context .
|
||||
- Readers, which encapsulate the input context.
|
||||
|
||||
- Parsers.
|
||||
|
||||
|
|
476
pep-0258.txt
|
@ -20,7 +20,7 @@ This PEP documents design issues and implementation details for
|
|||
Docutils, a Python Docstring Processing System (DPS). The rationale
|
||||
and high-level concepts of a DPS are documented in PEP 256, "Docstring
|
||||
Processing System Framework" [#PEP-256]_. Also see PEP 256 for a
|
||||
"Roadmap to the Doctring PEPs".
|
||||
"Road Map to the Docstring PEPs".
|
||||
|
||||
Docutils is being designed modularly so that any of its components can
|
||||
be replaced easily. In addition, Docutils is not limited to the
|
||||
|
@ -39,63 +39,65 @@ documentation.
|
|||
Docutils Project Model
|
||||
======================
|
||||
|
||||
::
|
||||
Project components and data flow::
|
||||
|
||||
+--------------------------+
|
||||
| Docutils: |
|
||||
| docutils.core.Publisher, |
|
||||
| docutils.core.publish() |
|
||||
+--------------------------+
|
||||
/ \
|
||||
/ \
|
||||
1,3,5,7 / \ 8,10
|
||||
+--------+ +--------+
|
||||
| READER | =========================> | WRITER |
|
||||
+--------+ +--------+
|
||||
/ || \ / \
|
||||
/ || \ / \
|
||||
2 / 4 || \ 6 9 / \ 11
|
||||
+-----+ +--------+ +-------------+ +------------+ +-----+
|
||||
| I/O | | PARSER |...| reader | | writer | | I/O |
|
||||
+-----+ +--------+ | transforms | | transforms | +-----+
|
||||
| | | |
|
||||
| - docinfo | | - system |
|
||||
| - titles | | messages |
|
||||
| - linking | | - final |
|
||||
| - lookups | | checks |
|
||||
| - reader- | | - writer- |
|
||||
| specific | | specific |
|
||||
| - parser- | | - etc. |
|
||||
| specific | +------------+
|
||||
| - layout |
|
||||
| (stylist) |
|
||||
| - etc. |
|
||||
+-------------+
|
||||
+---------------------------+
|
||||
| Docutils: |
|
||||
| docutils.core.Publisher, |
|
||||
| docutils.core.publish_*() |
|
||||
+---------------------------+
|
||||
/ | \
|
||||
/ | \
|
||||
1,3,5 / 6 | \ 7
|
||||
+--------+ +-------------+ +--------+
|
||||
| READER | ----> | TRANSFORMER | ====> | WRITER |
|
||||
+--------+ +-------------+ +--------+
|
||||
/ \\ |
|
||||
/ \\ |
|
||||
2 / 4 \\ 8 |
|
||||
+-------+ +--------+ +--------+
|
||||
| INPUT | | PARSER | | OUTPUT |
|
||||
+-------+ +--------+ +--------+
|
||||
|
||||
The numbers indicate the path a document's data takes through the
|
||||
code. Double-width lines between reader & parser and between reader &
|
||||
writer indicate that data sent along these paths should be standard
|
||||
(pure & unextended) Docutils doc trees. Single-width lines signify
|
||||
that internal tree extensions or completely unrelated representations
|
||||
are possible, but they must be supported at both ends.
|
||||
The numbers above each component indicate the path a document's data
|
||||
takes. Double-width lines between Reader & Parser and between
|
||||
Transformer & Writer indicate that data sent along these paths should
|
||||
be standard (pure & unextended) Docutils doc trees. Single-width
|
||||
lines signify that internal tree extensions or completely unrelated
|
||||
representations are possible, but they must be supported at both ends.
|
||||
|
||||
|
||||
Publisher
|
||||
---------
|
||||
|
||||
The ``docutils.core`` module contains a "Publisher" facade class and
|
||||
"publish" convenience function. Publisher encapsulates the high-level
|
||||
logic of a Docutils system. The ``Publisher.publish()`` method first
|
||||
calls its Reader, which reads data from its source I/O, parses and
|
||||
transforms the data, and returns it. ``Publisher.publish()`` then
|
||||
passes the resulting document tree to its Writer, which further
|
||||
transforms the document before translating it to the final output
|
||||
format and writing the formatted data to its destination I/O.
|
||||
several convenience functions: "publish_cmdline()" (for command-line
|
||||
front ends), "publish_file()" (for programmatic use with file-like
|
||||
I/O), and "publish_string()" (for programmatic use with string I/O).
|
||||
The Publisher class encapsulates the high-level logic of a Docutils
|
||||
system. The Publisher class has overall responsibility for
|
||||
processing, controlled by the ``Publisher.publish()`` method:
|
||||
|
||||
1. Set up internal settings (may include config files & command-line
|
||||
options) and I/O objects.
|
||||
|
||||
2. Call the Reader object to read data from the source Input object
|
||||
and parse the data with the Parser object. A document object is
|
||||
returned.
|
||||
|
||||
3. Set up and apply transforms via the Transformer object attached to
|
||||
the document.
|
||||
|
||||
4. Call the Writer object which translates the document to the final
|
||||
output format and writes the formatted data to the destination
|
||||
Output object. Depending on the Output object, the output may be
|
||||
returned from the Writer, and then from the ``publish()`` method.
|
||||
|
||||
Calling the "publish" function (or instantiating a "Publisher" object)
|
||||
with component names will result in default behavior. For custom
|
||||
behavior (setting component options), create custom component objects
|
||||
first, and pass *them* to publish/Publisher.
|
||||
behavior (customizing component settings), create custom component
|
||||
objects first, and pass *them* to the Publisher or ``publish_*``
|
||||
convenience functions.
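
The control flow described in steps 1 to 4 above can be sketched as a toy pipeline. This is an illustration only, not the Docutils implementation: ``ToyReader``, ``ToyTransformer``, ``ToyWriter``, ``ToyPublisher`` and their methods are hypothetical stand-ins, and a "document" is modelled as a plain list of strings rather than a document tree.

```python
"""Toy model of the Publisher control flow (all names are hypothetical)."""


class ToyReader:
    """Reads source text and parses it into a 'document' (a list of lines)."""

    def read(self, source_text):
        # Stand-in for Input + Parser: one "node" per non-empty line.
        return [line.strip() for line in source_text.splitlines() if line.strip()]


class ToyTransformer:
    """Stores transforms (plain callables here) and applies them in order."""

    def __init__(self):
        self.transforms = []

    def add_transform(self, func):
        self.transforms.append(func)

    def apply_transforms(self, document):
        for func in self.transforms:
            document = func(document)
        return document


class ToyWriter:
    """Translates the document to an output format (numbered plain text)."""

    def write(self, document):
        return "\n".join(f"{i}. {node}" for i, node in enumerate(document, 1))


class ToyPublisher:
    """Facade that owns the components and drives them in sequence."""

    def __init__(self, reader, transformer, writer):
        # Step 1: set up components (settings handling is omitted here).
        self.reader = reader
        self.transformer = transformer
        self.writer = writer

    def publish(self, source_text):
        document = self.reader.read(source_text)                # step 2
        document = self.transformer.apply_transforms(document)  # step 3
        return self.writer.write(document)                      # step 4


transformer = ToyTransformer()
transformer.add_transform(lambda doc: [node.upper() for node in doc])
publisher = ToyPublisher(ToyReader(), transformer, ToyWriter())
output = publisher.publish("first\n\nsecond\n")
```

Passing ready-made component objects to the facade mirrors the "create custom component objects first" advice: the caller configures each component, and the facade only sequences them.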
|
||||
|
||||
|
||||
Readers
|
||||
|
@ -104,9 +106,6 @@ Readers
|
|||
Readers understand the input context (where the data is coming from),
|
||||
send the whole input or discrete "chunks" to the parser, and provide
|
||||
the context to bind the chunks together back into a cohesive whole.
|
||||
Using transforms_, Readers also resolve references, footnote numbers,
|
||||
interpreted text processing, and anything else that requires
|
||||
context-sensitive computation.
|
||||
|
||||
Each reader is a module or package exporting a "Reader" class with a
|
||||
"read" method. The base "Reader" class can be found in the
|
||||
|
@ -118,43 +117,40 @@ still incomplete) will be able to determine the parser on its own.
|
|||
|
||||
Responsibilities:
|
||||
|
||||
- Get input text from the source I/O.
|
||||
* Get input text from the source I/O.
|
||||
|
||||
- Pass the input text to the parser, along with a fresh doctree root.
|
||||
|
||||
- Run transforms over the doctree(s).
|
||||
* Pass the input text to the parser, along with a fresh `document
|
||||
tree`_ root.
|
||||
|
||||
Examples:
|
||||
|
||||
- Standalone (Raw/Plain): Just read a text file and process it.
|
||||
* Standalone (Raw/Plain): Just read a text file and process it.
|
||||
The reader needs to be told which parser to use.
|
||||
|
||||
The "Standalone Reader" has been implemented in module
|
||||
``docutils.readers.standalone``.
|
||||
|
||||
- Python Source: See `Python Source Reader`_ below. This Reader is
|
||||
* Python Source: See `Python Source Reader`_ below. This Reader is
|
||||
currently in development in the Docutils sandbox.
|
||||
|
||||
- Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.
|
||||
* Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.
|
||||
|
||||
- PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to URIs.
|
||||
Either interpret PEPs' indented sections or convert existing PEPs to
|
||||
reStructuredText (or both?).
|
||||
* PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to URIs.
|
||||
The "PEP Reader" has been implemented in module
|
||||
``docutils.readers.pep``; see PEP 287 and PEP 12.
|
||||
|
||||
The "PEP Reader" is being implemented in module
|
||||
``docutils.readers.pep``.
|
||||
|
||||
- Wiki: Global reference lookups of "wiki links" incorporated into
|
||||
* Wiki: Global reference lookups of "wiki links" incorporated into
|
||||
transforms. (CamelCase only or unrestricted?) Lazy
|
||||
indentation?
|
||||
|
||||
- Web Page: As standalone, but recognize meta fields as meta tags.
|
||||
* Web Page: As standalone, but recognize meta fields as meta tags.
|
||||
Support for templates of some sort? (After ``<body>``, before
|
||||
``</body>``?)
|
||||
|
||||
- FAQ: Structured "question & answer(s)" constructs.
|
||||
* FAQ: Structured "question & answer(s)" constructs.
|
||||
|
||||
- Compound document: Merge chapters into a book. Master TOC file?
|
||||
* Compound document: Merge chapters into a book. Master manifest
|
||||
file?
|
||||
|
||||
|
||||
Parsers
|
||||
|
@ -175,93 +171,121 @@ Example: The only parser implemented so far is for the
|
|||
reStructuredText markup. It is implemented in the
|
||||
``docutils/parsers/rst/`` package.
|
||||
|
||||
The development and integration of other parsers is possible and
|
||||
encouraged.
|
||||
|
||||
Transforms
|
||||
----------
|
||||
|
||||
Transforms change the document tree from one form to another, add to
|
||||
the tree, or prune it. Transforms are run by Reader and Writer
|
||||
objects. Some transforms are Reader-specific, some are
|
||||
Parser-specific, and others are Writer-specific. The choice and order
|
||||
of transforms is specified in the Reader and Writer objects.
|
||||
.. _transforms:
|
||||
|
||||
Transformer
|
||||
-----------
|
||||
|
||||
The Transformer class, in ``docutils/transforms/__init__.py``, stores
|
||||
transforms and applies them to documents. A transformer object is
|
||||
attached to every new document tree. The Publisher_ calls
|
||||
``Transformer.apply_transforms()`` to apply all stored transforms to
|
||||
the document tree. Transforms change the document tree from one form
|
||||
to another, add to the tree, or prune it. Transforms resolve
|
||||
references and footnote numbers, process interpreted text, and do
|
||||
other context-sensitive processing.
|
||||
|
||||
Some transforms are specific to components (Readers, Parsers, Writers,
|
||||
Input, Output). Standard component-specific transforms are specified
|
||||
in the ``default_transforms`` attribute of component classes. After
|
||||
the Reader has finished processing, the Publisher_ calls
|
||||
``Transformer.populate_from_components()`` with a list of components
|
||||
and all default transforms are stored.
|
||||
|
||||
Each transform is a class in a module in the ``docutils/transforms/``
|
||||
package, a subclass of docutils.tranforms.Transform.
|
||||
package, a subclass of ``docutils.transforms.Transform``. Transform
|
||||
classes each have a ``default_priority`` attribute which is used by
|
||||
the Transformer to apply transforms in order (low to high). The
|
||||
default priority can be overridden when adding transforms to the
|
||||
Transformer object.
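
The priority ordering described above can be sketched as follows. This is a minimal model under stated assumptions, not the code in ``docutils/transforms/__init__.py``: the class ``ToyTransformer``, the two example transforms, the priority values 100 and 700, and the tie-breaking by insertion order are all invented for illustration. Only the idea of a ``default_priority`` attribute that can be overridden when a transform is added comes from the text.

```python
class Transform:
    """Base class; subclasses set default_priority and implement apply()."""

    default_priority = None

    def apply(self, document):
        raise NotImplementedError


class NumberSections(Transform):
    default_priority = 700  # hypothetical value

    def apply(self, document):
        document[:] = [f"{i}. {title}" for i, title in enumerate(document, 1)]


class StripBlank(Transform):
    default_priority = 100  # hypothetical value

    def apply(self, document):
        document[:] = [title for title in document if title.strip()]


class ToyTransformer:
    """Stores transform classes and applies them from low to high priority."""

    def __init__(self, document):
        self.document = document
        self.transforms = []  # (priority, insertion order, transform class)

    def add_transform(self, transform_class, priority=None):
        # The default priority can be overridden per call.
        if priority is None:
            priority = transform_class.default_priority
        self.transforms.append((priority, len(self.transforms), transform_class))

    def apply_transforms(self):
        # Sort low to high; insertion order breaks ties (an assumption).
        for _, _, transform_class in sorted(self.transforms, key=lambda t: t[:2]):
            transform_class().apply(self.document)


document = ["Intro", "", "Usage"]
transformer = ToyTransformer(document)
transformer.add_transform(NumberSections)  # priority 700, added first
transformer.add_transform(StripBlank)      # priority 100, still runs first
transformer.apply_transforms()
```

Because ``StripBlank`` has the lower priority it runs before ``NumberSections`` even though it was added later, so the blank entry is removed before numbering.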
|
||||
|
||||
Responsibilities:
|
||||
Transformer responsibilities:
|
||||
|
||||
- Modify a doctree in-place, either purely transforming one structure
|
||||
* Apply transforms to the document tree, in priority order.
|
||||
|
||||
* Store a mapping of component type name ('reader', 'writer', etc.) to
|
||||
component objects. These are used by certain transforms (such as
|
||||
"components.Filter") to determine suitability.
|
||||
|
||||
Transform responsibilities:
|
||||
|
||||
* Modify a doctree in-place, either purely transforming one structure
|
||||
into another, or adding new structures based on the doctree and/or
|
||||
external data.
|
||||
|
||||
Examples (in the ``docutils/transforms/`` package):
|
||||
Examples of transforms (in the ``docutils/transforms/`` package):
|
||||
|
||||
- frontmatter.DocInfo: Conversion of document metadata (bibliographic
|
||||
* frontmatter.DocInfo: Conversion of document metadata (bibliographic
|
||||
information).
|
||||
|
||||
- references.Hyperlinks: Resolution of hyperlinks.
|
||||
* references.AnonymousHyperlinks: Resolution of anonymous references
|
||||
to corresponding targets.
|
||||
|
||||
- parts.Contents: Generates a table of contents for a document.
|
||||
* parts.Contents: Generates a table of contents for a document.
|
||||
|
||||
- document.Merger: Combining multiple populated doctrees into one (not
|
||||
yet implemented or fully understood).
|
||||
* document.Merger: Combining multiple populated doctrees into one.
|
||||
(Not yet implemented or fully understood.)
|
||||
|
||||
- document.Splitter: Splits a document into a tree-structure of
|
||||
* document.Splitter: Splits a document into a tree-structure of
|
||||
subdocuments, perhaps by section. It will have to transform
|
||||
references appropriately. (Neither implemented nor remotely
|
||||
understood.)
|
||||
|
||||
- universal.Pending: Handles transforms that must be executed at
|
||||
specific stages of processing.
|
||||
|
||||
- components.Filter: Includes or excludes elements which depend on a
|
||||
specific Docutils component (triggered by the universal.Pending
|
||||
transform).
|
||||
* components.Filter: Includes or excludes elements which depend on a
|
||||
specific Docutils component.
|
||||
|
||||
|
||||
Writers
|
||||
-------
|
||||
|
||||
Writers produce the final output (HTML, XML, TeX, etc.). Writers
|
||||
translate the internal document tree structure into the final data
|
||||
translate the internal `document tree`_ structure into the final data
|
||||
format, possibly running Writer-specific transforms_ first.
|
||||
|
||||
By the time the document gets to the Writer, it should be in final
|
||||
form. The Writer's job is simply (and only) to translate from the
|
||||
Docutils doctree structure to the target format. Some small
|
||||
transforms may be required, but they should be local and
|
||||
format-specific.
|
||||
|
||||
Each writer is a module or package exporting a "Writer" class with a
|
||||
"write" method. The base "Writer" class can be found in the
|
||||
``docutils/writers/__init__.py`` module.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- Run transforms over the doctree(s).
|
||||
|
||||
- Translate doctree(s) into specific output formats.
|
||||
* Translate doctree(s) into specific output formats.
|
||||
|
||||
- Transform references into format-native forms.
|
||||
|
||||
- Write the translated output to the destination I/O.
|
||||
* Write the translated output to the destination I/O.
|
||||
|
||||
Examples:
|
||||
|
||||
- XML: Various forms, such as:
|
||||
* XML: Various forms, such as:
|
||||
|
||||
- Docutils XML (an expression of the internal document tree,
|
||||
implemented as ``docutils.writers.docutils_xml``).
|
||||
|
||||
- DocBook (being implemented in the Docutils sandbox).
|
||||
|
||||
- Raw doctree XML (accessible via "``doctree.asdom().toxml()``"; no
|
||||
Writer component implemented yet).
|
||||
* HTML (XHTML implemented as ``docutils.writers.html4css1``).
|
||||
|
||||
- HTML (XHTML implemented as ``docutils.writers.html4css1``).
|
||||
|
||||
- PDF (a ReportLabs interface is being developed in the Docutils
|
||||
* PDF (a ReportLab interface is being developed in the Docutils
|
||||
sandbox).
|
||||
|
||||
- TeX
|
||||
* TeX (a LaTeX Writer is being implemented in the sandbox).
|
||||
|
||||
- Docutils-native pseudo-XML (implemented as
|
||||
* Docutils-native pseudo-XML (implemented as
|
||||
``docutils.writers.pseudoxml``, used for testing).
|
||||
|
||||
- Plain text
|
||||
* Plain text
|
||||
|
||||
- reStructuredText?
|
||||
* reStructuredText?
|
||||
|
||||
|
||||
Input/Output
|
||||
|
@ -269,68 +293,78 @@ Input/Output
|
|||
|
||||
I/O classes provide a uniform API for low-level input and output.
|
||||
Subclasses will exist for a variety of input/output mechanisms.
|
||||
However, they can be considered an implementation detail. Most
|
||||
applications should be satisfied using one of the convenience
|
||||
functions associated with the Publisher_.
|
||||
|
||||
I/O classes are currently in the preliminary stages; there's a lot of
|
||||
work yet to be done. Issues:
|
||||
|
||||
- Looking at the list of writers, it seems that only HTML would
|
||||
require anything other than monolithic output. Perhaps "Writer"
|
||||
variants, one for each output distribution type?
|
||||
* How to represent multi-file input (files & directories) in the API?
|
||||
|
||||
- How to represent a multi-file document (files & directories) in the
|
||||
API?
|
||||
* How to represent multi-file output? Perhaps "Writer" variants, one
|
||||
for each output distribution type? Or Output objects with
|
||||
associated transforms?
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- Read data from the input source and/or write data to the output
|
||||
destination.
|
||||
* Read data from the input source (Input objects) or write data to the
|
||||
output destination (Output objects).
|
||||
|
||||
Examples of input sources:
|
||||
|
||||
- A single file on disk or a stream (implemented as
|
||||
* A single file on disk or a stream (implemented as
|
||||
``docutils.io.FileInput``).
|
||||
|
||||
- Multiple files on disk (``MultiFileInput``?).
|
||||
* Multiple files on disk (``MultiFileInput``?).
|
||||
|
||||
- Python source files: modules and packages.
|
||||
* Python source files: modules and packages.
|
||||
|
||||
- Python strings, as received from a client application
|
||||
* Python strings, as received from a client application
|
||||
(implemented as ``docutils.io.StringInput``).
|
||||
|
||||
Examples of output destinations:
|
||||
|
||||
- A single file on disk or a stream (implemented as
|
||||
* A single file on disk or a stream (implemented as
|
||||
``docutils.io.FileOutput``).
|
||||
|
||||
- A tree of directories and files on disk.
|
||||
* A tree of directories and files on disk.
|
||||
|
||||
- A Python string, returned to a client application (implemented as
|
||||
* A Python string, returned to a client application (implemented as
|
||||
``docutils.io.StringOutput``).
|
||||
|
||||
- A single tree-shaped data structure in memory.
|
||||
* No output; useful for programmatic applications where only a portion
|
||||
of the normal output is to be used (implemented as
|
||||
``docutils.io.NullOutput``).
|
||||
|
||||
- Some other set of data structures in memory.
|
||||
* A single tree-shaped data structure in memory.
|
||||
|
||||
* Some other set of data structures in memory.
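
The idea of a uniform low-level I/O API can be illustrated with a small sketch. The classes below are toy stand-ins written for this example; they are not the ``docutils.io`` classes, and their constructor and method signatures are assumptions. The point is only that every Input object offers the same ``read()`` call and every Output object the same ``write()`` call, so the rest of the system does not care where data comes from or goes to.

```python
class ToyStringInput:
    """Input object wrapping a Python string supplied by the client."""

    def __init__(self, source):
        self.source = source

    def read(self):
        return self.source


class ToyStringOutput:
    """Output object that keeps the data and returns it to the client."""

    def __init__(self):
        self.destination = None

    def write(self, data):
        self.destination = data
        return data


class ToyNullOutput:
    """Output object that discards the data (no output at all)."""

    def write(self, data):
        return None


def copy_through(input_obj, output_obj):
    """Move data from any Input object to any Output object."""
    return output_obj.write(input_obj.read())


kept = copy_through(ToyStringInput("hello"), ToyStringOutput())
dropped = copy_through(ToyStringInput("hello"), ToyNullOutput())
```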
|
||||
|
||||
|
||||
Docutils Package Structure
|
||||
==========================
|
||||
|
||||
- Package "docutils".
|
||||
* Package "docutils".
|
||||
|
||||
- Class "Component" is a base class for Docutils components.
|
||||
- Module "__init__.py" contains: class "Component", a base class for
|
||||
Docutils components; class "SettingsSpec", a base class for
|
||||
specifying runtime settings (used by docutils.frontend); and class
|
||||
"TransformSpec", a base class for specifying transforms.
|
||||
|
||||
- Module "docutils.core" contains facade class "Publisher" and
|
||||
convenience function "publish()". See `Publisher`_ above.
|
||||
convenience functions. See `Publisher`_ above.
|
||||
|
||||
- Module "docutils.frontend" provides command-line and option
|
||||
processing for Docutils front-end tools.
|
||||
- Module "docutils.frontend" provides runtime settings support, for
|
||||
programmatic use and front-end tools (including configuration file
|
||||
support, and command-line argument and option processing).
|
||||
|
||||
- Module "docutils.io" provides a uniform API for low-level input
|
||||
and output. See `Input/Output`_ above.
|
||||
|
||||
- Module "docutils.nodes" contains the Docutils document tree
|
||||
element class library plus Visitor pattern base classes. See
|
||||
`Document Tree`_ below.
|
||||
element class library plus tree-traversal Visitor pattern base
|
||||
classes. See `Document Tree`_ below.
|
||||
|
||||
- Module "docutils.optik" provides option parsing and command-line
|
||||
help; from Greg Ward's http://optik.sf.net/ project, included for
|
||||
|
@ -340,8 +374,9 @@ Docutils Package Structure
|
|||
routines.
|
||||
|
||||
- Module "docutils.statemachine" contains a finite state machine
|
||||
specialized for regular-expression-based text filters. The
|
||||
reStructuredText parser implementation is based on this module.
|
||||
specialized for regular-expression-based text filters and parsers.
|
||||
The reStructuredText parser implementation is based on this
|
||||
module.
|
||||
|
||||
- Module "docutils.urischemes" contains a mapping of known URI
|
||||
schemes ("http", "ftp", "mail", etc.).
|
||||
|
@ -375,7 +410,7 @@ Docutils Package Structure
|
|||
Proposals).
|
||||
|
||||
- Readers to be added for: Python source code (structure &
|
||||
docstrings), PEPs, email, FAQ, and perhaps Wiki and others.
|
||||
docstrings), email, FAQ, and perhaps Wiki and others.
|
||||
|
||||
See `Readers`_ above.
|
||||
|
||||
|
@ -385,20 +420,26 @@ Docutils Package Structure
|
|||
by name. Class "Writer" is the base class of specific writers.
|
||||
(``docutils/writers/__init__.py``)
|
||||
|
||||
- Module "docutils.writers.pseudoxml" is a simple internal
|
||||
document tree writer; it writes indented pseudo-XML.
|
||||
|
||||
- Module "docutils.writers.html4css1" is a simple HyperText Markup
|
||||
Language document tree writer for HTML 4.01 and CSS1.
|
||||
|
||||
- Module "docutils.writers.docutils_xml" writes the internal
|
||||
document tree in XML form.
|
||||
|
||||
- Module "docutils.writers.pseudoxml" is a simple internal
|
||||
document tree writer; it writes indented pseudo-XML.
|
||||
|
||||
- Writers to be added: HTML 3.2 or 4.01-loose, XML (various forms,
|
||||
such as DocBook and the raw internal doctree), PDF, TeX,
|
||||
plaintext, reStructuredText, and perhaps others.
|
||||
such as DocBook), PDF, TeX, plaintext, reStructuredText, and
|
||||
perhaps others.
|
||||
|
||||
See `Writers`_ above.
|
||||
|
||||
- Package "docutils.transforms": tree transform classes.
|
||||
|
||||
- Class "Transformer" stores transforms and applies them to
|
||||
document trees. (``docutils/transforms/__init__.py``)
|
||||
|
||||
- Class "Transform" is the base class of specific transforms.
|
||||
(``docutils/transforms/__init__.py``)
|
||||
|
||||
|
@ -414,7 +455,8 @@ Docutils Package Structure
|
|||
- Function "get_language(language_code)", returns matching
|
||||
language module. (``docutils/languages/__init__.py``)
|
||||
|
||||
- Module "docutils.languages.en" (English).
|
||||
- Modules: en.py (English), de.py (German), fr.py (French), sk.py
|
||||
(Slovak), sv.py (Swedish).
|
||||
|
||||
- Other languages to be added.
|
||||
|
||||
|
@ -422,7 +464,8 @@ Docutils Package Structure
|
|||
Front-End Tools
|
||||
===============
|
||||
|
||||
See `Docutils Front-End Tools`_.
|
||||
The ``tools/`` directory contains several front ends for common
|
||||
Docutils processing. See `Docutils Front-End Tools`_ for details.
|
||||
|
||||
.. _Docutils Front-End Tools: http://docutils.sf.net/docs/tools.html
|
||||
|
||||
|
@ -432,31 +475,34 @@ Document Tree
|
|||
|
||||
A single intermediate data structure is used internally by Docutils,
|
||||
in the interfaces between components; it is defined in the
|
||||
docutils.nodes module. It is not required that this data structure be
|
||||
used *internally* by any of the components, just *between* components.
|
||||
``docutils.nodes`` module. It is not required that this data
|
||||
structure be used *internally* by any of the components, just
|
||||
*between* components as outlined in the diagram in the `Docutils
|
||||
Project Model`_ above.
|
||||
|
||||
Custom node types are allowed, provided that either (a) a transform
|
||||
converts them to standard Docutils nodes before they reach the Writer
|
||||
proper, or (b) the custom node is explicitly supported by certain
|
||||
Writers, and is wrapped in a filtered "pending" node. An example of
|
||||
condition A is the `Python Source Reader`_ (see below), where a
|
||||
condition (a) is the `Python Source Reader`_ (see below), where a
|
||||
"stylist" transform converts custom nodes. The HTML ``<meta>`` tag is
|
||||
an example of condition B; it is supported by the HTML Writer but not
|
||||
by others. The reStructuredText "meta" directive creates a "pending"
|
||||
node, which contains knowledge that the embedded "meta" node can only
|
||||
be handled by HTML-compatible writers. The "pending" node is resolved
|
||||
by the "transforms.components.Filter" transform, which checks that the
|
||||
calling writer supports HTML; if it doesn't, the "meta" node is
|
||||
removed from the document.
|
||||
an example of condition (b); it is supported by the HTML Writer but
|
||||
not by others. The reStructuredText "meta" directive creates a
|
||||
"pending" node, which contains knowledge that the embedded "meta" node
|
||||
can only be handled by HTML-compatible writers. The "pending" node is
|
||||
resolved by the ``docutils.transforms.components.Filter`` transform,
|
||||
which checks that the calling writer supports HTML; if it doesn't, the
|
||||
"pending" node (and enclosed "meta" node) is removed from the
|
||||
document.
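
The filtering of "pending" nodes can be sketched as below. This is a simplified model, not the actual ``Filter`` transform: nodes are represented as plain dictionaries, and the function name ``filter_pending`` and the keys ``type``, ``format`` and ``wrapped`` are hypothetical.

```python
def filter_pending(document, writer_formats):
    """Resolve 'pending' nodes: keep the wrapped node only if supported."""
    resolved = []
    for node in document:
        if node["type"] != "pending":
            resolved.append(node)
        elif node["format"] in writer_formats:
            resolved.append(node["wrapped"])  # unwrap the custom node
        # Otherwise the pending node and its wrapped node are both dropped.
    return resolved


document = [
    {"type": "paragraph", "text": "Hello"},
    {
        "type": "pending",
        "format": "html",
        "wrapped": {"type": "meta", "name": "keywords", "content": "docutils"},
    },
]

html_tree = filter_pending(document, {"html"})    # HTML-compatible writer
latex_tree = filter_pending(document, {"latex"})  # writer without HTML support
```

With an HTML-compatible writer the "meta" node survives; with any other writer the "pending" node and the "meta" node it encloses disappear from the tree.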
|
||||
|
||||
The document tree data structure is similar to a DOM tree, but with
|
||||
specific node names (classes) instead of DOM's generic nodes. The
|
||||
schema is documented in an XML DTD (eXtensible Markup Language
|
||||
Document Type Definition), which comes in two parts:
|
||||
|
||||
- the Docutils Generic DTD, docutils.dtd_, and
|
||||
* the Docutils Generic DTD, docutils.dtd_, and
|
||||
|
||||
- the OASIS Exchange Table Model, soextbl.dtd_.
|
||||
* the OASIS Exchange Table Model, soextbl.dtd_.
|
||||
|
||||
The DTD defines a rich set of elements, suitable for many input and
|
||||
output formats. The DTD retains all information necessary to
|
||||
|
@ -473,23 +519,23 @@ When the parser encounters an error in markup, it inserts a system
|
|||
message (DTD element "system_message"). There are five levels of
|
||||
system messages:
|
||||
|
||||
- Level-0, "DEBUG": an internal reporting issue. There is no effect
|
||||
* Level-0, "DEBUG": an internal reporting issue. There is no effect
|
||||
on the processing. Level-0 system messages are handled separately
|
||||
from the others.
|
||||
|
||||
- Level-1, "INFO": a minor issue that can be ignored. There is little
|
||||
* Level-1, "INFO": a minor issue that can be ignored. There is little
|
||||
or no effect on the processing. Typically level-1 system messages
|
||||
are not reported.
|
||||
|
||||
- Level-2, "WARNING": an issue that should be addressed. If ignored,
|
||||
* Level-2, "WARNING": an issue that should be addressed. If ignored,
|
||||
there may be minor problems with the output. Typically level-2
|
||||
system messages are reported but do not halt processing.
|
||||
|
||||
- Level-3, "ERROR": a major issue that should be addressed. If
|
||||
* Level-3, "ERROR": a major issue that should be addressed. If
|
||||
ignored, the output will contain unpredictable errors. Typically
|
||||
level-3 system messages are reported but do not halt processing.
|
||||
|
||||
- Level-4, "SEVERE": a critical error that must be addressed.
|
||||
* Level-4, "SEVERE": a critical error that must be addressed.
|
||||
Typically level-4 system messages are turned into exceptions which
|
||||
halt processing. If ignored, the output will contain severe errors.
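
The typical handling of the five levels can be sketched with two thresholds. This is a toy model: the class ``ToyReporter``, the exception ``SystemMessageError``, and the parameter names and defaults (report from level 2, halt at level 4) are assumptions chosen to match the "typically" statements above. Level-0 messages are simply left unreported here, which glosses over their separate handling.

```python
LEVELS = {0: "DEBUG", 1: "INFO", 2: "WARNING", 3: "ERROR", 4: "SEVERE"}


class SystemMessageError(Exception):
    """Raised when a message reaches the halt threshold."""


class ToyReporter:
    def __init__(self, report_level=2, halt_level=4):
        self.report_level = report_level
        self.halt_level = halt_level
        self.reported = []

    def system_message(self, level, text):
        label = LEVELS[level]
        if level >= self.halt_level:
            # Severe problems become exceptions, which halt processing.
            raise SystemMessageError(f"{label}/{level}: {text}")
        if level >= self.report_level:
            self.reported.append(f"{label}/{level}: {text}")
        # Messages below report_level are silently ignored.


reporter = ToyReporter()
reporter.system_message(1, "minor issue")
reporter.system_message(2, "issue that should be addressed")
reporter.system_message(3, "major issue")
try:
    reporter.system_message(4, "critical error")
    halted = False
except SystemMessageError:
    halted = True
```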
|
||||
|
||||
|
@ -517,11 +563,11 @@ Processing Model
|
|||
This model will evolve over time, incorporating experience and
|
||||
discoveries.
|
||||
|
||||
1. The PySource Reader uses an I/O class to read in some Python
|
||||
packages and modules, into a tree of strings.
|
||||
1. The PySource Reader uses an Input class to read in Python packages
|
||||
and modules, into a tree of strings.
|
||||
|
||||
2. The Python modules are parsed, converting the tree of strings into
|
||||
a tree of abstract syntax trees.
|
||||
a tree of abstract syntax trees with docstring nodes.
|
||||
|
||||
3. The abstract syntax trees are converted into an internal
|
||||
representation of the packages/modules. Docstrings are extracted,
|
||||
|
@ -532,7 +578,7 @@ discoveries.
|
|||
Docutils doctrees.
|
||||
|
||||
5. PySource assembles all the individual docstrings' doctrees into a
|
||||
Python-specific custom Docutils tree parallelling the
|
||||
Python-specific custom Docutils tree paralleling the
|
||||
package/module/class structure; this is a custom Reader-specific
|
||||
internal representation (see the `Docutils Python Source DTD`_).
|
||||
Namespaces must be merged: Python identifiers, hyperlink targets.
|
||||
|
@ -541,37 +587,38 @@ discoveries.
|
|||
identifiers are resolved according to the Python namespace lookup
|
||||
rules. See `Identifier Cross-References`_ below.
|
||||
|
||||
7. A "Stylist" transform is applied to the custom doctree, custom
|
||||
nodes are rendered using standard nodes as primitives, and a
|
||||
standard document tree is emitted. See `Stylist Transforms`_
|
||||
below.
|
||||
7. A "Stylist" transform is applied to the custom doctree (by the
|
||||
Transformer_), custom nodes are rendered using standard nodes as
|
||||
primitives, and a standard document tree is emitted. See `Stylist
|
||||
Transforms`_ below.
|
||||
|
||||
8. Other transforms are applied to the standard doctree.
|
||||
8. Other transforms are applied to the standard doctree by the
|
||||
Transformer_.
|
||||
|
||||
9. The standard doctree is sent to a Writer, which translates the
|
||||
document into a concrete format (HTML, PDF, etc.).
|
||||
|
||||
10. The Writer uses an I/O class to write the resulting data to its
|
||||
10. The Writer uses an Output class to write the resulting data to its
|
||||
destination (disk file, directories and files, etc.).
|
||||
|
||||
|
||||
AST Mining
|
||||
----------
|
||||
|
||||
Abstract Syntax Tree mining code will be written that scans a parsed
|
||||
Python module, and returns an ordered tree containing the names,
|
||||
docstrings (including attribute and additional docstrings; see below),
|
||||
and additional info (in parentheses below) of all of the following
|
||||
objects:
|
||||
Abstract Syntax Tree mining code will be written (or adapted) that
|
||||
scans a parsed Python module, and returns an ordered tree containing
|
||||
the names, docstrings (including attribute and additional docstrings;
|
||||
see below), and additional info (in parentheses below) of all of the
|
||||
following objects:
|
||||
|
||||
- packages
|
||||
- modules
|
||||
- module attributes (+ initial values)
|
||||
- classes (+ inheritance)
|
||||
- class attributes (+ initial values)
|
||||
- instance attributes (+ initial values)
|
||||
- methods (+ parameters & defaults)
|
||||
- functions (+ parameters & defaults)
|
||||
* packages
|
||||
* modules
|
||||
* module attributes (+ initial values)
|
||||
* classes (+ inheritance)
|
||||
* class attributes (+ initial values)
|
||||
* instance attributes (+ initial values)
|
||||
* methods (+ parameters & defaults)
|
||||
* functions (+ parameters & defaults)
|
||||
|
||||
(Extract comments too? For example, comments at the start of a module
|
||||
would be a good place for bibliographic field lists.)
|
||||
|
@ -579,7 +626,7 @@ would be a good place for bibliographic field lists.)
|
|||
In order to evaluate interpreted text cross-references, namespaces for
|
||||
each of the above will also be required.
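
A rough sketch of such mining, using the standard-library ``ast`` module of current Python (3.9 or later, for ``ast.unparse``), might look like the following. It covers only classes, methods and functions; the tuple layout and the helper names ``mine`` and ``describe_args`` are hypothetical, and attributes, packages and namespaces are left out.

```python
import ast

SOURCE = '''
"""Module docstring."""

class Base:
    pass

class Widget(Base):
    """A widget."""

    def resize(self, width, height=10):
        """Resize the widget."""

def helper(x, y=2):
    """A helper."""
'''


def describe_args(func):
    """Return parameter names, with defaults rendered as source text."""
    args = func.args.args
    defaults = func.args.defaults
    padded = [None] * (len(args) - len(defaults)) + list(defaults)
    return [
        a.arg if d is None else f"{a.arg}={ast.unparse(d)}"
        for a, d in zip(args, padded)
    ]


def mine(node, kind="module", name="<module>"):
    """Return an ordered tree: (kind, name, docstring, extra info, children)."""
    if isinstance(node, ast.ClassDef):
        extra = [ast.unparse(base) for base in node.bases]  # inheritance
    elif isinstance(node, ast.FunctionDef):
        extra = describe_args(node)  # parameters & defaults
    else:
        extra = []
    children = []
    for child in node.body:
        if isinstance(child, ast.ClassDef):
            children.append(mine(child, "class", child.name))
        elif isinstance(child, ast.FunctionDef):
            child_kind = "method" if isinstance(node, ast.ClassDef) else "function"
            children.append(mine(child, child_kind, child.name))
    return (kind, name, ast.get_docstring(node), extra, children)


tree = mine(ast.parse(SOURCE))
```

Children are collected in source order, which preserves the order of definitions that would be lost by inspecting an imported module.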
|
||||
|
||||
See python-dev/docstring-develop thread "AST mining", started on
|
||||
See the python-dev/docstring-develop thread "AST mining", started on
|
||||
2001-08-14.
|
||||
|
||||
|
||||
|
@ -592,12 +639,11 @@ Docstring Extraction Rules
|
|||
documented, only identifiers listed in "``__all__``" are
|
||||
examined for docstrings.
|
||||
|
||||
b) In the absense of "``__all__``", all identifiers are examined,
|
||||
b) In the absence of "``__all__``", all identifiers are examined,
|
||||
except those whose names are private (names begin with "_" but
|
||||
don't begin and end with "__").
|
||||
|
||||
c) 1a and 1b can be overridden by a parameter or command-line
|
||||
option.
|
||||
c) 1a and 1b can be overridden by runtime settings.
|
||||
|
||||
2. Where:
|
||||
|
||||
|
@ -616,7 +662,8 @@ Docstring Extraction Rules
|
|||
docstrings in (a) and (b) will be recognized, extracted, and
|
||||
concatenated. See `Additional Docstrings`_ below.
|
||||
|
||||
d) @@@ 2.2-style "properties" with attribute docstrings?
|
||||
d) @@@ 2.2-style "properties" with attribute docstrings? Wait for
|
||||
syntax?
|
||||
|
||||
3. How:
|
||||
|
||||
|
@ -629,7 +676,7 @@ Docstring Extraction Rules
|
|||
examine an imported module, such as comments and the order of
|
||||
definitions.
|
||||
|
||||
- Docstrings are to be recognized in places where the bytecode
|
||||
- Docstrings are to be recognized in places where the byte-code
|
||||
compiler ignores string literal expressions (2b and 2c above),
|
||||
meaning importing the module will lose these docstrings.
|
||||
|
||||
|
@ -642,7 +689,7 @@ Docstring Extraction Rules
|
|||
limitations must be lived with.
|
||||
|
||||
Since attribute docstrings and additional docstrings are ignored by
|
||||
the Python bytecode compiler, no namespace pollution or runtime bloat
|
||||
the Python byte-code compiler, no namespace pollution or runtime bloat
|
||||
will result from their use. They are not assigned to ``__doc__`` or
|
||||
to any other attribute. The initial parsing of a module may take a
|
||||
slight performance hit.
|
||||
|
@ -654,7 +701,7 @@ Attribute Docstrings
|
|||
(This is a simplified version of PEP 224 [#PEP-224]_.)
|
||||
|
||||
A string literal immediately following an assignment statement is
|
||||
interpreted by the docstring extration machinery as the docstring of
|
||||
interpreted by the docstring extraction machinery as the docstring of
|
||||
the target of the assignment statement, under the following
|
||||
conditions:
|
||||
|
||||
|
@ -666,7 +713,7 @@ conditions:
|
|||
b) At the top level of a class definition: a class attribute.
|
||||
|
||||
c) At the top level of the "``__init__``" method definition of a
|
||||
class: an instance attribute.
|
||||
class: an instance attribute. (@@@ ``__new__`` methods?)
|
||||
|
||||
Since each of the above contexts are at the top level (i.e., in the
|
||||
outermost suite of a definition), it may be necessary to place
|
||||
|
@ -685,7 +732,7 @@ conditions:
|
|||
b) For context 1c above, the target must be of the form
|
||||
"``self.attrib``", where "``self``" matches the "``__init__``"
|
||||
method's first parameter (the instance parameter) and "attrib"
|
||||
is a simple indentifier as in 3a.
|
||||
is a simple identifier as in 3a.
|
||||
|
||||
Blank lines may be used after attribute docstrings to emphasize the
|
||||
connection between the assignment and the docstring.
|
||||
|
@ -712,25 +759,25 @@ Additional Docstrings
|
|||
|
||||
Many programmers would like to make extensive use of docstrings for
|
||||
API documentation. However, docstrings do take up space in the
|
||||
running program, so some of these programmers are reluctant to "bloat
|
||||
up" their code. Also, not all API documentation is applicable to
|
||||
interactive environments, where ``__doc__`` would be displayed.
|
||||
running program, so some programmers are reluctant to "bloat up" their
|
||||
code. Also, not all API documentation is applicable to interactive
|
||||
environments, where ``__doc__`` would be displayed.
|
||||
|
||||
The docstring processing system's extraction tools will concatenate
|
||||
all string literal expressions which appear at the beginning of a
|
||||
definition or after a simple assignment. Only the first strings in
|
||||
definitions will be available as ``__doc__``, and can be used for
|
||||
brief usage text suitable for interactive sessions; subsequent string
|
||||
literals and all attribute docstrings are ignored by the Python
|
||||
bytecode compiler and may contain more extensive API information.
|
||||
Docutils' docstring extraction tools will concatenate all string
|
||||
literal expressions which appear at the beginning of a definition or
|
||||
after a simple assignment. Only the first strings in definitions will
|
||||
be available as ``__doc__``, and can be used for brief usage text
|
||||
suitable for interactive sessions; subsequent string literals and all
|
||||
attribute docstrings are ignored by the Python byte-code compiler and
|
||||
may contain more extensive API information.
|
||||
|
||||
Example::
|
||||
|
||||
def function(arg):
|
||||
"""This is __doc__, function's docstring."""
|
||||
"""
|
||||
This is an additional docstring, ignored by the bytecode
|
||||
compiler, but extracted by the Docutils.
|
||||
This is an additional docstring, ignored by the byte-code
|
||||
compiler, but extracted by Docutils.
|
||||
"""
|
||||
pass
|
||||
|
||||
|
@ -753,13 +800,13 @@ Example::
|
|||
1. Should we search for docstrings after a ``__future__``
|
||||
statement? Very ugly.
|
||||
|
||||
2. Redefine ``__future__`` statements to allow multiple preceeding
|
||||
2. Redefine ``__future__`` statements to allow multiple preceding
|
||||
string literals?
|
||||
|
||||
3. Or should we not even worry about this? There probably
|
||||
shouldn't be ``__future__`` statements in production code, after
|
||||
all. Will modules with ``__future__`` statements simply have to
|
||||
put up with the single-docstring limitation?
|
||||
all. Perhaps modules with ``__future__`` statements will simply
|
||||
have to put up with the single-docstring limitation.
|
||||
|
||||
|
||||
Choice of Docstring Format
|
||||
|
@ -776,7 +823,8 @@ format being used, a case-insensitive string matching the input
|
|||
parser's module or package name (i.e., the same name as required to
|
||||
"import" the module or package), or a registered alias. If no
|
||||
``__docformat__`` is specified, the default format is "plaintext" for
|
||||
now; this may be changed to the standard format once determined.
|
||||
now; this may be changed to the standard format if one is ever
|
||||
established.
|
||||
|
||||
The ``__docformat__`` string may contain an optional second field,
|
||||
separated from the format name (first field) by a single space: a
|
||||
|
@ -818,17 +866,17 @@ when necessary. For example (using reStructuredText markup)::
|
|||
"""
|
||||
Extend `Storer.__init__()` to keep track of instances.
|
||||
|
||||
Keep count in `self.instances`, data in `self.data`.
|
||||
Keep count in `Keeper.instances`, data in `self.data`.
|
||||
"""
|
||||
Storer.__init__(self)
|
||||
self.instances += 1
|
||||
Keeper.instances += 1
|
||||
|
||||
self.data = []
|
||||
"""Store data in a list, most recent last."""
|
||||
|
||||
def storedata(self, data):
|
||||
def store_data(self, data):
|
||||
"""
|
||||
Extend `Storer.storedata()`; append new `data` to a
|
||||
Extend `Storer.store_data()`; append new `data` to a
|
||||
list (in `self.data`).
|
||||
"""
|
||||
self.data.append(data)
|
||||
|
@ -840,12 +888,12 @@ references to the definitions of the identifiers themselves.
|
|||
Stylist Transforms
|
||||
------------------
|
||||
|
||||
Stylist transforms are specialized transforms specific to a Reader.
|
||||
The PySource Reader doesn't have to make any decisions as to style; it
|
||||
just produces a logically constructed document tree, parsed and
|
||||
linked, including custom node types. Stylist transforms understand
|
||||
the custom nodes created by the Reader and convert them into standard
|
||||
Docutils nodes.
|
||||
Stylist transforms are specialized transforms specific to the PySource
|
||||
Reader. The PySource Reader doesn't have to make any decisions as to
|
||||
style; it just produces a logically constructed document tree, parsed
|
||||
and linked, including custom node types. Stylist transforms
|
||||
understand the custom nodes created by the Reader and convert them
|
||||
into standard Docutils nodes.
|
||||
|
||||
Multiple Stylist transforms may be implemented and one can be chosen
|
||||
at runtime (through a "--style" or "--stylist" command-line option).
|
||||
|
|
|
@ -25,7 +25,7 @@ what-you-see-is-what-you-get plaintext markup syntax.
|
|||
|
||||
Only the low-level syntax of docstrings is addressed here. This PEP
|
||||
is not concerned with docstring semantics or processing at all (see
|
||||
PEP 256 for a "Roadmap to the Doctring PEPs"). Nor is it an attempt
|
||||
PEP 256 for a "Road Map to the Docstring PEPs"). Nor is it an attempt
|
||||
to deprecate pure plaintext docstrings, which are always going to be
|
||||
legitimate. The reStructuredText markup is an alternative for those
|
||||
who want more expressive docstrings.
|
||||
|
|
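The attribute-docstring and additional-docstring rules in the pep-0258.txt hunks above can be illustrated with a minimal sketch built on Python's standard ``ast`` module (Python 3.8 or later is assumed). This is not Docutils code: the helper names ``strings_from``, ``attribute_docstrings``, and ``is_examined`` are hypothetical, and only simple ``name = value`` targets are handled (the ``self.attrib`` case inside ``__init__`` is left out).

```python
import ast


def is_string_expr(stmt):
    """True if stmt is a bare string literal expression statement."""
    return (isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str))


def strings_from(body, start):
    """Collect the consecutive string literals in body, beginning at start."""
    found = []
    for stmt in body[start:]:
        if not is_string_expr(stmt):
            break
        found.append(stmt.value.value)
    return found


def attribute_docstrings(body):
    """Map each simple assignment target to the string literals after it."""
    docs = {}
    for index, stmt in enumerate(body):
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            strings = strings_from(body, index + 1)
            if strings:
                docs[stmt.targets[0].id] = strings
    return docs


def is_examined(name, dunder_all=None):
    """Apply rules 1a/1b: honor __all__ if given, else skip private names."""
    if dunder_all is not None:
        return name in dunder_all
    private = name.startswith("_") and not (
        name.startswith("__") and name.endswith("__"))
    return not private


SOURCE = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring.
    """
    pass

limit = 10
"""Attribute docstring for limit."""
'''

module = ast.parse(SOURCE)
function_node = module.body[0]

# The first string is what Python stores as __doc__; the rest are only
# visible to a tool that reads the source, as the PEP describes.
function_docs = strings_from(function_node.body, 0)
module_attribute_docs = attribute_docstrings(module.body)
```

Because the sketch works on parsed source rather than an imported module, it sees the string literals that the byte-code compiler discards, which is the reason the PEP text gives for not relying on import.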