PEP: 258
Title: DPS Generic Implementation Details
Version: $Revision$
Last-Modified: $Date$
Author: dgoodger@bigfoot.com (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Standards Track
Created: 31-May-2001
Post-History:

Abstract

This PEP documents generic implementation details for a Python
Docstring Processing System (DPS). The rationale and high-level
concepts of the DPS are documented in PEP 256, "Docstring
Processing System Framework" [1].

Specification

Docstring Extraction Rules
==========================

1. If the '__all__' variable is present in the module being
   documented, only identifiers listed in '__all__' are examined
   for docstrings. In the absence of '__all__', all identifiers
   are examined, except those whose names are private (names that
   begin with '_' but don't both begin and end with '__').

2. Docstrings are string literal expressions, and are recognized
   in the following places within Python modules:

   a) At the beginning of a module, class definition, or function
      definition, after any comments. This is the standard for
      Python __doc__ attributes.

   b) Immediately following a simple assignment at the top level
      of a module, class definition, or __init__ method
      definition, after any comments. See "Attribute Docstrings"
      below.

   c) Additional string literals found immediately after the
      docstrings in (a) and (b) will be recognized, extracted, and
      concatenated. See "Additional Docstrings" below.

3. Python modules must be parsed by the docstring processing
   system, not imported. There are security reasons for not
   importing untrusted code. Also, docstrings are to be
   recognized in places where the bytecode compiler ignores string
   literal expressions (2b and 2c above), meaning importing the
   module will lose these docstrings. Of course, standard Python
   parsing tools such as the 'parser' library module should be
   used.

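Rule 1 above can be sketched as a small name filter. This is only an
illustration under stated assumptions: the function names 'is_private'
and 'names_to_examine' are hypothetical and are not part of the DPS,
and the identifiers are assumed to have been collected already by
parsing the module.

```python
def is_private(name):
    """Return True if name begins with '_' but is not a '__dunder__' name."""
    is_dunder = name.startswith("__") and name.endswith("__")
    return name.startswith("_") and not is_dunder


def names_to_examine(identifiers, dunder_all=None):
    """Select the identifiers whose docstrings should be examined.

    identifiers -- names found in the module, in source order
    dunder_all  -- the module's __all__ list, or None if it is absent
    """
    if dunder_all is not None:
        # __all__ is present: only listed names are examined.
        listed = set(dunder_all)
        return [name for name in identifiers if name in listed]
    # __all__ is absent: examine everything except private names.
    return [name for name in identifiers if not is_private(name)]
```

Note that when '__all__' is present it takes precedence, so a private
name listed there is still examined.
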
Since attribute docstrings and additional docstrings are not
recognized by the Python bytecode compiler, no namespace pollution
or performance degradation will result from their use. (The
initial parsing of a module may take a slight performance hit.)

Attribute Docstrings
--------------------

XXX A description of attribute docstrings would be appropriate in
PEP 257, "Docstring Conventions" [4].

(This is a simplified version of PEP 224 [3] by Marc-Andre Lemburg.)

A string literal immediately following an assignment statement is
interpreted by the docstring extraction machinery as the docstring
of the target of the assignment statement, under the following
conditions:

1. The assignment must be in one of the following contexts:

   a) At the top level of a module (i.e., not inside a loop or
      conditional): a module attribute.

   b) At the top level of a class definition: a class attribute.

   c) At the top level of a class's '__init__' method definition:
      an instance attribute.

   Since each of the above contexts is at the top level (i.e.,
   just inside the outermost suite of a definition), it may be
   necessary to place dummy assignments for attributes assigned
   conditionally or in a loop. Blank lines may be used after
   attribute docstrings to emphasize the connection between the
   assignment and the docstring.

2. The assignment must be to a single target, not to a list or a
   tuple of targets.

3. The form of the target:

   a) For contexts 1a and 1b above, the target must be a simple
      identifier (not a dotted identifier, a subscripted
      expression, or a sliced expression).

   b) For context 1c above, the target must be of the form
      'self.attrib', where 'self' matches the '__init__' method's
      first parameter (the instance parameter) and 'attrib' is a
      simple identifier as in 3a.

Examples::

    g = 'module attribute (global variable)'
    """This is g's docstring."""

    class AClass:

        c = 'class attribute'
        """This is AClass.c's docstring."""

        def __init__(self):
            self.i = 'instance attribute'
            """This is self.i's docstring."""

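The conditions above might be checked along the following lines. This
is a minimal sketch, not the DPS implementation: it uses the modern
standard 'ast' module in place of the 'parser' module named earlier,
and the helper names and the 'AClass.i' key format for instance
attributes are assumptions made for illustration.

```python
import ast

SOURCE = '''
g = 'module attribute (global variable)'
"""This is g's docstring."""

class AClass:

    c = 'class attribute'
    """This is AClass.c's docstring."""

    def __init__(self):
        self.i = 'instance attribute'
        """This is self.i's docstring."""
'''


def _is_string_stmt(node):
    """True if node is an expression statement holding only a string."""
    return (isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str))


def _attribute_docstrings(body, target_name):
    """Map attribute names to docstrings for one top-level suite.

    target_name(target) returns the attribute name for an acceptable
    target (condition 3), or None to reject it.
    """
    found = {}
    for stmt, following in zip(body, body[1:]):
        # Condition 2: exactly one target; the string must follow directly.
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and _is_string_stmt(following)):
            name = target_name(stmt.targets[0])
            if name is not None:
                found[name] = following.value.value
    return found


def _simple_name(target):
    """Condition 3a: accept only a simple identifier."""
    return target.id if isinstance(target, ast.Name) else None


def extract(source):
    """Return {attribute name: docstring} for module source text."""
    module = ast.parse(source)  # parsed, never imported
    docs = _attribute_docstrings(module.body, _simple_name)
    for cls in module.body:
        if not isinstance(cls, ast.ClassDef):
            continue
        for name, doc in _attribute_docstrings(cls.body, _simple_name).items():
            docs[f"{cls.name}.{name}"] = doc
        for func in cls.body:
            if (isinstance(func, ast.FunctionDef)
                    and func.name == "__init__" and func.args.args):
                self_name = func.args.args[0].arg

                def _self_attr(target, self_name=self_name):
                    """Condition 3b: accept only 'self.attrib'."""
                    if (isinstance(target, ast.Attribute)
                            and isinstance(target.value, ast.Name)
                            and target.value.id == self_name):
                        return target.attr
                    return None

                found = _attribute_docstrings(func.body, _self_attr)
                for name, doc in found.items():
                    docs[f"{cls.name}.{name}"] = doc
    return docs


docs = extract(SOURCE)
```

Because only the outermost suite of each definition is scanned,
assignments inside loops or conditionals are never considered, which
matches condition 1.
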
Additional Docstrings
---------------------

XXX A description of additional docstrings would be appropriate in
PEP 257, "Docstring Conventions" [4].

Many programmers would like to make extensive use of docstrings
for API documentation. However, docstrings do take up space in
the running program, so some of these programmers are reluctant to
'bloat up' their code. Also, not all API documentation is
applicable to interactive environments, where __doc__ would be
displayed.

The docstring processing system's extraction tools will
concatenate all string literal expressions which appear at the
beginning of a definition or after a simple assignment. Only the
first strings in definitions will be available as __doc__, and can
be used for brief usage text suitable for interactive sessions;
subsequent string literals and all attribute docstrings are
ignored by the Python bytecode compiler and may contain more
extensive API information.

Example::

    def function(arg):
        """This is __doc__, function's docstring."""
        """
        This is an additional docstring, ignored by the bytecode
        compiler, but extracted by the docstring processing system.
        """
        pass

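The difference between what the compiler keeps and what a parsing
tool can see may be illustrated as follows. This is a sketch only: it
uses the modern 'ast' module rather than the 'parser' module, the
helper name 'leading_strings' is hypothetical, and joining the pieces
with a blank line is an assumption, since this PEP does not specify
how the strings are concatenated. The 'exec' call is there purely to
show the runtime view of trusted example code; the DPS itself would
not execute the module.

```python
import ast

SOURCE = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring, ignored by the bytecode
    compiler, but extracted by the docstring processing system.
    """
    pass
'''


def leading_strings(body):
    """Return the run of string-literal statements that opens a suite."""
    strings = []
    for stmt in body:
        if (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)):
            strings.append(stmt.value.value)
        else:
            break  # the first non-string statement ends the run
    return strings


# Parser's view: both string literals are present in the syntax tree.
func_node = ast.parse(SOURCE).body[0]
parts = leading_strings(func_node.body)

# Assumed concatenation rule: stripped pieces joined by a blank line.
combined = "\n\n".join(part.strip() for part in parts)

# Runtime view: only the first string survives as __doc__.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["function"].__doc__
```
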
Issue: Multiple module docstrings break 'from __future__ import'
statements in Python 2.1, since a __future__ statement may be
preceded only by a single module docstring. Resolution?

1. Should we search for docstrings after a __future__ statement?
   Very ugly.

2. Redefine __future__ statements to allow multiple preceding
   string literals?

3. Or should we not even worry about this? There shouldn't be
   __future__ statements in production code, after all. Modules
   with __future__ statements will have to put up with the
   single-docstring limitation.

Choice of Docstring Format
==========================

Rather than force everyone to use a single docstring format,
multiple input formats are allowed by the processing system. A
special variable, __docformat__, may appear at the top level of a
module before any function or class definitions. Over time or
through decree, a standard format or set of formats should emerge.

The __docformat__ variable is a string containing the name of the
format being used, a case-insensitive string matching the input
parser's module or package name (i.e., the same name as required
to 'import' the module or package), or a registered alias. If no
__docformat__ is specified, the default format is 'plaintext' for
now; this may be changed to the standard format once determined.

The __docformat__ string may contain an optional second field,
separated from the format name (first field) by a single space: a
case-insensitive language identifier as defined in RFC 1766 [5].
A typical language identifier consists of a 2-letter language code
from ISO 639 [6] (3-letter codes used only if no 2-letter code
exists; RFC 1766 is currently being revised to allow 3-letter
codes). If no language identifier is specified, the default is
'en' for English. The language identifier is passed to the parser
and can be used for language-dependent markup features.

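Splitting a __docformat__ value into its two fields could look like
the sketch below. The function name 'parse_docformat' is hypothetical,
and normalizing both fields to lower case is an assumption made here
as one simple way to honor the case-insensitivity described above;
alias lookup is left out.

```python
DEFAULT_FORMAT = "plaintext"
DEFAULT_LANGUAGE = "en"


def parse_docformat(docformat=None):
    """Split a __docformat__ value into (format_name, language)."""
    if not docformat:
        # No __docformat__ in the module: fall back to the defaults.
        return DEFAULT_FORMAT, DEFAULT_LANGUAGE
    # The fields are separated by a single space; the second is optional.
    fields = docformat.split(" ", 1)
    format_name = fields[0].lower()
    language = fields[1].lower() if len(fields) > 1 else DEFAULT_LANGUAGE
    return format_name, language
```

For example, 'PlainText en-GB' yields the format name 'plaintext' and
the language identifier 'en-gb'.
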
DPS Structure
=============

- package 'dps'

  - function 'dps.main()' (in 'dps/__init__.py')

  - package 'dps.parsers'

    - module 'dps.parsers.model'; see 'Input Parser API' below.

  - package 'dps.formatters'

    - module 'dps.formatters.model'; see 'Output Formatter API' below.

  - package 'dps.languages'

    - module 'dps.languages.en' (English)

    - others to be added

  - utility modules: 'dps.statemachine'

Command-Line Interface
======================

XXX To be determined.

System Python API
=================

XXX To be determined.

Input Parser API
================

Each input parser is a module or package exporting a 'Parser' class,
with the following interface:

    class Parser:

        def __init__(self, inputstring, errors='warn', language='en'):
            """Initialize the Parser instance."""

        def parse(self):
            """Return a DOM tree, the parsed input string."""

XXX This needs a lot of work. What is required for this API?

A model 'Parser' class implementing the full interface along with
utility functions can be found in the 'dps.parsers.model' module.

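To make the interface concrete, here is a minimal sketch of a parser
that satisfies it. This is not the 'dps.parsers.model' class: the
element names 'document' and 'paragraph' and the 'lang' attribute are
hypothetical stand-ins rather than names taken from the DTDs, and the
standard 'xml.dom.minidom' module is assumed as the DOM
implementation.

```python
from xml.dom import minidom


class Parser:
    """Toy plaintext parser: one element per blank-line-separated block."""

    def __init__(self, inputstring, errors='warn', language='en'):
        """Initialize the Parser instance."""
        self.inputstring = inputstring
        self.errors = errors        # accepted but unused in this sketch
        self.language = language

    def parse(self):
        """Return a DOM tree, the parsed input string."""
        document = minidom.Document()
        root = document.createElement("document")    # hypothetical name
        root.setAttribute("lang", self.language)     # hypothetical attribute
        document.appendChild(root)
        for block in self.inputstring.split("\n\n"):
            text = " ".join(block.split())  # collapse internal whitespace
            if text:
                para = document.createElement("paragraph")
                para.appendChild(document.createTextNode(text))
                root.appendChild(para)
        return document


tree = Parser("First\nline.\n\nSecond.").parse()
```
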
Output Formatter API
====================

Each output formatter is a module or package exporting a
'Formatter' class, with the following interface:

    class Formatter:

        def __init__(self, domtree, language='en', showwarnings=0):
            """Initialize the Formatter instance."""

        def format(self):
            """
            Return a formatted string representation of the DOM tree.
            """

XXX This also needs a lot of work. What is required for this API?

A model 'Formatter' class implementing the full interface along
with utility functions can be found in the 'dps.formatters.model'
module.

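A matching sketch for the formatter side follows. Again, this is not
the 'dps.formatters.model' class: it assumes a DOM tree built with
'xml.dom.minidom' whose text lives in hypothetical 'paragraph'
elements, and it renders plain text wrapped at an arbitrarily chosen
width of 66 columns.

```python
import textwrap
from xml.dom import minidom


class Formatter:
    """Toy plain-text formatter for 'paragraph' elements."""

    def __init__(self, domtree, language='en', showwarnings=0):
        """Initialize the Formatter instance."""
        self.domtree = domtree
        self.language = language          # unused in this sketch
        self.showwarnings = showwarnings  # unused in this sketch

    def format(self):
        """Return a formatted string representation of the DOM tree."""
        paragraphs = []
        for node in self.domtree.getElementsByTagName("paragraph"):
            # Gather the direct text children of each paragraph element.
            text = "".join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE)
            paragraphs.append(textwrap.fill(text, width=66))
        return "\n\n".join(paragraphs)


domtree = minidom.parseString(
    "<document><paragraph>Hello, docstrings.</paragraph>"
    "<paragraph>Second paragraph.</paragraph></document>")
output = Formatter(domtree).format()
```
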
Language Module API
===================

Language modules will contain language-dependent strings and
mappings. They will be named for their language identifier (as
defined in 'Choice of Docstring Format' above), converting dashes
to underscores.

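The naming rule can be expressed in a few lines. The function name
'language_module_name' is hypothetical, and lower-casing the
identifier is an assumption (language identifiers are described above
as case-insensitive, but no canonical case is given).

```python
def language_module_name(language_id, package="dps.languages"):
    """Return the dotted module name for a language identifier."""
    # Dashes are not valid in module names, so map them to underscores.
    module = language_id.lower().replace("-", "_")
    return f"{package}.{module}"
```

The resulting name could then be passed to the standard
'importlib.import_module' function to load the language module.
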
XXX Specifics to be determined.

Intermediate Data Structure
===========================

A single intermediate data structure is used internally by the
docstring processing system. This data structure is a DOM tree
whose schema is documented in an XML DTD (eXtensible Markup
Language Document Type Definition), which comes in three parts:

- the Python Plaintext Document Interface DTD, ppdi.dtd [7],

- the Generic Plaintext Document Interface DTD, gpdi.dtd [8],

- and the OASIS Exchange Table Model, soextbl.dtd [9].

The DTD defines a rich set of elements, suitable for any input
syntax or output format. The input parser and the output
formatter share the same intermediate data structure. The
processing system may do transformations on the data from the
input parser before passing it on to the output formatter. The
DTD retains all information necessary to reconstruct the original
input text, or a reasonable facsimile thereof.

XXX Specifics (about the DOM tree) to be determined.

Output Management
=================

XXX To be determined.

Type of output: filesystem only, or in-memory data structure too?
File/directory naming & structure conventions. In-memory data
structure should follow filesystem naming; file/directory ==
leaf/node. Use a directory hierarchy rather than long file names
(long file names were one of the reasons pythondoc couldn't run on
MacOS).

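One possible reading of the directory-hierarchy suggestion is
sketched below. Since output management is still to be determined,
everything here is an assumption: the function name 'output_path',
the '.html' suffix, and the choice of one file per dotted name are
all hypothetical.

```python
from pathlib import PurePosixPath


def output_path(dotted_name, suffix=".html"):
    """Map a dotted object name to a path in a directory hierarchy."""
    parts = dotted_name.split(".")
    # Packages become directories (nodes); the last part is the file (leaf).
    return PurePosixPath(*parts[:-1], parts[-1] + suffix)
```

With this mapping, 'dps.parsers.model' becomes
'dps/parsers/model.html', so no single file name grows with the depth
of the package hierarchy.
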
References and Footnotes

[1] PEP 256, Docstring Processing System Framework, Goodger
    http://www.python.org/peps/pep-0256.html

[2] http://www.python.org/sigs/doc-sig/

[3] PEP 224, Attribute Docstrings, Lemburg
    http://www.python.org/peps/pep-0224.html

[4] PEP 257, Docstring Conventions, Goodger, Van Rossum
    http://www.python.org/peps/pep-0257.html

[5] http://www.rfc-editor.org/rfc/rfc1766.txt

[6] http://lcweb.loc.gov/standards/iso639-2/englangn.html

[7] http://docstring.sf.net/spec/ppdi.dtd

[8] http://docstring.sf.net/spec/gpdi.dtd

[9] http://docstring.sf.net/spec/soextblx.dtd

Project Web Site

A SourceForge project has been set up for this work at
http://docstring.sf.net.

Copyright

This document has been placed in the public domain.

Acknowledgements

This document borrows ideas from the archives of the Python Doc-SIG
[2]. Thanks to all members past & present.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: