PEP: 258
Title: DPS Generic Implementation Details
Version: $Revision$
Last-Modified: $Date$
Author: dgoodger@bigfoot.com (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Standards Track
Created: 31-May-2001
Post-History:
Abstract
This PEP documents generic implementation details for a Python
Docstring Processing System (DPS). The rationale and high-level
concepts of the DPS are documented in PEP 256, "Docstring
Processing System Framework" [1].
Specification
Docstring Extraction Rules
==========================
1. If the '__all__' variable is present in the module being
documented, only identifiers listed in '__all__' are examined
for docstrings. In the absence of '__all__', all identifiers
are examined, except those whose names are private (names begin
with '_' but don't begin and end with '__').
2. Docstrings are string literal expressions, and are recognized
in the following places within Python modules:
a) At the beginning of a module, class definition, or function
definition, after any comments. This is the standard for
Python __doc__ attributes.
b) Immediately following a simple assignment at the top level
of a module, class definition, or __init__ method
definition, after any comments. See "Attribute Docstrings"
below.
c) Additional string literals found immediately after the
docstrings in (a) and (b) will be recognized, extracted, and
concatenated. See "Additional Docstrings" below.
3. Python modules must be parsed by the docstring processing
system, not imported. There are security reasons for not
importing untrusted code. Also, docstrings are to be
recognized in places where the bytecode compiler ignores string
literal expressions (2b and 2c above), meaning importing the
module will lose these docstrings. Of course, standard Python
parsing tools such as the 'parser' library module should be
used.
Since attribute docstrings and additional docstrings are not
recognized by the Python bytecode compiler, no namespace pollution
or performance degradation will result from their use. (The
initial parsing of a module may take a slight performance hit.)
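As a rough illustration of rules 1 and 3, the sketch below parses a
module's source without importing it and selects the top-level names
to examine. It is only a sketch under stated assumptions: it uses the
standard 'ast' module rather than the 'parser' module named above
(which was removed in Python 3.10), it assumes '__all__' is a literal
list or tuple of strings, and the function names 'public_names' and
'is_private' are hypothetical.

```python
import ast


def is_private(name):
    """Private: begins with '_' but does not both begin and end with '__'."""
    dunder = name.startswith("__") and name.endswith("__")
    return name.startswith("_") and not dunder


def public_names(source):
    """Return the top-level names whose docstrings would be examined."""
    tree = ast.parse(source)  # parse only; the module is never imported
    defined = []
    exported = None
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    continue
                if target.id == "__all__":
                    # Assumption: __all__ is a literal list/tuple of strings.
                    exported = list(ast.literal_eval(node.value))
                else:
                    defined.append(target.id)
    if exported is not None:
        # Rule 1: only identifiers listed in __all__ are examined.
        return [name for name in defined if name in exported]
    # Otherwise every identifier except the private ones is examined.
    return [name for name in defined if not is_private(name)]
```

A real extractor would also have to decide how to treat an '__all__'
that is built dynamically; this sketch does not attempt that.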
Attribute Docstrings
--------------------
XXX A description of attribute docstrings would be appropriate in
PEP 257 "Docstring Conventions".
(This is a simplified version of PEP 224 [3] by Marc-Andre Lemburg.)
A string literal immediately following an assignment statement is
interpreted by the docstring extraction machinery as the docstring
of the target of the assignment statement, under the following
conditions:
1. The assignment must be in one of the following contexts:
a) At the top level of a module (i.e., not inside a loop or
conditional): a module attribute.
b) At the top level of a class definition: a class attribute.
c) At the top level of a class' '__init__' method definition:
an instance attribute.
Since each of the above contexts is at the top level (i.e.,
just inside the outermost suite of a definition), it may be
necessary to place dummy assignments for attributes assigned
conditionally or in a loop. Blank lines may be used after
attribute docstrings to emphasize the connection between the
assignment and the docstring.
2. The assignment must be to a single target, not to a list or a
tuple of targets.
3. The form of the target:
a) For contexts 1a and 1b above, the target must be a simple
identifier (not a dotted identifier, a subscripted
expression, or a sliced expression).
b) For context 1c above, the target must be of the form
'self.attrib', where 'self' matches the '__init__' method's
first parameter (the instance parameter) and 'attrib' is a
simple identifier as in 3a.
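The conditions above might be checked along the following lines. This
is a minimal sketch, not the extraction machinery itself: it relies on
the standard 'ast' module, the helper name 'attribute_docstrings' and
its 'self_name' parameter are hypothetical, and the sample source is
invented for illustration.

```python
import ast


def attribute_docstrings(body, self_name=None):
    """Map attribute names to the string literal following their assignment.

    `body` is the statement list of a module, class, or __init__ method.
    `self_name` is the instance parameter name when scanning __init__.
    """
    found = {}
    for stmt, following in zip(body, body[1:]):
        # Condition 2: exactly one target (rejects 'a = b = 0').
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target = stmt.targets[0]
        if self_name is None and isinstance(target, ast.Name):
            # Condition 3a: a simple identifier (also rejects tuples).
            name = target.id
        elif (self_name is not None
              and isinstance(target, ast.Attribute)
              and isinstance(target.value, ast.Name)
              and target.value.id == self_name):
            # Condition 3b: 'self.attrib' in an __init__ method.
            name = f"{self_name}.{target.attr}"
        else:
            continue
        # The next statement must be a bare string literal.
        if (isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)):
            found[name] = following.value.value
    return found


source = '''
limit = 10
"""Maximum number of retries."""
a = b = 0
"""Ignored: more than one target."""

class Point:
    def __init__(self, x):
        self.x = x
        """Horizontal coordinate."""
'''
module = ast.parse(source)
init = module.body[4].body[0]
module_docs = attribute_docstrings(module.body)
instance_docs = attribute_docstrings(init.body, init.args.args[0].arg)
```

Because only the statements directly in 'body' are scanned, assignments
nested inside loops or conditionals are skipped, which matches the
top-level requirement in condition 1.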
Examples::

    g = 'module attribute (global variable)'
    """This is g's docstring."""

    class AClass:

        c = 'class attribute'
        """This is AClass.c's docstring."""

        def __init__(self):
            self.i = 'instance attribute'
            """This is self.i's docstring."""
Additional Docstrings
---------------------
XXX A description of additional docstrings would be appropriate in
PEP 257, "Docstring Conventions" [4].
Many programmers would like to make extensive use of docstrings
for API documentation. However, docstrings do take up space in
the running program, so some of these programmers are reluctant to
'bloat up' their code. Also, not all API documentation is
applicable to interactive environments, where __doc__ would be
displayed.
The docstring processing system's extraction tools will
concatenate all string literal expressions which appear at the
beginning of a definition or after a simple assignment. Only the
first strings in definitions will be available as __doc__, and can
be used for brief usage text suitable for interactive sessions;
subsequent string literals and all attribute docstrings are
ignored by the Python bytecode compiler and may contain more
extensive API information.
Example::

    def function(arg):
        """This is __doc__, function's docstring."""
        """
        This is an additional docstring, ignored by the bytecode
        compiler, but extracted by the docstring processing system.
        """
        pass
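One way an extraction tool could collect additional docstrings is
sketched below. It is an illustration only: it uses the standard 'ast'
and 'inspect' modules, the helper name 'leading_strings' is
hypothetical, and joining the pieces with a newline is an assumption,
since the text above does not say how the strings are concatenated.

```python
import ast
import inspect


def leading_strings(body):
    """Return every string literal at the start of a definition body."""
    strings = []
    for stmt in body:
        if (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)):
            strings.append(stmt.value.value)
        else:
            break  # the first non-string statement ends the run
    return strings


source = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring.
    """
    pass
'''
func = ast.parse(source).body[0]
parts = leading_strings(func.body)

# Only the first string is what the compiler keeps as __doc__.
doc_attribute = ast.get_docstring(func)

# Assumption: pieces are dedented and joined with a newline.
combined = "\n".join(inspect.cleandoc(part) for part in parts)
```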
Issue: This breaks 'from __future__ import' statements in Python
2.1 for multiple module docstrings. Resolution?
1. Should we search for docstrings after a __future__ statement?
Very ugly.
2. Redefine __future__ statements to allow multiple preceding
string literals?
3. Or should we not even worry about this? There shouldn't be
__future__ statements in production code, after all. Modules
with __future__ statements will have to put up with the
single-docstring limitation.
Choice of Docstring Format
==========================
Rather than force everyone to use a single docstring format,
multiple input formats are allowed by the processing system. A
special variable, __docformat__, may appear at the top level of a
module before any function or class definitions. Over time or
through decree, a standard format or set of formats should emerge.
The __docformat__ variable is a string containing the name of the
format being used, a case-insensitive string matching the input
parser's module or package name (i.e., the same name as required
to 'import' the module or package), or a registered alias. If no
__docformat__ is specified, the default format is 'plaintext' for
now; this may be changed to the standard format once determined.
The __docformat__ string may contain an optional second field,
separated from the format name (first field) by a single space: a
case-insensitive language identifier as defined in RFC 1766 [5].
A typical language identifier consists of a 2-letter language code
from ISO 639 [6] (3-letter codes used only if no 2-letter code
exists; RFC 1766 is currently being revised to allow 3-letter
codes). If no language identifier is specified, the default is
'en' for English. The language identifier is passed to the parser
and can be used for language-dependent markup features.
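The two-field layout of __docformat__ could be read roughly as follows.
This is a sketch under assumptions: the function name
'parse_docformat' is hypothetical, 'someformat' is an invented format
name, and normalizing both fields to lowercase is just one way to
honor the case-insensitivity described above. Alias lookup and
validation of the language identifier are left out.

```python
def parse_docformat(value="plaintext"):
    """Split a __docformat__ string into (format_name, language)."""
    # The optional language field follows a single space.
    fields = value.split(" ", 1)
    format_name = fields[0].lower()
    # Default language is 'en' when no second field is given.
    language = fields[1].lower() if len(fields) > 1 else "en"
    return format_name, language


default = parse_docformat()
explicit = parse_docformat("SomeFormat fr")
regional = parse_docformat("plaintext en-US")
```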
DPS Structure
=============
- package 'dps'
  - function 'dps.main()' (in 'dps/__init__.py')
  - package 'dps.parsers'
    - module 'dps.parsers.model'; see 'Input Parser API' below.
  - package 'dps.formatters'
    - module 'dps.formatters.model'; see 'Output Formatter API' below.
  - package 'dps.languages'
    - module 'dps.languages.en' (English)
    - others to be added
  - utility modules: 'dps.statemachine'
Command-Line Interface
======================
XXX To be determined.
System Python API
=================
XXX To be determined.
Input Parser API
================
Each input parser is a module or package exporting a 'Parser' class,
with the following interface:
    class Parser:

        def __init__(self, inputstring, errors='warn', language='en'):
            """Initialize the Parser instance."""

        def parse(self):
            """Return a DOM tree, the parsed input string."""
XXX This needs a lot of work. What is required for this API?
A model 'Parser' class implementing the full interface along with
utility functions can be found in the 'dps.parsers.model' module.
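To make the interface concrete, here is a deliberately tiny parser
with the same constructor and 'parse' signature. It is not the model
parser from 'dps.parsers.model'; it is a hypothetical plaintext parser
that builds its tree with the standard 'xml.dom.minidom' module, and
the element names 'document' and 'paragraph' are assumptions rather
than names taken from the DTD.

```python
from xml.dom.minidom import getDOMImplementation


class Parser:
    """Hypothetical plaintext parser following the interface above."""

    def __init__(self, inputstring, errors='warn', language='en'):
        self.inputstring = inputstring
        self.errors = errors        # unused in this sketch
        self.language = language    # unused in this sketch

    def parse(self):
        """Return a DOM tree with one element per paragraph."""
        dom = getDOMImplementation().createDocument(None, "document", None)
        root = dom.documentElement
        # Paragraphs are separated by blank lines.
        for block in self.inputstring.split("\n\n"):
            text = " ".join(block.split())  # collapse internal whitespace
            if text:
                paragraph = dom.createElement("paragraph")
                paragraph.appendChild(dom.createTextNode(text))
                root.appendChild(paragraph)
        return dom


tree = Parser("First paragraph.\n\nSecond\nparagraph.").parse()
paragraphs = tree.getElementsByTagName("paragraph")
```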
Output Formatter API
====================
Each output formatter is a module or package exporting a
'Formatter' class, with the following interface:
    class Formatter:

        def __init__(self, domtree, language='en', showwarnings=0):
            """Initialize the Formatter instance."""

        def format(self):
            """
            Return a formatted string representation of the DOM tree.
            """
XXX This also needs a lot of work. What is required for this API?
A model 'Formatter' class implementing the full interface along
with utility functions can be found in the 'dps.formatters.model'
module.
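A matching formatter could be as small as the sketch below. Again this
is not the model formatter from 'dps.formatters.model'; it is a
hypothetical plain-text formatter, and the 'document' and 'paragraph'
element names are assumptions made for the example, not names taken
from the DTD.

```python
from xml.dom.minidom import parseString


class Formatter:
    """Hypothetical plain-text formatter following the interface above."""

    def __init__(self, domtree, language='en', showwarnings=0):
        self.domtree = domtree
        self.language = language          # unused in this sketch
        self.showwarnings = showwarnings  # unused in this sketch

    def format(self):
        """Return the text of each paragraph, separated by blank lines."""
        paragraphs = self.domtree.getElementsByTagName("paragraph")
        return "\n\n".join(self._text(p) for p in paragraphs)

    @staticmethod
    def _text(element):
        # Collect only the direct text children of the element.
        return "".join(
            child.data for child in element.childNodes
            if child.nodeType == child.TEXT_NODE
        )


domtree = parseString(
    "<document><paragraph>One.</paragraph>"
    "<paragraph>Two.</paragraph></document>"
)
output = Formatter(domtree).format()
```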
Language Module API
===================
Language modules will contain language-dependent strings and
mappings. They will be named for their language identifier (as
defined in 'Choice of Docstring Format' above), converting dashes
to underscores.
XXX Specifics to be determined.
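The naming rule (language identifier with dashes converted to
underscores) might look like this in practice. The helper name
'language_module_name' is hypothetical, and lowercasing the
identifier is an assumption based on language identifiers being
case-insensitive.

```python
def language_module_name(identifier, package="dps.languages"):
    """Return the dotted module name for a language identifier."""
    # Dashes are not valid in module names, so they become underscores.
    module = identifier.lower().replace("-", "_")
    return f"{package}.{module}"


english = language_module_name("en")
regional = language_module_name("en-US")
```

The resulting name could then be handed to 'importlib.import_module'
once such a module exists.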
Intermediate Data Structure
===========================
A single intermediate data structure is used internally by the
docstring processing system. This data structure is a DOM tree
whose schema is documented in an XML DTD (eXtensible Markup
Language Document Type Definition), which comes in three parts:
- the Python Plaintext Document Interface DTD, ppdi.dtd [7],
- the Generic Plaintext Document Interface DTD, gpdi.dtd [8],
- and the OASIS Exchange Table Model, soextblx.dtd [9].
The DTD defines a rich set of elements, suitable for any input
syntax or output format. The input parser and the output
formatter share the same intermediate data structure. The
processing system may do transformations on the data from the
input parser before passing it on to the output formatter. The
DTD retains all information necessary to reconstruct the original
input text, or a reasonable facsimile thereof.
XXX Specifics (about the DOM tree) to be determined.
Output Management
=================
XXX To be determined.
Type of output: filesystem only, or in-memory data structure too?
File/directory naming & structure conventions. In-memory data
structure should follow filesystem naming; file/directory ==
leaf/node. Use a directory hierarchy rather than long file names
(long file names were one of the reasons pythondoc couldn't run on
MacOS).
References and Footnotes
[1] http://python.sf.net/peps/pep-0256.html
[2] http://www.python.org/sigs/doc-sig/
[3] http://python.sf.net/peps/pep-0224.html
[4] http://python.sf.net/peps/pep-0257.html
[5] http://www.rfc-editor.org/rfc/rfc1766.txt
[6] http://lcweb.loc.gov/standards/iso639-2/englangn.html
[7] http://docstring.sf.net/spec/ppdi.dtd
[8] http://docstring.sf.net/spec/gpdi.dtd
[9] http://docstring.sf.net/spec/soextblx.dtd
Project Web Site
A SourceForge project has been set up for this work at
http://docstring.sf.net.
Copyright
This document has been placed in the public domain.
Acknowledgements
This document borrows ideas from the archives of the Python Doc-SIG
[2]. Thanks to all members past & present.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: