diff --git a/pep-0258.txt b/pep-0258.txt
new file mode 100644
index 000000000..353a67ce0
--- /dev/null
+++ b/pep-0258.txt
@@ -0,0 +1,355 @@
+PEP: 258
+Title: DPS Generic Implementation Details
+Version: $Revision$
+Last-Modified: $Date$
+Author: dgoodger@bigfoot.com (David Goodger)
+Discussions-To: doc-sig@python.org
+Status: Draft
+Type: Standards Track
+Created: 31-May-2001
+Post-History:
+
+
+Abstract
+
+    This PEP documents generic implementation details for a Python
+    Docstring Processing System (DPS). The rationale and high-level
+    concepts of the DPS are documented in PEP 256, "Docstring
+    Processing System Framework" [1].
+
+
+Specification
+
+    Docstring Extraction Rules
+    ==========================
+
+    1. If the '__all__' variable is present in the module being
+       documented, only identifiers listed in '__all__' are examined
+       for docstrings. In the absence of '__all__', all identifiers
+       are examined, except those whose names are private (names begin
+       with '_' but don't begin and end with '__').
+
+    2. Docstrings are string literal expressions, and are recognized
+       in the following places within Python modules:
+
+       a) At the beginning of a module, class definition, or function
+          definition, after any comments. This is the standard for
+          Python __doc__ attributes.
+
+       b) Immediately following a simple assignment at the top level
+          of a module, class definition, or __init__ method
+          definition, after any comments. See "Attribute Docstrings"
+          below.
+
+       c) Additional string literals found immediately after the
+          docstrings in (a) and (b) will be recognized, extracted, and
+          concatenated. See "Additional Docstrings" below.
+
+    3. Python modules must be parsed by the docstring processing
+       system, not imported. There are security reasons for not
+       importing untrusted code.
+       Also, docstrings are to be recognized in places where the
+       bytecode compiler ignores string literal expressions (2b and
+       2c above), meaning importing the module will lose these
+       docstrings. Of course, standard Python parsing tools such as
+       the 'parser' library module should be used.
+
+    Since attribute docstrings and additional docstrings are ignored
+    by the Python bytecode compiler, no namespace pollution or
+    performance degradation will result from their use. (The
+    initial parsing of a module may take a slight performance hit.)
+
+    Attribute Docstrings
+    --------------------
+
+    XXX A description of attribute docstrings would be appropriate in
+    PEP 257, "Docstring Conventions" [4].
+
+    (This is a simplified version of PEP 224 [3] by Marc-Andre Lemburg.)
+
+    A string literal immediately following an assignment statement is
+    interpreted by the docstring extraction machinery as the docstring
+    of the target of the assignment statement, under the following
+    conditions:
+
+    1. The assignment must be in one of the following contexts:
+
+       a) At the top level of a module (i.e., not inside a loop or
+          conditional): a module attribute.
+
+       b) At the top level of a class definition: a class attribute.
+
+       c) At the top level of a class's '__init__' method definition:
+          an instance attribute.
+
+       Since each of the above contexts is at the top level (i.e.,
+       just inside the outermost suite of a definition), it may be
+       necessary to place dummy assignments for attributes assigned
+       conditionally or in a loop. Blank lines may be used after
+       attribute docstrings to emphasize the connection between the
+       assignment and the docstring.
+
+    2. The assignment must be to a single target, not to a list or a
+       tuple of targets.
+
+    3. The form of the target:
+
+       a) For contexts 1a and 1b above, the target must be a simple
+          identifier (not a dotted identifier, a subscripted
+          expression, or a sliced expression).
+
+       b) For context 1c above, the target must be of the form
+          'self.attrib', where 'self' matches the '__init__' method's
+          first parameter (the instance parameter) and 'attrib' is a
+          simple identifier as in 3a.
+
+    Examples::
+
+        g = 'module attribute (global variable)'
+        """This is g's docstring."""
+
+        class AClass:
+
+            c = 'class attribute'
+            """This is AClass.c's docstring."""
+
+            def __init__(self):
+                self.i = 'instance attribute'
+                """This is self.i's docstring."""
+
+    Additional Docstrings
+    ---------------------
+
+    XXX A description of additional docstrings would be appropriate in
+    PEP 257, "Docstring Conventions" [4].
+
+    Many programmers would like to make extensive use of docstrings
+    for API documentation. However, docstrings do take up space in
+    the running program, so some of these programmers are reluctant to
+    'bloat up' their code. Also, not all API documentation is
+    applicable to interactive environments, where __doc__ would be
+    displayed.
+
+    The docstring processing system's extraction tools will
+    concatenate all string literal expressions which appear at the
+    beginning of a definition or after a simple assignment. Only the
+    first string in a definition will be available as __doc__, and can
+    be used for brief usage text suitable for interactive sessions;
+    subsequent string literals and all attribute docstrings are
+    ignored by the Python bytecode compiler and may contain more
+    extensive API information.
+
+    Example::
+
+        def function(arg):
+            """This is __doc__, function's docstring."""
+            """
+            This is an additional docstring, ignored by the bytecode
+            compiler, but extracted by the docstring processing system.
+            """
+            pass
+
+    Issue: Multiple module docstrings break 'from __future__ import'
+    statements in Python 2.1. A __future__ statement may be preceded
+    only by the module docstring, comments, blank lines, and other
+    __future__ statements. Resolution?
+
+    1. Should we search for docstrings after a __future__ statement?
+       Very ugly.
+
+    2. Redefine __future__ statements to allow multiple preceding
+       string literals?
+
+    3. Or should we not even worry about this? There shouldn't be
+       __future__ statements in production code, after all. Modules
+       with __future__ statements will have to put up with the
+       single-docstring limitation.
+
+    Choice of Docstring Format
+    ==========================
+
+    Rather than force everyone to use a single docstring format,
+    multiple input formats are allowed by the processing system. A
+    special variable, __docformat__, may appear at the top level of a
+    module before any function or class definitions. Over time or
+    through decree, a standard format or set of formats should emerge.
+
+    The __docformat__ variable is a string containing the name of the
+    format being used: a case-insensitive string matching the input
+    parser's module or package name (i.e., the same name as required
+    to 'import' the module or package), or a registered alias. If no
+    __docformat__ is specified, the default format is 'plaintext' for
+    now; this may be changed to the standard format once determined.
+
+    The __docformat__ string may contain an optional second field,
+    separated from the format name (first field) by a single space: a
+    case-insensitive language identifier as defined in RFC 1766 [5].
+    A typical language identifier consists of a 2-letter language code
+    from ISO 639 [6] (3-letter codes used only if no 2-letter code
+    exists; RFC 1766 is currently being revised to allow 3-letter
+    codes). If no language identifier is specified, the default is
+    'en' for English. The language identifier is passed to the parser
+    and can be used for language-dependent markup features.
+
+    DPS Structure
+    =============
+
+    - package 'dps'
+
+      - function 'dps.main()' (in 'dps/__init__.py')
+
+      - package 'dps.parsers'
+
+        - module 'dps.parsers.model'; see 'Input Parser API' below.
+
+      - package 'dps.formatters'
+
+        - module 'dps.formatters.model'; see 'Output Formatter API' below.
+
+      - package 'dps.languages'
+
+        - module 'dps.languages.en' (English)
+
+        - others to be added
+
+      - utility modules: 'dps.statemachine'
+
+    Command-Line Interface
+    ======================
+
+    XXX To be determined.
+
+    System Python API
+    =================
+
+    XXX To be determined.
+
+    Input Parser API
+    ================
+
+    Each input parser is a module or package exporting a 'Parser' class,
+    with the following interface:
+
+        class Parser:
+
+            def __init__(self, inputstring, errors='warn', language='en'):
+                """Initialize the Parser instance."""
+
+            def parse(self):
+                """Return a DOM tree, the parsed input string."""
+
+    XXX This needs a lot of work. What is required for this API?
+
+    A model 'Parser' class implementing the full interface along with
+    utility functions can be found in the 'dps.parsers.model' module.
+
+    Output Formatter API
+    ====================
+
+    Each output formatter is a module or package exporting a
+    'Formatter' class, with the following interface:
+
+        class Formatter:
+
+            def __init__(self, domtree, language='en', showwarnings=0):
+                """Initialize the Formatter instance."""
+
+            def format(self):
+                """
+                Return a formatted string representation of the DOM tree.
+                """
+
+    XXX This also needs a lot of work. What is required for this API?
+
+    A model 'Formatter' class implementing the full interface along
+    with utility functions can be found in the 'dps.formatters.model'
+    module.
+
+    Language Module API
+    ===================
+
+    Language modules will contain language-dependent strings and
+    mappings. They will be named for their language identifier (as
+    defined in 'Choice of Docstring Format' above), converting dashes
+    to underscores.
+
+    XXX Specifics to be determined.
+
+    Intermediate Data Structure
+    ===========================
+
+    A single intermediate data structure is used internally by the
+    docstring processing system.
+    This data structure is a DOM tree whose schema is documented in an
+    XML DTD (eXtensible Markup Language Document Type Definition),
+    which comes in three parts:
+
+    - the Python Plaintext Document Interface DTD, ppdi.dtd [7],
+
+    - the Generic Plaintext Document Interface DTD, gpdi.dtd [8],
+
+    - and the OASIS Exchange Table Model, soextblx.dtd [9].
+
+    The DTD defines a rich set of elements, suitable for any input
+    syntax or output format. The input parser and the output
+    formatter share the same intermediate data structure. The
+    processing system may transform the data from the input parser
+    before passing it on to the output formatter. The data structure
+    retains all information necessary to reconstruct the original
+    input text, or a reasonable facsimile thereof.
+
+    XXX Specifics (about the DOM tree) to be determined.
+
+    Output Management
+    =================
+
+    XXX To be determined.
+
+    Type of output: filesystem only, or in-memory data structure too?
+    File/directory naming & structure conventions. In-memory data
+    structure should follow filesystem naming; file/directory ==
+    leaf/node. Use a directory hierarchy rather than long file names
+    (long file names were one of the reasons pythondoc couldn't run on
+    MacOS).
+
+
+References and Footnotes
+
+    [1] http://python.sf.net/peps/pep-0256.html
+
+    [2] http://www.python.org/sigs/doc-sig/
+
+    [3] http://python.sf.net/peps/pep-0224.html
+
+    [4] http://python.sf.net/peps/pep-0257.html
+
+    [5] http://www.rfc-editor.org/rfc/rfc1766.txt
+
+    [6] http://lcweb.loc.gov/standards/iso639-2/englangn.html
+
+    [7] http://docstring.sf.net/spec/ppdi.dtd
+
+    [8] http://docstring.sf.net/spec/gpdi.dtd
+
+    [9] http://docstring.sf.net/spec/soextblx.dtd
+
+
+Project Web Site
+
+    A SourceForge project has been set up for this work at
+    http://docstring.sf.net.
+
+
+Copyright
+
+    This document has been placed in the public domain.
+
+
+Acknowledgements
+
+    This document borrows ideas from the archives of the Python Doc-SIG
+    [2]. Thanks to all members past & present.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+End: