python-peps/peps/pep-0727.rst

777 lines
27 KiB
ReStructuredText
Raw Permalink Normal View History

PEP: 727
Title: Documentation in Annotated Metadata
Author: Sebastián Ramírez <tiangolo@gmail.com>
Sponsor: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Discussions-To: https://discuss.python.org/t/32566
Status: Draft
Type: Standards Track
Topic: Typing
Content-Type: text/x-rst
Created: 28-Aug-2023
Python-Version: 3.13
Post-History: `30-Aug-2023 <https://discuss.python.org/t/32566>`__
Abstract
========
This PEP proposes a standardized way to provide documentation strings for Python
symbols defined with :py:class:`~typing.Annotated` using a new class
``typing.Doc``.
Motivation
==========
There's already a well-defined way to provide documentation for classes,
functions, class methods, and modules: using docstrings.
Currently there is no formalized standard to provide documentation strings for other
types of symbols: parameters, return values, class-scoped variables (class variables
and instance variables), local variables, and type aliases.
Nevertheless, to allow documenting most of these additional symbols, several
conventions have been created as microsyntaxes inside of docstrings, and are
currently commonly used: Sphinx, numpydoc, Google, Keras, etc.
There are two scenarios in which these conventions would be supported by tools: for
authors, while **editing** the contents of the documentation strings and for users,
while **rendering** that content in some way (in documentation sites, tooltips
in editors, etc).
Because each of these conventions uses a microsyntax inside a string, when
**editing** those docstrings, editors can't easily provide support for autocompletion,
inline errors for broken syntax, etc. Any type of **editing** support for these
conventions would be on top of the support for editing standard Python syntax.
When documenting parameters with current conventions, because the docstring is in
a different place in the code than the actual parameters and it requires
duplication of information (the parameter name) the information about
a parameter is easily in a place in the code quite far away from the
declaration of the actual parameter and it is disconnected from it.
This means it's easy to refactor a function, remove a parameter, and forget to
remove its docs. The same happens when adding a new parameter: it's easy to forget
to add the docstring for it.
And because of this same duplication of information (the parameter name) editors and
other tools need complex custom logic to check or ensure the consistency of the
parameters in the signature and in their docstring, or they simply don't
fully support that.
As these existing conventions are different types of microsyntaxes inside of
strings, robustly parsing them for **rendering** requires complex logic that
needs to be implemented by the tools supporting them. Additionally, libraries
and tools don't have a straightforward way to obtain the documentation for
each individual parameter or variable at runtime, without depending on a
specific docstring convention parser. Accessing the parameter documentation
strings at runtime would be useful, for example, for testing the contents
of each parameter's documentation, to ensure consistency across several
similar functions, or to extract and expose that same parameter
documentation in some other way (e.g. an API with FastAPI, a CLI with Typer, etc).
Some of these previous formats tried to account for the lack of type annotations
in older Python versions by including typing information in the docstrings (e.g.
`Sphinx <https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#info-field-lists>`__,
`numpydoc <https://numpydoc.readthedocs.io/en/latest/format.html#parameters>`__)
but now that information doesn't need to be in docstrings as there is now an official
:pep:`syntax for type annotations <484>`.
Rationale
=========
This proposal intends to address these shortcomings by extending and complementing the
information in docstrings, keeping backwards compatibility with existing docstrings
(it doesn't deprecate them), and doing it in a way that leverages the Python
language and structure, via type annotations with :py:class:`~typing.Annotated`, and
a new class ``Doc`` in ``typing``.
The reason why this would belong in the standard Python library instead of an
external package is because although the implementation would be quite trivial,
the actual power and benefit from it would come from being a standard, to facilitate
its usage from library authors and to provide a default way to document Python
symbols using :py:class:`~typing.Annotated`. Some tool providers (at least VS Code
and PyCharm) have shown they would consider implementing support for this only if
it was a standard.
This doesn't deprecate current usage of docstrings, docstrings should be considered
the preferred documentation method when available (not available in type aliases,
parameters, etc).
And docstrings would be complemented by this proposal for documentation specific to
the symbols that can be declared with :py:class:`~typing.Annotated`
(currently only covered by the several available microsyntax conventions).
This should be relatively transparent to common developers (library users) unless
they manually open the source files from libraries adopting it.
It should be considered opt-in for library authors who would like to adopt it and
they should be free to decide to use it or not.
It would be only useful for libraries that are willing to use optional type hints.
Summary
-------
Here's a short summary of the features of this proposal in contrast to current
conventions:
* **Editing** would be already fully supported by default by any editor (current
or future) supporting Python syntax, including syntax errors, syntax
highlighting, etc.
* **Rendering** would be relatively straightforward to implement by static tools
(tools that don't need runtime execution), as the information can be extracted
from the AST they normally already create.
* Deduplication of information: the name of a parameter would be defined in a single
place, not duplicated inside of a docstring.
* Elimination of the possibility of having inconsistencies when removing a parameter
or class variable and forgetting to remove its documentation.
* Minimization of the probability of adding a new parameter or class variable
and forgetting to add its documentation.
* Elimination of the possibility of having inconsistencies between the name of a
parameter in the signature and the name in the docstring when it is renamed.
* Access to the documentation string for each symbol at runtime, including existing
(older) Python versions.
* A more formalized way to document other symbols, like type aliases, that could
use :py:class:`~typing.Annotated`.
* No microsyntax to learn for newcomers, it's just Python syntax.
* Parameter documentation inheritance for functions captured
by :py:class:`~typing.ParamSpec`.
Specification
=============
The main proposal is to introduce a new class, ``typing.Doc``.
This class should only be used within :py:class:`~typing.Annotated` annotations.
It takes a single positional-only string argument. It should be used to
document the intended meaning and use of the symbol declared using
:py:class:`~typing.Annotated`.
For example:
.. code:: python
from typing import Annotated, Doc
class User:
name: Annotated[str, Doc("The user's name")]
age: Annotated[int, Doc("The user's age")]
...
:py:class:`~typing.Annotated` is normally used as a type annotation, in those cases,
any ``typing.Doc`` inside of it would document the symbol being annotated.
When :py:class:`~typing.Annotated` is used to declare a type alias, ``typing.Doc``
would then document the type alias symbol.
For example:
.. code:: python
from typing import Annotated, Doc, TypeAlias
from external_library import UserResolver
CurrentUser: TypeAlias = Annotated[str, Doc("The current system user"), UserResolver()]
def create_user(name: Annotated[str, Doc("The user's name")]): ...
def delete_user(name: Annotated[str, Doc("The user to delete")]): ...
In this case, if a user imported ``CurrentUser``, tools like editors could provide
a tooltip with the documentation string when a user hovers over that symbol, or
documentation tools could include the type alias with its documentation in their
generated output.
For tools extracting the information at runtime, they would normally use
:py:func:`~typing.get_type_hints` with the parameter ``include_extras=True``,
and as :py:class:`~typing.Annotated` is normalized (even with type aliases), this
would mean they should use the last ``typing.Doc`` available, if more than one is
used, as that is the last one used.
At runtime, ``typing.Doc`` instances have an attribute ``documentation`` with the
string passed to it.
When a function's signature is captured by a :py:class:`~typing.ParamSpec`,
any documentation strings associated with the parameters should be retained.
Any tool processing ``typing.Doc`` objects should interpret the string as
a docstring, and therefore should normalize whitespace
as if ``inspect.cleandoc()`` were used.
The string passed to ``typing.Doc`` should be of the form that would be a
valid docstring.
This means that `f-strings`__ and string operations should not be used.
As this cannot be enforced by the Python runtime,
tools should not rely on this behavior.
When tools providing **rendering** show the raw signature, they could allow
configuring if the whole raw :py:class:`~typing.Annotated` code should be displayed,
but they should default to not include :py:class:`~typing.Annotated` and its
internal code metadata, only the type of the symbols annotated. When those tools
support ``typing.Doc`` and rendering in other ways than just a raw signature,
they should show the string value passed to ``typing.Doc`` in a convenient way that
shows the relation between the documented symbol and the documentation string.
Tools providing **rendering** could allow ways to configure where to show the
parameter documentation and the prose docstring in different ways. Otherwise, they
could simply show the prose docstring first and then the parameter documentation second.
__ https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
Examples
--------
Class attributes may be documented:
.. code:: python
from typing import Annotated, Doc
class User:
name: Annotated[str, Doc("The user's name")]
age: Annotated[int, Doc("The user's age")]
...
As can function or method parameters and return values:
.. code:: python
from typing import Annotated, Doc
def create_user(
name: Annotated[str, Doc("The user's name")],
age: Annotated[int, Doc("The user's age")],
cursor: DatabaseConnection | None = None,
) -> Annotated[User, Doc("The created user after saving in the database")]:
"""Create a new user in the system.
It needs the database connection to be already initialized.
"""
pass
Backwards Compatibility
=======================
This proposal is fully backwards compatible with existing code and it doesn't
deprecate existing usage of docstring conventions.
For developers that wish to adopt it before it is available in the standard library,
or to support older versions of Python, they can use ``typing_extensions`` and
import and use ``Doc`` from there.
For example:
.. code:: python
from typing import Annotated
from typing_extensions import Doc
class User:
name: Annotated[str, Doc("The user's name")]
age: Annotated[int, Doc("The user's age")]
...
Security Implications
=====================
There are no known security implications.
How to Teach This
=================
The main mechanism of documentation should continue to be standard docstrings for
prose information, this applies to modules, classes, functions and methods.
For authors that want to adopt this proposal to add more granularity, they can use
``typing.Doc`` inside of :py:class:`~typing.Annotated` annotations for the symbols
that support it.
Library authors that wish to adopt this proposal while keeping backwards compatibility
with older versions of Python should use ``typing_extensions.Doc`` instead of
``typing.Doc``.
Reference Implementation
========================
``typing.Doc`` is implemented equivalently to:
.. code:: python
class Doc:
def __init__(self, documentation: str, /):
self.documentation = documentation
It has been implemented in the `typing_extensions`__ package.
__ https://pypi.org/project/typing-extensions/
Survey of Other languages
=========================
Here's a short survey of how other languages document their symbols.
Java
----
Java functions and their parameters are documented with
`Javadoc <https://www.oracle.com/technical-resources/articles/java/javadoc-tool.html>`__,
a special format for comments put on top of the function definition. This would be
similar to Python current docstring microsyntax conventions (but only one).
For example:
.. code:: java
/**
* Returns an Image object that can then be painted on the screen.
* The url argument must specify an absolute <a href="#{@link}">{@link URL}</a>. The name
* argument is a specifier that is relative to the url argument.
* <p>
* This method always returns immediately, whether or not the
* image exists. When this applet attempts to draw the image on
* the screen, the data will be loaded. The graphics primitives
* that draw the image will incrementally paint on the screen.
*
* @param url an absolute URL giving the base location of the image
* @param name the location of the image, relative to the url argument
* @return the image at the specified URL
* @see Image
*/
public Image getImage(URL url, String name) {
try {
return getImage(new URL(url, name));
} catch (MalformedURLException e) {
return null;
}
}
JavaScript
----------
Both JavaScript and TypeScript use a similar system to Javadoc.
JavaScript uses `JSDoc <https://jsdoc.app/>`__.
For example:
.. code:: javascript
/**
* Represents a book.
* @constructor
* @param {string} title - The title of the book.
* @param {string} author - The author of the book.
*/
function Book(title, author) {
}
TypeScript
----------
TypeScript has
`its own JSDoc reference <https://www.typescriptlang.org/docs/handbook/jsdoc-supported-types.html>`__
with some variations.
For example:
.. code:: typescript
// Parameters may be declared in a variety of syntactic forms
/**
* @param {string} p1 - A string param.
* @param {string=} p2 - An optional param (Google Closure syntax)
* @param {string} [p3] - Another optional param (JSDoc syntax).
* @param {string} [p4="test"] - An optional param with a default value
* @returns {string} This is the result
*/
function stringsStringStrings(p1, p2, p3, p4) {
// TODO
}
Rust
----
Rust uses another similar variation of a microsyntax in
`Doc comments <https://doc.rust-lang.org/rust-by-example/meta/doc.html#doc-comments>`__.
But it doesn't have a particular well defined microsyntax structure to denote what
documentation refers to what symbol/parameter other than what can be inferred from
the pure Markdown.
For example:
.. code:: rust
#![crate_name = "doc"]
/// A human being is represented here
pub struct Person {
/// A person must have a name, no matter how much Juliet may hate it
name: String,
}
impl Person {
/// Returns a person with the name given them
///
/// # Arguments
///
/// * `name` - A string slice that holds the name of the person
///
/// # Examples
///
/// ```
/// // You can have rust code between fences inside the comments
/// // If you pass --test to `rustdoc`, it will even test it for you!
/// use doc::Person;
/// let person = Person::new("name");
/// ```
pub fn new(name: &str) -> Person {
Person {
name: name.to_string(),
}
}
/// Gives a friendly hello!
///
/// Says "Hello, [name](Person::name)" to the `Person` it is called on.
pub fn hello(& self) {
println!("Hello, {}!", self.name);
}
}
fn main() {
let john = Person::new("John");
john.hello();
}
Go Lang
-------
Go also uses a form of `Doc Comments <https://go.dev/doc/comment>`__.
It doesn't have a well defined microsyntax structure to denote what documentation
refers to which symbol/parameter, but parameters can be referenced by name without
any special syntax or marker, this also means that ordinary words that could appear
in the documentation text should be avoided as parameter names.
.. code:: go
package strconv
// Quote returns a double-quoted Go string literal representing s.
// The returned string uses Go escape sequences (\t, \n, \xFF, \u0100)
// for control characters and non-printable characters as defined by IsPrint.
func Quote(s string) string {
...
}
Rejected Ideas
==============
Standardize Current Docstrings
------------------------------
A possible alternative would be to support and try to push as a standard one of the
existing docstring formats. But that would only solve the standardization.
It wouldn't solve any of the other problems derived from using a microsyntax inside
of a docstring instead of pure Python syntax, the same as described above in
the **Rationale - Summary**.
Extra Metadata and Decorator
----------------------------
Some ideas before this proposal included having a function ``doc()`` instead of
the single class ``Doc`` with several parameters to indicate whether
an object is discouraged from use, what exceptions it may raise, etc.
To allow also deprecating functions and classes, it was also expected
that ``doc()`` could be used as a decorator. But this functionality is covered
by ``typing.deprecated()`` in :pep:`702`, so it was dropped from this proposal.
A way to declare additional information could still be useful in the future,
but taking early feedback on this idea, all that was postponed to future
proposals.
This also shifted the focus from an all-encompassing function ``doc()``
with multiple parameters to a single ``Doc`` class to be used in
:py:class:`~typing.Annotated` in a way that could be composed with other
future proposals.
This design change also allows better interoperability with other proposals
like ``typing.deprecated()``, as in the future it could be considered to
allow having ``typing.deprecated()`` also in :py:class:`~typing.Annotated` to deprecate
individual parameters, coexisting with ``Doc``.
String Under Definition
-----------------------
A proposed alternative in the discussion is declaring a string under the definition
of a symbol and providing runtime access to those values:
.. code:: python
class User:
name: str
"The user's name"
age: int
"The user's age"
...
This was already proposed and rejected in :pep:`224`, mainly due to the ambiguity of
how is the string connected with the symbol it's documenting.
Additionally, there would be no way to provide runtime access to this value in previous
versions of Python.
Plain String in Annotated
-------------------------
In the discussion, it was also suggested to use a plain string inside of
:py:class:`~typing.Annotated`:
.. code:: python
from typing import Annotated
class User:
name: Annotated[str, "The user's name"]
age: Annotated[int, "The user's age"]
...
But this would create a predefined meaning for any plain string inside of
:py:class:`~typing.Annotated`, and any tool that was using plain strings in them
for any other purpose, which is currently allowed, would now be invalid.
Having an explicit ``typing.Doc`` makes it compatible with current valid uses of
:py:class:`~typing.Annotated`.
Another Annotated-Like Type
---------------------------
In the discussion it was suggested to define a new type similar to
:py:class:`~typing.Annotated`, it would take the type and a parameter with the
documentation string:
.. code:: python
from typing import Doc
class User:
name: Doc[str, "The user's name"]
age: Doc[int, "The user's age"]
...
This idea was rejected as it would only support that use case and would make it more
difficult to combine it with :py:class:`~typing.Annotated` for other purposes (
e.g. with FastAPI metadata, Pydantic fields, etc.) or adding additional metadata
apart from the documentation string (e.g. deprecation).
Transferring Documentation from Type aliases
--------------------------------------------
A previous version of this proposal specified that when type aliases declared with
:py:class:`~typing.Annotated` were used, and these type aliases were used in
annotations, the documentation string would be transferred to the annotated symbol.
For example:
.. code:: python
from typing import Annotated, Doc, TypeAlias
UserName: TypeAlias = Annotated[str, Doc("The user's name")]
def create_user(name: UserName): ...
def delete_user(name: UserName): ...
This was rejected after receiving feedback from the maintainer of one of the main
components used to provide editor support.
Shorthand with Slices
---------------------
In the discussion, it was suggested to use a shorthand with slices:
.. code:: python
is_approved: Annotated[str: "The status of a PEP."]
Although this is a very clever idea and would remove the need for a new ``Doc`` class,
runtime executing of current versions of Python don't allow it.
At runtime, :py:class:`~typing.Annotated` requires at least two arguments, and it
requires the first argument to be type, it crashes if it is a slice.
Open Issues
===========
Verbosity
---------
The main argument against this would be the increased verbosity.
If the signature was not viewed independently of the documentation and the body of the
function with the docstring was also measured, the total verbosity would be
somewhat similar, as what this proposal does is to move some of the contents
from the docstring in the body to the signature.
Considering the signature alone, without the body, they could be much longer than
they currently are, they could end up being more than one page long. In exchange,
the equivalent docstrings that currently are more than one page long would be
much shorter.
When comparing the total verbosity, including the signature and the docstring,
the main additional verbosity added by this would be from using
:py:class:`~typing.Annotated` and ``typing.Doc``. If :py:class:`~typing.Annotated`
had more usage, it could make sense to have an improved shorter syntax for it and for
the type of metadata it would carry. But that would only make sense once
:py:class:`~typing.Annotated` is more widely used.
On the other hand, this verbosity would not affect end users as they would not see the
internal code using ``typing.Doc``. The majority of users would interact with
libraries through editors without looking at the internals, and if anything, they
would have tooltips from editors supporting this proposal.
The cost of dealing with the additional verbosity would mainly be carried
by those library maintainers that use this feature.
This argument could be analogous to the argument against type annotations
in general, as they do indeed increase verbosity, in exchange for their
features. But again, as with type annotations, this would be optional and only
to be used by those that are willing to take the extra verbosity in exchange
for the benefits.
Of course, more advanced users might want to look at the source code of the libraries
and if the authors of those libraries adopted this, those advanced users would end up
having to look at that code with additional signature verbosity instead of docstring
verbosity.
Any authors that decide not to adopt it should be free to continue using docstrings
with any particular format they decide, no docstrings at all, etc.
Still, there's a high chance that library authors could receive pressure to
adopt this if it became the blessed solution.
Documentation is not Typing
---------------------------
It could also be argued that documentation is not really part of typing, or that
it should live in a different module. Or that this information should not be part
of the signature but live in another place (like the docstring).
Nevertheless, type annotations in Python could already be considered, by default,
additional metadata: they carry additional information about variables,
parameters, return types, and by default they don't have any runtime behavior. And
this proposal would add one more type of metadata to them.
It could be argued that this proposal extends the type of information that
type annotations carry, the same way as :pep:`702` extends them to include
deprecation information.
:py:class:`~typing.Annotated` was added to the standard library precisely to
support adding additional metadata to the annotations, and as the new proposed
``Doc`` class is tightly coupled to :py:class:`~typing.Annotated`, it makes
sense for it to live in the same module. If :py:class:`~typing.Annotated` was moved
to another module, it would make sense to move ``Doc`` with it.
Multiple Standards
------------------
Another argument against this would be that it would create another standard,
and that there are already several conventions for docstrings. It could
seem better to formalize one of the currently existing standards.
Nevertheless, as stated above, none of those conventions cover the general
drawbacks of a doctsring-based approach that this proposal solves naturally.
To see a list of the drawbacks of a docstring-based approach, see the section above
in the **Rationale - Summary**.
In the same way, it can be seen that, in many cases, a new standard that
takes advantage of new features and solves several problems from previous
methods can be worth having. As is the case with the new ``pyproject.toml``,
``dataclass_transform``, the new typing pipe/union (``|``) operator, and other cases.
Adoption
--------
As this is a new standard proposal, it would only make sense if it had
interest from the community.
Fortunately there's already interest from several mainstream libraries
from several developers and teams, including FastAPI, Typer, SQLModel,
Asyncer (from the author of this proposal), Pydantic, Strawberry (GraphQL), and
others.
There's also interest and support from documentation tools, like
`mkdocstrings <https://github.com/mkdocstrings/mkdocstrings>`__, which added
support even for an earlier version of this proposal.
All the CPython core developers contacted for early feedback (at least 4) have
shown interest and support for this proposal.
Editor developers (VS Code and PyCharm) have shown some interest, while showing
concerns about the signature verbosity of the proposal, although not about the
implementation (which is what would affect them the most). And they have shown
they would consider adding support for this if it were to become an
official standard. In that case, they would only need to add support for
rendering, as support for editing, which is normally non-existing for
other standards, is already there, as they already support editing standard
Python syntax.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.