PEP 727: Documentation Metadata in Typing (#3310)
This commit is contained in:
parent
e41aba6009
commit
c8e245dedc
|
@ -602,6 +602,7 @@ pep-0721.rst @encukou
|
|||
pep-0722.rst @pfmoore
|
||||
pep-0723.rst @AA-Turner
|
||||
pep-0725.rst @pradyunsg
|
||||
pep-0727.rst @JelleZijlstra
|
||||
# ...
|
||||
# pep-0754.txt
|
||||
# ...
|
||||
|
|
|
@ -0,0 +1,516 @@
|
|||
PEP: 727
|
||||
Title: Documentation Metadata in Typing
|
||||
Author: Sebastián Ramírez <tiangolo@gmail.com>
|
||||
Sponsor: Jelle Zijlstra <jelle.zijlstra@gmail.com>
|
||||
Discussions-To:
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Topic: Typing
|
||||
Content-Type: text/x-rst
|
||||
Created: 28-Aug-2023
|
||||
Python-Version: 3.13
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This document proposes a way to complement docstrings to add additional documentation
|
||||
to Python symbols using type annotations with ``Annotated`` (in class attributes,
|
||||
function and method parameters, return values, and variables).
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
The current standard method of documenting code APIs in Python is using docstrings.
|
||||
But there's no standard way to document parameters in docstrings.
|
||||
|
||||
There are several pseudo-standards for the format in these docstrings, and new
|
||||
pseudo-standards can appear easily: numpy, Google, Keras, reST, etc.
|
||||
|
||||
All these formats are some specific syntax inside a string. Because of this, when
|
||||
editing those docstrings, editors can't easily provide support for autocompletion,
|
||||
inline errors for broken syntax, etc.
|
||||
|
||||
Editors don't have a way to support all the possible micro languages in the docstrings
|
||||
and show nice user interfaces when developers use libraries with those different
|
||||
formats. They could even add support for some of the syntaxes, but probably not all,
|
||||
or not completely.
|
||||
|
||||
Because the docstring is in a different place in the code than the actual parameters
|
||||
and it requires duplication of information (the parameter name) the information about
|
||||
a parameter is easily in a place in the code quite far away from the declaration of
|
||||
the actual parameter. This means it's easy to refactor a function, remove a parameter,
|
||||
and forget to remove its docs. The same happens when adding a new parameter: it's easy
|
||||
to forget to add the docstring for it.
|
||||
|
||||
Editors can't check or ensure the consistency of the parameters and their docs.
|
||||
|
||||
It's not possible to programatically test these docstrings, for example to ensure
|
||||
documentation consistency for the parameters across several similar functions, at
|
||||
least not without implementing a custom syntax parser.
|
||||
|
||||
Some of these previous formats tried to account for the lack of type annotations
|
||||
in older Python versions by including typing information in the docstrings,
|
||||
but now that information doesn't need to be in docstrings as there is now an official
|
||||
syntax for type annotations.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
This proposal intends to address these shortcomings by extending and complementing the
|
||||
information in docstrings, keeping backwards compatibility with existing docstrings,
|
||||
and doing it in a way that leverages the Python language and structure, via type
|
||||
annotations with ``Annotated``, and a new function in ``typing``.
|
||||
|
||||
The reason why this would belong in the standard Python library instead of an
|
||||
external package is because although the implementation would be quite trivial,
|
||||
the actual power and benefit from it would come from being a standard, so that
|
||||
editors and other tools could implement support for it.
|
||||
|
||||
This doesn't deprecate current usage of docstrings, it's transparent to common
|
||||
developers (library users), and it's only opt-in for library authors that would
|
||||
like to adopt it.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
|
||||
``typing.doc``
|
||||
--------------
|
||||
|
||||
The main proposal is to have a new function ``doc()`` in the ``typing`` module.
|
||||
Even though this is not strictly related to the type annotations, it's expected
|
||||
to go in ``Annotated`` type annotations, and to interact with type annotations.
|
||||
|
||||
There's also the particular benefit that it could be implemented in the
|
||||
``typing_extensions`` package to have support for older versions of Python and
|
||||
early adopters of this proposal.
|
||||
|
||||
This ``doc()`` function would receive one single parameter ``documentation`` with
|
||||
a documentation string.
|
||||
|
||||
This string could be a multi-line string, in which case, when extracted by tools,
|
||||
should be interpreted cleaning up indentation as if using ``inspect.cleandoc()``,
|
||||
the same procedure used for docstrings.
|
||||
|
||||
This string could probably contain markup, like Markdown or reST. As that could
|
||||
be highly debated, that decision is left for a future proposal, to focus here
|
||||
on the main functionality.
|
||||
|
||||
This specification targets static analysis tools and editors, and as such, the
|
||||
value passed to ``doc()`` should allow static evaluation and analysis. If a
|
||||
developer passes as the value something that requires runtime execution
|
||||
(e.g. a function call) the behavior of static analysis tools is unspecified
|
||||
and they could omit it from their process and results. For static analysis
|
||||
tools to be conformant with this specification they need only to support
|
||||
statically accessible values.
|
||||
|
||||
An example documenting the attributes of a class, or in this case, the keys
|
||||
of a ``TypedDict``, could look like this:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from typing import Annotated, TypedDict, NotRequired, doc
|
||||
|
||||
|
||||
class User(TypedDict):
|
||||
firstname: Annotated[str, doc("The user's first name")]
|
||||
lastname: Annotated[str, doc("The user's last name")]
|
||||
|
||||
|
||||
An example documenting the parameters of a function could look like this:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from typing import Annotated, doc
|
||||
|
||||
|
||||
def create_user(
|
||||
lastname: Annotated[str, doc("The **last name** of the newly created user")],
|
||||
firstname: Annotated[str | None, doc("The user's **first name**")] = None,
|
||||
) -> Annotated[User, doc("The created user after saving in the database")]:
|
||||
"""
|
||||
Create a new user in the system, it needs the database connection to be already
|
||||
initialized.
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
The return of the ``doc()`` function is an instance of a class that can be checked
|
||||
and used at runtime, defined similar to:
|
||||
|
||||
.. code-block::
|
||||
|
||||
class DocInfo:
|
||||
def __init__(self, documentation: str):
|
||||
self.documentation = documentation
|
||||
|
||||
...where the attribute ``documentation`` contains the same value string passed to
|
||||
the function ``doc()``.
|
||||
|
||||
|
||||
Additional Scenarios
|
||||
--------------------
|
||||
|
||||
The main scenarios that this proposal intends to cover are described above, and
|
||||
for implementers to be conformant to this specification, they only need to support
|
||||
those scenarios described above.
|
||||
|
||||
Here are some additional edge case scenarios with their respective considerations,
|
||||
but implementers are not required to support them.
|
||||
|
||||
|
||||
Type Alias
|
||||
----------
|
||||
|
||||
When creating a type alias, like:
|
||||
|
||||
.. code-block::
|
||||
|
||||
Username = Annotated[str, doc("The name of a user in the system")]
|
||||
|
||||
|
||||
...the documentation would be considered to be carried by the parameter annotated
|
||||
with ``Username``.
|
||||
|
||||
So, in a function like:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: Username,
|
||||
) -> None: ...
|
||||
|
||||
|
||||
...it would be equivalent to:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: Annotated[str, doc("The name of a user in the system")],
|
||||
) -> None: ...
|
||||
|
||||
Nevertheless, implementers would not be required to support type aliases outside
|
||||
of the final type annotation to be conformant with this specification, as it
|
||||
could require more complex dereferencing logic.
|
||||
|
||||
|
||||
Annotating Type Parameters
|
||||
--------------------------
|
||||
|
||||
When annotating type parameters, as in:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: list[Annotated[str, doc("The name of a user in a list")]],
|
||||
) -> None: ...
|
||||
|
||||
...the documentation in ``doc()`` would refer to what it is annotating, in this
|
||||
case, each item in the list, not the list itself.
|
||||
|
||||
There are currently no practical use cases for documenting type parameters,
|
||||
so implementers are not required to support this scenario to be considered
|
||||
conformant, but it's included for completeness.
|
||||
|
||||
|
||||
Annotating Unions
|
||||
-----------------
|
||||
|
||||
If used in one of the parameters of a union, as in:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: str | Annotated[list[str], doc("List of user names")],
|
||||
) -> None: ...
|
||||
|
||||
...again, the documentation in ``doc()`` would refer to what it is annotating,
|
||||
in this case, this documents the list itself, not its items.
|
||||
|
||||
In particular, the documentation would not refer to a single string passed as a
|
||||
parameter, only to a list.
|
||||
|
||||
There are currently no practical use cases for documenting unions, so implementers
|
||||
are not required to support this scenario to be considered conformant, but it's
|
||||
included for completeness.
|
||||
|
||||
|
||||
Nested ``Annotated``
|
||||
--------------------
|
||||
|
||||
Continuing with the same idea above, if ``Annotated`` was used nested and used
|
||||
multiple times in the same parameter, ``doc()`` would refer to the type it
|
||||
is annotating.
|
||||
|
||||
So, in an example like:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: Annotated[
|
||||
Annotated[str, doc("A user name")] | Annotated[list, doc("A list of user names")],
|
||||
doc("Who to say hi to"),
|
||||
],
|
||||
) -> None: ...
|
||||
|
||||
|
||||
The documentation for the whole parameter ``to`` would be considered to be
|
||||
"``Who to say hi to``".
|
||||
|
||||
The documentation for the case where that parameter ``to`` is specifically a ``str``
|
||||
would be considered to be "``A user name``".
|
||||
|
||||
The documentation for the case where that parameter ``to`` is specifically a
|
||||
``list`` would be considered to be "``A list of user names``".
|
||||
|
||||
Implementers would only be required to support the top level use case, where the
|
||||
documentation for ``to`` is considered to be "``Who to say hi to``".
|
||||
They could optionally support having conditional documentation for when the type
|
||||
of the parameter passed is of one type or another, but they are not required to do so.
|
||||
|
||||
|
||||
Duplication
|
||||
-----------
|
||||
|
||||
If ``doc()`` is used multiple times in a single ``Annotated``, it would be
|
||||
considered invalid usage from the developer, for example:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: Annotated[str, doc("A user name"), doc("The current user name")],
|
||||
) -> None: ...
|
||||
|
||||
|
||||
Implementers can consider this invalid and are not required to support this to be
|
||||
considered conformant.
|
||||
|
||||
Nevertheless, as it might be difficult to enforce it on developers, implementers
|
||||
can opt to support one of the ``doc()`` declarations.
|
||||
|
||||
In that case, the suggestion would be to support the last one, just because
|
||||
this would support overriding, for example, in:
|
||||
|
||||
.. code-block::
|
||||
|
||||
User = Annotated[str, doc("A user name")]
|
||||
|
||||
CurrentUser = Annotated[User, doc("The current user name")]
|
||||
|
||||
|
||||
Internally, in Python, ``CurrentUser`` here is equivalent to:
|
||||
|
||||
.. code-block::
|
||||
|
||||
CurrentUser = Annotated[str, doc("A user name"), doc("The current user name")]
|
||||
|
||||
|
||||
For an implementation that supports the last ``doc()`` appearance, the above
|
||||
example would be equivalent to:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def hi(
|
||||
to: Annotated[str, doc("The current user name")],
|
||||
) -> None: ...
|
||||
|
||||
|
||||
Early Adopters and Older Python Versions
|
||||
========================================
|
||||
|
||||
For older versions of Python and early adopters of this proposal, ``doc()`` and
|
||||
``DocInfo`` can be imported from the ``typing_extensions`` package.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from typing import Annotated
|
||||
|
||||
from typing_extensions import doc
|
||||
|
||||
|
||||
def hi(
|
||||
to: Annotated[str, doc("The current user name")],
|
||||
) -> None: ...
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
|
||||
Standardize Current Docstrings
|
||||
------------------------------
|
||||
|
||||
A possible alternative would be to support and try to push as a standard one of the
|
||||
existing docstring formats. But that would only solve the standardization.
|
||||
|
||||
It wouldn't solve any of the other problems, like getting editor support
|
||||
(syntax checks) for library authors, the distance and duplication of information
|
||||
between a parameter definition and its documentation in the docstring, etc.
|
||||
|
||||
|
||||
Extra Metadata and Decorator
|
||||
----------------------------
|
||||
|
||||
An earlier version of this proposal included several parameters to indicate whether
|
||||
an object is discouraged from use, what exceptions it may raise, etc.
|
||||
To allow also deprecating functions and classes, it was also expected
|
||||
that ``doc()`` could be used as a decorator. But this functionality is covered
|
||||
by ``typing.deprecated()`` in :pep:`702`, so it was dropped from this proposal.
|
||||
|
||||
A way to declare additional information could still be useful in the future,
|
||||
but taking early feedback on this document, all that was postponed to future
|
||||
proposals.
|
||||
|
||||
This also shifts the focus from an all-encompasing function ``doc()``
|
||||
with multiple parameters to multiple composable functions, having ``doc()``
|
||||
handle one single use case: additional documentation in ``Annotated``.
|
||||
|
||||
This design change also allows better interoperability with other proposals
|
||||
like ``typing.deprecated()``, as in the future it could be considered to
|
||||
allow having ``typing.deprecated()`` also in ``Annotated`` to deprecate
|
||||
individual parameters, coexisting with ``doc()``.
|
||||
|
||||
|
||||
Open Issues
|
||||
===========
|
||||
|
||||
|
||||
Verbosity
|
||||
---------
|
||||
|
||||
The main argument against this would be the increased verbosity.
|
||||
|
||||
Nevertheless, this verbosity would not affect end users as they would not see the
|
||||
internal code using ``typing.doc()``.
|
||||
|
||||
And the cost of dealing with the additional verbosity would only be carried
|
||||
by those library maintainers that decide to opt-in into this feature.
|
||||
|
||||
Any authors that decide not to adopt it, are free to continue using docstrings
|
||||
with any particular format they decide, no docstrings at all, etc.
|
||||
|
||||
This argument could be analogous to the argument against type annotations
|
||||
in general, as they do indeed increase verbosity, in exchange for their
|
||||
features. But again, as with type annotations, this would be optional and only
|
||||
to be used by those that are willing to take the extra verbosity in exchange
|
||||
for the benefits.
|
||||
|
||||
|
||||
Doc is not Typing
|
||||
-----------------
|
||||
|
||||
It could also be argued that documentation is not really part of typing, or that
|
||||
it should live in a different module. Or that this information should not be part
|
||||
of the signature but live in another place (like the docstring).
|
||||
|
||||
Nevertheless, type annotations in Python could already be considered, by default,
|
||||
mainly documentation: they carry additional information about variables,
|
||||
parameters, return types, and by default they don't have any runtime behavior.
|
||||
|
||||
It could be argued that this proposal extends the type of information that
|
||||
type annotations carry, the same way as :pep:`702` extends them to include
|
||||
deprecation information.
|
||||
|
||||
And as described above, including this in ``typing_extensions`` to support older
|
||||
versions of Python would have a very simple and practical benefit.
|
||||
|
||||
|
||||
Multiple Standards
|
||||
------------------
|
||||
|
||||
Another argument against this would be that it would create another standard,
|
||||
and that there are already several pseudo-standards for docstrings. It could
|
||||
seem better to formalize one of the currently existing standards.
|
||||
|
||||
Nevertheless, as stated above, none of those standards cover the general
|
||||
drawbacks of a doctsring-based approach that this proposal solves naturally.
|
||||
|
||||
None of the editors have full docstring editing support (even when they have
|
||||
rendering support). Again, this is solved by this proposal just by using
|
||||
standard Python syntax and structures instead of a docstring microsyntax.
|
||||
|
||||
The effort required to implement support for this proposal by tools would
|
||||
be minimal compared to that required for alternative docstring-based
|
||||
pseudo-standards, as for this proposal, editors would only need to
|
||||
access an already existing value in their ASTs, instead of writing a parser
|
||||
for a new string microsyntax.
|
||||
|
||||
In the same way, it can be seen that, in many cases, a new standard that
|
||||
takes advantage of new features and solves several problems from previous
|
||||
methods can be worth having. As is the case with the new ``pyproject.toml``,
|
||||
``dataclass_transform``, the new typing pipe/union (``|``) operator, and other cases.
|
||||
|
||||
|
||||
Adoption
|
||||
--------
|
||||
|
||||
As this is a new standard proposal, it would only make sense if it had
|
||||
interest from the community.
|
||||
|
||||
Fortunately there's already interest from several mainstream libraries
|
||||
from several developers and teams, including FastAPI, Typer, SQLModel,
|
||||
Asyncer (from the author of this proposal), Pydantic, Strawberry, and others,
|
||||
from other teams.
|
||||
|
||||
There's also interest and support from documentation tools, like
|
||||
`mkdocstrings <https://github.com/mkdocstrings/mkdocstrings>`__, which added
|
||||
support even for an earlier version of this proposal.
|
||||
|
||||
All the CPython core developers contacted for early feedback (at least 4) have
|
||||
shown interest and support for this proposal.
|
||||
|
||||
Editor developers (VS Code and PyCharm) have shown some interest, while showing
|
||||
concerns about the verbosity of the proposal, although not about the
|
||||
implementation (which is what would affect them the most). And they have shown
|
||||
they would consider adding support for this if it were to become an
|
||||
official standard. In that case, they would only need to add support for
|
||||
rendering, as support for editing, which is normally non-existing for
|
||||
other standards, is already there, as they already support editing standard
|
||||
Python syntax.
|
||||
|
||||
|
||||
Bike Shedding
|
||||
-------------
|
||||
|
||||
I think ``doc()`` is a good name for the main function. But it might make sense
|
||||
to consider changing the names for the other parts.
|
||||
|
||||
The returned class containing info currently named ``DocInfo`` could instead
|
||||
be named just ``Doc``. Although it could make verbal conversations more
|
||||
confusing as it's the same word as the name of the function.
|
||||
|
||||
The parameter received by ``doc()`` currently named ``documentation`` could
|
||||
instead be named also ``doc``, but it would make it more ambiguous in
|
||||
discussions to distinguish when talking about the function and the parameter,
|
||||
although it would simplify the amount of terms, but as these terms refer to
|
||||
different things closely related, it could make sense to have different names.
|
||||
|
||||
The parameter received by ``doc()`` currently named ``documentation`` could
|
||||
instead be named ``value``, but the word "documentation" might convey
|
||||
the meaning better.
|
||||
|
||||
The parameter received by ``doc()`` currently named ``documentation`` could be a
|
||||
position-only parameter, in which case the name wouldn't matter much. But then
|
||||
there wouldn't be a way to make it match with the ``DocInfo`` attribute.
|
||||
|
||||
The ``DocInfo`` class has a single attribute ``documentation``, this name matches
|
||||
the parameter passed to ``doc()``. It could be named something different,
|
||||
like ``doc``, but this would mean a mismatch between the ``doc()`` parameter
|
||||
``documentation`` and the equivalent attribute ``doc``, and it would mean that in
|
||||
one case (in the function), the term ``doc`` refers to a function, and in the
|
||||
other case (the resulting class) the term ``doc`` refers to a string value.
|
||||
|
||||
This shows the logic to select the current terms, but it could all be
|
||||
discussed further.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
Loading…
Reference in New Issue