From c8e245dedc0275dd2d58a34b837d346f24ecffa9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Ram=C3=ADrez?= Date: Mon, 28 Aug 2023 21:58:03 +0200 Subject: [PATCH] PEP 727: Documentation Metadata in Typing (#3310) --- .github/CODEOWNERS | 1 + pep-0727.rst | 516 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 517 insertions(+) create mode 100644 pep-0727.rst diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 53755f393..72f2bcd2c 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -602,6 +602,7 @@ pep-0721.rst @encukou pep-0722.rst @pfmoore pep-0723.rst @AA-Turner pep-0725.rst @pradyunsg +pep-0727.rst @JelleZijlstra # ... # pep-0754.txt # ... diff --git a/pep-0727.rst b/pep-0727.rst new file mode 100644 index 000000000..494f31082 --- /dev/null +++ b/pep-0727.rst @@ -0,0 +1,516 @@ +PEP: 727 +Title: Documentation Metadata in Typing +Author: Sebastián Ramírez +Sponsor: Jelle Zijlstra +Discussions-To: +Status: Draft +Type: Standards Track +Topic: Typing +Content-Type: text/x-rst +Created: 28-Aug-2023 +Python-Version: 3.13 +Post-History: + + +Abstract +======== + +This document proposes a way to complement docstrings to add additional documentation +to Python symbols using type annotations with ``Annotated`` (in class attributes, +function and method parameters, return values, and variables). + + +Motivation +========== + +The current standard method of documenting code APIs in Python is using docstrings. +But there's no standard way to document parameters in docstrings. + +There are several pseudo-standards for the format in these docstrings, and new +pseudo-standards can appear easily: numpy, Google, Keras, reST, etc. + +All these formats are some specific syntax inside a string. Because of this, when +editing those docstrings, editors can't easily provide support for autocompletion, +inline errors for broken syntax, etc. + +Editors don't have a way to support all the possible micro languages in the docstrings +and show nice user interfaces when developers use libraries with those different +formats. They could even add support for some of the syntaxes, but probably not all, +or not completely. + +Because the docstring is in a different place in the code than the actual parameters +and it requires duplication of information (the parameter name) the information about +a parameter is easily in a place in the code quite far away from the declaration of +the actual parameter. This means it's easy to refactor a function, remove a parameter, +and forget to remove its docs. The same happens when adding a new parameter: it's easy +to forget to add the docstring for it. + +Editors can't check or ensure the consistency of the parameters and their docs. + +It's not possible to programatically test these docstrings, for example to ensure +documentation consistency for the parameters across several similar functions, at +least not without implementing a custom syntax parser. + +Some of these previous formats tried to account for the lack of type annotations +in older Python versions by including typing information in the docstrings, +but now that information doesn't need to be in docstrings as there is now an official +syntax for type annotations. + + +Rationale +========= + +This proposal intends to address these shortcomings by extending and complementing the +information in docstrings, keeping backwards compatibility with existing docstrings, +and doing it in a way that leverages the Python language and structure, via type +annotations with ``Annotated``, and a new function in ``typing``. + +The reason why this would belong in the standard Python library instead of an +external package is because although the implementation would be quite trivial, +the actual power and benefit from it would come from being a standard, so that +editors and other tools could implement support for it. + +This doesn't deprecate current usage of docstrings, it's transparent to common +developers (library users), and it's only opt-in for library authors that would +like to adopt it. + + +Specification +============= + + +``typing.doc`` +-------------- + +The main proposal is to have a new function ``doc()`` in the ``typing`` module. +Even though this is not strictly related to the type annotations, it's expected +to go in ``Annotated`` type annotations, and to interact with type annotations. + +There's also the particular benefit that it could be implemented in the +``typing_extensions`` package to have support for older versions of Python and +early adopters of this proposal. + +This ``doc()`` function would receive one single parameter ``documentation`` with +a documentation string. + +This string could be a multi-line string, in which case, when extracted by tools, +should be interpreted cleaning up indentation as if using ``inspect.cleandoc()``, +the same procedure used for docstrings. + +This string could probably contain markup, like Markdown or reST. As that could +be highly debated, that decision is left for a future proposal, to focus here +on the main functionality. + +This specification targets static analysis tools and editors, and as such, the +value passed to ``doc()`` should allow static evaluation and analysis. If a +developer passes as the value something that requires runtime execution +(e.g. a function call) the behavior of static analysis tools is unspecified +and they could omit it from their process and results. For static analysis +tools to be conformant with this specification they need only to support +statically accessible values. + +An example documenting the attributes of a class, or in this case, the keys +of a ``TypedDict``, could look like this: + +.. code-block:: + + from typing import Annotated, TypedDict, NotRequired, doc + + + class User(TypedDict): + firstname: Annotated[str, doc("The user's first name")] + lastname: Annotated[str, doc("The user's last name")] + + +An example documenting the parameters of a function could look like this: + +.. code-block:: + + from typing import Annotated, doc + + + def create_user( + lastname: Annotated[str, doc("The **last name** of the newly created user")], + firstname: Annotated[str | None, doc("The user's **first name**")] = None, + ) -> Annotated[User, doc("The created user after saving in the database")]: + """ + Create a new user in the system, it needs the database connection to be already + initialized. + """ + pass + + +The return of the ``doc()`` function is an instance of a class that can be checked +and used at runtime, defined similar to: + +.. code-block:: + + class DocInfo: + def __init__(self, documentation: str): + self.documentation = documentation + +...where the attribute ``documentation`` contains the same value string passed to +the function ``doc()``. + + +Additional Scenarios +-------------------- + +The main scenarios that this proposal intends to cover are described above, and +for implementers to be conformant to this specification, they only need to support +those scenarios described above. + +Here are some additional edge case scenarios with their respective considerations, +but implementers are not required to support them. + + +Type Alias +---------- + +When creating a type alias, like: + +.. code-block:: + + Username = Annotated[str, doc("The name of a user in the system")] + + +...the documentation would be considered to be carried by the parameter annotated +with ``Username``. + +So, in a function like: + +.. code-block:: + + def hi( + to: Username, + ) -> None: ... + + +...it would be equivalent to: + +.. code-block:: + + def hi( + to: Annotated[str, doc("The name of a user in the system")], + ) -> None: ... + +Nevertheless, implementers would not be required to support type aliases outside +of the final type annotation to be conformant with this specification, as it +could require more complex dereferencing logic. + + +Annotating Type Parameters +-------------------------- + +When annotating type parameters, as in: + +.. code-block:: + + def hi( + to: list[Annotated[str, doc("The name of a user in a list")]], + ) -> None: ... + +...the documentation in ``doc()`` would refer to what it is annotating, in this +case, each item in the list, not the list itself. + +There are currently no practical use cases for documenting type parameters, +so implementers are not required to support this scenario to be considered +conformant, but it's included for completeness. + + +Annotating Unions +----------------- + +If used in one of the parameters of a union, as in: + +.. code-block:: + + def hi( + to: str | Annotated[list[str], doc("List of user names")], + ) -> None: ... + +...again, the documentation in ``doc()`` would refer to what it is annotating, +in this case, this documents the list itself, not its items. + +In particular, the documentation would not refer to a single string passed as a +parameter, only to a list. + +There are currently no practical use cases for documenting unions, so implementers +are not required to support this scenario to be considered conformant, but it's +included for completeness. + + +Nested ``Annotated`` +-------------------- + +Continuing with the same idea above, if ``Annotated`` was used nested and used +multiple times in the same parameter, ``doc()`` would refer to the type it +is annotating. + +So, in an example like: + +.. code-block:: + + def hi( + to: Annotated[ + Annotated[str, doc("A user name")] | Annotated[list, doc("A list of user names")], + doc("Who to say hi to"), + ], + ) -> None: ... + + +The documentation for the whole parameter ``to`` would be considered to be +"``Who to say hi to``". + +The documentation for the case where that parameter ``to`` is specifically a ``str`` +would be considered to be "``A user name``". + +The documentation for the case where that parameter ``to`` is specifically a +``list`` would be considered to be "``A list of user names``". + +Implementers would only be required to support the top level use case, where the +documentation for ``to`` is considered to be "``Who to say hi to``". +They could optionally support having conditional documentation for when the type +of the parameter passed is of one type or another, but they are not required to do so. + + +Duplication +----------- + +If ``doc()`` is used multiple times in a single ``Annotated``, it would be +considered invalid usage from the developer, for example: + +.. code-block:: + + def hi( + to: Annotated[str, doc("A user name"), doc("The current user name")], + ) -> None: ... + + +Implementers can consider this invalid and are not required to support this to be +considered conformant. + +Nevertheless, as it might be difficult to enforce it on developers, implementers +can opt to support one of the ``doc()`` declarations. + +In that case, the suggestion would be to support the last one, just because +this would support overriding, for example, in: + +.. code-block:: + + User = Annotated[str, doc("A user name")] + + CurrentUser = Annotated[User, doc("The current user name")] + + +Internally, in Python, ``CurrentUser`` here is equivalent to: + +.. code-block:: + + CurrentUser = Annotated[str, doc("A user name"), doc("The current user name")] + + +For an implementation that supports the last ``doc()`` appearance, the above +example would be equivalent to: + +.. code-block:: + + def hi( + to: Annotated[str, doc("The current user name")], + ) -> None: ... + + +Early Adopters and Older Python Versions +======================================== + +For older versions of Python and early adopters of this proposal, ``doc()`` and +``DocInfo`` can be imported from the ``typing_extensions`` package. + +.. code-block:: + + from typing import Annotated + + from typing_extensions import doc + + + def hi( + to: Annotated[str, doc("The current user name")], + ) -> None: ... + + +Rejected Ideas +============== + + +Standardize Current Docstrings +------------------------------ + +A possible alternative would be to support and try to push as a standard one of the +existing docstring formats. But that would only solve the standardization. + +It wouldn't solve any of the other problems, like getting editor support +(syntax checks) for library authors, the distance and duplication of information +between a parameter definition and its documentation in the docstring, etc. + + +Extra Metadata and Decorator +---------------------------- + +An earlier version of this proposal included several parameters to indicate whether +an object is discouraged from use, what exceptions it may raise, etc. +To allow also deprecating functions and classes, it was also expected +that ``doc()`` could be used as a decorator. But this functionality is covered +by ``typing.deprecated()`` in :pep:`702`, so it was dropped from this proposal. + +A way to declare additional information could still be useful in the future, +but taking early feedback on this document, all that was postponed to future +proposals. + +This also shifts the focus from an all-encompasing function ``doc()`` +with multiple parameters to multiple composable functions, having ``doc()`` +handle one single use case: additional documentation in ``Annotated``. + +This design change also allows better interoperability with other proposals +like ``typing.deprecated()``, as in the future it could be considered to +allow having ``typing.deprecated()`` also in ``Annotated`` to deprecate +individual parameters, coexisting with ``doc()``. + + +Open Issues +=========== + + +Verbosity +--------- + +The main argument against this would be the increased verbosity. + +Nevertheless, this verbosity would not affect end users as they would not see the +internal code using ``typing.doc()``. + +And the cost of dealing with the additional verbosity would only be carried +by those library maintainers that decide to opt-in into this feature. + +Any authors that decide not to adopt it, are free to continue using docstrings +with any particular format they decide, no docstrings at all, etc. + +This argument could be analogous to the argument against type annotations +in general, as they do indeed increase verbosity, in exchange for their +features. But again, as with type annotations, this would be optional and only +to be used by those that are willing to take the extra verbosity in exchange +for the benefits. + + +Doc is not Typing +----------------- + +It could also be argued that documentation is not really part of typing, or that +it should live in a different module. Or that this information should not be part +of the signature but live in another place (like the docstring). + +Nevertheless, type annotations in Python could already be considered, by default, +mainly documentation: they carry additional information about variables, +parameters, return types, and by default they don't have any runtime behavior. + +It could be argued that this proposal extends the type of information that +type annotations carry, the same way as :pep:`702` extends them to include +deprecation information. + +And as described above, including this in ``typing_extensions`` to support older +versions of Python would have a very simple and practical benefit. + + +Multiple Standards +------------------ + +Another argument against this would be that it would create another standard, +and that there are already several pseudo-standards for docstrings. It could +seem better to formalize one of the currently existing standards. + +Nevertheless, as stated above, none of those standards cover the general +drawbacks of a doctsring-based approach that this proposal solves naturally. + +None of the editors have full docstring editing support (even when they have +rendering support). Again, this is solved by this proposal just by using +standard Python syntax and structures instead of a docstring microsyntax. + +The effort required to implement support for this proposal by tools would +be minimal compared to that required for alternative docstring-based +pseudo-standards, as for this proposal, editors would only need to +access an already existing value in their ASTs, instead of writing a parser +for a new string microsyntax. + +In the same way, it can be seen that, in many cases, a new standard that +takes advantage of new features and solves several problems from previous +methods can be worth having. As is the case with the new ``pyproject.toml``, +``dataclass_transform``, the new typing pipe/union (``|``) operator, and other cases. + + +Adoption +-------- + +As this is a new standard proposal, it would only make sense if it had +interest from the community. + +Fortunately there's already interest from several mainstream libraries +from several developers and teams, including FastAPI, Typer, SQLModel, +Asyncer (from the author of this proposal), Pydantic, Strawberry, and others, +from other teams. + +There's also interest and support from documentation tools, like +`mkdocstrings `__, which added +support even for an earlier version of this proposal. + +All the CPython core developers contacted for early feedback (at least 4) have +shown interest and support for this proposal. + +Editor developers (VS Code and PyCharm) have shown some interest, while showing +concerns about the verbosity of the proposal, although not about the +implementation (which is what would affect them the most). And they have shown +they would consider adding support for this if it were to become an +official standard. In that case, they would only need to add support for +rendering, as support for editing, which is normally non-existing for +other standards, is already there, as they already support editing standard +Python syntax. + + +Bike Shedding +------------- + +I think ``doc()`` is a good name for the main function. But it might make sense +to consider changing the names for the other parts. + +The returned class containing info currently named ``DocInfo`` could instead +be named just ``Doc``. Although it could make verbal conversations more +confusing as it's the same word as the name of the function. + +The parameter received by ``doc()`` currently named ``documentation`` could +instead be named also ``doc``, but it would make it more ambiguous in +discussions to distinguish when talking about the function and the parameter, +although it would simplify the amount of terms, but as these terms refer to +different things closely related, it could make sense to have different names. + +The parameter received by ``doc()`` currently named ``documentation`` could +instead be named ``value``, but the word "documentation" might convey +the meaning better. + +The parameter received by ``doc()`` currently named ``documentation`` could be a +position-only parameter, in which case the name wouldn't matter much. But then +there wouldn't be a way to make it match with the ``DocInfo`` attribute. + +The ``DocInfo`` class has a single attribute ``documentation``, this name matches +the parameter passed to ``doc()``. It could be named something different, +like ``doc``, but this would mean a mismatch between the ``doc()`` parameter +``documentation`` and the equivalent attribute ``doc``, and it would mean that in +one case (in the function), the term ``doc`` refers to a function, and in the +other case (the resulting class) the term ``doc`` refers to a string value. + +This shows the logic to select the current terms, but it could all be +discussed further. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.