PEP: 649
Title: Deferred Evaluation Of Annotations Using Descriptors
Author: Larry Hastings <larry@hastings.org>
Status: Draft
Type: Standards Track
Topic: Typing
Content-Type: text/x-rst
Created: 11-Jan-2021
Post-History: 11-Jan-2021, 11-Apr-2021, 19-Apr-2023
********
Abstract
********
Annotations are a Python technology that allows expressing
type information and other metadata about Python functions,
classes, and modules. But Python's original semantics
for annotations required them to be eagerly evaluated,
at the time the annotated object was bound. This caused
chronic problems for static type analysis users using
"type hints", due to forward-reference and circular-reference
problems.
Python solved this by accepting :pep:`563`, incorporating
a new approach called "stringized annotations" in which
annotations were automatically converted into strings by
Python. This solved the forward-reference and circular-reference
problems, and also fostered intriguing new uses for annotation
metadata. But stringized annotations in turn caused chronic
problems for runtime users of annotations.
This PEP proposes a new and comprehensive third approach
for representing and computing annotations. It adds a new
internal mechanism for lazily computing annotations on demand,
via a new object method called ``__annotate__``.
This approach, when combined with a novel technique for
coercing annotation values into alternative formats, solves
all the above problems, supports all existing use cases,
and should foster future innovations in annotations.
********
Overview
********
This PEP adds a new dunder attribute to the objects that
support annotations--functions, classes, and modules.
The new attribute is called ``__annotate__``, and is
a reference to a function which computes and returns
that object's annotations dict.
At compile time, if the definition of an object includes
annotations, the Python compiler will write the expressions
computing the annotations into its own function. When run,
the function will return the annotations dict. The Python
compiler then stores a reference to this function in
``__annotate__`` on the object.
Furthermore, ``__annotations__`` is redefined to be a
"data descriptor" which calls this annotation function once
and caches the result.
This mechanism delays the evaluation of annotation expressions
until the annotations are examined, which solves many circular
reference problems.
This PEP also defines new functionality for two functions
in the Python standard library:
``inspect.get_annotations`` and ``typing.get_type_hints``.
The functionality is accessed via a new keyword-only parameter,
``format``. ``format`` allows the user to request
the annotations from these functions
in a specific format.
Format identifiers are always predefined integer values.
The formats defined by this PEP are:
* ``inspect.VALUE = 1``
The default value.
The function will return the conventional Python
values for the annotations. This format is identical
to the return value for these functions under Python 3.11.
* ``inspect.FORWARDREF = 2``
The function will attempt to return the conventional
Python values for the annotations. However, if it
encounters an undefined name, or a free variable that
has not yet been associated with a value, it dynamically
creates a proxy object (a ``ForwardRef``) that substitutes
for that value in the expression, then continues evaluation.
The resulting dict may contain a mixture of proxies and
real values. If all real values are defined at the time
the function is called, ``inspect.FORWARDREF`` and
``inspect.VALUE`` produce identical results.
* ``inspect.SOURCE = 3``
The function will produce an annotation dictionary
where the values have been replaced by strings containing
the original source code for the annotation expressions.
These strings may only be approximate, as they may be
reverse-engineered from another format, rather than
preserving the original source code, but the differences
will be minor.
If accepted, this PEP would *supersede* :pep:`563`,
and :pep:`563`'s behavior would be deprecated and
eventually removed.
Comparison Of Annotation Semantics
==================================
.. note:: The code presented in this section is simplified
   for clarity, and is intentionally inaccurate in some
   critical aspects. This example is intended merely to
   communicate the high-level concepts involved without
   getting lost in the details. But readers should note
   that the actual implementation is quite different in
   several important ways. See the Implementation_
   section later in this PEP for a far more accurate
   description of what this PEP proposes from a technical
   level.

Consider this example code:

.. code-block::

    def foo(x: int = 3, y: MyType = None) -> float:
        ...

    class MyType:
        ...

    foo_y_annotation = foo.__annotations__['y']

As we see here, annotations are available at runtime through an
``__annotations__`` attribute on functions, classes, and modules.
When annotations are specified on one of these objects,
``__annotations__`` is a dictionary mapping the names of the
fields to the value specified as that field's annotation.
The default behavior in Python is to evaluate the expressions
for the annotations, and build the annotations dict, at the time
the function, class, or module is bound. At runtime the above
code actually works something like this:

.. code-block::

    annotations = {'x': int, 'y': MyType, 'return': float}

    def foo(x = 3, y = None):
        ...

    foo.__annotations__ = annotations

    class MyType:
        ...

    foo_y_annotation = foo.__annotations__['y']

The crucial detail here is that the values ``int``, ``MyType``,
and ``float`` are looked up at the time the function object is
bound, and these values are stored in the annotations dict.
But this code doesn't run—it throws a ``NameError`` on the first
line, because ``MyType`` hasn't been defined yet.
:pep:`563`'s solution is to decompile the expressions back
into strings during compilation and store those strings as the
values in the annotations dict. The equivalent runtime code
would look something like this:

.. code-block::

    annotations = {'x': 'int', 'y': 'MyType', 'return': 'float'}

    def foo(x = 3, y = None):
        ...

    foo.__annotations__ = annotations

    class MyType:
        ...

    foo_y_annotation = foo.__annotations__['y']

This code now runs successfully. However, ``foo_y_annotation``
is no longer a reference to ``MyType``, it is the *string*
``'MyType'``. To turn the string into the real value ``MyType``,
the user would need to evaluate the string using ``eval``,
``inspect.get_annotations``, or ``typing.get_type_hints``.
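
As a rough illustration of that last step, the sketch below hand-writes
string annotations (standing in for what :pep:`563` would generate) and
turns one back into a real value in two ways. The parameter defaults are
dropped for brevity, and the names ``raw_y``, ``resolved_y`` and ``hints``
are chosen here purely for illustration.

.. code-block:: python

    import typing


    def foo(x: "int", y: "MyType") -> "float":
        ...


    class MyType:
        ...


    # The stored annotation is only a string.
    raw_y = foo.__annotations__["y"]

    # Option 1: evaluate the string by hand against the function's globals.
    resolved_y = eval(raw_y, foo.__globals__)

    # Option 2: let typing.get_type_hints resolve every annotation at once.
    hints = typing.get_type_hints(foo)

Both options only work here because ``MyType`` is defined by the time the
string is evaluated.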
This PEP proposes a third approach, delaying the evaluation of
the annotations by computing them in their own function. If
this PEP were active, the generated code would work something
like this:

.. code-block::

    class function:
        # __annotations__ on a function object is already a
        # "data descriptor" in Python, we're just changing
        # what it does
        @property
        def __annotations__(self):
            return self.__annotate__()

    # ...

    def annotate_foo():
        return {'x': int, 'y': MyType, 'return': float}

    def foo(x = 3, y = None):
        ...

    foo.__annotate__ = annotate_foo

    class MyType:
        ...

    foo_y_annotation = foo.__annotations__['y']

The important change is that the code constructing the
annotations dict now lives in a function—here, called
``annotate_foo()``. But this function isn't called
until we ask for the value of ``foo.__annotations__``,
and we don't do that until *after* the definition of ``MyType``.
So this code also runs successfully, and ``foo_y_annotation`` now
has the correct value--the class ``MyType``--even though
``MyType`` wasn't defined until *after* the annotation was
defined.
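
The timing argument can be checked with ordinary Python, without any of
the proposed machinery. The following is a minimal sketch in which
``annotate_foo`` is called by hand, once before and once after ``MyType``
exists; ``early_error`` and ``late_result`` are illustrative names.

.. code-block:: python

    def annotate_foo():
        # Names in the body are looked up only when the function runs.
        return {"x": int, "y": MyType, "return": float}


    try:
        annotate_foo()
        early_error = None
    except NameError as exc:
        early_error = exc          # MyType does not exist yet


    class MyType:
        ...


    late_result = annotate_foo()   # succeeds now that MyType is defined

The first call fails for the same reason stock semantics fail; the second
succeeds because the lookup of ``MyType`` was deferred.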
Mistaken Rejection Of This Approach In November 2017
====================================================
During the early days of discussion around :pep:`563`,
in a November 2017 thread on the ``python-dev`` mailing list,
the idea of using code to delay the evaluation of
annotations was briefly discussed. At the time the
technique was termed an "implicit lambda expression".
Guido van Rossum, Python's BDFL at the time, replied,
asserting that these "implicit lambda expressions" wouldn't
work, because they'd only be able to resolve symbols at
module-level scope:
IMO the inability of referencing class-level definitions
from annotations on methods pretty much kills this idea.
https://mail.python.org/pipermail/python-dev/2017-November/150109.html
This led to a short discussion about extending lambda-ized
annotations for methods to be able to refer to class-level
definitions, by maintaining a reference to the class-level
scope. This idea, too, was quickly rejected.
:pep:`PEP 563 summarizes the above discussion
<563#keeping-the-ability-to-use-function-local-state-when-defining-annotations>`.
The approach taken by this PEP doesn't suffer from these
restrictions. Annotations can access module-level definitions,
class-level definitions, and even local and free variables.
**********
Motivation
**********
A History Of Annotations
========================
Python 3.0 shipped with a new syntax feature, "annotations",
defined in :pep:`3107`.
This allowed specifying a Python value that would be
associated with a parameter of a Python function, or
with the value that function returns.
Said another way, annotations gave Python users an interface
to provide rich metadata about a function parameter or return
value, for example type information.
All the annotations for a function were stored together in
a new attribute ``__annotations__``, in an "annotation dict"
that mapped parameter names (or, in the case of the return
annotation, using the name ``'return'``) to their Python value.
In an effort to foster experimentation, Python
intentionally didn't define what form this metadata should take,
or what values should be used. User code began experimenting with
this new facility almost immediately. But popular libraries that
make use of this functionality were slow to emerge.
After years of little progress, the BDFL chose a particular
approach for expressing static type information, called
*type hints,* as defined in :pep:`484`. Python 3.5 shipped
with a new :mod:`typing` module which quickly became very popular.
Python 3.6 added syntax to annotate local variables,
class attributes, and module attributes, using the approach
proposed in :pep:`526`. Static type analysis continued to
grow in popularity.
However, static type analysis users were increasingly frustrated
by an inconvenient problem: forward references. In classic
Python, if a class C depends on a later-defined class D,
it's normally not a problem, because user code will usually
wait until both are defined before trying to use either.
But annotations added a new complication, because they were
computed at the time the annotated object (function, class,
or module) was bound. If methods on class C are annotated with
type D, and these annotation expressions are computed at the
time that the method is bound, D may not be defined yet.
And if methods in D are also annotated with type C, you now
have an unresolvable circular reference problem.
Initially, static type users worked around this problem
by defining their problematic annotations as strings.
This worked because a string containing the type hint was
just as usable for the static type analysis tool.
And users of static type analysis tools rarely examine the
annotations at runtime, so this representation wasn't
itself an inconvenience. But manually stringizing type
hints was clumsy and error-prone. Also, code bases were
adding more and more annotations, which consumed more and
more CPU time to create and bind.
To solve these problems, the BDFL accepted :pep:`563`, which
added a new feature to Python 3.7: "stringized annotations".
It was activated with a future import::

    from __future__ import annotations

Normally, annotation expressions were evaluated at the time
the object was bound, with their values being stored in the
annotations dict. When stringized annotations were active,
these semantics changed: instead, at compile time, the compiler
converted all annotations in that module into string
representations of their source code--thus, *automatically*
turning the user's annotations into strings, obviating the
need to *manually* stringize them as before. :pep:`563`
suggested users could evaluate this string with ``eval``
if the actual value was needed at runtime.
(From here on out, this PEP will refer to the classic
semantics of :pep:`3107` and :pep:`526`, where the
values of annotation expressions are computed at the time
the object is bound, as *"stock" semantics,* to differentiate
them from the new :pep:`563` "stringized" annotation semantics.)
The Current State Of Annotation Use Cases
=========================================
Although there are many specific use cases for annotations,
annotation users in the discussion around this PEP tended
to fall into one of these four categories.
Static typing users
-------------------
Static typing users use annotations to add type information
to their code. But they largely don't examine the annotations
at runtime. Instead, they use static type analysis tools
(mypy, pytype) to examine their source tree and determine
whether or not their code is using types consistently.
This is almost certainly the most popular use case for
annotations today.
Many of the annotations use *type hints,* a la :pep:`484`
(and many subsequent PEPs). Type hints are passive objects,
mere representations of type information; they don't do any actual work.
Type hints are often parameterized with other types or other type hints.
Since they're agnostic about what these actual values are, type hints
work fine with ``ForwardRef`` proxy objects.
Users of static type hints discovered that extensive type hinting under
stock semantics often created large-scale circular reference and circular
import problems that could be difficult to solve. :pep:`563` was designed
specifically to solve this problem, and the solution worked great for
these users. The difficulty of rendering stringized annotations into
real values largely didn't inconvenience these users because of how
infrequently they examine annotations at runtime.
Static typing users often combine :pep:`563` with the
``if typing.TYPE_CHECKING`` idiom to prevent their type hints from being
loaded at runtime. This means they often aren't able to evaluate their
stringized annotations and produce real values at runtime. On the rare
occasion that they do examine annotations at runtime, they often forgo
``eval``, instead using lexical analysis directly on the stringized
annotations.
Under this PEP, static typing users will probably prefer ``FORWARDREF``
or ``SOURCE`` format.
Runtime annotation users
------------------------
Runtime annotation users use annotations as a means of expressing rich
metadata about their functions and classes, which they use as input to
runtime behavior. Specific use cases include runtime type verification
(Pydantic) and glue logic to expose Python APIs in another domain
(FastAPI, Typer). The annotations may or may not be type hints.
As runtime annotation users examine annotations at runtime, they were
traditionally better served with stock semantics. This use case is
largely incompatible with :pep:`563`, particularly with the
``if typing.TYPE_CHECKING`` idiom.
Under this PEP, runtime annotation users will most likely prefer ``VALUE``
format, though some (e.g. if they evaluate annotations eagerly in a decorator
and want to support forward references) may also use ``FORWARDREF`` format.
Wrappers
--------
Wrappers are functions or classes that wrap user functions or
classes and add functionality. Examples of this would be
:func:`dataclass`, :func:`functools.partial`, ``attrs``, and ``wrapt``.
Wrappers are a distinct subcategory of runtime annotation users.
Although they do use annotations at runtime, they may or may not
actually examine the annotations of the objects they wrap--it depends
on the functionality the wrapper provides. As a rule they should
propagate the annotations of the wrapped object to the wrapper
they create, although it's possible they may modify those annotations.
Wrappers were generally designed to work well under stock semantics.
Whether or not they work well under :pep:`563` semantics depends on the
degree to which they examine the wrapped object's annotations.
Often wrappers don't care about the value per se, only needing
specific information about the annotations. Even so, :pep:`563`
and the ``if typing.TYPE_CHECKING`` idiom can make it difficult
for wrappers to reliably determine the information they need at
runtime. This is an ongoing, chronic problem.
Under this PEP, wrappers will probably prefer ``FORWARDREF`` format
for their internal logic. But the wrapped objects need to support
all formats for their users.
Documentation
-------------
:pep:`563` stringized annotations were a boon for tools that
mechanically construct documentation.
Stringized type hints make for excellent documentation; type hints
as expressed in source code are often succinct and readable. However,
at runtime these same type hints can produce values whose repr
is a sprawling, nested, unreadable mess. Thus documentation users were
well-served by :pep:`563` but poorly served with stock semantics.
Under this PEP, documentation users are expected to use ``SOURCE`` format.
Motivation For This PEP
=======================
Python's original semantics for annotations made its use for
static type analysis painful due to forward reference problems.
:pep:`563` solved the forward reference problem, and many
static type analysis users became happy early adopters of it.
But its unconventional solution created new problems for two
of the above cited use cases: runtime annotation users,
and wrappers.
First, stringized annotations didn't permit referencing local or
free variables, which meant many useful, reasonable approaches
to creating annotations were no longer viable. This was
particularly inconvenient for decorators that wrap existing
functions and classes, as these decorators often use closures.
Second, in order for ``eval`` to correctly look up globals in a
stringized annotation, you must first obtain a reference
to the correct module.
But class objects don't retain a reference to their globals.
:pep:`563` suggests looking up a class's module by name in
``sys.modules``—a surprising requirement for a language-level
feature.
Additionally, complex but legitimate constructions can make it
difficult to determine the correct globals and locals dicts to
give to ``eval`` to properly evaluate a stringized annotation.
Even worse, in some situations it may simply be infeasible.
For example, some libraries (e.g. ``typing.TypedDict``, :mod:`dataclasses`)
wrap a user class, then merge all the annotations from all that
class's base classes together into one cumulative annotations dict.
If those annotations were stringized, calling ``eval`` on them later
may not work properly, because the globals dictionary used for the
``eval`` will be the module where the *user class* was defined,
which may not be the same module where the *annotation* was
defined. However, if the annotations were stringized because
of forward-reference problems, calling ``eval`` on them early
may not work either, due to the forward reference not being
resolvable yet. This has proved to be difficult to reconcile;
of the three bug reports linked to below, only one has been
marked as fixed.
* https://github.com/python/cpython/issues/89687
* https://github.com/python/cpython/issues/85421
* https://github.com/python/cpython/issues/90531
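
The wrong-module failure described above can be reproduced in miniature.
This is only a sketch: the two dicts are hypothetical stand-ins for the
globals of two modules, and ``Money`` and ``evaluate`` are invented names.

.. code-block:: python

    # Hypothetical stand-ins for the globals of two modules.
    base_module_globals = {"Money": int}    # where the base class was defined
    user_module_globals = {}                # where the user subclass was defined

    # A stringized annotation inherited from the base class and merged into
    # the subclass's cumulative annotations dict.
    merged_annotations = {"amount": "Money"}


    def evaluate(annotations, module_globals):
        """Evaluate every stringized annotation against one globals dict."""
        return {
            name: eval(text, module_globals)
            for name, text in annotations.items()
        }


    try:
        evaluate(merged_annotations, user_module_globals)
        wrong_module_error = None
    except NameError as exc:
        wrong_module_error = exc   # "Money" is unknown in the user's module

    right_module_result = evaluate(merged_annotations, base_module_globals)

Once the annotations are merged, nothing in the merged dict records which
module each string came from, which is why the library cannot simply pick
the right globals.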
Even with proper globals *and* locals, ``eval`` can be unreliable
on stringized annotations.
``eval`` can only succeed if all the symbols referenced in
an annotation are defined. If a stringized annotation refers
to a mixture of defined and undefined symbols, a simple ``eval``
of that string will fail. This is a problem for libraries
that need to examine the annotation, because they can't reliably
convert these stringized annotations into real values.
* Some libraries (e.g. :mod:`dataclasses`) solved this by forgoing real
values and performing lexical analysis of the stringized annotation,
which requires a lot of work to get right.
* Other libraries still suffer with this problem,
which can produce surprising runtime behavior.
https://github.com/python/cpython/issues/97727
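
A small sketch of the all-or-nothing behavior: the namespace below defines
two of the three names the annotation uses, and ``Later`` is a made-up name
that is deliberately left undefined.

.. code-block:: python

    import typing

    # "Later" is deliberately missing from this namespace.
    namespace = {"Dict": typing.Dict, "str": str}
    stringized = "Dict[str, Later]"

    try:
        eval(stringized, namespace)
        mixed_error = None
    except NameError as exc:
        mixed_error = exc   # one undefined name loses the whole annotation

Nothing about ``Dict`` or ``str`` survives the failure, even though both
were resolvable.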
Also, ``eval()`` is slow, and it isn't always available; it's
sometimes removed for space reasons on certain platforms.
``eval()`` on MicroPython doesn't support the ``locals``
argument, which makes converting stringized annotations
into real values at runtime even harder.
Finally, :pep:`563` requires Python implementations to
stringize their annotations. This is surprising behavior—unprecedented
for a language-level feature, with a complicated implementation,
that must be updated whenever a new operator is added to the
language.
These problems motivated the research into finding a new
approach to solve the problems facing annotations users,
resulting in this PEP.
.. _Implementation:
**************
Implementation
**************
__annotate__ and __annotations__
================================
Python supports annotations on three different types:
functions, classes, and modules. This PEP modifies
the semantics on all three of these types in a similar
way.
First, this PEP adds a new "dunder" attribute, ``__annotate__``.
``__annotate__`` must be a "data descriptor",
implementing all three actions: get, set, and delete.
The ``__annotate__`` attribute is always defined,
and may only be set to either ``None`` or to a callable.
(``__annotate__`` cannot be deleted.) If an object
has no annotations, ``__annotate__`` should be
initialized to ``None``, rather than to a function
that returns an empty dict.
The ``__annotate__`` data descriptor must have dedicated
storage inside the object to store the reference to its value.
The location of this storage at runtime is an implementation
detail. Even if it's visible to Python code, it should still
be considered an internal implementation detail, and Python
code should prefer to interact with it only via the
``__annotate__`` attribute.
The callable stored in ``__annotate__`` must accept a
single required positional argument called ``format``,
which will always be an ``int``. It must either return
a dict (or subclass of dict) or raise
``NotImplementedError()``.
Here's a formal definition of ``__annotate__``, as it will
appear in the "Magic methods" section of the Python
Language Reference:
``__annotate__(format: int) -> dict``
Returns a new dictionary object mapping attribute/parameter
names to their annotation values.
Takes a ``format`` parameter specifying the format in which
annotation values should be provided. Must be one of the
following:
``1`` (exported as ``inspect.VALUE``)
Values are the result of evaluating the annotation expressions.
``2`` (exported as ``inspect.FORWARDREF``)
Values are real annotation values (as per ``inspect.VALUE`` format)
for defined values, and ``ForwardRef`` proxies for undefined values.
Real objects may be exposed to, or contain references to,
``ForwardRef`` proxy objects.
``3`` (exported as ``inspect.SOURCE``)
Values are the text string of the annotation as it
appears in the source code. May only be approximate;
whitespace may be normalized, and constant values may
be optimized. It's possible the exact values of these
strings could change in future versions of Python.
If an ``__annotate__`` function doesn't support the requested
format, it must raise ``NotImplementedError()``.
``__annotate__`` functions must always support ``1`` (``inspect.VALUE``)
format; they must not raise ``NotImplementedError()`` when called with
``format=1``.
When called with ``format=1``, an ``__annotate__`` function
may raise ``NameError``; it must not raise ``NameError`` when called
requesting any other format.
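
To make the protocol concrete, here is a minimal sketch of a hand-written
annotate function that follows the rules above. The constants are local
stand-ins that use the values listed in the Overview, and
``annotate_point`` and ``request`` are hypothetical names; nothing here is
generated by the compiler.

.. code-block:: python

    # Local stand-ins for the proposed inspect constants.
    VALUE, FORWARDREF, SOURCE = 1, 2, 3


    def annotate_point(format):
        """Hand-written annotate function for a two-field object."""
        if format == VALUE:
            # VALUE must always be supported.
            return {"x": int, "y": int}
        if format == SOURCE:
            return {"x": "int", "y": "int"}
        # Any other format is declined, as the rules above require.
        raise NotImplementedError(format)


    def request(annotate, format):
        """Return the annotations dict, or None if the format is declined."""
        try:
            return annotate(format)
        except NotImplementedError:
            return None


    value_result = request(annotate_point, VALUE)
    source_result = request(annotate_point, SOURCE)
    forwardref_result = request(annotate_point, FORWARDREF)

A caller that receives ``None`` from ``request`` knows it has to fall back
to some other strategy for that format.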
If an object doesn't have any annotations, ``__annotate__`` should
preferably be set to ``None`` (it can't be deleted), rather than set to a
function that returns an empty dict.
When the Python compiler compiles an object with
annotations, it simultaneously compiles the appropriate
annotate function. This function, called with
the single positional argument ``inspect.VALUE``,
computes and returns the annotations dict as defined
on that object. The Python compiler and runtime work
in concert to ensure that the function is bound to
the appropriate namespaces:
* For functions and classes, the globals dictionary will
be the module where the object was defined. If the object
is itself a module, its globals dictionary will be its
own dict.
* For methods on classes, and for classes, the locals dictionary
will be the class dictionary.
* If the annotations refer to free variables, the closure will
be the appropriate closure tuple containing cells for free variables.
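
The closure case can be illustrated with plain Python. In this sketch,
``build``, ``annotate`` and ``Item`` are hypothetical names, and the
annotate function is written by hand rather than generated by the
compiler; only the free-variable rule is shown, not the class-locals rule.

.. code-block:: python

    def build():
        # Hand-written equivalent of an annotate function that refers
        # to a free variable of the enclosing function.
        def annotate(format):
            return {"item": Item}   # Item is reached through a closure cell

        class Item:
            ...

        return annotate, Item


    annotate, Item = build()
    free_vars = annotate.__code__.co_freevars   # names captured via the closure
    result = annotate(1)

Because the cell is filled in by the time ``annotate`` is called, the
lookup succeeds even though ``Item`` was bound after ``annotate`` was
defined.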
Second, this PEP requires that the existing
``__annotations__`` must be a "data descriptor",
implementing all three actions: get, set, and delete.
``__annotations__`` must also have its own internal
storage it uses to cache a reference to the annotations dict:
* Class and module objects must
cache the annotations dict in their ``__dict__``, using the key
``__annotations__``. This is required for backwards
compatibility reasons.
* For function objects, storage for the annotations dict
cache is an implementation detail. It's preferably internal
to the function object and not visible in Python.
This PEP defines semantics on how ``__annotations__`` and
``__annotate__`` interact, for all three types that implement them.
In the following examples, ``fn`` represents a function, ``cls``
represents a class, ``mod`` represents a module, and ``o`` represents
an object of any of these three types:
* When ``o.__annotations__`` is evaluated, and the internal storage
for ``o.__annotations__`` is unset, and ``o.__annotate__`` is set
to a callable, the getter for ``o.__annotations__`` calls
``o.__annotate__(1)``, then caches the result in its internal
storage and returns the result.
- To explicitly clarify one question that has come up multiple times:
this ``o.__annotations__`` cache is the *only* caching mechanism
defined in this PEP. There are *no other* caching mechanisms defined
in this PEP. The ``__annotate__`` functions generated by the Python
compiler explicitly don't cache any of the values they compute.
* Setting ``o.__annotate__`` to a callable invalidates the
cached annotations dict.
* Setting ``o.__annotate__`` to ``None`` has no effect on
the cached annotations dict.
* Deleting ``o.__annotate__`` raises ``TypeError``.
``__annotate__`` must always be set; this prevents unannotated
subclasses from inheriting the ``__annotate__`` method of one
of their base classes.
* Setting ``o.__annotations__`` to a legal value
automatically sets ``o.__annotate__`` to ``None``.
* Setting ``cls.__annotations__`` or ``mod.__annotations__``
to ``None`` otherwise works like any other attribute; the
attribute is set to ``None``.
* Setting ``fn.__annotations__`` to ``None`` invalidates
the cached annotations dict. If ``fn.__annotations__``
doesn't have a cached annotations value, and ``fn.__annotate__``
is ``None``, the ``fn.__annotations__`` data descriptor
creates, caches, and returns a new empty dict. (This is for
backwards compatibility with :pep:`3107` semantics.)
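
The interaction rules above can be modeled in pure Python. The class below
is a toy sketch, not the proposed implementation: it uses the hypothetical
attribute names ``annotate`` and ``annotations`` instead of the real
dunders, follows the function-object flavor of the rules, and leaves out
the case of setting the annotations to ``None``.

.. code-block:: python

    VALUE = 1   # local stand-in for inspect.VALUE


    class LazyAnnotated:
        """Toy model of the get/set rules; attribute names are hypothetical."""

        _UNSET = object()

        def __init__(self, annotate=None):
            self._annotate = annotate
            self._cache = self._UNSET

        @property
        def annotate(self):
            return self._annotate

        @annotate.setter
        def annotate(self, value):
            if value is not None and not callable(value):
                raise TypeError("annotate must be None or a callable")
            if value is not None:
                self._cache = self._UNSET   # a new callable invalidates the cache
            self._annotate = value          # None leaves the cache untouched

        @annotate.deleter
        def annotate(self):
            raise TypeError("annotate cannot be deleted")

        @property
        def annotations(self):
            if self._cache is self._UNSET:
                if self._annotate is None:
                    self._cache = {}        # function-style fallback
                else:
                    self._cache = self._annotate(VALUE)
            return self._cache

        @annotations.setter
        def annotations(self, value):
            self._cache = value
            self._annotate = None           # an explicit dict replaces the callable


    calls = []


    def annotate_demo(format):
        calls.append(format)
        return {"x": int}


    obj = LazyAnnotated(annotate_demo)
    first = obj.annotations
    second = obj.annotations   # served from the cache, no second call

The ``calls`` list makes the single caching layer visible: the annotate
function runs once per cache fill, and never on a cache hit.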
Changes to allowable annotations syntax
=======================================
``__annotate__`` now delays the evaluation of annotations until
``__annotations__`` is referenced in the future. It also means
annotations are evaluated in a new function, rather than in the
original context where the object they were defined on was bound.
There are four operators with significant runtime side-effects
that were permitted in stock semantics, but are disallowed when
``from __future__ import annotations`` is active, and will have
to be disallowed when this PEP is active:
* ``:=``
* ``yield``
* ``yield from``
* ``await``
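
The existing restriction under ``from __future__ import annotations`` can
be observed with ``compile``. This sketch assumes CPython 3.10 or later,
where that restriction is enforced; the helper name
``rejected_under_future_import`` is invented for the example, and only the
``:=`` case is exercised.

.. code-block:: python

    def rejected_under_future_import(annotation_source):
        """Return True if the compiler refuses the annotation expression."""
        module_source = (
            "from __future__ import annotations\n"
            f"def f(x: {annotation_source}): ...\n"
        )
        try:
            compile(module_source, "<sketch>", "exec")
        except SyntaxError:
            return True
        return False


    walrus_rejected = rejected_under_future_import("(y := int)")
    plain_rejected = rejected_under_future_import("int")
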
Changes to ``inspect.get_annotations`` and ``typing.get_type_hints``
====================================================================
(This PEP makes frequent reference to these two functions. In the future
it will refer to them collectively as "the helper functions", as they help
user code work with annotations.)
These two functions extract and return the annotations from an object.
``inspect.get_annotations`` returns the annotations unchanged;
for the convenience of static typing users, ``typing.get_type_hints``
makes some modifications to the annotations before it returns them.
This PEP adds a new keyword-only parameter to these two functions,
``format``. ``format`` specifies what format the values in the
annotations dict should be returned in.
``format`` accepts the following values, defined as attributes on the
``inspect`` module::

    VALUE = 1
    FORWARDREF = 2
    SOURCE = 3

The default value for the ``format`` parameter is ``1``,
which is ``VALUE`` format.
The defined ``format`` values are guaranteed to be contiguous,
and the ``inspect`` module also publishes attributes representing
the minimum and maximum supported ``format`` values::

    FORMAT_MIN = VALUE
    FORMAT_MAX = SOURCE

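
Because the values are contiguous, a range check is enough to validate a
``format`` argument. The following is a sketch only: the constants are
local stand-ins for the proposed ``inspect`` attributes, and
``check_format`` is a hypothetical helper, not part of this PEP.

.. code-block:: python

    # Local stand-ins for the proposed inspect attributes.
    VALUE, FORWARDREF, SOURCE = 1, 2, 3
    FORMAT_MIN, FORMAT_MAX = VALUE, SOURCE


    def check_format(format):
        """Return format unchanged if it is a supported format identifier."""
        if isinstance(format, bool) or not isinstance(format, int):
            raise TypeError("format must be an int")
        if not FORMAT_MIN <= format <= FORMAT_MAX:
            raise ValueError(f"unsupported format: {format!r}")
        return format
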
Also, when either ``__annotations__`` or ``__annotate__`` is updated on an
object, the other of those two attributes is now out-of-date and should also
either be updated or deleted (set to ``None``, in the case of ``__annotate__``
which cannot be deleted). In general, the semantics established in the previous
section ensure that this happens automatically. However, there's one case which
for all practical purposes can't be handled automatically: when the dict cached
by ``o.__annotations__`` is itself modified, or when mutable values inside that
dict are modified.
Since this can't be handled in code, it must be handled in
documentation. This PEP proposes amending the documentation
for ``inspect.get_annotations`` (and similarly for
``typing.get_type_hints``) as follows:
If you directly modify the ``__annotations__`` dict on an object,
by default these changes may not be reflected in the dictionary
returned by ``inspect.get_annotations`` when requesting either
``SOURCE`` or ``FORWARDREF`` format on that object. Rather than
modifying the ``__annotations__`` dict directly, consider replacing
that object's ``__annotate__`` method with a function computing
the annotations dict with your desired values. Failing that, it's
best to overwrite the object's ``__annotate__`` method with ``None``
to prevent ``inspect.get_annotations`` from generating stale results
for ``SOURCE`` and ``FORWARDREF`` formats.
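
The staleness described in that documentation text can be shown with a
hand-written annotate function. In this sketch ``cached`` plays the role
of the dict cached by ``o.__annotations__``; the constants and
``annotate_demo`` are illustrative stand-ins.

.. code-block:: python

    VALUE, SOURCE = 1, 3   # local stand-ins for the inspect constants


    def annotate_demo(format):
        if format == VALUE:
            return {"x": int}
        if format == SOURCE:
            return {"x": "int"}
        raise NotImplementedError(format)


    cached = annotate_demo(VALUE)   # stands in for the cached annotations dict
    cached["x"] = str               # direct modification of the cached dict

    # SOURCE is recomputed from the annotate function, which knows nothing
    # about the modification above.
    source_view = annotate_demo(SOURCE)

After the modification, the cached dict says ``str`` while the ``SOURCE``
view still says ``'int'``, which is exactly the mismatch the documentation
text warns about.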
The ``stringizer`` and the ``fake globals`` environment
=======================================================
As originally proposed, this PEP supported many runtime
annotation user use cases, and many static type user use cases.
But this was insufficient--this PEP could not be accepted
until it satisfied *all* extant use cases. This became
a longtime blocker of this PEP until Carl Meyer proposed
the "stringizer" and the "fake globals" environment as
described below. These techniques allow this PEP to support
both the ``FORWARDREF`` and ``SOURCE`` formats, ably
satisfying all remaining use cases.
In a nutshell, this technique involves running a
Python-compiler-generated ``__annotate__`` function in
an exotic runtime environment. Its normal ``globals``
dict is replaced with what's called a "fake globals" dict.
A "fake globals" dict is a dict with one important difference:
every time you "get" a key from it that isn't mapped,
it creates, caches, and returns a new value for that key
(as per the ``__missing__`` callback for a dictionary).
That value is an instance of a novel type referred to
as a "stringizer".
A "stringizer" is a Python class with highly unusual behavior.
Every stringizer is initialized with its "value", initially
the name of the missing key in the "fake globals" dict. The
stringizer then implements every Python "dunder" method used to
implement operators, and the value returned by that method
is a new stringizer whose value is a text representation
of that operation.
When these stringizers are used in expressions, the result
of the expression is a new stringizer whose value textually
represents that expression. For example, let's say
you have a variable ``f``, which is a reference to a
stringizer initialized with the value ``'f'``. Here are
some examples of operations you could perform on ``f`` and
the values they would return::

    >>> f
    Stringizer('f')
    >>> f + 3
    Stringizer('f + 3')
    >>> f["key"]
    Stringizer('f["key"]')

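A toy version of such a class might look like the sketch below. It is not the implementation proposed by this PEP: it covers only a handful of operators, the ``_text`` helper is invented for the example, and it renders non-stringizer operands with ``repr``, so string keys appear with single quotes.

```python
class Stringizer:
    """Toy stringizer supporting +, |, subscription and attribute access."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Stringizer({self.value!r})"

    @staticmethod
    def _text(obj):
        # Render an operand as source-like text.
        if isinstance(obj, Stringizer):
            return obj.value
        if isinstance(obj, tuple):
            return ", ".join(Stringizer._text(item) for item in obj)
        return repr(obj)

    def __add__(self, other):
        return Stringizer(f"{self.value} + {self._text(other)}")

    def __or__(self, other):
        return Stringizer(f"{self.value} | {self._text(other)}")

    def __getitem__(self, key):
        return Stringizer(f"{self.value}[{self._text(key)}]")

    def __getattr__(self, name):
        # Only reached for names not found normally; refuse dunder lookups.
        if name.startswith("__"):
            raise AttributeError(name)
        return Stringizer(f"{self.value}.{name}")


f = Stringizer("f")
summed = f + 3
chained = f.attr[f, 1] | None
```

A complete stringizer would need the reflected variants (``__radd__`` and so on) and every other operator dunder as well.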
Bringing it all together: if we run a Python-generated
``__annotate__`` function, but we replace its globals
with a "fake globals" dict, all undefined symbols it
references will be replaced with stringizer proxy objects
representing those symbols, and any operations performed
on those proxies will in turn result in proxies
representing that expression. This allows ``__annotate__``
to complete, and to return an annotations dict, with
stringizer instances standing in for names and entire
expressions that could not have otherwise been evaluated.
In practice, the "stringizer" functionality will be implemented
in the ``ForwardRef`` object currently defined in the
``typing`` module. ``ForwardRef`` will be extended to
implement all stringizer functionality; it will also be
extended to support evaluating the string it contains,
to produce the real value (assuming all symbols referenced
are defined). This means the ``ForwardRef`` object
will retain references to the appropriate "globals",
"locals", and even "closure" information needed to
evaluate the expression.
This technique is the core of how ``inspect.get_annotations``
supports ``FORWARDREF`` and ``SOURCE`` formats. Initially,
``inspect.get_annotations`` will call the object's
``__annotate__`` method requesting the desired format.
If that raises ``NotImplementedError``, ``inspect.get_annotations``
will construct a "fake globals" environment, then call
the object's ``__annotate__`` method.
* ``inspect.get_annotations`` produces ``SOURCE`` format
by creating a new empty "fake globals" dict, binding it
to the object's ``__annotate__`` method, calling that
requesting ``VALUE`` format, and then extracting the string
"value" from each ``ForwardRef`` object
in the resulting dict.
* ``inspect.get_annotations`` produces ``FORWARDREF`` format
by creating a new empty "fake globals" dict, pre-populating
it with the current contents of the ``__annotate__`` method's
globals dict, binding the "fake globals" dict to the object's
``__annotate__`` method, calling that requesting ``VALUE``
format, and returning the result.
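The difference between the two formats can be sketched as follows. This is a simplification under stated assumptions: ``eval`` over expression strings stands in for running a compiler-generated ``__annotate__`` function, the fake dict is passed as the ``locals`` mapping instead of being bound as function globals, and all names (``FakeGlobals``, ``Stringizer``, ``evaluate``, ``source_format``, ``forwardref_format``) are invented for the example.

```python
import builtins


class Stringizer:
    """Minimal stand-in that supports only the ``|`` operator."""

    def __init__(self, value):
        self.value = value

    def __or__(self, other):
        return Stringizer(f"{self.value} | {text_of(other)}")


def text_of(obj):
    return obj.value if isinstance(obj, Stringizer) else repr(obj)


class FakeGlobals(dict):
    def __missing__(self, key):
        proxy = Stringizer(key)
        self[key] = proxy
        return proxy


def evaluate(exprs, namespace):
    # Every name lookup in the expression goes through ``namespace`` first.
    return {k: eval(src, {"__builtins__": {}}, namespace)
            for k, src in exprs.items()}


def source_format(exprs):
    # Empty fake dict: every name becomes a proxy; keep only the text.
    return {k: text_of(v) for k, v in evaluate(exprs, FakeGlobals()).items()}


def forwardref_format(exprs, real_names):
    # Pre-populated fake dict: defined names resolve to real objects.
    return evaluate(exprs, FakeGlobals(real_names))


EXPRS = {"a": "int", "b": "Undefined | None"}
as_source = source_format(EXPRS)
as_forwardref = forwardref_format(EXPRS, vars(builtins))
```

In the second result, ``a`` is the real ``int`` type, while ``b`` remains a proxy because ``Undefined`` has no definition.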
This entire technique works because the ``__annotate__`` functions
generated by the compiler are controlled by Python itself, and
are simple and predictable. They're
effectively a single ``return`` statement, computing and
returning the annotations dict. Since most operations needed
to compute an annotation are implemented in Python using dunder
methods, and the stringizer supports all the relevant dunder
methods, this approach is a reliable, practical solution.
However, it's not reasonable to attempt this technique with
just any ``__annotate__`` method. This PEP assumes that
third-party libraries may implement their own ``__annotate__``
methods, and those functions would almost certainly work
incorrectly when run in this "fake globals" environment.
For that reason, this PEP allocates a flag on code objects,
one of the unused bits in ``co_flags``, to mean "This code
object can be run in a 'fake globals' environment." This
makes the "fake globals" environment strictly opt-in, and
it's expected that only ``__annotate__`` methods generated
by the Python compiler will set it.
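Testing a bit in ``co_flags`` is an ordinary bitmask check. This section doesn't name the new flag or give its value, so the sketch below uses the existing ``inspect.CO_GENERATOR`` flag purely as a stand-in; ``has_code_flag`` is a hypothetical helper.

```python
import inspect


def has_code_flag(func, flag):
    """Return True if ``flag`` is set on the function's code object."""
    return bool(func.__code__.co_flags & flag)


def plain():
    return 1


def gen():
    yield 1


# CO_GENERATOR is only a stand-in for the (unnamed here) opt-in flag.
plain_flagged = has_code_flag(plain, inspect.CO_GENERATOR)
gen_flagged = has_code_flag(gen, inspect.CO_GENERATOR)
```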
The weakness in this technique is in handling operators which
don't directly map to dunder methods on an object. These are
all operators that implement some manner of flow control,
either branching or iteration:
* Short-circuiting ``or``
* Short-circuiting ``and``
* Ternary operator (the ``if`` / ``else`` operator)
* Generator expressions
* List / dict / set comprehensions
* Iterable unpacking
As a rule these constructs aren't used in annotations,
so this limitation doesn't pose a problem in practice. However, the
recent addition of ``TypeVarTuple`` to Python does use
iterable unpacking. The dunder methods
involved (``__iter__`` and ``__next__``) don't permit
distinguishing between iteration use cases; in order to
correctly detect which use case was involved, mere
"fake globals" and a "stringizer" wouldn't be sufficient;
this would require a custom bytecode interpreter designed
specifically around producing ``SOURCE`` and ``FORWARDREF``
formats.
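To see why flow-control operators defeat a proxy object, consider this small sketch (the ``Proxy`` class is invented for the example). The ``|`` operator dispatches to ``__or__``, which sees both operands, but ``or`` and the ternary operator only test truthiness and hand back one of the operands unchanged.

```python
class Proxy:
    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        # Called with both operands, so the expression can be recorded.
        return Proxy(f"{self.text} | {other.text}")


a, b = Proxy("a"), Proxy("b")

bitwise = a | b             # recorded as "a | b"
logical = a or b            # a is truthy, so the result is just a
ternary = a if b else None  # only b's truthiness is consulted
```

Neither ``logical`` nor ``ternary`` carries any trace of the expression that produced it.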
Thankfully there's a shortcut that will work fine:
the stringizer will simply assume that when its
iteration dunder methods are called, it's in service
of iterable unpacking of a ``TypeVarTuple``.
It will hard-code this behavior. This means no other
technique using iteration will work, but in practice
this won't inconvenience real-world use cases.
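One way to picture this shortcut is the sketch below, in which iterating a stringizer yields a single proxy for the starred name. The class is a toy invented for this example, not the proposed implementation.

```python
class Stringizer:
    def __init__(self, value):
        self.value = value

    def __iter__(self):
        # Hard-coded assumption: iteration always means ``*name`` unpacking.
        yield Stringizer("*" + self.value)

    def __getitem__(self, key):
        items = key if isinstance(key, tuple) else (key,)
        inner = ", ".join(item.value for item in items)
        return Stringizer(f"{self.value}[{inner}]")


Ts = Stringizer("Ts")
tuple_ = Stringizer("tuple")

unpacked = tuple_[(*Ts,)]               # unpacking calls Ts.__iter__
looped = [item.value for item in Ts]    # any other iteration looks the same
```

The last line shows the trade-off: a comprehension over ``Ts`` is indistinguishable from unpacking, so it also produces ``*Ts``.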
Finally, note that the "fake globals" environment
will also require constructing a matching "fake locals"
dictionary, which for ``FORWARDREF`` format will be
pre-populated with the relevant locals dict. The
"fake globals" environment will also have to create
a fake "closure", a tuple of ``ForwardRef`` objects
pre-created with the names of the free variables
referenced by the ``__annotate__`` method.
``ForwardRef`` proxies created from ``__annotate__``
methods that reference free variables will map the
names and closure values of those free variables into
the locals dictionary, to ensure that ``eval`` uses
the correct values for those names.
Compiler-generated ``__annotate__`` functions
==============================================
As mentioned in the previous section, the ``__annotate__``
functions generated by the compiler are simple. They're
mainly a single ``return`` statement, computing and
returning the annotations dict.
However, the protocol for ``inspect.get_annotations``
to request either ``FORWARDREF`` or ``SOURCE`` format
requires first asking the ``__annotate__`` method to
produce it. ``__annotate__`` methods generated by
the Python compiler won't support either of these
formats and will raise ``NotImplementedError()``.
Third-party ``__annotate__`` functions
======================================
Third-party classes and functions will likely need
to implement their own ``__annotate__`` methods,
so that downstream users of
those objects can take full advantage of annotations.
In particular, wrappers will likely need to transform
the annotation dicts produced by the wrapped object: adding,
removing, or modifying the dictionary in some way.
Most of the time, third-party code will implement
their ``__annotate__`` methods by calling
``inspect.get_annotations`` on some existing upstream
object. For example, wrappers will likely request the
annotations dict for their wrapped object,
in the format that was requested from them, then
modify the returned annotations dict as appropriate
and return that. This allows third-party code to
leverage the "fake globals" technique without
having to understand or participate in it.
Third-party libraries that support both pre- and
post-PEP-649 versions of Python will have to innovate
their own best practices on how to support both.
One sensible approach would be for their wrapper to
always support ``__annotate__``, then call it requesting
``VALUE`` format and store the result as the
``__annotations__`` on their wrapper object.
This would support pre-649 Python semantics, and be
forward-compatible with post-649 semantics.
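A rough sketch of that approach follows. The ``Wrapper`` class and the ``VALUE = 1`` constant are assumptions made for the example, and reading ``wrapped_fn.__annotations__`` stands in for asking a helper function for the upstream annotations.

```python
VALUE = 1  # assumed format number, mirroring this PEP's pseudocode


class Wrapper:
    """Hypothetical wrapper that pre-binds the parameter named ``arg``."""

    def __init__(self, wrapped_fn):
        self.wrapped_fn = wrapped_fn
        # Eagerly store the VALUE-format result for pre-649 consumers.
        self.__annotations__ = self.__annotate__(VALUE)

    def __annotate__(self, format):
        if format != VALUE:
            # A real wrapper would delegate other formats upstream.
            raise NotImplementedError()
        ann = dict(getattr(self.wrapped_fn, "__annotations__", {}))
        ann.pop("arg", None)  # the pre-bound parameter is no longer exposed
        return ann


def original(arg: int, other: str) -> bool:
    return True


wrapper = Wrapper(original)
stored = wrapper.__annotations__
computed = wrapper.__annotate__(VALUE)
```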
Pseudocode
==========
Here's high-level pseudocode for ``inspect.get_annotations``::

    def get_annotations(o, format):
        if format == VALUE:
            return dict(o.__annotations__)

        if format == FORWARDREF:
            try:
                return dict(o.__annotations__)
            except NameError:
                pass

        if not hasattr(o, "__annotate__"):
            return {}

        c_a = o.__annotate__
        try:
            return c_a(format)
        except NotImplementedError:
            if not can_be_called_with_fake_globals(c_a):
                return {}
            c_a_with_fake_globals = make_fake_globals_version(c_a, format)
            return c_a_with_fake_globals(VALUE)

Here's what a Python compiler-generated ``__annotate__`` method
might look like if it was written in Python::

    def __annotate__(self, format):
        if format != 1:
            raise NotImplementedError()
        return { ... }

Here's how a third-party wrapper class might implement
``__annotate__``. In this example, the wrapper works
like ``functools.partial``, pre-binding one parameter of
the wrapped callable, which for simplicity must be named
``arg``::

    def __annotate__(self, format):
        ann = inspect.get_annotations(self.wrapped_fn, format)
        if 'arg' in ann:
            del ann['arg']
        return ann

Other modifications to existing objects
=======================================
This PEP adds two more attributes to existing Python objects:
a ``__locals__`` attribute to function objects, and
an optional ``__globals__`` attribute to class objects.
In Python, the bytecode interpreter can reference both a
"globals" and a "locals" dictionary. However, the current
function object can only be bound to a globals dictionary,
via the ``__globals__`` attribute. Traditionally the
"locals" dictionary is only set when executing a class.
This PEP needs to set the "locals" dictionary to the class dict
when evaluating annotations defined inside a class namespace.
So this PEP defines a new ``__locals__`` attribute on
functions. By default it is uninitialized, or rather is set
to an internal value that indicates it hasn't been explicitly set.
It can be set to either ``None`` or a dictionary. If it's set to
a dictionary, the interpreter will use that dictionary as
the "locals" dictionary when running the function.
In Python, function objects contain a reference to their own
``__globals__``. However, class objects aren't currently
defined as doing so in Python. The implementation of
``__annotate__`` in CPython needs a reference to the module
globals in order to bind the unbound code object. So this PEP
defines a new ``__globals__`` attribute on class objects,
which stores a reference to the globals for the module where
the class was defined. Note that this attribute is optional,
but was useful for the CPython implementation.
(The class ``__globals__`` attribute does create a new reference
cycle, between a class and its module. However, any class that
contains a method already participates in at least one such cycle.)
Interactive REPL Shell
======================
The semantics established in this PEP also hold true when executing
code in Python's interactive REPL shell, except for module annotations
in the interactive module (``__main__``) itself. Since that module is
never "finished", there's no specific point where we can compile the
``__annotate__`` function.
For the sake of simplicity, in this case we forgo delayed evaluation.
Module-level annotations in the REPL shell will continue to work
exactly as they do with "stock semantics", evaluating immediately and
setting the result directly inside the ``__annotations__`` dict.
Annotations On Local Variables Inside Functions
===============================================
Python supports syntax for local variable annotations inside
functions. However, these annotations have no runtime
effect--they're discarded at compile-time. Therefore, this
PEP doesn't need to do anything to support them, the same
as stock semantics and :pep:`563`.
Prototype
=========
The original prototype implementation of this PEP can be found here:
https://github.com/larryhastings/co_annotations/
As of this writing, the implementation is severely out of date;
it's based on Python 3.10 and implements the semantics of the
first draft of this PEP, from early 2021. It will be updated
shortly.
Performance Comparison
======================
Performance with this PEP is generally favorable. There are four
scenarios to consider:
* the runtime cost when annotations aren't defined,
* the runtime cost when annotations are defined but *not* referenced,
* the runtime cost when annotations are defined and referenced as objects, and
* the runtime cost when annotations are defined and referenced as strings.
We'll examine each of these scenarios in the context of all three
semantics for annotations: stock, :pep:`563`, and this PEP.
When there are no annotations, all three semantics have the same
runtime cost: zero. No annotations dict is created and no code is
generated for it. This requires no runtime processor time and
consumes no memory.
When annotations are defined but not referenced, the runtime cost
of Python with this PEP is roughly the same as :pep:`563`, and
improved over stock. The specifics depend on the object
being annotated:
* With stock semantics, the annotations dict is always built, and
set as an attribute of the object being annotated.
* In :pep:`563` semantics, for function objects, a precompiled
constant (a specially constructed tuple) is set as an attribute
of the function. For class and module objects, the annotations
dict is always built and set as an attribute of the class or module.
* With this PEP, a single object is set as an attribute of the
object being annotated. Most of the time, this object is
a constant (a code object), but when the annotations require a
class namespace or closure, this object will be a tuple constructed
at binding time.
When annotations are both defined and referenced as objects, code using
this PEP should be much faster than :pep:`563`, and be as fast
or faster than stock. :pep:`563` semantics requires invoking
``eval()`` for every value inside an annotations dict, which is
enormously slow. And the implementation of this PEP generates measurably
more efficient bytecode for class and module annotations than stock
semantics; for function annotations, this PEP and stock semantics
should be about the same speed.
The one case where this PEP will be noticeably slower than :pep:`563` is when
annotations are requested as strings; it's hard to beat "they are already
strings." But stringified annotations are intended for online documentation use
cases, where performance is less likely to be a key factor.
Memory use should also be comparable in all three scenarios across
all three semantic contexts. In the first and third scenarios,
memory usage should be roughly equivalent in all cases.
In the second scenario, when annotations are defined but not
referenced, using this PEP's semantics will mean the
function/class/module will store one unused code object (possibly
bound to an unused function object); with the other two semantics,
they'll store one unused dictionary or constant tuple.
***********************
Backwards Compatibility
***********************
Backwards Compatibility With Stock Semantics
============================================
This PEP preserves nearly all existing behavior of
annotations from stock semantics:
* The format of the annotations dict stored in
the ``__annotations__`` attribute is unchanged.
Annotations dicts contain real values, not strings
as per :pep:`563`.
* Annotations dicts are mutable, and any changes to them are
preserved.
* The ``__annotations__`` attribute can be explicitly set,
and any legal value set this way will be preserved.
* The ``__annotations__`` attribute can be deleted using
the ``del`` statement.
Most code that works with stock semantics should
continue to work when this PEP is active without any
modification necessary. But there are exceptions,
as follows.
First, there's a well-known idiom for accessing class
annotations which may not work correctly when this
PEP is active. The original implementation of class
annotations had what can only be called a bug: if a class
didn't define any annotations of its own, but one
of its base classes did define annotations, the class
would "inherit" those annotations. This behavior
was never desirable, so user code found a workaround:
instead of accessing the annotations on the class
directly via ``cls.__annotations__``, code would
access the class's annotations via its dict as in
``cls.__dict__.get("__annotations__", {})``. This
idiom worked because classes stored their annotations
in their ``__dict__``, and accessing them this way
avoided the lookups in the base classes. The technique
relied on implementation details of CPython, so it
was never supported behavior--though it was necessary.
However, when this PEP is active, a class may have
annotations defined but hasn't yet called ``__annotate__``
and cached the result, in which case this approach
would lead to mistakenly assuming the class didn't have
annotations.
In any case, the bug was fixed as of Python 3.10, and the
idiom should no longer be used. Also as of Python 3.10,
there's an
`Annotations HOWTO <https://docs.python.org/3/howto/annotations.html>`_
that defines best practices
for working with annotations; code that follows these
guidelines will work correctly even when this PEP is
active, because it suggests using different approaches
to get annotations from class objects based on the
Python version the code runs under.
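A version-aware accessor in that spirit might look like the following sketch. The function name ``own_annotations`` is invented here; ``inspect.get_annotations`` is the helper available from Python 3.10 onward, and the ``__dict__`` lookup is kept only for older interpreters.

```python
import inspect
import sys


def own_annotations(cls):
    """Return the annotations defined on ``cls`` itself, not its bases."""
    if sys.version_info >= (3, 10):
        return inspect.get_annotations(cls)
    # Older interpreters: read the class dict to avoid inherited values.
    return dict(cls.__dict__.get("__annotations__", {}))


class Base:
    x: int


class Child(Base):
    pass


base_ann = own_annotations(Base)
child_ann = own_annotations(Child)
```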
Since delaying the evaluation of annotations until they are
introspected changes the semantics of the language, it's observable
from within the language. Therefore it's *possible* to write code
that behaves differently based on whether annotations are
evaluated at binding time or at access time, e.g.

.. code-block::

    mytype = str
    def foo(a:mytype): pass
    mytype = int
    print(foo.__annotations__['a'])

This will print ``<class 'str'>`` with stock semantics
and ``<class 'int'>`` when this PEP is active. This is
therefore a backwards-incompatible change. However, this
example is poor programming style, so this change seems
acceptable.
There are two uncommon interactions possible with class
and module annotations that work with stock semantics
that would no longer work when this PEP was active.
These two interactions would have to be prohibited. The
good news is, neither is common, and neither is considered
good practice. In fact, they're rarely seen outside of
Python's own regression test suite. They are:
* *Code that sets annotations on module or class attributes
from inside any kind of flow control statement.* It's
currently possible to set module and class attributes with
annotations inside an ``if`` or ``try`` statement, and it works
as one would expect. It's untenable to support this behavior
when this PEP is active.
* *Code in module or class scope that references or modifies the
local* ``__annotations__`` *dict directly.* Currently, when
setting annotations on module or class attributes, the generated
code simply creates a local ``__annotations__`` dict, then adds
mappings to it as needed. It's possible for user code
to directly modify this dict, though this doesn't seem to be
an intentional feature. Although it would be possible to support
this after a fashion once this PEP was active, the semantics
would likely be surprising and wouldn't make anyone happy.
Note that these are both also pain points for static type checkers,
and are unsupported by those tools. It seems reasonable to
declare that both are at the very least unsupported, and their
use results in undefined behavior. It might be worth making a
small effort to explicitly prohibit them with compile-time checks.
Finally, if this PEP is active, annotation values shouldn't use
the ``if / else`` ternary operator. Although this will work
correctly when accessing ``o.__annotations__`` or requesting
``inspect.VALUE`` from a helper function, the boolean expression
may not compute correctly with ``inspect.FORWARDREF`` when
some names are defined, and would be far less correct with
``inspect.SOURCE``.
Backwards Compatibility With PEP 563 Semantics
==============================================
:pep:`563` changed the semantics of annotations. When its semantics
are active, annotations must assume they will be evaluated in
*module-level* or *class-level* scope. They may no longer refer directly
to local variables in the current function or an enclosing function.
This PEP removes that restriction, and annotations may refer to any
local variable.
:pep:`563` requires using ``eval`` (or a helper function like
``typing.get_type_hints`` or ``inspect.get_annotations`` that
uses ``eval`` for you) to convert stringized annotations into
their "real" values. Existing code that activates stringized
annotations, and calls ``eval()`` directly to convert the strings
back into real values, can simply remove the ``eval()`` call.
Existing code using a helper function would continue to work
unchanged, though use of those functions may become optional.
Static typing users often have modules that only contain
inert type hint definitions--but no live code. These modules
are only needed when running static type checking; they aren't
used at runtime. But under stock semantics, these modules
have to be imported in order for the runtime to evaluate and
compute the annotations. Meanwhile, these modules often
caused circular import problems that could be difficult or
even impossible to solve. :pep:`563` allowed users to solve
these circular import problems by doing two things. First,
they activated :pep:`563` in their modules, which meant annotations
were constant strings, and didn't require the real symbols to
be defined in order for the annotations to be computable.
Second, this permitted users to only import the problematic
modules in an ``if typing.TYPE_CHECKING`` block. This allowed
the static type checkers to import the modules and the type
definitions inside, but they wouldn't be imported at runtime.
So far, this approach will work unchanged when this PEP is
active; ``if typing.TYPE_CHECKING`` is supported behavior.
However, some codebases actually *did* examine their
annotations at runtime, even when using the ``if typing.TYPE_CHECKING``
technique and not importing definitions used in their annotations.
These codebases examined the annotation strings *without
evaluating them,* instead relying on identity checks or
simple lexical analysis on the strings.
This PEP supports these techniques too. But users will need
to port their code to it. First, user code will need to use
``inspect.get_annotations`` or ``typing.get_type_hints`` to
access the annotations; they won't be able to simply get the
``__annotations__`` attribute from their object. Second,
they will need to specify either ``inspect.FORWARDREF``
or ``inspect.SOURCE`` for the ``format`` when calling that
function. This means the helper function can succeed in
producing the annotations dict, even when not all the symbols
are defined. Code expecting stringized annotations should
work unmodified with ``inspect.SOURCE`` formatted annotations
dicts; however, users should consider switching to
``inspect.FORWARDREF``, as it may make their analysis easier.
Similarly, :pep:`563` permitted use of class decorators on
annotated classes in a way that hadn't previously been possible.
Some class decorators (e.g. :mod:`dataclasses`) examine the annotations
on the class. Because class decorators using the ``@`` decorator
syntax are run before the class name is bound, they can cause
unsolvable circular-definition problems. If you annotate attributes
of a class with references to the class itself, or annotate attributes
in multiple classes with circular references to each other, you
can't decorate those classes with the ``@`` decorator syntax
using decorators that examine the annotations. :pep:`563` allowed
this to work, as long as the decorators examined the strings lexically
and didn't use ``eval`` to evaluate them (or handled the ``NameError``
with further workarounds). When this PEP is active, decorators will
be able to compute the annotations dict in ``inspect.SOURCE`` or
``inspect.FORWARDREF`` format using the helper functions. This
will permit them to analyze annotations containing undefined
symbols, in the format they prefer.
Early adopters of :pep:`563` discovered that "stringized"
annotations were useful for automatically-generated documentation.
Users experimented with this use case, and Python's ``pydoc``
has expressed some interest in this technique. This PEP supports
this use case; the code generating the documentation will have to be
updated to use a helper function to access the annotations in
``inspect.SOURCE`` format.
Finally, the warnings about using the ``if / else`` ternary
operator in annotations apply equally to users of :pep:`563`.
It currently works for them, but could produce incorrect
results when requesting some formats from the helper functions.
If this PEP is accepted, :pep:`563` will be deprecated and
eventually removed. To facilitate this transition for early
adopters of :pep:`563`, who now depend on its semantics,
``inspect.get_annotations`` and ``typing.get_type_hints`` will
implement a special affordance.
The Python compiler won't generate annotation code objects
for objects defined in a module where :pep:`563` semantics are
active, even if this PEP is accepted. So, under normal
circumstances, requesting ``inspect.SOURCE`` format from a
helper function would return an empty dict. As an affordance,
to facilitate the transition, if the helper functions detect
that an object was defined in a module with :pep:`563` active,
and the user requests ``inspect.SOURCE`` format, they'll return
the current value of the ``__annotations__`` dict, which in
this case will be the stringized annotations. This will allow
:pep:`563` users who lexically analyze stringized annotations
to immediately change over to requesting ``inspect.SOURCE`` format
from the helper functions, which will hopefully smooth their
transition away from :pep:`563`.
**************
Rejected Ideas
**************
"Just store the strings"
========================
One proposed idea for supporting ``SOURCE`` format was for
the Python compiler to emit the actual source code for the
annotation values somewhere, and to furnish that when
the user requested ``SOURCE`` format.
This idea wasn't rejected so much as categorized as
"not yet". We already know we need to support ``FORWARDREF``
format, and that technique can be adapted to support
``SOURCE`` format in just a few lines. There are many
unanswered questions about this approach:
* Where would we store the strings? Would they always
be loaded when the annotated object was created, or
would they be lazy-loaded on demand? If so, how
would the lazy-loading work?
* Would the "source code" include the newlines and
comments of the original? Would it preserve all
whitespace, including indents and extra spaces used
purely for formatting?
It's possible we'll revisit this topic in the future,
if improving the fidelity of ``SOURCE`` values to the
original source code is judged sufficiently important.
****************
Acknowledgements
****************
Thanks to Carl Meyer, Barry Warsaw, Eric V. Smith,
Mark Shannon, Jelle Zijlstra, and Guido van Rossum for ongoing
feedback and encouragement.
Particular thanks to several individuals who contributed key ideas
that became some of the best aspects of this proposal:
* Carl Meyer suggested the "stringizer" technique that made
``FORWARDREF`` and ``SOURCE`` formats possible, which
allowed forward progress on this PEP after
a year of languishing due to seemingly-unfixable problems.
He also suggested the affordance for :pep:`563` users where
``inspect.SOURCE`` will return the stringized annotations,
and many more suggestions besides. Carl was also the primary
correspondent in private email threads discussing this PEP,
and was a tireless resource and voice of sanity. This PEP
would almost certainly not have been accepted were it not
for Carl's contributions.
* Mark Shannon suggested building the entire annotations dict
inside a single code object, and only binding it to a function
on demand.
* Guido van Rossum suggested that ``__annotate__``
functions should duplicate the name visibility rules of
annotations under "stock" semantics.
* Jelle Zijlstra contributed not only feedback--but code!
**********
References
**********
* https://github.com/larryhastings/co_annotations/issues
* https://discuss.python.org/t/two-polls-on-how-to-revise-pep-649/23628
* https://discuss.python.org/t/a-massive-pep-649-update-with-some-major-course-corrections/25672
*********
Copyright
*********
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.