PEP: 649
Title: Deferred Evaluation Of Annotations Using Descriptors
Author: Larry Hastings <larry@hastings.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Jan-2021
Post-History: 11-Jan-2021, 11-Apr-2021
Abstract
========
As of Python 3.9, Python supports two different behaviors
for annotations:
* original or "stock" Python semantics, in which annotations
are evaluated at the time they are bound, and
* PEP 563 semantics, currently enabled per-module by
``from __future__ import annotations``, in which annotations
are converted back into strings and must be reparsed and
executed by ``eval()`` to be used.
Original Python semantics created a circular reference problem
for static typing analysis. PEP 563 solved that problem--but
its novel semantics introduced new problems, including its
restriction that annotations can only reference names at
module-level scope.
This PEP proposes a third way that embodies the best of both
previous approaches. It solves the same circular reference
problems solved by PEP 563, while otherwise preserving Python's
original annotation semantics, including allowing annotations
to refer to local and class variables.
In this new approach, the code to generate the annotations
dict is written to its own function which computes and returns
the annotations dict. Then, ``__annotations__`` is a "data
descriptor" which calls this annotation function once and
retains the result. This delays the evaluation of annotations
expressions until the annotations are examined, at which point
all circular references have likely been resolved. And if
the annotations are never examined, the function is never
called and the annotations are never computed.
Annotations defined using this PEP's semantics have the same
visibility into the symbol table as annotations under "stock"
semantics--any name visible to an annotation in Python 3.9
is visible to an annotation under this PEP. In addition,
annotations under this PEP can refer to names defined *after*
the annotation is defined, as long as the name is defined in
a scope visible to the annotation. Specifically, when this PEP
is active:
* An annotation can refer to a local variable defined in the
current function scope.
* An annotation can refer to a local variable defined in an
enclosing function scope.
* An annotation can refer to a class variable defined in the
current class scope.
* An annotation can refer to a global variable.
And in all four of these cases, the variable referenced by
the annotation needn't be defined at the time the annotation
is defined--it can be defined afterwards. The only restriction
is that the name or variable be defined before the annotation
is *evaluated.*
If accepted, these new semantics for annotations would initially
be gated behind ``from __future__ import co_annotations``.
However, these semantics would eventually be promoted to be
Python's default behavior. Thus this PEP would *supersede*
PEP 563, and PEP 563's behavior would be deprecated and
eventually removed.
Overview
========
.. note:: The code presented in this section is simplified
   for clarity. The intention is to communicate the high-level
   concepts involved without getting lost in the details.
   The actual details are often quite different. See the
   Implementation_ section later in this PEP for a much more
   accurate description of how this PEP works.

Consider this example code:

.. code-block::

    def foo(x: int = 3, y: MyType = None) -> float:
        ...
    class MyType:
        ...
    foo_y_type = foo.__annotations__['y']

As we see here, annotations are available at runtime through an
``__annotations__`` attribute on functions, classes, and modules.
When annotations are specified on one of these objects,
``__annotations__`` is a dictionary mapping the names of the
fields to the value specified as that field's annotation.
The default behavior in Python 3.9 is to evaluate the expressions
for the annotations, and build the annotations dict, at the time
the function, class, or module is bound. At runtime the above
code actually works something like this:

.. code-block::

    annotations = {'x': int, 'y': MyType, 'return': float}
    def foo(x = 3, y = None):
        ...
    foo.__annotations__ = annotations
    class MyType:
        ...
    foo_y_type = foo.__annotations__['y']

The crucial detail here is that the values ``int``, ``MyType``,
and ``float`` are looked up at the time the function object is
bound, and these values are stored in the annotations dict.
But this code doesn't run: it raises a ``NameError`` on the first
line, because ``MyType`` hasn't been defined yet.
PEP 563's solution is to decompile the expressions back
into strings, and store those *strings* in the annotations dict.
The equivalent runtime code would look something like this:

.. code-block::

    annotations = {'x': 'int', 'y': 'MyType', 'return': 'float'}
    def foo(x = 3, y = None):
        ...
    foo.__annotations__ = annotations
    class MyType:
        ...
    foo_y_type = foo.__annotations__['y']

This code now runs successfully. However, ``foo_y_type``
is no longer a reference to ``MyType``, it is the *string*
``'MyType'``. The code would have to be further modified to
call ``eval()`` or ``typing.get_type_hints()`` to convert
the string into a useful reference to the actual ``MyType``
object.
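
The extra step described above can be reproduced on any current Python
by writing the annotations as string literals by hand, which is roughly
what PEP 563 stores. This is a minimal sketch; the names ``greet`` and
``Person`` are illustrative and do not appear elsewhere in this PEP.

.. code-block:: python

    import typing


    def greet(who: "Person") -> "str":
        return "hello " + who.label


    class Person:
        label = "someone"


    # The raw annotation is only a string...
    raw = greet.__annotations__["who"]

    # ...and must be evaluated to recover the real object.
    resolved = typing.get_type_hints(greet)["who"]

Here ``raw`` is the string ``'Person'``, while ``resolved`` is the
``Person`` class itself.
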
This PEP proposes a third approach, delaying the evaluation of
the annotations by computing them in their own function. If
this PEP were active, the generated code would work something
like this:

.. code-block::

    class function:
        # __annotations__ on a function object is already a
        # "data descriptor" in Python, we're just changing what it does
        @property
        def __annotations__(self):
            return self.__co_annotations__()

    # ...

    def foo_annotations_fn():
        return {'x': int, 'y': MyType, 'return': float}
    def foo(x = 3, y = None):
        ...
    foo.__co_annotations__ = foo_annotations_fn
    class MyType:
        ...
    foo_y_type = foo.__annotations__['y']

The important change is that the code constructing the
annotations dict now lives in a function—here, called
``foo_annotations_fn()``. But this function isn't called
until we ask for the value of ``foo.__annotations__``,
and we don't do that until *after* the definition of ``MyType``.
So this code also runs successfully, and ``foo_y_type`` now
has the correct value--the class ``MyType``--even though
``MyType`` wasn't defined until *after* the annotation was
defined.
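
The same idea can be tried out today with an ordinary class standing in
for the function object. This is only a sketch of the concept, not the
proposed implementation: ``LazyAnnotated`` and its attribute names
``co_annotations`` and ``annotations`` are hypothetical, and, matching
the Abstract, the result is computed once and then retained.

.. code-block:: python

    class LazyAnnotated:
        """Hypothetical stand-in for an object with deferred annotations."""

        def __init__(self, co_annotations):
            self.co_annotations = co_annotations  # zero-argument callable
            self._cache = None

        @property
        def annotations(self):
            # Call the annotation function on first access, then reuse the dict.
            if self._cache is None:
                self._cache = self.co_annotations()
            return self._cache


    calls = []


    def foo_annotations_fn():
        calls.append("called")
        return {'x': int, 'y': MyType, 'return': float}


    # MyType doesn't exist yet, but nothing has been evaluated, so no error.
    foo = LazyAnnotated(foo_annotations_fn)


    class MyType:
        ...


    foo_y_type = foo.annotations['y']  # the annotation function runs only now
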
Motivation
==========
Python's original semantics for annotations made them painful to use
for static type analysis, due to forward reference problems.
This was the main justification for PEP 563, and we need not
revisit those arguments here.
However, PEP 563's solution was to decompile code for Python
annotations back into strings at compile time, requiring
users of annotations to ``eval()`` those strings to restore
them to their actual Python values. This has several drawbacks:
* It requires Python implementations to stringize their
  annotations. This is surprising behavior--unprecedented
  for a language-level feature. Also, adding this feature
  to CPython was complicated, and this complicated code would
  need to be reimplemented independently by every other Python
  implementation.
* It requires that all annotations be evaluated at module-level
  scope. Annotations under PEP 563 can no longer refer to

  * class variables,
  * local variables in the current function, or
  * local variables in enclosing functions.

* It requires a code change every time existing code uses an
  annotation, to handle converting the stringized
  annotation back into a useful value.
* ``eval()`` is slow.
* ``eval()`` isn't always available; it's sometimes removed
  from Python for space reasons.
* In order to evaluate the annotations on a class,
  it requires obtaining a reference to that class's globals,
  which PEP 563 suggests should be done by looking up that class
  by name in ``sys.modules``--another surprising requirement for
  a language-level feature.
* It adds an ongoing maintenance burden to Python implementations.
  Every time the language adds a new feature available in expressions,
  the implementation's stringizing code must be updated in
  tandem in order to support decompiling it.

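
The scoping restriction in the list above can be observed with
hand-written string annotations, which behave like PEP 563's stringized
form. In this sketch (``make_handler``, ``LocalType``, and ``handler``
are made-up names), the annotation names a class local to the enclosing
function, so ``typing.get_type_hints()`` cannot find it among the
module globals:

.. code-block:: python

    import typing


    def make_handler():
        class LocalType:
            ...

        def handler(arg: "LocalType"):
            ...

        return handler


    handler = make_handler()

    try:
        typing.get_type_hints(handler)
    except NameError as exc:
        lookup_error = exc  # 'LocalType' is not a module-level name
    else:
        lookup_error = None
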
This PEP also solves the forward reference problem outlined in
PEP 563 while avoiding the problems listed above:
* Python implementations would generate annotations as code
objects. This is simpler than stringizing, and is something
Python implementations are already quite good at. This means:
- alternate implementations would need to write less code to
implement this feature, and
- the implementation would be simpler overall, which should
reduce its ongoing maintenance cost.
* Existing annotations would not need to be changed to only
use global scope. Actually, annotations would become much
easier to use, as they would now also handle forward
references.
* Code examining annotations at runtime would no longer need
to use ``eval()`` or anything else—it would automatically
see the correct values. This is easier, faster, and
removes the dependency on ``eval()``.
Backwards Compatibility
=======================
PEP 563 changed the semantics of annotations. When its semantics
are active, annotations must assume they will be evaluated in
*module-level* scope. They may no longer refer directly
to local variables or class attributes.
This PEP removes that restriction; annotations may refer to globals,
local variables inside functions, local variables defined in enclosing
functions, and class members in the current class. In addition,
annotations may refer to any of these that haven't been defined yet
at the time the annotation is defined, as long as the not-yet-defined
name is created normally (in such a way that it is known to the symbol
table for the relevant block, or is a global or class variable found
using normal name resolution). Thus, this PEP demonstrates *improved*
backwards compatibility over PEP 563.
PEP 563 also requires using ``eval()`` or ``typing.get_type_hints()``
to examine annotations. Code updated to work with PEP 563 that calls
``eval()`` directly would have to be updated simply to remove the
``eval()`` call. Code using ``typing.get_type_hints()`` would
continue to work unchanged, though future use of that function
would become optional in most cases.
Because this PEP makes semantic changes to how annotations are
evaluated, this PEP will be initially gated with a per-module
``from __future__ import co_annotations`` before it eventually
becomes the default behavior.
Apart from the delay in evaluating values stored in annotations
dicts, this PEP preserves nearly all existing behavior of
annotations dicts. Specifically:
* Annotations dicts are mutable, and any changes to them are
preserved.
* The ``__annotations__`` attribute can be explicitly set,
and any value set this way will be preserved.
* The ``__annotations__`` attribute can be deleted using
the ``del`` statement.
However, there are two uncommon interactions possible with class
and module annotations that work today—both with stock semantics,
and with PEP 563 semantics—that would no longer work when this PEP
was active. These two interactions would have to be prohibited.
The good news is, neither is common, and neither is considered good
practice. In fact, they're rarely seen outside of Python's own
regression test suite. They are:
* *Code that sets annotations on module or class attributes
from inside any kind of flow control statement.* It's
currently possible to set module and class attributes with
annotations inside an ``if`` or ``try`` statement, and it works
as one would expect. It's untenable to support this behavior
when this PEP is active.
* *Code in module or class scope that references or modifies the
local* ``__annotations__`` *dict directly.* Currently, when
setting annotations on module or class attributes, the generated
code simply creates a local ``__annotations__`` dict, then sets
mappings in it as needed. It's also possible for user code
to directly modify this dict, though this doesn't seem like it's
an intentional feature. Although it would be possible to support
this after a fashion when this PEP was active, the semantics
would likely be surprising and wouldn't make anyone happy.
Note that these are both also pain points for static type checkers,
and are unsupported by those checkers. It seems reasonable to
declare that both are at the very least unsupported, and their
use results in undefined behavior. It might be worth making a
small effort to explicitly prohibit them with compile-time checks.
In addition, there are a few operators that would no longer be
valid for use in annotations, because their side effects would
affect the *annotation function* instead of the
class/function/module the annotation was nominally defined in:
* ``:=`` (aka the "walrus operator"),
* ``yield`` and ``yield from``, and
* ``await``.
Use of any of these operators in an annotation will result in a
compile-time error.
Since delaying the evaluation of annotations until they are
accessed changes the semantics of the language, the change is
observable from within the language. Therefore it's possible to
write code that behaves differently based on whether annotations are
evaluated at binding time or at access time, e.g.

.. code-block::

    mytype = str
    def foo(a:mytype): pass
    mytype = int
    print(foo.__annotations__['a'])

This will print ``<class 'str'>`` with stock semantics
and ``<class 'int'>`` when this PEP is active. Since
this is poor programming style to begin with, it seems
acceptable that this PEP changes its behavior.
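
The difference can be reproduced without the ``__future__`` import by
building one annotations dict eagerly and another through a function.
This is a small sketch; ``eager`` and ``deferred`` are illustrative
names standing in for the two evaluation strategies.

.. code-block:: python

    mytype = str

    # Stock semantics: the dict is built when the function is bound.
    eager = {'a': mytype}


    # This PEP: the dict is built by a function that runs on first access.
    def deferred():
        return {'a': mytype}


    mytype = int

    stock_result = eager['a']
    pep_result = deferred()['a']

``stock_result`` is ``str`` because the name was resolved before the
rebinding, while ``pep_result`` is ``int`` because the lookup happened
afterwards.
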
Finally, there's a standard idiom that's actually somewhat common
when accessing class annotations, and which will become more
problematic when this PEP is active: code often accesses class
annotations via ``cls.__dict__.get("__annotations__", {})``
rather than simply ``cls.__annotations__``. It's due to a flaw
in the original design of annotations themselves. This topic
will be examined in a separate discussion; the outcome of
that discussion will likely guide the future evolution of this
PEP.
Mistaken Rejection Of This Approach In November 2017
====================================================
During the early days of discussion around PEP 563,
using code to delay the evaluation of annotations was
briefly discussed, in a November 2017 thread on the
``python-dev`` mailing list. At the time the
technique was termed an "implicit lambda expression".
Guido van Rossum--Python's BDFL at the time--replied,
asserting that these "implicit lambda expressions" wouldn't
work, because they'd only be able to resolve symbols at
module-level scope:

    IMO the inability of referencing class-level definitions
    from annotations on methods pretty much kills this idea.

    https://mail.python.org/pipermail/python-dev/2017-November/150109.html

This led to a short discussion about extending lambda-ized
annotations for methods to be able to refer to class-level
definitions, by maintaining a reference to the class-level
scope. This idea, too, was quickly rejected.
PEP 563 summarizes the above discussion here:
https://www.python.org/dev/peps/pep-0563/#keeping-the-ability-to-use-function-local-state-when-defining-annotations
What's puzzling is PEP 563's own changes to the scoping rules
of annotations—it *also* doesn't permit annotations to reference
class-level definitions. It's not immediately clear why an
inability to reference class-level definitions was enough to
reject using "implicit lambda expressions" for annotations,
but was acceptable for stringized annotations.
In retrospect there was probably a pivot during the development
of PEP 563. It seems that, early on, there was a prevailing
assumption that PEP 563 would support references to class-level
definitions. But by the time PEP 563 was finalized, this
assumption had apparently been abandoned. And it looks like
"implicit lambda expressions" were never reconsidered in this
new light.
In any case, annotations are still able to refer to class-level
definitions under this PEP, rendering the objection moot.
.. _Implementation:
Implementation
==============
There's a prototype implementation of this PEP, here:
https://github.com/larryhastings/co_annotations/
As of this writing, all features described in this PEP are
implemented, and there are some rudimentary tests in the
test suite. There are still some broken tests, and the
``co_annotations`` repo is many months behind the
CPython repo.
from __future__ import co_annotations
-------------------------------------
In the prototype, the semantics presented in this PEP are gated with:

.. code-block::

    from __future__ import co_annotations

__co_annotations__
------------------
Python supports runtime metadata for annotations for three different
types: functions, classes, and modules. The basic approach to
implement this PEP is much the same for all three with only minor
variations.
With this PEP, each of these types adds a new attribute,
``__co_annotations__``. ``__co_annotations__`` is a function:
it takes no arguments, and must return either ``None`` or a dict
(or subclass of dict). It adds the following semantics:
* ``__co_annotations__`` is always set, and may contain either
  ``None`` or a callable.
* ``__co_annotations__`` cannot be deleted.
* ``__annotations__`` and ``__co_annotations__`` can't both
  be set to a useful value simultaneously:

  - If you set ``__annotations__`` to a dict, this also sets
    ``__co_annotations__`` to ``None``.
  - If you set ``__co_annotations__`` to a callable, this also
    deletes ``__annotations__``.

Internally, ``__co_annotations__`` is a "data descriptor",
where functions are called whenever user code gets, sets,
or deletes the attribute. In all three cases, the object
has separate internal storage for the current value
of the ``__co_annotations__`` attribute.
``__annotations__`` is also a data descriptor, with its own
separate internal storage for its internal value. The code
implementing the "get" for ``__annotations__`` works something
like this:

.. code-block::

    if (the internal value is set)
        return the internal annotations dict
    if (__co_annotations__ is not None)
        call the __co_annotations__ function
        if the result is a dict:
            store the result as the internal value
            set __co_annotations__ to None
            return the internal value
    do whatever this object does when there are no annotations

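
The rules in this section can be modelled in plain Python. The sketch
below is an assumption-laden illustration, not the prototype's code:
``AnnotatedObject`` is a hypothetical class, the attributes are named
``annotations`` and ``co_annotations`` to avoid clashing with the real
dunder names, and the "no annotations" fallback mimics function
objects by producing an empty dict.

.. code-block:: python

    class AnnotatedObject:
        """Hypothetical model of the two linked attributes."""

        def __init__(self, co_annotations=None):
            self._annotations = None  # internal value of __annotations__
            self._co_annotations = co_annotations

        @property
        def co_annotations(self):
            return self._co_annotations

        @co_annotations.setter
        def co_annotations(self, value):
            if value is not None and not callable(value):
                raise TypeError("co_annotations must be None or a callable")
            self._co_annotations = value
            if value is not None:
                self._annotations = None  # setting a callable drops the dict

        @property
        def annotations(self):
            if self._annotations is not None:
                return self._annotations
            if self._co_annotations is not None:
                result = self._co_annotations()
                if isinstance(result, dict):
                    self._annotations = result
                    self._co_annotations = None
                    return self._annotations
            # "No annotations" fallback: behave like a function object.
            self._annotations = {}
            return self._annotations

        @annotations.setter
        def annotations(self, value):
            self._annotations = value
            self._co_annotations = None  # setting a dict clears the callable


    obj = AnnotatedObject(lambda: {'x': int})
    first = obj.annotations

After the first read, ``obj.co_annotations`` is ``None`` and further
reads return the same dict object.
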
Unbound code objects
--------------------
When Python code defines one of these three objects with
annotations, the Python compiler generates a separate code
object which builds and returns the appropriate annotations
dict. Wherever possible, the "annotation code object" is
then stored *unbound* as the internal value of
``__co_annotations__``; it is then bound on demand when
the user asks for ``__annotations__``.
This is a useful optimization for both speed and memory
consumption. Python processes rarely examine annotations
at runtime. Therefore, pre-binding these code objects to
function objects would usually be a waste of resources.
When is this optimization not possible?
* When an annotation function contains references to
free variables, in the current function or in an
outer function.
* When an annotation function is defined on a method
(a function defined inside a class) and the annotations
possibly refer directly to class variables.
Note that user code isn't permitted to directly access these
unbound code objects. If the user "gets" the value of
``__co_annotations__``, and the internal value of
``__co_annotations__`` is an unbound code object,
it immediately binds the code object, and the resulting
function object is stored as the new value of
``__co_annotations__`` and returned.
(However, these unbound code objects *are* stored in the
``.pyc`` file. So a determined user could examine them
should that be necessary for some reason.)
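
Binding a code object on demand can be illustrated with
``types.FunctionType``, which builds a function from a code object and
a globals dict. This is a sketch of the concept only: the source
string, the name ``annotate``, and the way the code object is pulled
out of ``co_consts`` are assumptions made for the example, not how the
prototype stores or binds its code objects.

.. code-block:: python

    import types


    def foo(x=3, y=None):
        ...


    # Compile a tiny module whose only statement defines the annotation
    # function, then pull that function's unbound code object out of the
    # module's constants.
    source = "def annotate():\n    return {'x': int, 'y': MyType, 'return': float}\n"
    module_code = compile(source, "<annotations>", "exec")
    unbound = next(c for c in module_code.co_consts
                   if isinstance(c, types.CodeType))


    class MyType:
        ...


    # Bind on demand, using the annotated function's own globals.
    bound = types.FunctionType(unbound, foo.__globals__)
    annotations = bound()

No function object exists for the annotations until the last two
lines run, which is the saving this optimization is after.
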
Function Annotations
--------------------
When compiling a function, the CPython bytecode compiler
visits the annotations for the function all in one place,
starting with ``compiler_visit_annotations()`` in ``compile.c``.
If there are any annotations, they create the scope for
the annotations function on demand, and
``compiler_visit_annotations()`` assembles it.
The code object is passed in place of the annotations dict
for the ``MAKE_FUNCTION`` bytecode instruction.
``MAKE_FUNCTION`` supports a new bit in its oparg
bitfield, ``0x10``, which tells it to expect a
``co_annotations`` code object on the stack.
The bitfields for ``annotations`` (``0x04``) and
``co_annotations`` (``0x10``) are mutually exclusive.
When binding an unbound annotation code object, a function will
use its own ``__globals__`` as the new function's globals.
One quirk of Python: you can't actually remove the annotations
from a function object. If you delete the ``__annotations__``
attribute of a function, then get its ``__annotations__`` member,
it will create an empty dict and use that as its
``__annotations__``. The implementation of this PEP maintains
this quirk for backwards compatibility.
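
The quirk is easy to observe with stock semantics. A short sketch,
with ``before`` and ``after`` as illustrative names:

.. code-block:: python

    def foo(x: int = 3) -> float:
        ...


    before = dict(foo.__annotations__)

    del foo.__annotations__

    # Reading the attribute again quietly creates a fresh, empty dict.
    after = foo.__annotations__
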
Class Annotations
-----------------
When compiling a class body, the compiler maintains two scopes:
one for the normal class body code, and one for annotations.
(This is facilitated by four new functions: ``compile.c``
adds ``compiler_push_scope()`` and ``compiler_pop_scope()``,
and ``symtable.c`` adds ``symtable_push_scope()`` and
``symtable_pop_scope()``.)
Once the code generator reaches the end of the class body,
but before it generates the bytecode for the class body,
it assembles the bytecode for ``__co_annotations__``, then
assigns that to ``__co_annotations__`` using ``STORE_NAME``.
It also sets a new ``__globals__`` attribute. Currently it
does this by calling ``globals()`` and storing the result.
(Surely there's a more elegant way to find the class's
globals--but this was good enough for the prototype.) When
binding an unbound annotation code object, a class will use
the value of this ``__globals__`` attribute. When the class
drops its reference to the unbound code object--either because
it has bound it to a function, or because ``__annotations__``
has been explicitly set--it also deletes its ``__globals__``
attribute.
As discussed above, examination or modification of
``__annotations__`` from within the class body is no
longer supported. Also, any flow control (``if`` or ``try`` blocks)
around declarations of members with annotations is unsupported.
If you delete the ``__annotations__`` attribute of a class,
then get its ``__annotations__`` member, it will return the
annotations dict of the first base class with annotations set.
If no base classes have annotations set, it will raise
``AttributeError``.
Although it's an implementation-specific detail, currently
classes store the internal value of ``__co_annotations__``
in their ``tp_dict`` under the same name.
Module Annotations
------------------
Module annotations work much the same as class annotations.
The main difference is, a module uses its own dict as the
``__globals__`` when binding the function.
If you delete the ``__annotations__`` attribute of a module,
then get its ``__annotations__`` member, the module will
raise ``AttributeError``.
Annotations With Closures
-------------------------
It's possible to write annotations that refer to
free variables, and even free variables that have yet
to be defined. For example:

.. code-block::

    from __future__ import co_annotations

    def outer():
        def middle():
            def inner(a:mytype, b:mytype2): pass
            mytype = str
            return inner
        mytype2 = int
        return middle()

    fn = outer()
    print(fn.__annotations__)

At the time ``fn`` is set, ``inner.__co_annotations__()``
hasn't been run. So it has to retain a reference to
the *future* definitions of ``mytype`` and ``mytype2`` if
it is to correctly evaluate its annotations.
If an annotation function refers to a local variable
from the current function scope, or a free variable
from an enclosing function scope--if, in CPython, the
annotation function code object contains one or more
``LOAD_DEREF`` opcodes--then the annotation code object
is bound at definition time with references to these
variables. ``LOAD_DEREF`` instructions require the annotation
function to be bound with special run-time information
(in CPython, a ``freevars`` array). Rather than store
that separately and use that to later lazy-bind the
function object, the current implementation simply
early-binds the function object.
Note that, since the annotation function ``inner.__co_annotations__()``
is defined while parsing ``outer()``, from Python's perspective
the annotation function is a "nested function". So "local
variable inside the 'current' function" and "free variable
from an enclosing function" are, from the perspective of
the annotation function, the same thing.
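
Because the annotation function is just a nested function, the
behavior can be imitated today by writing that function by hand. In
this sketch, ``inner_annotations`` and the ``co_annotations``
attribute are hypothetical stand-ins for what the compiler would
generate:

.. code-block:: python

    def outer():
        def middle():
            def inner(a, b):
                pass

            # Stand-in for the compiler-generated annotation function.
            def inner_annotations():
                return {'a': mytype, 'b': mytype2}

            inner.co_annotations = inner_annotations
            mytype = str  # defined after the annotation function
            return inner

        mytype2 = int
        return middle()


    fn = outer()
    free_names = fn.co_annotations.__code__.co_freevars
    annotations = fn.co_annotations()

Both ``mytype`` and ``mytype2`` show up as free variables of the
annotation function, whether they come from the "current" function or
from an enclosing one.
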
Annotations That Refer To Class Variables
-----------------------------------------
It's possible to write annotations that refer to
class variables, and even class variables that haven't
yet been defined. For example:

.. code-block::

    from __future__ import co_annotations

    class C:
        def method(a:mytype): pass
        mytype = str

    print(C.method.__annotations__)

Internally, annotation functions are defined as
a new type of "block" in CPython's symbol table
called an ``AnnotationBlock``. An ``AnnotationBlock``
is almost identical to a ``FunctionBlock``. It differs
in that it's permitted to see names from an enclosing
class scope. (Again: annotation functions are functions,
and they're defined *inside* the same scope as
the thing they're being defined on. So in the above
example, the annotation function for ``C.method()``
is defined inside ``C``.)
If it's possible that an annotation function refers
to class variables--if all these conditions are true:
* The annotation function is being defined inside
a class scope.
* The generated code for the annotation function
has at least one ``LOAD_NAME`` instruction.
Then the annotation function is bound at the time
it's set on the class/function, and this binding
includes a reference to the class dict. The class
dict is pushed on the stack, and the ``MAKE_FUNCTION``
bytecode instruction takes a new second bitfield (``0x20``)
indicating that it should consume that stack argument
and store it as ``__locals__`` on the newly created
function object.
Then, at the time the function is executed, the
``f_locals`` field of the frame object is set to
the function's ``__locals__``, if set. This permits
``LOAD_NAME`` opcodes to work normally, which means
the code generated for annotation functions is nearly
identical to that generated for conventional Python
functions.
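
The role played by ``LOAD_NAME`` and the locals mapping can be
approximated with ``eval()``, which accepts separate globals and
locals. This is a rough analogy rather than the proposed mechanism:
the expression string and the use of a copy of the class namespace in
place of ``__locals__`` are assumptions made for illustration.

.. code-block:: python

    import dis


    class C:
        def method(a):
            pass

        mytype = str


    # Compiled outside any function, a bare name is looked up with LOAD_NAME.
    annotation_code = compile("{'a': mytype}", "<annotations>", "eval")
    opnames = {ins.opname for ins in dis.get_instructions(annotation_code)}

    # Supplying the class namespace as the locals mapping plays the role of
    # the proposed __locals__: LOAD_NAME checks it before the globals.
    annotations = eval(annotation_code, globals(), dict(vars(C)))
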
Interactive REPL Shell
----------------------
Everything works the same inside Python's interactive REPL shell,
except for module annotations in the interactive module (``__main__``)
itself. Since that module is never "finished", there's no specific
point where we can compile the ``__co_annotations__`` function.
For the sake of simplicity, in this case we forgo delayed evaluation.
Module-level annotations in the REPL shell will continue to work
exactly as they do today, evaluating immediately and setting the
result directly inside the ``__annotations__`` dict.
(It might be possible to support delayed evaluation here.
But it gets complicated quickly, and for a nearly-non-existent
use case.)
Annotations On Local Variables Inside Functions
-----------------------------------------------
Python supports syntax for local variable annotations inside
functions. However, these annotations have no runtime
effect--they're discarded at compile-time. Therefore, this
PEP doesn't need to do anything to support them, the same
as stock semantics and PEP 563.
Performance Comparison
----------------------
Performance with this PEP should be favorable, when compared with either
stock behavior or PEP 563. In general, resources are only consumed
on demand—"you only pay for what you use".
There are three scenarios to consider:
* the runtime cost when annotations aren't defined,
* the runtime cost when annotations are defined but *not* referenced, and
* the runtime cost when annotations are defined *and* referenced.
We'll examine each of these scenarios in the context of all three
semantics for annotations: stock, PEP 563, and this PEP.
When there are no annotations, all three semantics have the same
runtime cost: zero. No annotations dict is created and no code is
generated for it. This requires no runtime processor time and
consumes no memory.
When annotations are defined but not referenced, the runtime cost
of Python with this PEP should be roughly equal to or slightly better
than PEP 563 semantics, and slightly better than "stock" Python
semantics. The specifics depend on the object being annotated:
* With stock semantics, the annotations dict is always built, and
set as an attribute of the object being annotated.
* In PEP 563 semantics, for function objects, a single constant
(a tuple) is set as an attribute of the function. For class and
module objects, the annotations dict is always built and set as
an attribute of the class or module.
* With this PEP, a single object is set as an attribute of the
object being annotated. Most often, this object is a constant
(a code object). In cases where the annotation refers to local
variables or class variables, the code object will be bound to
a function object, and the function object is set as the attribute
of the object being annotated.
When annotations are both defined and referenced, code using
this PEP should be much faster than code using PEP 563 semantics,
and equivalent to or slightly improved over original Python
semantics. PEP 563 semantics requires invoking ``eval()`` for
every value inside an annotations dict, which is enormously slow.
And, as described in the Bytecode Comparison section, this PEP
generates measurably more efficient bytecode for class and module
annotations than stock semantics; for function annotations, this
PEP and stock semantics should be roughly equivalent.
Memory use should also be comparable in all three scenarios across
all three semantic contexts. In the first and third scenarios,
memory usage should be roughly equivalent in all cases.
In the second scenario, when annotations are defined but not
referenced, using this PEP's semantics will mean the
function/class/module will store one unused code object (possibly
bound to an unused function object); with the other two semantics,
they'll store one unused dictionary (or constant tuple).
Bytecode Comparison
-------------------
The bytecode generated for annotations functions with
this PEP uses the efficient ``BUILD_CONST_KEY_MAP`` opcode
to build the dict for all annotatable objects:
functions, classes, and modules.
Stock semantics also uses ``BUILD_CONST_KEY_MAP`` bytecode
for function annotations. PEP 563 has an even more efficient
method for building annotations dicts on functions, leveraging
the fact that its annotations dicts only contain strings for
both keys and values. At compile-time it constructs a tuple
containing pairs of keys and values, then
at runtime it converts that tuple into a dict on demand.
This is a faster technique than either stock semantics
or this PEP can employ, because in those two cases
annotations dicts can contain Python values of any type.
Of course, this performance win is negated if the
annotations are examined, due to the overhead of ``eval()``.
For class and module annotations, both stock semantics
and PEP 563 generate a longer and slightly-less-efficient
stanza of bytecode, creating the dict and setting the
annotations individually.
For Future Discussion
=====================
Circular Imports
----------------
There is one unfortunately-common scenario where PEP 563
currently provides a better experience, and it has to do
with large code bases, with circular dependencies and
imports, that examine their annotations at run-time.
PEP 563 permitted defining *and examining* invalid
expressions as annotations. Its implementation requires
annotations to be legal Python expressions, which it then
converts into strings at compile-time. But legal Python
expressions may not be computable at runtime, if for
example the expression references a name that isn't defined.
This is a problem for stringized annotations if they're
evaluated, e.g. with ``typing.get_type_hints()``. But
any stringized annotation may be examined harmlessly at
any time--as long as you don't evaluate it, and only
examine it as a string.
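
As a small illustration of the difference between examining and
evaluating a stringized annotation, the sketch below stringizes by hand
rather than with ``from __future__ import annotations``. The names
``link`` and ``MissingNode`` are hypothetical, and ``MissingNode`` is
deliberately never defined.

.. code-block:: python

    import typing


    def link(node: "MissingNode") -> "MissingNode":
        return node


    # Examining the annotations as strings is harmless.
    raw = link.__annotations__

    # Evaluating them requires the name to exist, so this fails.
    try:
        typing.get_type_hints(link)
        failure = None
    except NameError as exc:
        failure = exc
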
Some large organizations have code bases that unfortunately
have circular dependency problems with their annotations--class
A has methods annotated with class B, but class B has methods
annotated with class A--that can be difficult to resolve.
Since PEP 563 stringizes their annotations, it allows them
to leave these circular dependencies in place, and they can
sidestep the circular import problem by never importing the
module that defines the types used in the annotations. Their
annotations can no longer be evaluated, but this appears not
to be a concern in practice. They can then examine the
stringized form of the annotations at runtime and this seems
to be sufficient for their needs.
This PEP allows for many of the same behaviors.
Annotations must be legal Python expressions, which
are compiled into a function at compile-time.
And if the code never examines an annotation, it won't
have any runtime effect, so here too annotations can
harmlessly refer to undefined names. (It's exactly
like defining a function that refers to undefined
names--then never calling that function. Until you
call the function, nothing bad will happen.)
But examining an annotation when this PEP is active
means evaluating it, which means the names evaluated
in that expression must be defined. An undefined name
will raise a ``NameError`` in an annotation function,
just as it would with a stringized annotation passed
in to ``typing.get_type_hints()``, and just like any
other context in Python where an expression is evaluated.
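
The analogy with an ordinary function can be made concrete. In this
sketch ``annotate`` is a hand-written, hypothetical stand-in for the
function the compiler would generate under this PEP, and
``NotYetDefined`` is an illustrative name.

.. code-block:: python

    def annotate():
        # Refers to a name that does not exist yet.
        return {"node": NotYetDefined}


    # Merely defining the function evaluated nothing, so no error so far.
    # Calling it before the name exists fails, as with any expression.
    try:
        annotate()
        early_error = None
    except NameError as exc:
        early_error = exc


    class NotYetDefined:
        pass


    # Once the name is defined, the same function succeeds.
    resolved = annotate()
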
In discussions we have yet to find a solution to this
problem that makes all the participants in the
conversation happy. There are various avenues to explore
here:
* One workaround is to continue to stringize one's
annotations, either by hand or automatically
via the Python compiler (as it does today with
``from __future__ import annotations``). This might
mean preserving Python's current annotation stringizing
going forward, although leaving it turned off by default,
only available by explicit request (though likely with
a different mechanism than
``from __future__ import annotations``).
* Another possible workaround involves importing
the circularly-dependent modules separately, then
externally adding ("monkey-patching") their dependencies
to each other after the modules are loaded. As long
as the modules don't examine their annotations until
after they are completely loaded, this should work fine
and be maintainable with a minimum of effort.
* A third and more radical approach would be to change the
semantics of annotations so that they don't raise a
``NameError`` when an unknown name is evaluated,
but instead create some sort of proxy "reference" object.
* Of course, even if we do deprecate PEP 563, it will be
several releases before the functionality is removed,
giving us several years in which to research and innovate
new solutions for this problem.
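
The monkey-patching workaround in the second bullet above can be
sketched as follows. This is only an illustration under stated
assumptions: the two modules are simulated with ``types.ModuleType`` and
``exec()``, the names ``mod_a``, ``mod_b``, ``A``, ``B`` and ``partner``
are hypothetical, and hand-written string annotations stand in for
deferred evaluation.

.. code-block:: python

    import types
    import typing

    # Two stand-in modules whose classes annotate with each other.
    mod_a = types.ModuleType("mod_a")
    mod_b = types.ModuleType("mod_b")

    exec('class A:\n    def partner(self) -> "B": ...\n', mod_a.__dict__)
    exec('class B:\n    def partner(self) -> "A": ...\n', mod_b.__dict__)

    # Neither module imported the other.  After both are loaded, add
    # the missing names from the outside.
    mod_a.B = mod_b.B
    mod_b.A = mod_a.A

    # Annotations are examined only now, so both names resolve.
    hints_a = typing.get_type_hints(mod_a.A.partner)
    hints_b = typing.get_type_hints(mod_b.B.partner)

The important constraint is the one stated above: nothing may examine
the annotations until the patching step has run.
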
In any case, the participants of the discussion agree that
this PEP should still move forward, even as this issue remains
currently unresolved [1]_.
.. [1] https://github.com/larryhastings/co_annotations/issues/1
cls.__globals__ and fn.__locals__
---------------------------------
Is it permissible to add the ``__globals__`` reference to class
objects as proposed here? It's not clear why this hasn't already
been done; PEP 563 could have made use of class globals, but instead
made do with looking up classes inside ``sys.modules``. Python
seems strangely allergic to adding a ``__globals__`` reference to
class objects.
If adding ``__globals__`` to class objects is indeed a bad idea
(for reasons I don't know), here are two alternatives as to
how classes could get a reference to their globals for the
implementation of this PEP:
* The generated code for a class could bind its annotations code
object to a function at the time the class is bound, rather than
waiting for ``__annotations__`` to be referenced, making them an
exception to the rule (even though "special cases aren't special
enough to break the rules"). This would result in a small
additional runtime cost when annotations were defined but not
referenced on class objects. Honestly I'm more worried about
the lack of symmetry in semantics. (But I wouldn't want to
pre-bind all annotations code objects, as that would become
much more costly for function objects, even as annotations are
rarely used at runtime.)
* Use the class's ``__module__`` attribute to look up its module
by name in ``sys.modules``. This is what PEP 563 advises.
While this is passable for userspace or library code, it seems
like a little bit of a code smell for this to be defined semantics
baked into the language itself.
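
The second alternative, looking up a class's module in ``sys.modules``,
can be sketched in a few lines. The helper name ``globals_for_class`` is
hypothetical, and ``fractions.Fraction`` is used only because it is a
convenient standard-library class.

.. code-block:: python

    import fractions
    import sys


    def globals_for_class(cls):
        """Return the namespace of the module that defines ``cls``."""
        module = sys.modules[cls.__module__]
        return vars(module)


    namespace = globals_for_class(fractions.Fraction)

    # A stringized annotation can then be evaluated in that namespace.
    resolved = eval("Fraction", namespace)

This lookup depends on the module still being registered in
``sys.modules`` under the name recorded in ``__module__``, which is part
of why it feels fragile as language-level semantics.
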
Also, the prototype gets globals for class objects by calling
``globals()`` then storing the result. I'm sure there's a much
faster way to do this; I just didn't know what it was when I was
prototyping. I'm sure we can revise this to something much faster
and much more sanitary. I'd prefer to make it completely internal
anyway, and not make it visible to the user (via this new
``__globals__`` attribute). There's possibly already a good place to
put it anyway--``ht_module``.
Similarly, this PEP adds one new dunder member to functions,
classes, and modules (``__co_annotations__``), and a second new
dunder member to functions (``__locals__``). This might be
considered excessive.
Bikeshedding the name
---------------------
During most of the development of this PEP, user code actually
could see the raw annotation code objects. ``__co_annotations__``
could only be set to a code object; functions and other callables
weren't permitted. In that context the name ``co_annotations``
makes a lot of sense. But with this last-minute pivot where
``__co_annotations__`` now presents itself as a callable,
perhaps the name of the attribute and the name of the
``from __future__ import`` need a re-think.
Acknowledgements
================
Thanks to Barry Warsaw, Eric V. Smith, Mark Shannon,
and Guido van Rossum for feedback and encouragement.
Thanks in particular to Mark Shannon for two key
suggestions--build the entire annotations dict inside
a single code object, and only bind it to a function
on demand--that quickly became among the best aspects
of this proposal. Also, thanks in particular to Guido
van Rossum for suggesting that ``__co_annotations__``
functions should duplicate the name visibility rules of
annotations under "stock" semantics--this resulted in
a sizeable improvement to the second draft. Finally,
special thanks to Jelle Zijlstra, who contributed not
just feedback--but code!
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: