python-peps/pep-0560.rst

330 lines
12 KiB
ReStructuredText
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

PEP: 560
Title: Core support for typing module and generic types
Author: Ivan Levkivskyi <levkivskyi@gmail.com>
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 03-Sep-2017
Python-Version: 3.7
Post-History: 09-Sep-2017, 14-Nov-2017
Resolution: https://mail.python.org/pipermail/python-dev/2017-December/151038.html
Abstract
========
Initially :pep:`484` was designed in such way that it would not introduce
*any* changes to the core CPython interpreter. Now type hints and
the ``typing`` module are extensively used by the community, e.g. :pep:`526`
and :pep:`557` extend the usage of type hints, and the backport of ``typing``
on PyPI has 1M downloads/month. Therefore, this restriction can be removed.
It is proposed to add two special methods ``__class_getitem__`` and
``__mro_entries__`` to the core CPython for better support of
generic types.
Rationale
=========
The restriction to not modify the core CPython interpreter led to some
design decisions that became questionable when the ``typing`` module started
to be widely used. There are three main points of concern:
performance of the ``typing`` module, metaclass conflicts, and the large
number of hacks currently used in ``typing``.
Performance
-----------
The ``typing`` module is one of the heaviest and slowest modules in
the standard library even with all the optimizations made. Mainly this is
because subscripted generic types (see :pep:`484` for definition of terms used
in this PEP) are class objects (see also [1]_). There are three main ways how
the performance can be improved with the help of the proposed special methods:
- Creation of generic classes is slow since the ``GenericMeta.__new__`` is
very slow; we will not need it anymore.
- Very long method resolution orders (MROs) for generic classes will be
half as long; they are present because we duplicate the ``collections.abc``
inheritance chain in ``typing``.
- Instantiation of generic classes will be faster (this is minor however).
Metaclass conflicts
-------------------
All generic types are instances of ``GenericMeta``, so if a user uses
a custom metaclass, then it is hard to make a corresponding class generic.
This is particularly hard for library classes that a user doesn't control.
A workaround is to always mix-in ``GenericMeta``::
class AdHocMeta(GenericMeta, LibraryMeta):
pass
class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
...
but this is not always practical or even possible. With the help of the
proposed special attributes the ``GenericMeta`` metaclass will not be needed.
Hacks and bugs that will be removed by this proposal
----------------------------------------------------
- ``_generic_new`` hack that exists because ``__init__`` is not called on
instances with a type differing from the type whose ``__new__`` was called,
``C[int]().__class__ is C``.
- ``_next_in_mro`` speed hack will be not necessary since subscription will
not create new classes.
- Ugly ``sys._getframe`` hack. This one is particularly nasty since it looks
like we can't remove it without changes outside ``typing``.
- Currently generics do dangerous things with private ABC caches
to fix large memory consumption that grows at least as O(N\ :sup:`2`),
see [2]_. This point is also important because it was recently proposed to
re-implement ``ABCMeta`` in C.
- Problems with sharing attributes between subscripted generics,
see [3]_. The current solution already uses ``__getattr__`` and ``__setattr__``,
but it is still incomplete, and solving this without the current proposal
will be hard and will need ``__getattribute__``.
- ``_no_slots_copy`` hack, where we clean up the class dictionary on every
subscription thus allowing generics with ``__slots__``.
- General complexity of the ``typing`` module. The new proposal will not
only allow to remove the above-mentioned hacks/bugs, but also simplify
the implementation, so that it will be easier to maintain.
Specification
=============
``__class_getitem__``
---------------------
The idea of ``__class_getitem__`` is simple: it is an exact analog of
``__getitem__`` with an exception that it is called on a class that
defines it, not on its instances. This allows us to avoid
``GenericMeta.__getitem__`` for things like ``Iterable[int]``.
The ``__class_getitem__`` is automatically a class method and
does not require ``@classmethod`` decorator (similar to
``__init_subclass__``) and is inherited like normal attributes.
For example::
class MyList:
def __getitem__(self, index):
return index + 1
def __class_getitem__(cls, item):
return f"{cls.__name__}[{item.__name__}]"
class MyOtherList(MyList):
pass
assert MyList()[0] == 1
assert MyList[int] == "MyList[int]"
assert MyOtherList()[0] == 1
assert MyOtherList[int] == "MyOtherList[int]"
Note that this method is used as a fallback, so if a metaclass defines
``__getitem__``, then that will have the priority.
``__mro_entries__``
-------------------
If an object that is not a class object appears in the tuple of bases of
a class definition, then method ``__mro_entries__`` is searched on it.
If found, it is called with the original tuple of bases as an argument.
The result of the call must be a tuple, that is unpacked in the base classes
in place of this object. (If the tuple is empty, this means that the original
bases is simply discarded.) If there are more than one object with
``__mro_entries__``, then all of them are called with the same original tuple
of bases. This step happens first in the process of creation of a class,
all other steps, including checks for duplicate bases and MRO calculation,
happen normally with the updated bases.
Using the method API instead of just an attribute is necessary to avoid
inconsistent MRO errors, and perform other manipulations that are currently
done by ``GenericMeta.__new__``. The original bases are stored as
``__orig_bases__`` in the class namespace (currently this is also done by
the metaclass). For example::
class GenericAlias:
def __init__(self, origin, item):
self.origin = origin
self.item = item
def __mro_entries__(self, bases):
return (self.origin,)
class NewList:
def __class_getitem__(cls, item):
return GenericAlias(cls, item)
class Tokens(NewList[int]):
...
assert Tokens.__bases__ == (NewList,)
assert Tokens.__orig_bases__ == (NewList[int],)
assert Tokens.__mro__ == (Tokens, NewList, object)
Resolution using ``__mro_entries__`` happens *only* in bases of a class
definition statement. In all other situations where a class object is
expected, no such resolution will happen, this includes ``isinstance``
and ``issubclass`` built-in functions.
NOTE: These two method names are reserved for use by the ``typing`` module
and the generic types machinery, and any other use is discouraged.
The reference implementation (with tests) can be found in [4]_, and
the proposal was originally posted and discussed on the ``typing`` tracker,
see [5]_.
Dynamic class creation and ``types.resolve_bases``
--------------------------------------------------
``type.__new__`` will not perform any MRO entry resolution. So that a direct
call ``type('Tokens', (List[int],), {})`` will fail. This is done for
performance reasons and to minimize the number of implicit transformations.
Instead, a helper function ``resolve_bases`` will be added to
the ``types`` module to allow an explicit ``__mro_entries__`` resolution in
the context of dynamic class creation. Correspondingly, ``types.new_class``
will be updated to reflect the new class creation steps while maintaining
the backwards compatibility::
def new_class(name, bases=(), kwds=None, exec_body=None):
resolved_bases = resolve_bases(bases) # This step is added
meta, ns, kwds = prepare_class(name, resolved_bases, kwds)
if exec_body is not None:
exec_body(ns)
ns['__orig_bases__'] = bases # This step is added
return meta(name, resolved_bases, ns, **kwds)
Using ``__class_getitem__`` in C extensions
-------------------------------------------
As mentioned above, ``__class_getitem__`` is automatically a class method
if defined in Python code. To define this method in a C extension, one
should use flags ``METH_O|METH_CLASS``. For example, a simple way to make
an extension class generic is to use a method that simply returns the
original class objects, thus fully erasing the type information at runtime,
and deferring all check to static type checkers only::
typedef struct {
PyObject_HEAD
/* ... your code ... */
} SimpleGeneric;
static PyObject *
simple_class_getitem(PyObject *type, PyObject *item)
{
Py_INCREF(type);
return type;
}
static PyMethodDef simple_generic_methods[] = {
{"__class_getitem__", simple_class_getitem, METH_O|METH_CLASS, NULL},
/* ... other methods ... */
};
PyTypeObject SimpleGeneric_Type = {
PyVarObject_HEAD_INIT(NULL, 0)
"SimpleGeneric",
sizeof(SimpleGeneric),
0,
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
.tp_methods = simple_generic_methods,
};
Such class can be used as a normal generic in Python type annotations
(a corresponding stub file should be provided for static type checkers,
see :pep:`484` for details)::
from simple_extension import SimpleGeneric
from typing import TypeVar
T = TypeVar('T')
Alias = SimpleGeneric[str, T]
class SubClass(SimpleGeneric[T, int]):
...
data: Alias[int] # Works at runtime
more_data: SubClass[str] # Also works at runtime
Backwards compatibility and impact on users who don't use ``typing``
====================================================================
This proposal may break code that currently uses the names
``__class_getitem__`` and ``__mro_entries__``. (But the language
reference explicitly reserves *all* undocumented dunder names, and
allows "breakage without warning"; see [6]_.)
This proposal will support almost complete backwards compatibility with
the current public generic types API; moreover the ``typing`` module is still
provisional. The only two exceptions are that currently
``issubclass(List[int], List)`` returns True, while with this proposal it will
raise ``TypeError``, and ``repr()`` of unsubscripted user-defined generics
cannot be tweaked and will coincide with ``repr()`` of normal (non-generic)
classes.
With the reference implementation I measured negligible performance effects
(under 1% on a micro-benchmark) for regular (non-generic) classes. At the same
time performance of generics is significantly improved:
* ``importlib.reload(typing)`` is up to 7x faster
* Creation of user defined generic classes is up to 4x faster (on a
micro-benchmark with an empty body)
* Instantiation of generic classes is up to 5x faster (on a micro-benchmark
with an empty ``__init__``)
* Other operations with generic types and instances (like method lookup and
``isinstance()`` checks) are improved by around 10-20%
* The only aspect that gets slower with the current proof of concept
implementation is the subscripted generics cache look-up. However it was
already very efficient, so this aspect gives negligible overall impact.
References
==========
.. [1] Discussion following Mark Shannon's presentation at Language Summit
(https://github.com/python/typing/issues/432)
.. [2] Pull Request to implement shared generic ABC caches (merged)
(https://github.com/python/typing/pull/383)
.. [3] An old bug with setting/accessing attributes on generic types
(https://github.com/python/typing/issues/392)
.. [4] The reference implementation
(https://github.com/ilevkivskyi/cpython/pull/2/files,
https://github.com/ilevkivskyi/cpython/tree/new-typing)
.. [5] Original proposal
(https://github.com/python/typing/issues/468)
.. [6] Reserved classes of identifiers
(https://docs.python.org/3/reference/lexical_analysis.html#reserved-classes-of-identifiers)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: