PEP: 589
Title: TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys
Author: Jukka Lehtosalo <jukka.lehtosalo@iki.fi>
Sponsor: Guido van Rossum <guido@python.org>
BDFL-Delegate: Guido van Rossum <guido@python.org>
Discussions-To: typing-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Mar-2019
Python-Version: 3.8
Post-History:


Abstract
========

PEP 484 [#PEP-484]_ defines the type ``Dict[K, V]`` for uniform
dictionaries, where each value has the same type, and arbitrary key
values are supported. It doesn't properly support the common pattern
where the type of a dictionary value depends on the string value of
the key. This PEP proposes a type constructor ``typing.TypedDict`` to
support the use case where a dictionary object has a specific set of
string keys, each with a value of a specific type.

Here is an example where PEP 484 doesn't allow us to annotate
satisfactorily::

    movie = {'name': 'Blade Runner',
             'year': 1982}

This PEP proposes the addition of a new type constructor, called
``TypedDict``, to allow the type of ``movie`` to be represented
precisely::

    from typing import TypedDict

    class Movie(TypedDict):
        name: str
        year: int

Now a type checker should accept this code::

    movie: Movie = {'name': 'Blade Runner',
                    'year': 1982}


Motivation
==========

Representing an object or structured data using (potentially nested)
dictionaries with string keys (instead of a user-defined class) is a
common pattern in Python programs. Representing JSON objects is
perhaps the canonical use case, and this is popular enough that Python
ships with a JSON library. This PEP proposes a way to allow such code
to be type checked more effectively.

More generally, representing pure data objects using only Python
primitive types such as dictionaries, strings and lists has a
certain appeal. They are easy to serialize and deserialize even
when not using JSON. They trivially support various useful operations
with no extra effort, including pretty-printing (through ``str()`` and
the ``pprint`` module), iteration, and equality comparisons.

PEP 484 doesn't properly support the use cases mentioned above. Let's
consider a dictionary object that has exactly two valid string keys,
``'name'`` with value type ``str``, and ``'year'`` with value type
``int``. The PEP 484 type ``Dict[str, Any]`` would be suitable, but
it is too lenient, as arbitrary string keys can be used, and arbitrary
values are valid. Similarly, ``Dict[str, Union[str, int]]`` is too
general, as the value for key ``'name'`` could be an ``int``, and
arbitrary string keys are allowed. Also, the type of a subscription
expression such as ``d['name']`` (assuming ``d`` to be a dictionary of
this type) would be ``Union[str, int]``, which is too wide.

Dataclasses are a more recent alternative to solve this use case, but
there is still a lot of existing code that was written before
dataclasses became available, especially in large existing codebases
where type hinting and checking have proven to be helpful. Unlike
dictionary objects, dataclasses don't directly support JSON
serialization, though there is a third-party package that implements
it [#dataclasses-json]_.


Specification
=============

A TypedDict type represents dictionary objects with a specific set of
string keys, and with specific value types for each valid key. Each
string key can be either required (it must be present) or
non-required (it doesn't need to exist).

This PEP proposes two ways of defining TypedDict types. The first uses
a class-based syntax. The second is an alternative
assignment-based syntax that is provided for backwards compatibility,
to allow the feature to be backported to older Python versions. The
rationale is similar to why PEP 484 supports a comment-based
annotation syntax for Python 2.7: type hinting is particularly useful
for large existing codebases, and these often need to run on older
Python versions. The two syntax options parallel the syntax variants
supported by ``typing.NamedTuple``. Other proposed features include
TypedDict inheritance and totality (specifying whether keys are
required or not).

This PEP also provides a sketch of how a type checker is expected
to support type checking operations involving TypedDict objects.
Similar to PEP 484, this discussion is left somewhat vague on purpose,
to allow experimentation with a wide variety of different type
checking approaches. In particular, type compatibility should be
based on structural compatibility: a more specific TypedDict type can
be compatible with a smaller (more general) TypedDict type.


Class-based Syntax
------------------

A TypedDict type can be defined using the class definition syntax with
``typing.TypedDict`` as the sole base class::

    from typing import TypedDict

    class Movie(TypedDict):
        name: str
        year: int

``Movie`` is a TypedDict type with two items: ``'name'`` (with type
``str``) and ``'year'`` (with type ``int``).

A type checker should validate that the body of a class-based
TypedDict definition conforms to the following rules:

* The class body should only contain lines with item definitions of the
  form ``key: value_type``, optionally preceded by a docstring. The
  syntax for item definitions is identical to attribute annotations,
  but there must be no initializer, and the key name actually refers
  to the string value of the key instead of an attribute name.

* Type comments cannot be used with the class-based syntax, for
  consistency with the class-based ``NamedTuple`` syntax. (Note that
  it would not be sufficient to support type comments for backwards
  compatibility with Python 2.7, since the class definition may have a
  ``total`` keyword argument, as discussed below, and this isn't valid
  syntax in Python 2.7.) Instead, this PEP provides an alternative,
  assignment-based syntax for backwards compatibility, discussed in
  `Alternative Syntax`_.

* String literal forward references are valid in the value types.

* Methods are not allowed, since the runtime type of a TypedDict
  object will always be just ``dict`` (it is never a subclass of
  ``dict``).

* Specifying a metaclass is not allowed.

An empty TypedDict can be created by only including ``pass`` in the
body (if there is a docstring, ``pass`` can be omitted)::

    class EmptyDict(TypedDict):
        pass
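
As a rough illustration of the rules above (a sketch, not part of the
specification), the following definition has a docstring instead of
``pass``, uses a string literal forward reference, and contains no
initializers or methods. ``Franchise`` and its keys are hypothetical
names chosen for this example::

    from typing import TypedDict

    class Franchise(TypedDict):
        """A film franchise (the docstring replaces ``pass``)."""
        title: str
        # String literal forward reference to a type defined below.
        first_movie: 'Movie'

    class Movie(TypedDict):
        name: str
        year: int

    franchise: Franchise = {
        'title': 'Blade Runner',
        'first_movie': {'name': 'Blade Runner', 'year': 1982},
    }
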


Using TypedDict Types
---------------------

Here is an example of how the type ``Movie`` can be used::

    movie: Movie = {'name': 'Blade Runner',
                    'year': 1982}

An explicit ``Movie`` type annotation is generally needed, as
otherwise an ordinary dictionary type could be assumed by a type
checker, for backwards compatibility. When a type checker can infer
that a constructed dictionary object should be a TypedDict, an
explicit annotation can be omitted. A typical example is a dictionary
object as a function argument. In this example, a type checker is
expected to infer that the dictionary argument should be understood as
a TypedDict::

    def record_movie(movie: Movie) -> None: ...

    record_movie({'name': 'Blade Runner', 'year': 1982})

Another example where a type checker should treat a dictionary display
as a TypedDict is in an assignment to a variable with a previously
declared TypedDict type::

    movie: Movie
    ...
    movie = {'name': 'Blade Runner', 'year': 1982}

Operations on ``movie`` can be checked by a static type checker::

    movie['director'] = 'Ridley Scott'  # Error: invalid key 'director'
    movie['year'] = '1982'  # Error: invalid value type ("int" expected)

The code below should be rejected, since ``'title'`` is not a valid
key, and the ``'name'`` key is missing::

    movie2: Movie = {'title': 'Blade Runner',
                     'year': 1982}

The created TypedDict type object is not a real class object. Here
are the only uses of the type a type checker is expected to allow:

* It can be used in type annotations and in any context where an
  arbitrary type hint is valid, such as in type aliases and as the
  target type of a cast.

* It can be used as a callable object with keyword arguments
  corresponding to the TypedDict items. Non-keyword arguments are not
  allowed. Example::

      m = Movie(name='Blade Runner', year=1982)

  When called, the TypedDict type object returns an ordinary
  dictionary object at runtime::

      print(type(m))  # <class 'dict'>

* It can be used as a base class, but only when defining a derived
  TypedDict. This is discussed in more detail below.
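
The first two uses listed above can be sketched together as follows.
This assumes Python 3.8 or later (or the ``typing_extensions``
backport); ``MovieAlias`` and ``load_movie`` are hypothetical names
introduced only for this example::

    from typing import Any, TypedDict, cast

    class Movie(TypedDict):
        name: str
        year: int

    # Use in a type alias and as the target type of a cast.
    MovieAlias = Movie

    def load_movie(raw: Any) -> MovieAlias:
        # cast() does no runtime validation; it only informs the checker.
        return cast(Movie, raw)

    # Use as a callable with keyword arguments; the result is a plain dict.
    m = Movie(name='Blade Runner', year=1982)
    loaded = load_movie({'name': 'Blade Runner', 'year': 1982})
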

In particular, TypedDict type objects cannot be used in
``isinstance()`` tests such as ``isinstance(d, Movie)``. The reason is
that there is no existing support for checking types of dictionary
item values, since ``isinstance()`` does not work with many PEP 484
types, including common ones like ``List[str]``. This would be needed
for cases like this::

    class Strings(TypedDict):
        items: List[str]

    print(isinstance({'items': [1]}, Strings))    # Should be False
    print(isinstance({'items': ['x']}, Strings))  # Should be True

The above use case is not supported. This is consistent with how
``isinstance()`` is not supported for ``List[str]``.
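
For illustration, the runtime implementation in ``typing`` (Python 3.8
and later) rejects such tests by raising ``TypeError``. The sketch
below demonstrates this; ``supports_isinstance`` is a hypothetical
helper name, not part of any library::

    from typing import List, TypedDict

    class Strings(TypedDict):
        items: List[str]

    def supports_isinstance(tp: type) -> bool:
        """Return True if isinstance() accepts tp as its second argument."""
        try:
            isinstance({}, tp)
        except TypeError:
            return False
        return True

    print(supports_isinstance(dict))     # True
    print(supports_isinstance(Strings))  # False
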


Inheritance
-----------

It is possible for a TypedDict type to inherit from one or more
TypedDict types using the class-based syntax. In this case the
``TypedDict`` base class should not be included. Example::

    class BookBasedMovie(Movie):
        based_on: str

Now ``BookBasedMovie`` has keys ``name``, ``year``, and ``based_on``.
It is equivalent to this definition, since TypedDict types use
structural compatibility::

    class BookBasedMovie(TypedDict):
        name: str
        year: int
        based_on: str

Here is an example of multiple inheritance::

    class X(TypedDict):
        x: int

    class Y(TypedDict):
        y: str

    class XYZ(X, Y):
        z: bool

The TypedDict ``XYZ`` has three items: ``x`` (type ``int``), ``y``
(type ``str``), and ``z`` (type ``bool``).

A TypedDict cannot inherit from both a TypedDict type and a
non-TypedDict base class.
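
The merged set of items can also be observed at runtime. The sketch
below inspects ``__annotations__``, which is how the ``typing``
implementation happens to record items; a type checker does not depend
on this attribute::

    from typing import TypedDict

    class X(TypedDict):
        x: int

    class Y(TypedDict):
        y: str

    class XYZ(X, Y):
        z: bool

    # The derived type records the items of both bases plus its own.
    print(sorted(XYZ.__annotations__))  # ['x', 'y', 'z']

    xyz: XYZ = {'x': 1, 'y': 'a', 'z': True}
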


Totality
--------

By default, all keys must be present in a TypedDict. It is possible
to override this by specifying *totality*. Here is how to do this
using the class-based syntax::

    class Movie(TypedDict, total=False):
        name: str
        year: int

This means that a ``Movie`` TypedDict can have any of the keys omitted. Thus
these are valid::

    m: Movie = {}
    m2: Movie = {'year': 2015}

A type checker is only expected to support a literal ``False`` or
``True`` as the value of the ``total`` argument. ``True`` is the
default, and makes all items defined in the class body be required.

The totality flag only applies to items defined in the body of the
TypedDict definition. Inherited items won't be affected, and instead
use totality of the TypedDict type where they were defined. This makes
it possible to have a combination of required and non-required keys in
a single TypedDict type.
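
A minimal sketch of such a combination, assuming the hypothetical
names ``MovieBase`` and ``DatedMovie``: the required key is declared
in a total base type, and the non-required key in a derived type
defined with ``total=False``::

    from typing import TypedDict

    class MovieBase(TypedDict):
        name: str   # required: defined in a body with the default total=True

    class DatedMovie(MovieBase, total=False):
        year: int   # non-required: defined in a total=False body

    ok1: DatedMovie = {'name': 'Blade Runner'}
    ok2: DatedMovie = {'name': 'Blade Runner', 'year': 1982}
    # bad: DatedMovie = {'year': 1982}  # a checker should report missing 'name'

    # The runtime classes expose the flag of their own body as __total__.
    print(MovieBase.__total__, DatedMovie.__total__)  # True False
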


Alternative Syntax
------------------

This PEP also proposes an alternative syntax that can be backported to
older Python versions such as 3.5 and 2.7 that don't support the
variable definition syntax introduced in PEP 526 [#PEP-526]_. It
resembles the traditional syntax for defining named tuples::

    Movie = TypedDict('Movie', {'name': str, 'year': int})

It is also possible to specify totality using the alternative syntax::

    Movie = TypedDict('Movie',
                      {'name': str, 'year': int},
                      total=False)

The semantics are equivalent to the class-based syntax. This syntax
doesn't support inheritance, however, and there is no way to
have both required and non-required fields in a single type. The
motivation for this is keeping the backwards compatible syntax as
simple as possible while covering the most common use cases.

A type checker is only expected to accept a dictionary display expression
as the second argument to ``TypedDict``. In particular, a variable that
refers to a dictionary object does not need to be supported, to simplify
implementation.
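
As a quick check of the claimed equivalence, the sketch below defines
the same type with both syntaxes and compares what the ``typing``
runtime records for each. ``MovieA`` and ``MovieB`` are hypothetical
names::

    from typing import TypedDict

    class MovieA(TypedDict, total=False):
        name: str
        year: int

    MovieB = TypedDict('MovieB', {'name': str, 'year': int}, total=False)

    same_items = MovieA.__annotations__ == MovieB.__annotations__
    same_totality = MovieA.__total__ == MovieB.__total__
    print(same_items, same_totality)  # True True

    a: MovieA = {'name': 'Alien'}
    b: MovieB = a  # accepted: the two types are structurally identical
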


Type Consistency
----------------

Informally speaking, *type consistency* is a generalization of the
is-subtype-of relation to support the ``Any`` type. It is defined
more formally in PEP 483 [#PEP-483]_. This section introduces the
new, non-trivial rules needed to support type consistency for
TypedDict types.

First, any TypedDict type is consistent with ``Mapping[str, object]``.
Second, a TypedDict type ``A`` is consistent with TypedDict ``B`` if
``A`` is structurally compatible with ``B``. This is true if and only
if both of these conditions are satisfied:

* For each key in ``B``, ``A`` has the corresponding key and the
  corresponding value type in ``A`` is consistent with the value type
  in ``B``. For each key in ``B``, the value type in ``B`` is also
  consistent with the corresponding value type in ``A``.

* For each required key in ``B``, the corresponding key is required
  in ``A``. For each non-required key in ``B``, the corresponding key
  is not required in ``A``.
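
The two conditions can be sketched as a small function. This is not
how any particular type checker is implemented: ``TDInfo`` is a
hypothetical description of a TypedDict type, and plain equality of
value types stands in for mutual consistency, so ``Any`` and subtyping
are deliberately not modeled::

    from typing import Any, Dict, NamedTuple, Set

    class TDInfo(NamedTuple):
        """Hypothetical checker-side description of a TypedDict type."""
        value_types: Dict[str, Any]  # key -> declared value type
        required: Set[str]           # keys that must be present

    def is_consistent(a: TDInfo, b: TDInfo) -> bool:
        """Return True if TypedDict a is consistent with TypedDict b."""
        for key, b_type in b.value_types.items():
            # Condition 1: a has the key, with a mutually consistent type.
            if key not in a.value_types or a.value_types[key] != b_type:
                return False
            # Condition 2: the key is required in both or in neither.
            if (key in a.required) != (key in b.required):
                return False
        return True

    movie = TDInfo({'name': str, 'year': int}, {'name', 'year'})
    book_movie = TDInfo({'name': str, 'year': int, 'based_on': str},
                        {'name', 'year', 'based_on'})
    partial_movie = TDInfo({'name': str, 'year': int}, set())

    print(is_consistent(book_movie, movie))     # True
    print(is_consistent(movie, book_movie))     # False
    print(is_consistent(movie, partial_movie))  # False
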

Discussion:

* Value types behave invariantly, since TypedDict objects are mutable.
  This is similar to mutable container types such as ``List`` and
  ``Dict``. Example where this is relevant::

      class A(TypedDict):
          x: Optional[int]

      class B(TypedDict):
          x: int

      def f(a: A) -> None:
          a['x'] = None

      b: B = {'x': 0}
      f(b)  # Type check error: 'B' not compatible with 'A'
      b['x'] + 1  # Runtime error: None + 1

* A TypedDict type with a required key is not consistent with a
  TypedDict type where the same key is a non-required key, since the
  latter allows keys to be deleted. Example where this is relevant::

      class A(TypedDict, total=False):
          x: int

      class B(TypedDict):
          x: int

      def f(a: A) -> None:
          del a['x']

      b: B = {'x': 0}
      f(b)  # Type check error: 'B' not compatible with 'A'
      b['x'] + 1  # Runtime KeyError: 'x'

* A TypedDict type ``A`` with no key ``'x'`` is not consistent with a
  TypedDict type with a non-required key ``'x'``, since at runtime
  the key ``'x'`` could be present and have an incompatible type
  (which may not be visible through ``A`` due to structural subtyping).
  Example::

      class A(TypedDict, total=False):
          x: int
          y: int

      class B(TypedDict, total=False):
          x: int

      class C(TypedDict, total=False):
          x: int
          y: str

      def f(a: A) -> None:
          a['y'] = 1

      def g(b: B) -> None:
          f(b)  # Type check error: 'B' incompatible with 'A'

      c: C = {'x': 0, 'y': 'foo'}
      g(c)
      c['y'] + 'bar'  # Runtime error: int + str

* A TypedDict isn't consistent with any ``Dict[...]`` type, since
  dictionary types allow destructive operations, including
  ``clear()``. They also allow arbitrary keys to be set, which
  would compromise type safety. Example::

      class A(TypedDict):
          x: int

      class B(A):
          y: str

      def f(d: Dict[str, int]) -> None:
          d['y'] = 0

      def g(a: A) -> None:
          f(a)  # Type check error: 'A' incompatible with Dict[str, int]

      b: B = {'x': 0, 'y': 'foo'}
      g(b)
      b['y'] + 'bar'  # Runtime error: int + str

* A TypedDict with all ``int`` values is not consistent with
  ``Mapping[str, int]``, since there may be additional non-``int``
  values not visible through the type, due to structural subtyping.
  These can be accessed using the ``values()`` and ``items()``
  methods in ``Mapping``, for example. Example::

      class A(TypedDict):
          x: int

      class B(TypedDict):
          x: int
          y: str

      def sum_values(m: Mapping[str, int]) -> int:
          n = 0
          for v in m.values():
              n += v  # Runtime error
          return n

      def f(a: A) -> None:
          sum_values(a)  # Error: 'A' incompatible with Mapping[str, int]

      b: B = {'x': 0, 'y': 'foo'}
      f(b)
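
By contrast, the first rule of this section makes the following safe:
any TypedDict can be passed where ``Mapping[str, object]`` is expected,
because that type is read-only and promises nothing about the values.
``describe`` is a hypothetical function used only for illustration::

    from typing import Mapping, TypedDict

    class Movie(TypedDict):
        name: str
        year: int

    def describe(m: Mapping[str, object]) -> str:
        # Only read-only access is available, and each value is ``object``.
        return ', '.join(f'{key}={value!r}' for key, value in m.items())

    movie: Movie = {'name': 'Blade Runner', 'year': 1982}
    print(describe(movie))  # name='Blade Runner', year=1982
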


Supported and Unsupported Operations
------------------------------------

Type checkers should support restricted forms of most ``dict``
operations on TypedDict objects. The guiding principle is that
operations not involving ``Any`` types should be rejected by type
checkers if they may violate runtime type safety. Here are some of
the most important type safety violations to prevent:

1. A required key is missing.

2. A value has an invalid type.

3. A key that is not defined in the TypedDict type is added.

A key that is not a literal should generally be rejected, since its
value is unknown during type checking, and thus can cause some of the
above violations. (`Use of Final Values and Literal Types`_
generalizes this to cover final names and literal types.)

The use of a key that is not known to exist should be reported as an
error, even if this wouldn't necessarily generate a runtime type
error. These are often mistakes, and these may insert values with an
invalid type if structural subtyping hides the types of certain items.
For example, ``d['x'] = 1`` should generate a type check error if
``'x'`` is not a valid key for ``d`` (which is assumed to be a
TypedDict type).

Extra keys included in TypedDict object construction should also be
caught. In this example, the ``director`` key is not defined in
``Movie`` and is expected to generate an error from a type checker::

    m: Movie = dict(
        name='Alien',
        year=1979,
        director='Ridley Scott')  # error: Unexpected key 'director'

Type checkers should reject the following operations on TypedDict
objects as unsafe, even though they are valid for normal dictionaries:

* Operations with arbitrary ``str`` keys (instead of string literals
  or other expressions with known string values) should generally be
  rejected. This involves both destructive operations such as setting
  an item and read-only operations such as subscription expressions.

  As an exception to the above rule, ``d.get(e)`` and ``e in d``
  should be allowed for TypedDict objects, for an arbitrary expression
  ``e`` with type ``str``. The motivation is that these are safe and
  can be useful for introspecting TypedDict objects. The static type
  of ``d.get(e)`` should be ``object`` if the string value of ``e``
  cannot be determined statically.

* ``clear()`` is not safe since it could remove required keys, some of
  which may not be directly visible because of structural
  subtyping. ``popitem()`` is similarly unsafe, even if all known
  keys are not required (``total=False``).

* ``del obj['key']`` should be rejected unless ``'key'`` is a
  non-required key.

Type checkers may allow reading an item using ``d['x']`` even if
the key ``'x'`` is not required, instead of requiring the use of
``d.get('x')`` or an explicit ``'x' in d`` check. The rationale is
that tracking the existence of keys is difficult to implement in full
generality, and that disallowing this could require many changes to
existing code.

The exact type checking rules are up to each type checker to decide.
In some cases potentially unsafe operations may be accepted if the
alternative is to generate false positive errors for idiomatic code.
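
The following sketch gathers several of these rules in one place. It
runs as ordinary Python; the comments describe what a type checker
following this section is expected to report, and actual checkers may
differ in detail. ``lookup`` is a hypothetical helper::

    from typing import TypedDict

    class Movie(TypedDict, total=False):
        name: str
        year: int

    def lookup(movie: Movie, key: str) -> object:
        # ``key in movie`` and ``movie.get(key)`` are allowed for any
        # ``str``; the static type of the result is ``object``.
        if key in movie:
            return movie.get(key)
        return None

    m: Movie = {'name': 'Alien', 'year': 1979}
    print(lookup(m, 'year'))      # 1979
    print(lookup(m, 'director'))  # None

    del m['year']  # accepted: 'year' is a non-required key
    # m.clear()         # should be rejected: may remove hidden required keys
    # m['rating'] = 8   # should be rejected: 'rating' is not a valid key
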


Use of Final Values and Literal Types
-------------------------------------

Type checkers should allow final names (PEP 591 [#PEP-591]_) with
string values to be used instead of string literals in operations on
TypedDict objects. For example, this is valid::

    YEAR: Final = 'year'

    m: Movie = {'name': 'Alien', 'year': 1979}
    years_since_epoch = m[YEAR] - 1970

Similarly, an expression with a suitable literal type
(PEP 586 [#PEP-586]_) can be used instead of a literal value::

    def get_value(movie: Movie,
                  key: Literal['year', 'name']) -> Union[int, str]:
        return movie[key]

Type checkers are only expected to support actual string literals, not
final names or literal types, for specifying keys in a TypedDict type
definition. Also, only a boolean literal can be used to specify
totality in a TypedDict definition. The motivation for this is to
make type declarations self-contained, and to simplify the
implementation of type checkers.


Backwards Compatibility
=======================

To retain backwards compatibility, type checkers should not infer a
TypedDict type unless it is sufficiently clear that this is desired by
the programmer. When unsure, an ordinary dictionary type should be
inferred. Otherwise existing code that type checks without errors may
start generating errors once TypedDict support is added to the type
checker, since TypedDict types are more restrictive than dictionary
types. In particular, they aren't subtypes of dictionary types.


Reference Implementation
========================

The mypy [#mypy]_ type checker supports TypedDict types. A reference
implementation of the runtime component is provided in the
``typing_extensions`` [#typing_extensions]_ module. The original
implementation was in the ``mypy_extensions`` [#mypy_extensions]_
module.


Rejected Alternatives
=====================

Several proposed ideas were rejected. The current set of features
seems to cover a lot of ground, and it was not clear which of the
proposed extensions would be more than marginally useful. This PEP
defines a baseline feature that can be potentially extended later.

These are rejected on principle, as incompatible with the spirit of
this proposal:

* TypedDict isn't extensible, and it addresses only a specific use
  case. TypedDict objects are regular dictionaries at runtime, and
  TypedDict cannot be used with other dictionary-like or mapping-like
  classes, including subclasses of ``dict``. There is no way to add
  methods to TypedDict types. The motivation here is simplicity.

* TypedDict type definitions could plausibly be used to perform runtime
  type checking of dictionaries. For example, they could be used to
  validate that a JSON object conforms to the schema specified by a
  TypedDict type. This PEP doesn't include such functionality, since
  the focus of this proposal is static type checking only, and other
  existing types do not support this, as discussed in `Using TypedDict
  Types`_. Such functionality can be provided by a third-party
  library using the ``typing_inspect`` [#typing_inspect]_ third-party
  module, for example.

* TypedDict types can't be used in ``isinstance()`` or ``issubclass()``
  checks. The reasoning is similar to why runtime type checks aren't
  supported in general with many type hints.
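
To give a feel for what such a third-party validator involves, here is
a deliberately small sketch. It is not part of this proposal, and it
uses the standard library's ``typing.get_type_hints`` rather than
``typing_inspect``. It assumes that all value types are plain classes
such as ``str`` and ``int``, and it treats every key of a
``total=False`` type as non-required, ignoring inheritance.
``conforms`` is a hypothetical name::

    from typing import Any, TypedDict, get_type_hints

    class Movie(TypedDict):
        name: str
        year: int

    def conforms(obj: Any, td: Any) -> bool:
        """Check a dict against a TypedDict with plain-class value types."""
        if not isinstance(obj, dict):
            return False
        hints = get_type_hints(td)
        if any(key not in hints for key in obj):
            return False  # unexpected key
        for key, value_type in hints.items():
            if key not in obj:
                if td.__total__:
                    return False  # missing required key
                continue
            if not isinstance(obj[key], value_type):
                return False  # value has the wrong type
        return True

    print(conforms({'name': 'Alien', 'year': 1979}, Movie))    # True
    print(conforms({'name': 'Alien', 'year': '1979'}, Movie))  # False
    print(conforms({'title': 'Alien', 'year': 1979}, Movie))   # False
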

These features were left out from this PEP, but they are potential
extensions to be added in the future:

* TypedDict doesn't support providing a *default value type* for keys
  that are not explicitly defined. This would allow arbitrary keys to
  be used with a TypedDict object, and only explicitly enumerated keys
  would receive special treatment compared to a normal, uniform
  dictionary type.

* There is no way to individually specify whether each key is required
  or not. No proposed syntax was clear enough, and we expect that
  there is limited need for this.

* TypedDict can't be used for specifying the type of a ``**kwargs``
  argument. This would allow restricting the allowed keyword
  arguments and their types. According to PEP 484, using a TypedDict
  type as the type of ``**kwargs`` means that the TypedDict is valid
  as the *value* of arbitrary keyword arguments, but it doesn't
  restrict which keyword arguments should be allowed. The syntax
  ``**kwargs: Expand[T]`` has been proposed for this [#expand]_.


Acknowledgements
================

David Foster contributed the initial implementation of TypedDict types
to mypy. Improvements to the implementation have been contributed by
at least the author (Jukka Lehtosalo), Ivan Levkivskyi, Gareth T,
Michael Lee, Dominik Miedzinski, Roy Williams and Max Moroz.


References
==========

.. [#PEP-484] PEP 484, Type Hints, van Rossum, Lehtosalo, Langa
   (http://www.python.org/dev/peps/pep-0484)

.. [#dataclasses-json] Dataclasses JSON
   (https://github.com/lidatong/dataclasses-json)

.. [#PEP-526] PEP 526, Syntax for Variable Annotations, Gonzalez,
   House, Levkivskyi, Roach, van Rossum
   (http://www.python.org/dev/peps/pep-0526)

.. [#PEP-483] PEP 483, The Theory of Type Hints, van Rossum, Levkivskyi
   (http://www.python.org/dev/peps/pep-0483)

.. [#PEP-591] PEP 591, Adding a final qualifier to typing, Sullivan,
   Levkivskyi (http://www.python.org/dev/peps/pep-0591)

.. [#PEP-586] PEP 586, Literal Types, Lee, Levkivskyi, Lehtosalo
   (http://www.python.org/dev/peps/pep-0586)

.. [#mypy] http://www.mypy-lang.org/

.. [#typing_extensions]
   https://github.com/python/typing/tree/master/typing_extensions

.. [#mypy_extensions] https://github.com/python/mypy_extensions

.. [#typing_inspect] https://github.com/ilevkivskyi/typing_inspect

.. [#expand] https://github.com/python/mypy/issues/4441


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: