PEP 586: Literal Types (#928)

Michael Lee 2019-03-14 17:29:08 -07:00 committed by Guido van Rossum
parent 375db992d5
commit b870ad2a45
1 changed files with 737 additions and 0 deletions

PEP: 586
Title: Literal Types
Author: Michael Lee <michael.lee.0x2a@gmail.com>, Ivan Levkivskyi <levkivskyi@gmail.com>, Jukka Lehtosalo <jukka.lehtosalo@iki.fi>
Discussions-To: Typing-Sig <typing-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 14-Mar-2019
Python-Version: 3.8
Post-History: 14-Mar-2019
Abstract
========
This PEP proposes adding *Literal types* to the PEP 484 ecosystem.
Literal types indicate that some expression has literally a
specific value. For example, the following function will accept
only expressions that have literally the value "4"::

    from typing import Literal

    def accepts_only_four(x: Literal[4]) -> None:
        pass

    accepts_only_four(4)   # Ok
    accepts_only_four(19)  # Rejected

Motivation and Rationale
========================
Python has many APIs that return different types depending on the
value of some argument provided. For example:

- ``open(filename, mode)`` returns either ``IO[bytes]`` or ``IO[Text]``
  depending on whether the second argument is something like ``r`` or
  ``rb``.
- ``subprocess.check_output(...)`` returns either bytes or text
  depending on whether the ``universal_newlines`` keyword argument is
  set to ``True`` or not.

This pattern is also fairly common in many popular 3rd party libraries.
Here are two examples, from pandas and numpy respectively:

- ``pandas.concat(...)`` will return either ``Series`` or
  ``DataFrame`` depending on whether the ``axis`` argument is set to
  0 or 1.
- ``numpy.unique`` will return either a single array or a tuple containing
  anywhere from two to four arrays depending on three boolean flag values.

The typing issue tracker contains some
`additional examples and discussion <typing-discussion_>`_.
There is currently no way of expressing the type signatures of these
functions: PEP 484 does not include any mechanism for writing signatures
where the return type varies depending on the value passed in.
Note that this problem persists even if we redesign these APIs to
instead accept enums: ``MyEnum.FOO`` and ``MyEnum.BAR`` are both
considered to be of type ``MyEnum``.
Currently, type checkers work around this limitation by adding ad-hoc
extensions for important builtins and standard library functions. For
example, mypy comes bundled with a plugin that attempts to infer more
precise types for ``open(...)``. While this approach works for standard
library functions, it's unsustainable in general: it's not reasonable to
expect 3rd party library authors to maintain plugins for N different
type checkers.
We propose adding *Literal types* to address these gaps.
Core Semantics
==============
This section outlines the baseline behavior of literal types.
Core behavior
-------------
Literal types indicate a variable has a specific and
concrete value. For example, if we define some variable ``foo`` to have
type ``Literal[3]``, we are declaring that ``foo`` must be exactly equal
to ``3`` and no other value.
Given some value ``v`` that is a member of type ``T``, the type
``Literal[v]`` shall be treated as a subtype of ``T``. For example,
``Literal[3]`` is a subtype of ``int``.
All methods from the parent type will be directly inherited by the
literal type. So, if we have some variable ``foo`` of type ``Literal[3]``,
it's safe to do things like ``foo + 5`` since ``foo`` inherits ``int``'s
``__add__`` method. The resulting type of ``foo + 5`` is ``int``.
This "inheriting" behavior is identical to how we
`handle NewTypes <newtypes_>`_.
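As a rough illustration of the inheritance rule above, the following sketch
shows arithmetic on a ``Literal[3]`` value falling back to the methods of
``int``. It assumes ``typing.Literal`` is importable (Python 3.8 or later);
the variable names are purely illustrative.

```python
from typing import Literal

foo: Literal[3] = 3

# 'foo' inherits int.__add__, so this is safe. A type checker following
# this PEP would infer plain 'int' for the result, not a Literal type.
result = foo + 5

print(type(result).__name__, result)
```

At runtime ``foo`` is just an ordinary ``int``; the ``Literal[3]`` annotation
only matters to the type checker.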
Equivalence of two Literals
---------------------------
Two types ``Literal[v1]`` and ``Literal[v2]`` are equivalent when
both of the following conditions are true:
1. ``type(v1) == type(v2)``
2. ``v1 == v2``
For example, ``Literal[20]`` and ``Literal[0x14]`` are equivalent.
However, ``Literal[0]`` and ``Literal[False]`` are *not* equivalent
even though ``0 == False`` evaluates to ``True`` at runtime: ``0``
has type ``int`` and ``False`` has type ``bool``.
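The two conditions can be sketched as a small predicate. The function name
``literals_equivalent`` is hypothetical and is not part of ``typing``; it only
mirrors the rule stated above.

```python
def literals_equivalent(v1: object, v2: object) -> bool:
    """Return True if Literal[v1] and Literal[v2] would be equivalent."""
    # Condition 1: both values have the same runtime type.
    # Condition 2: both values compare equal.
    return type(v1) == type(v2) and v1 == v2


print(literals_equivalent(20, 0x14))   # same type, same value
print(literals_equivalent(0, False))   # equal values, but int vs bool
```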
Shortening unions of literals
-----------------------------
Literals are parameterized with one or more values. When a Literal is
parameterized with more than one value, it's treated as exactly equivalent
to the union of those types. That is, ``Literal[v1, v2, v3]`` is equivalent
to ``Union[Literal[v1], Literal[v2], Literal[v3]]``.
This shortcut helps make writing signatures for functions that accept
many different literals more ergonomic — for example, functions like
``open(...)``::

    # Note: this is a simplification of the true type signature.
    _PathType = Union[str, bytes, int]

    @overload
    def open(path: _PathType,
             mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
             ) -> IO[Text]: ...
    @overload
    def open(path: _PathType,
             mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
             ) -> IO[bytes]: ...

    # Fallback overload for when the user isn't using literal types
    @overload
    def open(path: _PathType, mode: str) -> IO[Any]: ...

The provided values do not all have to be members of the same type.
For example, ``Literal[42, "foo", True]`` is a legal type.
However, Literal **must** be parameterized with at least one type.
Types like ``Literal[]`` or ``Literal`` are illegal.
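The shorthand can be made concrete with a small runtime sketch. The helper
``expand_literal`` is hypothetical; it assumes Python 3.8 or later, where
``typing.get_args`` is available, and simply rewrites a multi-value
``Literal`` as the union it stands for.

```python
from typing import Literal, Union, get_args


def expand_literal(literal_type):
    """Hypothetical helper: rewrite Literal[v1, v2, ...] as a Union."""
    singles = tuple(Literal[value] for value in get_args(literal_type))
    # Subscripting Union with a tuple is the same as Union[a, b, ...].
    return Union[singles]


Mode = Literal["r", "w", "a"]
expanded = expand_literal(Mode)
print(get_args(expanded))
```

A type checker is expected to treat ``Mode`` and ``expanded`` as the same
type, even though the two objects are distinct at runtime.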
Legal and illegal parameterizations
===================================
This section describes what exactly constitutes a legal ``Literal[...]`` type:
what values may and may not be used as parameters.
In short, a ``Literal[...]`` type may be parameterized by one or more literal
expressions, and nothing else.
Legal parameters for ``Literal`` at type check time
---------------------------------------------------
``Literal`` may be parameterized with literal ints, byte and unicode strings,
bools, Enum values and ``None``. So for example, all of
the following would be legal::

    Literal[26]
    Literal[0x1A]  # Exactly equivalent to Literal[26]
    Literal[-4]
    Literal["hello world"]
    Literal[b"hello world"]
    Literal[u"hello world"]
    Literal[True]
    Literal[Color.RED]  # Assuming Color is some enum
    Literal[None]

**Note:** Since the type ``None`` is inhabited by just a single
value, the types ``None`` and ``Literal[None]`` are exactly equivalent.
Type checkers may simplify ``Literal[None]`` into just ``None``.
``Literal`` may also be parameterized by other literal types, or type aliases
to other literal types. For example, the following is legal::

    ReadOnlyMode         = Literal["r", "r+"]
    WriteAndTruncateMode = Literal["w", "w+", "wt", "w+t"]
    WriteNoTruncateMode  = Literal["r+", "r+t"]
    AppendMode           = Literal["a", "a+", "at", "a+t"]

    AllModes = Literal[ReadOnlyMode, WriteAndTruncateMode,
                       WriteNoTruncateMode, AppendMode]

This feature is again intended to help make using and reusing literal types
more ergonomic.
**Note:** As a consequence of the above rules, type checkers are also expected
to support types that look like the following::

    Literal[Literal[Literal[1, 2, 3], "foo"], 5, None]

This should be exactly equivalent to the following type::

    Literal[1, 2, 3, "foo", 5, None]

...and also to the following type::

    Optional[Literal[1, 2, 3, "foo", 5]]

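A minimal sketch of the flattening rule, assuming Python 3.8 or later for
``get_args`` and ``get_origin``. The helper ``flatten_literal`` is
hypothetical; it walks a possibly nested ``Literal`` and collects the
underlying values in order.

```python
from typing import Literal, get_args, get_origin


def flatten_literal(literal_type):
    """Hypothetical helper: collect the values of a possibly nested Literal."""
    values = []
    for arg in get_args(literal_type):
        if get_origin(arg) is Literal:
            # A nested Literal contributes each of its own values.
            values.extend(flatten_literal(arg))
        else:
            values.append(arg)
    return tuple(values)


nested = Literal[Literal[Literal[1, 2, 3], "foo"], 5, None]
print(flatten_literal(nested))
```

The helper gives the same result whether or not the running version of
``typing`` already flattens nested parameters itself.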
**Note:** String literal types like ``Literal["foo"]`` should subtype either
bytes or unicode in the same way regular string literals do at runtime.
For example, in Python 3, the type ``Literal["foo"]`` is equivalent to
``Literal[u"foo"]``, since ``"foo"`` is equivalent to ``u"foo"`` in Python 3.
Similarly, in Python 2, the type ``Literal["foo"]`` is equivalent to
``Literal[b"foo"]`` -- unless the file includes a
``from __future__ import unicode_literals`` import, in which case it would be
equivalent to ``Literal[u"foo"]``.
Illegal parameters for ``Literal`` at type check time
-----------------------------------------------------
The following parameters are intentionally disallowed by design:

- Arbitrary expressions like ``Literal[3 + 4]`` or
  ``Literal["foo".replace("o", "b")]``.

  - Rationale: Literal types are meant to be a
    minimal extension to the PEP 484 typing ecosystem and requiring type
    checkers to interpret potentially arbitrary expressions inside types
    adds too much complexity. Also see `Rejected or out-of-scope ideas`_.

  - As a consequence, complex numbers like ``Literal[4 + 3j]`` and
    ``Literal[-4 + 2j]`` are also prohibited. For consistency, literals like
    ``Literal[4j]`` that contain just a single complex number are also
    prohibited.

  - The only exception to this rule is the unary ``-`` (minus) for ints: types
    like ``Literal[-5]`` are *accepted*.

- Tuples containing valid literal types like ``Literal[(1, "foo", "bar")]``.
  The user could always express this type as
  ``Tuple[Literal[1], Literal["foo"], Literal["bar"]]`` instead. Also,
  tuples are likely to be confused with the ``Literal[1, 2, 3]``
  shortcut.

- Mutable literal data structures like dict literals, list literals, or
  set literals: literals are always implicitly final and immutable. So,
  ``Literal[{"a": "b", "c": "d"}]`` is illegal.

- Any other types: for example, ``Literal[Path]``, or
  ``Literal[some_object_instance]`` are illegal. This includes typevars: if
  ``T`` is a typevar, ``Literal[T]`` is not allowed. Typevars can vary over
  only types, never over values.

The following are provisionally disallowed for simplicity. We can consider
allowing them on a case-by-case basis based on demand.

- Floats: e.g. ``Literal[3.14]``. Note: if we do decide to allow
  floats, we should likely disallow literal infinity and literal NaN.
- Any: e.g. ``Literal[Any]``. Note: the semantics of what exactly
  ``Literal[Any]`` means would need to be clarified first.

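The value-level part of these rules can be approximated with a short check.
This is only a sketch: ``is_legal_literal_value`` and ``Color`` are
hypothetical names, and a check on values cannot see how an expression was
written, so it cannot reject something like ``3 + 4``. That part of the rule
has to be enforced by the type checker on the source text.

```python
from enum import Enum


def is_legal_literal_value(value: object) -> bool:
    """Hypothetical check: could this value appear inside Literal[...]?"""
    if value is None:
        return True
    # bool is a subclass of int, so it is covered by the int check.
    # float, complex, tuple, list, dict and set values are all rejected.
    return isinstance(value, (int, str, bytes, Enum))


class Color(Enum):
    RED = 1


for candidate in (26, "hello world", Color.RED, 3.14, 4j, (1, "foo")):
    print(repr(candidate), is_legal_literal_value(candidate))
```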
Parameters at runtime
---------------------
Although the set of parameters ``Literal[...]`` may contain at type check time
is very small, the actual implementation of ``typing.Literal`` will not perform
any checks at runtime. For example::

    def my_function(x: Literal[1 + 2]) -> int:
        return x * 3

    x: Literal = 3
    y: Literal[my_function] = my_function

The type checker should reject this program: all three uses of
``Literal`` are *invalid* according to this spec. However, Python itself
should execute this program with no errors.
This is partly to help us preserve flexibility in case we want to expand the
scope of what ``Literal`` can be used for in the future, and partly because
it is not possible to detect all illegal parameters at runtime to begin with.
For example, it is impossible to distinguish between ``Literal[1 + 2]`` and
``Literal[3]`` at runtime.
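The last point is easy to observe directly. The sketch below assumes Python
3.8 or later: the expression ``1 + 2`` is evaluated before ``Literal`` ever
sees it, so the two types carry identical parameters. The names ``computed``,
``written`` and ``accepted`` are illustrative only.

```python
from typing import Literal, get_args

computed = Literal[1 + 2]   # rejected by a type checker, fine at runtime
written = Literal[3]

# Python evaluates '1 + 2' first, so both types hold the value 3.
print(get_args(computed), get_args(written))


def my_function() -> None:
    pass


# Also illegal for a type checker, but no runtime check rejects it.
accepted = Literal[my_function]
print(get_args(accepted) == (my_function,))
```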
Literals, enums, and forward references
---------------------------------------
One potential ambiguity is between literal strings and forward
references to literal enum members. For example, suppose we have the
type ``Literal["Color.RED"]``. Does this literal type
contain a string literal or a forward reference to some ``Color.RED``
enum member?
In cases like these, we always assume the user meant to construct a
literal string. If the user wants a forward reference, they must wrap
the entire literal type in a string -- e.g. ``"Literal[Color.RED]"``.
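A small runtime sketch of the distinction, assuming Python 3.8 or later.
``Color`` and ``paint`` are hypothetical names, and the namespace passed to
``get_type_hints`` is spelled out explicitly so the example does not depend
on where it is run.

```python
from enum import Enum
from typing import Literal, get_args, get_type_hints


# Forward reference: the *whole* annotation is a string, so Color
# does not need to exist yet when the function is defined.
def paint(shade: "Literal[Color.RED]") -> None:
    pass


class Color(Enum):
    RED = 1


# A string *inside* Literal is just a string value, never a reference.
string_literal = Literal["Color.RED"]
print(get_args(string_literal))

# Resolving the quoted annotation yields the enum member.
hints = get_type_hints(paint, globalns={"Literal": Literal, "Color": Color})
print(get_args(hints["shade"]))
```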
Literals, enums, and Any
------------------------
Another ambiguity is when the user attempts to use some expression that
is meant to be an enum but is actually of type ``Any``. For example,
suppose a user attempts to import an enum from a package with no type hints::

    from typing import Literal
    from lib_with_no_types import SomeEnum  # SomeEnum has type 'Any'!

    # x has type `Literal[Any]` due to the bad import
    x: Literal[SomeEnum.FOO]

Because ``Literal`` may not be parameterized by ``Any``, this program
is *illegal*: the type checker should report an error with the last line.
In short, while ``Any`` may effectively be used as a placeholder for any
arbitrary *type*, it is currently **not** allowed to serve as a placeholder
for any arbitrary *value*.
Type inference
==============
This section describes a few rules regarding type inference and
literals, along with some examples.
Backwards compatibility
-----------------------
When type checkers add support for Literal, it's important they do so
in a way that preserves backwards-compatibility. Code that used to
type check **must** continue to type check after support for Literal
is added.
This is particularly important when performing type inference. For
example, given the statement ``x = "blue"``, should the inferred
type of ``x`` be ``str`` or ``Literal["blue"]``?
This PEP does not require any particular strategy for cases like this,
apart from requiring that backwards compatibility is maintained.
For example, one simple strategy for meeting this requirement would be
to always assume expressions are *not* Literal types unless they are
explicitly annotated otherwise. A type checker using this strategy would
always infer that ``x`` is of type ``str`` in the above example.
If type checkers choose to use more sophisticated inference strategies,
they should avoid being overzealous while doing so.
For example, one strategy that does *not* work is always assuming expressions
are Literal types. This naive strategy would cause programs like the
following to start failing when they previously did not::

    # If a type checker infers 'var' has type Literal[3]
    # and my_list has type List[Literal[3]]...
    var = 3
    my_list = [var]

    # ...this call would be a type-error.
    my_list.append(4)

Another example of when this strategy would fail is when setting fields
in objects::

    class MyObject:
        def __init__(self) -> None:
            # If a type checker infers MyObject.field has type Literal[3]...
            self.field = 3

    m = MyObject()

    # ...this assignment would no longer type check
    m.field = 4

Using non-Literals in Literal contexts
--------------------------------------
Literal types follow the existing rules regarding subtyping with no additional
special-casing. For example, programs like the following are type safe::

    def expects_str(x: str) -> None: ...
    var: Literal["foo"] = "foo"

    # Legal: Literal["foo"] is a subtype of str
    expects_str(var)

This also means non-Literal expressions in general should not automatically
be inferred to be Literal. For example::

    def expects_literal(x: Literal["foo"]) -> None: ...

    def runner(my_str: str) -> None:
        # ILLEGAL: str is not a subtype of Literal["foo"]
        expects_literal(my_str)

**Note:** If the user wants their API to support accepting both literals
*and* the original type -- perhaps for legacy purposes -- they should
implement a fallback overload. See `Interactions with overloads`_.
Interactions with other types and features
==========================================
This section discusses how Literal types interact with other existing types.
Intelligent indexing of structured data
---------------------------------------
Literals can be used to "intelligently index" into structured types like
tuples, NamedTuple, and classes. (Note: this is not an exhaustive list).
For example, type checkers should infer the correct value type when
indexing into a tuple using an int key that corresponds to a valid index::

    a: Literal[0] = 0
    b: Literal[5] = 5

    some_tuple: Tuple[int, str, List[bool]] = (3, "abc", [True, False])
    reveal_type(some_tuple[a])   # Revealed type is 'int'
    some_tuple[b]                # Error: 5 is not a valid index into the tuple

We expect similar behavior when using functions like getattr::

    class Test:
        def __init__(self, param: int) -> None:
            self.myfield = param

        def mymethod(self, val: int) -> str: ...

    a: Literal["myfield"]  = "myfield"
    b: Literal["mymethod"] = "mymethod"
    c: Literal["blah"]     = "blah"

    t = Test(3)
    reveal_type(getattr(t, a))  # Revealed type is 'int'
    reveal_type(getattr(t, b))  # Revealed type is 'Callable[[int], str]'
    getattr(t, c)               # Error: No attribute named 'blah' in Test

Interactions with overloads
---------------------------
Literal types and overloads do not need to interact in a special
way: the existing rules work fine.
However, one important use case type checkers must take care to
support is the ability to use a *fallback* when the user is not using literal
types. For example, consider ``open``::

    _PathType = Union[str, bytes, int]

    @overload
    def open(path: _PathType,
             mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
             ) -> IO[Text]: ...
    @overload
    def open(path: _PathType,
             mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
             ) -> IO[bytes]: ...

    # Fallback overload for when the user isn't using literal types
    @overload
    def open(path: _PathType, mode: str) -> IO[Any]: ...

If we change the signature of ``open`` to use just the first two overloads,
we would break any code that does not pass in a literal string expression.
For example, code like this would be broken::

    mode: str = pick_file_mode(...)
    with open(path, mode) as f:
        # f should continue to be of type IO[Any] here
        ...

A little more broadly: we propose adding a policy to typeshed that
mandates that whenever we add literal types to some existing API, we also
always include a fallback overload to maintain backwards-compatibility.
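The same pattern applies to any API, not just ``open``. The sketch below
uses a hypothetical function ``load`` (not a real library function) to show
the two literal overloads plus the fallback in a form that can be run as-is
on Python 3.8 or later.

```python
from typing import Literal, Union, overload


@overload
def load(data: bytes, mode: Literal["text"]) -> str: ...
@overload
def load(data: bytes, mode: Literal["binary"]) -> bytes: ...
# Fallback overload: callers passing a plain 'str' still type check.
@overload
def load(data: bytes, mode: str) -> Union[str, bytes]: ...


def load(data, mode):
    if mode == "text":
        return data.decode("utf-8")
    if mode == "binary":
        return data
    raise ValueError(f"unsupported mode: {mode!r}")


print(load(b"abc", "text"))
print(load(b"abc", "binary"))
```

Without the third overload, a call such as ``load(payload, some_str_variable)``
would stop type checking once the literal overloads were introduced.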
Interactions with generics
--------------------------
Types like ``Literal[3]`` are meant to be just plain old subclasses of
``int``. This means you can use types like ``Literal[3]`` anywhere
you could use normal types, such as with generics.
This means that it is legal to parameterize generic functions or
classes using Literal types::

    A = TypeVar('A', bound=int)
    B = TypeVar('B', bound=int)
    C = TypeVar('C', bound=int)

    # A simplified definition for Matrix[row, column]
    class Matrix(Generic[A, B]):
        def __add__(self, other: Matrix[A, B]) -> Matrix[A, B]: ...
        def __matmul__(self, other: Matrix[B, C]) -> Matrix[A, C]: ...
        def transpose(self) -> Matrix[B, A]: ...

    foo: Matrix[Literal[2], Literal[3]] = Matrix(...)
    bar: Matrix[Literal[3], Literal[7]] = Matrix(...)

    baz = foo @ bar
    reveal_type(baz)  # Revealed type is 'Matrix[Literal[2], Literal[7]]'

Similarly, it is legal to construct TypeVars with value restrictions
or bounds involving Literal types::

    T = TypeVar('T', Literal["a"], Literal["b"], Literal["c"])
    S = TypeVar('S', bound=Literal["foo"])

...although it is unclear when it would ever be useful to construct a
TypeVar with a Literal upper bound. For example, the ``S`` TypeVar in
the above example is essentially pointless: we can get equivalent behavior
by using ``S = Literal["foo"]`` instead.
**Note:** Literal types and generics deliberately interact in only very
basic and limited ways. In particular, libraries that want to typecheck
code containing a heavy amount of numeric or numpy-style manipulation will
almost certainly find Literal types as proposed in this PEP to be
insufficient for their needs.
We considered several different proposals for fixing this, but ultimately
decided to defer the problem of integer generics to a later date. See
`Rejected or out-of-scope ideas`_ for more details.
Interactions with type narrowing
--------------------------------
Type checkers should be capable of performing exhaustiveness checks when
working with Literal types that have a closed number of variants, such as
enums. For example, the type checker should be capable of inferring that the
final ``else`` statement in the following function is unreachable::

    class Status(Enum):
        SUCCESS = 0
        INVALID_DATA = 1
        FATAL_ERROR = 2

    def parse_status(s: Status) -> None:
        if s is Status.SUCCESS:
            print("Success!")
        elif s is Status.INVALID_DATA:
            print("The given data is invalid because...")
        elif s is Status.FATAL_ERROR:
            print("Unexpected fatal error...")
        else:
            # Error should not be reported by type checkers that
            # ignore errors in unreachable blocks
            print("Nonsense" + 100)

This behavior is technically not new: it is
`already codified within PEP 484 <pep-484-enums_>`_. However, many type
checkers (such as mypy) do not yet implement it. Once Literal
types are introduced, it will become easier to do so: we can model
enums as being approximately equal to the union of their values and
take advantage of any existing logic regarding unions, exhaustiveness,
and type narrowing.
So here, ``Status`` could be treated as being approximately equal to
``Literal[Status.SUCCESS, Status.INVALID_DATA, Status.FATAL_ERROR]``
and the type of ``s`` narrowed accordingly.
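One way to lean on this narrowing is an explicit "unreachable" helper. This
is a common idiom rather than anything this PEP specifies; the names
``assert_unreachable`` and ``describe`` are hypothetical. The sketch assumes
the type checker narrows ``s`` as described, so that no variants remain in
the final branch.

```python
from enum import Enum
from typing import NoReturn


class Status(Enum):
    SUCCESS = 0
    INVALID_DATA = 1
    FATAL_ERROR = 2


def assert_unreachable(value: NoReturn) -> NoReturn:
    """Hypothetical helper: only callable when no variants remain."""
    raise AssertionError(f"unhandled value: {value!r}")


def describe(s: Status) -> str:
    if s is Status.SUCCESS:
        return "ok"
    elif s is Status.INVALID_DATA:
        return "invalid data"
    elif s is Status.FATAL_ERROR:
        return "fatal error"
    else:
        # If a fourth member were added to Status, a checker that narrows
        # enums would report an error on this call.
        assert_unreachable(s)


print([describe(member) for member in Status])
```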
Type checkers may optionally perform additional analysis and narrowing
beyond what is described above.
For example, it may be useful to perform narrowing based on things like
containment or equality checks::

    def parse_status(status: str) -> None:
        if status in ("MALFORMED", "ABORTED"):
            # Type checker could narrow 'status' to type
            # Literal["MALFORMED", "ABORTED"] here.
            return expects_bad_status(status)

        # Similarly, type checker could narrow 'status' to Literal["PENDING"]
        if status == "PENDING":
            expects_pending_status(status)

It may also be useful to perform narrowing taking into account expressions
involving Literal bools. For example, we can combine ``Literal[True]``,
``Literal[False]``, and overloads to construct "custom type guards"::

    @overload
    def is_int_like(x: Union[int, List[int]]) -> Literal[True]: ...
    @overload
    def is_int_like(x: Union[str, List[str]]) -> Literal[False]: ...
    def is_int_like(x): ...

    vector: List[int] = [1, 2, 3]
    if is_int_like(vector):
        vector.append(3)
    else:
        vector.append("bad")   # This branch is inferred to be unreachable

    scalar: Union[int, str]
    if is_int_like(scalar):
        scalar += 3      # Type checks: type of 'scalar' is narrowed to 'int'
    else:
        scalar += "foo"  # Type checks: type of 'scalar' is narrowed to 'str'

Rejected or out-of-scope ideas
==============================
This section outlines some potential features that are explicitly out-of-scope.
True dependent types/integer generics
-------------------------------------
This proposal is essentially describing adding a very simplified
dependent type system to the PEP 484 ecosystem. One obvious extension
is to implement a full-fledged dependent type system that lets users
predicate types based on their values in arbitrary ways. This would
let us write signatures like the below::

    # A vector has length 'n', containing elements of type 'T'
    class Vector(Generic[N, T]): ...

    # The type checker will statically verify our function genuinely does
    # construct a vector that is equal in length to "len(vec1) + len(vec2)"
    # and will throw an error if it does not.
    def concat(vec1: Vector[A, T], vec2: Vector[B, T]) -> Vector[A + B, T]:
        # ...snip...

At the very least, it would be useful to add some form of integer generics.
Although such a type system would certainly be useful, it's out-of-scope
for this PEP: it would require a far more substantial amount of implementation
work, discussion, and research to complete compared to the current proposal.
It's entirely possible we'll circle back and revisit this topic in the future:
we very likely will need some form of dependent typing along with other
extensions like variadic generics to support popular libraries like numpy.
This PEP should be seen as a stepping stone towards this goal,
rather than an attempt at providing a comprehensive solution.
Adding more concise syntax
--------------------------
One objection to this PEP is that having to explicitly write ``Literal[...]``
feels verbose. For example, instead of writing::

    def foobar(arg1: Literal[1], arg2: Literal[True]) -> None:
        pass

...it would be nice to instead write::

    def foobar(arg1: 1, arg2: True) -> None:
        pass

Unfortunately, these abbreviations simply will not work with the
existing implementation of ``typing`` at runtime. For example, the
following snippet crashes when run using Python 3.7::

    from typing import Tuple

    # Supposed to accept tuple containing the literals 1 and 2
    def foo(x: Tuple[1, 2]) -> None:
        pass

Running this yields the following exception::

    TypeError: Tuple[t0, t1, ...]: each t must be a type. Got 1.

We don't want users to have to memorize exactly when it's ok to elide
``Literal``, so we require ``Literal`` to always be present.
A little more broadly, we feel overhauling the syntax of types in
Python is not within the scope of this PEP: it would be best to have
that discussion in a separate PEP, instead of attaching it to this one.
So, this PEP deliberately does not try to innovate on Python's type syntax.
Backporting the ``Literal`` type
================================
Once this PEP is accepted, the ``Literal`` type will need to be backported for
Python versions that come bundled with older versions of the ``typing`` module.
We plan to do this by adding ``Literal`` to the ``typing_extensions`` 3rd party
module, which contains a variety of other backported types.
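For code that must run on several Python versions, a guarded import is one
possible approach. This sketch assumes the backport lands in
``typing_extensions`` as planned above; the function is the one from the
Abstract.

```python
try:
    # Available directly once the running Python ships typing.Literal.
    from typing import Literal
except ImportError:
    # Assumed location of the backport for older Python versions.
    from typing_extensions import Literal


def accepts_only_four(x: Literal[4]) -> None:
    pass


accepts_only_four(4)
```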
Implementation
==============
The mypy type checker has currently implemented a large subset of the behavior
described in this spec, with the exception of enum Literals and some of the
more complex narrowing interactions described above.
Related work
============
This proposal was written based on the discussion that took place in the
following threads:
- `Check that literals belong to/are excluded from a set of values <typing-discussion_>`_
- `Simple dependent types <mypy-discussion_>`_
- `Typing for multi-dimensional arrays <arrays-discussion_>`_
The overall design of this proposal also ended up converging into
something similar to how
`literal types are handled in TypeScript <typescript-literal-types_>`_.
.. _typing-discussion: https://github.com/python/typing/issues/478
.. _mypy-discussion: https://github.com/python/mypy/issues/3062
.. _arrays-discussion: https://github.com/python/typing/issues/513
.. _typescript-literal-types: https://www.typescriptlang.org/docs/handbook/advanced-types.html#string-literal_types
.. _typescript-index-types: https://www.typescriptlang.org/docs/handbook/advanced-types.html#index-types
.. _newtypes: https://www.python.org/dev/peps/pep-0484/#newtype-helper-function
.. _pep-484-enums: https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
Acknowledgements
================
Thanks to Mark Mendoza, Ran Benita, Rebecca Chen, and the other members of
typing-sig for their comments on this PEP.
Additional thanks to the various participants in the mypy and typing issue
trackers, who helped provide a lot of the motivation and reasoning behind
this PEP.
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: