diff --git a/pep-0586.rst b/pep-0586.rst
new file mode 100644
index 000000000..f5aedcc60
--- /dev/null
+++ b/pep-0586.rst
@@ -0,0 +1,737 @@
+PEP: 586
+Title: Literal Types
+Author: Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo
+Discussions-To: Typing-Sig
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 14-Mar-2018
+Python-Version: 3.8
+Post-History: 14-Mar-2018
+
+Abstract
+========
+
+This PEP proposes adding *Literal types* to the PEP 484 ecosystem.
+Literal types indicate that some expression has literally a
+specific value. For example, the following function will accept
+only expressions that have literally the value ``4``::
+
+    from typing import Literal
+
+    def accepts_only_four(x: Literal[4]) -> None:
+        pass
+
+    accepts_only_four(4)   # Ok
+    accepts_only_four(19)  # Rejected
+
+Motivation and Rationale
+========================
+
+Python has many APIs that return different types depending on the
+value of some argument provided. For example:
+
+- ``open(filename, mode)`` returns either ``IO[bytes]`` or ``IO[Text]``
+  depending on whether the second argument is something like ``r`` or
+  ``rb``.
+- ``subprocess.check_output(...)`` returns either bytes or text
+  depending on whether the ``universal_newlines`` keyword argument is
+  set to ``True`` or not.
+
+This pattern is also fairly common in many popular 3rd party libraries.
+Here are two examples, from pandas and numpy respectively:
+
+- ``pandas.concat(...)`` will return either ``Series`` or
+  ``DataFrame`` depending on whether the ``axis`` argument is set to
+  0 or 1.
+
+- ``numpy.unique`` will return either a single array or a tuple containing
+  anywhere from two to four arrays depending on three boolean flag values.
+
+The typing issue tracker contains some
+`additional examples and discussion <typing-discussion_>`_.
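As a small runtime illustration of the ``open`` bullet above, the sketch
below reads the same scratch file twice and gets a different result type
depending only on the *value* of ``mode``. The helper name
``read_with_mode`` and the scratch file are assumptions made for this
example; they are not part of the proposal.

```python
import os
import tempfile


def read_with_mode(path: str, mode: str):
    """Read a whole file; the result type depends on the value of `mode`."""
    with open(path, mode) as f:
        return f.read()


# Create a small scratch file to read back (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    scratch_path = tmp.name

text_result = read_with_mode(scratch_path, "r")    # text mode yields str
bytes_result = read_with_mode(scratch_path, "rb")  # binary mode yields bytes
os.remove(scratch_path)

print(type(text_result).__name__, type(bytes_result).__name__)
```

Without literal types, the only honest return annotation for a helper like
this is a union or ``Any``, even though each call site knows which mode it
passes.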
+
+There is currently no way of expressing the type signatures of these
+functions: PEP 484 does not include any mechanism for writing signatures
+where the return type varies depending on the value passed in.
+Note that this problem persists even if we redesign these APIs to
+instead accept enums: ``MyEnum.FOO`` and ``MyEnum.BAR`` are both
+considered to be of type ``MyEnum``.
+
+Currently, type checkers work around this limitation by adding ad-hoc
+extensions for important builtins and standard library functions. For
+example, mypy comes bundled with a plugin that attempts to infer more
+precise types for ``open(...)``. While this approach works for standard
+library functions, it’s unsustainable in general: it’s not reasonable to
+expect 3rd party library authors to maintain plugins for N different
+type checkers.
+
+We propose adding *Literal types* to address these gaps.
+
+Core Semantics
+==============
+
+This section outlines the baseline behavior of literal types.
+
+Core behavior
+-------------
+
+Literal types indicate a variable has a specific and
+concrete value. For example, if we define some variable ``foo`` to have
+type ``Literal[3]``, we are declaring that ``foo`` must be exactly equal
+to ``3`` and no other value.
+
+Given some value ``v`` that is a member of type ``T``, the type
+``Literal[v]`` shall be treated as a subtype of ``T``. For example,
+``Literal[3]`` is a subtype of ``int``.
+
+All methods from the parent type will be directly inherited by the
+literal type. So, if we have some variable ``foo`` of type ``Literal[3]``
+it’s safe to do things like ``foo + 5`` since ``foo`` inherits int’s
+``__add__`` method. The resulting type of ``foo + 5`` is ``int``.
+
+This "inheriting" behavior is identical to how we
+`handle NewTypes <newtypes_>`_.
+
+Equivalence of two Literals
+---------------------------
+
+Two types ``Literal[v1]`` and ``Literal[v2]`` are equivalent when
+both of the following conditions are true:
+
+1. ``type(v1) == type(v2)``
+2. ``v1 == v2``
+
+For example, ``Literal[20]`` and ``Literal[0x14]`` are equivalent.
+However, ``Literal[0]`` and ``Literal[False]`` are *not* equivalent
+even though ``0 == False`` evaluates to ``True`` at runtime: ``0``
+has type ``int`` and ``False`` has type ``bool``.
+
+Shortening unions of literals
+-----------------------------
+
+Literals are parameterized with one or more values. When a Literal is
+parameterized with more than one value, it's treated as exactly equivalent
+to the union of those types. That is, ``Literal[v1, v2, v3]`` is equivalent
+to ``Union[Literal[v1], Literal[v2], Literal[v3]]``.
+
+This shortcut helps make writing signatures for functions that accept
+many different literals more ergonomic -- for example, functions like
+``open(...)``::
+
+    # Note: this is a simplification of the true type signature.
+    _PathType = Union[str, bytes, int]
+
+    @overload
+    def open(path: _PathType,
+             mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
+             ) -> IO[Text]: ...
+    @overload
+    def open(path: _PathType,
+             mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
+             ) -> IO[bytes]: ...
+
+    # Fallback overload for when the user isn't using literal types
+    @overload
+    def open(path: _PathType, mode: str) -> IO[Any]: ...
+
+The provided values do not all have to be members of the same type.
+For example, ``Literal[42, "foo", True]`` is a legal type.
+
+However, Literal **must** be parameterized with at least one value.
+Types like ``Literal[]`` or ``Literal`` are illegal.
+
+
+Legal and illegal parameterizations
+===================================
+
+This section describes what exactly constitutes a legal ``Literal[...]`` type:
+what values may and may not be used as parameters.
+
+In short, a ``Literal[...]`` type may be parameterized by one or more literal
+expressions, and nothing else.
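The equivalence rule and the multi-value shorthand described earlier can
be sketched at runtime. This is a minimal sketch, assuming Python 3.8 or
later, where ``typing.Literal`` and ``typing.get_args`` are available.
``Mode`` and ``literal_contains`` are hypothetical names introduced for
this example; they are not part of ``typing`` or of this proposal.

```python
from typing import Any, Literal, get_args

# Hypothetical alias using the multi-value shorthand.
Mode = Literal["r", "w", "a"]


def literal_contains(literal_type: Any, value: object) -> bool:
    """Hypothetical helper: True if `value` matches one of the literal's
    parameters under the equivalence rule (same type AND equal value)."""
    return any(
        type(param) is type(value) and param == value
        for param in get_args(literal_type)
    )


# The parameters of a Literal are plain values, retrievable at runtime.
print(get_args(Mode))                       # ('r', 'w', 'a')

# 20 and 0x14 are the same int value, so the two types compare equal.
print(Literal[20] == Literal[0x14])         # True

# 0 == False at runtime, but int and bool differ, so this does not match.
print(literal_contains(Literal[False], 0))  # False
```

A type checker applies the same two conditions statically; the helper
only mirrors them so the rule can be tried out interactively.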
+
+
+Legal parameters for ``Literal`` at type check time
+---------------------------------------------------
+
+``Literal`` may be parameterized with literal ints, byte and unicode strings,
+bools, Enum values and ``None``. So for example, all of
+the following would be legal::
+
+    Literal[26]
+    Literal[0x1A]  # Exactly equivalent to Literal[26]
+    Literal[-4]
+    Literal["hello world"]
+    Literal[b"hello world"]
+    Literal[u"hello world"]
+    Literal[True]
+    Literal[Color.RED]  # Assuming Color is some enum
+    Literal[None]
+
+**Note:** Since the type ``None`` is inhabited by just a single
+value, the types ``None`` and ``Literal[None]`` are exactly equivalent.
+Type checkers may simplify ``Literal[None]`` into just ``None``.
+
+``Literal`` may also be parameterized by other literal types, or type aliases
+to other literal types. For example, the following is legal::
+
+    ReadOnlyMode         = Literal["r", "r+"]
+    WriteAndTruncateMode = Literal["w", "w+", "wt", "w+t"]
+    WriteNoTruncateMode  = Literal["r+", "r+t"]
+    AppendMode           = Literal["a", "a+", "at", "a+t"]
+
+    AllModes = Literal[ReadOnlyMode, WriteAndTruncateMode,
+                       WriteNoTruncateMode, AppendMode]
+
+This feature is again intended to help make using and reusing literal types
+more ergonomic.
+
+**Note:** As a consequence of the above rules, type checkers are also expected
+to support types that look like the following::
+
+    Literal[Literal[Literal[1, 2, 3], "foo"], 5, None]
+
+This should be exactly equivalent to the following type::
+
+    Literal[1, 2, 3, "foo", 5, None]
+
+...and also to the following type::
+
+    Optional[Literal[1, 2, 3, "foo", 5]]
+
+**Note:** String literal types like ``Literal["foo"]`` should subtype either
+bytes or unicode in the same way regular string literals do at runtime.
+
+For example, in Python 3, the type ``Literal["foo"]`` is equivalent to
+``Literal[u"foo"]``, since ``"foo"`` is equivalent to ``u"foo"`` in Python 3.
+
+Similarly, in Python 2, the type ``Literal["foo"]`` is equivalent to
+``Literal[b"foo"]`` -- unless the file includes a
+``from __future__ import unicode_literals`` import, in which case it would be
+equivalent to ``Literal[u"foo"]``.
+
+Illegal parameters for ``Literal`` at type check time
+-----------------------------------------------------
+
+The following parameters are disallowed by design:
+
+- Arbitrary expressions like ``Literal[3 + 4]`` or
+  ``Literal["foo".replace("o", "b")]``.
+
+  - Rationale: Literal types are meant to be a
+    minimal extension to the PEP 484 typing ecosystem and requiring type
+    checkers to interpret potentially arbitrary expressions inside types adds
+    too much complexity. Also see `Rejected or out-of-scope ideas`_.
+
+  - As a consequence, complex numbers like ``Literal[4 + 3j]`` and
+    ``Literal[-4 + 2j]`` are also prohibited. For consistency, literals like
+    ``Literal[4j]`` that contain just a single complex number are also
+    prohibited.
+
+  - The only exception to this rule is the unary ``-`` (minus) for ints: types
+    like ``Literal[-5]`` are *accepted*.
+
+- Tuples containing valid literal types like ``Literal[(1, "foo", "bar")]``.
+  The user could always express this type as
+  ``Tuple[Literal[1], Literal["foo"], Literal["bar"]]`` instead. Also,
+  tuples are likely to be confused with the ``Literal[1, 2, 3]``
+  shortcut.
+
+- Mutable literal data structures like dict literals, list literals, or
+  set literals: literals are always implicitly final and immutable. So,
+  ``Literal[{"a": "b", "c": "d"}]`` is illegal.
+
+- Any other types: for example, ``Literal[Path]``, or
+  ``Literal[some_object_instance]`` are illegal. This includes typevars: if
+  ``T`` is a typevar, ``Literal[T]`` is not allowed. Typevars can vary over
+  only types, never over values.
+
+The following are provisionally disallowed for simplicity. We can consider
+allowing them on a case-by-case basis based on demand.
+
+- Floats: e.g. ``Literal[3.14]``.
+  Note: if we do decide to allow floats, we should likely disallow literal
+  infinity and literal NaN.
+
+- Any: e.g. ``Literal[Any]``. Note: the semantics of what exactly
+  ``Literal[Any]`` means would need to be clarified first.
+
+Parameters at runtime
+---------------------
+
+Although the set of parameters ``Literal[...]`` may contain at type check time
+is very small, the actual implementation of ``typing.Literal`` will not perform
+any checks at runtime. For example::
+
+    def my_function(x: Literal[1 + 2]) -> int:
+        return x * 3
+
+    x: Literal = 3
+    y: Literal[my_function] = my_function
+
+The type checker should reject this program: all three uses of
+``Literal`` are *invalid* according to this spec. However, Python itself
+should execute this program with no errors.
+
+This is partly to help us preserve flexibility in case we want to expand the
+scope of what ``Literal`` can be used for in the future, and partly because
+it is not possible to detect all illegal parameters at runtime to begin with.
+For example, it is impossible to distinguish between ``Literal[1 + 2]`` and
+``Literal[3]`` at runtime.
+
+Literals, enums, and forward references
+---------------------------------------
+
+One potential ambiguity is between literal strings and forward
+references to literal enum members. For example, suppose we have the
+type ``Literal["Color.RED"]``. Does this literal type
+contain a string literal or a forward reference to some ``Color.RED``
+enum member?
+
+In cases like these, we always assume the user meant to construct a
+literal string. If the user wants a forward reference, they must wrap
+the entire literal type in a string -- e.g. ``"Literal[Color.RED]"``.
+
+Literals, enums, and Any
+------------------------
+
+Another ambiguity is when the user attempts to use some expression that
+is meant to be an enum but is actually of type ``Any``.
+For example,
+suppose a user attempts to import an enum from a package with no type hints::
+
+    from typing import Literal
+    from lib_with_no_types import SomeEnum  # SomeEnum has type 'Any'!
+
+    # x has type `Literal[Any]` due to the bad import
+    x: Literal[SomeEnum.FOO]
+
+Because ``Literal`` may not be parameterized by ``Any``, this program
+is *illegal*: the type checker should report an error on the last line.
+
+In short, while ``Any`` may effectively be used as a placeholder for any
+arbitrary *type*, it is currently **not** allowed to serve as a placeholder
+for any arbitrary *value*.
+
+Type inference
+==============
+
+This section describes a few rules regarding type inference and
+literals, along with some examples.
+
+Backwards compatibility
+-----------------------
+
+When type checkers add support for Literal, it's important they do so
+in a way that preserves backwards-compatibility. Code that used to
+type check **must** continue to type check after support for Literal
+is added.
+
+This is particularly important when performing type inference. For
+example, given the statement ``x = "blue"``, should the inferred
+type of ``x`` be ``str`` or ``Literal["blue"]``?
+
+This PEP does not require any particular strategy for cases like this,
+apart from requiring that backwards compatibility is maintained.
+
+For example, one simple strategy for meeting this requirement would be
+to always assume expressions are *not* Literal types unless they are
+explicitly annotated otherwise. A type checker using this strategy would
+always infer that ``x`` is of type ``str`` in the above example.
+
+If type checkers choose to use more sophisticated inference strategies,
+they should avoid being overzealous while doing so.
+
+For example, one strategy that does *not* work is always assuming expressions
+are Literal types.
+This naive strategy would cause programs like the
+following to start failing when they previously did not::
+
+    # If a type checker infers 'var' has type Literal[3]
+    # and my_list has type List[Literal[3]]...
+    var = 3
+    my_list = [var]
+
+    # ...this call would be a type-error.
+    my_list.append(4)
+
+Another example of when this strategy would fail is when setting fields
+in objects::
+
+    class MyObject:
+        def __init__(self) -> None:
+            # If a type checker infers MyObject.field has type Literal[3]...
+            self.field = 3
+
+    m = MyObject()
+
+    # ...this assignment would no longer type check
+    m.field = 4
+
+Using non-Literals in Literal contexts
+--------------------------------------
+
+Literal types follow the existing rules regarding subtyping with no additional
+special-casing. For example, programs like the following are type safe::
+
+    def expects_str(x: str) -> None: ...
+    var: Literal["foo"] = "foo"
+
+    # Legal: Literal["foo"] is a subtype of str
+    expects_str(var)
+
+This also means non-Literal expressions in general should not automatically
+be inferred to be Literal. For example::
+
+    def expects_literal(x: Literal["foo"]) -> None: ...
+
+    def runner(my_str: str) -> None:
+        # ILLEGAL: str is not a subtype of Literal["foo"]
+        expects_literal(my_str)
+
+**Note:** If the user wants their API to support accepting both literals
+*and* the original type -- perhaps for legacy purposes -- they should
+implement a fallback overload. See `Interactions with overloads`_.
+
+Interactions with other types and features
+==========================================
+
+This section discusses how Literal types interact with other existing types.
+
+Intelligent indexing of structured data
+---------------------------------------
+
+Literals can be used to "intelligently index" into structured types like
+tuples, NamedTuple, and classes. (Note: this is not an exhaustive list.)
+
+For example, type checkers should infer the correct value type when
+indexing into a tuple using an int key that corresponds to a valid index::
+
+    a: Literal[0] = 0
+    b: Literal[5] = 5
+
+    some_tuple: Tuple[int, str, List[bool]] = (3, "abc", [True, False])
+    reveal_type(some_tuple[a])   # Revealed type is 'int'
+    some_tuple[b]                # Error: 5 is not a valid index into the tuple
+
+We expect similar behavior when using functions like getattr::
+
+    class Test:
+        def __init__(self, param: int) -> None:
+            self.myfield = param
+
+        def mymethod(self, val: int) -> str: ...
+
+    a: Literal["myfield"]  = "myfield"
+    b: Literal["mymethod"] = "mymethod"
+    c: Literal["blah"]     = "blah"
+
+    t = Test(3)
+    reveal_type(getattr(t, a))  # Revealed type is 'int'
+    reveal_type(getattr(t, b))  # Revealed type is 'Callable[[int], str]'
+    getattr(t, c)               # Error: No attribute named 'blah' in Test
+
+Interactions with overloads
+---------------------------
+
+Literal types and overloads do not need to interact in a special
+way: the existing rules work fine.
+
+However, one important use case type checkers must take care to
+support is the ability to use a *fallback* when the user is not using literal
+types. For example, consider ``open``::
+
+    _PathType = Union[str, bytes, int]
+
+    @overload
+    def open(path: _PathType,
+             mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
+             ) -> IO[Text]: ...
+    @overload
+    def open(path: _PathType,
+             mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
+             ) -> IO[bytes]: ...
+
+    # Fallback overload for when the user isn't using literal types
+    @overload
+    def open(path: _PathType, mode: str) -> IO[Any]: ...
+
+If we change the signature of ``open`` to use just the first two overloads,
+we would break any code that does not pass in a literal string expression.
+For example, code like this would be broken::
+
+    mode: str = pick_file_mode(...)
+    with open(path, mode) as f:
+        # f should continue to be of type IO[Any] here
+
+A little more broadly: we propose adding a policy to typeshed that
+mandates that whenever we add literal types to some existing API, we
+always include a fallback overload to maintain backwards-compatibility.
+
+Interactions with generics
+--------------------------
+
+Types like ``Literal[3]`` are meant to be just plain old subtypes of
+``int``. This means you can use types like ``Literal[3]`` anywhere
+you could use normal types, such as with generics.
+
+It is therefore legal to parameterize generic functions or
+classes using Literal types::
+
+    A = TypeVar('A', bound=int)
+    B = TypeVar('B', bound=int)
+    C = TypeVar('C', bound=int)
+
+    # A simplified definition for Matrix[row, column]
+    class Matrix(Generic[A, B]):
+        def __add__(self, other: Matrix[A, B]) -> Matrix[A, B]: ...
+        def __matmul__(self, other: Matrix[B, C]) -> Matrix[A, C]: ...
+        def transpose(self) -> Matrix[B, A]: ...
+
+    foo: Matrix[Literal[2], Literal[3]] = Matrix(...)
+    bar: Matrix[Literal[3], Literal[7]] = Matrix(...)
+
+    baz = foo @ bar
+    reveal_type(baz)  # Revealed type is 'Matrix[Literal[2], Literal[7]]'
+
+Similarly, it is legal to construct TypeVars with value restrictions
+or bounds involving Literal types::
+
+    T = TypeVar('T', Literal["a"], Literal["b"], Literal["c"])
+    S = TypeVar('S', bound=Literal["foo"])
+
+...although it is unclear when it would ever be useful to construct a
+TypeVar with a Literal upper bound. For example, the ``S`` TypeVar in
+the above example is essentially pointless: we can get equivalent behavior
+by using ``S = Literal["foo"]`` instead.
+
+**Note:** Literal types and generics deliberately interact in only very
+basic and limited ways. In particular, libraries that want to typecheck
+code containing a heavy amount of numeric or numpy-style manipulation will
+almost certainly find Literal types as proposed in this PEP to be
+insufficient for their needs.
+
+We considered several different proposals for fixing this, but ultimately
+decided to defer the problem of integer generics to a later date. See
+`Rejected or out-of-scope ideas`_ for more details.
+
+Interactions with type narrowing
+--------------------------------
+
+Type checkers should be capable of performing exhaustiveness checks when
+working with Literal types that have a closed number of variants, such as
+enums. For example, the type checker should be capable of inferring that the
+final ``else`` block in the following function is unreachable::
+
+    class Status(Enum):
+        SUCCESS = 0
+        INVALID_DATA = 1
+        FATAL_ERROR = 2
+
+    def parse_status(s: Status) -> None:
+        if s is Status.SUCCESS:
+            print("Success!")
+        elif s is Status.INVALID_DATA:
+            print("The given data is invalid because...")
+        elif s is Status.FATAL_ERROR:
+            print("Unexpected fatal error...")
+        else:
+            # Error should not be reported by type checkers that
+            # ignore errors in unreachable blocks
+            print("Nonsense" + 100)
+
+This behavior is technically not new: it is
+`already codified within PEP 484 <pep-484-enums_>`_. However, many type
+checkers (such as mypy) do not yet implement this behavior. Once Literal
+types are introduced, it will become easier to do so: we can model
+enums as being approximately equal to the union of their values and
+take advantage of any existing logic regarding unions, exhaustiveness,
+and type narrowing.
+
+So here, ``Status`` could be treated as being approximately equal to
+``Literal[Status.SUCCESS, Status.INVALID_DATA, Status.FATAL_ERROR]``
+and the type of ``s`` narrowed accordingly.
+
+Type checkers may optionally perform additional analysis and narrowing
+beyond what is described above.
+
+For example, it may be useful to perform narrowing based on things like
+containment or equality checks::
+
+    def parse_status(status: str) -> None:
+        if status in ("MALFORMED", "ABORTED"):
+            # Type checker could narrow 'status' to type
+            # Literal["MALFORMED", "ABORTED"] here.
+            return expects_bad_status(status)
+
+        # Similarly, type checker could narrow 'status' to Literal["PENDING"]
+        if status == "PENDING":
+            expects_pending_status(status)
+
+It may also be useful to perform narrowing taking into account expressions
+involving Literal bools. For example, we can combine ``Literal[True]``,
+``Literal[False]``, and overloads to construct "custom type guards"::
+
+    @overload
+    def is_int_like(x: Union[int, List[int]]) -> Literal[True]: ...
+    @overload
+    def is_int_like(x: Union[str, List[str]]) -> Literal[False]: ...
+    def is_int_like(x): ...
+
+    vector: List[int] = [1, 2, 3]
+    if is_int_like(vector):
+        vector.append(3)
+    else:
+        vector.append("bad")   # This branch is inferred to be unreachable
+
+    scalar: Union[int, str]
+    if is_int_like(scalar):
+        scalar += 3      # Type checks: type of 'scalar' is narrowed to 'int'
+    else:
+        scalar += "foo"  # Type checks: type of 'scalar' is narrowed to 'str'
+
+
+
+Rejected or out-of-scope ideas
+==============================
+
+This section outlines some potential features that are explicitly out-of-scope.
+
+True dependent types/integer generics
+-------------------------------------
+
+This proposal is essentially describing adding a very simplified
+dependent type system to the PEP 484 ecosystem. One obvious extension
+is to implement a full-fledged dependent type system that lets users
+predicate types based on their values in arbitrary ways. This would
+let us write signatures like the below::
+
+    # A vector has length 'n', containing elements of type 'T'
+    class Vector(Generic[N, T]): ...
+
+    # The type checker will statically verify our function genuinely does
+    # construct a vector that is equal in length to "len(vec1) + len(vec2)"
+    # and will throw an error if it does not.
+    def concat(vec1: Vector[A, T], vec2: Vector[B, T]) -> Vector[A + B, T]:
+        # ...snip...
+
+At the very least, it would be useful to add some form of integer generics.
+
+Although such a type system would certainly be useful, it’s out-of-scope
+for this PEP: it would require a far more substantial amount of implementation
+work, discussion, and research to complete compared to the current proposal.
+
+It's entirely possible we'll circle back and revisit this topic in the future:
+we very likely will need some form of dependent typing along with other
+extensions like variadic generics to support popular libraries like numpy.
+
+This PEP should be seen as a stepping stone towards this goal,
+rather than an attempt at providing a comprehensive solution.
+
+Adding more concise syntax
+--------------------------
+
+One objection to this PEP is that having to explicitly write ``Literal[...]``
+feels verbose. For example, instead of writing::
+
+    def foobar(arg1: Literal[1], arg2: Literal[True]) -> None:
+        pass
+
+...it would be nice to instead write::
+
+    def foobar(arg1: 1, arg2: True) -> None:
+        pass
+
+Unfortunately, these abbreviations simply will not work with the
+existing implementation of ``typing`` at runtime. For example, the
+following snippet crashes when run using Python 3.7::
+
+    from typing import Tuple
+
+    # Supposed to accept tuple containing the literals 1 and 2
+    def foo(x: Tuple[1, 2]) -> None:
+        pass
+
+Running this yields the following exception::
+
+    TypeError: Tuple[t0, t1, ...]: each t must be a type. Got 1.
+
+We don’t want users to have to memorize exactly when it’s ok to elide
+``Literal``, so we require ``Literal`` to always be present.
+
+A little more broadly, we feel overhauling the syntax of types in
+Python is not within the scope of this PEP: it would be best to have
+that discussion in a separate PEP, instead of attaching it to this one.
+So, this PEP deliberately does not try to innovate on Python's type syntax.
+
+Backporting the ``Literal`` type
+================================
+
+Once this PEP is accepted, the ``Literal`` type will need to be backported for
+Python versions that come bundled with older versions of the ``typing`` module.
+We plan to do this by adding ``Literal`` to the ``typing_extensions`` 3rd party
+module, which contains a variety of other backported types.
+
+Implementation
+==============
+
+The mypy type checker currently implements a large subset of the behavior
+described in this spec, with the exception of enum Literals and some of the
+more complex narrowing interactions described above.
+
+
+Related work
+============
+
+This proposal was written based on the discussion that took place in the
+following threads:
+
+- `Check that literals belong to/are excluded from a set of values <typing-discussion_>`_
+
+- `Simple dependent types <mypy-discussion_>`_
+
+- `Typing for multi-dimensional arrays <arrays-discussion_>`_
+
+The overall design of this proposal also ended up converging on
+something similar to how
+`literal types are handled in TypeScript <typescript-literal-types_>`_.
+
+.. _typing-discussion: https://github.com/python/typing/issues/478
+
+.. _mypy-discussion: https://github.com/python/mypy/issues/3062
+
+.. _arrays-discussion: https://github.com/python/typing/issues/513
+
+.. _typescript-literal-types: https://www.typescriptlang.org/docs/handbook/advanced-types.html#string-literal_types
+
+.. _typescript-index-types: https://www.typescriptlang.org/docs/handbook/advanced-types.html#index-types
+
+.. _newtypes: https://www.python.org/dev/peps/pep-0484/#newtype-helper-function
+
+.. _pep-484-enums: https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
+
+
+Acknowledgements
+================
+
+Thanks to Mark Mendoza, Ran Benita, Rebecca Chen, and the other members of
+typing-sig for their comments on this PEP.
+
+Additional thanks to the various participants in the mypy and typing issue
+trackers, who helped provide a lot of the motivation and reasoning behind
+this PEP.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
+