diff --git a/pep-0675.rst b/pep-0675.rst index 81e28889a..0b341d760 100644 --- a/pep-0675.rst +++ b/pep-0675.rst @@ -18,10 +18,11 @@ Abstract There is currently no way to specify that a function parameter can be of any literal string type; we have to specify the precise literal string, such as ``Literal["foo"]``. This PEP introduces a supertype of -literal string types: ``Literal[str]``. This allows a function to -accept arbitrary literal string types such as ``Literal["foo"]`` or +literal string types: ``LiteralString``. This allows a function to +accept arbitrary literal string types, such as ``Literal["foo"]`` or ``Literal["bar"]``. + Motivation ========== @@ -106,23 +107,25 @@ We want to forbid harmful execution of user-controlled data while still allowing benign idioms like the above and not requiring extra user work. -To meet this goal, we introduce the ``Literal[str]`` type, which only +To meet this goal, we introduce the ``LiteralString`` type, which only accepts string values that are known to be made of literals. This is a generalization of the ``Literal["foo"]`` type from :pep:`586`. A string of type -``Literal[str]`` cannot contain user-controlled data. Thus, any API -that only accepts ``Literal[str]`` will be immune to injection +``LiteralString`` cannot contain user-controlled data. Thus, any API +that only accepts ``LiteralString`` will be immune to injection vulnerabilities (with `pragmatic limitations `_). Since we want the ``sqlite3`` ``execute`` method to disallow strings built with user input, we would make its `typeshed stub `_ -accept a ``sql`` query that is of type ``Literal[str]``: +accept a ``sql`` query that is of type ``LiteralString``: :: - def execute(self, sql: Literal[str], parameters: Iterable[str] = ...) -> Cursor: ... + from typing import LiteralString + + def execute(self, sql: LiteralString, parameters: Iterable[str] = ...) -> Cursor: ... This successfully forbids our unsafe SQL example. 
The variable @@ -135,7 +138,7 @@ from a format string using ``user_id``, and cannot be passed to def query_user(conn: Connection, user_id: str) -> User: query = f"SELECT * FROM data WHERE user_id = {user_id}" conn.execute(query) - # Error: Expected Literal[str], got str. + # Error: Expected LiteralString, got str. The method remains flexible enough to allow our more complicated example: @@ -153,7 +156,7 @@ example: """ if limit: - # Still has type Literal[str] because we added a literal string. + # Still has type LiteralString because we added a literal string. query += " LIMIT 1" conn.execute(query, (user_id,)) # OK @@ -162,7 +165,7 @@ Notice that the user did not have to change their SQL code at all. The type checker was able to infer the literal string type and complain only in case of violations. -``Literal[str]`` is also useful in other cases where we want strict +``LiteralString`` is also useful in other cases where we want strict command-data separation, such as when building shell commands or when rendering a string into an HTML response without escaping (see `Appendix A: Other Uses`_). Overall, this combination of strictness @@ -228,109 +231,100 @@ string? We want to specify that the value must be of some type ``Literal[<...>]`` where ``<...>`` is some string. This is what -``Literal[str]`` represents. ``Literal[str]`` is the "supertype" of +``LiteralString`` represents. ``LiteralString`` is the "supertype" of all literal string types. In effect, this PEP just introduces a type in the type hierarchy between ``Literal["foo"]`` and ``str``. Any -particular literal string such as ``Literal["foo"]`` or -``Literal["bar"]`` is compatible with ``Literal[str]``, but not the -other way around. The "supertype" of ``Literal[str]`` itself is -``str``. So, ``Literal[str]`` is compatible with ``str``, but not the +particular literal string, such as ``Literal["foo"]`` or +``Literal["bar"]``, is compatible with ``LiteralString``, but not the +other way around. 
The "supertype" of ``LiteralString`` itself is +``str``. So, ``LiteralString`` is compatible with ``str``, but not the other way around. Note that a ``Union`` of literal types is naturally compatible with -``Literal[str]`` because each element of the ``Union`` is individually -compatible with ``Literal[str]``. So, ``Literal["foo", "bar"]`` is -compatible with ``Literal[str]``. +``LiteralString`` because each element of the ``Union`` is individually +compatible with ``LiteralString``. So, ``Literal["foo", "bar"]`` is +compatible with ``LiteralString``. However, recall that we don't just want to represent exact literal queries. We also want to support composition of two literal strings, such as ``query + " LIMIT 1"``. This too is possible with the above -concept. If ``x`` and ``y`` are two values of type ``Literal[str]``, +concept. If ``x`` and ``y`` are two values of type ``LiteralString``, then ``x + y`` will also be of type compatible with -``Literal[str]``. We can reason about this by looking at specific +``LiteralString``. We can reason about this by looking at specific instances such as ``Literal["foo"]`` and ``Literal["bar"]``; the value of the added string ``x + y`` can only be ``"foobar"``, which has type ``Literal["foobar"]`` and is thus compatible with -``Literal[str]``. The same reasoning applies when ``x`` and ``y`` are +``LiteralString``. The same reasoning applies when ``x`` and ``y`` are unions of literal types; the result of pairwise adding any two literal types from ``x`` and ``y`` respectively is a literal type, which means that the overall result is a ``Union`` of literal types and is thus -compatible with ``Literal[str]``. +compatible with ``LiteralString``. In this way, we are able to leverage Python's concept of a ``Literal`` string type to specify that our API can only accept strings that are known to be constructed from literals. More specific details follow in the remaining sections. 
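The type hierarchy and the composition rule described above can be sketched as follows. This is a minimal illustration, not part of the PEP's specification: ``run_query`` and the variable names are hypothetical, and ``LiteralString`` is imported only under ``TYPE_CHECKING`` (with quoted annotations) on the assumption that the snippet may run on an interpreter whose ``typing`` module does not yet provide it.

```python
from typing import TYPE_CHECKING, Literal

if TYPE_CHECKING:
    # Only the type checker needs this name; the quoted annotations below
    # are never evaluated at runtime.
    from typing import LiteralString


def run_query(sql: "LiteralString") -> str:
    """Hypothetical sensitive API that accepts only literal-derived strings."""
    return sql


foo: Literal["foo"] = "foo"
bar: Literal["bar"] = "bar"

# Literal["foo"] is compatible with LiteralString ...
as_literal_string: "LiteralString" = foo
# ... and LiteralString is compatible with str, but not the other way around.
as_str: str = as_literal_string

# Adding two literal strings gives a value that is still compatible
# with LiteralString, so this call is accepted by a type checker.
combined = run_query(foo + bar)

user_text: str = "1; DROP TABLE data"
# run_query(user_text)  # a type checker is expected to reject this: str is not LiteralString
```

At runtime every value here is an ordinary ``str``; the distinction exists only for the type checker.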
-Valid Locations for ``Literal[str]`` +Valid Locations for ``LiteralString`` ========================================= -``Literal[str]`` can be used where any other type can be used: +``LiteralString`` can be used where any other type can be used: :: - variable_annotation: Literal[str] + variable_annotation: LiteralString - def my_function(literal_string: Literal[str]) -> Literal[str]: ... + def my_function(literal_string: LiteralString) -> LiteralString: ... class Foo: - my_attribute: Literal[str] + my_attribute: LiteralString - type_argument: List[Literal[str]] + type_argument: List[LiteralString] - T = TypeVar("T", bound=Literal[str]) + T = TypeVar("T", bound=LiteralString) -It can be nested within unions of ``Literal`` types: +It cannot be nested within unions of ``Literal`` types: :: - union: Literal["hello", Literal[str]] - union2: Literal["hello", str] - union3: Literal[str, 4] + bad_union: Literal["hello", LiteralString] # Not OK + bad_nesting: Literal[LiteralString] # Not OK - nested_literal_string: Literal[Literal[str]] - - -The restrictions on the parameters of ``Literal`` are the same as in -:pep:`586`. The only legal -parameter is the literal value ``str``. Other values are rejected even -if they evaluate to the same value (``str``), such as -``Literal[(lambda x: x)(str)]``. Type Inference ============== -.. _inferring_literal_str: +.. _inferring_literal_string: -Inferring ``Literal[str]`` --------------------------- +Inferring ``LiteralString`` +--------------------------- -Any literal string type is compatible with ``Literal[str]``. For -example, ``x: Literal[str] = "foo"`` is valid because ``"foo"`` is +Any literal string type is compatible with ``LiteralString``. For +example, ``x: LiteralString = "foo"`` is valid because ``"foo"`` is inferred to be of type ``Literal["foo"]``. 
-As per the `Rationale`_, we also infer ``Literal[str]`` in the +As per the `Rationale`_, we also infer ``LiteralString`` in the following cases: -+ Addition: ``x + y`` is of type ``Literal[str]`` if both ``x`` and - ``y`` are compatible with ``Literal[str]``. ++ Addition: ``x + y`` is of type ``LiteralString`` if both ``x`` and + ``y`` are compatible with ``LiteralString``. -+ Joining: ``sep.join(xs)`` is of type ``Literal[str]`` if ``sep``'s - type is compatible with ``Literal[str]`` and ``xs``'s type is - compatible with ``Iterable[Literal[str]]``. ++ Joining: ``sep.join(xs)`` is of type ``LiteralString`` if ``sep``'s + type is compatible with ``LiteralString`` and ``xs``'s type is + compatible with ``Iterable[LiteralString]``. -+ In-place addition: If ``s`` has type ``Literal[str]`` and ``x`` has - type compatible with ``Literal[str]``, then ``s += x`` preserves - ``s``'s type as ``Literal[str]``. ++ In-place addition: If ``s`` has type ``LiteralString`` and ``x`` has + type compatible with ``LiteralString``, then ``s += x`` preserves + ``s``'s type as ``LiteralString``. -+ String formatting: An f-string has type ``Literal[str]`` if and only ++ String formatting: An f-string has type ``LiteralString`` if and only if its constituent expressions are literal strings. ``s.format(...)`` - has type ``Literal[str]`` if and only if ``s`` and the arguments have - types compatible with ``Literal[str]``. + has type ``LiteralString`` if and only if ``s`` and the arguments have + types compatible with ``LiteralString``. + Literal-preserving methods: In `Appendix C `_, we have provided an exhaustive list of ``str`` methods that preserve the - ``Literal[str]`` type. + ``LiteralString`` type. In all other cases, if one or more of the composed values has a non-literal type ``str``, the composition of types will have type @@ -338,9 +332,9 @@ non-literal type ``str``, the composition of types will have type has type ``str``. 
This matches the pre-existing behavior of type checkers. -``Literal[str]`` is compatible with the type ``str``. It inherits all +``LiteralString`` is compatible with the type ``str``. It inherits all methods from ``str``. So, if we have a variable ``s`` of type -``Literal[str]``, it is safe to write ``s.startswith("hello")``. +``LiteralString``, it is safe to write ``s.startswith("hello")``. Some type checkers refine the type of a string when doing an equality check: @@ -352,7 +346,7 @@ check: reveal_type(s) # => Literal["bar"] Such a refined type in the if-block is also compatible with -``Literal[str]`` because its type is ``Literal["bar"]``. +``LiteralString`` because its type is ``Literal["bar"]``. Examples @@ -363,36 +357,36 @@ See the examples below to help clarify the above rules: :: - literal_string: Literal[str] + literal_string: LiteralString s: str = literal_string # OK - literal_string: Literal[str] = s # Error: Expected Literal[str], got str. - literal_string: Literal[str] = "hello" # OK + literal_string: LiteralString = s # Error: Expected LiteralString, got str. + literal_string: LiteralString = "hello" # OK - def expect_literal_str(s: Literal[str]) -> None: ... + def expect_literal_string(s: LiteralString) -> None: ... Addition of literal strings: :: - expect_literal_str("foo" + "bar") # OK - expect_literal_str(literal_string + "bar") # OK - literal_string2: Literal[str] - expect_literal_str(literal_string + literal_string2) # OK - plain_str: str - expect_literal_str(literal_string + plain_str) # Not OK. + expect_literal_string("foo" + "bar") # OK + expect_literal_string(literal_string + "bar") # OK + literal_string2: LiteralString + expect_literal_string(literal_string + literal_string2) # OK + plain_string: str + expect_literal_string(literal_string + plain_string) # Not OK. 
Join using literal strings: :: - expect_literal_str(",".join(["foo", "bar"])) # OK - expect_literal_str(literal_string.join(["foo", "bar"])) # OK - expect_literal_str(literal_string.join([literal_string, literal_string2])) # OK - xs: List[Literal[str]] - expect_literal_str(literal_string.join(xs)) # OK - expect_literal_str(plain_str.join([literal_string, literal_string2])) + expect_literal_string(",".join(["foo", "bar"])) # OK + expect_literal_string(literal_string.join(["foo", "bar"])) # OK + expect_literal_string(literal_string.join([literal_string, literal_string2])) # OK + xs: List[LiteralString] + expect_literal_string(literal_string.join(xs)) # OK + expect_literal_string(plain_string.join([literal_string, literal_string2])) # Not OK because the separator has type ``str``. In-place addition using literal strings: @@ -401,62 +395,62 @@ In-place addition using literal strings: literal_string += "foo" # OK literal_string += literal_string2 # OK - literal_string += plain_str # Not OK + literal_string += plain_string # Not OK Format strings using literal strings: :: - literal_name: Literal[str] - expect_literal_str(f"hello {literal_name}") + literal_name: LiteralString + expect_literal_string(f"hello {literal_name}") # OK because it is composed from literal strings. - expect_literal_str("hello {}".format(literal_name)) # OK + expect_literal_string("hello {}".format(literal_name)) # OK - expect_literal_str(f"hello") # OK + expect_literal_string(f"hello") # OK - expect_literal_str(f"hello {username}") + expect_literal_string(f"hello {username}") # NOT OK. The format-string is constructed from ``username``, # which has type ``str``. 
- expect_literal_str("hello {}".format(username)) # Not OK + expect_literal_string("hello {}".format(username)) # Not OK -Other literal types, such as literal integers, are not compatible with ``Literal[str]``: +Other literal types, such as literal integers, are not compatible with ``LiteralString``: :: some_int: int - expect_literal_str(some_int) # Error: Expected Literal[str], got int. + expect_literal_string(some_int) # Error: Expected LiteralString, got int. literal_one: Literal[1] = 1 - expect_literal_str(literal_one) # Error: Expected Literal[str], got Literal[1]. + expect_literal_string(literal_one) # Error: Expected LiteralString, got Literal[1]. We can call functions on literal strings: :: - def add_limit(query: Literal[str]) -> Literal[str]: + def add_limit(query: LiteralString) -> LiteralString: return query + " LIMIT = 1" - def my_query(query: Literal[str], user_id: str) -> None: + def my_query(query: LiteralString, user_id: str) -> None: sql_connection().execute(add_limit(query), (user_id,)) # OK Conditional statements and expressions work as expected: :: - def return_literal_str() -> Literal[str]: + def return_literal_string() -> LiteralString: return "foo" if condition1() else "bar" # OK - def return_literal_str2(literal_str: Literal[str]) -> Literal[str]: - return "foo" if condition1() else literal_str # OK + def return_literal_str2(literal_string: LiteralString) -> LiteralString: + return "foo" if condition1() else literal_string # OK - def return_literal_str3() -> Literal[str]: + def return_literal_str3() -> LiteralString: if condition1(): result: Literal["foo"] = "foo" else: - result: Literal[str] = "bar" + result: LiteralString = "bar" return result # OK @@ -464,13 +458,13 @@ Conditional statements and expressions work as expected: Interaction with TypeVars and Generics -------------------------------------- -TypeVars can be bound to ``Literal[str]``: +TypeVars can be bound to ``LiteralString``: :: - from typing import Literal, TypeVar + from 
typing import Literal, LiteralString, TypeVar - TLiteral = TypeVar("TLiteral", bound=Literal[str]) + TLiteral = TypeVar("TLiteral", bound=LiteralString) def literal_identity(s: TLiteral) -> TLiteral: return s @@ -479,16 +473,16 @@ TypeVars can be bound to ``Literal[str]``: y = literal_identity(hello) reveal_type(y) # => Literal["hello"] - s: Literal[str] + s: LiteralString y2 = literal_identity(s) - reveal_type(y2) # => Literal[str] + reveal_type(y2) # => LiteralString s_error: str literal_identity(s_error) - # Error: Expected TLiteral (bound to Literal[str]), got str. + # Error: Expected TLiteral (bound to LiteralString), got str. -``Literal[str]`` can be used as type arguments for generic classes: +``LiteralString`` can be used as a type argument for generic classes: :: @@ -496,23 +490,23 @@ TypeVars can be bound to ``Literal[str]``: def __init__(self, value: T) -> None: self.value = value - literal_str: Literal[str] = "hello" - x: Container[Literal[str]] = Container(literal_str) # OK + literal_string: LiteralString = "hello" + x: Container[LiteralString] = Container(literal_string) # OK s: str - x_error: Container[Literal[str]] = Container(s) # Not OK + x_error: Container[LiteralString] = Container(s) # Not OK Standard containers like ``List`` work as expected: :: - xs: List[Literal[str]] = ["foo", "bar", "baz"] + xs: List[LiteralString] = ["foo", "bar", "baz"] Interactions with Overloads --------------------------- Literal strings and overloads do not need to interact in a special -way: the existing rules work fine. ``Literal[str]`` can be used as a +way: the existing rules work fine. ``LiteralString`` can be used as a fallback overload where a specific ``Literal["foo"]`` type does not match: @@ -521,7 +515,7 @@ match: @overload def foo(x: Literal["foo"]) -> int: ... @overload - def foo(x: Literal[str]) -> bool: ... + def foo(x: LiteralString) -> bool: ... @overload def foo(x: str) -> str: ... 
@@ -534,9 +528,8 @@ match: Backwards Compatibility ======================= -``Literal[str]`` is acceptable at runtime, so -this doesn't require any changes to the Python runtime itself. :pep:`586` -already backports ``Literal``, so this PEP does not need to change it. +We propose adding ``typing_extensions.LiteralString`` for use in +earlier Python versions. As :pep:`PEP 586 mentions <586#backwards-compatibility>`, @@ -548,7 +541,7 @@ string, the following example should be OK: :: x = "hello" - expect_literal_str(x) + expect_literal_string(x) # OK, because x is inferred to have type ``Literal["hello"]``. This enables precise type checking of idiomatic SQL query code without @@ -558,18 +551,19 @@ example). However, like :pep:`586`, this PEP does not mandate the above inference strategy. In case the type checker doesn't infer ``x`` to have type ``Literal["hello"]``, users can aid the type checker by explicitly -annotating it as ``x: Literal[str]``: +annotating it as ``x: LiteralString``: :: - x: Literal[str] = "hello" - expect_literal_str(x) + x: LiteralString = "hello" + expect_literal_string(x) Runtime Behavior ================ -This PEP does not change the runtime behavior of ``Literal``. +We propose an implementation for ``typing.LiteralString`` similar to that for +``typing.Self`` from :pep:`673`. Rejected Alternatives @@ -590,7 +584,7 @@ queries are dynamically built out of string literals, as shown in the AST level, the resultant SQL query is not going to appear as a string literal anymore and is thus indistinguishable from a potentially malicious string. To use these tools would require significantly -restricting developers' ability to build SQL queries. ``Literal[str]`` +restricting developers' ability to build SQL queries. ``LiteralString`` can provide similar safety guarantees with fewer restrictions. **Semgrep and pyanalyze**: Semgrep supports a more sophisticated @@ -604,15 +598,15 @@ has a similar extension. 
But neither handles function calls that construct and return safe SQL queries. For example, in the code sample below, ``build_insert_query`` is a helper function to create a query that inserts multiple values into the corresponding columns. Semgrep -and pyanalyze forbid this natural usage whereas ``Literal[str]`` +and pyanalyze forbid this natural usage whereas ``LiteralString`` handles it with no burden on the programmer: :: def build_insert_query( - table: Literal[str] - insert_columns: Iterable[Literal[str]], - ) -> Literal[str]: + table: LiteralString, + insert_columns: Iterable[LiteralString], + ) -> LiteralString: sql = "INSERT INTO " + table column_clause = ", ".join(insert_columns) @@ -623,7 +617,7 @@ handles it with no burden on the programmer: def insert_data( conn: Connection, - kvs_to_insert: Dict[Literal[str], str] + kvs_to_insert: Dict[LiteralString, str] ) -> None: query = build_insert_query("data", kvs_to_insert.keys()) conn.execute(query, kvs_to_insert.values()) @@ -648,7 +642,7 @@ use them. They also usually take longer to run than a type checker immediate. Finally, they move the burden of preventing vulnerabilities on to library users instead of allowing the libraries themselves to specify precisely how their APIs must be called (as is possible with -``Literal[str]``). +``LiteralString``). One final reason to prefer using a new type over a dedicated tool is that type checkers are more widely used than dedicated security @@ -662,7 +656,7 @@ will mean that more developers benefit from them. Why not use a ``NewType`` for ``str``? -------------------------------------- -Any API for which ``Literal[str]`` would be suitable could instead be +Any API for which ``LiteralString`` would be suitable could instead be updated to accept a different type created within the Python type system, such as ``NewType("SafeSQL", str)``: @@ -700,7 +694,7 @@ show how this technique can `fail `_.
Also note that this requires invasive changes to the source code -(wrapping the query with ``SafeSQL``) whereas ``Literal[str]`` +(wrapping the query with ``SafeSQL``) whereas ``LiteralString`` requires no such changes. Users can remain oblivious to it as long as they pass in literal strings to sensitive APIs. @@ -727,10 +721,10 @@ code*. There is no way to write a sanitizer that can reliably figure out which parts of an input string are benign and which ones are potentially malicious. -Runtime Checkable ``Literal[str]`` ----------------------------------- +Runtime Checkable ``LiteralString`` +----------------------------------- -The ``Literal[str]`` concept could be extended beyond static type +The ``LiteralString`` concept could be extended beyond static type checking to be a runtime checkable property of ``str`` objects. This would provide some benefits, such as allowing frameworks to raise errors on dynamic strings. Such runtime errors would be a more robust @@ -738,7 +732,7 @@ defense mechanism than type errors, which can potentially be suppressed, ignored, or never even seen if the author does not use a type checker. -This extension to the ``Literal[str]`` concept would dramatically +This extension to the ``LiteralString`` concept would dramatically increase the scope of the proposal by requiring changes to one of the most fundamental types in Python. While runtime taint checking on strings has been `considered `_ @@ -747,13 +741,73 @@ others may consider it in the future, such extensions are out of scope for this PEP. +Rejected Names +-------------- + +We considered a variety of names for the literal string type and +solicited ideas on `typing-sig +`_. +Some notable alternatives were: + ++ ``Literal[str]``: This is a natural extension of the + ``Literal["foo"]`` type name, but typing-sig `objected + `_ + that users could mistake this for the literal type of the ``str`` + class. 
+ ++ ``LiteralStr``: This is shorter than ``LiteralString`` but looks + weird to the PEP authors. + ++ ``LiteralDerivedString``: This (along with + ``MadeFromLiteralString``) best captures the technical meaning of + the type. It represents not just the type of literal expressions, + such as ``"foo"``, but also that of expressions composed from + literals, such as ``"foo" + "bar"``. However, both names seem wordy. + ++ ``StringLiteral``: Users might confuse this with the existing + concept of `"string literals" + `_ + where the string exists as a syntactic token in the source code, + whereas our concept is more general. + ++ ``SafeString``: While this comes close to our intended meaning, it + may mislead users into thinking that the string has been sanitized in + some way, perhaps by escaping HTML tags or shell-related special + characters. + ++ ``ConstantStr``: This does not capture the idea of composing literal + strings. + ++ ``StaticStr``: This suggests that the string is statically + computable, i.e., computable without running the program, which is + not true. The literal string may vary based on runtime flags, as + seen in the `Motivation`_ examples. + ++ ``LiteralOnly[str]``: This has the advantage of being extensible to + other literal types, such as ``bytes`` or ``int``. However, we did + not find the extensibility worth the loss of readability. + +Overall, there was no clear winner on typing-sig over a long period, +so we decided to tip the scales in favor of ``LiteralString``. + + +``LiteralBytes`` +---------------- + +We could generalize literal byte types, such as ``Literal[b"foo"]``, +to ``LiteralBytes``. However, literal byte types are used much less +frequently than literal string types and we did not find much user +demand for ``LiteralBytes``, so we decided not to include it in this +PEP. Others may, however, consider it in future PEPs. 
+ + Reference Implementation ======================== This is implemented in Pyre v0.9.8 and is actively being used. The implementation simply extends the type checker with -``Literal[str]`` as a supertype of literal string types. +``LiteralString`` as a supertype of literal string types. To support composition via addition, join, etc., it was sufficient to overload the stubs for ``str`` in Pyre's copy of typeshed. @@ -763,7 +817,7 @@ Appendix A: Other Uses ====================== To simplify the discussion and require minimal security knowledge, we -focused on SQL injections throughout the PEP. ``Literal[str]``, +focused on SQL injections throughout the PEP. ``LiteralString``, however, can also be used to prevent many other kinds of `injection vulnerabilities `_. @@ -787,12 +841,12 @@ following destructive command being run: echo 'Hello ' && rm -rf / #' This vulnerability could be prevented by updating ``run`` to only -accept ``Literal[str]`` when used in ``shell=True`` mode. Here is one +accept ``LiteralString`` when used in ``shell=True`` mode. Here is one simplified stub: :: - def run(command: Literal[str], *args: str, shell: bool=...): ... + def run(command: LiteralString, *args: str, shell: bool=...): ... Cross Site Scripting (XSS) -------------------------- @@ -817,16 +871,16 @@ which cause XSS vulnerabilities: return(dangerous_string) This vulnerability could be prevented by updating ``mark_safe`` to -only accept ``Literal[str]``: +only accept ``LiteralString``: :: - def mark_safe(s: Literal[str]) -> str: ... + def mark_safe(s: LiteralString) -> str: ... 
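The command-data separation that this stub encourages can be sketched as below. This is only an illustration under stated assumptions: ``render_greeting`` is a hypothetical helper, not a Django API, and it uses the standard library's ``html.escape`` to stand in for whatever escaping a real framework would apply to untrusted data.

```python
import html
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only by the type checker; the annotation below is quoted.
    from typing import LiteralString


def render_greeting(template: "LiteralString", name: str) -> str:
    """Hypothetical helper: trusted literal template, untrusted data escaped."""
    # The template must be literal-derived; the user-controlled name is
    # escaped before it is substituted, so it cannot inject markup.
    return template.format(name=html.escape(name))


page = render_greeting("<p>Hello {name}</p>", "<script>alert(1)</script>")
```

The result is typed as plain ``str`` because it now contains user-controlled data, which matches the inference rule for ``format`` given earlier.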
Server Side Template Injection (SSTI) ------------------------------------- -Templating frameworks such as Jinja allow Python expressions which +Templating frameworks, such as Jinja, allow Python expressions which will be evaluated and substituted into the rendered result: :: @@ -849,12 +903,12 @@ the application: # Result: The shell command 'rm - rf /' is run Template injection exploits like this could be prevented by updating -the ``Template`` API to only accept ``Literal[str]``: +the ``Template`` API to only accept ``LiteralString``: :: class Template: - def __init__(self, source: Literal[str]): ... + def __init__(self, source: LiteralString): ... Logging Format String Injection @@ -881,81 +935,82 @@ illustrates a simple denial of service scenario: logger.info(f'Received: {external_string}', some_dict) This kind of attack could be prevented by requiring that the format -string passed to the logger be a ``Literal[str]`` and that all +string passed to the logger be a ``LiteralString`` and that all externally controlled data be passed separately as arguments (as proposed in `Issue 46200 `_): :: - def info(msg: Literal[str], *args: object) -> None: + def info(msg: LiteralString, *args: object) -> None: ... Appendix B: Limitations ======================= -There are a number of ways ``Literal[str]`` could still fail to +There are a number of ways ``LiteralString`` could still fail to prevent users from passing strings built from non-literal data to an API: 1. If the developer does not use a type checker or does not add type annotations, then violations will go uncaught. -2. ``cast(Literal[str], non_literal_str)`` could be used to lie to the -type checker and allow a dynamic string value to masquerade as a -``Literal[str]``. The same goes for a variable that has type ``Any``. +2. ``cast(LiteralString, non_literal_string)`` could be used to lie to +the type checker and allow a dynamic string value to masquerade as a +``LiteralString``. 
The same goes for a variable that has type ``Any``. 3. Comments such as ``# type: ignore`` could be used to ignore warnings about non-literal strings. 4. Trivial functions could be constructed to convert a ``str`` to a -``Literal[str]``: +``LiteralString``: :: - def make_literal(s: str) -> Literal[str]: - letters: Dict[str, Literal[str]] = { + def make_literal(s: str) -> LiteralString: + letters: Dict[str, LiteralString] = { "A": "A", "B": "B", ... } - output: List[Literal[str]] = [letters[c] for c in s] + output: List[LiteralString] = [letters[c] for c in s] return "".join(output) We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the -protections offered by ``Literal[str]`` will always succeed. The -important thing to remember is that ``Literal[str]`` is not intended +protections offered by ``LiteralString`` will always succeed. The +important thing to remember is that ``LiteralString`` is not intended to protect against *malicious* developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise). -Without ``Literal[str]``, the best enforcement tool API authors have +Without ``LiteralString``, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With -``Literal[str]``, API misuse requires conscious thought and artifacts +``LiteralString``, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice. .. _appendix_C: -Appendix C: ``str`` methods that preserve ``Literal[str]`` -========================================================== +Appendix C: ``str`` methods that preserve ``LiteralString`` +=========================================================== The ``str`` class has several methods that would benefit from -``Literal[str]``. 
For example, users might expect -``"hello".capitalize()`` to have the type ``Literal[str]`` similar to -the other examples we have seen in the `Inferring Literal[str] -`_ section. Inferring the type ``Literal[str]`` -is correct because the string is not an arbitrary user-supplied string -- we know that it has the type ``Literal["HELLO"]``, which is -compatible with ``Literal[str]``. In other words, the ``capitalize`` -method preserves the ``Literal[str]`` type. There are several other -``str`` methods that preserve ``Literal[str]``. +``LiteralString``. For example, users might expect +``"hello".capitalize()`` to have the type ``LiteralString`` similar to +the other examples we have seen in the `Inferring LiteralString +`_ section. Inferring the type +``LiteralString`` is correct because the string is not an arbitrary +user-supplied string - we know that it has the type +``Literal["Hello"]``, which is compatible with ``LiteralString``. In +other words, the ``capitalize`` method preserves the ``LiteralString`` +type. There are several other ``str`` methods that preserve +``LiteralString``. We propose updating the stub for ``str`` in typeshed so that the -methods are overloaded with the ``Literal[str]``-preserving +methods are overloaded with the ``LiteralString``-preserving versions. This means type checkers do not have to hardcode -``Literal[str]`` behavior for each method. It also lets us easily +``LiteralString`` behavior for each method. It also lets us easily support new methods in the future by updating the typeshed stub. For example, to preserve literal types for the ``capitalize`` method, @@ -968,7 +1023,7 @@ we would change the stub as below: # after @overload - def capitalize(self: Literal[str]) -> Literal[str]: ... + def capitalize(self: LiteralString) -> LiteralString: ... @overload def capitalize(self) -> str: ... @@ -978,205 +1033,205 @@ understand. Type checkers may need to special-case ``str`` to make error messages understandable for users.
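The same overload pattern can be tried outside typeshed on an ordinary function. The sketch below is an assumption-laden illustration rather than the proposed stub: ``shout`` is a hypothetical function, and the overloads dispatch on a regular parameter instead of ``self`` because annotating ``self`` as ``LiteralString`` is only meaningful inside the ``str`` class itself.

```python
from typing import TYPE_CHECKING, overload

if TYPE_CHECKING:
    # Needed only by the type checker; the annotations below are quoted.
    from typing import LiteralString


@overload
def shout(s: "LiteralString") -> "LiteralString": ...
@overload
def shout(s: str) -> str: ...
def shout(s: str) -> str:
    """Hypothetical function: literal input keeps its literal-ness for the checker."""
    # The first matching overload decides the static return type;
    # at runtime only this implementation is executed.
    return s.upper() + "!"


loud = shout("hello")  # a type checker is expected to infer LiteralString here
```

The ordering matters: the ``LiteralString`` overload comes first so that literal arguments match it before the general ``str`` fallback.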
Below is an exhaustive list of ``str`` methods which, when called as -indicated with arguments of type ``Literal[str]``, must be treated as -returning a ``Literal[str]``. If this PEP is accepted, we will update +indicated with arguments of type ``LiteralString``, must be treated as +returning a ``LiteralString``. If this PEP is accepted, we will update these method signatures in typeshed: :: @overload - def capitalize(self: Literal[str]) -> Literal[str]: ... + def capitalize(self: LiteralString) -> LiteralString: ... @overload def capitalize(self) -> str: ... @overload - def casefold(self: Literal[str]) -> Literal[str]: ... + def casefold(self: LiteralString) -> LiteralString: ... @overload def casefold(self) -> str: ... @overload - def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + def center(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ... @overload def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... if sys.version_info >= (3, 8): @overload - def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ... + def expandtabs(self: LiteralString, tabsize: SupportsIndex = ...) -> LiteralString: ... @overload def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ... else: @overload - def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ... + def expandtabs(self: LiteralString, tabsize: int = ...) -> LiteralString: ... @overload def expandtabs(self, tabsize: int = ...) -> str: ... @overload - def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ... + def format(self: LiteralString, *args: LiteralString, **kwargs: LiteralString) -> LiteralString: ... @overload def format(self, *args: str, **kwargs: str) -> str: ... @overload - def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ... 
+ def join(self: LiteralString, __iterable: Iterable[LiteralString]) -> LiteralString: ... @overload def join(self, __iterable: Iterable[str]) -> str: ... @overload - def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + def ljust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ... @overload def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... @overload - def lower(self: Literal[str]) -> Literal[str]: ... + def lower(self: LiteralString) -> LiteralString: ... @overload - def lower(self) -> Literal[str]: ... + def lower(self) -> str: ... @overload - def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + def lstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ... @overload def lstrip(self, __chars: str | None = ...) -> str: ... @overload - def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ... + def partition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ... @overload def partition(self, __sep: str) -> tuple[str, str, str]: ... @overload - def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ... + def replace(self: LiteralString, __old: LiteralString, __new: LiteralString, __count: SupportsIndex = ...) -> LiteralString: ... @overload def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ... if sys.version_info >= (3, 9): @overload - def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ... + def removeprefix(self: LiteralString, __prefix: LiteralString) -> LiteralString: ... @overload def removeprefix(self, __prefix: str) -> str: ... @overload - def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ...
+ def removesuffix(self: LiteralString, __suffix: LiteralString) -> LiteralString: ... @overload def removesuffix(self, __suffix: str) -> str: ... @overload - def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ... + def rjust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ... @overload def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ... @overload - def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ... + def rpartition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ... @overload def rpartition(self, __sep: str) -> tuple[str, str, str]: ... @overload - def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ... + def rsplit(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ... @overload def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ... @overload - def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + def rstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ... @overload def rstrip(self, __chars: str | None = ...) -> str: ... @overload - def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ... + def split(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ... @overload def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ... @overload - def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ... + def splitlines(self: LiteralString, keepends: bool = ...) -> list[LiteralString]: ... @overload def splitlines(self, keepends: bool = ...) 
-> list[str]: ... @overload - def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ... + def strip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ... @overload def strip(self, __chars: str | None = ...) -> str: ... @overload - def swapcase(self: Literal[str]) -> Literal[str]: ... + def swapcase(self: LiteralString) -> LiteralString: ... @overload def swapcase(self) -> str: ... @overload - def title(self: Literal[str]) -> Literal[str]: ... + def title(self: LiteralString) -> LiteralString: ... @overload def title(self) -> str: ... @overload - def upper(self: Literal[str]) -> Literal[str]: ... + def upper(self: LiteralString) -> LiteralString: ... @overload def upper(self) -> str: ... @overload - def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ... + def zfill(self: LiteralString, __width: SupportsIndex) -> LiteralString: ... @overload def zfill(self, __width: SupportsIndex) -> str: ... @overload - def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ... + def __add__(self: LiteralString, __s: LiteralString) -> LiteralString: ... @overload def __add__(self, __s: str) -> str: ... @overload - def __iter__(self: Literal[str]) -> Iterator[str]: ... + def __iter__(self: LiteralString) -> Iterator[str]: ... @overload def __iter__(self) -> Iterator[str]: ... @overload - def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> str: ... + def __mod__(self: LiteralString, __x: Union[LiteralString, Tuple[LiteralString, ...]]) -> str: ... @overload def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ... @overload - def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ... + def __mul__(self: LiteralString, __n: SupportsIndex) -> LiteralString: ... @overload def __mul__(self, __n: SupportsIndex) -> str: ... @overload - def __repr__(self: Literal[str]) -> Literal[str]: ... + def __repr__(self: LiteralString) -> LiteralString: ... 
@overload def __repr__(self) -> str: ... @overload - def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ... + def __rmul__(self: LiteralString, n: SupportsIndex) -> LiteralString: ... @overload def __rmul__(self, n: SupportsIndex) -> str: ... @overload - def __str__(self: Literal[str]) -> Literal[str]: ... + def __str__(self: LiteralString) -> LiteralString: ... @overload def __str__(self) -> str: ... -Appendix D: Guidelines for using ``Literal[str]`` in Stubs -========================================================== +Appendix D: Guidelines for using ``LiteralString`` in Stubs +=========================================================== Libraries that do not contain type annotations within their source may specify type stubs in Typeshed. Libraries written in other languages, such as those for machine learning, may also provide Python type stubs. This means the type checker cannot verify that the type annotations match the source code and must trust the type stub. Thus, -authors of type stubs need to be careful when using ``Literal[str]`` +authors of type stubs need to be careful when using ``LiteralString`` since a function may falsely appear to be safe when it is not. -We recommend the following guidelines for using ``Literal[str]`` in stubs: +We recommend the following guidelines for using ``LiteralString`` in stubs: -+ If the stub is for a function, we recommend using ``Literal[str]`` ++ If the stub is for a function, we recommend using ``LiteralString`` in the return type of the function or of its overloads only if all the corresponding arguments have literal types (i.e., - ``Literal[str]`` or ``Literal["a", "b"]``). + ``LiteralString`` or ``Literal["a", "b"]``). :: # OK @overload - def my_transform(x: Literal[str], y: Literal["a", "b"]) -> Literal[str]: ... + def my_transform(x: LiteralString, y: Literal["a", "b"]) -> LiteralString: ... @overload def my_transform(x: str, y: str) -> str: ... 
# Not OK @overload - def my_transform(x: Literal[str], y: str) -> Literal[str]: ... + def my_transform(x: LiteralString, y: str) -> LiteralString: ... @overload def my_transform(x: str, y: str) -> str: ... @@ -1184,17 +1239,17 @@ We recommend the following guidelines for using ``Literal[str]`` in stubs: guideline as above. + If the stub is for any other kind of method, we recommend against - using ``Literal[str]`` in the return type of the method or any of + using ``LiteralString`` in the return type of the method or any of its overloads. This is because, even if all the explicit arguments - have type ``Literal[str]``, the object itself may be created using + have type ``LiteralString``, the object itself may be created using user data and thus the return type may be user-controlled. + If the stub is for a class attribute or global variable, we also - recommend against using ``Literal[str]`` because the untyped code + recommend against using ``LiteralString`` because the untyped code may write arbitrary values to the attribute. However, we leave the final call to the library author. They may use -``Literal[str]`` if they feel confident that the string returned by +``LiteralString`` if they feel confident that the string returned by the method or function or the string stored in the attribute is guaranteed to have a literal type - i.e., the string is created by applying only literal-preserving ``str`` operations to a string @@ -1202,7 +1257,7 @@ literal. Note that these guidelines do not apply to inline type annotations since the type checker can verify that, say, a method returning -``Literal[str]`` does in fact return an expression of that type. +``LiteralString`` does in fact return an expression of that type. Resources @@ -1214,8 +1269,8 @@ Literal String Types in Scala Scala `uses `_ ``Singleton`` as the supertype for singleton types, which includes -literal string types such as ``"foo"``. 
``Singleton`` is Scala's -generalized analogue of this PEP's ``Literal[str]``. +literal string types, such as ``"foo"``. ``Singleton`` is Scala's +generalized analogue of this PEP's ``LiteralString``. Tamer Abdulradi showed how Scala's literal string types can be used for "Preventing SQL injection at compile time", Scala Days talk