PEP 675: Arbitrary literal strings (#2167)

2021-12-01 09:57:41 -08:00 · 2021-12-01 09:57:41 -08:00 · 21f6993114
parent 7d8c2a104a
commit 21f6993114
2 changed files with 951 additions and 0 deletions
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -529,6 +529,7 @@ pep-0671.rst  @rosuav
 pep-0672.rst  @encukou
 pep-0673.rst  @jellezijlstra
 pep-0674.rst  @vstinner
+pep-0675.rst  @jellezijlstra
 # ...
 # pep-0754.txt
 # ...
--- a/pep-0675.rst
+++ b/pep-0675.rst
@@ -0,0 +1,950 @@
+PEP: 675
+Title: Arbitrary Literal Strings
+Version: $Revision$
+Last-Modified: $Date$
+Author: Pradeep Kumar Srinivasan <gohanpra@gmail.com>, Graham Bleaney <gbleaney@gmail.com>
+Sponsor: Jelle Zijlstra <jelle.zijlstra@gmail.com>
+Discussions-To: Typing-Sig <typing-sig@python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 30-Nov-2021
+Python-Version: 3.11
+Post-History:
+
+Abstract
+========
+
+There is currently no way to specify that a function parameter can be
+of any literal string type; we have to specify the precise literal
+string, such as ``Literal["foo"]``. This PEP introduces a supertype of
+literal string types: ``Literal[str]``. This allows a function to
+accept arbitrary literal string types such as ``Literal["foo"]`` or
+``Literal["bar"]``.
+
+Motivation
+==========
+
+A common security vulnerability is for a program to include
+user-controlled data in a command it executes. For example, a naive
+way to look up a user record from a database is to accept a user id
+and insert it into a predefined SQL query:
+
+::
+
+    def query_user(conn: Connection, user_id: str) -> User:
+        query = f"SELECT * FROM data WHERE user_id = {user_id}"
+        conn.execute(query)
+
+    query_user(conn, "user123")  # OK.
+
+However, the user-controlled data ``user_id`` is being mixed with the
+SQL command string, which means a malicious user could run arbitrary
+SQL commands:
+
+::
+
+    # Delete the table.
+    query_user(conn, "user123; DROP TABLE data;")
+
+    # Fetch all users (since 1 = 1 is always true).
+    query_user(conn, "user123 OR 1 = 1")
+
+
+To prevent such SQL injection attacks, SQL APIs offer parameterized
+queries, which separate the executed query from user-controlled data
+and make it impossible to run arbitrary queries. For example, with
+`sqlite3 <https://docs.python.org/3/library/sqlite3.html>`_, our
+original function would be written safely as a query with parameters:
+
+::
+
+    def query_user(conn: Connection, user_id: str) -> User:
+        query = "SELECT * FROM data WHERE user_id = ?"
+        conn.execute(query, (user_id,))
+
+
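The difference between the two versions of ``query_user`` can be reproduced with an in-memory ``sqlite3`` database. This is only an illustrative sketch: the table contents, the quoted ``user_id`` column, and the helper names ``find_users_unsafe`` and ``find_users_safe`` are assumptions made for the demonstration and are not part of this PEP.

```python
import sqlite3


def find_users_unsafe(conn: sqlite3.Connection, user_id: str) -> list:
    # User-controlled data is spliced directly into the SQL text.
    query = f"SELECT name FROM data WHERE user_id = '{user_id}' ORDER BY name"
    return conn.execute(query).fetchall()


def find_users_safe(conn: sqlite3.Connection, user_id: str) -> list:
    # The SQL text is a fixed literal; the data travels separately.
    query = "SELECT name FROM data WHERE user_id = ? ORDER BY name"
    return conn.execute(query, (user_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (user_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("user123", "Alice"), ("user456", "Bob")],
)

# Hypothetical attacker-controlled input.
payload = "nobody' OR '1' = '1"

unsafe_rows = find_users_unsafe(conn, payload)  # the condition is always true
safe_rows = find_users_safe(conn, payload)      # the payload is compared as data
```

With the f-string version the payload rewrites the ``WHERE`` clause and every row is returned; with the parameterized version the same payload matches no ``user_id`` at all.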
+The problem is that there is no way to enforce this
+discipline. sqlite3's own `documentation
+<https://docs.python.org/3/library/sqlite3.html>`_ can only admonish
+the reader to not dynamically build the ``sql`` argument from external
+input; the API's authors cannot express that through the type
+system. Users can (and often do) still use a convenient f-string as
+before and leave their code vulnerable to SQL injection.
+
+Existing tools, such as the popular security linter `Bandit
+<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_,
+attempt to detect unsafe external data used in SQL APIs, by inspecting
+the AST or by other semantic pattern-matching. These tools, however,
+preclude common idioms like storing a large multi-line query in a
+variable before executing it, adding literal string modifiers to the
+query based on some conditions, or transforming the query string using
+a function. (We survey existing tools in the "Rejected Alternatives"
+section.) For example, many tools will report a false positive for
+this benign snippet:
+
+
+::
+
+    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
+        query = """
+            SELECT
+                user.name,
+                user.age
+            FROM data
+            WHERE user_id = ?
+        """
+        if limit:
+            query += " LIMIT 1"
+
+        conn.execute(query, (user_id,))
+
+We want to forbid harmful execution of user-controlled data while
+still allowing benign idioms like the above and not requiring extra
+user work.
+
+To meet this goal, we introduce the ``Literal[str]`` type, which only
+accepts string values that are known to be made of literals. This is a
+generalization of the ``Literal["foo"]`` type from `PEP 586
+<https://www.python.org/dev/peps/pep-0586/>`_. A string of type
+``Literal[str]`` cannot contain user-controlled data. Thus, any API
+that only accepts ``Literal[str]`` will be immune to injection
+vulnerabilities (with pragmatic `limitations <Appendix B:
+Limitations_>`_).
+
+Since we want the ``sqlite3`` ``execute`` method to disallow strings
+built with user input, we would make its `typeshed stub
+<https://github.com/python/typeshed/blob/1c88ceeee924ec6cfe05dd4865776b49fec299e6/stdlib/sqlite3/dbapi2.pyi#L153>`_
+accept a ``sql`` query that is of type ``Literal[str]``:
+
+::
+
+    def execute(self, sql: Literal[str], parameters: Iterable[str] = ...) -> Cursor: ...
+
+
+This successfully forbids our unsafe SQL example. The variable
+``query`` below is inferred to have type ``str``, since it is created
+from a format string using ``user_id``, and cannot be passed to
+``execute``:
+
+::
+
+    def query_user(conn: Connection, user_id: str) -> User:
+        query = f"SELECT * FROM data WHERE user_id = {user_id}"
+        conn.execute(query)
+        # Error: Expected Literal[str], got str.
+
+The method remains flexible enough to allow our more complicated
+example:
+
+::
+
+    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
+        # This is a literal string.
+        query = """
+            SELECT
+                user.name,
+                user.age
+            FROM data
+            WHERE user_id = ?
+        """
+
+        if limit:
+            # Still has type Literal[str] because we added a literal string.
+            query += " LIMIT 1"
+
+        conn.execute(query, (user_id,))  # OK
+
+Notice that the user did not have to change their SQL code at all. The
+type checker was able to infer the literal string type and complain
+only in case of violations. The ``Literal[str]`` type is also useful
+in other cases where we want strict command-data separation, such as
+when building shell commands or when rendering a string into an HTML
+response without escaping (e.g., via Django's ``mark_safe``
+function). Overall, this combination of strictness and flexibility
+makes it easy to enforce safer API usage in sensitive code without
+burdening users.
+
+Usage statistics
+----------------
+
+In a sample of open-source projects using ``sqlite3``, we found that
+``conn.execute`` was called `~67%
+<https://grep.app/search?q=conn%5C.execute%5C%28%5Cs%2A%5B%27%22%5D&regexp=true&filter[lang][0]=Python>`_
+of the time with a safe string literal and `~33%
+<https://grep.app/search?current=3&q=conn%5C.execute%5C%28%5Ba-zA-Z_%5D%2B%5C%29&regexp=true&filter[lang][0]=Python>`_
+of the time with an unsafe, dynamically-built local variable. Using
+this PEP's literal string type along with a type checker would have
+prevented ``execute`` from being called in such an unsafe manner.
+
+Rationale
+=========
+
+Firstly, why use *types* to prevent security vulnerabilities?
+
+Warning users in documentation is insufficient: most users either
+never see these warnings or ignore them. Existing dynamic and static
+analysis approaches are too restrictive: they prevent natural
+idioms, as we saw in the `Motivation`_ section (and will discuss more
+extensively in the `Rejected Alternatives`_ section). The typing-based
+approach in this PEP strikes a user-friendly balance between
+strictness and flexibility.
+
+Runtime approaches do not work because, at runtime, the query string
+is a plain ``str``. While we could prevent some exploits using
+heuristics, such as regex-filtering for obviously malicious payloads,
+there will always be a way to work around them (perfectly
+distinguishing good and bad queries reduces to the halting problem).
+
+Static approaches like checking the AST to see if the query string is
+a literal string expression cannot tell when a string is assigned to
+an intermediate variable or when it is transformed by a benign
+function. This makes them overly restrictive.
+
+The type checker, surprisingly, does better than both because it has
+access to information not available in the runtime or static analysis
+approaches. Specifically, the type checker can tell us whether an
+expression has a literal string type, say ``Literal["foo"]``. The type
+checker already propagates types across variable assignments or
+function calls.
+
+In the current type system itself, if the SQL or shell command
+execution function only accepted three possible input strings, our job
+would be done. We would just say:
+
+::
+
+    def execute(query: Literal["foo", "bar", "baz"]) -> None: ...
+
+But, of course, ``execute`` can accept *any* possible query. How do we
+ensure that the query does not contain an arbitrary, user-controlled
+string?
+
+We want to specify that the value must be of some type
+``Literal[<...>]`` where ``<...>`` is some string. This is what
+``Literal[str]`` represents. ``Literal[str]`` is the "supertype" of
+all literal string types. Any particular literal string such as
+``Literal["foo"]`` or ``Literal["bar"]`` is compatible with
+``Literal[str]``, but not the other way around. The "supertype" of
+``Literal[str]`` itself is ``str``. So, ``Literal[str]`` itself is
+compatible with ``str``, but not the other way around. In effect, this
+PEP just introduces a type in the type hierarchy between
+``Literal["foo"]`` and ``str``.
+
+Note that a ``Union`` of literal types is naturally compatible with
+``Literal[str]`` because each element of the ``Union`` is individually
+compatible with ``Literal[str]``. So, ``Literal["foo", "bar"]`` is
+compatible with ``Literal[str]``.
+
+However, recall that we don't just want to represent exact literal
+queries. We also want to support composition of two literal strings,
+such as ``query + " LIMIT 1"``. This too is possible with the above
+concept. If ``x`` and ``y`` are two values of type ``Literal[str]``,
+then ``x + y`` will also be of type compatible with
+``Literal[str]``. We can reason about this by looking at specific
+instances such as ``Literal["foo"]`` and ``Literal["bar"]``; the value
+of the added string ``x + y`` can only be ``"foobar"``, which has type
+``Literal["foobar"]`` and is thus compatible with
+``Literal[str]``. The same reasoning applies when ``x`` and ``y`` are
+unions of literal types; the result of pairwise adding any two literal
+types from ``x`` and ``y`` respectively is a literal type, which means
+that the overall result is a ``Union`` of literal types and is thus
+compatible with ``Literal[str]``.
+
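The pairwise-addition argument above can be checked by brute force on a small case. The sketch below is an illustration of the reasoning, not a description of how any type checker works; the aliases ``Base`` and ``Suffix`` and the helper ``pairwise_sums`` are hypothetical names introduced only here.

```python
from itertools import product
from typing import Literal, get_args

# Two hypothetical unions of literal string types.
Base = Literal["SELECT name FROM data", "SELECT age FROM data"]
Suffix = Literal["", " LIMIT 1"]


def pairwise_sums(x_type, y_type) -> list:
    """Return every value ``x + y`` can take for the given literal unions."""
    return sorted(x + y for x, y in product(get_args(x_type), get_args(y_type)))


possible_queries = pairwise_sums(Base, Suffix)
```

Every element of ``possible_queries`` is itself a fixed string known from the source code, which is why the sum of two unions of literal types can again be described as a union of literal types.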
+In this way, we are able to leverage Python's concept of a ``Literal``
+string type to specify that our API can only accept strings that are
+known to be constructed from literals. More specific details follow in
+the remaining sections.
+
+Valid Locations for ``Literal[str]``
+=========================================
+
+``Literal[str]`` can be used where any other type can be used:
+
+::
+
+    variable_annotation: Literal[str]
+
+    def my_function(literal_string: Literal[str]) -> Literal[str]: ...
+
+    class Foo:
+        my_attribute: Literal[str]
+
+    type_argument: List[Literal[str]]
+
+    T = TypeVar("T", bound=Literal[str])
+
+It can be nested within unions of ``Literal`` types:
+
+::
+
+    union: Literal["hello", Literal[str]]
+    union2: Literal["hello", str]
+    union3: Literal[str, 4]
+
+    nested_literal_string: Literal[Literal[str]]
+
+
+The restrictions on the parameters of ``Literal`` are the same as in
+`PEP 586 <https://www.python.org/dev/peps/pep-0586/>`_. The only newly
+legal parameter is the name ``str`` itself. Other expressions are
+rejected even if they evaluate to the same value (``str``), such as
+``Literal[(lambda x: x)(str)]``.
+
+Type Inference
+==============
+
+
+Inferring ``Literal[str]``
+--------------------------
+
+Any literal string type is compatible with ``Literal[str]``. For
+example, ``x: Literal[str] = "foo"`` is valid because ``"foo"`` is
+inferred to be of type ``Literal["foo"]``.
+
+As per the `Rationale`_, we also infer ``Literal[str]`` in the
+following cases:
+
+* Addition: ``x + y`` is of type ``Literal[str]`` if both ``x`` and
+  ``y`` are compatible with ``Literal[str]``.
+
+* Joining: ``sep.join(xs)`` is of type ``Literal[str]`` if ``sep``'s
+  type is compatible with ``Literal[str]`` and ``xs``'s type is
+  compatible with ``Iterable[Literal[str]]``.
+
+* In-place addition: If ``s`` has type ``Literal[str]`` and ``x`` has
+  type compatible with ``Literal[str]``, then ``s += x`` preserves
+  ``s``'s type as ``Literal[str]``.
+
+* String formatting: An f-string has type ``Literal[str]`` if and only
+  if its constituent expressions are literal strings. ``s.format(...)``
+  has type ``Literal[str]`` if and only if ``s`` and the arguments have
+  types compatible with ``Literal[str]``.
+
+In all other cases, if one or more of the composed values has a
+non-literal type ``str``, the composition of types will have type
+``str``. For example, if ``s`` has type ``str``, then ``"hello" + s``
+has type ``str``. This matches the pre-existing behavior of type
+checkers.
+
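The composition rules above share one shape: the result is ``Literal[str]`` only when every operand is. A toy model of that rule is sketched below. It is not how Pyre or any other type checker is implemented; ``StrKind``, ``infer_add``, ``infer_join``, and ``infer_format`` are hypothetical names used purely for illustration.

```python
from enum import Enum
from typing import Iterable, List


class StrKind(Enum):
    """A two-point model of the types discussed in this section."""
    LITERAL = "Literal[str]"
    PLAIN = "str"


def _combine(kinds: Iterable[StrKind]) -> StrKind:
    # The result stays Literal[str] only if every input is Literal[str].
    if all(kind is StrKind.LITERAL for kind in kinds):
        return StrKind.LITERAL
    return StrKind.PLAIN


def infer_add(x: StrKind, y: StrKind) -> StrKind:
    """Model of ``x + y`` (and of ``x += y``)."""
    return _combine([x, y])


def infer_join(sep: StrKind, items: List[StrKind]) -> StrKind:
    """Model of ``sep.join(items)``."""
    return _combine([sep, *items])


def infer_format(template: StrKind, args: List[StrKind]) -> StrKind:
    """Model of ``template.format(*args)`` and of f-strings."""
    return _combine([template, *args])
```

A single ``str`` operand anywhere in the composition is enough to make the whole result ``str``, which matches the fallback described in the paragraph above.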
+``Literal[str]`` is compatible with the type ``str``. It inherits all
+methods from ``str``. So, if we have a variable ``s`` of type
+``Literal[str]``, it is safe to write ``s.startswith("hello")``.
+
+Note that, beyond the few composition rules mentioned above, this PEP
+doesn't change inference for other ``str`` methods such as
+``literal_string.upper()``.
+
+Some type checkers refine the type of a string when doing an equality
+check:
+
+::
+
+    def foo(s: str) -> None:
+        if s == "bar":
+            reveal_type(s)  # => Literal["bar"]
+
+Such a refined type in the if-block is also compatible with
+``Literal[str]`` because its type is ``Literal["bar"]``.
+
+
+Examples
+--------
+
+The examples below help clarify the above rules:
+
+::
+
+
+    literal_string: Literal[str]
+    s: str = literal_string  # OK
+
+    literal_string: Literal[str] = s  # Error: Expected Literal[str], got str.
+    literal_string: Literal[str] = "hello" # OK
+
+
+    def expect_literal_str(s: Literal[str]) -> None: ...
+
+Addition of literal strings:
+
+::
+
+    expect_literal_str("foo" + "bar")  # OK
+    expect_literal_str(literal_string + "bar")  # OK
+    literal_string2: Literal[str]
+    expect_literal_str(literal_string + literal_string2)  # OK
+    plain_str: str
+    expect_literal_str(literal_string + plain_str)  # Not OK.
+
+Join using literal strings:
+
+::
+
+    expect_literal_str(",".join(["foo", "bar"]))  # OK
+    expect_literal_str(literal_string.join(["foo", "bar"]))  # OK
+    expect_literal_str(literal_string.join([literal_string, literal_string2]))  # OK
+    xs: List[Literal[str]]
+    expect_literal_str(literal_string.join(xs)) # OK
+    expect_literal_str(plain_str.join([literal_string, literal_string2]))
+    # Not OK because the separator has type ``str``.
+
+In-place addition using literal strings:
+
+::
+
+    literal_string += "foo"  # OK
+    literal_string += literal_string2  # OK
+    literal_string += plain_str # Not OK
+
+Format strings using literal strings:
+
+::
+
+    literal_name: Literal[str]
+    expect_literal_str(f"hello {literal_name}")
+    # OK because it is composed from literal strings.
+
+    expect_literal_str("hello {}".format(literal_name))  # OK
+
+    expect_literal_str(f"hello")  # OK
+
+    username: str
+    expect_literal_str(f"hello {username}")
+    # NOT OK. The format-string is constructed from ``username``,
+    # which has type ``str``.
+
+    expect_literal_str("hello {}".format(username))  # Not OK
+
+Other literal types, such as literal integers, are not compatible with ``Literal[str]``:
+
+::
+
+    some_int: int
+    expect_literal_str(some_int)  # Error: Expected Literal[str], got int.
+
+    literal_one: Literal[1] = 1
+    expect_literal_str(literal_one)  # Error: Expected Literal[str], got Literal[1].
+
+
+We can call functions on literal strings:
+
+::
+
+    def add_limit(query: Literal[str]) -> Literal[str]:
+        return query + " LIMIT 1"
+
+    def my_query(query: Literal[str], user_id: str) -> None:
+        sql_connection().execute(add_limit(query), (user_id,))  # OK
+
+Conditional statements and expressions work as expected:
+
+::
+
+    def return_literal_str() -> Literal[str]:
+        return "foo" if condition1() else "bar"  # OK
+
+    def return_literal_str2(literal_str: Literal[str]) -> Literal[str]:
+        return "foo" if condition1() else literal_str  # OK
+
+    def return_literal_str3() -> Literal[str]:
+        if condition1():
+            result: Literal["foo"] = "foo"
+        else:
+            result: Literal[str] = "bar"
+
+        return result  # OK
+
+
+Interaction with TypeVars and Generics
+--------------------------------------
+
+TypeVars can be bound to ``Literal[str]``:
+
+::
+
+    from typing import Literal, TypeVar
+
+    TLiteral = TypeVar("TLiteral", bound=Literal[str])
+
+    def literal_identity(s: TLiteral) -> TLiteral:
+        return s
+
+    hello: Literal["hello"] = "hello"
+    y = literal_identity(hello)
+    reveal_type(y)  # => Literal["hello"]
+
+    s: Literal[str]
+    y2 = literal_identity(s)
+    reveal_type(y2)  # => Literal[str]
+
+    s_error: str
+    literal_identity(s_error)
+    # Error: Expected TLiteral (bound to Literal[str]), got str.
+
+
+``Literal[str]`` can be used as type arguments for generic classes:
+
+::
+
+    T = TypeVar("T")
+
+    class Container(Generic[T]):
+        def __init__(self, value: T) -> None:
+            self.value = value
+
+    literal_str: Literal[str] = "hello"
+    x: Container[Literal[str]] = Container(literal_str)  # OK
+
+    s: str
+    x_error: Container[Literal[str]] = Container(s)  # Not OK
+
+Standard containers like ``List`` work as expected:
+
+::
+
+    xs: List[Literal[str]] = ["foo", "bar", "baz"]
+
+Interactions with Overloads
+---------------------------
+
+Literal strings and overloads do not need to interact in a special
+way: the existing rules work fine. ``Literal[str]`` can be used as a
+fallback overload where a specific ``Literal["foo"]`` type does not
+match:
+
+::
+
+    @overload
+    def foo(x: Literal["foo"]) -> int: ...
+    @overload
+    def foo(x: Literal[str]) -> bool: ...
+    @overload
+    def foo(x: str) -> str: ...
+
+    x1: int = foo("foo")  # First overload.
+    x2: bool = foo("bar")  # Second overload.
+    s: str
+    x3: str = foo(s)  # Third overload.
+
+Backwards Compatibility
+-----------------------
+
+As PEP 586 `mentions
+<https://www.python.org/dev/peps/pep-0586/#backwards-compatibility>`_,
+type checkers "should feel free to experiment with more sophisticated
+inference techniques". So, if the type checker infers a literal string
+type for an unannotated variable that is initialized with a literal
+string, the following example should be OK:
+
+::
+
+    x = "hello"
+    expect_literal_str(x)
+    # OK, because x is inferred to have type ``Literal["hello"]``.
+
+This enables precise type checking of idiomatic SQL query code without
+annotating the code at all (as seen in the `Motivation`_ section
+example).
+
+However, like PEP 586, this PEP does not mandate the above inference
+strategy. In case the type checker doesn't infer ``x`` to have type
+``Literal["hello"]``, users can aid the type checker by explicitly
+annotating it as ``x: Literal[str]``:
+
+::
+
+    x: Literal[str] = "hello"
+    expect_literal_str(x)
+
+Runtime behavior
+================
+
+This PEP does not change the runtime behavior of ``Literal``.
+
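As a quick check of the statement above, ``Literal[str]`` can already be constructed and introspected with the existing ``typing`` module, and annotations using it are not enforced at runtime. The alias ``LiteralStr`` and the function ``expect_literal_str`` below are names chosen for this sketch; the check assumes Python 3.8 or later, where ``get_origin`` and ``get_args`` are available.

```python
from typing import Literal, get_args, get_origin

# Hypothetical alias, used only in this sketch.
LiteralStr = Literal[str]


def expect_literal_str(s: LiteralStr) -> str:
    # The annotation is only metadata; nothing is checked when this runs.
    return s


origin = get_origin(LiteralStr)
args = get_args(LiteralStr)

# A dynamically built string is still accepted at runtime; only a static
# type checker that implements this PEP would flag the call below.
dynamic = "".join(["SELECT ", "1"])
result = expect_literal_str(dynamic)
```

This is why the guarantees described in this PEP depend on running a type checker: at runtime the value is an ordinary ``str``.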
+Backwards compatibility
+=======================
+
+``Literal[str]`` is already acceptable at runtime, so this PEP
+doesn't require any changes to the Python runtime itself. ``Literal``
+from PEP 586 is already backported (via ``typing_extensions``), so
+this PEP does not need to change it.
+
+
+Rejected Alternatives
+=====================
+
+Why not use tool X?
+-------------------
+
+Focusing solely on the example of preventing SQL injection, tooling to
+catch this kind of issue seems to come in three flavors: AST based,
+function level analysis, and taint flow analysis.
+
+**AST-based tools**: `Bandit
+<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_
+has a plugin to warn when SQL queries are not literal strings. The
+problem is that many perfectly safe SQL queries are dynamically built
+out of string literals, as shown in the `Motivation`_ section. At the
+AST level, the resultant SQL query no longer appears as a string
+literal and is thus indistinguishable from a potentially malicious
+string. Using these tools would require significantly restricting
+developers' ability to build SQL queries. ``Literal[str]`` can provide
+similar safety guarantees with fewer restrictions.
+
+**Semgrep and pyanalyze**: Semgrep supports a more sophisticated
+function level analysis, including `constant propagation
+<https://semgrep.dev/docs/writing-rules/data-flow/#constant-propagation>`_
+within a function. This allows us to prevent injection attacks while
+permitting some forms of safe dynamic SQL queries within a
+function. `pyanalyze
+<https://github.com/quora/pyanalyze/blob/afcb58cd3e967e4e3fea9e57bb18b6b1d9d42ed7/README.md#extending-pyanalyze>`_
+has a similar extension. But neither handles function calls that
+construct and return safe SQL queries. For example, in the code sample
+below, ``build_insert_query`` is a helper function to create a query
+that inserts multiple values into the corresponding columns. Semgrep
+and pyanalyze forbid this natural usage whereas ``Literal[str]``
+handles it with no burden on the programmer:
+
+::
+
+    def build_insert_query(
+        table: Literal[str],
+        insert_columns: Iterable[Literal[str]],
+    ) -> Literal[str]:
+        sql = "INSERT INTO " + table
+
+        # Materialize the iterable so that it can be both joined and counted.
+        columns = list(insert_columns)
+        column_clause = ", ".join(columns)
+        value_clause = ", ".join(["?"] * len(columns))
+
+        sql += f" ({column_clause}) VALUES ({value_clause})"
+        return sql
+
+    def insert_data(
+        conn: Connection,
+        kvs_to_insert: Dict[Literal[str], str]
+    ) -> None:
+        query = build_insert_query("data", kvs_to_insert.keys())
+        conn.execute(query, tuple(kvs_to_insert.values()))
+
+    # Example usage
+    data_to_insert = {
+        "column_1": value_1, # Note: values are not literals
+        "column_2": value_2,
+        "column_3": value_3,
+    }
+    insert_data(conn, data_to_insert)
+
+
+**Taint flow analysis**: Tools such as `Pysa
+<https://pyre-check.org/docs/pysa-basics/>`_ or `CodeQL
+<https://codeql.github.com/>`_ are capable of tracking data flowing
+from a user controlled input into a SQL query. These tools are
+powerful but involve considerable overhead in setting up the tool in
+CI, defining "taint" sinks and sources, and teaching developers how to
+use them. They also usually take longer to run than a type checker
+(minutes instead of seconds), which means feedback is not
+immediate. Finally, they move the burden of preventing vulnerabilities
+on to library users instead of allowing the libraries themselves to
+specify precisely how their APIs must be called (as is possible with
+``Literal[str]``).
+
+
+Why not use a ``NewType`` for ``str``?
+--------------------------------------
+
+Any API for which ``Literal[str]`` would be suitable could instead be
+updated to accept a different type created within the Python type
+system, such as ``NewType("SafeSQL", str)``:
+
+::
+
+    SafeSQL = NewType("SafeSQL", str)
+
+
+    def execute(self, sql: SafeSQL, parameters: Iterable[str] = ...) -> Cursor: ...
+
+    execute(SafeSQL("SELECT * FROM data WHERE user_id = ?"), (user_id,))  # OK
+
+    user_query: str
+    execute(user_query)  # Error: Expected SafeSQL, got str.
+
+
+Having to create a new type to call an API might give some developers
+pause and encourage more caution, but it doesn't guarantee that
+developers won't just turn a user controlled string into the new type,
+and pass it into the modified API anyway:
+
+::
+
+    query = f"SELECT * FROM data WHERE user_id = {user_id}"
+    execute(SafeSQL(query))  # No error!
+
+We are back to square one with the problem of preventing arbitrary
+inputs to ``SafeSQL``. This is not a theoretical concern
+either. Django uses the above approach with ``SafeString`` and
+`mark_safe
+<https://docs.djangoproject.com/en/dev/_modules/django/utils/safestring/#SafeString>`_. Issues
+such as `CVE-2020-13596
+<https://github.com/django/django/commit/2dd4d110c159d0c81dff42eaead2c378a0998735>`_
+show how this technique can `fail
+<https://nvd.nist.gov/vuln/detail/CVE-2020-13596>`_.
+
+Also note that this requires invasive changes to the source code
+(wrapping the query with ``SafeSQL``) whereas ``Literal[str]``
+requires no such changes. Users can remain oblivious to it as long as
+they pass in literal strings to sensitive APIs.
+
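The weakness of the ``NewType`` approach is easy to see at runtime, because calling a ``NewType`` simply returns its argument. The sketch below reuses the ``SafeSQL`` name from this section; the ``user_id`` value is a hypothetical attacker-controlled input.

```python
from typing import NewType

SafeSQL = NewType("SafeSQL", str)

# Hypothetical attacker-controlled value.
user_id = "user123 OR 1 = 1"

# Wrapping performs no validation: the tainted string passes through as-is.
wrapped = SafeSQL(f"SELECT * FROM data WHERE user_id = {user_id}")
```

Nothing distinguishes ``wrapped`` from the plain ``str`` it was built from, so the safety of the API rests entirely on callers choosing not to wrap dynamic strings.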
+Why not try to emulate Trusted Types?
+-------------------------------------
+
+`Trusted Types
+<https://w3c.github.io/webappsec-trusted-types/dist/spec/>`_ is a W3C
+specification for preventing DOM-based Cross Site Scripting (XSS). XSS
+occurs when dangerous browser APIs accept raw user-controlled
+strings. The specification modifies these APIs to accept only the
+"Trusted Types" returned by designated sanitizing functions. These
+sanitizing functions must take in a potentially malicious string and
+validate it or render it benign somehow, for example by verifying that
+it is a valid URL or HTML-encoding it.
+
+It can be tempting to assume porting the concept of Trusted Types to
+Python could solve the problem. The fundamental difference, however,
+is that the output of a Trusted Types sanitizer is usually intended
+*to not be executable code*. Thus it's easy to HTML encode the input,
+strip out dangerous tags, or otherwise render it inert. With a SQL
+query or shell command, the end result *still needs to be executable
+code*. There is no way to write a sanitizer that can reliably figure
+out which parts of an input string are benign and which ones are
+potentially malicious.
+
+Runtime Checkable ``Literal[str]``
+----------------------------------
+
+The ``Literal[str]`` concept could be extended beyond static type
+checking to be a runtime checkable property of ``str`` objects. This
+would provide some benefits, such as allowing frameworks to raise
+errors on dynamic strings. Such runtime errors would be a more robust
+defense mechanism than type errors, which can potentially be
+suppressed, ignored, or never even seen if the author does not use a
+type checker.
+
+This extension to the ``Literal[str]`` concept would dramatically
+increase the scope of the proposal by requiring changes to one of the
+most fundamental types in Python. While runtime taint checking on
+strings has been `considered <https://bugs.python.org/issue500698>`_
+and `attempted <https://github.com/felixgr/pytaint>`_ in the past, and
+others may consider it in the future, such extensions are out of scope
+for this PEP.
+
+
+Reference Implementation
+========================
+
+This is implemented in Pyre v0.9.8 and is actively being used.
+
+The implementation simply extends the type checker with
+``Literal[str]`` as a supertype of literal string types.
+
+To support composition via addition, join, etc., it was sufficient to
+overload the stubs for ``str`` in Pyre's copy of typeshed. For
+example, we replaced ``str`` ``__add__``:
+
+::
+
+    # Before:
+    def __add__(self, s: str) -> str: ...
+
+    # After:
+    @overload
+    def __add__(self: Literal[str], other: Literal[str]) -> Literal[str]: ...
+    @overload
+    def __add__(self, other: str) -> str: ...
+
+This means that the addition of non-literal string types still has
+type ``str``. The only change is that addition of literal string types
+now produces ``Literal[str]``.
+
+One implementation strategy is to update the official Typeshed `stub
+<https://github.com/python/typeshed/blob/aa7e277adb9049e24ea3434fc9848defbfa87673/stdlib/builtins.pyi#L420>`_
+for ``str`` with these changes.
+
+Appendix A: Other Uses
+======================
+
+To simplify the discussion and require minimal security knowledge, we
+focused on SQL injections throughout the PEP. ``Literal[str]``,
+however, can also be used to prevent many other kinds of `injection
+vulnerabilities <https://owasp.org/www-community/Injection_Flaws>`_.
+
+Command Injection
+-----------------
+
+APIs such as ``subprocess.run`` accept a string which can be run as a
+shell command:
+
+::
+
+    subprocess.run(f"echo 'Hello {name}'", shell=True)
+
+If attacker controlled data is included in the command string, a
+command injection vulnerability exists and malicious operations can be
+run. For example, a value of ``' && rm -rf / #`` would result in the
+following destructive command being run:
+
+::
+
+    echo 'Hello ' && rm -rf / #'
+
+This vulnerability could be prevented by updating ``run`` to only
+accept ``Literal[str]`` when used in ``shell=True`` mode. Here is one
+simplified stub:
+
+::
+
+    def run(command: Literal[str], *args: str, shell: bool=...): ...
+
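Independently of the type-level check proposed here, the same command-data separation can be obtained at runtime by avoiding the shell and passing arguments as a list. The sketch below is an assumption-laden illustration: it uses the current Python interpreter (``sys.executable``) as a stand-in for the program being run, and ``name`` is a hypothetical attacker-controlled value.

```python
import subprocess
import sys

# Hypothetical attacker-controlled value containing shell metacharacters.
name = "' && echo INJECTED #"

# With a list of arguments and no shell, ``name`` reaches the child process
# as a single argument and is never interpreted as shell syntax.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print('Hello ' + sys.argv[1])", name],
    capture_output=True,
    text=True,
    check=True,
)
greeting = completed.stdout.strip()
```

The payload is printed back verbatim instead of being executed, because no shell ever parses it.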
+Cross Site Scripting (XSS)
+--------------------------
+
+Most popular Python web frameworks, such as Django, use a templating
+engine to produce HTML from user data. These templating languages
+auto-escape user data before inserting it into the HTML template and
+thus prevent cross site scripting (XSS) vulnerabilities.
+
+But a common way to `bypass auto-escaping
+<https://django.readthedocs.io/en/stable/ref/templates/language.html#how-to-turn-it-off>`_
+and render HTML as-is is to use functions like ``mark_safe`` in
+`Django
+<https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe>`_
+or ``do_mark_safe`` in `Jinja2
+<https://github.com/pallets/jinja/blob/main/src/jinja2/filters.py#L1264>`_,
+which can cause XSS vulnerabilities when applied to user-controlled
+data:
+
+::
+
+    dangerous_string = django.utils.safestring.mark_safe(f"<script>{user_input}</script>")
+    return dangerous_string
+
+This vulnerability could be prevented by updating ``mark_safe`` to
+only accept ``Literal[str]``:
+
+::
+
+    def mark_safe(s: Literal[str]) -> str: ...
+
+Server Side Template Injection (SSTI)
+-------------------------------------
+
+Templating frameworks such as Jinja allow Python expressions which
+will be evaluated and substituted into the rendered result:
+
+::
+
+    template_str = "There are {{ values|length }} values: {{ values }}"
+    template = jinja2.Template(template_str)
+    template.render(values=[1, 2])
+    # Result: "There are 2 values: [1, 2]"
+
+If an attacker controls all or part of the template string, they can
+insert expressions which execute arbitrary code and `compromise
+<https://www.onsecurity.io/blog/server-side-template-injection-with-jinja2/>`_
+the application:
+
+::
+
+    malicious_str = "{{''.__class__.__base__.__subclasses__()[408]('rm - rf /',shell=True)}}"
+    template = jinja2.Template(malicious_str)
+    template.render()
+    # Result: The shell command 'rm - rf /' is run
+
+Template injection exploits like this could be prevented by updating
+the ``Template`` API to only accept ``Literal[str]``:
+
+::
+
+    class Template:
+        def __init__(self, source: Literal[str]): ...
+
+
+Appendix B: Limitations
+=======================
+
+There are a number of ways ``Literal[str]`` could still fail to
+prevent users from passing strings built from non-literal data to an
+API:
+
+1. If the developer does not use a type checker or does not add type
+   annotations, then violations will go uncaught.
+
+2. ``cast(Literal[str], non_literal_str)`` could be used to lie to the
+   type checker and allow a dynamic string value to masquerade as a
+   ``Literal[str]``. The same goes for a variable that has type ``Any``.
+
+3. Comments such as ``# type: ignore`` could be used to ignore
+   warnings about non-literal strings.
+
+4. Trivial functions could be constructed to convert a ``str`` to a
+   ``Literal[str]``:
+
+::
+
+    def make_literal(s: str) -> Literal[str]:
+        letters: Dict[str, Literal[str]] = {
+            "A": "A",
+            "B": "B",
+            ...
+        }
+        output: List[Literal[str]] = [letters[c] for c in s]
+        return "".join(output)
+
+
+We could mitigate the above using linting, code review, etc., but
+ultimately a clever, malicious developer attempting to circumvent the
+protections offered by ``Literal[str]`` will always succeed. The
+important thing to remember is that ``Literal[str]`` is not intended
+to protect against *malicious* developers; it is meant to protect
+against benign developers accidentally using sensitive APIs in a
+dangerous way (without getting in their way otherwise).
+
+Without ``Literal[str]``, the best enforcement tool API authors have
+is documentation, which is easily ignored and often not seen. With
+``Literal[str]``, API misuse requires conscious thought and artifacts
+in the code that reviewers and future developers can notice.
+
+Resources
+=========
+
+Literal String Types in Scala
+-----------------------------
+
+Scala `uses
+<https://www.scala-lang.org/api/2.13.x/scala/Singleton.html>`_
+``Singleton`` as the supertype for singleton types, which includes
+literal string types such as ``"foo"``. ``Singleton`` is Scala's
+generalized analogue of this PEP's ``Literal[str]``.
+
+Tamer Abdulradi showed how Scala's literal string types can be used
+for "Preventing SQL injection at compile time" in the Scala Days talk
+`Literal types: What are they good for?
+<https://slideslive.com/38907881/literal-types-what-they-are-good-for>`_
+(slides 52 to 68).
+
+Thanks
+------
+
+Thanks to the following people for their feedback on the PEP:
+
+Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, and Shengye Wan
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: