956 lines
32 KiB
ReStructuredText
956 lines
32 KiB
ReStructuredText
PEP: 675
|
|
Title: Arbitrary Literal Strings
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Pradeep Kumar Srinivasan <gohanpra@gmail.com>, Graham Bleaney <gbleaney@gmail.com>
|
|
Sponsor: Jelle Zijlstra <jelle.zijlstra@gmail.com>
|
|
Discussions-To: Typing-Sig <typing-sig@python.org>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 30-Nov-2021
|
|
Python-Version: 3.11
|
|
Post-History:
|
|
|
|
Abstract
|
|
========
|
|
|
|
There is currently no way to specify that a function parameter can be
|
|
of any literal string type; we have to specify the precise literal
|
|
string, such as ``Literal["foo"]``. This PEP introduces a supertype of
|
|
literal string types: ``Literal[str]``. This allows a function to
|
|
accept arbitrary literal string types such as ``Literal["foo"]`` or
|
|
``Literal["bar"]``.
|
|
|
|
Motivation
|
|
==========
|
|
|
|
Powerful APIs that execute SQL or shell commands often recommend that
|
|
they be invoked with literal strings, rather than arbitrary user
|
|
controlled strings. There is no way to express this recommendation in
|
|
the type system, however, meaning security vulnerabilities sometimes
|
|
occur when developers fail to follow it. For example, a naive way to
|
|
look up a user record from a database is to accept a user id and
|
|
insert it into a predefined SQL query:
|
|
|
|
::
|
|
|
|
def query_user(conn: Connection, user_id: str) -> User:
|
|
query = f"SELECT * FROM data WHERE user_id = {user_id}"
|
|
conn.execute(query)
|
|
|
|
query_user(conn, "user123") # OK.
|
|
|
|
However, the user-controlled data ``user_id`` is being mixed with the
|
|
SQL command string, which means a malicious user could run arbitrary
|
|
SQL commands:
|
|
|
|
::
|
|
|
|
# Delete the table.
|
|
query_user(conn, "user123; DROP TABLE data;")
|
|
|
|
# Fetch all users (since 1 = 1 is always true).
|
|
query_user(conn, "user123 OR 1 = 1")
|
|
|
|
|
|
To prevent such SQL injection attacks, SQL APIs offer parameterized
|
|
queries, which separate the executed query from user-controlled data
|
|
and make it impossible to run arbitrary queries. For example, with
|
|
`sqlite3 <https://docs.python.org/3/library/sqlite3.html>`_, our
|
|
original function would be written safely as a query with parameters:
|
|
|
|
::
|
|
|
|
def query_user(conn: Connection, user_id: str) -> User:
|
|
query = "SELECT * FROM data WHERE user_id = ?"
|
|
conn.execute(query, (user_id,))
|
|
|
|
|
|
The problem is that there is no way to enforce this
|
|
discipline. sqlite3's own `documentation
|
|
<https://docs.python.org/3/library/sqlite3.html>`_ can only admonish
|
|
the reader to not dynamically build the ``sql`` argument from external
|
|
input; the API's authors cannot express that through the type
|
|
system. Users can (and often do) still use a convenient f-string as
|
|
before and leave their code vulnerable to SQL injection.
|
|
|
|
Existing tools, such as the popular security linter `Bandit
|
|
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_,
|
|
attempt to detect unsafe external data used in SQL APIs, by inspecting
|
|
the AST or by other semantic pattern-matching. These tools, however,
|
|
preclude common idioms like storing a large multi-line query in a
|
|
variable before executing it, adding literal string modifiers to the
|
|
query based on some conditions, or transforming the query string using
|
|
a function. (We survey existing tools in the "Rejected Alternatives"
|
|
section.) For example, many tools will detect a false positive issue
|
|
in this benign snippet:
|
|
|
|
|
|
::
|
|
|
|
def query_data(conn: Connection, user_id: str, limit: bool) -> None:
|
|
query = """
|
|
SELECT
|
|
user.name,
|
|
user.age
|
|
FROM data
|
|
WHERE user_id = ?
|
|
"""
|
|
if limit:
|
|
query += " LIMIT 1"
|
|
|
|
conn.execute(query, (user_id,))
|
|
|
|
We want to forbid harmful execution of user-controlled data while
|
|
still allowing benign idioms like the above and not requiring extra
|
|
user work.
|
|
|
|
To meet this goal, we introduce the ``Literal[str]`` type, which only
|
|
accepts string values that are known to be made of literals. This is a
|
|
generalization of the ``Literal["foo"]`` type from :pep:`586`.
|
|
A string of type
|
|
``Literal[str]`` cannot contain user-controlled data. Thus, any API
|
|
that only accepts ``Literal[str]`` will be immune to injection
|
|
vulnerabilities (with pragmatic `limitations <Appendix B:
|
|
Limitations_>`_).
|
|
|
|
Since we want the ``sqlite3`` ``execute`` method to disallow strings
|
|
built with user input, we would make its `typeshed stub
|
|
<https://github.com/python/typeshed/blob/1c88ceeee924ec6cfe05dd4865776b49fec299e6/stdlib/sqlite3/dbapi2.pyi#L153>`_
|
|
accept a ``sql`` query that is of type ``Literal[str]``:
|
|
|
|
::
|
|
|
|
def execute(self, sql: Literal[str], parameters: Iterable[str] = ...) -> Cursor: ...
|
|
|
|
|
|
This successfully forbids our unsafe SQL example. The variable
|
|
``query`` below is inferred to have type ``str``, since it is created
|
|
from a format string using ``user_id``, and cannot be passed to
|
|
``execute``:
|
|
|
|
::
|
|
|
|
def query_user(conn: Connection, user_id: str) -> User:
|
|
query = f"SELECT * FROM data WHERE user_id = {user_id}"
|
|
conn.execute(query)
|
|
# Error: Expected Literal[str], got str.
|
|
|
|
The method remains flexible enough to allow our more complicated
|
|
example:
|
|
|
|
::
|
|
|
|
def query_data(conn: Connection, user_id: str, limit: bool) -> None:
|
|
# This is a literal string.
|
|
query = """
|
|
SELECT
|
|
user.name,
|
|
user.age
|
|
FROM data
|
|
WHERE user_id = ?
|
|
"""
|
|
|
|
if limit:
|
|
# Still has type Literal[str] because we added a literal string.
|
|
query += " LIMIT 1"
|
|
|
|
conn.execute(query, (user_id,)) # OK
|
|
|
|
Notice that the user did not have to change their SQL code at all. The
|
|
type checker was able to infer the literal string type and complain
|
|
only in case of violations.
|
|
|
|
``Literal[str]`` is also useful in other cases where we want strict
|
|
command-data separation, such as when building shell commands or when
|
|
rendering a string into an HTML response without escaping (see
|
|
`Appendix A: Other Uses`_). Overall, this combination of strictness
|
|
and flexibility makes it easy to enforce safer API usage in sensitive
|
|
code without burdening users.
|
|
|
|
Usage statistics
|
|
----------------
|
|
|
|
In a sample of open-source projects using ``sqlite3``, we found that
|
|
``conn.execute`` was called `~67% of the time
|
|
<https://grep.app/search?q=conn%5C.execute%5C%28%5Cs%2A%5B%27%22%5D®exp=true&filter[lang][0]=Python>`_
|
|
with a safe string literal and `~33% of the time
|
|
<https://grep.app/search?current=3&q=conn%5C.execute%5C%28%5Ba-zA-Z_%5D%2B%5C%29®exp=true&filter[lang][0]=Python>`_
|
|
with a potentially unsafe, local string variable. Using this PEP's
|
|
literal string type along with a type checker would prevent the unsafe
|
|
portion of that 33% of cases (ie. the ones where user controlled data
|
|
is incorporated into the query), while seamlessly allowing the safe
|
|
ones to remain.
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Firstly, why use *types* to prevent security vulnerabilities?
|
|
|
|
Warning users in documentation is insufficient - most users either
|
|
never see these warnings or ignore them. Using an existing dynamic or
|
|
static analysis approach is too restrictive - these prevent natural
|
|
idioms, as we saw in the `Motivation`_ section (and will discuss more
|
|
extensively in the `Rejected Alternatives`_ section). The typing-based
|
|
approach in this PEP strikes a user-friendly balance between
|
|
strictness and flexibility.
|
|
|
|
Runtime approaches do not work because, at runtime, the query string
|
|
is a plain ``str``. While we could prevent some exploits using
|
|
heuristics, such as regex-filtering for obviously malicious payloads,
|
|
there will always be a way to work around them (perfectly
|
|
distinguishing good and bad queries reduces to the halting problem).
|
|
|
|
Static approaches like checking the AST to see if the query string is
|
|
a literal string expression cannot tell when a string is assigned to
|
|
an intermediate variable or when it is transformed by a benign
|
|
function. This makes them overly restrictive.
|
|
|
|
The type checker, surprisingly, does better than both because it has
|
|
access to information not available in the runtime or static analysis
|
|
approaches. Specifically, the type checker can tell us whether an
|
|
expression has a literal string type, say ``Literal["foo"]``. The type
|
|
checker already propagates types across variable assignments or
|
|
function calls.
|
|
|
|
In the current type system itself, if the SQL or shell command
|
|
execution function only accepted three possible input strings, our job
|
|
would be done. We would just say:
|
|
|
|
::
|
|
|
|
def execute(query: Literal["foo", "bar", "baz"]) -> None: ...
|
|
|
|
But, of course, ``execute`` can accept *any* possible query. How do we
|
|
ensure that the query does not contain an arbitrary, user-controlled
|
|
string?
|
|
|
|
We want to specify that the value must be of some type
|
|
``Literal[<...>]`` where ``<...>`` is some string. This is what
|
|
``Literal[str]`` represents. ``Literal[str]`` is the "supertype" of
|
|
all literal string types. In effect, this PEP just introduces a type
|
|
in the type hierarchy between ``Literal["foo"]`` and ``str``. Any
|
|
particular literal string such as ``Literal["foo"]`` or
|
|
``Literal["bar"]`` is compatible with ``Literal[str]``, but not the
|
|
other way around. The "supertype" of ``Literal[str]`` itself is
|
|
``str``. So, ``Literal[str]`` is compatible with ``str``, but not the
|
|
other way around.
|
|
|
|
Note that a ``Union`` of literal types is naturally compatible with
|
|
``Literal[str]`` because each element of the ``Union`` is individually
|
|
compatible with ``Literal[str]``. So, ``Literal["foo", "bar"]`` is
|
|
compatible with ``Literal[str]``.
|
|
|
|
However, recall that we don't just want to represent exact literal
|
|
queries. We also want to support composition of two literal strings,
|
|
such as ``query + " LIMIT 1"``. This too is possible with the above
|
|
concept. If ``x`` and ``y`` are two values of type ``Literal[str]``,
|
|
then ``x + y`` will also be of type compatible with
|
|
``Literal[str]``. We can reason about this by looking at specific
|
|
instances such as ``Literal["foo"]`` and ``Literal["bar"]``; the value
|
|
of the added string ``x + y`` can only be ``"foobar"``, which has type
|
|
``Literal["foobar"]`` and is thus compatible with
|
|
``Literal[str]``. The same reasoning applies when ``x`` and ``y`` are
|
|
unions of literal types; the result of pairwise adding any two literal
|
|
types from ``x`` and ``y`` respectively is a literal type, which means
|
|
that the overall result is a ``Union`` of literal types and is thus
|
|
compatible with ``Literal[str]``.
|
|
|
|
In this way, we are able to leverage Python's concept of a ``Literal``
|
|
string type to specify that our API can only accept strings that are
|
|
known to be constructed from literals. More specific details follow in
|
|
the remaining sections.
|
|
|
|
Valid Locations for ``Literal[str]``
|
|
=========================================
|
|
|
|
``Literal[str]`` can be used where any other type can be used:
|
|
|
|
::
|
|
|
|
variable_annotation: Literal[str]
|
|
|
|
def my_function(literal_string: Literal[str]) -> Literal[str]: ...
|
|
|
|
class Foo:
|
|
my_attribute: Literal[str]
|
|
|
|
type_argument: List[Literal[str]]
|
|
|
|
T = TypeVar("T", bound=Literal[str])
|
|
|
|
It can be nested within unions of ``Literal`` types:
|
|
|
|
::
|
|
|
|
union: Literal["hello", Literal[str]]
|
|
union2: Literal["hello", str]
|
|
union3: Literal[str, 4]
|
|
|
|
nested_literal_string: Literal[Literal[str]]
|
|
|
|
|
|
The restrictions on the parameters of ``Literal`` are the same as in
|
|
:pep:`586`. The only legal
|
|
parameter is the literal value ``str``. Other values are rejected even
|
|
if they evaluate to the same value (``str``), such as
|
|
``Literal[(lambda x: x)(str)]``.
|
|
|
|
Type Inference
|
|
==============
|
|
|
|
|
|
Inferring ``Literal[str]``
|
|
--------------------------
|
|
|
|
Any literal string type is compatible with ``Literal[str]``. For
|
|
example, ``x: Literal[str] = "foo"`` is valid because ``"foo"`` is
|
|
inferred to be of type ``Literal["foo"]``.
|
|
|
|
As per the `Rationale`_, we also infer ``Literal[str]`` in the
|
|
following cases:
|
|
|
|
+ Addition: ``x + y`` is of type ``Literal[str]`` if both ``x`` and
|
|
``y`` are compatible with ``Literal[str]``.
|
|
|
|
+ Joining: ``sep.join(xs)`` is of type ``Literal[str]`` if ``sep``'s
|
|
type is compatible with ``Literal[str]`` and ``xs``'s type is
|
|
compatible with ``Iterable[Literal[str]]``.
|
|
|
|
+ In-place addition: If ``s`` has type ``Literal[str]`` and ``x`` has
|
|
type compatible with ``Literal[str]``, then ``s += x`` preserves
|
|
``s``'s type as ``Literal[str]``.
|
|
|
|
+ String formatting: An f-string has type ``Literal[str]`` if and only
|
|
if its constituent expressions are literal strings. ``s.format(...)``
|
|
has type ``Literal[str]`` if and only if ``s`` and the arguments have
|
|
types compatible with ``Literal[str]``.
|
|
|
|
In all other cases, if one or more of the composed values has a
|
|
non-literal type ``str``, the composition of types will have type
|
|
``str``. For example, if ``s`` has type ``str``, then ``"hello" + s``
|
|
has type ``str``. This matches the pre-existing behavior of type
|
|
checkers.
|
|
|
|
``Literal[str]`` is compatible with the type ``str``. It inherits all
|
|
methods from ``str``. So, if we have a variable ``s`` of type
|
|
``Literal[str]``, it is safe to write ``s.startswith("hello")``.
|
|
|
|
Note that, beyond the few composition rules mentioned above, this PEP
|
|
doesn't change inference for other ``str`` methods such as
|
|
``literal_string.upper()``.
|
|
|
|
Some type checkers refine the type of a string when doing an equality
|
|
check:
|
|
|
|
::
|
|
|
|
def foo(s: str) -> None:
|
|
if s == "bar":
|
|
reveal_type(s) # => Literal["bar"]
|
|
|
|
Such a refined type in the if-block is also compatible with
|
|
``Literal[str]`` because its type is ``Literal["bar"]``.
|
|
|
|
|
|
Examples
|
|
--------
|
|
|
|
See the examples below to help clarify the above rules:
|
|
|
|
::
|
|
|
|
|
|
literal_string: Literal[str]
|
|
s: str = literal_string # OK
|
|
|
|
literal_string: Literal[str] = s # Error: Expected Literal[str], got str.
|
|
literal_string: Literal[str] = "hello" # OK
|
|
|
|
|
|
def expect_literal_str(s: Literal[str]) -> None: ...
|
|
|
|
Addition of literal strings:
|
|
|
|
::
|
|
|
|
expect_literal_str("foo" + "bar") # OK
|
|
expect_literal_str(literal_string + "bar") # OK
|
|
literal_string2: Literal[str]
|
|
expect_literal_str(literal_string + literal_string2) # OK
|
|
plain_str: str
|
|
expect_literal_str(literal_string + plain_str) # Not OK.
|
|
|
|
Join using literal strings:
|
|
|
|
::
|
|
|
|
expect_literal_str(",".join(["foo", "bar"])) # OK
|
|
expect_literal_str(literal_string.join(["foo", "bar"])) # OK
|
|
expect_literal_str(literal_string.join([literal_string, literal_string2])) # OK
|
|
xs: List[Literal[str]]
|
|
expect_literal_str(literal_string.join(xs)) # OK
|
|
expect_literal_str(plain_str.join([literal_string, literal_string2]))
|
|
# Not OK because the separator has type ``str``.
|
|
|
|
In-place addition using literal strings:
|
|
|
|
::
|
|
|
|
literal_string += "foo" # OK
|
|
literal_string += literal_string2 # OK
|
|
literal_string += plain_str # Not OK
|
|
|
|
Format strings using literal strings:
|
|
|
|
::
|
|
|
|
literal_name: Literal[str]
|
|
expect_literal_str(f"hello {literal_name}")
|
|
# OK because it is composed from literal strings.
|
|
|
|
expect_literal_str("hello {}".format(literal_name)) # OK
|
|
|
|
expect_literal_str(f"hello") # OK
|
|
|
|
expect_literal_str(f"hello {username}")
|
|
# NOT OK. The format-string is constructed from ``username``,
|
|
# which has type ``str``.
|
|
|
|
expect_literal_str("hello {}".format(username)) # Not OK
|
|
|
|
Other literal types, such as literal integers, are not compatible with ``Literal[str]``:
|
|
|
|
::
|
|
|
|
some_int: int
|
|
expect_literal_str(some_int) # Error: Expected Literal[str], got int.
|
|
|
|
literal_one: Literal[1] = 1
|
|
expect_literal_str(literal_one) # Error: Expected Literal[str], got Literal[1].
|
|
|
|
|
|
We can call functions on literal strings:
|
|
|
|
::
|
|
|
|
def add_limit(query: Literal[str]) -> Literal[str]:
|
|
return query + " LIMIT = 1"
|
|
|
|
def my_query(query: Literal[str], user_id: str) -> None:
|
|
sql_connection().execute(add_limit(query), (user_id,)) # OK
|
|
|
|
Conditional statements and expressions work as expected:
|
|
|
|
::
|
|
|
|
def return_literal_str() -> Literal[str]:
|
|
return "foo" if condition1() else "bar" # OK
|
|
|
|
def return_literal_str2(literal_str: Literal[str]) -> Literal[str]:
|
|
return "foo" if condition1() else literal_str # OK
|
|
|
|
def return_literal_str3() -> Literal[str]:
|
|
if condition1():
|
|
result: Literal["foo"] = "foo"
|
|
else:
|
|
result: Literal[str] = "bar"
|
|
|
|
return result # OK
|
|
|
|
|
|
Interaction with TypeVars and Generics
|
|
--------------------------------------
|
|
|
|
TypeVars can be bound to ``Literal[str]``:
|
|
|
|
::
|
|
|
|
from typing import Literal, TypeVar
|
|
|
|
TLiteral = TypeVar("TLiteral", bound=Literal[str])
|
|
|
|
def literal_identity(s: TLiteral) -> TLiteral:
|
|
return s
|
|
|
|
hello: Literal["hello"] = "hello"
|
|
y = literal_identity(hello)
|
|
reveal_type(y) # => Literal["hello"]
|
|
|
|
s: Literal[str]
|
|
y2 = literal_identity(s)
|
|
reveal_type(y2) # => Literal[str]
|
|
|
|
s_error: str
|
|
literal_identity(s_error)
|
|
# Error: Expected TLiteral (bound to Literal[str]), got str.
|
|
|
|
|
|
``Literal[str]`` can be used as type arguments for generic classes:
|
|
|
|
::
|
|
|
|
class Container(Generic[T]):
|
|
def __init__(self, value: T) -> None:
|
|
self.value = value
|
|
|
|
literal_str: Literal[str] = "hello"
|
|
x: Container[Literal[str]] = Container(literal_str) # OK
|
|
|
|
s: str
|
|
x_error: Container[Literal[str]] = Container(s) # Not OK
|
|
|
|
Standard containers like ``List`` work as expected:
|
|
|
|
::
|
|
|
|
xs: List[Literal[str]] = ["foo", "bar", "baz"]
|
|
|
|
Interactions with Overloads
|
|
---------------------------
|
|
|
|
Literal strings and overloads do not need to interact in a special
|
|
way: the existing rules work fine. ``Literal[str]`` can be used as a
|
|
fallback overload where a specific ``Literal["foo"]`` type does not
|
|
match:
|
|
|
|
::
|
|
|
|
@overload
|
|
def foo(x: Literal["foo"]) -> int: ...
|
|
@overload
|
|
def foo(x: Literal[str]) -> bool: ...
|
|
@overload
|
|
def foo(x: str) -> str: ...
|
|
|
|
x1: int = foo("foo") # First overload.
|
|
x2: bool = foo("bar") # Second overload.
|
|
s: str
|
|
x3: str = foo(s) # Third overload.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
``Literal[str]`` is acceptable at runtime, so
|
|
this doesn't require any changes to the Python runtime itself. :pep:`586`
|
|
already backports ``Literal``, so this PEP does not need to change it.
|
|
|
|
As :pep:`PEP 586 mentions
|
|
<586#backwards-compatibility>`,
|
|
type checkers "should feel free to experiment with more sophisticated
|
|
inference techniques". So, if the type checker infers a literal string
|
|
type for an unannotated variable that is initialized with a literal
|
|
string, the following example should be OK:
|
|
|
|
::
|
|
|
|
x = "hello"
|
|
expect_literal_str(x)
|
|
# OK, because x is inferred to have type ``Literal["hello"]``.
|
|
|
|
This enables precise type checking of idiomatic SQL query code without
|
|
annotating the code at all (as seen in the `Motivation`_ section
|
|
example).
|
|
|
|
However, like :pep:`586`, this PEP does not mandate the above inference
|
|
strategy. In case the type checker doesn't infer ``x`` to have type
|
|
``Literal["hello"]``, users can aid the type checker by explicitly
|
|
annotating it as ``x: Literal[str]``:
|
|
|
|
::
|
|
|
|
x: Literal[str] = "hello"
|
|
expect_literal_str(x)
|
|
|
|
|
|
Runtime Behavior
|
|
================
|
|
|
|
This PEP does not change the runtime behavior of ``Literal``.
|
|
|
|
|
|
Rejected Alternatives
|
|
=====================
|
|
|
|
Why not use tool X?
|
|
-------------------
|
|
|
|
Focusing solely on the example of preventing SQL injection, tooling to
|
|
catch this kind of issue seems to come in three flavors: AST based,
|
|
function level analysis, and taint flow analysis.
|
|
|
|
**AST based tools include Bandit**: `Bandit
|
|
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_
|
|
has a plugin to warn when SQL queries are not literal
|
|
strings. The problem is that many perfectly safe SQL
|
|
queries are dynamically built out of string literals, as shown in the
|
|
`Motivation`_ section. At the
|
|
AST level, the resultant SQL query is not going to appear as a string
|
|
literal anymore and is thus indistinguishable from a potentially
|
|
malicious string. To use these tools would require significantly
|
|
restricting developers' ability to build SQL queries. ``Literal[str]``
|
|
can provide similar safety guarantees with fewer restrictions.
|
|
|
|
**Semgrep and pyanalyze**: Semgrep supports a more sophisticated
|
|
function level analysis, including `constant propagation
|
|
<https://semgrep.dev/docs/writing-rules/data-flow/#constant-propagation>`_
|
|
within a function. This allows us to prevent injection attacks while
|
|
permitting some forms of safe dynamic SQL queries within a
|
|
function. `pyanalyze
|
|
<https://github.com/quora/pyanalyze/blob/afcb58cd3e967e4e3fea9e57bb18b6b1d9d42ed7/README.md#extending-pyanalyze>`_
|
|
has a similar extension. But neither handles function calls that
|
|
construct and return safe SQL queries. For example, in the code sample
|
|
below, ``build_insert_query`` is a helper function to create a query
|
|
that inserts multiple values into the corresponding columns. Semgrep
|
|
and pyanalyze forbid this natural usage whereas ``Literal[str]``
|
|
handles it with no burden on the programmer:
|
|
|
|
::
|
|
|
|
def build_insert_query(
|
|
table: Literal[str]
|
|
insert_columns: Iterable[Literal[str]],
|
|
) -> Literal[str]:
|
|
sql = "INSERT INTO " + table
|
|
|
|
column_clause = ", ".join(insert_columns)
|
|
value_clause = ", ".join(["?"] * len(insert_columns))
|
|
|
|
sql += f" ({column_clause}) VALUES ({value_clause})"
|
|
return sql
|
|
|
|
def insert_data(
|
|
conn: Connection,
|
|
kvs_to_insert: Dict[Literal[str], str]
|
|
) -> None:
|
|
query = build_insert_query("data", kvs_to_insert.keys())
|
|
conn.execute(query, kvs_to_insert.values())
|
|
|
|
# Example usage
|
|
data_to_insert = {
|
|
"column_1": value_1, # Note: values are not literals
|
|
"column_2": value_2,
|
|
"column_3": value_3,
|
|
}
|
|
insert_data(conn, data_to_insert)
|
|
|
|
|
|
**Taint flow analysis**: Tools such as `Pysa
|
|
<https://pyre-check.org/docs/pysa-basics/>`_ or `CodeQL
|
|
<https://codeql.github.com/>`_ are capable of tracking data flowing
|
|
from a user controlled input into a SQL query. These tools are
|
|
powerful but involve considerable overhead in setting up the tool in
|
|
CI, defining "taint" sinks and sources, and teaching developers how to
|
|
use them. They also usually take longer to run than a type checker
|
|
(minutes instead of seconds), which means feedback is not
|
|
immediate. Finally, they move the burden of preventing vulnerabilities
|
|
on to library users instead of allowing the libraries themselves to
|
|
specify precisely how their APIs must be called (as is possible with
|
|
``Literal[str]``).
|
|
|
|
|
|
Why not use a ``NewType`` for ``str``?
|
|
--------------------------------------
|
|
|
|
Any API for which ``Literal[str]`` would be suitable could instead be
|
|
updated to accept a different type created within the Python type
|
|
system, such as ``NewType("SafeSQL", str)``:
|
|
|
|
::
|
|
|
|
SafeSQL = NewType("SafeSQL", str)
|
|
|
|
|
|
def execute(self, sql: SafeSQL, parameters: Iterable[str] = ...) -> Cursor: ...
|
|
|
|
execute(SafeSQL("SELECT * FROM data WHERE user_id = ?"), user_id) # OK
|
|
|
|
user_query: str
|
|
execute(user_query) # Error: Expected SafeSQL, got str.
|
|
|
|
|
|
Having to create a new type to call an API might give some developers
|
|
pause and encourage more caution, but it doesn't guarantee that
|
|
developers won't just turn a user controlled string into the new type,
|
|
and pass it into the modified API anyway:
|
|
|
|
::
|
|
|
|
query = f"SELECT * FROM data WHERE user_id = f{user_id}"
|
|
execute(SafeSQL(query)) # No error!
|
|
|
|
We are back to square one with the problem of preventing arbitrary
|
|
inputs to ``SafeSQL``. This is not a theoretical concern
|
|
either. Django uses the above approach with ``SafeString`` and
|
|
`mark_safe
|
|
<https://docs.djangoproject.com/en/dev/_modules/django/utils/safestring/#SafeString>`_. Issues
|
|
such as `CVE-2020-13596
|
|
<https://github.com/django/django/commit/2dd4d110c159d0c81dff42eaead2c378a0998735>`_
|
|
show how this technique can `fail
|
|
<https://nvd.nist.gov/vuln/detail/CVE-2020-13596>`_.
|
|
|
|
Also note that this requires invasive changes to the source code
|
|
(wrapping the query with ``SafeSQL``) whereas ``Literal[str]``
|
|
requires no such changes. Users can remain oblivious to it as long as
|
|
they pass in literal strings to sensitive APIs.
|
|
|
|
Why not try to emulate Trusted Types?
|
|
-------------------------------------
|
|
|
|
`Trusted Types
|
|
<https://w3c.github.io/webappsec-trusted-types/dist/spec/>`_ is a W3C
|
|
specification for preventing DOM-based Cross Site Scripting (XSS). XSS
|
|
occurs when dangerous browser APIs accept raw user-controlled
|
|
strings. The specification modifies these APIs to accept only the
|
|
"Trusted Types" returned by designated sanitizing functions. These
|
|
sanitizing functions must take in a potentially malicious string and
|
|
validate it or render it benign somehow, for example by verifying that
|
|
it is a valid URL or HTML-encoding it.
|
|
|
|
It can be tempting to assume porting the concept of Trusted Types to
|
|
Python could solve the problem. The fundamental difference, however,
|
|
is that the output of a Trusted Types sanitizer is usually intended
|
|
*to not be executable code*. Thus it's easy to HTML encode the input,
|
|
strip out dangerous tags, or otherwise render it inert. With a SQL
|
|
query or shell command, the end result *still needs to be executable
|
|
code*. There is no way to write a sanitizer that can reliably figure
|
|
out which parts of an input string are benign and which ones are
|
|
potentially malicious.
|
|
|
|
Runtime Checkable ``Literal[str]``
|
|
----------------------------------
|
|
|
|
The ``Literal[str]`` concept could be extended beyond static type
|
|
checking to be a runtime checkable property of ``str`` objects. This
|
|
would provide some benefits, such as allowing frameworks to raise
|
|
errors on dynamic strings. Such runtime errors would be a more robust
|
|
defense mechanism than type errors, which can potentially be
|
|
suppressed, ignored, or never even seen if the author does not use a
|
|
type checker.
|
|
|
|
This extension to the ``Literal[str]`` concept would dramatically
|
|
increase the scope of the proposal by requiring changes to one of the
|
|
most fundamental types in Python. While runtime taint checking on
|
|
strings has been `considered <https://bugs.python.org/issue500698>`_
|
|
and `attempted <https://github.com/felixgr/pytaint>`_ in the past, and
|
|
others may consider it in the future, such extensions are out of scope
|
|
for this PEP.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
This is implemented in Pyre v0.9.8 and is actively being used.
|
|
|
|
The implementation simply extends the type checker with
|
|
``Literal[str]`` as a supertype of literal string types.
|
|
|
|
To support composition via addition, join, etc., it was sufficient to
|
|
overload the stubs for ``str`` in Pyre's copy of typeshed. For
|
|
example, we replaced ``str`` ``__add__``:
|
|
|
|
::
|
|
|
|
# Before:
|
|
def __add__(self, s: str) -> str: ...
|
|
|
|
# After:
|
|
@overload
|
|
def __add__(self: Literal[str], other: Literal[str]) -> Literal[str]: ...
|
|
@overload
|
|
def __add__(self, other: str) -> str: ...
|
|
|
|
This means that addition of non-literal string types remains to have
|
|
type ``str``. The only change is that addition of literal string types
|
|
now produces ``Literal[str]``.
|
|
|
|
One implementation strategy is to update the official Typeshed `stub
|
|
<https://github.com/python/typeshed/blob/aa7e277adb9049e24ea3434fc9848defbfa87673/stdlib/builtins.pyi#L420>`_
|
|
for ``str`` with these changes.
|
|
|
|
Appendix A: Other Uses
|
|
======================
|
|
|
|
To simplify the discussion and require minimal security knowledge, we
|
|
focused on SQL injections throughout the PEP. ``Literal[str]``,
|
|
however, can also be used to prevent many other kinds of `injection
|
|
vulnerabilities <https://owasp.org/www-community/Injection_Flaws>`_.
|
|
|
|
Command Injection
|
|
-----------------
|
|
|
|
APIs such as ``subprocess.run`` accept a string which can be run as a
|
|
shell command:
|
|
|
|
::
|
|
|
|
subprocess.run(f"echo 'Hello {name}'", shell=True)
|
|
|
|
If attacker controlled data is included in the command string, a
|
|
command injection vulnerability exists and malicious operations can be
|
|
run. For example, a value of ``' && rm -rf / #`` would result in the
|
|
following destructive command being run:
|
|
|
|
::
|
|
|
|
echo 'Hello ' && rm -rf / #'
|
|
|
|
This vulnerability could be prevented by updating ``run`` to only
|
|
accept ``Literal[str]`` when used in ``shell=True`` mode. Here is one
|
|
simplified stub:
|
|
|
|
::
|
|
|
|
def run(command: Literal[str], *args: str, shell: bool=...): ...
|
|
|
|
Cross Site Scripting (XSS)
|
|
--------------------------
|
|
|
|
Most popular Python web frameworks, such as Django, use a templating
|
|
engine to produce HTML from user data. These templating languages
|
|
auto-escape user data before inserting it into the HTML template and
|
|
thus prevent cross site scripting (XSS) vulnerabilities.
|
|
|
|
But a common way to `bypass auto-escaping
|
|
<https://django.readthedocs.io/en/stable/ref/templates/language.html#how-to-turn-it-off>`_
|
|
and render HTML as-is is to use functions like ``mark_safe`` in
|
|
`Django
|
|
<https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe>`_
|
|
or ``do_mark_safe`` in `Jinja2
|
|
<https://github.com/pallets/jinja/blob/main/src/jinja2/filters.py#L1264>`_,
|
|
which cause XSS vulnerabilities:
|
|
|
|
::
|
|
|
|
dangerous_string = django.utils.safestring.mark_safe(f"<script>{user_input}</script>")
|
|
return(dangerous_string)
|
|
|
|
This vulnerability could be prevented by updating ``mark_safe`` to
|
|
only accept ``Literal[str]``:
|
|
|
|
::
|
|
|
|
def mark_safe(s: Literal[str]) -> str: ...
|
|
|
|
Server Side Template Injection (SSTI)
|
|
-------------------------------------
|
|
|
|
Templating frameworks such as Jinja allow Python expressions which
|
|
will be evaluated and substituted into the rendered result:
|
|
|
|
::
|
|
|
|
template_str = "There are {{ len(values) }} values: {{ values }}"
|
|
template = jinja2.Template(template_str)
|
|
template.render(values=[1, 2])
|
|
# Result: "There are 2 values: [1, 2]"
|
|
|
|
If an attacker controls all or part of the template string, they can
|
|
insert expressions which execute arbitrary code and `compromise
|
|
<https://www.onsecurity.io/blog/server-side-template-injection-with-jinja2/>`_
|
|
the application:
|
|
|
|
::
|
|
|
|
malicious_str = "{{''.__class__.__base__.__subclasses__()[408]('rm - rf /',shell=True)}}"
|
|
template = jinja2.Template(malicious_str)
|
|
template.render()
|
|
# Result: The shell command 'rm - rf /' is run
|
|
|
|
Template injection exploits like this could be prevented by updating
|
|
the ``Template`` API to only accept ``Literal[str]``:
|
|
|
|
::
|
|
|
|
class Template:
|
|
def __init__(self, source: Literal[str]): ...
|
|
|
|
|
|
Appendix B: Limitations
|
|
=======================
|
|
|
|
There are a number of ways ``Literal[str]`` could still fail to
|
|
prevent users from passing strings built from non-literal data to an
|
|
API:
|
|
|
|
1. If the developer does not use a type checker or does not add type
|
|
annotations, then violations will go uncaught.
|
|
|
|
2. ``cast(Literal[str], non_literal_str)`` could be used to lie to the
|
|
type checker and allow a dynamic string value to masquerade as a
|
|
``Literal[str]``. The same goes for a variable that has type ``Any``.
|
|
|
|
3. Comments such as ``# type: ignore`` could be used to ignore
|
|
warnings about non-literal strings.
|
|
|
|
4. Trivial functions could be constructed to convert a ``str`` to a
|
|
``Literal[str]``:
|
|
|
|
::
|
|
|
|
def make_literal(s: str) -> Literal[str]:
|
|
letters: Dict[str, Literal[str]] = {
|
|
"A": "A",
|
|
"B": "B",
|
|
...
|
|
}
|
|
output: List[Literal[str]] = [letters[c] for c in s]
|
|
return "".join(output)
|
|
|
|
|
|
We could mitigate the above using linting, code review, etc., but
|
|
ultimately a clever, malicious developer attempting to circumvent the
|
|
protections offered by ``Literal[str]`` will always succeed. The
|
|
important thing to remember is that ``Literal[str]`` is not intended
|
|
to protect against *malicious* developers; it is meant to protect
|
|
against benign developers accidentally using sensitive APIs in a
|
|
dangerous way (without getting in their way otherwise).
|
|
|
|
Without ``Literal[str]``, the best enforcement tool API authors have
|
|
is documentation, which is easily ignored and often not seen. With
|
|
``Literal[str]``, API misuse requires conscious thought and artifacts
|
|
in the code that reviewers and future developers can notice.
|
|
|
|
Resources
|
|
=========
|
|
|
|
Literal String Types in Scala
|
|
-----------------------------
|
|
|
|
Scala `uses
|
|
<https://www.scala-lang.org/api/2.13.x/scala/Singleton.html>`_
|
|
``Singleton`` as the supertype for singleton types, which includes
|
|
literal string types such as ``"foo"``. ``Singleton`` is Scala's
|
|
generalized analogue of this PEP's ``Literal[str]``.
|
|
|
|
Tamer Abdulradi showed how Scala's literal string types can be used
|
|
for "Preventing SQL injection at compile time", Scala Days talk
|
|
`Literal types: What are they good for?
|
|
<https://slideslive.com/38907881/literal-types-what-they-are-good-for>`_
|
|
(slides 52 to 68).
|
|
|
|
Thanks
|
|
------
|
|
|
|
Thanks to the following people for their feedback on the PEP:
|
|
|
|
Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, and Shengye Wan
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|