PEP: 675
Title: Arbitrary Literal String Type
Version: $Revision$
Last-Modified: $Date$
Author: Pradeep Kumar Srinivasan <gohanpra@gmail.com>, Graham Bleaney <gbleaney@gmail.com>
Sponsor: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Discussions-To: https://mail.python.org/archives/list/typing-sig@python.org/thread/VB74EHNM4RODDFM64NEEEBJQVAUAWIAW/
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Nov-2021
Python-Version: 3.11
Post-History: 07-Feb-2022
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/XEOOSSPNYPGZ5NXOJFPLXG2BTN7EVRT5/

Abstract
========

There is currently no way to specify, using type annotations, that a
function parameter can be of any literal string type. We have to
specify a precise literal string type, such as
``Literal["foo"]``. This PEP introduces a supertype of literal string
types: ``LiteralString``. This allows a function to accept arbitrary
literal string types, such as ``Literal["foo"]`` or
``Literal["bar"]``.

Motivation
==========

Powerful APIs that execute SQL or shell commands often recommend that
they be invoked with literal strings, rather than arbitrary user
controlled strings. There is no way to express this recommendation in
the type system, however, meaning security vulnerabilities sometimes
occur when developers fail to follow it. For example, a naive way to
look up a user record from a database is to accept a user id and
insert it into a predefined SQL query:

::

    def query_user(conn: Connection, user_id: str) -> User:
        query = f"SELECT * FROM data WHERE user_id = {user_id}"
        conn.execute(query)

    query_user(conn, "user123") # OK.

However, the user-controlled data ``user_id`` is being mixed with the
SQL command string, which means a malicious user could run arbitrary
SQL commands:

::

    # Delete the table.
    query_user(conn, "user123; DROP TABLE data;")

    # Fetch all users (since 1 = 1 is always true).
    query_user(conn, "user123 OR 1 = 1")


To prevent such SQL injection attacks, SQL APIs offer parameterized
queries, which separate the executed query from user-controlled data
and make it impossible to run arbitrary queries. For example, with
`sqlite3 <https://docs.python.org/3/library/sqlite3.html>`_, our
original function would be written safely as a query with parameters:

::

    def query_user(conn: Connection, user_id: str) -> User:
        query = "SELECT * FROM data WHERE user_id = ?"
        conn.execute(query, (user_id,))

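To see the difference concretely, here is a minimal runnable sketch using an in-memory ``sqlite3`` database. The ``data`` table, its columns, and the sample rows are hypothetical, chosen only to match the examples above. Because the query text is a literal and ``user_id`` is bound as a parameter, both injection payloads are compared as ordinary values and match no rows.

```python
import sqlite3


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'data' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (user_id TEXT, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [("user123", "Alice", 30), ("user456", "Bob", 25)],
    )
    return conn


def query_user(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, int]]:
    # The SQL text is a literal; user_id travels separately as a bound parameter.
    query = "SELECT name, age FROM data WHERE user_id = ?"
    return conn.execute(query, (user_id,)).fetchall()


conn = make_demo_connection()
print(query_user(conn, "user123"))                    # [('Alice', 30)]
print(query_user(conn, "user123 OR 1 = 1"))           # []
print(query_user(conn, "user123; DROP TABLE data;"))  # [] and the table survives
```
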
The problem is that there is no way to enforce this
discipline. sqlite3's own `documentation
<https://docs.python.org/3/library/sqlite3.html>`_ can only admonish
the reader to not dynamically build the ``sql`` argument from external
input; the API's authors cannot express that through the type
system. Users can (and often do) still use a convenient f-string as
before and leave their code vulnerable to SQL injection.

Existing tools, such as the popular security linter `Bandit
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_,
attempt to detect unsafe external data used in SQL APIs, by inspecting
the AST or by other semantic pattern-matching. These tools, however,
preclude common idioms like storing a large multi-line query in a
variable before executing it, adding literal string modifiers to the
query based on some conditions, or transforming the query string using
a function. (We survey existing tools in the `Rejected Alternatives`_
section.) For example, many tools will detect a false positive issue
in this benign snippet:

::

    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
        query = """
            SELECT
                user.name,
                user.age
            FROM data
            WHERE user_id = ?
        """
        if limit:
            query += " LIMIT 1"

        conn.execute(query, (user_id,))

We want to forbid harmful execution of user-controlled data while
still allowing benign idioms like the above and not requiring extra
user work.

To meet this goal, we introduce the ``LiteralString`` type, which only
accepts string values that are known to be made of literals. This is a
generalization of the ``Literal["foo"]`` type from :pep:`586`.
A string of type ``LiteralString`` cannot contain user-controlled
data. Thus, any API that only accepts ``LiteralString`` will be immune
to injection vulnerabilities (with `pragmatic limitations <Appendix B:
Limitations_>`_).

Since we want the ``sqlite3`` ``execute`` method to disallow strings
built with user input, we would make its `typeshed stub
<https://github.com/python/typeshed/blob/1c88ceeee924ec6cfe05dd4865776b49fec299e6/stdlib/sqlite3/dbapi2.pyi#L153>`_
accept a ``sql`` query that is of type ``LiteralString``:

::

    from typing import LiteralString

    def execute(self, sql: LiteralString, parameters: Iterable[str] = ...) -> Cursor: ...


This successfully forbids our unsafe SQL example. The variable
``query`` below is inferred to have type ``str``, since it is created
from a format string using ``user_id``, and cannot be passed to
``execute``:

::

    def query_user(conn: Connection, user_id: str) -> User:
        query = f"SELECT * FROM data WHERE user_id = {user_id}"
        conn.execute(query)
        # Error: Expected LiteralString, got str.

The method remains flexible enough to allow our more complicated
example:

::

    def query_data(conn: Connection, user_id: str, limit: bool) -> None:
        # This is a literal string.
        query = """
            SELECT
                user.name,
                user.age
            FROM data
            WHERE user_id = ?
        """

        if limit:
            # Still has type LiteralString because we added a literal string.
            query += " LIMIT 1"

        conn.execute(query, (user_id,)) # OK

Notice that the user did not have to change their SQL code at all. The
type checker was able to infer the literal string type and complain
only in case of violations.

``LiteralString`` is also useful in other cases where we want strict
command-data separation, such as when building shell commands or when
rendering a string into an HTML response without escaping (see
`Appendix A: Other Uses`_). Overall, this combination of strictness
and flexibility makes it easy to enforce safer API usage in sensitive
code without burdening users.

Usage statistics
----------------

In a sample of open-source projects using ``sqlite3``, we found that
``conn.execute`` was called `~67% of the time
<https://grep.app/search?q=conn%5C.execute%5C%28%5Cs%2A%5B%27%22%5D&regexp=true&filter[lang][0]=Python>`_
with a safe string literal and `~33% of the time
<https://grep.app/search?current=3&q=conn%5C.execute%5C%28%5Ba-zA-Z_%5D%2B%5C%29&regexp=true&filter[lang][0]=Python>`_
with a potentially unsafe, local string variable. Using this PEP's
literal string type along with a type checker would prevent the unsafe
portion of that 33% of cases (i.e., the ones where user-controlled data
is incorporated into the query), while seamlessly allowing the safe
ones to remain.

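The two searches linked above use regular expressions (URL-decoded here from the links) that roughly separate calls whose first argument is a string literal from calls whose sole argument is a bare variable name. The sketch below applies those same patterns to a few hypothetical lines of code; it only approximates how such counts can be gathered and is not a description of the authors' exact methodology.

```python
import re

# Patterns URL-decoded from the two grep.app searches linked above.
LITERAL_ARG = re.compile(r"""conn\.execute\(\s*['"]""")
VARIABLE_ARG = re.compile(r"conn\.execute\([a-zA-Z_]+\)")


def classify(line: str) -> str:
    """Label a source line by the kind of argument passed to conn.execute."""
    if LITERAL_ARG.search(line):
        return "literal"
    if VARIABLE_ARG.search(line):
        return "variable"
    return "other"


samples = [
    'conn.execute("SELECT * FROM data WHERE user_id = ?", (user_id,))',
    "conn.execute(query)",
    "conn.execute(query, (user_id,))",
]
for line in samples:
    print(f"{classify(line):8} {line}")
```

Note that the second pattern only matches a bare name followed directly by ``)``, so a call such as ``conn.execute(query, (user_id,))`` falls into neither bucket.
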
Rationale
=========

Firstly, why use *types* to prevent security vulnerabilities?

Warning users in documentation is insufficient: most users either
never see these warnings or ignore them. Using an existing dynamic or
static analysis approach is too restrictive: these prevent natural
idioms, as we saw in the `Motivation`_ section (and will discuss more
extensively in the `Rejected Alternatives`_ section). The typing-based
approach in this PEP strikes a user-friendly balance between
strictness and flexibility.

Runtime approaches do not work because, at runtime, the query string
is a plain ``str``. While we could prevent some exploits using
heuristics, such as regex-filtering for obviously malicious payloads,
there will always be a way to work around them (perfectly
distinguishing good and bad queries reduces to the halting problem).

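To make the heuristic problem concrete, consider a hypothetical blocklist filter that rejects any query chaining a ``DROP TABLE`` statement. It stops the first payload from the `Motivation`_ section but lets the second one, which fetches every row, straight through; each new rule simply invites another workaround.

```python
import re

# Hypothetical heuristic: block queries that chain a DROP TABLE statement.
SUSPICIOUS = re.compile(r";\s*drop\s+table", re.IGNORECASE)


def looks_safe(query: str) -> bool:
    """Return True if the naive blocklist finds nothing suspicious."""
    return SUSPICIOUS.search(query) is None


def build_query(user_id: str) -> str:
    # Vulnerable construction, as in the Motivation section.
    return f"SELECT * FROM data WHERE user_id = {user_id}"


print(looks_safe(build_query("user123; DROP TABLE data;")))  # False: blocked
print(looks_safe(build_query("user123 OR 1 = 1")))           # True: slips through
```
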
Static approaches, such as checking the AST to see if the query string
is a literal string expression, cannot tell when a string is assigned
to an intermediate variable or when it is transformed by a benign
function. This makes them overly restrictive.

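The following sketch shows why a purely syntactic rule is too strict. It uses Python's ``ast`` module to flag any ``execute`` call whose first argument is not a string constant. This is only a rough stand-in for what an AST-based linter might check (real tools apply their own, more detailed rules), but it flags the benign ``query_data`` snippet from the `Motivation`_ section simply because the query lives in a variable.

```python
import ast

# The benign snippet from the Motivation section (annotations dropped).
BENIGN_SOURCE = '''
def query_data(conn, user_id, limit):
    query = """
        SELECT user.name, user.age
        FROM data
        WHERE user_id = ?
    """
    if limit:
        query += " LIMIT 1"
    conn.execute(query, (user_id,))
'''


def flag_non_literal_execute(source: str) -> list[int]:
    """Return line numbers of execute(...) calls whose first argument is not a str constant."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
        ):
            first = node.args[0]
            if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
                flagged.append(node.lineno)
    return flagged


print(flag_non_literal_execute(BENIGN_SOURCE))  # [10]: a false positive
```
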
The type checker, surprisingly, does better than both because it has
access to information not available in the runtime or static analysis
approaches. Specifically, the type checker can tell us whether an
expression has a literal string type, say ``Literal["foo"]``. The type
checker already propagates types across variable assignments or
function calls.

In the current type system itself, if the SQL or shell command
execution function only accepted three possible input strings, our job
would be done. We would just say:

::

    def execute(query: Literal["foo", "bar", "baz"]) -> None: ...

But, of course, ``execute`` can accept *any* possible query. How do we
ensure that the query does not contain an arbitrary, user-controlled
string?

We want to specify that the value must be of some type
``Literal[<...>]`` where ``<...>`` is some string. This is what
``LiteralString`` represents. ``LiteralString`` is the "supertype" of
all literal string types. In effect, this PEP just introduces a type
in the type hierarchy between ``Literal["foo"]`` and ``str``. Any
particular literal string, such as ``Literal["foo"]`` or
``Literal["bar"]``, is compatible with ``LiteralString``, but not the
other way around. The "supertype" of ``LiteralString`` itself is
``str``. So, ``LiteralString`` is compatible with ``str``, but not the
other way around.

Note that a ``Union`` of literal types is naturally compatible with
``LiteralString`` because each element of the ``Union`` is individually
compatible with ``LiteralString``. So, ``Literal["foo", "bar"]`` is
compatible with ``LiteralString``.

However, recall that we don't just want to represent exact literal
queries. We also want to support composition of two literal strings,
such as ``query + " LIMIT 1"``. This too is possible with the above
concept. If ``x`` and ``y`` are two values of type ``LiteralString``,
then ``x + y`` will also be of type compatible with
``LiteralString``. We can reason about this by looking at specific
instances such as ``Literal["foo"]`` and ``Literal["bar"]``; the value
of the added string ``x + y`` can only be ``"foobar"``, which has type
``Literal["foobar"]`` and is thus compatible with
``LiteralString``. The same reasoning applies when ``x`` and ``y`` are
unions of literal types; the result of pairwise adding any two literal
types from ``x`` and ``y`` respectively is a literal type, which means
that the overall result is a ``Union`` of literal types and is thus
compatible with ``LiteralString``.

In this way, we are able to leverage Python's concept of a ``Literal``
string type to specify that our API can only accept strings that are
known to be constructed from literals. More specific details follow in
the remaining sections.

Specification
=============


Runtime Behavior
----------------

We propose adding ``LiteralString`` to ``typing.py``, with an
implementation similar to ``typing.NoReturn``.

Note that ``LiteralString`` is a special form used solely for type
checking. There is no expression for which ``type(<expr>)`` will
produce ``LiteralString`` at runtime. So, we do not specify in the
implementation that it is a subclass of ``str``.

Valid Locations for ``LiteralString``
-----------------------------------------

``LiteralString`` can be used where any other type can be used:

::

    variable_annotation: LiteralString

    def my_function(literal_string: LiteralString) -> LiteralString: ...

    class Foo:
        my_attribute: LiteralString

    type_argument: List[LiteralString]

    T = TypeVar("T", bound=LiteralString)

It cannot be nested within unions of ``Literal`` types:

::

    bad_union: Literal["hello", LiteralString] # Not OK
    bad_nesting: Literal[LiteralString] # Not OK


Type Inference
--------------

.. _inferring_literal_string:


Inferring ``LiteralString``
'''''''''''''''''''''''''''

Any literal string type is compatible with ``LiteralString``. For
example, ``x: LiteralString = "foo"`` is valid because ``"foo"`` is
inferred to be of type ``Literal["foo"]``.

As per the `Rationale`_, we also infer ``LiteralString`` in the
following cases:

+ Addition: ``x + y`` is of type ``LiteralString`` if both ``x`` and
  ``y`` are compatible with ``LiteralString``.

+ Joining: ``sep.join(xs)`` is of type ``LiteralString`` if ``sep``'s
  type is compatible with ``LiteralString`` and ``xs``'s type is
  compatible with ``Iterable[LiteralString]``.

+ In-place addition: If ``s`` has type ``LiteralString`` and ``x`` has
  type compatible with ``LiteralString``, then ``s += x`` preserves
  ``s``'s type as ``LiteralString``.

+ String formatting: An f-string has type ``LiteralString`` if and only
  if its constituent expressions are literal strings. ``s.format(...)``
  has type ``LiteralString`` if and only if ``s`` and the arguments have
  types compatible with ``LiteralString``.

+ Literal-preserving methods: In `Appendix C <appendix_C_>`_, we have
  provided an exhaustive list of ``str`` methods that preserve the
  ``LiteralString`` type.

In all other cases, if one or more of the composed values has a
non-literal type ``str``, the composition of types will have type
``str``. For example, if ``s`` has type ``str``, then ``"hello" + s``
has type ``str``. This matches the pre-existing behavior of type
checkers.

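The rules above can be modeled as a small recursive function. The sketch below is a hypothetical toy built on Python's ``ast`` module, not how any real type checker is implemented: it classifies an expression as ``LiteralString`` or ``str`` given the declared kinds of the names it mentions, and it covers only addition, ``join`` over a list or tuple display, f-strings, and ``str.format``. Everything else conservatively falls back to ``str``.

```python
import ast

LITERAL, PLAIN = "LiteralString", "str"


def all_literal(kinds: list[str]) -> str:
    """Combine sub-results: literal only if every part is literal."""
    return LITERAL if all(k == LITERAL for k in kinds) else PLAIN


def infer(node: ast.expr, env: dict[str, str]) -> str:
    """Toy model of the inference rules: classify an expression as LITERAL or PLAIN."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return LITERAL  # "foo" has type Literal["foo"]
    if isinstance(node, ast.Name):
        return env.get(node.id, PLAIN)  # unknown names are treated as plain str
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return all_literal([infer(node.left, env), infer(node.right, env)])
    if isinstance(node, ast.JoinedStr):
        # f-string: every interpolated expression must itself be literal.
        return all_literal(
            [infer(v.value, env) for v in node.values if isinstance(v, ast.FormattedValue)]
        )
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        receiver = infer(node.func.value, env)
        if node.func.attr == "join" and len(node.args) == 1:
            items = node.args[0]
            # Only list/tuple displays are inspected; anything else stays str.
            if isinstance(items, (ast.List, ast.Tuple)):
                return all_literal([receiver] + [infer(e, env) for e in items.elts])
        if node.func.attr == "format":
            args = [infer(a, env) for a in node.args]
            args += [infer(k.value, env) for k in node.keywords]
            return all_literal([receiver] + args)
    return PLAIN  # every other composition falls back to str


def check(source: str, env: dict[str, str]) -> str:
    """Parse a single expression and infer its kind."""
    return infer(ast.parse(source, mode="eval").body, env)


EXAMPLE_ENV = {"literal_string": LITERAL, "plain_string": PLAIN, "username": PLAIN}
for expr in ['"foo" + "bar"', "literal_string + plain_string",
             '",".join(["foo", literal_string])', 'f"hello {username}"',
             '"hello {}".format(literal_string)']:
    print(f"{expr:40} -> {check(expr, EXAMPLE_ENV)}")
```
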
``LiteralString`` is compatible with the type ``str``. It inherits all
methods from ``str``. So, if we have a variable ``s`` of type
``LiteralString``, it is safe to write ``s.startswith("hello")``.

Some type checkers refine the type of a string when doing an equality
check:

::

    def foo(s: str) -> None:
        if s == "bar":
            reveal_type(s) # => Literal["bar"]

Such a refined type in the if-block is also compatible with
``LiteralString`` because its type is ``Literal["bar"]``.


Examples
''''''''

See the examples below to help clarify the above rules:

::

    literal_string: LiteralString
    s: str = literal_string # OK

    literal_string: LiteralString = s # Error: Expected LiteralString, got str.
    literal_string: LiteralString = "hello" # OK

Addition of literal strings:

::

    def expect_literal_string(s: LiteralString) -> None: ...

    expect_literal_string("foo" + "bar") # OK
    expect_literal_string(literal_string + "bar") # OK

    literal_string2: LiteralString
    expect_literal_string(literal_string + literal_string2) # OK

    plain_string: str
    expect_literal_string(literal_string + plain_string) # Not OK.

Join using literal strings:

::

    expect_literal_string(",".join(["foo", "bar"])) # OK
    expect_literal_string(literal_string.join(["foo", "bar"])) # OK
    expect_literal_string(literal_string.join([literal_string, literal_string2])) # OK

    xs: List[LiteralString]
    expect_literal_string(literal_string.join(xs)) # OK
    expect_literal_string(plain_string.join([literal_string, literal_string2]))
    # Not OK because the separator has type 'str'.

In-place addition using literal strings:

::

    literal_string += "foo" # OK
    literal_string += literal_string2 # OK
    literal_string += plain_string # Not OK

Format strings using literal strings:

::

    literal_name: LiteralString
    expect_literal_string(f"hello {literal_name}")
    # OK because it is composed from literal strings.

    expect_literal_string("hello {}".format(literal_name)) # OK

    expect_literal_string(f"hello") # OK

    username: str
    expect_literal_string(f"hello {username}")
    # NOT OK. The format-string is constructed from 'username',
    # which has type 'str'.

    expect_literal_string("hello {}".format(username)) # Not OK

Other literal types, such as literal integers, are not compatible with ``LiteralString``:

::

    some_int: int
    expect_literal_string(some_int) # Error: Expected LiteralString, got int.

    literal_one: Literal[1] = 1
    expect_literal_string(literal_one) # Error: Expected LiteralString, got Literal[1].

We can call functions on literal strings:

::

    def add_limit(query: LiteralString) -> LiteralString:
        return query + " LIMIT 1"

    def my_query(query: LiteralString, user_id: str) -> None:
        sql_connection().execute(add_limit(query), (user_id,)) # OK

Conditional statements and expressions work as expected:

::

    def return_literal_string() -> LiteralString:
        return "foo" if condition1() else "bar" # OK

    def return_literal_str2(literal_string: LiteralString) -> LiteralString:
        return "foo" if condition1() else literal_string # OK

    def return_literal_str3() -> LiteralString:
        if condition1():
            result: Literal["foo"] = "foo"
        else:
            result: LiteralString = "bar"

        return result # OK


Interaction with TypeVars and Generics
''''''''''''''''''''''''''''''''''''''

TypeVars can be bound to ``LiteralString``:

::

    from typing import Literal, LiteralString, TypeVar

    TLiteral = TypeVar("TLiteral", bound=LiteralString)

    def literal_identity(s: TLiteral) -> TLiteral:
        return s

    hello: Literal["hello"] = "hello"
    y = literal_identity(hello)
    reveal_type(y) # => Literal["hello"]

    s: LiteralString
    y2 = literal_identity(s)
    reveal_type(y2) # => LiteralString

    s_error: str
    literal_identity(s_error)
    # Error: Expected TLiteral (bound to LiteralString), got str.


``LiteralString`` can be used as a type argument for generic classes:

::

    class Container(Generic[T]):
        def __init__(self, value: T) -> None:
            self.value = value

    literal_string: LiteralString = "hello"
    x: Container[LiteralString] = Container(literal_string) # OK

    s: str
    x_error: Container[LiteralString] = Container(s) # Not OK

Standard containers like ``List`` work as expected:

::

    xs: List[LiteralString] = ["foo", "bar", "baz"]


Interactions with Overloads
'''''''''''''''''''''''''''

Literal strings and overloads do not need to interact in a special
way: the existing rules work fine. ``LiteralString`` can be used as a
fallback overload where a specific ``Literal["foo"]`` type does not
match:

::

    @overload
    def foo(x: Literal["foo"]) -> int: ...
    @overload
    def foo(x: LiteralString) -> bool: ...
    @overload
    def foo(x: str) -> str: ...

    x1: int = foo("foo") # First overload.
    x2: bool = foo("bar") # Second overload.
    s: str
    x3: str = foo(s) # Third overload.


Backwards Compatibility
=======================

We propose adding ``typing_extensions.LiteralString`` for use in
earlier Python versions.

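A common pattern for code that must also run on earlier versions is a version-gated import. This sketch assumes the third-party ``typing_extensions`` package is installed on Python versions before 3.11; ``expect_literal_string`` is the hypothetical helper used in the examples above.

```python
import sys

if sys.version_info >= (3, 11):
    from typing import LiteralString
else:
    # Assumes the third-party typing_extensions package is installed.
    from typing_extensions import LiteralString


def expect_literal_string(s: LiteralString) -> None:
    """Hypothetical API from the examples in this PEP."""


x: LiteralString = "hello"
expect_literal_string(x)
```

Type checkers understand ``sys.version_info`` comparisons, so both branches resolve to the same special form for checking purposes.
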
As :pep:`PEP 586 mentions
<586#backwards-compatibility>`,
type checkers "should feel free to experiment with more sophisticated
inference techniques". So, if the type checker infers a literal string
type for an unannotated variable that is initialized with a literal
string, the following example should be OK:

::

    x = "hello"
    expect_literal_string(x)
    # OK, because x is inferred to have type 'Literal["hello"]'.

This enables precise type checking of idiomatic SQL query code without
annotating the code at all (as seen in the `Motivation`_ section
example).

However, like :pep:`586`, this PEP does not mandate the above inference
strategy. In case the type checker doesn't infer ``x`` to have type
``Literal["hello"]``, users can aid the type checker by explicitly
annotating it as ``x: LiteralString``:

::

    x: LiteralString = "hello"
    expect_literal_string(x)


Rejected Alternatives
=====================

Why not use tool X?
-------------------

Tools to catch issues such as SQL injection seem to come in three
flavors: AST based, function level analysis, and taint flow analysis.

**AST-based tools**: `Bandit
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_
has a plugin to warn when SQL queries are not literal
strings. The problem is that many perfectly safe SQL
queries are dynamically built out of string literals, as shown in the
`Motivation`_ section. At the
AST level, the resultant SQL query is not going to appear as a string
literal anymore and is thus indistinguishable from a potentially
malicious string. To use these tools would require significantly
restricting developers' ability to build SQL queries. ``LiteralString``
can provide similar safety guarantees with fewer restrictions.

**Semgrep and pyanalyze**: Semgrep supports a more sophisticated
function level analysis, including `constant propagation
<https://semgrep.dev/docs/writing-rules/data-flow/#constant-propagation>`_
within a function. This allows us to prevent injection attacks while
permitting some forms of safe dynamic SQL queries within a
function. `pyanalyze
<https://github.com/quora/pyanalyze/blob/afcb58cd3e967e4e3fea9e57bb18b6b1d9d42ed7/README.md#extending-pyanalyze>`_
has a similar extension. But neither handles function calls that
construct and return safe SQL queries. For example, in the code sample
below, ``build_insert_query`` is a helper function to create a query
that inserts multiple values into the corresponding columns. Semgrep
and pyanalyze forbid this natural usage whereas ``LiteralString``
handles it with no burden on the programmer:

::

    def build_insert_query(
        table: LiteralString,
        insert_columns: Collection[LiteralString],
    ) -> LiteralString:
        sql = "INSERT INTO " + table

        column_clause = ", ".join(insert_columns)
        value_clause = ", ".join(["?"] * len(insert_columns))

        sql += f" ({column_clause}) VALUES ({value_clause})"
        return sql

    def insert_data(
        conn: Connection,
        kvs_to_insert: Dict[LiteralString, str]
    ) -> None:
        query = build_insert_query("data", kvs_to_insert.keys())
        conn.execute(query, tuple(kvs_to_insert.values()))

    # Example usage
    data_to_insert = {
        "column_1": value_1,  # Note: values are not literals
        "column_2": value_2,
        "column_3": value_3,
    }
    insert_data(conn, data_to_insert)

**Taint flow analysis**: Tools such as `Pysa
<https://pyre-check.org/docs/pysa-basics/>`_ or `CodeQL
<https://codeql.github.com/>`_ are capable of tracking data flowing
from a user controlled input into a SQL query. These tools are
powerful but involve considerable overhead in setting up the tool in
CI, defining "taint" sinks and sources, and teaching developers how to
use them. They also usually take longer to run than a type checker
(minutes instead of seconds), which means feedback is not
immediate. Finally, they move the burden of preventing vulnerabilities
on to library users instead of allowing the libraries themselves to
specify precisely how their APIs must be called (as is possible with
``LiteralString``).

One final reason to prefer using a new type over a dedicated tool is
that type checkers are more widely used than dedicated security
tooling; for example, mypy was downloaded `over 7 million times
<https://www.pypistats.org/packages/mypy>`_ in Jan 2022 compared with
`fewer than 2 million times <https://www.pypistats.org/packages/bandit>`_
for Bandit. Having security protections built right into type checkers
will mean that more developers benefit from them.

Why not use a ``NewType`` for ``str``?
--------------------------------------

Any API for which ``LiteralString`` would be suitable could instead be
updated to accept a different type created within the Python type
system, such as ``NewType("SafeSQL", str)``:

::

    SafeSQL = NewType("SafeSQL", str)

    def execute(self, sql: SafeSQL, parameters: Iterable[str] = ...) -> Cursor: ...

    execute(SafeSQL("SELECT * FROM data WHERE user_id = ?"), (user_id,)) # OK

    user_query: str
    execute(user_query) # Error: Expected SafeSQL, got str.


Having to create a new type to call an API might give some developers
pause and encourage more caution, but it doesn't guarantee that
developers won't just turn a user-controlled string into the new type
and pass it into the modified API anyway:

::

    query = f"SELECT * FROM data WHERE user_id = {user_id}"
    execute(SafeSQL(query)) # No error!

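Part of the reason is that ``NewType`` has no runtime behavior of its own: calling it simply returns its argument, so wrapping an unsafe string neither validates nor alters it. A minimal demonstration, using the hypothetical ``SafeSQL`` type from above and a made-up attacker input:

```python
from typing import NewType

SafeSQL = NewType("SafeSQL", str)

# Hypothetical attacker-controlled input.
user_id = "user123 OR 1 = 1"
query = f"SELECT * FROM data WHERE user_id = {user_id}"

# At runtime NewType just returns its argument: no validation, no copy.
wrapped = SafeSQL(query)
print(wrapped is query)  # True
print(wrapped)           # SELECT * FROM data WHERE user_id = user123 OR 1 = 1
```
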
|
|
|
|
We are back to square one with the problem of preventing arbitrary
|
|
inputs to ``SafeSQL``. This is not a theoretical concern
|
|
either. Django uses the above approach with ``SafeString`` and
|
|
`mark_safe
|
|
<https://docs.djangoproject.com/en/dev/_modules/django/utils/safestring/#SafeString>`_. Issues
|
|
such as `CVE-2020-13596
|
|
<https://github.com/django/django/commit/2dd4d110c159d0c81dff42eaead2c378a0998735>`_
|
|
show how this technique can `fail
|
|
<https://nvd.nist.gov/vuln/detail/CVE-2020-13596>`_.
|
|
|
|
Also note that this requires invasive changes to the source code
|
|
(wrapping the query with ``SafeSQL``) whereas ``LiteralString``
|
|
requires no such changes. Users can remain oblivious to it as long as
|
|
they pass in literal strings to sensitive APIs.
|
|
|
|
Why not try to emulate Trusted Types?
|
|
-------------------------------------
|
|
|
|
`Trusted Types
|
|
<https://w3c.github.io/webappsec-trusted-types/dist/spec/>`_ is a W3C
|
|
specification for preventing DOM-based Cross Site Scripting (XSS). XSS
|
|
occurs when dangerous browser APIs accept raw user-controlled
|
|
strings. The specification modifies these APIs to accept only the
|
|
"Trusted Types" returned by designated sanitizing functions. These
|
|
sanitizing functions must take in a potentially malicious string and
|
|
validate it or render it benign somehow, for example by verifying that
|
|
it is a valid URL or HTML-encoding it.
|
|
|
|
It can be tempting to assume porting the concept of Trusted Types to
|
|
Python could solve the problem. The fundamental difference, however,
|
|
is that the output of a Trusted Types sanitizer is usually intended
|
|
*to not be executable code*. Thus it's easy to HTML encode the input,
|
|
strip out dangerous tags, or otherwise render it inert. With a SQL
|
|
query or shell command, the end result *still needs to be executable
|
|
code*. There is no way to write a sanitizer that can reliably figure
|
|
out which parts of an input string are benign and which ones are
|
|
potentially malicious.
|
|
|
|
Runtime Checkable ``LiteralString``
|
|
-----------------------------------
|
|
|
|
The ``LiteralString`` concept could be extended beyond static type
|
|
checking to be a runtime checkable property of ``str`` objects. This
|
|
would provide some benefits, such as allowing frameworks to raise
|
|
errors on dynamic strings. Such runtime errors would be a more robust
|
|
defense mechanism than type errors, which can potentially be
|
|
suppressed, ignored, or never even seen if the author does not use a
|
|
type checker.
|
|
|
|
This extension to the ``LiteralString`` concept would dramatically
|
|
increase the scope of the proposal by requiring changes to one of the
|
|
most fundamental types in Python. While runtime taint checking on
|
|
strings, similar to Perl's `taint <https://metacpan.org/pod/Taint>`_,
|
|
has been `considered <https://bugs.python.org/issue500698>`_ and
|
|
`attempted <https://github.com/felixgr/pytaint>`_ in the past, and
|
|
others may consider it in the future, such extensions are out of scope
|
|
for this PEP.
|
|
|
|
|
|
Rejected Names
|
|
--------------
|
|
|
|
We considered a variety of names for the literal string type and
|
|
solicited ideas on `typing-sig
|
|
<https://mail.python.org/archives/list/typing-sig@python.org/thread/VB74EHNM4RODDFM64NEEEBJQVAUAWIAW/>`_.
|
|
Some notable alternatives were:
|
|
|
|
+ ``Literal[str]``: This is a natural extension of the
|
|
``Literal["foo"]`` type name, but typing-sig `objected
|
|
<https://mail.python.org/archives/list/typing-sig@python.org/message/2ZQO4NTJEI42KTRJDBL77MNANEXOW7UI/>`_
|
|
that users could mistake this for the literal type of the ``str``
|
|
class.
|
|
|
|
+ ``LiteralStr``: This is shorter than ``LiteralString`` but looks
|
|
weird to the PEP authors.
|
|
|
|
+ ``LiteralDerivedString``: This (along with
|
|
``MadeFromLiteralString``) best captures the technical meaning of
|
|
the type. It represents not just the type of literal expressions,
|
|
such as ``"foo"``, but also that of expressions composed from
|
|
literals, such as ``"foo" + "bar"``. However, both names seem wordy.
|
|
|
|
+ ``StringLiteral``: Users might confuse this with the existing
|
|
concept of `"string literals"
|
|
<https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>`_
|
|
where the string exists as a syntactic token in the source code,
|
|
whereas our concept is more general.
|
|
|
|
+ ``SafeString``: While this comes close to our intended meaning, it
|
|
may mislead users into thinking that the string has been sanitized in
|
|
some way, perhaps by escaping HTML tags or shell-related special
|
|
characters.
|
|
|
|
+ ``ConstantStr``: This does not capture the idea of composing literal
|
|
strings.
|
|
|
|
+ ``StaticStr``: This suggests that the string is statically
|
|
computable, i.e., computable without running the program, which is
|
|
not true. The literal string may vary based on runtime flags, as
|
|
seen in the `Motivation`_ examples.
|
|
|
|
+ ``LiteralOnly[str]``: This has the advantage of being extensible to
|
|
other literal types, such as ``bytes`` or ``int``. However, we did
|
|
not find the extensibility worth the loss of readability.
|
|
|
|
Overall, there was no clear winner on typing-sig over a long period,
|
|
so we decided to tip the scales in favor of ``LiteralString``.
|
|
|
|
|
|
``LiteralBytes``
|
|
----------------
|
|
|
|
We could generalize literal byte types, such as ``Literal[b"foo"]``,
|
|
to ``LiteralBytes``. However, literal byte types are used much less
|
|
frequently than literal string types and we did not find much user
|
|
demand for ``LiteralBytes``, so we decided not to include it in this
|
|
PEP. Others may, however, consider it in future PEPs.
|
|
|
|
|
|
Reference Implementation
========================

This is implemented in Pyre v0.9.8 and is actively being used.

The implementation simply extends the type checker with
``LiteralString`` as a supertype of literal string types.

To support composition via addition, join, etc., it was sufficient to
overload the stubs for ``str`` in Pyre's copy of typeshed.


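The composition rules above can be illustrated with a short sketch. The
``build_query`` helper and its table and column names are hypothetical.
With the overloaded ``str`` stubs, a type checker should infer
``LiteralString`` for the concatenated result because every operand is
itself a ``LiteralString``. The ``from __future__ import annotations``
line keeps annotations unevaluated at runtime, so the snippet also runs
where ``typing.LiteralString`` is unavailable:

::

    from __future__ import annotations

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        # Only the type checker needs this import; Python 3.11+ also
        # provides typing.LiteralString.
        from typing_extensions import LiteralString


    def build_query(columns: list[LiteralString], table: LiteralString) -> LiteralString:
        # Hypothetical helper. Every operand is a LiteralString, so the
        # overloaded ``__add__`` and ``join`` stubs keep the result literal.
        return "SELECT " + ", ".join(columns) + " FROM " + table

Calling ``build_query(["id", "name"], "users")`` returns
``"SELECT id, name FROM users"``. Passing a plain ``str`` as ``table``
should instead be reported by the checker, since the ``str`` overloads
of ``__add__`` return ``str``.
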
Appendix A: Other Uses
======================

To simplify the discussion and require minimal security knowledge, we
focused on SQL injections throughout the PEP. ``LiteralString``,
however, can also be used to prevent many other kinds of `injection
vulnerabilities <https://owasp.org/www-community/Injection_Flaws>`_.

Command Injection
-----------------

APIs such as ``subprocess.run`` accept a string which can be run as a
shell command:

::

    subprocess.run(f"echo 'Hello {name}'", shell=True)

If user-controlled data is included in the command string, the code is
vulnerable to "command injection"; i.e., an attacker can run malicious
commands. For example, a value of ``' && rm -rf / #`` would result in
the following destructive command being run:

::

    echo 'Hello ' && rm -rf / #'

This vulnerability could be prevented by updating ``run`` to only
accept ``LiteralString`` when used in ``shell=True`` mode. Here is one
simplified stub:

::

    def run(command: LiteralString, *args: str, shell: bool = ...): ...

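A minimal sketch of how callers could still pass untrusted data under
such a stub, assuming a hypothetical ``run_shell`` wrapper that mirrors
the ``LiteralString`` signature above. The command text stays literal,
and the untrusted value reaches the shell only as an environment
variable, which the shell expands inside double quotes without parsing
it again. This assumes a POSIX ``sh``; the payload is a harmless
stand-in for the destructive one above:

::

    from __future__ import annotations

    import os
    import subprocess
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    def run_shell(command: LiteralString, **env: str) -> str:
        """Hypothetical wrapper: the shell command itself must be a literal.

        Untrusted values are passed as environment variables; inside double
        quotes the shell expands them as data and does not re-parse them.
        """
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            check=True,
            env={**os.environ, **env},
        )
        return result.stdout.rstrip("\n")


    name = "' && echo injected #"
    print(run_shell('echo "Hello $NAME"', NAME=name))  # Hello ' && echo injected #

By contrast, ``run_shell(f"echo 'Hello {name}'")`` should be flagged by
a type checker, because an f-string that interpolates a plain ``str``
is not a ``LiteralString``.
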
Cross Site Scripting (XSS)
--------------------------

Most popular Python web frameworks, such as Django, use a templating
engine to produce HTML from user data. These templating languages
auto-escape user data before inserting it into the HTML template and
thus prevent cross site scripting (XSS) vulnerabilities.

But a common way to `bypass auto-escaping
<https://django.readthedocs.io/en/stable/ref/templates/language.html#how-to-turn-it-off>`_
and render HTML as-is is to use functions like ``mark_safe`` in
`Django
<https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.safestring.mark_safe>`_
or ``do_mark_safe`` in `Jinja2
<https://github.com/pallets/jinja/blob/077b7918a7642ff6742fe48a32e54d7875140894/src/jinja2/filters.py#L1264>`_,
which can cause XSS vulnerabilities when applied to user-controlled
data:

::

    dangerous_string = django.utils.safestring.mark_safe(f"<script>{user_input}</script>")
    return dangerous_string

This vulnerability could be prevented by updating ``mark_safe`` to
only accept ``LiteralString``:

::

    def mark_safe(s: LiteralString) -> str: ...

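The sketch below shows the pattern such a signature encourages.
``SafeHTML`` and this ``mark_safe`` are hypothetical stand-ins, not
Django's or Jinja2's actual classes: only literal markup is marked safe,
and user data is escaped with the standard library's ``html.escape``
before it is inserted:

::

    from __future__ import annotations

    import html
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    class SafeHTML(str):
        """Hypothetical marker type for markup rendered without escaping."""


    def mark_safe(s: LiteralString) -> SafeHTML:
        # Hypothetical stand-in with the stricter signature proposed above.
        return SafeHTML(s)


    def render_comment(user_input: str) -> str:
        # The markup skeleton is literal; the user data is escaped rather
        # than interpolated into the string that is marked safe.
        wrapper = mark_safe("<p class='comment'>{}</p>")
        return wrapper.format(html.escape(user_input))

Here ``render_comment("<script>alert(1)</script>")`` produces
``<p class='comment'>&lt;script&gt;alert(1)&lt;/script&gt;</p>``, so the
browser displays the text instead of running it.
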
Server Side Template Injection (SSTI)
-------------------------------------

Templating frameworks, such as Jinja, allow expressions with
Python-like syntax that are evaluated and substituted into the
rendered result:

::

    import jinja2

    template_str = "There are {{ values|length }} values: {{ values }}"
    template = jinja2.Template(template_str)
    template.render(values=[1, 2])
    # Result: "There are 2 values: [1, 2]"

If an attacker controls all or part of the template string, they can
insert expressions which execute arbitrary code and `compromise
<https://www.onsecurity.io/blog/server-side-template-injection-with-jinja2/>`_
the application:

::

    # [408] is assumed to be subprocess.Popen; the index varies between environments.
    malicious_str = "{{''.__class__.__base__.__subclasses__()[408]('rm - rf /',shell=True)}}"
    template = jinja2.Template(malicious_str)
    template.render()
    # Result: The shell command 'rm - rf /' is run

Template injection exploits like this could be prevented by updating
the ``Template`` API to only accept ``LiteralString``:

::

    class Template:
        def __init__(self, source: LiteralString): ...


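Jinja2 is not part of the standard library, so the following sketch
uses ``string.Template`` as a stand-in to show the same discipline;
``make_template`` is a hypothetical factory mirroring the constructor
signature above. The template source is a literal, and untrusted data
is supplied only as a substitution value, which ``string.Template``
inserts without interpreting any template syntax inside it:

::

    from __future__ import annotations

    from string import Template
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    def make_template(source: LiteralString) -> Template:
        # Hypothetical factory: the template *source* must be a literal.
        return Template(source)


    def greet(user_name: str) -> str:
        template = make_template("Hello, $name!")
        # Untrusted data reaches the template only as a value, so
        # placeholders or expressions inside it are left untouched.
        return template.substitute(name=user_name)

For example, ``greet("{{ 7 * 7 }}")`` returns ``"Hello, {{ 7 * 7 }}!"``
and ``greet("$name")`` returns ``"Hello, $name!"``.
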
Logging Format String Injection
-------------------------------

Logging frameworks often allow their input strings to contain
formatting directives. In the most severe case, allowing users to
control the logged string in Java's Log4j library led to
`CVE-2021-44228
<https://nvd.nist.gov/vuln/detail/CVE-2021-44228>`_ (colloquially
known as ``log4shell``), which has been described as the `"most
critical vulnerability of the last decade"
<https://www.theguardian.com/technology/2021/dec/10/software-flaw-most-critical-vulnerability-log-4-shell>`_.
While no Python frameworks are currently known to be vulnerable to a
similar attack, the built-in logging framework does provide formatting
options which are vulnerable to Denial of Service attacks from
externally controlled logging strings. The following example
illustrates a simple denial of service scenario:

::

    external_string = "%(foo)999999999s"
    ...
    # some_dict is any mapping with a "foo" key. Formatting pads its value
    # to 999,999,999 characters, i.e. about 1 GB of whitespace:
    logger.info(f'Received: {external_string}', some_dict)

This kind of attack could be prevented by requiring that the format
string passed to the logger be a ``LiteralString`` and that all
externally controlled data be passed separately as arguments (as
proposed in `Issue 46200 <https://bugs.python.org/issue46200>`_):

::

    def info(msg: LiteralString, *args: object) -> None:
        ...


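A minimal sketch of that calling convention with the standard
``logging`` module. ``log_info`` is a hypothetical wrapper with the
signature above, and ``ListHandler`` is a small helper that records
formatted messages for inspection. Because the attacker-supplied text
is an argument rather than part of the format string, ``logging``
substitutes it through ``%s`` and never interprets the directives
inside it:

::

    from __future__ import annotations

    import logging
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    class ListHandler(logging.Handler):
        """Collects fully formatted log messages in memory."""

        def __init__(self) -> None:
            super().__init__()
            self.messages: list[str] = []

        def emit(self, record: logging.LogRecord) -> None:
            # getMessage() applies ``record.msg % record.args``.
            self.messages.append(record.getMessage())


    logger = logging.getLogger("literal_string_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)


    def log_info(msg: LiteralString, *args: object) -> None:
        # Hypothetical wrapper: the format string must be a literal.
        logger.info(msg, *args)


    external_string = "%(foo)999999999s"
    log_info("Received: %s", external_string)
    print(handler.messages[-1])  # Received: %(foo)999999999s
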
Appendix B: Limitations
=======================

There are a number of ways ``LiteralString`` could still fail to
prevent users from passing strings built from non-literal data to an
API:

1. If the developer does not use a type checker or does not add type
   annotations, then violations will go uncaught.

2. ``cast(LiteralString, non_literal_string)`` could be used to lie to
   the type checker and allow a dynamic string value to masquerade as a
   ``LiteralString``. The same goes for a variable that has type ``Any``.

3. Comments such as ``# type: ignore`` could be used to ignore
   warnings about non-literal strings.

4. Trivial functions could be constructed to convert a ``str`` to a
   ``LiteralString``:

   ::

       from typing import Dict, List, LiteralString

       def make_literal(s: str) -> LiteralString:
           letters: Dict[str, LiteralString] = {
               "A": "A",
               "B": "B",
               # ... one entry for every other character
           }
           output: List[LiteralString] = [letters[c] for c in s]
           return "".join(output)

We could mitigate the above using linting, code review, etc., but
ultimately a clever, malicious developer attempting to circumvent the
protections offered by ``LiteralString`` will always succeed. The
important thing to remember is that ``LiteralString`` is not intended
to protect against *malicious* developers; it is meant to protect
against benign developers accidentally using sensitive APIs in a
dangerous way (without getting in their way otherwise).

Without ``LiteralString``, the best enforcement tool API authors have
is documentation, which is easily ignored and often not seen. With
``LiteralString``, API misuse requires conscious thought and artifacts
in the code that reviewers and future developers can notice.

.. _appendix_C:

Appendix C: ``str`` methods that preserve ``LiteralString``
===========================================================

The ``str`` class has several methods that would benefit from
``LiteralString``. For example, users might expect
``"hello".capitalize()`` to have the type ``LiteralString`` similar to
the other examples we have seen in the `Inferring LiteralString
<inferring_literal_string_>`_ section. Inferring the type
``LiteralString`` is correct because the string is not an arbitrary
user-supplied string; we know that it has the type
``Literal["Hello"]``, which is compatible with ``LiteralString``. In
other words, the ``capitalize`` method preserves the ``LiteralString``
type. There are several other ``str`` methods that preserve
``LiteralString``.

We propose updating the stub for ``str`` in typeshed so that the
methods are overloaded with the ``LiteralString``-preserving
versions. This means type checkers do not have to hardcode
``LiteralString`` behavior for each method. It also lets us easily
support new methods in the future by updating the typeshed stub.

For example, to preserve literal types for the ``capitalize`` method,
we would change the stub as below:

::

    # before
    def capitalize(self) -> str: ...

    # after
    @overload
    def capitalize(self: LiteralString) -> LiteralString: ...
    @overload
    def capitalize(self) -> str: ...

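The same overload pattern is available to ordinary library code, not
only to typeshed. A minimal sketch with a hypothetical pure function
``shout``: the first overload tells the checker that literal input
yields literal output, which is sound here because the body applies
only literal-preserving ``str`` operations. At runtime only the final,
undecorated definition is executed:

::

    from __future__ import annotations

    from typing import TYPE_CHECKING, overload

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    @overload
    def shout(s: LiteralString) -> LiteralString: ...
    @overload
    def shout(s: str) -> str: ...
    def shout(s: str) -> str:
        # Hypothetical helper: upper() and + both preserve LiteralString,
        # so promising LiteralString for literal input is justified.
        return s.upper() + "!"

A checker would then infer ``LiteralString`` for ``shout("hello")``
and plain ``str`` for ``shout(input())``.
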
The downside of changing the ``str`` stub is that the stub becomes
more complicated and can make error messages harder to
understand. Type checkers may need to special-case ``str`` to make
error messages understandable for users.

Below is an exhaustive list of ``str`` methods which, when called with
arguments of type ``LiteralString``, must be treated as returning a
``LiteralString``. If this PEP is accepted, we will update these
method signatures in typeshed:

::

    @overload
    def capitalize(self: LiteralString) -> LiteralString: ...
    @overload
    def capitalize(self) -> str: ...

    @overload
    def casefold(self: LiteralString) -> LiteralString: ...
    @overload
    def casefold(self) -> str: ...

    @overload
    def center(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    if sys.version_info >= (3, 8):
        @overload
        def expandtabs(self: LiteralString, tabsize: SupportsIndex = ...) -> LiteralString: ...
        @overload
        def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ...
    else:
        @overload
        def expandtabs(self: LiteralString, tabsize: int = ...) -> LiteralString: ...
        @overload
        def expandtabs(self, tabsize: int = ...) -> str: ...

    @overload
    def format(self: LiteralString, *args: LiteralString, **kwargs: LiteralString) -> LiteralString: ...
    @overload
    def format(self, *args: str, **kwargs: str) -> str: ...

    @overload
    def join(self: LiteralString, __iterable: Iterable[LiteralString]) -> LiteralString: ...
    @overload
    def join(self, __iterable: Iterable[str]) -> str: ...

    @overload
    def ljust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    @overload
    def lower(self: LiteralString) -> LiteralString: ...
    @overload
    def lower(self) -> str: ...

    @overload
    def lstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def lstrip(self, __chars: str | None = ...) -> str: ...

    @overload
    def partition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
    @overload
    def partition(self, __sep: str) -> tuple[str, str, str]: ...

    @overload
    def replace(self: LiteralString, __old: LiteralString, __new: LiteralString, __count: SupportsIndex = ...) -> LiteralString: ...
    @overload
    def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ...

    if sys.version_info >= (3, 9):
        @overload
        def removeprefix(self: LiteralString, __prefix: LiteralString) -> LiteralString: ...
        @overload
        def removeprefix(self, __prefix: str) -> str: ...

        @overload
        def removesuffix(self: LiteralString, __suffix: LiteralString) -> LiteralString: ...
        @overload
        def removesuffix(self, __suffix: str) -> str: ...

    @overload
    def rjust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
    @overload
    def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...

    @overload
    def rpartition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
    @overload
    def rpartition(self, __sep: str) -> tuple[str, str, str]: ...

    @overload
    def rsplit(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
    @overload
    def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...

    @overload
    def rstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def rstrip(self, __chars: str | None = ...) -> str: ...

    @overload
    def split(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
    @overload
    def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...

    @overload
    def splitlines(self: LiteralString, keepends: bool = ...) -> list[LiteralString]: ...
    @overload
    def splitlines(self, keepends: bool = ...) -> list[str]: ...

    @overload
    def strip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
    @overload
    def strip(self, __chars: str | None = ...) -> str: ...

    @overload
    def swapcase(self: LiteralString) -> LiteralString: ...
    @overload
    def swapcase(self) -> str: ...

    @overload
    def title(self: LiteralString) -> LiteralString: ...
    @overload
    def title(self) -> str: ...

    @overload
    def upper(self: LiteralString) -> LiteralString: ...
    @overload
    def upper(self) -> str: ...

    @overload
    def zfill(self: LiteralString, __width: SupportsIndex) -> LiteralString: ...
    @overload
    def zfill(self, __width: SupportsIndex) -> str: ...

    @overload
    def __add__(self: LiteralString, __s: LiteralString) -> LiteralString: ...
    @overload
    def __add__(self, __s: str) -> str: ...

    @overload
    def __iter__(self: LiteralString) -> Iterator[LiteralString]: ...
    @overload
    def __iter__(self) -> Iterator[str]: ...

    @overload
    def __mod__(self: LiteralString, __x: Union[LiteralString, Tuple[LiteralString, ...]]) -> LiteralString: ...
    @overload
    def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ...

    @overload
    def __mul__(self: LiteralString, __n: SupportsIndex) -> LiteralString: ...
    @overload
    def __mul__(self, __n: SupportsIndex) -> str: ...

    @overload
    def __repr__(self: LiteralString) -> LiteralString: ...
    @overload
    def __repr__(self) -> str: ...

    @overload
    def __rmul__(self: LiteralString, n: SupportsIndex) -> LiteralString: ...
    @overload
    def __rmul__(self, n: SupportsIndex) -> str: ...

    @overload
    def __str__(self: LiteralString) -> LiteralString: ...
    @overload
    def __str__(self) -> str: ...


Appendix D: Guidelines for using ``LiteralString`` in Stubs
===========================================================

Libraries that do not contain type annotations within their source may
specify type stubs in Typeshed. Libraries written in other languages,
such as those for machine learning, may also provide Python type
stubs. In both cases, the type checker cannot verify that the type
annotations match the source code and must trust the type stub. Thus,
authors of type stubs need to be careful when using ``LiteralString``,
since a function may falsely appear to be safe when it is not.

We recommend the following guidelines for using ``LiteralString`` in stubs:

+ If the stub is for a pure function, we recommend using ``LiteralString``
  in the return type of the function or of its overloads only if all
  the corresponding parameters have literal types (i.e.,
  ``LiteralString`` or ``Literal["a", "b"]``).

  ::

      # OK
      @overload
      def my_transform(x: LiteralString, y: Literal["a", "b"]) -> LiteralString: ...
      @overload
      def my_transform(x: str, y: str) -> str: ...

      # Not OK
      @overload
      def my_transform(x: LiteralString, y: str) -> LiteralString: ...
      @overload
      def my_transform(x: str, y: str) -> str: ...

+ If the stub is for a ``staticmethod``, we recommend the same
  guideline as above.

+ If the stub is for any other kind of method, we recommend against
  using ``LiteralString`` in the return type of the method or any of
  its overloads. This is because, even if all the explicit parameters
  have type ``LiteralString``, the object itself may be created using
  user data and thus the return type may be user-controlled.

+ If the stub is for a class attribute or global variable, we also
  recommend against using ``LiteralString`` because the untyped code
  may write arbitrary values to the attribute.

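To see why the method guideline matters, consider a hypothetical
untyped class whose stub might be tempted to declare ``greet`` as
returning ``LiteralString`` because its only explicit parameter is
literal. The instance state comes from the constructor, so user data
can still flow into the result:

::

    from __future__ import annotations

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from typing_extensions import LiteralString


    class Greeter:
        """Hypothetical stand-in for an untyped class described by a stub."""

        def __init__(self, greeting: str) -> None:
            # The greeting may come from anywhere, including user input.
            self.greeting = greeting

        def greet(self, name: LiteralString) -> str:
            # ``str`` is the honest return type: ``self.greeting`` is not
            # known to be literal, even though ``name`` is. A stub claiming
            # ``LiteralString`` here would let user data pass as a literal.
            return self.greeting + ", " + name


    user_input = "'; DROP TABLE users; --"
    print(Greeter(user_input).greet("Alice"))  # '; DROP TABLE users; --, Alice
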
However, we leave the final call to the library author. They may use
``LiteralString`` if they feel confident that the string returned by
the method or function or the string stored in the attribute is
guaranteed to have a literal type, i.e., the string is created by
applying only literal-preserving ``str`` operations to a string
literal.

Note that these guidelines do not apply to inline type annotations
since the type checker can verify that, say, a method returning
``LiteralString`` does in fact return an expression of that type.


Resources
=========

Literal String Types in Scala
-----------------------------

Scala `uses
<https://www.scala-lang.org/api/2.13.x/scala/Singleton.html>`_
``Singleton`` as the supertype for singleton types, which includes
literal string types, such as ``"foo"``. ``Singleton`` is Scala's
generalized analogue of this PEP's ``LiteralString``.

Tamer Abdulradi showed how Scala's literal string types can be used
for "Preventing SQL injection at compile time" in the Scala Days talk
`Literal types: What are they good for?
<https://slideslive.com/38907881/literal-types-what-they-are-good-for>`_
(slides 52 to 68).

Thanks
------

Thanks to the following people for their feedback on the PEP:

Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев,
CAM Gerlach, Arie Bovenberg, David Foster, and Shengye Wan.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: