PEP 675: Updates (#2282)
parent
e43f567e93
commit
6f11197188
365
pep-0675.rst
|
@ -82,7 +82,7 @@ the AST or by other semantic pattern-matching. These tools, however,
|
|||
preclude common idioms like storing a large multi-line query in a
|
||||
variable before executing it, adding literal string modifiers to the
|
||||
query based on some conditions, or transforming the query string using
|
||||
a function. (We survey existing tools in the "Rejected Alternatives"
|
||||
a function. (We survey existing tools in the `Rejected Alternatives`_
|
||||
section.) For example, many tools will detect a false positive issue
|
||||
in this benign snippet:
|
||||
|
||||
|
@ -112,7 +112,7 @@ generalization of the ``Literal["foo"]`` type from :pep:`586`.
|
|||
A string of type
|
||||
``Literal[str]`` cannot contain user-controlled data. Thus, any API
|
||||
that only accepts ``Literal[str]`` will be immune to injection
|
||||
vulnerabilities (with pragmatic `limitations <Appendix B:
|
||||
vulnerabilities (with `pragmatic limitations <Appendix B:
|
||||
Limitations_>`_).
|
||||
|
||||
Since we want the ``sqlite3`` ``execute`` method to disallow strings
|
||||
|
@ -202,9 +202,9 @@ heuristics, such as regex-filtering for obviously malicious payloads,
|
|||
there will always be a way to work around them (perfectly
|
||||
distinguishing good and bad queries reduces to the halting problem).
|
||||
|
||||
Static approaches like checking the AST to see if the query string is
|
||||
a literal string expression cannot tell when a string is assigned to
|
||||
an intermediate variable or when it is transformed by a benign
|
||||
Static approaches, such as checking the AST to see if the query string
|
||||
is a literal string expression, cannot tell when a string is assigned
|
||||
to an intermediate variable or when it is transformed by a benign
|
||||
function. This makes them overly restrictive.
|
||||
|
||||
The type checker, surprisingly, does better than both because it has
|
||||
|
@ -300,6 +300,7 @@ if they evaluate to the same value (``str``), such as
|
|||
Type Inference
|
||||
==============
|
||||
|
||||
.. _inferring_literal_str:
|
||||
|
||||
Inferring ``Literal[str]``
|
||||
--------------------------
|
||||
|
@ -327,6 +328,10 @@ following cases:
|
|||
has type ``Literal[str]`` if and only if ``s`` and the arguments have
|
||||
types compatible with ``Literal[str]``.
|
||||
|
||||
+ Literal-preserving methods: In `Appendix C <appendix_C_>`_, we have
|
||||
provided an exhaustive list of ``str`` methods that preserve the
|
||||
``Literal[str]`` type.
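As a rough illustration of these composition rules (not part of the PEP text), the sketch below notes in comments the type that a checker implementing this proposal would be expected to infer. The function names ``table_query`` and ``user_query`` are hypothetical. ``Literal[str]`` only carries meaning for a type checker that supports the proposal; at runtime the annotations are not enforced.

```python
from typing import List, Literal


def table_query(table: Literal[str], columns: List[Literal[str]]) -> Literal[str]:
    # join: the separator and every element are Literal[str],
    # so the result is expected to be Literal[str] as well.
    column_list = ", ".join(columns)
    # Addition of Literal[str] operands is expected to stay Literal[str].
    return "SELECT " + column_list + " FROM " + table


def user_query(user_input: str) -> str:
    # One operand is a plain str, so the whole expression is inferred as str.
    return "SELECT * FROM " + user_input


query = table_query("users", ["id", "name"])
```

Under the proposal, passing the result of ``user_query`` to an API that requires ``Literal[str]`` would be reported as an error, while ``query`` would be accepted.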
|
||||
|
||||
In all other cases, if one or more of the composed values has a
|
||||
non-literal type ``str``, the composition of types will have type
|
||||
``str``. For example, if ``s`` has type ``str``, then ``"hello" + s``
|
||||
|
@ -337,10 +342,6 @@ checkers.
|
|||
methods from ``str``. So, if we have a variable ``s`` of type
|
||||
``Literal[str]``, it is safe to write ``s.startswith("hello")``.
|
||||
|
||||
Note that, beyond the few composition rules mentioned above, this PEP
|
||||
doesn't change inference for other ``str`` methods such as
|
||||
``literal_string.upper()``.
|
||||
|
||||
Some type checkers refine the type of a string when doing an equality
|
||||
check:
|
||||
|
||||
|
@ -366,7 +367,7 @@ See the examples below to help clarify the above rules:
|
|||
s: str = literal_string # OK
|
||||
|
||||
literal_string: Literal[str] = s # Error: Expected Literal[str], got str.
|
||||
literal_string: Literal[str] = "hello" # OK
|
||||
literal_string: Literal[str] = "hello" # OK
|
||||
|
||||
|
||||
def expect_literal_str(s: Literal[str]) -> None: ...
|
||||
|
@ -577,11 +578,10 @@ Rejected Alternatives
|
|||
Why not use tool X?
|
||||
-------------------
|
||||
|
||||
Focusing solely on the example of preventing SQL injection, tooling to
|
||||
catch this kind of issue seems to come in three flavors: AST based,
|
||||
function level analysis, and taint flow analysis.
|
||||
Tools to catch issues such as SQL injection seem to come in three
|
||||
flavors: AST-based, function-level analysis, and taint flow analysis.
|
||||
|
||||
**AST based tools include Bandit**: `Bandit
|
||||
**AST-based tools**: `Bandit
|
||||
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_
|
||||
has a plugin to warn when SQL queries are not literal
|
||||
strings. The problem is that many perfectly safe SQL
|
||||
|
@ -630,7 +630,7 @@ handles it with no burden on the programmer:
|
|||
|
||||
# Example usage
|
||||
data_to_insert = {
|
||||
"column_1": value_1, # Note: values are not literals
|
||||
"column_1": value_1, # Note: values are not literals
|
||||
"column_2": value_2,
|
||||
"column_3": value_3,
|
||||
}
|
||||
|
@ -650,6 +650,14 @@ on to library users instead of allowing the libraries themselves to
|
|||
specify precisely how their APIs must be called (as is possible with
|
||||
``Literal[str]``).
|
||||
|
||||
One final reason to prefer using a new type over a dedicated tool is
|
||||
that type checkers are more widely used than dedicated security
|
||||
tooling; for example, mypy was downloaded `over 7 million times
|
||||
<https://www.pypistats.org/packages/mypy>`_ in Jan 2022 vs `less than
|
||||
2 million times <https://www.pypistats.org/packages/bandit>`_ for
|
||||
Bandit. Having security protections built right into type checkers
|
||||
will mean that more developers benefit from them.
|
||||
|
||||
|
||||
Why not use a ``NewType`` for ``str``?
|
||||
--------------------------------------
|
||||
|
@ -748,27 +756,8 @@ The implementation simply extends the type checker with
|
|||
``Literal[str]`` as a supertype of literal string types.
|
||||
|
||||
To support composition via addition, join, etc., it was sufficient to
|
||||
overload the stubs for ``str`` in Pyre's copy of typeshed. For
|
||||
example, we replaced ``str`` ``__add__``:
|
||||
overload the stubs for ``str`` in Pyre's copy of typeshed.
|
||||
|
||||
::
|
||||
|
||||
# Before:
|
||||
def __add__(self, s: str) -> str: ...
|
||||
|
||||
# After:
|
||||
@overload
|
||||
def __add__(self: Literal[str], other: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def __add__(self, other: str) -> str: ...
|
||||
|
||||
This means that addition of non-literal string types remains to have
|
||||
type ``str``. The only change is that addition of literal string types
|
||||
now produces ``Literal[str]``.
|
||||
|
||||
One implementation strategy is to update the official Typeshed `stub
|
||||
<https://github.com/python/typeshed/blob/aa7e277adb9049e24ea3434fc9848defbfa87673/stdlib/builtins.pyi#L420>`_
|
||||
for ``str`` with these changes.
|
||||
|
||||
Appendix A: Other Uses
|
||||
======================
|
||||
|
@ -868,6 +857,40 @@ the ``Template`` API to only accept ``Literal[str]``:
|
|||
def __init__(self, source: Literal[str]): ...
|
||||
|
||||
|
||||
Logging Format String Injection
|
||||
-------------------------------
|
||||
|
||||
Logging frameworks often allow their input strings to contain
|
||||
formatting directives. At its worst, allowing users to control the
|
||||
logged string has led to `CVE-2021-44228
|
||||
<https://nvd.nist.gov/vuln/detail/CVE-2021-44228>`_ (colloquially
|
||||
known as ``log4shell``), which has been described as the `"most
|
||||
critical vulnerability of the last decade"
|
||||
<https://www.theguardian.com/technology/2021/dec/10/software-flaw-most-critical-vulnerability-log-4-shell>`_.
|
||||
While no Python frameworks are currently known to be vulnerable to a
|
||||
similar attack, the built-in logging framework does provide formatting
|
||||
options which are vulnerable to Denial of Service attacks from
|
||||
externally controlled logging strings. The following example
|
||||
illustrates a simple denial of service scenario:
|
||||
|
||||
::
|
||||
|
||||
external_string = "%(foo)999999999s"
|
||||
...
|
||||
# Tries to add > 1GB of whitespace to the logged string:
|
||||
logger.info(f'Received: {external_string}', some_dict)
|
||||
|
||||
This kind of attack could be prevented by requiring that the format
|
||||
string passed to the logger be a ``Literal[str]`` and that all
|
||||
externally controlled data be passed separately as arguments (as
|
||||
proposed in `Issue 46200 <https://bugs.python.org/issue46200>`_):
|
||||
|
||||
::
|
||||
|
||||
def info(msg: Literal[str], *args: object) -> None:
|
||||
...
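The calling convention described above can already be tried with the standard ``logging`` module. The following is a minimal sketch, assuming a hypothetical logger name ``pep675_demo`` and an in-memory stream so the output can be inspected. The format string is a literal, and the externally controlled value is passed as a separate argument, so its formatting directives are treated as plain data.

```python
import io
import logging

# Hypothetical logger writing to an in-memory stream for inspection.
stream = io.StringIO()
logger = logging.getLogger("pep675_demo")
logger.setLevel(logging.INFO)
logger.handlers.clear()
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

external_string = "%(foo)999999999s"

# The literal format string is the only thing interpreted for directives;
# external_string is substituted for %s verbatim.
logger.info("Received: %s", external_string)

logged = stream.getvalue()
```

Here the directive inside ``external_string`` is never expanded, so no padding is generated.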
|
||||
|
||||
|
||||
Appendix B: Limitations
|
||||
=======================
|
||||
|
||||
|
@ -913,6 +936,275 @@ is documentation, which is easily ignored and often not seen. With
|
|||
``Literal[str]``, API misuse requires conscious thought and artifacts
|
||||
in the code that reviewers and future developers can notice.
|
||||
|
||||
.. _appendix_C:
|
||||
|
||||
Appendix C: ``str`` methods that preserve ``Literal[str]``
|
||||
==========================================================
|
||||
|
||||
The ``str`` class has several methods that would benefit from
|
||||
``Literal[str]``. For example, users might expect
|
||||
``"hello".capitalize()`` to have the type ``Literal[str]`` similar to
|
||||
the other examples we have seen in the `Inferring Literal[str]
|
||||
<inferring_literal_str_>`_ section. Inferring the type ``Literal[str]``
|
||||
is correct because the string is not an arbitrary user-supplied string
|
||||
- we know that it has the type ``Literal["Hello"]``, which is
|
||||
compatible with ``Literal[str]``. In other words, the ``capitalize``
|
||||
method preserves the ``Literal[str]`` type. There are several other
|
||||
``str`` methods that preserve ``Literal[str]``.
|
||||
|
||||
We propose updating the stub for ``str`` in typeshed so that the
|
||||
methods are overloaded with the ``Literal[str]``-preserving
|
||||
versions. This means type checkers do not have to hardcode
|
||||
``Literal[str]`` behavior for each method. It also lets us easily
|
||||
support new methods in the future by updating the typeshed stub.
|
||||
|
||||
For example, to preserve literal types for the ``capitalize`` method,
|
||||
we would change the stub as below:
|
||||
|
||||
::
|
||||
|
||||
# before
|
||||
def capitalize(self) -> str: ...
|
||||
|
||||
# after
|
||||
@overload
|
||||
def capitalize(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def capitalize(self) -> str: ...
|
||||
|
||||
The downside of changing the ``str`` stub is that the stub becomes
|
||||
more complicated and can make error messages harder to
|
||||
understand. Type checkers may need to special-case ``str`` to make
|
||||
error messages understandable for users.
|
||||
|
||||
Below is an exhaustive list of ``str`` methods which, when called as
|
||||
indicated with arguments of type ``Literal[str]``, must be treated as
|
||||
returning a ``Literal[str]``. If this PEP is accepted, we will update
|
||||
these method signatures in typeshed:
|
||||
|
||||
::
|
||||
|
||||
@overload
|
||||
def capitalize(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def capitalize(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def casefold(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def casefold(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
|
||||
|
||||
if sys.version_info >= (3, 8):
    @overload
    def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ...
    @overload
    def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ...
else:
    @overload
    def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ...
    @overload
    def expandtabs(self, tabsize: int = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def format(self, *args: str, **kwargs: str) -> str: ...
|
||||
|
||||
@overload
|
||||
def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ...
|
||||
@overload
|
||||
def join(self, __iterable: Iterable[str]) -> str: ...
|
||||
|
||||
@overload
|
||||
def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def lower(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def lower(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def lstrip(self, __chars: str | None = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...
|
||||
@overload
|
||||
def partition(self, __sep: str) -> tuple[str, str, str]: ...
|
||||
|
||||
@overload
|
||||
def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ...
|
||||
|
||||
if sys.version_info >= (3, 9):
    @overload
    def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ...
    @overload
    def removeprefix(self, __prefix: str) -> str: ...

    @overload
    def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ...
    @overload
    def removesuffix(self, __suffix: str) -> str: ...
|
||||
|
||||
@overload
|
||||
def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...
|
||||
@overload
|
||||
def rpartition(self, __sep: str) -> tuple[str, str, str]: ...
|
||||
|
||||
@overload
|
||||
def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...
|
||||
@overload
|
||||
def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
|
||||
|
||||
@overload
|
||||
def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def rstrip(self, __chars: str | None = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...
|
||||
@overload
|
||||
def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
|
||||
|
||||
@overload
|
||||
def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ...
|
||||
@overload
|
||||
def splitlines(self, keepends: bool = ...) -> list[str]: ...
|
||||
|
||||
@overload
|
||||
def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
|
||||
@overload
|
||||
def strip(self, __chars: str | None = ...) -> str: ...
|
||||
|
||||
@overload
|
||||
def swapcase(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def swapcase(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def title(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def title(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def upper(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def upper(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ...
|
||||
@overload
|
||||
def zfill(self, __width: SupportsIndex) -> str: ...
|
||||
|
||||
@overload
|
||||
def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def __add__(self, __s: str) -> str: ...
|
||||
|
||||
@overload
|
||||
def __iter__(self: Literal[str]) -> Iterator[str]: ...
|
||||
@overload
|
||||
def __iter__(self) -> Iterator[str]: ...
|
||||
|
||||
@overload
|
||||
def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> Literal[str]: ...
|
||||
@overload
|
||||
def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ...
|
||||
|
||||
@overload
|
||||
def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ...
|
||||
@overload
|
||||
def __mul__(self, __n: SupportsIndex) -> str: ...
|
||||
|
||||
@overload
|
||||
def __repr__(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def __repr__(self) -> str: ...
|
||||
|
||||
@overload
|
||||
def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ...
|
||||
@overload
|
||||
def __rmul__(self, n: SupportsIndex) -> str: ...
|
||||
|
||||
@overload
|
||||
def __str__(self: Literal[str]) -> Literal[str]: ...
|
||||
@overload
|
||||
def __str__(self) -> str: ...
|
||||
|
||||
|
||||
Appendix D: Guidelines for using ``Literal[str]`` in Stubs
|
||||
==========================================================
|
||||
|
||||
Libraries that do not contain type annotations within their source may
|
||||
specify type stubs in Typeshed. Libraries written in other languages,
|
||||
such as those for machine learning, may also provide Python type
|
||||
stubs. This means the type checker cannot verify that the type
|
||||
annotations match the source code and must trust the type stub. Thus,
|
||||
authors of type stubs need to be careful when using ``Literal[str]``
|
||||
since a function may falsely appear to be safe when it is not.
|
||||
|
||||
We recommend the following guidelines for using ``Literal[str]`` in stubs:
|
||||
|
||||
+ If the stub is for a function, we recommend using ``Literal[str]``
|
||||
in the return type of the function or of its overloads only if all
|
||||
the corresponding arguments have literal types (i.e.,
|
||||
``Literal[str]`` or ``Literal["a", "b"]``).
|
||||
|
||||
::
|
||||
|
||||
# OK
|
||||
@overload
|
||||
def my_transform(x: Literal[str], y: Literal["a", "b"]) -> Literal[str]: ...
|
||||
@overload
|
||||
def my_transform(x: str, y: str) -> str: ...
|
||||
|
||||
# Not OK
|
||||
@overload
|
||||
def my_transform(x: Literal[str], y: str) -> Literal[str]: ...
|
||||
@overload
|
||||
def my_transform(x: str, y: str) -> str: ...
|
||||
|
||||
+ If the stub is for a ``staticmethod``, we recommend the same
|
||||
guideline as above.
|
||||
|
||||
+ If the stub is for any other kind of method, we recommend against
|
||||
using ``Literal[str]`` in the return type of the method or any of
|
||||
its overloads. This is because, even if all the explicit arguments
|
||||
have type ``Literal[str]``, the object itself may be created using
|
||||
user data and thus the return type may be user-controlled.
|
||||
|
||||
+ If the stub is for a class attribute or global variable, we also
|
||||
recommend against using ``Literal[str]`` because the untyped code
|
||||
may write arbitrary values to the attribute.
|
||||
|
||||
However, we leave the final call to the library author. They may use
|
||||
``Literal[str]`` if they feel confident that the string returned by
|
||||
the method or function or the string stored in the attribute is
|
||||
guaranteed to have a literal type - i.e., the string is created by
|
||||
applying only literal-preserving ``str`` operations to a string
|
||||
literal.
|
||||
|
||||
Note that these guidelines do not apply to inline type annotations
|
||||
since the type checker can verify that, say, a method returning
|
||||
``Literal[str]`` does in fact return an expression of that type.
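To make the third guideline concrete, here is a small sketch built around a hypothetical ``Greeting`` class (not taken from any real library). Every explicit argument to ``render`` is a literal, yet the result still carries user-controlled data supplied to the constructor, which is why a stub for such a method should keep ``str`` as the return type.

```python
from typing import Literal


class Greeting:
    """Hypothetical class used only for illustration."""

    def __init__(self, name: str) -> None:
        # `name` may come from user input.
        self.name = name

    def render(self, suffix: Literal[str]) -> str:
        # Declaring Literal[str] as the return type in a stub would be
        # unsound: the result embeds self.name, which is arbitrary data.
        return "Hello, " + self.name + suffix


user_supplied = "Robert'); DROP TABLE students;--"
rendered = Greeting(user_supplied).render("!")
```

If ``render`` were annotated inline rather than in a stub, a checker supporting the proposal could reject a ``Literal[str]`` return type itself, since ``self.name`` has type ``str``.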
|
||||
|
||||
|
||||
Resources
|
||||
=========
|
||||
|
||||
|
@ -936,7 +1228,8 @@ Thanks
|
|||
|
||||
Thanks to the following people for their feedback on the PEP:
|
||||
|
||||
Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, and Shengye Wan
|
||||
Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев,
|
||||
CAM Gerlach, and Shengye Wan
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|