PEP 675: Updates (#2282)

Pradeep Kumar 2022-01-31 19:20:47 -08:00 committed by GitHub
parent e43f567e93
commit 6f11197188
1 changed file with 329 additions and 36 deletions


@ -82,7 +82,7 @@ the AST or by other semantic pattern-matching. These tools, however,
preclude common idioms like storing a large multi-line query in a
variable before executing it, adding literal string modifiers to the
query based on some conditions, or transforming the query string using
a function. (We survey existing tools in the "Rejected Alternatives"
a function. (We survey existing tools in the `Rejected Alternatives`_
section.) For example, many tools will detect a false positive issue
in this benign snippet:
@ -112,7 +112,7 @@ generalization of the ``Literal["foo"]`` type from :pep:`586`.
A string of type
``Literal[str]`` cannot contain user-controlled data. Thus, any API
that only accepts ``Literal[str]`` will be immune to injection
vulnerabilities (with pragmatic `limitations <Appendix B:
vulnerabilities (with `pragmatic limitations <Appendix B:
Limitations_>`_).
Since we want the ``sqlite3`` ``execute`` method to disallow strings
@ -202,9 +202,9 @@ heuristics, such as regex-filtering for obviously malicious payloads,
there will always be a way to work around them (perfectly
distinguishing good and bad queries reduces to the halting problem).
Static approaches like checking the AST to see if the query string is
a literal string expression cannot tell when a string is assigned to
an intermediate variable or when it is transformed by a benign
Static approaches, such as checking the AST to see if the query string
is a literal string expression, cannot tell when a string is assigned
to an intermediate variable or when it is transformed by a benign
function. This makes them overly restrictive.
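To illustrate the limitation, here is a minimal sketch of a naive AST-based check built on Python's standard ``ast`` module. The helper name ``first_arg_is_string_literal`` and the ``cursor.execute`` call sources are hypothetical and chosen only for illustration; real tools are more elaborate, but a purely syntactic check has the same blind spot once the query is stored in a variable.

```python
import ast


def first_arg_is_string_literal(call_source: str) -> bool:
    """Return True if the first positional argument of the call is a string literal.

    Hypothetical helper: it inspects only the syntax of a single call expression.
    """
    tree = ast.parse(call_source, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not call.args:
        return False
    first = call.args[0]
    return isinstance(first, ast.Constant) and isinstance(first.value, str)


# The query is written inline, so the syntactic check accepts it.
direct = first_arg_is_string_literal('cursor.execute("SELECT * FROM data")')

# The same query stored in a variable first: the check sees only a name
# and cannot tell that the variable holds a literal string.
via_variable = first_arg_is_string_literal("cursor.execute(query)")

print(direct, via_variable)
```

The second call is rejected even though ``query`` could be a perfectly safe literal, which is the kind of false positive described above.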
The type checker, surprisingly, does better than both because it has
@ -300,6 +300,7 @@ if they evaluate to the same value (``str``), such as
Type Inference
==============
.. _inferring_literal_str:
Inferring ``Literal[str]``
--------------------------
@ -327,6 +328,10 @@ following cases:
has type ``Literal[str]`` if and only if ``s`` and the arguments have
types compatible with ``Literal[str]``.
+ Literal-preserving methods: In `Appendix C <appendix_C_>`_, we have
provided an exhaustive list of ``str`` methods that preserve the
``Literal[str]`` type.
In all other cases, if one or more of the composed values has a
non-literal type ``str``, the composition of types will have type
``str``. For example, if ``s`` has type ``str``, then ``"hello" + s``
@ -337,10 +342,6 @@ checkers.
methods from ``str``. So, if we have a variable ``s`` of type
``Literal[str]``, it is safe to write ``s.startswith("hello")``.
Note that, beyond the few composition rules mentioned above, this PEP
doesn't change inference for other ``str`` methods such as
``literal_string.upper()``.
Some type checkers refine the type of a string when doing an equality
check:
@ -366,7 +367,7 @@ See the examples below to help clarify the above rules:
s: str = literal_string # OK
literal_string: Literal[str] = s # Error: Expected Literal[str], got str.
literal_string: Literal[str] = "hello" # OK
literal_string: Literal[str] = "hello" # OK
def expect_literal_str(s: Literal[str]) -> None: ...
@ -577,11 +578,10 @@ Rejected Alternatives
Why not use tool X?
-------------------
Focusing solely on the example of preventing SQL injection, tooling to
catch this kind of issue seems to come in three flavors: AST based,
function level analysis, and taint flow analysis.
Tools to catch issues such as SQL injection seem to come in three
flavors: AST-based, function-level analysis, and taint flow analysis.
**AST based tools include Bandit**: `Bandit
**AST-based tools**: `Bandit
<https://github.com/PyCQA/bandit/blob/aac3f16f45648a7756727286ba8f8f0cf5e7d408/bandit/plugins/django_sql_injection.py#L102>`_
has a plugin to warn when SQL queries are not literal
strings. The problem is that many perfectly safe SQL
@ -630,7 +630,7 @@ handles it with no burden on the programmer:
# Example usage
data_to_insert = {
"column_1": value_1, # Note: values are not literals
"column_1": value_1, # Note: values are not literals
"column_2": value_2,
"column_3": value_3,
}
@ -650,6 +650,14 @@ on to library users instead of allowing the libraries themselves to
specify precisely how their APIs must be called (as is possible with
``Literal[str]``).
One final reason to prefer using a new type over a dedicated tool is
that type checkers are more widely used than dedicated security
tooling; for example, mypy was downloaded `over 7 million times
<https://www.pypistats.org/packages/mypy>`_ in Jan 2022 vs `less than
2 million times <https://www.pypistats.org/packages/bandit>`_ for
Bandit. Having security protections built right into type checkers
will mean that more developers benefit from them.
Why not use a ``NewType`` for ``str``?
--------------------------------------
@ -748,27 +756,8 @@ The implementation simply extends the type checker with
``Literal[str]`` as a supertype of literal string types.
To support composition via addition, join, etc., it was sufficient to
overload the stubs for ``str`` in Pyre's copy of typeshed. For
example, we replaced ``str`` ``__add__``:
overload the stubs for ``str`` in Pyre's copy of typeshed.
::
# Before:
def __add__(self, s: str) -> str: ...
# After:
@overload
def __add__(self: Literal[str], other: Literal[str]) -> Literal[str]: ...
@overload
def __add__(self, other: str) -> str: ...
This means that addition of non-literal string types remains to have
type ``str``. The only change is that addition of literal string types
now produces ``Literal[str]``.
One implementation strategy is to update the official Typeshed `stub
<https://github.com/python/typeshed/blob/aa7e277adb9049e24ea3434fc9848defbfa87673/stdlib/builtins.pyi#L420>`_
for ``str`` with these changes.
Appendix A: Other Uses
======================
@ -868,6 +857,40 @@ the ``Template`` API to only accept ``Literal[str]``:
def __init__(self, source: Literal[str]): ...
Logging Format String Injection
-------------------------------
Logging frameworks often allow their input strings to contain
formatting directives. At its worst, allowing users to control the
logged string has led to `CVE-2021-44228
<https://nvd.nist.gov/vuln/detail/CVE-2021-44228>`_ (colloquially
known as ``log4shell``), which has been described as the `"most
critical vulnerability of the last decade"
<https://www.theguardian.com/technology/2021/dec/10/software-flaw-most-critical-vulnerability-log-4-shell>`_.
While no Python frameworks are currently known to be vulnerable to a
similar attack, the built-in logging framework does provide formatting
options which are vulnerable to Denial of Service attacks from
externally controlled logging strings. The following example
illustrates a simple denial of service scenario:
::
external_string = "%(foo)999999999s"
...
# Tries to add > 1GB of whitespace to the logged string:
logger.info(f'Received: {external_string}', some_dict)
This kind of attack could be prevented by requiring that the format
string passed to the logger be a ``Literal[str]`` and that all
externally controlled data be passed separately as arguments (as
proposed in `Issue 46200 <https://bugs.python.org/issue46200>`_):
::
def info(msg: Literal[str], *args: object) -> None:
...
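The difference between the two calling styles can be reproduced with the standard ``logging`` module. The sketch below is illustrative only: the logger name ``pep675_demo`` is hypothetical, and the padding width is reduced to 12 so that the effect is visible without allocating a large string.

```python
import io
import logging

# Capture log output in memory so that it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("pep675_demo")  # hypothetical logger name
logger.handlers.clear()
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

some_dict = {"foo": "x"}
external_string = "%(foo)12s"  # small width, chosen only for this demo

# Unsafe: the external string becomes part of the format string, so its
# directive is interpreted and pads the value of "foo" to 12 characters.
logger.info(f"Received: {external_string}", some_dict)

# Safer: the format string is a literal and the external data is passed
# separately as an argument, so it is logged verbatim.
logger.info("Received: %s", external_string)

lines = stream.getvalue().splitlines()
print(lines)
```

In the first call the attacker controls the width directive; in the second, the same text is treated purely as data.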
Appendix B: Limitations
=======================
@ -913,6 +936,275 @@ is documentation, which is easily ignored and often not seen. With
``Literal[str]``, API misuse requires conscious thought and artifacts
in the code that reviewers and future developers can notice.
.. _appendix_C:
Appendix C: ``str`` methods that preserve ``Literal[str]``
==========================================================
The ``str`` class has several methods that would benefit from
``Literal[str]``. For example, users might expect
``"hello".capitalize()`` to have the type ``Literal[str]`` similar to
the other examples we have seen in the `Inferring Literal[str]
<inferring_literal_str_>`_ section. Inferring the type ``Literal[str]``
is correct because the string is not an arbitrary user-supplied
string: we know that it has the type ``Literal["Hello"]``, which is
compatible with ``Literal[str]``. In other words, the ``capitalize``
method preserves the ``Literal[str]`` type. There are several other
``str`` methods that preserve ``Literal[str]``.
We propose updating the stub for ``str`` in typeshed so that the
methods are overloaded with the ``Literal[str]``-preserving
versions. This means type checkers do not have to hardcode
``Literal[str]`` behavior for each method. It also lets us easily
support new methods in the future by updating the typeshed stub.
For example, to preserve literal types for the ``capitalize`` method,
we would change the stub as below:
::
# before
def capitalize(self) -> str: ...
# after
@overload
def capitalize(self: Literal[str]) -> Literal[str]: ...
@overload
def capitalize(self) -> str: ...
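The same overload pattern can be sketched on a free function to show what a caller would see. This is an assumption-laden illustration: ``capitalize_preserving`` and ``read_name`` are hypothetical names, and ``Literal[str]`` is the type proposed in this PEP, so only a type checker that implements the proposal would accept these annotations. The ``from __future__ import annotations`` import keeps the annotations unevaluated at runtime.

```python
from __future__ import annotations

from typing import Literal, overload


@overload
def capitalize_preserving(s: Literal[str]) -> Literal[str]: ...
@overload
def capitalize_preserving(s: str) -> str: ...
def capitalize_preserving(s):
    # Runtime behavior is the same for both overloads; only the static
    # type seen by the checker differs.
    return s.capitalize()


def read_name() -> str:
    # Stand-in for data that is not known to be a literal (hypothetical).
    return "world"


# A checker implementing this PEP would infer Literal[str] here ...
from_literal = capitalize_preserving("hello")

# ... and plain str here, because the argument has type str.
from_runtime = capitalize_preserving(read_name())

print(from_literal, from_runtime)
```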
The downside of changing the ``str`` stub is that the stub becomes
more complicated and can make error messages harder to
understand. Type checkers may need to special-case ``str`` to make
error messages understandable for users.
Below is an exhaustive list of ``str`` methods which, when called as
indicated with arguments of type ``Literal[str]``, must be treated as
returning a ``Literal[str]``. If this PEP is accepted, we will update
these method signatures in typeshed:
::
@overload
def capitalize(self: Literal[str]) -> Literal[str]: ...
@overload
def capitalize(self) -> str: ...
@overload
def casefold(self: Literal[str]) -> Literal[str]: ...
@overload
def casefold(self) -> str: ...
@overload
def center(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
@overload
def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
if sys.version_info >= (3, 8):
@overload
def expandtabs(self: Literal[str], tabsize: SupportsIndex = ...) -> Literal[str]: ...
@overload
def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ...
else:
@overload
def expandtabs(self: Literal[str], tabsize: int = ...) -> Literal[str]: ...
@overload
def expandtabs(self, tabsize: int = ...) -> str: ...
@overload
def format(self: Literal[str], *args: Literal[str], **kwargs: Literal[str]) -> Literal[str]: ...
@overload
def format(self, *args: str, **kwargs: str) -> str: ...
@overload
def join(self: Literal[str], __iterable: Iterable[Literal[str]]) -> Literal[str]: ...
@overload
def join(self, __iterable: Iterable[str]) -> str: ...
@overload
def ljust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
@overload
def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
@overload
def lower(self: Literal[str]) -> Literal[str]: ...
@overload
def lower(self) -> str: ...
@overload
def lstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
@overload
def lstrip(self, __chars: str | None = ...) -> str: ...
@overload
def partition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...
@overload
def partition(self, __sep: str) -> tuple[str, str, str]: ...
@overload
def replace(self: Literal[str], __old: Literal[str], __new: Literal[str], __count: SupportsIndex = ...) -> Literal[str]: ...
@overload
def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ...
if sys.version_info >= (3, 9):
@overload
def removeprefix(self: Literal[str], __prefix: Literal[str]) -> Literal[str]: ...
@overload
def removeprefix(self, __prefix: str) -> str: ...
@overload
def removesuffix(self: Literal[str], __suffix: Literal[str]) -> Literal[str]: ...
@overload
def removesuffix(self, __suffix: str) -> str: ...
@overload
def rjust(self: Literal[str], __width: SupportsIndex, __fillchar: Literal[str] = ...) -> Literal[str]: ...
@overload
def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
@overload
def rpartition(self: Literal[str], __sep: Literal[str]) -> tuple[Literal[str], Literal[str], Literal[str]]: ...
@overload
def rpartition(self, __sep: str) -> tuple[str, str, str]: ...
@overload
def rsplit(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...
@overload
def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
@overload
def rstrip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
@overload
def rstrip(self, __chars: str | None = ...) -> str: ...
@overload
def split(self: Literal[str], sep: Literal[str] | None = ..., maxsplit: SupportsIndex = ...) -> list[Literal[str]]: ...
@overload
def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
@overload
def splitlines(self: Literal[str], keepends: bool = ...) -> list[Literal[str]]: ...
@overload
def splitlines(self, keepends: bool = ...) -> list[str]: ...
@overload
def strip(self: Literal[str], __chars: Literal[str] | None = ...) -> Literal[str]: ...
@overload
def strip(self, __chars: str | None = ...) -> str: ...
@overload
def swapcase(self: Literal[str]) -> Literal[str]: ...
@overload
def swapcase(self) -> str: ...
@overload
def title(self: Literal[str]) -> Literal[str]: ...
@overload
def title(self) -> str: ...
@overload
def upper(self: Literal[str]) -> Literal[str]: ...
@overload
def upper(self) -> str: ...
@overload
def zfill(self: Literal[str], __width: SupportsIndex) -> Literal[str]: ...
@overload
def zfill(self, __width: SupportsIndex) -> str: ...
@overload
def __add__(self: Literal[str], __s: Literal[str]) -> Literal[str]: ...
@overload
def __add__(self, __s: str) -> str: ...
@overload
def __iter__(self: Literal[str]) -> Iterator[str]: ...
@overload
def __iter__(self) -> Iterator[str]: ...
@overload
def __mod__(self: Literal[str], __x: Union[Literal[str], Tuple[Literal[str], ...]]) -> str: ...
@overload
def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ...
@overload
def __mul__(self: Literal[str], __n: SupportsIndex) -> Literal[str]: ...
@overload
def __mul__(self, __n: SupportsIndex) -> str: ...
@overload
def __repr__(self: Literal[str]) -> Literal[str]: ...
@overload
def __repr__(self) -> str: ...
@overload
def __rmul__(self: Literal[str], n: SupportsIndex) -> Literal[str]: ...
@overload
def __rmul__(self, n: SupportsIndex) -> str: ...
@overload
def __str__(self: Literal[str]) -> Literal[str]: ...
@overload
def __str__(self) -> str: ...
Appendix D: Guidelines for using ``Literal[str]`` in Stubs
==========================================================
Libraries that do not contain type annotations within their source may
specify type stubs in Typeshed. Libraries written in other languages,
such as those for machine learning, may also provide Python type
stubs. In both cases, the type checker cannot verify that the type
annotations match the source code and must trust the type stub. Thus,
authors of type stubs need to be careful when using ``Literal[str]``
since a function may falsely appear to be safe when it is not.
We recommend the following guidelines for using ``Literal[str]`` in stubs:
+ If the stub is for a function, we recommend using ``Literal[str]``
in the return type of the function or of its overloads only if all
the corresponding arguments have literal types (i.e.,
``Literal[str]`` or ``Literal["a", "b"]``).
::
# OK
@overload
def my_transform(x: Literal[str], y: Literal["a", "b"]) -> Literal[str]: ...
@overload
def my_transform(x: str, y: str) -> str: ...
# Not OK
@overload
def my_transform(x: Literal[str], y: str) -> Literal[str]: ...
@overload
def my_transform(x: str, y: str) -> str: ...
+ If the stub is for a ``staticmethod``, we recommend the same
guideline as above.
+ If the stub is for any other kind of method, we recommend against
using ``Literal[str]`` in the return type of the method or any of
its overloads. This is because, even if all the explicit arguments
have type ``Literal[str]``, the object itself may be created using
user data and thus the return type may be user-controlled.
+ If the stub is for a class attribute or global variable, we also
recommend against using ``Literal[str]`` because untyped code
may write arbitrary values to the attribute or variable.
However, we leave the final call to the library author. They may use
``Literal[str]`` if they feel confident that the string returned by
the method or function or the string stored in the attribute is
guaranteed to have a literal type, i.e., the string is created by
applying only literal-preserving ``str`` operations to a string
literal.
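As a sketch of why the guideline for methods matters, consider the hypothetical class below (``Greeter`` and its members are invented for illustration). Even when the explicit argument is a literal, the result depends on state supplied when the object was constructed, so a stub that declared the return type as ``Literal[str]`` would be unsound.

```python
class Greeter:
    """Hypothetical library class whose source the type checker cannot see."""

    def __init__(self, name: str) -> None:
        # The object may be constructed from user-controlled data.
        self._name = name

    def greeting(self, prefix: str) -> str:
        # The result mixes the explicit argument with stored state.
        return prefix + self._name


user_input = "Robert'); DROP TABLE students;--"
greeter = Greeter(user_input)

# The explicit argument is a literal, yet the returned string still
# contains the user-controlled value.
result = greeter.greeting("hello ")
print(result)
```

Annotating ``greeting`` as returning ``str`` in the stub keeps such values from being accepted where ``Literal[str]`` is required.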
Note that these guidelines do not apply to inline type annotations
since the type checker can verify that, say, a method returning
``Literal[str]`` does in fact return an expression of that type.
Resources
=========
@ -936,7 +1228,8 @@ Thanks
Thanks to the following people for their feedback on the PEP:
Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, and Shengye Wan
Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев,
CAM Gerlach, and Shengye Wan
Copyright
=========