From 0cb0066fbadeb458235d6c49fe7d82fe04a3dcba Mon Sep 17 00:00:00 2001
From: Jelle Zijlstra
Date: Sun, 24 Apr 2022 20:38:57 -0700
Subject: [PATCH] PEP 688: Making the buffer protocol accessible in Python
 (#2549)

---
 .github/CODEOWNERS |   1 +
 pep-0688.rst       | 275 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 pep-0688.rst

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 5ac9e42e6..3662785b5 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -568,6 +568,7 @@ pep-0684.rst @ericsnowcurrently
 pep-0685.rst @brettcannon
 pep-0686.rst @methane
 pep-0687.rst @encukou
+pep-0688.rst @jellezijlstra
 # ...
 # pep-0754.txt
 # ...
diff --git a/pep-0688.rst b/pep-0688.rst
new file mode 100644
index 000000000..7544f0331
--- /dev/null
+++ b/pep-0688.rst
@@ -0,0 +1,275 @@
+PEP: 688
+Title: Making the buffer protocol accessible in Python
+Author: Jelle Zijlstra
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 23-Apr-2022
+Python-Version: 3.12
+
+
+Abstract
+========
+
+This PEP proposes a mechanism for Python code to inspect whether a
+type supports the C-level buffer protocol. This allows type
+checkers to evaluate whether objects implement the protocol.
+
+
+Motivation
+==========
+
+The CPython C API provides a versatile mechanism for accessing the
+underlying memory of an object—the buffer protocol from :pep:`3118`.
+Functions that accept binary data are usually written to handle any
+object implementing the buffer protocol. For example, at the time of writing,
+there are around 130 functions in CPython using the Argument Clinic
+``Py_buffer`` type, which accepts the buffer protocol.
+
+Currently, there is no way for Python code to inspect whether an object
+supports the buffer protocol. Moreover, the static type system
+does not provide a type annotation to represent the protocol.
+This is a `common problem `__
+when writing type annotations for code that accepts generic buffers.
+
+
+Rationale
+=========
+
+Current options
+---------------
+
+There are two current workarounds for annotating buffer types in
+the type system, but neither is adequate.
+
+First, the `current workaround `__
+for buffer types in typeshed is a type alias
+that lists well-known buffer types in the standard library, such as
+``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
+approach works for the standard library, but it does not extend to
+third-party buffer types.
+
+Second, the `documentation `__
+for ``typing.ByteString`` currently states:
+
+   This type represents the types ``bytes``, ``bytearray``, and
+   ``memoryview`` of byte sequences.
+
+   As a shorthand for this type, ``bytes`` can be used to annotate
+   arguments of any of the types mentioned above.
+
+Although this sentence has been in the documentation
+`since 2015 `__,
+the use of ``bytes`` to include these other types is not specified
+in any of the typing PEPs. Furthermore, this mechanism has a number of
+problems. It does not include all possible buffer types, and it
+makes the ``bytes`` type ambiguous in type annotations. After all,
+there are many operations that are valid on ``bytes`` objects, but
+not on ``memoryview`` objects, and it is perfectly possible for
+a function to accept ``bytes`` but not ``memoryview`` objects.
+A mypy user
+`reports `__
+that this shortcut has caused significant problems for the ``psycopg`` project.
+
+Kinds of buffers
+----------------
+
+The C buffer protocol supports
+`many options `__,
+affecting strides, contiguity, and support for writing to the buffer. Some of these
+options would be useful in the type system. For example, typeshed
+currently provides separate type aliases for writable and read-only
+buffers.
+
+However, in the C buffer protocol, these options cannot be
+queried directly on the type object. The only way to figure out
+whether an object supports a writable buffer is to actually
+ask for the buffer. For some types, such as ``memoryview``,
+whether the buffer is writable depends on the instance:
+some instances are read-only and others are not. As such, we propose to
+expose only whether a type implements the buffer protocol at
+all, not whether it supports more specific options such as
+writable buffers.
+
+Specification
+=============
+
+types.Buffer
+------------
+
+A new class, ``types.Buffer``, will be added. It cannot be instantiated or
+subclassed at runtime, but supports the ``__instancecheck__`` and
+``__subclasscheck__`` hooks. In CPython, these will check for the presence of the
+``bf_getbuffer`` slot in the type object:
+
+.. code-block:: pycon
+
+   >>> from types import Buffer
+   >>> isinstance(b"xy", Buffer)
+   True
+   >>> issubclass(bytes, Buffer)
+   True
+   >>> issubclass(memoryview, Buffer)
+   True
+   >>> isinstance("xy", Buffer)
+   False
+   >>> issubclass(str, Buffer)
+   False
+
+The new class can also be used in type annotations:
+
+.. code-block:: python
+
+   def need_buffer(b: Buffer) -> memoryview:
+       return memoryview(b)
+
+   need_buffer(b"xy")  # ok
+   need_buffer("xy")  # rejected by static type checkers
+
+Usage in stub files
+-------------------
+
+For static typing purposes, types defined in C extensions usually
+require stub files, as :pep:`described in PEP 484 <484#stub-files>`.
+In stub files, ``types.Buffer`` may be used as a base class to
+indicate that a class implements the buffer protocol.
+
+For example, ``memoryview`` may be declared as follows in a stub:
+
+.. code-block:: python
+
+   class memoryview(types.Buffer, Sized, Sequence[int]):
+       ...
+
+The ``types.Buffer`` class does not require any special treatment
+by type checkers.
+
+Equivalent for older Python versions
+------------------------------------
+
+New typing features are usually backported to older Python versions
+in the `typing_extensions `_
+package. Because the buffer protocol
+is accessible only in C, ``types.Buffer`` cannot be implemented
+in a pure-Python package like ``typing_extensions``. As a temporary
+workaround, a ``typing_extensions.Buffer``
+`abstract base class `__ will be provided for Python versions
+that do not have ``types.Buffer`` available.
+
+For the benefit of
+static type checkers, ``typing_extensions.Buffer`` can be used as
+a base class in stubs to mark types as supporting the buffer protocol.
+For runtime uses, the ``ABC.register`` API can be used to register
+buffer classes with ``typing_extensions.Buffer``.
+
+When ``types.Buffer`` is available, ``typing_extensions`` should simply
+re-export it. Thus, users who register their buffer class manually
+with ``typing_extensions.Buffer.register`` should use a guard to make
+sure their code continues to work once ``types.Buffer`` is in the
+standard library.
+
+
+No special meaning for ``bytes``
+--------------------------------
+
+The special case stating that ``bytes`` may be used as a shorthand
+for other ``ByteString`` types will be removed from the ``typing``
+documentation.
+With ``types.Buffer`` available as an alternative, there will be no good
+reason to allow ``bytes`` as a shorthand.
+We suggest that type checkers currently implementing this behavior
+should deprecate and eventually remove it.
+
+
+Backwards Compatibility
+=======================
+
+As the runtime changes in this PEP only add a new class, there are
+no backwards compatibility concerns.
+
+However, the recommendation to remove the special behavior for
+``bytes`` in type checkers does have a backwards compatibility
+impact on their users. An `experiment `__
+with mypy shows that several major open source projects that use it
+for type checking will see new errors if the ``bytes`` promotion
+is removed. Many of these errors can be fixed by improving
+the stubs in typeshed, as has already been done for the
+`builtins `__,
+`binascii `__,
+`pickle `__, and
+`re `__ modules.
+Overall, the change improves type safety and makes the type system
+more consistent, so we believe the migration cost is worth it.
+
+
+How to Teach This
+=================
+
+We will add notes pointing to ``types.Buffer`` in appropriate places in the
+documentation, such as `typing.readthedocs.io `__
+and the `mypy cheat sheet `__.
+Type checkers may provide additional pointers in their error messages. For example,
+when they encounter a buffer object being passed to a function that
+is annotated to only accept ``bytes``, the error message could include a note suggesting
+the use of ``types.Buffer`` instead.
+
+
+Reference Implementation
+========================
+
+An implementation of ``types.Buffer`` is
+`available `__
+in the author's fork.
+
+
+Rejected Ideas
+==============
+
+Buffer ABC
+----------
+
+An `earlier proposal `__ suggested
+adding a ``collections.abc.Buffer``
+`abstract base class `__
+to represent buffer objects. This idea
+stalled because an ABC with no methods does not fit well into the ``collections.abc``
+module. Furthermore, it required manual registration of buffer classes, including
+those in the standard library. This PEP's approach of using the ``__instancecheck__``
+hook is more natural and does not require explicit registration.
+
+Nevertheless, the ABC proposal has the advantage that it does not require C changes.
+This PEP proposes to adopt a version of it in the third-party ``typing_extensions``
+package for the benefit of users of older Python versions.
+
+Keep ``bytearray`` compatible with ``bytes``
+--------------------------------------------
+
+It has been suggested to remove the special case where ``memoryview`` is
+always compatible with ``bytes``, but keep it for ``bytearray``, because
+the two types have very similar interfaces. However, several standard
+library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept
+``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
+not a very common type. We prefer to have users spell out accepted types
+explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
+methods is required).
+
+
+Open Issues
+===========
+
+Read-only and writable buffers
+------------------------------
+
+To avoid making changes to the buffer protocol itself, this PEP currently
+does not provide a way to distinguish between read-only and writable buffers.
+That's unfortunate, because some APIs require a writable buffer, and one of
+the most common buffer types (``bytes``) is always read-only.
+Should we add a new mechanism in C to declare that a type implementing the
+buffer protocol is potentially writable?
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.