PEP 688: Making the buffer protocol accessible in Python (#2549)

This commit is contained in:
Jelle Zijlstra 2022-04-24 20:38:57 -07:00 committed by GitHub
parent 24dec4419d
commit 0cb0066fba
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 276 additions and 0 deletions

1
.github/CODEOWNERS vendored
View File

@ -568,6 +568,7 @@ pep-0684.rst @ericsnowcurrently
pep-0685.rst @brettcannon
pep-0686.rst @methane
pep-0687.rst @encukou
pep-0688.rst @jellezijlstra
# ...
# pep-0754.txt
# ...

275
pep-0688.rst Normal file
View File

@ -0,0 +1,275 @@
PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12
Abstract
========
This PEP proposes a mechanism for Python code to inspect whether a
type supports the C-level buffer protocol. This allows type
checkers to evaluate whether objects implement the protocol.
Motivation
==========
The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object—the buffer protocol from :pep:`3118`.
Functions that accept binary data are usually written to handle any
object implementing the buffer protocol. For example, at the time of writing,
there are around 130 functions in CPython using the Argument Clinic
``Py_buffer`` type, which accepts the buffer protocol.
Currently, there is no way for Python code to inspect whether an object
supports the buffer protocol. Moreover, the static type system
does not provide a type annotation to represent the protocol.
This is a `common problem <https://github.com/python/typing/issues/593>`__
when writing type annotations for code that accepts generic buffers.
Rationale
=========
Current options
---------------
There are two current workarounds for annotating buffer types in
the type system, but neither is adequate.
First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
approach works for the standard library, but it does not extend to
third-party buffer types.
Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
for ``typing.ByteString`` currently states:
This type represents the types ``bytes``, ``bytearray``, and
``memoryview`` of byte sequences.
As a shorthand for this type, ``bytes`` can be used to annotate
arguments of any of the types mentioned above.
Although this sentence has been in the documentation
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
the use of ``bytes`` to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
makes the ``bytes`` type ambiguous in type annotations. After all,
there are many operations that are valid on ``bytes`` objects, but
not on ``memoryview`` objects, and it is perfectly possible for
a function to accept ``bytes`` but not ``memoryview`` objects.
A mypy user
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
that this shortcut has caused significant problems for the ``psycopg`` project.
Kinds of buffers
----------------
The C buffer protocol supports
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
affecting strides, contiguity, and support for writing to the buffer. Some of these
options would be useful in the type system. For example, typeshed
currently provides separate type aliases for writable and read-only
buffers.
However, in the C buffer protocol, these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a writable buffer is to actually
ask for the buffer. For some types, such as ``memoryview``,
whether the buffer is writable depends on the instance:
some instances are read-only and others are not. As such, we propose to
expose only whether a type implements the buffer protocol at
all, not whether it supports more specific options such as
writable buffers.
Specification
=============
types.Buffer
------------
A new class, ``types.Buffer``, will be added. It cannot be instantiated or
subclassed at runtime, but supports the ``__instancecheck__`` and
``__subclasscheck__`` hooks. In CPython, these will check for the presence of the
``bf_getbuffer`` slot in the type object:
.. code-block:: pycon
>>> from types import Buffer
>>> isinstance(b"xy", Buffer)
True
>>> issubclass(bytes, Buffer)
True
>>> issubclass(memoryview, Buffer)
True
>>> isinstance("xy", Buffer)
False
>>> issubclass(str, Buffer)
False
The new class can also be used in type annotations:
.. code-block:: python
def need_buffer(b: Buffer) -> memoryview:
return memoryview(b)
need_buffer(b"xy") # ok
need_buffer("xy") # rejected by static type checkers
Usage in stub files
-------------------
For static typing purposes, types defined in C extensions usually
require stub files, as :pep:`described in PEP 484 <484#stub-files>`.
In stub files, ``types.Buffer`` may be used as a base class to
indicate that a class implements the buffer protocol.
For example, ``memoryview`` may be declared as follows in a stub:
.. code-block:: python
class memoryview(types.Buffer, Sized, Sequence[int]):
...
The ``types.Buffer`` class does not require any special treatment
by type checkers.
Equivalent for older Python versions
------------------------------------
New typing features are usually backported to older Python versions
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_
package. Because the buffer protocol
is accessible only in C, ``types.Buffer`` cannot be implemented
in a pure-Python package like ``typing_extensions``. As a temporary
workaround, a ``typing_extensions.Buffer``
`abstract base class <Buffer ABC_>`__ will be provided for Python versions
that do not have ``types.Buffer`` available.
For the benefit of
static type checkers, ``typing_extensions.Buffer`` can be used as
a base class in stubs to mark types as supporting the buffer protocol.
For runtime uses, the ``ABC.register`` API can be used to register
buffer classes with ``typing_extensions.Buffer``.
When ``types.Buffer`` is available, ``typing_extensions`` should simply
re-export it. Thus, users who register their buffer class manually
with ``typing_extensions.Buffer.register`` should use a guard to make
sure their code continues to work once ``types.Buffer`` is in the
standard library.
No special meaning for ``bytes``
--------------------------------
The special case stating that ``bytes`` may be used as a shorthand
for other ``ByteString`` types will be removed from the ``typing``
documentation.
With ``types.Buffer`` available as an alternative, there will be no good
reason to allow ``bytes`` as a shorthand.
We suggest that type checkers currently implementing this behavior
should deprecate and eventually remove it.
Backwards Compatibility
=======================
As the runtime changes in this PEP only add a new class, there are
no backwards compatibility concerns.
However, the recommendation to remove the special behavior for
``bytes`` in type checkers does have a backwards compatibility
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__
with mypy shows that several major open source projects that use it
for type checking will see new errors if the ``bytes`` promotion
is removed. Many of these errors can be fixed by improving
the stubs in typeshed, as has already been done for the
`builtins <https://github.com/python/typeshed/pull/7631>`__,
`binascii <https://github.com/python/typeshed/pull/7677>`__,
`pickle <https://github.com/python/typeshed/pull/7678>`__, and
`re <https://github.com/python/typeshed/pull/7679>`__ modules.
Overall, the change improves type safety and makes the type system
more consistent, so we believe the migration cost is worth it.
How to Teach This
=================
We will add notes pointing to ``types.Buffer`` in appropriate places in the
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
Type checkers may provide additional pointers in their error messages. For example,
when they encounter a buffer object being passed to a function that
is annotated to only accept ``bytes``, the error message could include a note suggesting
the use of ``types.Buffer`` instead.
Reference Implementation
========================
An implementation of ``types.Buffer`` is
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:typesbuffer?expand=1>`__
in the author's fork.
Rejected Ideas
==============
Buffer ABC
----------
An `earlier proposal <https://github.com/python/cpython/issues/71688>`__ suggested
adding a ``collections.abc.Buffer``
`abstract base class <https://docs.python.org/3/glossary.html#term-abstract-base-class>`__
to represent buffer objects. This idea
stalled because an ABC with no methods does not fit well into the ``collections.abc``
module. Furthermore, it required manual registration of buffer classes, including
those in the standard library. This PEP's approach of using the ``__instancecheck__``
hook is more natural and does not require explicit registration.
Nevertheless, the ABC proposal has the advantage that it does not require C changes.
This PEP proposes to adopt a version of it in the third-party ``typing_extensions``
package for the benefit of users of older Python versions.
Keep ``bytearray`` compatible with ``bytes``
--------------------------------------------
It has been suggested to remove the special case where ``memoryview`` is
always compatible with ``bytes``, but keep it for ``bytearray``, because
the two types have very similar interfaces. However, several standard
library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
not a very common type. We prefer to have users spell out accepted types
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
methods is required).
Open Issues
===========
Read-only and writable buffers
------------------------------
To avoid making changes to the buffer protocol itself, this PEP currently
does not provide a way to distinguish between read-only and writable buffers.
That's unfortunate, because some APIs require a writable buffer, and one of
the most common buffer types (``bytes``) is always read-only.
Should we add a new mechanism in C to declare that a type implementing the
buffer protocol is potentially writable?
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.