python-peps/pep-0688.rst

360 lines
14 KiB
ReStructuredText

PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Discussions-To: https://discuss.python.org/t/15265
Status: Draft
Type: Standards Track
Topic: Typing
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12
Post-History: `23-Apr-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/CX7GPSIYQEL23RXMYL66GAKGP4RLUD7P/>`__,
`25-Apr-2022 <https://discuss.python.org/t/15265>`__
Abstract
========
This PEP proposes a Python-level API for the buffer protocol,
which is currently accessible only to C code. This allows type
checkers to evaluate whether objects implement the protocol.
Motivation
==========
The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object—the `buffer protocol <https://docs.python.org/3/c-api/buffer.html>`__
introduced in :pep:`3118`.
Functions that accept binary data are usually written to handle any
object implementing the buffer protocol. For example, at the time of writing,
there are around 130 functions in CPython using the Argument Clinic
``Py_buffer`` type, which accepts the buffer protocol.
Currently, there is no way for Python code to inspect whether an object
supports the buffer protocol. Moreover, the static type system
does not provide a type annotation to represent the protocol.
This is a `common problem <https://github.com/python/typing/issues/593>`__
when writing type annotations for code that accepts generic buffers.
Similarly, it is impossible for a class written in Python to support
the buffer protocol. A buffer class in
Python would give users the ability to easily wrap a C buffer object, or to test
the behavior of an API that consumes the buffer protocol. Granted, this is not
a particularly common need. However, there has been a
`CPython feature request <https://github.com/python/cpython/issues/58006>`__
for supporting buffer classes written in Python that has been open since 2012.
Rationale
=========
Current options
---------------
There are two known workarounds for annotating buffer types in
the type system, but neither is adequate.
First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
approach works for the standard library, but it does not extend to
third-party buffer types.
Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
for ``typing.ByteString`` currently states:
This type represents the types ``bytes``, ``bytearray``, and
``memoryview`` of byte sequences.
As a shorthand for this type, ``bytes`` can be used to annotate
arguments of any of the types mentioned above.
Although this sentence has been in the documentation
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
the use of ``bytes`` to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
makes the ``bytes`` type ambiguous in type annotations. After all,
there are many operations that are valid on ``bytes`` objects, but
not on ``memoryview`` objects, and it is perfectly possible for
a function to accept ``bytes`` but not ``memoryview`` objects.
A mypy user
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
that this shortcut has caused significant problems for the ``psycopg`` project.
Kinds of buffers
----------------
The C buffer protocol supports
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
affecting strides, contiguity, and support for writing to the buffer. Some of these
options would be useful in the type system. For example, typeshed
currently provides separate type aliases for writable and read-only
buffers.
However, in the C buffer protocol, most of these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a particular flag is to actually
ask for the buffer. For some types, such as ``memoryview``,
the supported flags depend on the instance. As a result, it would
be difficult to represent support for these flags in the type system.
There is one exception: if a buffer is writable, it must define the
``bf_releasebuffer`` slot, so that the buffer can be released when
a consumer is done writing to it. For read-only buffers, such as
``bytes``, there is generally no need for this slot. Therefore, the
presence of this slot can be used to determine whether a buffer is
writable.
Specification
=============
Python-level buffer protocol
----------------------------
We propose to add two Python-level special methods, ``__buffer__``
and ``__release_buffer__``. Python
classes that implement these methods are usable as buffers from C
code. Conversely, classes implemented in C that support the
buffer protocol acquire synthesized methods accessible from Python
code.
The ``__buffer__`` method corresponds to the ``bf_getbuffer`` C slot,
which is called to create a buffer from a Python object.
The Python signature for this method is
``def __buffer__(self, flags: int, /) -> memoryview: ...``. The method
must return a ``memoryview`` object. If the method is called from C
code, the interpreter extracts the underlying ``Py_buffer`` from the
``memoryview`` and returns it to the C caller. Similarly, if the
``__buffer__`` method is called on an instance of a C class that
implements ``bf_getbuffer``, the returned buffer is wrapped in a
``memoryview`` for consumption by Python code.
The ``__release_buffer__`` method corresponds to the
``bf_releasebuffer`` C slot, which is called when a caller no
longer needs the buffer returned by ``bf_getbuffer``. This is an
optional part of the buffer protocol, and read-only buffers can
generally omit it. The Python signature for this method is
``def __release_buffer__(self, buffer: memoryview, /) -> None: ...``.
The buffer to be released is wrapped in a ``memoryview``. It is
also possible to call ``__release_buffer__`` on a C class that
implements ``bf_releasebuffer``.
inspect.BufferFlags
-------------------
To help implementations of ``__buffer__``, we add ``inspect.BufferFlags``,
a subclass of ``enum.IntFlag``. This enum contains all flags defined in the
C buffer protocol. For example, ``inspect.BufferFlags.SIMPLE`` has the same
value as the ``PyBUF_SIMPLE`` constant.
collections.abc.Buffer and MutableBuffer
----------------------------------------
We add two new abstract base classes, ``collections.abc.Buffer`` and
``collections.abc.MutableBuffer``. The former requires the ``__buffer__``
method; the latter requires both ``__buffer__`` and ``__release_buffer__``.
These classes are intended primarily for use in type annotations:
.. code-block:: python
def need_buffer(b: Buffer) -> memoryview:
return memoryview(b)
need_buffer(b"xy") # ok
need_buffer("xy") # rejected by static type checkers
def need_mutable_buffer(b: MutableBuffer) -> None:
view = memoryview(b)
view[0] = 42
need_mutable_buffer(bytearray(b"xy")) # ok
need_mutable_buffer(b"xy") # rejected by static type checkers
They can also be used in ``isinstance`` and ``issubclass`` checks:
.. code-block:: pycon
>>> from collections.abc import Buffer
>>> isinstance(b"xy", Buffer)
True
>>> issubclass(bytes, Buffer)
True
>>> issubclass(memoryview, Buffer)
True
>>> isinstance("xy", Buffer)
False
>>> issubclass(str, Buffer)
False
In the typeshed stub files, these classes should be defined as ``Protocol``\ s,
following the precedent of other simple ABCs in ``collections.abc`` such as
``collections.abc.Iterable`` or ``collections.abc.Sized``.
Example
-------
The following is an example of a Python class that implements the
buffer protocol:
.. code-block:: python
class MyBuffer:
def __init__(self, data: bytes):
self.data = bytearray(data)
self.view = memoryview(self.data)
self.held = False
def __buffer__(self, flags: int) -> memoryview:
if flags != inspect.BufferFlags.FULL_RO:
raise TypeError("Only BufferFlags.FULL_RO supported")
if self.held:
raise RuntimeError("Buffer already held")
self.held = True
return self.view
def __release_buffer__(self, view: memoryview) -> None:
assert self.view is view
self.held = False
buffer = MyBuffer(b"capybara")
with memoryview(buffer) as view:
view[0] = ord("C")
with memoryview(buffer) as view:
assert view.tobytes() == b"Capybara"
Equivalent for older Python versions
------------------------------------
New typing features are usually backported to older Python versions
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_
package. Because the buffer protocol
is currently accessible only in C, this PEP cannot be fully implemented
in a pure-Python package like ``typing_extensions``. As a temporary
workaround, two abstract base classes, ``typing_extensions.Buffer``
and ``typing_extensions.MutableBuffer``, will be provided for Python versions
that do not have ``collections.abc.Buffer`` available.
After this PEP is implemented, inheriting from ``collections.abc.Buffer`` will
not be necessary to indicate that an object supports the buffer protocol.
However, in older Python versions, it will be necessary to explicitly
inherit from ``typing_extensions.Buffer`` to indicate to type checkers that
a class supports the buffer protocol, since objects supporting the buffer
protocol will not have a ``__buffer__`` method. It is expected that this
will happen primarily in stub files, because buffer classes are necessarily
implemented in C code, which cannot have types defined inline.
For runtime uses, the ``ABC.register`` API can be used to register
buffer classes with ``typing_extensions.Buffer``.
No special meaning for ``bytes``
--------------------------------
The special case stating that ``bytes`` may be used as a shorthand
for other ``ByteString`` types will be removed from the ``typing``
documentation.
With ``types.Buffer`` available as an alternative, there will be no good
reason to allow ``bytes`` as a shorthand.
We suggest that type checkers currently implementing this behavior
should deprecate and eventually remove it.
Backwards Compatibility
=======================
As the runtime changes in this PEP only add new functionality, there are
no backwards compatibility concerns.
However, the recommendation to remove the special behavior for
``bytes`` in type checkers does have a backwards compatibility
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__
with mypy shows that several major open source projects that use it
for type checking will see new errors if the ``bytes`` promotion
is removed. Many of these errors can be fixed by improving
the stubs in typeshed, as has already been done for the
`builtins <https://github.com/python/typeshed/pull/7631>`__,
`binascii <https://github.com/python/typeshed/pull/7677>`__,
`pickle <https://github.com/python/typeshed/pull/7678>`__, and
`re <https://github.com/python/typeshed/pull/7679>`__ modules.
Overall, the change improves type safety and makes the type system
more consistent, so we believe the migration cost is worth it.
How to Teach This
=================
We will add notes pointing to ``collections.abc.Buffer`` in appropriate places in the
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
Type checkers may provide additional pointers in their error messages. For example,
when they encounter a buffer object being passed to a function that
is annotated to only accept ``bytes``, the error message could include a note suggesting
the use of ``collections.abc.Buffer`` instead.
Reference Implementation
========================
An implementation of this PEP is
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:pep688v2?expand=1>`__
in the author's fork.
Rejected Ideas
==============
types.Buffer
------------
An earlier version of this PEP proposed adding a new ``types.Buffer`` type with
an ``__instancecheck__`` implemented in C so that ``isinstance()`` checks can be
used to check whether a type implements the buffer protocol. This avoids the
complexity of exposing the full buffer protocol to Python code, while still
allowing the type system to check for the buffer protocol.
However, that approach
does not compose well with the rest of the type system, because ``types.Buffer``
would be a nominal type, not a structural one. For example, there would be no way
to represent "an object that supports both the buffer protocol and ``__len__``". With
the current proposal, ``__buffer__`` is like any other special method, so a
``Protocol`` can be defined combining it with another method.
More generally, no other part of Python works like the proposed ``types.Buffer``.
The current proposal is more consistent with the rest of the language, where
C-level slots usually have corresponding Python-level special methods.
Keep ``bytearray`` compatible with ``bytes``
--------------------------------------------
It has been suggested to remove the special case where ``memoryview`` is
always compatible with ``bytes``, but keep it for ``bytearray``, because
the two types have very similar interfaces. However, several standard
library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
not a very common type. We prefer to have users spell out accepted types
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
methods is required).
Open Issues
===========
Where should the ``Buffer`` and ``MutableBuffer`` ABCs live? This PEP
proposes putting them in ``collections.abc`` like most other ABCs,
but buffers are not "collections" in the original sense of the word.
Alternatively, these classes could be put in ``typing``, like
``SupportsInt`` and several other simple protocols.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.