444 lines
19 KiB
ReStructuredText
444 lines
19 KiB
ReStructuredText
PEP: 688
|
|
Title: Making the buffer protocol accessible in Python
|
|
Author: Jelle Zijlstra <jelle.zijlstra@gmail.com>
|
|
Discussions-To: https://discuss.python.org/t/19756
|
|
Status: Accepted
|
|
Type: Standards Track
|
|
Topic: Typing
|
|
Content-Type: text/x-rst
|
|
Created: 23-Apr-2022
|
|
Python-Version: 3.12
|
|
Post-History: `23-Apr-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/CX7GPSIYQEL23RXMYL66GAKGP4RLUD7P/>`__,
|
|
`25-Apr-2022 <https://discuss.python.org/t/15265>`__,
|
|
`06-Oct-2022 <https://discuss.python.org/t/19756>`__,
|
|
`26-Oct-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/XH5ZK2MSZIQLL62PYZ6I5532SQKKVCBL/>`__
|
|
Resolution: https://discuss.python.org/t/pep-688-making-the-buffer-protocol-accessible-in-python/15265/35
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a Python-level API for the buffer protocol,
|
|
which is currently accessible only to C code. This allows type
|
|
checkers to evaluate whether objects implement the protocol.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
The CPython C API provides a versatile mechanism for accessing the
|
|
underlying memory of an object—the `buffer protocol <https://docs.python.org/3/c-api/buffer.html>`__
|
|
introduced in :pep:`3118`.
|
|
Functions that accept binary data are usually written to handle any
|
|
object implementing the buffer protocol. For example, at the time of writing,
|
|
there are around 130 functions in CPython using the Argument Clinic
|
|
``Py_buffer`` type, which accepts the buffer protocol.
|
|
|
|
Currently, there is no way for Python code to inspect whether an object
|
|
supports the buffer protocol. Moreover, the static type system
|
|
does not provide a type annotation to represent the protocol.
|
|
This is a `common problem <https://github.com/python/typing/issues/593>`__
|
|
when writing type annotations for code that accepts generic buffers.
|
|
|
|
Similarly, it is impossible for a class written in Python to support
|
|
the buffer protocol. A buffer class in
|
|
Python would give users the ability to easily wrap a C buffer object, or to test
|
|
the behavior of an API that consumes the buffer protocol. Granted, this is not
|
|
a particularly common need. However, there has been a
|
|
`CPython feature request <https://github.com/python/cpython/issues/58006>`__
|
|
for supporting buffer classes written in Python that has been open since 2012.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Current options
|
|
---------------
|
|
|
|
There are two known workarounds for annotating buffer types in
|
|
the type system, but neither is adequate.
|
|
|
|
First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
|
|
for buffer types in typeshed is a type alias
|
|
that lists well-known buffer types in the standard library, such as
|
|
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
|
|
approach works for the standard library, but it does not extend to
|
|
third-party buffer types.
|
|
|
|
Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
|
|
for ``typing.ByteString`` currently states:
|
|
|
|
This type represents the types ``bytes``, ``bytearray``, and
|
|
``memoryview`` of byte sequences.
|
|
|
|
As a shorthand for this type, ``bytes`` can be used to annotate
|
|
arguments of any of the types mentioned above.
|
|
|
|
Although this sentence has been in the documentation
|
|
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
|
|
the use of ``bytes`` to include these other types is not specified
|
|
in any of the typing PEPs. Furthermore, this mechanism has a number of
|
|
problems. It does not include all possible buffer types, and it
|
|
makes the ``bytes`` type ambiguous in type annotations. After all,
|
|
there are many operations that are valid on ``bytes`` objects, but
|
|
not on ``memoryview`` objects, and it is perfectly possible for
|
|
a function to accept ``bytes`` but not ``memoryview`` objects.
|
|
A mypy user
|
|
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
|
|
that this shortcut has caused significant problems for the ``psycopg`` project.
|
|
|
|
Kinds of buffers
|
|
----------------
|
|
|
|
The C buffer protocol supports
|
|
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
|
|
affecting strides, contiguity, and support for writing to the buffer. Some of these
|
|
options would be useful in the type system. For example, typeshed
|
|
currently provides separate type aliases for writable and read-only
|
|
buffers.
|
|
|
|
However, in the C buffer protocol, most of these options cannot be
|
|
queried directly on the type object. The only way to figure out
|
|
whether an object supports a particular flag is to actually
|
|
ask for the buffer. For some types, such as ``memoryview``,
|
|
the supported flags depend on the instance. As a result, it would
|
|
be difficult to represent support for these flags in the type system.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Python-level buffer protocol
|
|
----------------------------
|
|
|
|
We propose to add two Python-level special methods, ``__buffer__``
|
|
and ``__release_buffer__``. Python
|
|
classes that implement these methods are usable as buffers from C
|
|
code. Conversely, classes implemented in C that support the
|
|
buffer protocol acquire synthesized methods accessible from Python
|
|
code.
|
|
|
|
The ``__buffer__`` method is called to create a buffer from a Python
|
|
object, for example by the ``memoryview()`` constructor.
|
|
It corresponds to the ``bf_getbuffer`` C slot.
|
|
The Python signature for this method is
|
|
``def __buffer__(self, flags: int, /) -> memoryview: ...``. The method
|
|
must return a ``memoryview`` object. If the ``bf_getbuffer`` slot
|
|
is invoked on a Python class with a ``__buffer__`` method,
|
|
the interpreter extracts the underlying ``Py_buffer`` from the
|
|
``memoryview`` returned by the method
|
|
and returns it to the C caller. Similarly, if Python code calls the
|
|
``__buffer__`` method on an instance of a C class that
|
|
implements ``bf_getbuffer``, the returned buffer is wrapped in a
|
|
``memoryview`` for consumption by Python code.
|
|
|
|
The ``__release_buffer__`` method should be called when a caller no
|
|
longer needs the buffer returned by ``__buffer__``. It corresponds to the
|
|
``bf_releasebuffer`` C slot. This is an
|
|
optional part of the buffer protocol.
|
|
The Python signature for this method is
|
|
``def __release_buffer__(self, buffer: memoryview, /) -> None: ...``.
|
|
The buffer to be released is wrapped in a ``memoryview``. When this
|
|
method is invoked through CPython's buffer API (for example, through
|
|
calling ``memoryview.release`` on a ``memoryview`` returned by
|
|
``__buffer__``), the passed ``memoryview`` is the same object
|
|
as was returned by ``__buffer__``. It is
|
|
also possible to call ``__release_buffer__`` on a C class that
|
|
implements ``bf_releasebuffer``.
|
|
|
|
If ``__release_buffer__`` exists on an object,
|
|
Python code that calls ``__buffer__`` directly on the object must
|
|
call ``__release_buffer__`` on the same object when it is done
|
|
with the buffer. Otherwise, resources used by the object may
|
|
not be reclaimed. Similarly, it is a programming error
|
|
to call ``__release_buffer__`` without a previous call to
|
|
``__buffer__``, or to call it multiple times for a single call
|
|
to ``__buffer__``. For objects that implement the C buffer protocol,
|
|
calls to ``__release_buffer__`` where the argument is not a
|
|
``memoryview`` wrapping the same object will raise an exception.
|
|
After a valid call to ``__release_buffer__``, the ``memoryview``
|
|
is invalidated (as if its ``release()`` method had been called),
|
|
and any subsequent calls to ``__release_buffer__`` with the same
|
|
``memoryview`` will raise an exception.
|
|
The interpreter will ensure that misuse
|
|
of the Python API will not break invariants at the C level -- for
|
|
example, it will not cause memory safety violations.
|
|
|
|
``inspect.BufferFlags``
|
|
-----------------------
|
|
|
|
To help implementations of ``__buffer__``, we add ``inspect.BufferFlags``,
|
|
a subclass of ``enum.IntFlag``. This enum contains all flags defined in the
|
|
C buffer protocol. For example, ``inspect.BufferFlags.SIMPLE`` has the same
|
|
value as the ``PyBUF_SIMPLE`` constant.
|
|
|
|
``collections.abc.Buffer``
|
|
--------------------------
|
|
|
|
We add a new abstract base classes, ``collections.abc.Buffer``,
|
|
which requires the ``__buffer__`` method.
|
|
This class is intended primarily for use in type annotations:
|
|
|
|
.. code-block:: python
|
|
|
|
def need_buffer(b: Buffer) -> memoryview:
|
|
return memoryview(b)
|
|
|
|
need_buffer(b"xy") # ok
|
|
need_buffer("xy") # rejected by static type checkers
|
|
|
|
|
|
It can also be used in ``isinstance`` and ``issubclass`` checks:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> from collections.abc import Buffer
|
|
>>> isinstance(b"xy", Buffer)
|
|
True
|
|
>>> issubclass(bytes, Buffer)
|
|
True
|
|
>>> issubclass(memoryview, Buffer)
|
|
True
|
|
>>> isinstance("xy", Buffer)
|
|
False
|
|
>>> issubclass(str, Buffer)
|
|
False
|
|
|
|
In the typeshed stub files, the class should be defined as a ``Protocol``,
|
|
following the precedent of other simple ABCs in ``collections.abc`` such as
|
|
``collections.abc.Iterable`` or ``collections.abc.Sized``.
|
|
|
|
Example
|
|
-------
|
|
|
|
The following is an example of a Python class that implements the
|
|
buffer protocol:
|
|
|
|
.. code-block:: python
|
|
|
|
import contextlib
|
|
import inspect
|
|
|
|
class MyBuffer:
|
|
def __init__(self, data: bytes):
|
|
self.data = bytearray(data)
|
|
self.view = None
|
|
|
|
def __buffer__(self, flags: int) -> memoryview:
|
|
if flags != inspect.BufferFlags.FULL_RO:
|
|
raise TypeError("Only BufferFlags.FULL_RO supported")
|
|
if self.view is not None:
|
|
raise RuntimeError("Buffer already held")
|
|
self.view = memoryview(self.data)
|
|
return self.view
|
|
|
|
def __release_buffer__(self, view: memoryview) -> None:
|
|
assert self.view is view # guaranteed to be true
|
|
self.view.release()
|
|
self.view = None
|
|
|
|
def extend(self, b: bytes) -> None:
|
|
if self.view is not None:
|
|
raise RuntimeError("Cannot extend held buffer")
|
|
self.data.extend(b)
|
|
|
|
buffer = MyBuffer(b"capybara")
|
|
with memoryview(buffer) as view:
|
|
view[0] = ord("C")
|
|
|
|
with contextlib.suppress(RuntimeError):
|
|
buffer.extend(b"!") # raises RuntimeError
|
|
|
|
buffer.extend(b"!") # ok, buffer is no longer held
|
|
|
|
with memoryview(buffer) as view:
|
|
assert view.tobytes() == b"Capybara!"
|
|
|
|
|
|
Equivalent for older Python versions
|
|
------------------------------------
|
|
|
|
New typing features are usually backported to older Python versions
|
|
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_
|
|
package. Because the buffer protocol
|
|
is currently accessible only in C, this PEP cannot be fully implemented
|
|
in a pure-Python package like ``typing_extensions``. As a temporary
|
|
workaround, an abstract base class ``typing_extensions.Buffer``
|
|
will be provided for Python versions
|
|
that do not have ``collections.abc.Buffer`` available.
|
|
|
|
After this PEP is implemented, inheriting from ``collections.abc.Buffer`` will
|
|
not be necessary to indicate that an object supports the buffer protocol.
|
|
However, in older Python versions, it will be necessary to explicitly
|
|
inherit from ``typing_extensions.Buffer`` to indicate to type checkers that
|
|
a class supports the buffer protocol, since objects supporting the buffer
|
|
protocol will not have a ``__buffer__`` method. It is expected that this
|
|
will happen primarily in stub files, because buffer classes are necessarily
|
|
implemented in C code, which cannot have types defined inline.
|
|
For runtime uses, the ``ABC.register`` API can be used to register
|
|
buffer classes with ``typing_extensions.Buffer``.
|
|
|
|
|
|
No special meaning for ``bytes``
|
|
--------------------------------
|
|
|
|
The special case stating that ``bytes`` may be used as a shorthand
|
|
for other ``ByteString`` types will be removed from the ``typing``
|
|
documentation.
|
|
With ``collections.abc.Buffer`` available as an alternative, there will be no good
|
|
reason to allow ``bytes`` as a shorthand.
|
|
Type checkers currently implementing this behavior
|
|
should deprecate and eventually remove it.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
``__buffer__`` and ``__release_buffer__`` attributes
|
|
----------------------------------------------------
|
|
|
|
As the runtime changes in this PEP only add new functionality, there are
|
|
few backwards compatibility concerns.
|
|
|
|
However, code that uses a ``__buffer__`` or ``__release_buffer__`` attribute for
|
|
other purposes may be affected. While all dunders are technically reserved for the
|
|
language, it is still good practice to ensure that a new dunder does not
|
|
interfere with too much existing code, especially widely used packages. A survey
|
|
of publicly accessible code found:
|
|
|
|
- PyPy `supports <https://doc.pypy.org/en/latest/__pypy__-module.html#generally-available-functionality>`__
|
|
a ``__buffer__`` method with compatible semantics to those proposed in this
|
|
PEP. A PyPy core developer `expressed his support <https://discuss.python.org/t/pep-688-making-the-buffer-protocol-accessible-in-python/15265/34>`__
|
|
for this PEP.
|
|
- pyzmq `implements <https://github.com/zeromq/pyzmq/blob/fe18dc55516ef50d168fc02f8550a67ff5b5633d/zmq/backend/cffi/message.py#L190>`__
|
|
a PyPy-compatible ``__buffer__`` method.
|
|
- mpi4py `defines <https://github.com/mpi4py/mpi4py/blob/453b87d0da37c5914b91afb511b188556dff2a9c/src/mpi4py/typing.py#L66>`__
|
|
a ``SupportsBuffer`` protocol that would be equivalent to this PEP's ``collections.abc.Buffer``.
|
|
- NumPy used to have an undocumented behavior where it would access a ``__buffer__`` attribute
|
|
(not method) to get an object's buffer. This was `removed <https://github.com/numpy/numpy/pull/13049>`__
|
|
in 2019 for NumPy 1.17. The behavior would have last worked in NumPy 1.16, which only supported
|
|
Python 3.7 and older. Python 3.7 will have reached its end of life by the time this PEP is expected to
|
|
be implemented.
|
|
|
|
Thus, this PEP's use of the ``__buffer__`` method will improve interoperability with
|
|
PyPy and not interfere with the current versions of any major Python packages.
|
|
|
|
No publicly accessible code uses the name ``__release_buffer__``.
|
|
|
|
Removal of the ``bytes`` special case
|
|
-------------------------------------
|
|
|
|
Separately, the recommendation to remove the special behavior for
|
|
``bytes`` in type checkers does have a backwards compatibility
|
|
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__
|
|
with mypy shows that several major open source projects that use it
|
|
for type checking will see new errors if the ``bytes`` promotion
|
|
is removed. Many of these errors can be fixed by improving
|
|
the stubs in typeshed, as has already been done for the
|
|
`builtins <https://github.com/python/typeshed/pull/7631>`__,
|
|
`binascii <https://github.com/python/typeshed/pull/7677>`__,
|
|
`pickle <https://github.com/python/typeshed/pull/7678>`__, and
|
|
`re <https://github.com/python/typeshed/pull/7679>`__ modules.
|
|
A `review <https://github.com/python/typeshed/issues/9006>`__ of all
|
|
usage of ``bytes`` types in typeshed is in progress.
|
|
Overall, the change improves type safety and makes the type system
|
|
more consistent, so we believe the migration cost is worth it.
|
|
|
|
|
|
How to Teach This
|
|
=================
|
|
|
|
We will add notes pointing to ``collections.abc.Buffer`` in appropriate places in the
|
|
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
|
|
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
|
|
Type checkers may provide additional pointers in their error messages. For example,
|
|
when they encounter a buffer object being passed to a function that
|
|
is annotated to only accept ``bytes``, the error message could include a note suggesting
|
|
the use of ``collections.abc.Buffer`` instead.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
An implementation of this PEP is
|
|
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:pep688v2?expand=1>`__
|
|
in the author's fork.
|
|
|
|
|
|
Rejected Ideas
|
|
==============
|
|
|
|
``types.Buffer``
|
|
----------------
|
|
|
|
An earlier version of this PEP proposed adding a new ``types.Buffer`` type with
|
|
an ``__instancecheck__`` implemented in C so that ``isinstance()`` checks can be
|
|
used to check whether a type implements the buffer protocol. This avoids the
|
|
complexity of exposing the full buffer protocol to Python code, while still
|
|
allowing the type system to check for the buffer protocol.
|
|
|
|
However, that approach
|
|
does not compose well with the rest of the type system, because ``types.Buffer``
|
|
would be a nominal type, not a structural one. For example, there would be no way
|
|
to represent "an object that supports both the buffer protocol and ``__len__``". With
|
|
the current proposal, ``__buffer__`` is like any other special method, so a
|
|
``Protocol`` can be defined combining it with another method.
|
|
|
|
More generally, no other part of Python works like the proposed ``types.Buffer``.
|
|
The current proposal is more consistent with the rest of the language, where
|
|
C-level slots usually have corresponding Python-level special methods.
|
|
|
|
Keep ``bytearray`` compatible with ``bytes``
|
|
--------------------------------------------
|
|
|
|
It has been suggested to remove the special case where ``memoryview`` is
|
|
always compatible with ``bytes``, but keep it for ``bytearray``, because
|
|
the two types have very similar interfaces. However, several standard
|
|
library functions (e.g., ``re.compile``, ``socket.getaddrinfo``, and most
|
|
functions accepting path-like arguments) accept
|
|
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
|
|
not a very common type. We prefer to have users spell out accepted types
|
|
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
|
|
methods is required). This aspect of the proposal was `specifically
|
|
discussed <https://mail.python.org/archives/list/typing-sig@python.org/thread/XH5ZK2MSZIQLL62PYZ6I5532SQKKVCBL/>`__
|
|
on the typing-sig mailing list, without any strong disagreement from the
|
|
typing community.
|
|
|
|
Distinguish between mutable and immutable buffers
|
|
-------------------------------------------------
|
|
|
|
The most frequently used distinction within buffer types is
|
|
whether or not the buffer is mutable. Some functions accept only
|
|
mutable buffers (e.g., ``bytearray``, some ``memoryview`` objects),
|
|
others accept all buffers.
|
|
|
|
An earlier version of this PEP proposed using the presence of the
|
|
``bf_releasebuffer`` slot to determine whether a buffer type is mutable.
|
|
This rule holds for most standard library buffer types, but the relationship
|
|
between mutability and the presence of this slot is not absolute. For
|
|
example, ``numpy`` arrays are mutable but do not have this slot.
|
|
|
|
The current buffer protocol does not provide any way to reliably
|
|
determine whether a buffer type represents a mutable or immutable
|
|
buffer. Therefore, this PEP does not add type system support
|
|
for this distinction.
|
|
The question can be revisited in the future if the buffer protocol
|
|
is enhanced to provide static introspection support.
|
|
A `sketch <https://discuss.python.org/t/introspection-and-mutable-xor-shared-semantics-for-pybuffer/20314>`_
|
|
for such a mechanism exists.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
Many people have provided useful feedback on drafts of this PEP.
|
|
Petr Viktorin has been particularly helpful in improving my understanding
|
|
of the subtleties of the buffer protocol.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|