PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Discussions-To: https://discuss.python.org/t/15265
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12
Post-History: `23-Apr-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/CX7GPSIYQEL23RXMYL66GAKGP4RLUD7P/>`__,
              `25-Apr-2022 <https://discuss.python.org/t/15265>`__


Abstract
========

This PEP proposes a Python-level API for the buffer protocol,
which is currently accessible only to C code. This allows type
checkers to evaluate whether objects implement the protocol.


Motivation
==========

The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object: the `buffer protocol <https://docs.python.org/3/c-api/buffer.html>`__
introduced in :pep:`3118`.
Functions that accept binary data are usually written to handle any
object implementing the buffer protocol. For example, at the time of writing,
there are around 130 functions in CPython using the Argument Clinic
``Py_buffer`` type, which accepts the buffer protocol.

Currently, there is no way for Python code to inspect whether an object
supports the buffer protocol. Moreover, the static type system
does not provide a type annotation to represent the protocol.
This is a `common problem <https://github.com/python/typing/issues/593>`__
when writing type annotations for code that accepts generic buffers.

Similarly, it is impossible for a class written in Python to support
the buffer protocol. A buffer class in
Python would give users the ability to easily wrap a C buffer object, or to test
the behavior of an API that consumes the buffer protocol. Granted, this is not
a particularly common need. However, there has been a
`CPython feature request <https://github.com/python/cpython/issues/58006>`__
for supporting buffer classes written in Python that has been open since 2012.


Rationale
=========

Current options
---------------

There are two known workarounds for annotating buffer types in
the type system, but neither is adequate.

First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
approach works for the standard library, but it does not extend to
third-party buffer types.

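As a rough illustration of why a closed alias falls short, the following
sketch uses a simplified, hypothetical alias named ``KnownBuffer`` (it is not
the actual typeshed definition) and a ``ctypes`` array standing in for a
buffer type that the alias does not list. The helper ``total_size`` is
likewise only an assumption made for this example.

.. code-block:: python

    import array
    import ctypes
    from typing import Union

    # Hypothetical, simplified stand-in for a typeshed-style alias:
    # a closed list of well-known buffer types.
    KnownBuffer = Union[bytes, bytearray, memoryview, array.array]


    def total_size(data: KnownBuffer) -> int:
        """Return the size in bytes of the memory exported by ``data``."""
        return memoryview(data).nbytes


    assert total_size(b"abcd") == 4
    assert total_size(array.array("B", [1, 2, 3])) == 3

    # A ctypes array also exports a buffer, so the call works at runtime,
    # but the alias does not list it, so a static type checker would
    # reject this call.
    unlisted = (ctypes.c_ubyte * 4)(1, 2, 3, 4)
    assert total_size(unlisted) == 4

Every buffer type outside the list needs its own entry in the alias, which
is not feasible for types defined in third-party packages.
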
Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
for ``typing.ByteString`` currently states:

    This type represents the types ``bytes``, ``bytearray``, and
    ``memoryview`` of byte sequences.

    As a shorthand for this type, ``bytes`` can be used to annotate
    arguments of any of the types mentioned above.

Although this sentence has been in the documentation
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
the use of ``bytes`` to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
makes the ``bytes`` type ambiguous in type annotations. After all,
there are many operations that are valid on ``bytes`` objects, but
not on ``memoryview`` objects, and it is perfectly possible for
a function to accept ``bytes`` but not ``memoryview`` objects.
A mypy user
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
that this shortcut has caused significant problems for the ``psycopg`` project.

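The ambiguity can be seen in a small sketch. The function ``shout`` below is
a hypothetical example, not taken from any project: it is annotated as
accepting ``bytes`` and relies on a method that ``memoryview`` does not
provide.

.. code-block:: python

    def shout(data: bytes) -> bytes:
        """Uppercase ASCII letters using a method of bytes-like sequences."""
        return data.upper()


    assert shout(b"abc") == b"ABC"

    # Under the ``bytes`` shorthand, a type checker would accept this call,
    # yet it fails at runtime because memoryview has no ``upper`` method.
    try:
        shout(memoryview(b"abc"))
    except AttributeError:
        failed = True
    else:
        failed = False

    assert failed
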
Kinds of buffers
----------------

The C buffer protocol supports
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
affecting strides, contiguity, and support for writing to the buffer. Some of these
options would be useful in the type system. For example, typeshed
currently provides separate type aliases for writable and read-only
buffers.

However, in the C buffer protocol, most of these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a particular flag is to actually
ask for the buffer. For some types, such as ``memoryview``,
the supported flags depend on the instance. As a result, it would
be difficult to represent support for these flags in the type system.

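The instance dependence can be observed from Python today through
``memoryview``. In this minimal sketch, the helper ``is_writable`` is a
hypothetical name introduced only for illustration; it requests a buffer and
reports whether the exporter made it writable.

.. code-block:: python

    def is_writable(obj) -> bool:
        """Request a buffer from ``obj`` and report whether it is writable."""
        with memoryview(obj) as view:
            return not view.readonly


    read_only = memoryview(b"abc")
    read_write = memoryview(bytearray(b"abc"))

    # Both objects have the same type, but they differ in writability,
    # so the type alone cannot answer the question.
    assert type(read_only) is type(read_write)
    assert not is_writable(read_only)
    assert is_writable(read_write)
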
There is one exception: if a buffer is writable, it must define the
``bf_releasebuffer`` slot, so that the buffer can be released when
a consumer is done writing to it. For read-only buffers, such as
``bytes``, there is generally no need for this slot. Therefore, the
presence of this slot can be used to determine whether a buffer is
writable.


Specification
=============

Python-level buffer protocol
----------------------------

We propose to add two Python-level special methods, ``__buffer__``
and ``__release_buffer__``. Python
classes that implement these methods are usable as buffers from C
code. Conversely, classes implemented in C that support the
buffer protocol acquire synthesized methods accessible from Python
code.

The ``__buffer__`` method corresponds to the ``bf_getbuffer`` C slot,
which is called to create a buffer from a Python object.
The Python signature for this method is
``def __buffer__(self, flags: int, /) -> memoryview: ...``. The method
must return a ``memoryview`` object. If the method is called from C
code, the interpreter extracts the underlying ``Py_buffer`` from the
``memoryview`` and returns it to the C caller. Similarly, if the
``__buffer__`` method is called on an instance of a C class that
implements ``bf_getbuffer``, the returned buffer is wrapped in a
``memoryview`` for consumption by Python code.

The ``__release_buffer__`` method corresponds to the
``bf_releasebuffer`` C slot, which is called when a caller no
longer needs the buffer returned by ``bf_getbuffer``. This is an
optional part of the buffer protocol, and read-only buffers can
generally omit it. The Python signature for this method is
``def __release_buffer__(self, buffer: memoryview, /) -> None: ...``.
The buffer to be released is wrapped in a ``memoryview``. It is
also possible to call ``__release_buffer__`` on a C class that
implements ``bf_releasebuffer``.

inspect.BufferFlags
-------------------

To help implementations of ``__buffer__``, we add ``inspect.BufferFlags``,
a subclass of ``enum.IntFlag``. This enum contains all flags defined in the
C buffer protocol. For example, ``inspect.BufferFlags.SIMPLE`` has the same
value as the ``PyBUF_SIMPLE`` constant.

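As a sketch of how a ``__buffer__`` implementation might inspect the
requested flags, the hypothetical helper ``requests_writable`` below tests
for the ``WRITABLE`` bit. So that it also runs on interpreters without
``inspect.BufferFlags``, it falls back to ``0x0001``, the value of the
``PyBUF_WRITABLE`` constant in the C API; both the helper and the fallback
are assumptions of this example rather than part of the proposal.

.. code-block:: python

    import inspect

    # Use the proposed enum where it exists; otherwise fall back to the
    # numeric value of the C constant PyBUF_WRITABLE.
    _flags_enum = getattr(inspect, "BufferFlags", None)
    WRITABLE = _flags_enum.WRITABLE if _flags_enum is not None else 0x0001


    def requests_writable(flags: int) -> bool:
        """Return True if the consumer asked for a writable buffer."""
        return bool(flags & WRITABLE)


    assert requests_writable(0x0001)
    assert not requests_writable(0)  # PyBUF_SIMPLE is 0

A read-only exporter could use such a check to raise ``BufferError`` when a
consumer requests write access.
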
collections.abc.Buffer and MutableBuffer
----------------------------------------

We add two new abstract base classes, ``collections.abc.Buffer`` and
``collections.abc.MutableBuffer``. The former requires the ``__buffer__``
method; the latter requires both ``__buffer__`` and ``__release_buffer__``.
These classes are intended primarily for use in type annotations:

.. code-block:: python

    from collections.abc import Buffer, MutableBuffer

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers

    def need_mutable_buffer(b: MutableBuffer) -> None:
        view = memoryview(b)
        view[0] = 42

    need_mutable_buffer(bytearray(b"xy"))  # ok
    need_mutable_buffer(b"xy")  # rejected by static type checkers

They can also be used in ``isinstance`` and ``issubclass`` checks:

.. code-block:: pycon

    >>> from collections.abc import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False

In the typeshed stub files, these classes should be defined as ``Protocol``\ s,
following the precedent of other simple ABCs in ``collections.abc`` such as
``collections.abc.Iterable`` or ``collections.abc.Sized``.

Example
-------

The following is an example of a Python class that implements the
buffer protocol:

.. code-block:: python

    import inspect

    class MyBuffer:
        def __init__(self, data: bytes):
            self.data = bytearray(data)
            self.view = memoryview(self.data)
            self.held = False  # True while a consumer holds the buffer

        def __buffer__(self, flags: int) -> memoryview:
            # memoryview() requests the buffer with the FULL_RO flags.
            if flags != inspect.BufferFlags.FULL_RO:
                raise TypeError("Only BufferFlags.FULL_RO supported")
            if self.held:
                raise RuntimeError("Buffer already held")
            self.held = True
            return self.view

        def __release_buffer__(self, view: memoryview) -> None:
            assert self.view is view
            self.held = False

    buffer = MyBuffer(b"capybara")
    # Leaving each ``with`` block releases the buffer again.
    with memoryview(buffer) as view:
        view[0] = ord("C")

    with memoryview(buffer) as view:
        assert view.tobytes() == b"Capybara"


Equivalent for older Python versions
------------------------------------

New typing features are usually backported to older Python versions
in the `typing_extensions <https://pypi.org/project/typing-extensions/>`_
package. Because the buffer protocol
is currently accessible only in C, this PEP cannot be fully implemented
in a pure-Python package like ``typing_extensions``. As a temporary
workaround, two abstract base classes, ``typing_extensions.Buffer``
and ``typing_extensions.MutableBuffer``, will be provided for Python versions
that do not have ``collections.abc.Buffer`` available.

After this PEP is implemented, inheriting from ``collections.abc.Buffer`` will
not be necessary to indicate that an object supports the buffer protocol.
However, in older Python versions, it will be necessary to explicitly
inherit from ``typing_extensions.Buffer`` to indicate to type checkers that
a class supports the buffer protocol, since objects supporting the buffer
protocol will not have a ``__buffer__`` method. It is expected that this
will happen primarily in stub files, because buffer classes are necessarily
implemented in C code, which cannot have types defined inline.
For runtime uses, the ``ABC.register`` API can be used to register
buffer classes with ``typing_extensions.Buffer``.


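The registration approach could look roughly like the following sketch. To
keep the example self-contained, it defines a local ``Buffer`` class as a
hypothetical stand-in for ``typing_extensions.Buffer``; the choice of types
to register is also only illustrative.

.. code-block:: python

    import array
    from abc import ABC


    class Buffer(ABC):
        """Hypothetical stand-in for ``typing_extensions.Buffer``."""


    # On older Python versions, buffer types have no ``__buffer__`` method
    # to detect, so each one has to be registered explicitly.
    for buffer_type in (bytes, bytearray, memoryview, array.array):
        Buffer.register(buffer_type)

    assert isinstance(b"xy", Buffer)
    assert issubclass(array.array, Buffer)
    assert not isinstance("xy", Buffer)
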
No special meaning for ``bytes``
--------------------------------

The special case stating that ``bytes`` may be used as a shorthand
for other ``ByteString`` types will be removed from the ``typing``
documentation.
With ``collections.abc.Buffer`` available as an alternative, there will be no good
reason to allow ``bytes`` as a shorthand.
We suggest that type checkers currently implementing this behavior
should deprecate and eventually remove it.


Backwards Compatibility
=======================

As the runtime changes in this PEP only add new functionality, there are
no backwards compatibility concerns.

However, the recommendation to remove the special behavior for
``bytes`` in type checkers does have a backwards compatibility
impact on their users. An `experiment <https://github.com/python/mypy/pull/12661>`__
with mypy shows that several major open source projects that use it
for type checking will see new errors if the ``bytes`` promotion
is removed. Many of these errors can be fixed by improving
the stubs in typeshed, as has already been done for the
`builtins <https://github.com/python/typeshed/pull/7631>`__,
`binascii <https://github.com/python/typeshed/pull/7677>`__,
`pickle <https://github.com/python/typeshed/pull/7678>`__, and
`re <https://github.com/python/typeshed/pull/7679>`__ modules.
Overall, the change improves type safety and makes the type system
more consistent, so we believe the migration cost is worth it.


How to Teach This
=================

We will add notes pointing to ``collections.abc.Buffer`` in appropriate places in the
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
Type checkers may provide additional pointers in their error messages. For example,
when they encounter a buffer object being passed to a function that
is annotated to only accept ``bytes``, the error message could include a note suggesting
the use of ``collections.abc.Buffer`` instead.


Reference Implementation
========================

An implementation of this PEP is
`available <https://github.com/python/cpython/compare/main...JelleZijlstra:pep688v2?expand=1>`__
in the author's fork.


Rejected Ideas
==============

types.Buffer
------------

An earlier version of this PEP proposed adding a new ``types.Buffer`` type with
an ``__instancecheck__`` implemented in C so that ``isinstance()`` checks can be
used to check whether a type implements the buffer protocol. This avoids the
complexity of exposing the full buffer protocol to Python code, while still
allowing the type system to check for the buffer protocol.

However, that approach
does not compose well with the rest of the type system, because ``types.Buffer``
would be a nominal type, not a structural one. For example, there would be no way
to represent "an object that supports both the buffer protocol and ``__len__``". With
the current proposal, ``__buffer__`` is like any other special method, so a
``Protocol`` can be defined combining it with another method.

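A combined protocol of this kind might be sketched as follows. The names
``SizedBuffer``, ``Packet``, and ``OnlySized`` are hypothetical and exist
only for this illustration; the runtime check relies on
``typing.runtime_checkable``, which only tests that the methods are present.

.. code-block:: python

    from typing import Protocol, runtime_checkable


    @runtime_checkable
    class SizedBuffer(Protocol):
        """An object supporting both the buffer protocol and ``__len__``."""

        def __buffer__(self, flags: int, /) -> memoryview: ...
        def __len__(self) -> int: ...


    class Packet:
        """Hypothetical class that provides both special methods."""

        def __init__(self, payload: bytes) -> None:
            self._payload = payload

        def __buffer__(self, flags: int, /) -> memoryview:
            return memoryview(self._payload)

        def __len__(self) -> int:
            return len(self._payload)


    class OnlySized:
        """Hypothetical class that lacks ``__buffer__``."""

        def __len__(self) -> int:
            return 0


    assert isinstance(Packet(b"abc"), SizedBuffer)
    assert not isinstance(OnlySized(), SizedBuffer)
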
More generally, no other part of Python works like the proposed ``types.Buffer``.
The current proposal is more consistent with the rest of the language, where
C-level slots usually have corresponding Python-level special methods.

Keep ``bytearray`` compatible with ``bytes``
--------------------------------------------

It has been suggested to remove the special case where ``memoryview`` is
always compatible with ``bytes``, but keep it for ``bytearray``, because
the two types have very similar interfaces. However, several standard
library functions (e.g., ``re.compile`` and ``socket.getaddrinfo``) accept
``bytes`` but not ``bytearray``. In most codebases, ``bytearray`` is also
not a very common type. We prefer to have users spell out accepted types
explicitly (or use ``Protocol`` from :pep:`544` if only a specific set of
methods is required).


Open Issues
===========

Where should the ``Buffer`` and ``MutableBuffer`` ABCs live? This PEP
proposes putting them in ``collections.abc`` like most other ABCs,
but buffers are not "collections" in the original sense of the word.
Alternatively, these classes could be put in ``typing``, like
``SupportsInt`` and several other simple protocols.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.