PEP: 399
Title: Pure Python/C Accelerator Module Compatibility Requirements
Version: $Revision: 88219 $
Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
Author: Brett Cannon <brett@python.org>
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 04-Apr-2011
Python-Version: 3.3
Post-History: 04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011

Abstract
========

The Python standard library under CPython contains various instances
of modules implemented in both pure Python and C (either entirely or
partially). This PEP requires that in these instances the C code
**must** pass the test suite used for the pure Python code so as to
act as much like a drop-in replacement as reasonably possible
(C- and VM-specific tests are exempt). It is also required that new
C-based modules lacking a pure Python equivalent implementation get
special permission to be added to the standard library.


Rationale
=========

Python has grown beyond the CPython virtual machine (VM). IronPython_,
Jython_, and PyPy_ are all currently viable alternatives to the
CPython VM. The VM ecosystem that has sprung up around the Python
programming language has led to Python being used in many different
areas where CPython cannot be used, e.g., Jython allowing Python to be
used in Java applications.

A problem all of the VMs other than CPython face is handling modules
from the standard library that are implemented (to some extent) in C.
Since other VMs do not typically support the entire `C API of CPython`_,
they are unable to use the code used to create the module. Often
this leads these other VMs to re-implement the modules either in pure
Python or in the programming language used to implement the VM itself
(e.g., in C# for IronPython). This duplication of effort between
CPython, PyPy, Jython, and IronPython is extremely unfortunate, as
implementing a module **at least** in pure Python would help mitigate
this duplicate effort.

The purpose of this PEP is to minimize this duplicate effort by
mandating that all new modules added to Python's standard library
**must** have a pure Python implementation *unless* special dispensation
is given. This makes sure that a module in the stdlib is available to
all VMs and not just to CPython (pre-existing modules that do not meet
this requirement are exempt, although there is nothing preventing
someone from adding in a pure Python implementation retroactively).

Re-implementing parts (or all) of a module in C (in the case
of CPython) is still allowed for performance reasons, but any such
accelerated code must pass the same test suite (sans VM- or C-specific
tests) to verify semantics and prevent divergence. To accomplish this,
the test suite for the module must have comprehensive coverage of the
pure Python implementation before the acceleration code may be added.


Details
=======

Starting in Python 3.3, any modules added to the standard library must
have a pure Python implementation. This rule can only be ignored if
the Python development team grants a special exemption for the module.
Typically the exemption will be granted only when a module wraps a
specific C-based library (e.g., sqlite3_). In granting an exemption it
will be recognized that the module will be considered exclusive to
CPython and not part of Python's standard library that other VMs are
expected to support. Usage of ``ctypes`` to provide an
API for a C library will continue to be frowned upon as ``ctypes``
lacks compiler guarantees that C code typically relies upon to prevent
certain errors from occurring (e.g., API changes).

Even though a pure Python implementation is mandated by this PEP, it
does not preclude the use of a companion acceleration module. If an
acceleration module is provided it is to be named the same as the
module it is accelerating with an underscore attached as a prefix,
e.g., ``_warnings`` for ``warnings``. The common pattern to access
the accelerated code from the pure Python implementation is to import
it with an ``import *``, e.g., ``from _warnings import *``. This is
typically done at the end of the module to allow it to overwrite
specific Python objects with their accelerated equivalents. This kind
of import can also be done before the end of the module when needed,
e.g., an accelerated base class is provided but is then subclassed by
Python code. This PEP does not mandate that pre-existing modules in
the stdlib that lack a pure Python equivalent gain such a module. But
if people do volunteer to provide and maintain a pure Python
equivalent (e.g., the PyPy team volunteering their pure Python
implementation of the ``csv`` module and maintaining it) then such
code will be accepted. In those instances the C version is considered
the reference implementation in terms of expected semantics.

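The import pattern described above can be sketched in a self-contained
way. In the sketch below, ``spam`` and ``_spam`` are hypothetical module
names, the accelerator is faked with a plain Python module object rather
than real C code, and the ``try``/``except ImportError`` guard is an
assumption for VMs that lack the accelerator; this PEP itself only shows
the bare ``from _warnings import *`` form.

```python
import sys
import types

# Hypothetical pure Python module "spam".  The star-import sits at the end
# so that any names the accelerator defines overwrite the definitions above.
SPAM_SOURCE = '''
def double(x):
    """Pure Python implementation."""
    return 2 * x

def triple(x):
    """Never accelerated, so the pure Python version always survives."""
    return 3 * x

try:
    from _spam import *
except ImportError:
    # Assumption: a VM without the accelerator keeps the pure Python code.
    pass
'''


def load_spam():
    """Execute SPAM_SOURCE in a fresh module object and return it."""
    module = types.ModuleType("spam")
    exec(SPAM_SOURCE, module.__dict__)
    return module


def fast_double(x):
    """Stand-in for an accelerated function (real ones are written in C)."""
    return x + x


# Fake the hypothetical accelerator module "_spam".
accelerator = types.ModuleType("_spam")
accelerator.double = fast_double
sys.modules["_spam"] = accelerator
accelerated_spam = load_spam()

# A None entry in sys.modules makes "import _spam" raise ImportError,
# which simulates a VM that does not ship the accelerator.
sys.modules["_spam"] = None
pure_spam = load_spam()

del sys.modules["_spam"]  # leave the import state as it was found

print(accelerated_spam.double is fast_double)  # accelerated version won
print(pure_spam.double is fast_double)         # pure Python version kept
print(accelerated_spam.triple(2), pure_spam.triple(2))
```

Because only ``double`` exists in the stand-in accelerator, ``triple``
stays pure Python in both cases, which mirrors how a partial accelerator
overwrites just the objects it provides.
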
Any new accelerated code must act as a drop-in replacement as close
to the pure Python implementation as reasonable. Technical details of
the VM providing the accelerated code are allowed to differ as
necessary, e.g., a class being a ``type`` when implemented in C. To
verify that the Python and equivalent C code operate as similarly as
possible, both code bases must be tested using the same tests which
apply to the pure Python code (tests specific to the C code or any VM
do not fall under this requirement). The test suite is expected to
be extensive in order to verify expected semantics.

Acting as a drop-in replacement also dictates that no public API be
provided in accelerated code that does not exist in the pure Python
code. Without this requirement people could accidentally come to rely
on a detail in the accelerated code which is not made available to
other VMs that use the pure Python implementation. To help verify
that the contract of semantic equivalence is being met, a module must
be tested both with and without its accelerated code as thoroughly as
possible.

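One way to guard against accidental extra public API is to compare the
public names of the two implementations. The following is a minimal
sketch, not part of this PEP: the helper names ``public_names()`` and
``extra_public_names()`` and the two stand-in modules are hypothetical,
and treating ``__all__`` (or, failing that, names without a leading
underscore) as the public API is an assumption.

```python
import types


def public_names(module):
    """Return the names *module* exposes publicly.

    Assumption: ``__all__`` defines the public API when present; otherwise
    every attribute without a leading underscore counts as public.
    """
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    return set(names)


def extra_public_names(accelerated, pure):
    """Return public names found in *accelerated* but missing from *pure*."""
    return public_names(accelerated) - public_names(pure)


# Hypothetical stand-ins for a pure Python module and its accelerator.
pure = types.ModuleType("spam")
pure.push = lambda heap, item: heap.append(item)
pure.pop = lambda heap: heap.pop()

accelerator = types.ModuleType("_spam")
accelerator.push = pure.push
accelerator.peek = lambda heap: heap[-1]   # not offered by the pure code
accelerator._helper = lambda: None         # private, so not reported

print(sorted(extra_public_names(accelerator, pure)))
```

In a test suite, a non-empty result would flag names such as ``peek``
that users of the accelerated code could come to rely on even though
other VMs do not provide them.
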
As an example, to write tests which exercise both the pure Python and
C accelerated versions of a module, a basic idiom can be followed::

    import collections.abc
    from test.support import import_fresh_module, run_unittest
    import unittest

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])


    class ExampleTest(unittest.TestCase):

        def test_heappop_exc_for_non_MutableSequence(self):
            # Raise TypeError when heap is not a
            # collections.abc.MutableSequence.
            class Spam:
                """Test class lacking many ABC-required methods
                (e.g., pop())."""
                def __len__(self):
                    return 0

            heap = Spam()
            self.assertFalse(isinstance(heap,
                                collections.abc.MutableSequence))
            with self.assertRaises(TypeError):
                self.heapq.heappop(heap)


    class AcceleratedExampleTest(ExampleTest):

        """Test using the accelerated code."""

        heapq = c_heapq


    class PyExampleTest(ExampleTest):

        """Test with just the pure Python code."""

        heapq = py_heapq


    def test_main():
        run_unittest(AcceleratedExampleTest, PyExampleTest)


    if __name__ == '__main__':
        test_main()


If this test were to provide extensive coverage for
``heapq.heappop()`` in the pure Python implementation then the
accelerated C code would be allowed to be added to CPython's standard
library. If it did not, then the test suite would need to be updated
until proper coverage was provided before the accelerated C code
could be added.

To also help with compatibility, C code should use abstract APIs on
objects to prevent accidental dependence on specific types. For
instance, if a function accepts a sequence then the C code should
default to using ``PyObject_GetItem()`` instead of something like
``PyList_GetItem()``. C code is allowed to have a fast path if the
proper ``PyList_CheckExact()`` is used, but otherwise APIs should work
with any object that duck types to the proper interface instead of a
specific type.


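The same rule can be illustrated in Python, as a rough analogy rather
than real C code. In this sketch, which is not part of this PEP, an
exact type test plays the role of ``PyList_CheckExact()``, a direct call
to ``list.__getitem__`` plays the role of ``PyList_GetItem()``, and
ordinary indexing plays the role of ``PyObject_GetItem()``;
``first_item()``, ``careless_first_item()``, and ``Shouting`` are
hypothetical names.

```python
def first_item(seq):
    """Return seq[0], taking a fast path only for exact lists."""
    if type(seq) is list:
        # Fast path: safe because an exact list cannot override indexing.
        return list.__getitem__(seq, 0)
    # General path: honors whatever __getitem__ the object defines.
    return seq[0]


def careless_first_item(seq):
    """Fast path that wrongly accepts list subclasses as well."""
    if isinstance(seq, list):
        return list.__getitem__(seq, 0)   # bypasses subclass overrides
    return seq[0]


class Shouting(list):
    """List subclass whose indexing upper-cases the stored strings."""

    def __getitem__(self, index):
        return super().__getitem__(index).upper()


items = Shouting(["spam", "eggs"])
print(first_item(items))            # uses Shouting.__getitem__
print(careless_first_item(items))   # silently ignores the override
print(first_item(("spam",)), first_item(["spam"]))
```

The careless variant returns a different result for the subclass than
ordinary indexing would, which is the kind of divergence from the pure
Python implementation that the exact type check is meant to prevent.
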
Copyright
=========

This document has been placed in the public domain.


.. _IronPython: http://ironpython.net/
.. _Jython: http://www.jython.org/
.. _PyPy: http://pypy.org/
.. _C API of CPython: http://docs.python.org/py3k/c-api/index.html
.. _sqlite3: http://docs.python.org/py3k/library/sqlite3.html