PEP: 399
Title: Pure Python/C Accelerator Module Compatibility Requirements
Version: $Revision: 88219 $
Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
Author: Brett Cannon
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 04-Apr-2011
Python-Version: 3.3
Post-History: 04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011


Abstract
========

The Python standard library under CPython contains various instances
of modules implemented in both pure Python and C (either entirely or
partially).  This PEP requires that in these instances the C code
**must** pass the test suite used for the pure Python code so as to
act as much like a drop-in replacement as reasonably possible (C- and
VM-specific tests are exempt).  It is also required that new C-based
modules lacking a pure Python equivalent implementation get special
permission to be added to the standard library.


Rationale
=========

Python has grown beyond the CPython virtual machine (VM).
IronPython_, Jython_, and PyPy_ are all currently viable alternatives
to the CPython VM.  The VM ecosystem that has sprung up around the
Python programming language has led to Python being used in many
different areas where CPython cannot be used, e.g., Jython allowing
Python to be used in Java applications.

A problem all of the VMs other than CPython face is handling modules
from the standard library that are implemented (to some extent) in C.
Since other VMs do not typically support the entire `C API of
CPython`_, they are unable to use the code used to create the module.
This often leads these other VMs to re-implement the modules either
in pure Python or in the programming language used to implement the
VM itself (e.g., in C# for IronPython).  This duplication of effort
between CPython, PyPy, Jython, and IronPython is extremely
unfortunate, as implementing a module **at least** in pure Python
would help mitigate this duplicate effort.

The purpose of this PEP is to minimize this duplicate effort by
mandating that all new modules added to Python's standard library
**must** have a pure Python implementation *unless* special
dispensation is given.  This makes sure that a module in the stdlib
is available to all VMs and not just to CPython (pre-existing modules
that do not meet this requirement are exempt, although there is
nothing preventing someone from adding a pure Python implementation
retroactively).

Re-implementing parts (or all) of a module in C (in the case of
CPython) is still allowed for performance reasons, but any such
accelerated code must pass the same test suite (sans VM- or
C-specific tests) to verify semantics and prevent divergence.  To
accomplish this, the test suite for the module must have
comprehensive coverage of the pure Python implementation before the
acceleration code may be added.


Details
=======

Starting in Python 3.3, any modules added to the standard library
must have a pure Python implementation.  This rule can only be
ignored if the Python development team grants a special exemption for
the module.  Typically the exemption will be granted only when a
module wraps a specific C-based library (e.g., sqlite3_).  In
granting an exemption it will be recognized that the module will be
considered exclusive to CPython and not part of Python's standard
library that other VMs are expected to support.
Usage of ``ctypes`` to provide an API for a C library will continue
to be frowned upon, as ``ctypes`` lacks the compiler guarantees that
C code typically relies upon to prevent certain errors from occurring
(e.g., API changes).

Even though a pure Python implementation is mandated by this PEP, it
does not preclude the use of a companion acceleration module.  If an
acceleration module is provided, it is to be named the same as the
module it is accelerating, with an underscore attached as a prefix,
e.g., ``_warnings`` for ``warnings``.  The common pattern to access
the accelerated code from the pure Python implementation is to import
it with an ``import *``, e.g., ``from _warnings import *``.  This is
typically done at the end of the module to allow it to overwrite
specific Python objects with their accelerated equivalents.  The
import can also be done before the end of the module when needed,
e.g., when an accelerated base class is provided but is then
subclassed by Python code.
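As a minimal sketch of this layout (the ``spam``/``_spam`` module
names and the function below are hypothetical, and the
``try``/``except ImportError`` guard is one common way of keeping the
module importable on VMs that lack the C extension, not a requirement
of this PEP)::

    # spam.py -- pure Python implementation of a hypothetical module.

    def spam_count(iterable):
        """Return the number of items equal to 'spam'."""
        return sum(1 for item in iterable if item == 'spam')

    # At the end of the module, overwrite the definitions above with
    # their accelerated equivalents when the C extension is present;
    # if it is missing (e.g., on another VM), silently keep the pure
    # Python versions defined above.
    try:
        from _spam import *
    except ImportError:
        pass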
This PEP does not mandate that pre-existing modules in the stdlib
that lack a pure Python equivalent gain such a module.  But if people
do volunteer to provide and maintain a pure Python equivalent (e.g.,
the PyPy team volunteering their pure Python implementation of the
``csv`` module and maintaining it) then such code will be accepted.
In those instances the C version is considered the reference
implementation in terms of expected semantics.

Any new accelerated code must act as a drop-in replacement, matching
the pure Python implementation as closely as is reasonable.
Technical details of the VM providing the accelerated code are
allowed to differ as necessary, e.g., a class being a ``type`` when
implemented in C.  To verify that the Python and equivalent C code
operate as similarly as possible, both code bases must be tested
using the same tests which apply to the pure Python code (tests
specific to the C code or to any VM do not fall under this
requirement).  The test suite is expected to be extensive in order to
verify expected semantics.

Acting as a drop-in replacement also dictates that no public API be
provided in accelerated code that does not exist in the pure Python
code.  Without this requirement people could accidentally come to
rely on a detail in the accelerated code which is not made available
to other VMs that use the pure Python implementation.  To help verify
that the contract of semantic equivalence is being met, a module must
be tested both with and without its accelerated code as thoroughly as
possible.

As an example, to write tests which exercise both the pure Python and
C accelerated versions of a module, a basic idiom can be followed::

    import collections.abc
    from test.support import import_fresh_module, run_unittest
    import unittest

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])


    class ExampleTest(unittest.TestCase):

        def test_heappop_exc_for_non_MutableSequence(self):
            # Raise TypeError when heap is not a
            # collections.abc.MutableSequence.
            class Spam:
                """Test class lacking many ABC-required methods
                (e.g., pop())."""
                def __len__(self):
                    return 0

            heap = Spam()
            self.assertFalse(isinstance(heap,
                                        collections.abc.MutableSequence))
            with self.assertRaises(TypeError):
                self.heapq.heappop(heap)


    class AcceleratedExampleTest(ExampleTest):

        """Test using the accelerated code."""

        heapq = c_heapq


    class PyExampleTest(ExampleTest):

        """Test with just the pure Python code."""

        heapq = py_heapq


    def test_main():
        run_unittest(AcceleratedExampleTest, PyExampleTest)


    if __name__ == '__main__':
        test_main()

If this test were to provide extensive coverage for
``heapq.heappop()`` in the pure Python implementation, then the
accelerated C code would be allowed to be added to CPython's standard
library.  If it did not, then the test suite would need to be updated
until proper coverage was provided before the accelerated C code
could be added.

To also help with compatibility, C code should use abstract APIs on
objects to prevent accidental dependence on specific types.  For
instance, if a function accepts a sequence, then the C code should
default to using ``PyObject_GetItem()`` instead of something like
``PyList_GetItem()``.  C code is allowed to have a fast path if the
proper ``PyList_CheckExact()`` is used, but otherwise APIs should
work with any object that duck types to the proper interface instead
of a specific type.


Copyright
=========

This document has been placed in the public domain.


.. _IronPython: http://ironpython.net/
.. _Jython: http://www.jython.org/
.. _PyPy: http://pypy.org/
.. _C API of CPython: http://docs.python.org/py3k/c-api/index.html
.. _sqlite3: http://docs.python.org/py3k/library/sqlite3.html