Some tweaks

Antoine Pitrou 2013-07-06 22:48:32 +02:00
parent 41d43d2d53
commit 722bcf3c16
1 changed files with 63 additions and 65 deletions

@ -13,7 +13,9 @@ Abstract
======== ========
This PEP proposes new Application Programming Interfaces (API) to customize This PEP proposes new Application Programming Interfaces (API) to customize
Python memory allocators. Python memory allocators. The only implementation required to conform to
this PEP is CPython, but other implementations may choose to be compatible,
or to re-use a similar scheme.
Rationale Rationale
@ -123,11 +125,12 @@ New Functions and Structures
``PY_SSIZE_T_MAX``. The check is done before calling the inner ``PY_SSIZE_T_MAX``. The check is done before calling the inner
function. function.
The *pymalloc* allocator is optimized for objects smaller than 512 bytes .. note::
with a short lifetime. It uses memory mappings with a fixed size of 256 The *pymalloc* allocator is optimized for objects smaller than 512 bytes
KB called "arenas". with a short lifetime. It uses memory mappings with a fixed size of 256
KB called "arenas".
Default allocators: Here is how the allocators are set up by default:
* ``PYMEM_DOMAIN_RAW``, ``PYMEM_DOMAIN_MEM``: ``malloc()``, * ``PYMEM_DOMAIN_RAW``, ``PYMEM_DOMAIN_MEM``: ``malloc()``,
``realloc()`` and ``free()``; call ``malloc(1)`` when requesting zero ``realloc()`` and ``free()``; call ``malloc(1)`` when requesting zero
@ -155,11 +158,11 @@ allocators in debug mode:
In Python 3.3, the checks are installed by replacing ``PyMem_Malloc()``, In Python 3.3, the checks are installed by replacing ``PyMem_Malloc()``,
``PyMem_Realloc()``, ``PyMem_Free()``, ``PyObject_Malloc()``, ``PyMem_Realloc()``, ``PyMem_Free()``, ``PyObject_Malloc()``,
``PyObject_Realloc()`` and ``PyObject_Free()`` using macros. The new ``PyObject_Realloc()`` and ``PyObject_Free()`` using macros. The new
allocator allocates a larger buffer and write a pattern to detect buffer allocator allocates a larger buffer and writes a pattern to detect buffer
underflow, buffer overflow and use after free (fill the buffer with the underflow, buffer overflow and use after free (by filling the buffer with
pattern ``0xDB``). It uses the original ``PyObject_Malloc()`` the byte ``0xDB``). It uses the original ``PyObject_Malloc()``
function to allocate memory. So ``PyMem_Malloc()`` and function to allocate memory. So ``PyMem_Malloc()`` and
``PyMem_Realloc()`` call indirectly ``PyObject_Malloc()`` and ``PyMem_Realloc()`` indirectly call``PyObject_Malloc()`` and
``PyObject_Realloc()``. ``PyObject_Realloc()``.
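The debug scheme described in this paragraph (a larger buffer, a pattern around the block, and a ``0xDB`` fill on free) can be sketched roughly as follows. This is only an illustration on top of plain ``malloc()``, not CPython's implementation: the names ``dbg_malloc``, ``dbg_is_intact`` and ``dbg_free``, the guard sizes, and the guard byte ``0xFB`` are assumptions made for the sketch. Only the ``0xDB`` fill comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constants for this sketch (only DEAD_BYTE is from the text). */
#define FRONT_SIZE 16     /* stored size followed by front guard bytes */
#define BACK_SIZE  8      /* guard bytes after the user block */
#define GUARD_BYTE 0xFB   /* assumed pattern for the guard areas */
#define DEAD_BYTE  0xDB   /* pattern written over freed memory */

/* Layout: [size_t size][front guard][user data ...][back guard] */
static void *dbg_malloc(size_t size)
{
    if (size > SIZE_MAX - FRONT_SIZE - BACK_SIZE)
        return NULL;                      /* total size would overflow */
    unsigned char *raw = malloc(FRONT_SIZE + size + BACK_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size));     /* remember the requested size */
    memset(raw + sizeof(size), GUARD_BYTE, FRONT_SIZE - sizeof(size));
    memset(raw + FRONT_SIZE + size, GUARD_BYTE, BACK_SIZE);
    return raw + FRONT_SIZE;
}

/* Return 1 if both guard areas are intact, 0 if either was overwritten. */
static int dbg_is_intact(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *raw = user - FRONT_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof(size));
    for (size_t i = sizeof(size); i < FRONT_SIZE; i++)
        if (raw[i] != GUARD_BYTE)
            return 0;                     /* buffer underflow */
    for (size_t i = 0; i < BACK_SIZE; i++)
        if (user[size + i] != GUARD_BYTE)
            return 0;                     /* buffer overflow */
    return 1;
}

/* Check the guards, poison the user data, then release the block.
   Returns the result of the guard check. */
static int dbg_free(void *ptr)
{
    if (ptr == NULL)
        return 1;
    int ok = dbg_is_intact(ptr);
    unsigned char *raw = (unsigned char *)ptr - FRONT_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof(size));
    /* A real hook would have to stop the compiler from dropping this store. */
    memset(ptr, DEAD_BYTE, size);
    free(raw);
    return ok;
}
```

A caller would check ``dbg_is_intact()`` (or the return value of ``dbg_free()``) to learn that a write went past either end of the block.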
This PEP redesigns the debug checks as hooks on the existing allocators This PEP redesigns the debug checks as hooks on the existing allocators
@ -179,7 +182,7 @@ Call traces when the hooks are installed (debug mode):
=> ``_PyObject_Free()`` => ``_PyObject_Free()``
As a result, ``PyMem_Malloc()`` and ``PyMem_Realloc()`` now call As a result, ``PyMem_Malloc()`` and ``PyMem_Realloc()`` now call
``malloc()`` and ``realloc()`` in release mode and in debug mode, ``malloc()`` and ``realloc()`` in both release mode and debug mode,
instead of calling ``PyObject_Malloc()`` and ``PyObject_Realloc()`` in instead of calling ``PyObject_Malloc()`` and ``PyObject_Realloc()`` in
debug mode. debug mode.
@ -199,19 +202,15 @@ Don't call malloc() directly anymore
Direct calls to ``malloc()`` are replaced with ``PyMem_Malloc()``, or Direct calls to ``malloc()`` are replaced with ``PyMem_Malloc()``, or
``PyMem_RawMalloc()`` if the GIL is not held. ``PyMem_RawMalloc()`` if the GIL is not held.
Configure external libraries like zlib or OpenSSL to allocate memory External libraries like zlib or OpenSSL can be configured to allocate memory
using ``PyMem_Malloc()`` or ``PyMem_RawMalloc()``. If the allocator of a using ``PyMem_Malloc()`` or ``PyMem_RawMalloc()``. If the allocator of a
library can only be replaced globally, the allocator is not replaced if library can only be replaced globally (rather than on an object-by-object
Python is embedded in an application. basis), it shouldn't be replaced when Python is embedded in an application.
For the "track memory usage" use case, it is important to track memory For the "track memory usage" use case, it is important to track memory
allocated in external libraries to have accurate reports, because these allocated in external libraries to have accurate reports, because these
allocations can be large (can raise a ``MemoryError`` exception). allocations can be large (e.g. they can raise a ``MemoryError`` exception)
and would otherwise be missed in memory usage reports.
If an hook is used to the track memory usage, the memory allocated by
direct calls to ``malloc()`` will not be tracked. Remaining ``malloc()``
in external libraries like OpenSSL or bz2 can allocate large memory
blocks and so would be missed in memory usage reports.
Examples Examples
@ -282,9 +281,9 @@ and 10 bytes per *pymalloc* arena::
Use case 2: Replace Memory Allocators, override pymalloc Use case 2: Replace Memory Allocators, override pymalloc
-------------------------------------------------------- --------------------------------------------------------
If your allocator is optimized for allocations of objects smaller than If you have a dedicated allocator optimized for allocations of objects
512 bytes with a short lifetime, pymalloc can be overriden (replace smaller than 512 bytes with a short lifetime, pymalloc can be overriden
``PyObject_Malloc()``). (replace ``PyObject_Malloc()``).
Dummy example wasting 2 bytes per memory block:: Dummy example wasting 2 bytes per memory block::
@ -420,12 +419,8 @@ Rejected Alternatives
More specific functions to get/set memory allocators More specific functions to get/set memory allocators
---------------------------------------------------- ----------------------------------------------------
Replace the 2 functions: It was originally proposed a larger set of C API functions, with one pair
of functions for each allocator domain:
* ``void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
* ``void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
with:
* ``void PyMem_GetRawAllocator(PyMemAllocator *allocator)`` * ``void PyMem_GetRawAllocator(PyMemAllocator *allocator)``
* ``void PyMem_GetAllocator(PyMemAllocator *allocator)`` * ``void PyMem_GetAllocator(PyMemAllocator *allocator)``
@ -442,8 +437,8 @@ each memory allocator domain.
Make PyMem_Malloc() reuse PyMem_RawMalloc() by default Make PyMem_Malloc() reuse PyMem_RawMalloc() by default
------------------------------------------------------ ------------------------------------------------------
If ``PyMem_Malloc()`` would call ``PyMem_RawMalloc()`` by default, If ``PyMem_Malloc()`` called ``PyMem_RawMalloc()`` by default,
calling ``PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc)`` would also also calling ``PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc)`` would also
patch ``PyMem_Malloc()`` indirectly. patch ``PyMem_Malloc()`` indirectly.
This alternative was rejected because ``PyMem_SetAllocator()`` would This alternative was rejected because ``PyMem_SetAllocator()`` would
@ -454,17 +449,17 @@ same behaviour is less error-prone.
Add a new PYDEBUGMALLOC environment variable Add a new PYDEBUGMALLOC environment variable
-------------------------------------------- --------------------------------------------
Add a new ``PYDEBUGMALLOC`` environment variable to enable debug checks It was proposed to add a new ``PYDEBUGMALLOC`` environment variable to
on memory block allocators. The environment variable replaces the new enable debug checks on memory block allocators. It would have had the same
function ``PyMem_SetupDebugHooks()`` which is not needed anymore. effect as calling the ``PyMem_SetupDebugHooks()``, without the need
Another advantage is to allow to enable debug checks even in release to write any C code. Another advantage is to allow to enable debug checks
mode: debug checks are always compiled, but only enabled when the even in release mode: debug checks would always be compiled in, but only
environment variable is present and non-empty. enabled when the environment variable is present and non-empty.
This alternative was rejected because a new environment variable would This alternative was rejected because a new environment variable would
make the Python initialization even more complex. The `PEP 432 make Python initialization even more complex. `PEP 432
<http://www.python.org/dev/peps/pep-0432/>`_ tries to simply the CPython <http://www.python.org/dev/peps/pep-0432/>`_ tries to simplify the
startup sequence. CPython startup sequence.
Use macros to get customizable allocators Use macros to get customizable allocators
@ -474,7 +469,7 @@ To have no overhead in the default configuration, customizable
allocators would be an optional feature enabled by a configuration allocators would be an optional feature enabled by a configuration
option or by macros. option or by macros.
This alternative was rejected because the usage of macros implies having This alternative was rejected because the use of macros implies having
to recompile extensions modules to use the new allocator and allocator to recompile extensions modules to use the new allocator and allocator
hooks. Not having to recompile Python nor extension modules makes debug hooks. Not having to recompile Python nor extension modules makes debug
hooks easier to use in practice. hooks easier to use in practice.
@ -518,12 +513,12 @@ Example of ``PyMem_Malloc`` macro with the modified
The GC allocator functions would also have to be patched. For example, The GC allocator functions would also have to be patched. For example,
``_PyObject_GC_Malloc()`` is used in many C functions and so objects of ``_PyObject_GC_Malloc()`` is used in many C functions and so objects of
differenet types would have the same allocation location. different types would have the same allocation location.
This alternative was rejected because passing a filename and a line This alternative was rejected because passing a filename and a line
number to each allocator makes the API more complex: pass 3 new number to each allocator makes the API more complex: pass 3 new
arguments (ctx, filename, lineno) to each allocator function, instead of arguments (ctx, filename, lineno) to each allocator function, instead of
just a context argument (ctx). Having to modify also GC allocator just a context argument (ctx). Having to also modify GC allocator
functions adds too much complexity for a little gain. functions adds too much complexity for a little gain.
@ -531,23 +526,25 @@ GIL-free PyMem_Malloc()
----------------------- -----------------------
In Python 3.3, when Python is compiled in debug mode, ``PyMem_Malloc()`` In Python 3.3, when Python is compiled in debug mode, ``PyMem_Malloc()``
calls indirectly ``PyObject_Malloc()`` which requires the GIL to be indirectly calls ``PyObject_Malloc()`` which requires the GIL to be
held. That's why ``PyMem_Malloc()`` must be called with the GIL held. held (it isn't thread-safe). That's why ``PyMem_Malloc()`` must be called
with the GIL held.
This PEP changes ``PyMem_Malloc()``: it now always call ``malloc()``. This PEP changes ``PyMem_Malloc()``: it now always calls ``malloc()``
The "GIL must be held" restriction could be removed from rather than ``PyObject_Malloc()``. The "GIL must be held" restriction
``PyMem_Malloc()``. could therefore be removed from ``PyMem_Malloc()``.
This alternative was rejected because allowing to call This alternative was rejected because allowing to call
``PyMem_Malloc()`` without holding the GIL can break applications ``PyMem_Malloc()`` without holding the GIL can break applications
which setup their own allocators or allocator hooks. Holding the GIL is which setup their own allocators or allocator hooks. Holding the GIL is
convinient to develop a custom allocator: no need to care of other convenient to develop a custom allocator: no need to care about other
threads. It is also convinient for a debug allocator hook: Python threads. It is also convenient for a debug allocator hook: Python
internal objects can be safetly inspected. objects can be safely inspected, and the C API may be used for reporting.
Calling ``PyGILState_Ensure()`` in Moreover, calling ``PyGILState_Ensure()`` in a memory allocator has
a memory allocator has unexpected behaviour, especially at Python unexpected behaviour, especially at Python startup and when creating of a
startup and at creation of a new Python thread state. new Python thread state. It is better to free custom allocators of
the responsibility of acquiring the GIL.
Don't add PyMem_RawMalloc() Don't add PyMem_RawMalloc()
@ -566,13 +563,14 @@ This alternative was rejected because ``PyMem_RawMalloc()`` is required
for accurate reports of the memory usage. When a debug hook is used to for accurate reports of the memory usage. When a debug hook is used to
track the memory usage, the memory allocated by direct calls to track the memory usage, the memory allocated by direct calls to
``malloc()`` cannot be tracked. ``PyMem_RawMalloc()`` can be hooked and ``malloc()`` cannot be tracked. ``PyMem_RawMalloc()`` can be hooked and
so all the memory allocated by Python can be tracked. so all the memory allocated by Python can be tracked, including
memory allocated without holding the GIL.
Use existing debug tools to analyze the memory Use existing debug tools to analyze memory use
---------------------------------------------- ----------------------------------------------
There are many existing debug tools to analyze the memory. Some There are many existing debug tools to analyze memory use. Some
examples: `Valgrind <http://valgrind.org/>`_, `Purify examples: `Valgrind <http://valgrind.org/>`_, `Purify
<http://ibm.com/software/awdtools/purify/>`_, `Clang AddressSanitizer <http://ibm.com/software/awdtools/purify/>`_, `Clang AddressSanitizer
<http://code.google.com/p/address-sanitizer/>`_, `failmalloc <http://code.google.com/p/address-sanitizer/>`_, `failmalloc
@ -580,14 +578,14 @@ examples: `Valgrind <http://valgrind.org/>`_, `Purify
The problem is to retrieve the Python object related to a memory pointer The problem is to retrieve the Python object related to a memory pointer
to read its type and/or its content. Another issue is to retrieve the to read its type and/or its content. Another issue is to retrieve the
location of the memory allocation: the C backtrace is usually useless source of the memory allocation: the C backtrace is usually useless
(same reasoning than macros using ``__FILE__`` and ``__LINE__``, see (same reasoning than macros using ``__FILE__`` and ``__LINE__``, see
`Pass the C filename and line number`_), the Python filename and line `Pass the C filename and line number`_), the Python filename and line
number (or even the Python traceback) is more useful. number (or even the Python traceback) is more useful.
This alternative was rejected because classic tools are unable to This alternative was rejected because classic tools are unable to
introspect Python internals to collect such information. Being able to introspect Python internals to collect such information. Being able to
setup a hook on allocators called with the GIL held allow to collect a setup a hook on allocators called with the GIL held allows to collect a
lot of useful data from Python internals. lot of useful data from Python internals.
@ -606,7 +604,7 @@ is unknown (ex: NULL pointer).
On Windows, this function can be implemented using ``_msize()`` and On Windows, this function can be implemented using ``_msize()`` and
``VirtualQuery()``. ``VirtualQuery()``.
The function can be used to implement an hook tracking the memory usage. The function can be used to implement a hook tracking the memory usage.
The ``free()`` method of an allocator only gets the address of a memory The ``free()`` method of an allocator only gets the address of a memory
block, whereas the size of the memory block is required to update the block, whereas the size of the memory block is required to update the
memory usage. memory usage.
@ -614,9 +612,9 @@ memory usage.
The additional ``msize()`` function was rejected because only few The additional ``msize()`` function was rejected because only few
platforms implement it. For example, Linux with the GNU libc does not platforms implement it. For example, Linux with the GNU libc does not
provide a function to get the size of a memory block. ``msize()`` is not provide a function to get the size of a memory block. ``msize()`` is not
currently used in the Python source code. The function is only used to currently used in the Python source code. The function would only be
track the memory usage, and makes the API more complex. A debug hook can used to track memory use, and make the API more complex. A debug hook
implement the function internally, there is no need to add it to can implement the function internally, there is no need to add it to
``PyMemAllocator`` and ``PyObjectArenaAllocator`` structures. ``PyMemAllocator`` and ``PyObjectArenaAllocator`` structures.
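As a rough illustration of the last point, a hook can remember the size of each block itself by storing it in a small header in front of the user data, so ``free()`` needs no ``msize()``. The sketch below is an assumption-laden example, not code from the PEP: ``block_header``, ``tracked_bytes`` and the ``track_*`` functions are hypothetical names, and the functions wrap plain ``malloc()``, ``realloc()`` and ``free()`` rather than a real ``PyMemAllocator``. The unused ``ctx`` parameter only mirrors the context argument mentioned in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed before each user block. The union with
   max_align_t keeps the user pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Total number of bytes currently allocated through the hook. */
static size_t tracked_bytes = 0;

static void *track_malloc(void *ctx, size_t size)
{
    (void)ctx;
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;
    block_header *h = malloc(sizeof(block_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;              /* remember the size for free() */
    tracked_bytes += size;
    return h + 1;                /* user data starts after the header */
}

static void *track_realloc(void *ctx, void *ptr, size_t new_size)
{
    if (ptr == NULL)
        return track_malloc(ctx, new_size);
    if (new_size > SIZE_MAX - sizeof(block_header))
        return NULL;
    block_header *h = (block_header *)ptr - 1;
    size_t old_size = h->size;
    h = realloc(h, sizeof(block_header) + new_size);
    if (h == NULL)
        return NULL;             /* old block is still valid and counted */
    h->size = new_size;
    tracked_bytes = tracked_bytes - old_size + new_size;
    return h + 1;
}

static void track_free(void *ctx, void *ptr)
{
    (void)ctx;
    if (ptr == NULL)
        return;
    block_header *h = (block_header *)ptr - 1;
    assert(tracked_bytes >= h->size);
    tracked_bytes -= h->size;    /* size comes from the header, not msize() */
    free(h);
}
```

The cost of this design is one header per block, which is the trade-off the paragraph above accepts in exchange for not requiring a platform-specific ``msize()``.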
@ -733,9 +731,9 @@ Other allocators:
<https://developer.gnome.org/glib/unstable/glib-Memory-Slices.html>`_: <https://developer.gnome.org/glib/unstable/glib-Memory-Slices.html>`_:
efficient way to allocate groups of equal-sized chunks of memory efficient way to allocate groups of equal-sized chunks of memory
This PEP permits to choose exactly which memory allocator is used for your This PEP allows to choose exactly which memory allocator is used for your
application depending on its usage of the memory (number of allocation, size of application depending on its usage of the memory (number of allocations,
allocations, lifetime of objects, etc.). size of allocations, lifetime of objects, etc.).
Links Links