PEP: 445
Title: Add new APIs to customize memory allocators
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Jun-2013
Python-Version: 3.4

Abstract
========

Add new APIs to customize memory allocators.

Rationale
=========

Use cases:

* Applications embedding Python may want to isolate Python memory from the
  memory of the application, or may want to use a different memory allocator
  optimized for their Python usage
* Python running on embedded devices with low memory and slow CPU.
  A custom memory allocator may be required to use memory efficiently
  and/or to be able to use all the memory of the device.
* Debug tools to:

  - track memory leaks
  - get the Python filename and line number where an object was allocated
  - detect buffer underflow, buffer overflow and misuse of Python
    allocator APIs (builtin Python debug hooks)
  - force allocations to fail to test handling of the ``MemoryError``
    exception

Proposal
========

API changes
-----------

* Add new GIL-free memory allocator functions:

  - ``void* PyMem_RawMalloc(size_t size)``
  - ``void* PyMem_RawRealloc(void *ptr, size_t new_size)``
  - ``void PyMem_RawFree(void *ptr)``

* Add a new ``PyMemAllocators`` structure::

    typedef struct {
        /* user context passed as the first argument to the 3 functions */
        void *ctx;

        /* allocate memory */
        void* (*malloc) (void *ctx, size_t size);

        /* allocate memory or resize a memory buffer */
        void* (*realloc) (void *ctx, void *ptr, size_t new_size);

        /* release memory */
        void (*free) (void *ctx, void *ptr);
    } PyMemAllocators;

* Add new functions to get and set memory block allocators:

  - ``void PyMem_GetRawAllocators(PyMemAllocators *allocators)``
  - ``void PyMem_SetRawAllocators(PyMemAllocators *allocators)``
  - ``void PyMem_GetAllocators(PyMemAllocators *allocators)``
  - ``void PyMem_SetAllocators(PyMemAllocators *allocators)``
  - ``void PyObject_GetAllocators(PyMemAllocators *allocators)``
  - ``void PyObject_SetAllocators(PyMemAllocators *allocators)``

* Add new functions to get and set memory mapping allocators:

  - ``void _PyObject_GetArenaAllocators(void **ctx_p, void* (**malloc_p) (void *ctx, size_t size), void (**free_p) (void *ctx, void *ptr, size_t size))``
  - ``void _PyObject_SetArenaAllocators(void *ctx, void* (*malloc) (void *ctx, size_t size), void (*free) (void *ctx, void *ptr, size_t size))``

* Add a new function to set up the builtin Python debug hooks when memory
  allocators are replaced:

  - ``void PyMem_SetupDebugHooks(void)``

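
The proposal does not spell out how the wrappers use these structures. The following standalone sketch shows one plausible reading, under stated assumptions: a wrapper such as ``PyMem_Malloc()`` forwards to the installed function and passes the stored *ctx* as the first argument, and the getters and setters copy the structure by value, which is what allows the examples below to fill a local ``PyMemAllocators`` variable before installing it. The structure is redeclared locally so the code compiles without Python headers; ``installed``, ``sketch_get_allocators()``, ``sketch_set_allocators()``, ``sketch_malloc()``, ``sketch_free()`` and the counting allocator are hypothetical names, not CPython code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local copy of the proposed structure, so the sketch compiles alone. */
typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void (*free) (void *ctx, void *ptr);
} PyMemAllocators;

/* Default functions: ignore ctx and forward to the C library. */
static void* default_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size);
}

static void* default_realloc(void *ctx, void *ptr, size_t new_size)
{
    (void)ctx;
    return realloc(ptr, new_size);
}

static void default_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Hypothetical storage for one family of allocators (e.g. PyMem). */
static PyMemAllocators installed = {
    NULL, default_malloc, default_realloc, default_free
};

/* Assumption: getter and setter copy the whole structure by value,
   so callers may pass a structure that lives on their stack. */
void sketch_get_allocators(PyMemAllocators *allocators)
{
    *allocators = installed;
}

void sketch_set_allocators(PyMemAllocators *allocators)
{
    installed = *allocators;
}

/* Hypothetical public wrappers: they only forward, passing ctx first. */
void* sketch_malloc(size_t size)
{
    return installed.malloc(installed.ctx, size);
}

void sketch_free(void *ptr)
{
    installed.free(installed.ctx, ptr);
}

/* Hypothetical allocator whose context is a counter of live blocks. */
static void* counting_malloc(void *ctx, size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL)
        (*(int *)ctx)++;
    return ptr;
}

static void counting_free(void *ctx, void *ptr)
{
    if (ptr != NULL)
        (*(int *)ctx)--;
    free(ptr);
}

static int live_blocks = 0;

/* Fill a local structure, install it, and let it go out of scope. */
void install_counting_allocators(void)
{
    PyMemAllocators alloc;

    sketch_get_allocators(&alloc);  /* realloc stays the default (not counted) */
    alloc.ctx = &live_blocks;
    alloc.malloc = counting_malloc;
    alloc.free = counting_free;
    sketch_set_allocators(&alloc);
}
```

After ``install_counting_allocators()`` returns, its local structure is gone, but ``installed`` still holds a copy, so ``sketch_malloc()`` keeps reaching ``counting_malloc()`` with the counter as its context.
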

.. note::

   The builtin Python debug hooks were introduced in Python 2.3 and implement
   the following checks:

   * Newly allocated memory is filled with the byte 0xCB, freed memory is
     filled with the byte 0xDB.
   * Detect API violations, ex: ``PyObject_Free()`` called on a memory block
     allocated by ``PyMem_Malloc()``
   * Detect write before the start of the buffer (buffer underflow)
   * Detect write after the end of the buffer (buffer overflow)

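
The fill and guard checks listed in the note can be illustrated with a simplified standalone sketch; it is not CPython's implementation. It records the requested size in a header, surrounds the block with guard bytes, and checks them on demand. Only the 0xCB and 0xDB fill bytes come from the note; the guard byte value 0xFB, the guard width ``GUARD_SIZE`` and the header layout are assumptions, and ``debug_malloc()``, ``debug_check()`` and ``debug_free()`` are hypothetical names. The API-violation check is not covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CLEAN_BYTE 0xCB      /* fills newly allocated memory (from the note) */
#define DEAD_BYTE 0xDB       /* fills freed memory (from the note) */
#define FORBIDDEN_BYTE 0xFB  /* assumption: value of the guard bytes */
#define GUARD_SIZE 8         /* assumption: guard width on each side */

/* Layout: [size_t size][front guard][data ...][back guard] */
#define HEADER_SIZE (sizeof(size_t) + GUARD_SIZE)

void* debug_malloc(size_t size)
{
    unsigned char *raw, *data;

    if (size > SIZE_MAX - HEADER_SIZE - GUARD_SIZE)
        return NULL;  /* total size would overflow */
    raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size_t));
    data = raw + HEADER_SIZE;
    memset(data - GUARD_SIZE, FORBIDDEN_BYTE, GUARD_SIZE);
    memset(data, CLEAN_BYTE, size);
    memset(data + size, FORBIDDEN_BYTE, GUARD_SIZE);
    return data;
}

/* Returns 0 if both guards are intact, -1 on underflow, 1 on overflow. */
int debug_check(void *ptr)
{
    unsigned char *data = ptr;
    unsigned char *front = data - GUARD_SIZE;
    size_t size, i;

    memcpy(&size, data - HEADER_SIZE, sizeof(size_t));
    for (i = 0; i < GUARD_SIZE; i++)
        if (front[i] != FORBIDDEN_BYTE)
            return -1;
    for (i = 0; i < GUARD_SIZE; i++)
        if (data[size + i] != FORBIDDEN_BYTE)
            return 1;
    return 0;
}

/* Fills the whole block with DEAD_BYTE before releasing it; the pattern
   may remain visible until the memory is reused. */
void debug_free(void *ptr)
{
    unsigned char *raw;
    size_t size;

    if (ptr == NULL)
        return;
    raw = (unsigned char *)ptr - HEADER_SIZE;
    memcpy(&size, raw, sizeof(size_t));
    memset(raw, DEAD_BYTE, HEADER_SIZE + size + GUARD_SIZE);
    free(raw);
}
```

In a real hook, the guard check would run inside the free and realloc functions, so a corrupted guard is reported when the block is released.
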

Make use of these new APIs
--------------------------

* ``PyMem_Malloc()`` and ``PyMem_Realloc()`` always call ``malloc()`` and
  ``realloc()``, instead of calling ``PyObject_Malloc()`` and
  ``PyObject_Realloc()`` in debug mode
* ``PyObject_Malloc()`` falls back on ``PyMem_Malloc()`` instead of
  ``malloc()`` if the size is greater than ``SMALL_REQUEST_THRESHOLD``
  (512 bytes), and ``PyObject_Realloc()`` falls back on ``PyMem_Realloc()``
  instead of ``realloc()``
* Replace direct calls to ``malloc()`` with ``PyMem_Malloc()``, or
  ``PyMem_RawMalloc()`` if the GIL is not held
* Configure external libraries like zlib or OpenSSL to allocate memory using
  ``PyMem_RawMalloc()``
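
The fallback of ``PyObject_Malloc()`` can be sketched as a routing decision. The boundary test below mirrors the one used in CPython's ``Objects/obmalloc.c``, but treat its exact form as an assumption: a request of 1 to 512 bytes is served by pymalloc, while a request of 0 bytes or of more than 512 bytes falls back, under this proposal, on ``PyMem_Malloc()``. ``route_object_malloc()`` and ``route_t`` are hypothetical names used only to make the decision testable.

```c
#include <stddef.h>

#define SMALL_REQUEST_THRESHOLD 512

/* Hypothetical labels for the two possible paths. */
typedef enum { ROUTE_PYMALLOC, ROUTE_PYMEM } route_t;

/* Decide which allocator serves a PyObject_Malloc(size) request. */
route_t route_object_malloc(size_t size)
{
    /* size is unsigned: size - 1 wraps around for 0, so this single
       comparison sends both 0 and large requests to the fallback. */
    if ((size - 1) < SMALL_REQUEST_THRESHOLD)
        return ROUTE_PYMALLOC;  /* 1..512 bytes: pymalloc pools */
    return ROUTE_PYMEM;         /* 0 or > 512 bytes: PyMem_Malloc() */
}
```
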

Examples
========

Use case 1: Replace Memory Allocators, keep pymalloc
----------------------------------------------------

Set up your custom memory allocators, keeping pymalloc. Dummy example wasting 2
bytes per allocation, and 10 bytes per arena::

    #include <Python.h>
    #include <stdlib.h>

    int alloc_padding = 2;
    int arena_padding = 10;

    void* my_malloc(void *ctx, size_t size)
    {
        int padding = *(int *)ctx;
        return malloc(size + padding);
    }

    void* my_realloc(void *ctx, void *ptr, size_t new_size)
    {
        int padding = *(int *)ctx;
        return realloc(ptr, new_size + padding);
    }

    void my_free(void *ctx, void *ptr)
    {
        free(ptr);
    }

    void* my_alloc_arena(void *ctx, size_t size)
    {
        int padding = *(int *)ctx;
        return malloc(size + padding);
    }

    void my_free_arena(void *ctx, void *ptr, size_t size)
    {
        free(ptr);
    }

    void setup_custom_allocators(void)
    {
        PyMemAllocators alloc;

        alloc.ctx = &alloc_padding;
        alloc.malloc = my_malloc;
        alloc.realloc = my_realloc;
        alloc.free = my_free;

        PyMem_SetRawAllocators(&alloc);
        PyMem_SetAllocators(&alloc);

        _PyObject_SetArenaAllocators(&arena_padding,
                                     my_alloc_arena, my_free_arena);

        PyMem_SetupDebugHooks();
    }

.. warning::
   Remove the call ``PyMem_SetRawAllocators(&alloc)`` if the new allocators
   are not thread-safe.
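
The dummy arena allocator above simply uses ``malloc()``. Because the arena free function also receives the size, an arena allocator can instead map memory directly: ``munmap()`` needs the length of the mapping, which is a plausible reason for the extra parameter. This POSIX-only sketch uses the hypothetical names ``mmap_arena_alloc()`` and ``mmap_arena_free()``; their prototypes match the ones passed to ``_PyObject_SetArenaAllocators()``, but this standalone version does not install them.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* BSD/macOS spelling */
#endif

/* Same prototype as the malloc argument of _PyObject_SetArenaAllocators(). */
void* mmap_arena_alloc(void *ctx, size_t size)
{
    void *ptr;

    (void)ctx;
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (ptr == MAP_FAILED) ? NULL : ptr;
}

/* Same prototype as the free argument: munmap() needs the length. */
void mmap_arena_free(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    if (ptr != NULL)
        munmap(ptr, size);
}
```

Unlike heap memory, an unmapped arena is given back to the system immediately.
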

Use case 2: Replace Memory Allocators, override pymalloc
--------------------------------------------------------

If your allocator is optimized for allocation of small objects (less than 512
bytes) with a short lifetime, pymalloc can be overridden: replace
``PyObject_Malloc()``.

Dummy example wasting 2 bytes per allocation::

    #include <Python.h>
    #include <stdlib.h>

    int alloc_padding = 2;

    void* my_malloc(void *ctx, size_t size)
    {
        int padding = *(int *)ctx;
        return malloc(size + padding);
    }

    void* my_realloc(void *ctx, void *ptr, size_t new_size)
    {
        int padding = *(int *)ctx;
        return realloc(ptr, new_size + padding);
    }

    void my_free(void *ctx, void *ptr)
    {
        free(ptr);
    }

    void setup_custom_allocators(void)
    {
        PyMemAllocators alloc;

        alloc.ctx = &alloc_padding;
        alloc.malloc = my_malloc;
        alloc.realloc = my_realloc;
        alloc.free = my_free;

        PyMem_SetRawAllocators(&alloc);
        PyMem_SetAllocators(&alloc);
        PyObject_SetAllocators(&alloc);

        PyMem_SetupDebugHooks();
    }

.. warning::
   Remove the call ``PyMem_SetRawAllocators(&alloc)`` if the new allocators
   are not thread-safe.


Use case 3: Set up Allocator Hooks
----------------------------------

Example to set up hooks on all memory allocators::

    #include <Python.h>

    static struct {
        PyMemAllocators pymem;
        PyMemAllocators pymem_raw;
        PyMemAllocators pyobj;
        /* ... */
    } hook;

    static void* hook_malloc(void *ctx, size_t size)
    {
        PyMemAllocators *alloc = (PyMemAllocators *)ctx;
        void *ptr;
        /* ... */
        ptr = alloc->malloc(alloc->ctx, size);
        /* ... */
        return ptr;
    }

    static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
    {
        PyMemAllocators *alloc = (PyMemAllocators *)ctx;
        void *ptr2;
        /* ... */
        ptr2 = alloc->realloc(alloc->ctx, ptr, new_size);
        /* ... */
        return ptr2;
    }

    static void hook_free(void *ctx, void *ptr)
    {
        PyMemAllocators *alloc = (PyMemAllocators *)ctx;
        /* ... */
        alloc->free(alloc->ctx, ptr);
        /* ... */
    }

    void setup_hooks(void)
    {
        PyMemAllocators alloc;
        static int installed = 0;

        if (installed)
            return;
        installed = 1;

        alloc.malloc = hook_malloc;
        alloc.realloc = hook_realloc;
        alloc.free = hook_free;

        PyMem_GetRawAllocators(&hook.pymem_raw);
        alloc.ctx = &hook.pymem_raw;
        PyMem_SetRawAllocators(&alloc);

        PyMem_GetAllocators(&hook.pymem);
        alloc.ctx = &hook.pymem;
        PyMem_SetAllocators(&alloc);

        PyObject_GetAllocators(&hook.pyobj);
        alloc.ctx = &hook.pyobj;
        PyObject_SetAllocators(&alloc);
    }

.. warning::
   Remove the call ``PyMem_SetRawAllocators(&alloc)`` if hooks are not
   thread-safe.

.. note::
   ``PyMem_SetupDebugHooks()`` does not need to be called: Python debug hooks
   are installed automatically at startup.
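
The chaining pattern of this example can be exercised without Python by simulating a single allocator slot. In this hypothetical standalone sketch, the variable ``slot`` stands in for the allocators read by ``PyMem_GetAllocators()`` and written by ``PyMem_SetAllocators()``; the structure is redeclared locally, and the hook counts calls before forwarding to the saved allocator through *ctx*.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local copy of the proposed structure, so the sketch compiles alone. */
typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void (*free) (void *ctx, void *ptr);
} PyMemAllocators;

static void* raw_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size);
}

static void* raw_realloc(void *ctx, void *ptr, size_t new_size)
{
    (void)ctx;
    return realloc(ptr, new_size);
}

static void raw_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Simulated allocator slot, standing in for one Python domain. */
static PyMemAllocators slot = {NULL, raw_malloc, raw_realloc, raw_free};

/* Previous allocators saved at installation, plus hook statistics. */
static struct {
    PyMemAllocators previous;
    size_t nmalloc;
    size_t nfree;
} hook;

static void* hook_malloc(void *ctx, size_t size)
{
    PyMemAllocators *prev = (PyMemAllocators *)ctx;
    hook.nmalloc++;
    return prev->malloc(prev->ctx, size);
}

static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
{
    PyMemAllocators *prev = (PyMemAllocators *)ctx;
    return prev->realloc(prev->ctx, ptr, new_size);
}

static void hook_free(void *ctx, void *ptr)
{
    PyMemAllocators *prev = (PyMemAllocators *)ctx;
    hook.nfree++;
    prev->free(prev->ctx, ptr);
}

void setup_counting_hook(void)
{
    static int installed = 0;
    PyMemAllocators alloc;

    /* A second installation would save the hook itself as "previous",
       and every call would then recurse forever. */
    if (installed)
        return;
    installed = 1;

    hook.previous = slot;        /* like PyMem_GetAllocators() */
    alloc.ctx = &hook.previous;
    alloc.malloc = hook_malloc;
    alloc.realloc = hook_realloc;
    alloc.free = hook_free;
    slot = alloc;                /* like PyMem_SetAllocators() */
}
```

This is why the example keeps a static ``installed`` flag: calling the setup function twice must not wrap the hook around itself.
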

Performances
============

Results of the `Python benchmarks suite <http://hg.python.org/benchmarks>`_ (-b
2n3): some tests are 1.04x faster, some tests are 1.04x slower, and the
significance is between 115 and -191. I don't understand this output, but I
guess that the overhead cannot be seen with such tests.

Results of the pybench benchmark: "+0.1%" slower globally (diff between -4.9%
and +5.6%).

The full reports are attached to issue #3329.

Alternatives
============

Only have one generic get/set function
--------------------------------------

Replace the 6 functions:

* ``PyMem_GetRawAllocators()``, ``PyMem_GetAllocators()``, ``PyObject_GetAllocators()``
* ``PyMem_SetRawAllocators(allocators)``, ``PyMem_SetAllocators(allocators)``, ``PyObject_SetAllocators(allocators)``

with 2 functions taking an additional *domain* argument:

* ``Py_GetAllocators(domain)``
* ``Py_SetAllocators(domain, allocators)``

where domain is one of these values:

* ``PYALLOC_PYMEM``
* ``PYALLOC_PYMEM_RAW``
* ``PYALLOC_PYOBJECT``

``_PyObject_GetArenaAllocators()`` and ``_PyObject_SetArenaAllocators()`` are
not merged and kept private because their prototypes are different and they are
specific to pymalloc.
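
A minimal sketch of this alternative, assuming the allocators of each domain are stored in an array indexed by the domain value. The ``PYALLOC_*`` values and the ``Py_GetAllocators()``/``Py_SetAllocators()`` names come from the text above; the ``PyAllocDomain`` enum, the output argument of the getter and the ``-1`` return value for an unknown domain are assumptions made for this standalone sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local copy of the proposed structure, so the sketch compiles alone. */
typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void (*free) (void *ctx, void *ptr);
} PyMemAllocators;

/* Hypothetical enum holding the domain values listed above. */
typedef enum {
    PYALLOC_PYMEM,
    PYALLOC_PYMEM_RAW,
    PYALLOC_PYOBJECT,
    PYALLOC_NDOMAIN  /* number of domains, not a domain */
} PyAllocDomain;

static void* default_malloc(void *ctx, size_t size)
{ (void)ctx; return malloc(size); }

static void* default_realloc(void *ctx, void *ptr, size_t new_size)
{ (void)ctx; return realloc(ptr, new_size); }

static void default_free(void *ctx, void *ptr)
{ (void)ctx; free(ptr); }

/* One entry per domain, all starting with the default functions. */
static PyMemAllocators domains[PYALLOC_NDOMAIN] = {
    {NULL, default_malloc, default_realloc, default_free},
    {NULL, default_malloc, default_realloc, default_free},
    {NULL, default_malloc, default_realloc, default_free},
};

static int valid_domain(PyAllocDomain domain)
{
    return (int)domain >= 0 && (int)domain < PYALLOC_NDOMAIN;
}

/* Copy the allocators of a domain; -1 if the domain is unknown. */
int Py_GetAllocators(PyAllocDomain domain, PyMemAllocators *allocators)
{
    if (!valid_domain(domain))
        return -1;
    *allocators = domains[domain];
    return 0;
}

/* Replace the allocators of a domain; -1 if the domain is unknown. */
int Py_SetAllocators(PyAllocDomain domain, PyMemAllocators *allocators)
{
    if (!valid_domain(domain))
        return -1;
    domains[domain] = *allocators;
    return 0;
}
```

With this layout, adding a domain means adding an enum value rather than two new public functions.
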

Add a new PYDEBUGMALLOC environment variable
--------------------------------------------

To be able to use the Python builtin debug hooks even when a custom memory
allocator replaces the default Python allocator, an environment variable
``PYDEBUGMALLOC`` can be added to set up these debug function hooks, instead of
adding the new function ``PyMem_SetupDebugHooks()``. If the environment
variable is present, ``PyMem_SetRawAllocators()``, ``PyMem_SetAllocators()``
and ``PyObject_SetAllocators()`` will automatically reinstall the hook on top
of the new allocator.

A new environment variable would make the Python initialization even more
complex. `PEP 432 <http://www.python.org/dev/peps/pep-0432/>`_ tries to
simplify the CPython startup sequence.

Use macros to get customizable allocators
-----------------------------------------

To have no overhead in the default configuration, customizable allocators would
be an optional feature enabled by a configuration option or by macros.

Not having to recompile Python makes debug hooks easier to use in practice.
Extension modules don't have to be recompiled with macros.

Pass the C filename and line number
-----------------------------------

Use C macros using ``__FILE__`` and ``__LINE__`` to get the C filename
and line number of a memory allocation.

Passing a filename and a line number to each allocator makes the API more
complex: pass 3 new arguments, instead of just a context argument, to each
allocator function. The GC allocator functions should also be patched.
For example, ``_PyObject_GC_Malloc()`` is used in many C functions, and so
objects of different types would have the same allocation location. Such
changes add too much complexity for little gain.
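
Both the macro approach and the limitation raised against it can be shown with a hypothetical ``TRACED_MALLOC()`` macro. The macro records the location of its call site, but every allocation made through a shared helper (here ``gc_malloc_helper()``, a stand-in playing the role of ``_PyObject_GC_Malloc()``) records the helper's own location. All names in this sketch are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Location of the most recent allocation, recorded by the macro. */
static const char *last_file;
static int last_line;

static void* traced_malloc(size_t size, const char *file, int line)
{
    last_file = file;
    last_line = line;
    return malloc(size);
}

/* Expands at each call site, so __FILE__ and __LINE__ are the caller's. */
#define TRACED_MALLOC(size) traced_malloc((size), __FILE__, __LINE__)

/* Stand-in for a shared helper such as _PyObject_GC_Malloc(): every
   object allocated through it gets the line below as its location. */
static int helper_line;

void* gc_malloc_helper(size_t size)
{
    helper_line = __LINE__ + 1;
    return TRACED_MALLOC(size);
}
```

Two objects of different types allocated through the helper end up with the same recorded line, which is the loss of information described above.
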

No context argument
-------------------

Simplify the signature of allocator functions, remove the context argument:

* ``void* malloc(size_t size)``
* ``void* realloc(void *ptr, size_t new_size)``
* ``void free(void *ptr)``

It is likely for an allocator hook to be reused for ``PyMem_SetAllocators()``
and ``PyObject_SetAllocators()``, but the hook must call a different function
depending on the allocator. The context is a convenient way to reuse the same
allocator or hook for different APIs.
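
The convenience of the context can be shown with one set of hook functions installed for two different allocators: each installation gets its own context, which tells the shared functions which underlying allocator to call and which statistics to update. Without the context argument, a separate set of hook functions would be needed per allocator. This standalone sketch redeclares the structure locally; ``tracker_t``, ``make_hook()`` and the other names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local copy of the proposed structure, so the sketch compiles alone. */
typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void (*free) (void *ctx, void *ptr);
} PyMemAllocators;

/* Per-installation state, passed to the shared hook through ctx. */
typedef struct {
    PyMemAllocators wrapped;  /* allocator being hooked */
    size_t bytes_requested;   /* total bytes requested via malloc */
} tracker_t;

static void* plain_malloc(void *ctx, size_t size)
{ (void)ctx; return malloc(size); }

static void* plain_realloc(void *ctx, void *ptr, size_t new_size)
{ (void)ctx; return realloc(ptr, new_size); }

static void plain_free(void *ctx, void *ptr)
{ (void)ctx; free(ptr); }

/* Shared hook functions: the context selects the tracker. */
static void* track_malloc(void *ctx, size_t size)
{
    tracker_t *t = (tracker_t *)ctx;
    t->bytes_requested += size;
    return t->wrapped.malloc(t->wrapped.ctx, size);
}

static void* track_realloc(void *ctx, void *ptr, size_t new_size)
{
    tracker_t *t = (tracker_t *)ctx;
    return t->wrapped.realloc(t->wrapped.ctx, ptr, new_size);
}

static void track_free(void *ctx, void *ptr)
{
    tracker_t *t = (tracker_t *)ctx;
    t->wrapped.free(t->wrapped.ctx, ptr);
}

/* Two trackers, e.g. one for PyMem and one for PyObject allocations. */
static tracker_t mem_tracker = {{NULL, plain_malloc, plain_realloc, plain_free}, 0};
static tracker_t obj_tracker = {{NULL, plain_malloc, plain_realloc, plain_free}, 0};

/* Same functions for every installation, only the context differs. */
PyMemAllocators make_hook(tracker_t *tracker)
{
    PyMemAllocators hook = {tracker, track_malloc, track_realloc, track_free};
    return hook;
}
```
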

PyMem_Malloc() GIL-free
-----------------------

``PyMem_Malloc()`` must be called with the GIL held because in debug mode, it
indirectly calls ``PyObject_Malloc()``, which requires the GIL to be held. This
PEP proposes to "fix" ``PyMem_Malloc()`` to make it always call ``malloc()``.
So the "GIL must be held" restriction may be removed from ``PyMem_Malloc()``.

Allowing ``PyMem_Malloc()`` to be called without holding the GIL might break
applications which set up their own allocator or their allocator hooks.
Holding the GIL is very convenient when developing a custom allocator: there is
no need to care about other threads or mutexes. It is also convenient for an
allocator hook: Python internals can be safely inspected.

Calling ``PyGILState_Ensure()`` in a memory allocator may have unexpected
behaviour, especially at Python startup and at the creation of a new Python
thread state.

Don't add PyMem_RawMalloc()
---------------------------

Replace ``malloc()`` with ``PyMem_Malloc()``, but only if the GIL is held.
Otherwise, keep ``malloc()`` unchanged.

``PyMem_Malloc()`` is sometimes already misused. For example, the ``main()``
and ``Py_Main()`` functions of Python call ``PyMem_Malloc()`` whereas the GIL
does not exist yet. In this case, ``PyMem_Malloc()`` should be replaced with
``malloc()`` (or ``PyMem_RawMalloc()``).

If a hook is used to track memory usage, the memory allocated by direct calls
to ``malloc()`` will not be seen. The remaining ``malloc()`` calls may allocate
a lot of memory and so would be missed in reports.

Use existing debug tools to analyze the memory
----------------------------------------------

There are many existing debug tools to analyze the memory. Some examples:
`Valgrind <http://valgrind.org/>`_,
`Purify <http://ibm.com/software/awdtools/purify/>`_,
`Clang AddressSanitizer <http://code.google.com/p/address-sanitizer/>`_,
`failmalloc <http://www.nongnu.org/failmalloc/>`_,
etc.

The problem is to retrieve the Python object related to a memory pointer to
read its type and/or content. Another issue is to retrieve the location of the
memory allocation: the C backtrace is usually useless (same reasoning as for
macros using ``__FILE__`` and ``__LINE__``); the Python filename and line
number (or even the Python traceback) is more useful.

Classic tools are unable to introspect Python internals to collect such
information. Being able to set up a hook on allocators called with the GIL held
allows reading a lot of useful data from Python internals.

External libraries
==================

* glib: `g_mem_set_vtable()
  <http://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-mem-set-vtable>`_
* OpenSSL: `CRYPTO_set_mem_functions()
  <http://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=crypto/mem.c;h=f7984fa958eb1edd6c61f6667f3f2b29753be662;hb=HEAD#l124>`_
  to set memory management functions globally
* expat: `parserCreate()
  <http://hg.python.org/cpython/file/cc27d50bd91a/Modules/expat/xmlparse.c#l724>`_
  has a per-instance memory handler
* libxml2: `xmlGcMemSetup() <http://xmlsoft.org/html/libxml-xmlmemory.html>`_,
  global

See also the `GNU libc: Memory Allocation Hooks
<http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html>`_.

Memory allocators
=================

The C standard library provides the well known ``malloc()`` function. Its
implementation depends on the platform and on the C library. The GNU C library
uses a modified ptmalloc2, based on "Doug Lea's Malloc" (dlmalloc). FreeBSD
uses `jemalloc <http://www.canonware.com/jemalloc/>`_. Google provides
tcmalloc which is part of `gperftools <http://code.google.com/p/gperftools/>`_.

``malloc()`` uses two kinds of memory: heap and memory mappings. Memory
mappings are usually used for large allocations (ex: larger than 256 KB),
whereas the heap is used for small allocations.

The heap is handled by ``brk()`` and ``sbrk()`` system calls on Linux, and is
contiguous. Memory mappings are handled by ``mmap()`` on UNIX and
``VirtualAlloc()`` on Windows; they may be discontiguous.

Releasing a memory mapping immediately gives the memory back to the system. For
the heap, memory is only given back to the system if it is at the end of the
heap. Otherwise, the memory will only be given back to the system when all the
memory located after the released memory is also released. To allocate memory
in the heap, the allocator tries to reuse free space. If there is no contiguous
space big enough, the heap must be increased, even if there is more free space
than the required size. This issue is called "memory fragmentation": the
memory usage seen by the system may be much higher than the real usage.

CPython has a pymalloc allocator using arenas of 256 KB for allocations of 512
bytes or less. This allocator is optimized for small objects with a short
lifetime.
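
How pymalloc serves a small request can be sketched as rounding the size up to a size class, so that blocks of the same class can share a pool. The 8-byte granularity below is assumed to match ``ALIGNMENT`` in CPython's ``Objects/obmalloc.c`` of this era; treat it, and the function names ``size_class_index()`` and ``size_class_block()``, as assumptions made for illustration.

```c
#include <stddef.h>

#define ALIGNMENT 8                  /* assumption: size-class granularity */
#define ALIGNMENT_SHIFT 3            /* log2(ALIGNMENT) */
#define SMALL_REQUEST_THRESHOLD 512  /* larger requests bypass pymalloc */

/* Index of the size class serving a request of 1..512 bytes. */
unsigned int size_class_index(size_t nbytes)
{
    return (unsigned int)((nbytes - 1) >> ALIGNMENT_SHIFT);
}

/* Block size used for that class (a multiple of ALIGNMENT). */
size_t size_class_block(unsigned int index)
{
    return ((size_t)index + 1) << ALIGNMENT_SHIFT;
}
```

Rounding wastes at most 7 bytes per block, but keeping equal-sized blocks together limits the fragmentation described above for small objects.
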

Windows provides a `Low-fragmentation Heap
<http://msdn.microsoft.com/en-us/library/windows/desktop/aa366750%28v=vs.85%29.aspx>`_.

The Linux kernel uses `slab allocation
<http://en.wikipedia.org/wiki/Slab_allocation>`_.

The glib library has a `Memory Slice API
<https://developer.gnome.org/glib/unstable/glib-Memory-Slices.html>`_:
an efficient way to allocate groups of equal-sized chunks of memory.

Links
=====

CPython issues related to memory allocation:

* `Issue #3329: Add new APIs to customize memory allocators
  <http://bugs.python.org/issue3329>`_
* `Issue #13483: Use VirtualAlloc to allocate memory arenas
  <http://bugs.python.org/issue13483>`_
* `Issue #16742: PyOS_Readline drops GIL and calls PyOS_StdioReadline, which
  isn't thread safe <http://bugs.python.org/issue16742>`_
* `Issue #18203: Replace calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc()
  <http://bugs.python.org/issue18203>`_
* `Issue #18227: Use Python memory allocators in external libraries like zlib
  or OpenSSL <http://bugs.python.org/issue18227>`_

Projects analyzing the memory usage of Python applications:

* `pytracemalloc
  <https://pypi.python.org/pypi/pytracemalloc>`_
* `Meliae: Python Memory Usage Analyzer
  <https://pypi.python.org/pypi/meliae>`_
* `Guppy-PE: umbrella package combining Heapy and GSL
  <http://guppy-pe.sourceforge.net/>`_
* `PySizer (developed for Python 2.4)
  <http://pysizer.8325.org/>`_