PEP 445: textwidth=72 (for email)

Victor Stinner 2013-06-18 22:05:17 +02:00
parent 2d81cffb61
commit d1c9cb312b
1 changed files with 163 additions and 145 deletions

@ -20,19 +20,21 @@ Rationale
Use cases:
* Application embedding Python may want to isolate Python memory from the
memory of the application, or may want to use a different memory allocator
optimized for its Python usage
* Application embedding Python may want to isolate Python memory from
the memory of the application, or may want to use a different memory
allocator optimized for its Python usage
* Python running on embedded devices with low memory and slow CPU.
A custom memory allocator may be required to use the memory efficiently
and/or to be able to use all the memory of the device.
A custom memory allocator may be required to use the memory
efficiently and/or to be able to use all the memory of the device.
* Debug tool to:
- track memory usage (memory leaks)
- get the Python filename and line number where an object was allocated
- get the Python filename and line number where an object was
allocated
- detect buffer underflow, buffer overflow and detect misuse of Python
allocator APIs (builtin Python debug hooks)
- force allocation to fail to test handling of ``MemoryError`` exception
- force allocation to fail to test handling of ``MemoryError``
exception
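The last debug use case, forcing an allocation to fail, can be sketched
in plain C without the Python headers. This is only an illustration
under stated assumptions: ``FailPlan`` and ``failing_malloc()`` are
hypothetical names, and the function merely mirrors the
``malloc(ctx, size)`` hook shape described in this PEP.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context for the sketch: decides which request fails. */
typedef struct {
    size_t calls;    /* allocation requests seen so far */
    size_t fail_at;  /* 1-based request that must fail; 0 = never */
} FailPlan;

/* Hook with the malloc(ctx, size) shape used by the PEP's allocators. */
static void *failing_malloc(void *ctx, size_t size)
{
    FailPlan *plan = ctx;

    plan->calls++;
    if (plan->fail_at != 0 && plan->calls == plan->fail_at)
        return NULL;              /* simulated out-of-memory */
    /* never ask the C library for 0 bytes: the result would be
       platform dependent */
    return malloc(size ? size : 1);
}
```

With such a hook installed, the code path that is expected to raise
``MemoryError`` can be exercised by choosing ``fail_at``.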
Proposal
@ -46,13 +48,14 @@ API changes
- ``void* PyMem_RawMalloc(size_t size)``
- ``void* PyMem_RawRealloc(void *ptr, size_t new_size)``
- ``void PyMem_RawFree(void *ptr)``
- the behaviour of requesting zero bytes is not defined: return *NULL* or a
distinct non-*NULL* pointer depending on the platform.
- the behaviour of requesting zero bytes is not defined: return *NULL*
or a distinct non-*NULL* pointer depending on the platform.
* Add a new ``PyMemBlockAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 3 functions */
/* user context passed as the first argument
to the 3 functions */
void *ctx;
/* allocate a memory block */
@ -65,32 +68,34 @@ API changes
void (*free) (void *ctx, void *ptr);
} PyMemBlockAllocator;
* Add new functions to get and set internal functions of ``PyMem_RawMalloc()``,
``PyMem_RawRealloc()`` and ``PyMem_RawFree()``:
* Add new functions to get and set internal functions of
``PyMem_RawMalloc()``, ``PyMem_RawRealloc()`` and ``PyMem_RawFree()``:
- ``void PyMem_GetRawAllocator(PyMemBlockAllocator *allocator)``
- ``void PyMem_SetRawAllocator(PyMemBlockAllocator *allocator)``
* Add new functions to get and set internal functions of ``PyMem_Malloc()``,
``PyMem_Realloc()`` and ``PyMem_Free()``:
* Add new functions to get and set internal functions of
``PyMem_Malloc()``, ``PyMem_Realloc()`` and ``PyMem_Free()``:
- ``void PyMem_GetAllocator(PyMemBlockAllocator *allocator)``
- ``void PyMem_SetAllocator(PyMemBlockAllocator *allocator)``
- ``malloc(ctx, 0)`` and ``realloc(ctx, ptr, 0)`` must not return *NULL*:
it would be treated as an error.
- ``malloc(ctx, 0)`` and ``realloc(ctx, ptr, 0)`` must not return
*NULL*: it would be treated as an error.
* Add new functions to get and set internal functions of
``PyObject_Malloc()``, ``PyObject_Realloc()`` and ``PyObject_Free()``:
``PyObject_Malloc()``, ``PyObject_Realloc()`` and
``PyObject_Free()``:
- ``void PyObject_GetAllocator(PyMemBlockAllocator *allocator)``
- ``void PyObject_SetAllocator(PyMemBlockAllocator *allocator)``
- ``malloc(ctx, 0)`` and ``realloc(ctx, ptr, 0)`` must not return *NULL*:
it would be treated as an error.
- ``malloc(ctx, 0)`` and ``realloc(ctx, ptr, 0)`` must not return
*NULL*: it would be treated as an error.
* Add a new ``PyMemMappingAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 2 functions */
/* user context passed as the first argument
to the 2 functions */
void *ctx;
/* allocate a memory mapping */
@ -104,26 +109,26 @@ API changes
- ``void PyMem_GetMappingAllocator(PyMemMappingAllocator *allocator)``
- ``void PyMem_SetMappingAllocator(PyMemMappingAllocator *allocator)``
- Currently, this allocator is only used internally by *pymalloc* to allocate
arenas.
- Currently, this allocator is only used internally by *pymalloc* to
allocate arenas.
* Add a new function to setup the builtin Python debug hooks when memory
allocators are replaced:
- ``void PyMem_SetupDebugHooks(void)``
* The following memory allocators always return *NULL* if size is greater
than ``PY_SSIZE_T_MAX``: ``PyMem_RawMalloc()``, ``PyMem_RawRealloc()``,
``PyMem_Malloc()``, ``PyMem_Realloc()``, ``PyObject_Malloc()``,
``PyObject_Realloc()``.
* The following memory allocators always return *NULL* if size is
greater than ``PY_SSIZE_T_MAX``: ``PyMem_RawMalloc()``,
``PyMem_RawRealloc()``, ``PyMem_Malloc()``, ``PyMem_Realloc()``,
``PyObject_Malloc()``, ``PyObject_Realloc()``.
The builtin Python debug hooks were introduced in Python 2.3 and implement the
following checks:
The builtin Python debug hooks were introduced in Python 2.3 and
implement the following checks:
* Newly allocated memory is filled with the byte ``0xCB``, freed memory is
filled with the byte ``0xDB``.
* Detect API violations, ex: ``PyObject_Free()`` called on a memory block
allocated by ``PyMem_Malloc()``
* Newly allocated memory is filled with the byte ``0xCB``, freed memory
is filled with the byte ``0xDB``.
* Detect API violations, ex: ``PyObject_Free()`` called on a memory
block allocated by ``PyMem_Malloc()``
* Detect write before the start of the buffer (buffer underflow)
* Detect write after the end of the buffer (buffer overflow)
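The fill and guard checks listed above can be sketched as follows. This
is a simplified illustration, not CPython's implementation: the
``dbg_*`` names, the block layout and the guard value ``0xFB`` are
assumptions of the sketch; only the ``0xCB`` and ``0xDB`` fill bytes
come from the text.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CLEAN_BYTE 0xCB  /* newly allocated memory (from the text) */
#define DEAD_BYTE  0xDB  /* freed memory (from the text) */
#define GUARD_BYTE 0xFB  /* guard value: assumption of this sketch */
#define GUARD_LEN  8

/* Layout: [size_t size][guards][user data][guards] */
#define HEADER_LEN (sizeof(size_t) + GUARD_LEN)

static void *dbg_malloc(size_t size)
{
    unsigned char *base;

    if (size > (size_t)-1 - HEADER_LEN - GUARD_LEN)
        return NULL;                    /* total size would overflow */
    base = malloc(HEADER_LEN + size + GUARD_LEN);
    if (base == NULL)
        return NULL;
    memcpy(base, &size, sizeof(size));
    memset(base + sizeof(size_t), GUARD_BYTE, GUARD_LEN);
    memset(base + HEADER_LEN, CLEAN_BYTE, size);
    memset(base + HEADER_LEN + size, GUARD_BYTE, GUARD_LEN);
    return base + HEADER_LEN;
}

/* 0: guards intact, -1: buffer underflow, 1: buffer overflow */
static int dbg_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *base = user - HEADER_LEN;
    size_t size, i;

    memcpy(&size, base, sizeof(size));
    for (i = 0; i < GUARD_LEN; i++) {
        if (base[sizeof(size_t) + i] != GUARD_BYTE)
            return -1;
    }
    for (i = 0; i < GUARD_LEN; i++) {
        if (user[size + i] != GUARD_BYTE)
            return 1;
    }
    return 0;
}

/* Check the guards, mark the data as dead, release the block. */
static int dbg_free(void *ptr)
{
    unsigned char *user = ptr;
    size_t size;
    int status;

    if (ptr == NULL)
        return 0;
    status = dbg_check(ptr);
    if (status == 0) {
        memcpy(&size, user - HEADER_LEN, sizeof(size));
        memset(user, DEAD_BYTE, size);
    }
    free(user - HEADER_LEN);
    return status;
}
```

The sketch leaves out the API-violation check, which would need an
extra tag in the header identifying the allocator family.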
@ -134,20 +139,20 @@ The *pymalloc* allocator is used by default for:
Make usage of these new APIs
----------------------------
* ``PyMem_Malloc()`` and ``PyMem_Realloc()`` always call ``malloc()`` and
``realloc()``, instead of calling ``PyObject_Malloc()`` and
* ``PyMem_Malloc()`` and ``PyMem_Realloc()`` always call ``malloc()``
and ``realloc()``, instead of calling ``PyObject_Malloc()`` and
``PyObject_Realloc()`` in debug mode
* ``PyObject_Malloc()`` falls back on ``PyMem_Malloc()`` instead of
``malloc()`` if size is greater than or equal to ``SMALL_REQUEST_THRESHOLD``
(512 bytes), and ``PyObject_Realloc()`` falls back on ``PyMem_Realloc()``
instead of ``realloc()``
``malloc()`` if size is greater than or equal to
``SMALL_REQUEST_THRESHOLD`` (512 bytes), and ``PyObject_Realloc()``
falls back on ``PyMem_Realloc()`` instead of ``realloc()``
* Replace direct calls to ``malloc()`` with ``PyMem_Malloc()``, or
``PyMem_RawMalloc()`` if the GIL is not held
* Configure external libraries like zlib or OpenSSL to allocate memory using
``PyMem_RawMalloc()``
* Configure external libraries like zlib or OpenSSL to allocate memory
using ``PyMem_RawMalloc()``
Examples
@ -213,16 +218,16 @@ Dummy example wasting 2 bytes per allocation, and 10 bytes per arena::
}
.. warning::
Remove the call ``PyMem_SetRawAllocator(&alloc)`` if the new allocator
is not thread-safe.
Remove the call ``PyMem_SetRawAllocator(&alloc)`` if the new
allocator is not thread-safe.
Use case 2: Replace Memory Allocator, override pymalloc
--------------------------------------------------------
If your allocator is optimized for allocation of small objects (less than 512
bytes) with a short lifetime, pymalloc can be overridden: replace
``PyObject_Malloc()``.
If your allocator is optimized for allocation of small objects (less
than 512 bytes) with a short lifetime, pymalloc can be overridden:
replace ``PyObject_Malloc()``.
Dummy Example wasting 2 bytes per allocation::
@ -263,8 +268,8 @@ Dummy Example wasting 2 bytes per allocation::
}
.. warning::
Remove the call ``PyMem_SetRawAllocator(&alloc)`` if the new allocator
is not thread-safe.
Remove the call ``PyMem_SetRawAllocator(&alloc)`` if the new
allocator is not thread-safe.
@ -338,20 +343,21 @@ Example to setup hooks on all memory allocators::
thread-safe.
.. note::
``PyMem_SetupDebugHooks()`` does not need to be called: Python debug hooks
are installed automatically at startup.
``PyMem_SetupDebugHooks()`` does not need to be called: Python debug
hooks are installed automatically at startup.
Performances
============
Results of the `Python benchmarks suite <http://hg.python.org/benchmarks>`_ (-b
2n3): some tests are 1.04x faster, some tests are 1.04x slower, significant is
between 115 and -191. I don't understand this output, but I guess that the
overhead cannot be seen with such a test.
Results of the `Python benchmarks suite
<http://hg.python.org/benchmarks>`_ (-b 2n3): some tests are 1.04x
faster, some tests are 1.04x slower, significant is between 115 and
-191. I don't understand this output, but I guess that the overhead
cannot be seen with such a test.
Results of pybench benchmark: "+0.1%" slower globally (diff between -4.9% and
+5.6%).
Results of pybench benchmark: "+0.1%" slower globally (diff between
-4.9% and +5.6%).
The full reports are attached to the issue #3329.
@ -390,9 +396,9 @@ Drawback: the caller has to check if the result is 0, or handle the error.
PyMem_Malloc() reuses PyMem_RawMalloc() by default
--------------------------------------------------
``PyMem_Malloc()`` should call ``PyMem_RawMalloc()`` by default. So calling
``PyMem_SetRawAllocator()`` would also patch ``PyMem_Malloc()``
indirectly.
``PyMem_Malloc()`` should call ``PyMem_RawMalloc()`` by default. So
calling ``PyMem_SetRawAllocator()`` would also patch
``PyMem_Malloc()`` indirectly.
.. note::
@ -404,39 +410,42 @@ indirectly.
Add a new PYDEBUGMALLOC environment variable
--------------------------------------------
To be able to use the Python builtin debug hooks even when a custom memory
allocator replaces the default Python allocator, an environment variable
``PYDEBUGMALLOC`` can be added to setup these debug function hooks, instead of
adding the new function ``PyMem_SetupDebugHooks()``. If the environment
variable is present, ``PyMem_SetRawAllocator()``, ``PyMem_SetAllocator()``
and ``PyObject_SetAllocator()`` will automatically reinstall the hook on top
of the new allocator.
To be able to use the Python builtin debug hooks even when a custom
memory allocator replaces the default Python allocator, an environment
variable ``PYDEBUGMALLOC`` can be added to setup these debug function
hooks, instead of adding the new function ``PyMem_SetupDebugHooks()``.
If the environment variable is present, ``PyMem_SetRawAllocator()``,
``PyMem_SetAllocator()`` and ``PyObject_SetAllocator()`` will
automatically reinstall the hook on top of the new allocator.
A new environment variable would make the Python initialization even more
complex. The `PEP 432 <http://www.python.org/dev/peps/pep-0432/>`_ tries to
simplify the CPython startup sequence.
A new environment variable would make the Python initialization even
more complex. The `PEP 432 <http://www.python.org/dev/peps/pep-0432/>`_
tries to simplify the CPython startup sequence.
Use macros to get customizable allocators
-----------------------------------------
To have no overhead in the default configuration, customizable allocators would
be an optional feature enabled by a configuration option or by macros.
To have no overhead in the default configuration, customizable
allocators would be an optional feature enabled by a configuration
option or by macros.
Not having to recompile Python makes debug hooks easier to use in practice.
Extension modules don't have to be recompiled with macros.
Not having to recompile Python makes debug hooks easier to use in
practice. Extension modules don't have to be recompiled with macros.
Pass the C filename and line number
-----------------------------------
Define allocator functions using macros and use ``__FILE__`` and ``__LINE__``
to get the C filename and line number of a memory allocation.
Define allocator functions using macros and use ``__FILE__`` and
``__LINE__`` to get the C filename and line number of a memory
allocation.
Example::
typedef struct {
/* user context passed as the first argument to the 3 functions */
/* user context passed as the first argument
to the 3 functions */
void *ctx;
/* allocate a memory block */
@ -452,7 +461,8 @@ Example::
void *ptr);
} PyMemBlockAllocator;
void* _PyMem_MallocTrace(const char *filename, int lineno, size_t size);
void* _PyMem_MallocTrace(const char *filename, int lineno,
size_t size);
/* need also a function for the Python stable ABI */
void* PyMem_Malloc(size_t size);
@ -470,7 +480,8 @@ changes add too much complexity for a little gain.
GIL-free PyMem_Malloc()
-----------------------
When Python is compiled in debug mode, ``PyMem_Malloc()`` indirectly calls ``PyObject_Malloc()`` which requires the GIL to be held.
When Python is compiled in debug mode, ``PyMem_Malloc()`` indirectly
calls ``PyObject_Malloc()`` which requires the GIL to be held.
That's why ``PyMem_Malloc()`` must be called with the GIL held.
This PEP proposes to "fix" ``PyMem_Malloc()`` to make it always call
@ -478,64 +489,65 @@ This PEP proposes to "fix" ``PyMem_Malloc()`` to make it always call
``PyMem_Malloc()``.
Allowing to call ``PyMem_Malloc()`` without holding the GIL might break
applications which setup their own allocators or allocator hooks. Holding the
GIL is convenient to develop a custom allocator: no need to care about other
threads. It is also convenient for a debug allocator hook: Python internal
objects can be safely inspected.
applications which setup their own allocators or allocator hooks.
Holding the GIL is convenient to develop a custom allocator: no need to
care about other threads. It is also convenient for a debug allocator
hook: Python internal objects can be safely inspected.
Calling ``PyGILState_Ensure()`` in a memory allocator may have unexpected
behaviour, especially at Python startup and at creation of a new Python thread
state.
Calling ``PyGILState_Ensure()`` in a memory allocator may have
unexpected behaviour, especially at Python startup and at creation of a
new Python thread state.
Don't add PyMem_RawMalloc()
---------------------------
Replace ``malloc()`` with ``PyMem_Malloc()``, but only if the GIL is held.
Otherwise, keep ``malloc()`` unchanged.
Replace ``malloc()`` with ``PyMem_Malloc()``, but only if the GIL is
held. Otherwise, keep ``malloc()`` unchanged.
``PyMem_Malloc()`` is used without the GIL held in some Python functions.
For example, the ``main()`` and ``Py_Main()`` functions of Python call
``PyMem_Malloc()`` whereas the GIL does not exist yet. In this case,
``PyMem_Malloc()`` should be replaced with ``malloc()`` (or
``PyMem_Malloc()`` is used without the GIL held in some Python
functions. For example, the ``main()`` and ``Py_Main()`` functions of
Python call ``PyMem_Malloc()`` whereas the GIL does not exist yet. In
this case, ``PyMem_Malloc()`` should be replaced with ``malloc()`` (or
``PyMem_RawMalloc()``).
If a hook is used to track memory usage, the ``malloc()`` memory will not
be seen. Remaining ``malloc()`` calls may allocate a lot of memory and so
would be missed in reports.
If a hook is used to track memory usage, the ``malloc()`` memory will
not be seen. Remaining ``malloc()`` calls may allocate a lot of memory
and so would be missed in reports.
Use existing debug tools to analyze the memory
----------------------------------------------
There are many existing debug tools to analyze the memory. Some examples:
`Valgrind <http://valgrind.org/>`_,
`Purify <http://ibm.com/software/awdtools/purify/>`_,
`Clang AddressSanitizer <http://code.google.com/p/address-sanitizer/>`_,
`failmalloc <http://www.nongnu.org/failmalloc/>`_,
etc.
There are many existing debug tools to analyze the memory. Some
examples: `Valgrind <http://valgrind.org/>`_, `Purify
<http://ibm.com/software/awdtools/purify/>`_, `Clang AddressSanitizer
<http://code.google.com/p/address-sanitizer/>`_, `failmalloc
<http://www.nongnu.org/failmalloc/>`_, etc.
The problem is to retrieve the Python object related to a memory pointer to read
its type and/or content. Another issue is to retrieve the location of the
memory allocation: the C backtrace is usually useless (same reasoning as
macros using ``__FILE__`` and ``__LINE__``), the Python filename and line
number (or even the Python traceback) is more useful.
The problem is to retrieve the Python object related to a memory pointer
to read its type and/or content. Another issue is to retrieve the
location of the memory allocation: the C backtrace is usually useless
(same reasoning as macros using ``__FILE__`` and ``__LINE__``), the
Python filename and line number (or even the Python traceback) is more
useful.
Classic tools are unable to introspect Python internals to collect such
information. Being able to setup a hook on allocators called with the GIL held
allows collecting a lot of useful data from Python internals.
information. Being able to setup a hook on allocators called with the
GIL held allows collecting a lot of useful data from Python internals.
Add msize()
-----------
Add another field to ``PyMemBlockAllocator`` and ``PyMemMappingAllocator``::
Add another field to ``PyMemBlockAllocator`` and
``PyMemMappingAllocator``::
size_t msize(void *ptr);
This function returns the size of a memory block or a memory mapping. Return
(size_t)-1 if the function is not implemented or if the pointer is unknown
(ex: NULL pointer).
This function returns the size of a memory block or a memory mapping.
Return (size_t)-1 if the function is not implemented or if the pointer
is unknown (ex: NULL pointer).
On Windows, this function can be implemented using ``_msize()`` and
``VirtualQuery()``.
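Where the platform has no ``_msize()`` equivalent, one possible fallback
is to record the size in a header in front of each block. The sketch
below is only an illustration under that assumption: ``SizeHeader`` and
the ``sized_*`` functions are hypothetical names. Unlike the proposal
above, it can only recognise the *NULL* pointer as unknown; any other
pointer must come from ``sized_malloc()``.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block. The extra union members
   keep the user data aligned for common types. */
typedef union {
    size_t size;
    long double align_ld;
    void *align_ptr;
} SizeHeader;

static void *sized_malloc(size_t size)
{
    SizeHeader *header;

    if (size > (size_t)-1 - sizeof(SizeHeader))
        return NULL;                 /* total size would overflow */
    header = malloc(sizeof(SizeHeader) + size);
    if (header == NULL)
        return NULL;
    header->size = size;
    return header + 1;               /* user data follows the header */
}

/* Same shape as the proposed msize(): (size_t)-1 means "unknown". */
static size_t sized_msize(void *ptr)
{
    if (ptr == NULL)
        return (size_t)-1;
    return ((SizeHeader *)ptr - 1)->size;
}

static void sized_free(void *ptr)
{
    if (ptr != NULL)
        free((SizeHeader *)ptr - 1);
}
```

The cost is one header per block, which is the kind of overhead a
native ``_msize()`` avoids.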
@ -544,24 +556,25 @@ On Windows, this function can be implemented using ``_msize()`` and
No context argument
-------------------
Simplify the signature of allocator functions, remove the context argument:
Simplify the signature of allocator functions, remove the context
argument:
* ``void* malloc(size_t size)``
* ``void* realloc(void *ptr, size_t new_size)``
* ``void free(void *ptr)``
It is likely for an allocator hook to be reused for ``PyMem_SetAllocator()``
and ``PyObject_SetAllocator()``, or even ``PyMem_SetRawAllocator()``, but the
hook must call a different function depending on the allocator. The context is
a convenient way to reuse the same custom allocator or hook for different
Python allocators.
It is likely for an allocator hook to be reused for
``PyMem_SetAllocator()`` and ``PyObject_SetAllocator()``, or even
``PyMem_SetRawAllocator()``, but the hook must call a different function
depending on the allocator. The context is a convenient way to reuse the
same custom allocator or hook for different Python allocators.
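A minimal sketch of that reuse, written in plain C without the Python
headers: one hook function serves several allocators, and the context
selects both the statistics to update and the function to forward to.
``HookContext`` and ``hook_malloc()`` are hypothetical names used only
for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-allocator context for the sketch. */
typedef struct {
    const char *domain;                /* label, e.g. "raw", "object" */
    void *(*next_malloc)(size_t size); /* allocator to forward to */
    size_t nblocks;                    /* blocks handed out */
    size_t nbytes;                     /* bytes requested */
} HookContext;

/* One hook for every allocator: only the context differs. */
static void *hook_malloc(void *ctx, size_t size)
{
    HookContext *hc = ctx;
    /* request at least 1 byte so that a zero-size request does not
       depend on the platform */
    void *ptr = hc->next_malloc(size ? size : 1);

    if (ptr != NULL) {
        hc->nblocks++;
        hc->nbytes += size;
    }
    return ptr;
}
```

Without the context argument, the same behaviour would need one copy of
the hook per allocator, or a global variable to tell them apart.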
External libraries
==================
Python should try to reuse the same prototypes for allocator functions as
other libraries.
Python should try to reuse the same prototypes for allocator functions
as other libraries.
Libraries used by Python:
@ -586,36 +599,41 @@ See also the `GNU libc: Memory Allocation Hooks
Memory allocators
=================
The C standard library provides the well known ``malloc()`` function. Its
implementation depends on the platform and on the C library. The GNU C library
uses a modified ptmalloc2, based on "Doug Lea's Malloc" (dlmalloc). FreeBSD
uses `jemalloc <http://www.canonware.com/jemalloc/>`_. Google provides
tcmalloc which is part of `gperftools <http://code.google.com/p/gperftools/>`_.
The C standard library provides the well known ``malloc()`` function.
Its implementation depends on the platform and on the C library. The GNU
C library uses a modified ptmalloc2, based on "Doug Lea's Malloc"
(dlmalloc). FreeBSD uses `jemalloc
<http://www.canonware.com/jemalloc/>`_. Google provides tcmalloc which
is part of `gperftools <http://code.google.com/p/gperftools/>`_.
``malloc()`` uses two kinds of memory: heap and memory mappings. Memory
mappings are usually used for large allocations (ex: larger than 256 KB),
whereas the heap is used for small allocations.
mappings are usually used for large allocations (ex: larger than 256
KB), whereas the heap is used for small allocations.
On UNIX, the heap is handled by the ``brk()`` and ``sbrk()`` system calls,
and it is contiguous. On Windows, the heap is handled by ``HeapAlloc()`` and
may be discontiguous. Memory mappings are handled by ``mmap()`` on UNIX and
``VirtualAlloc()`` on Windows, they may be discontiguous.
On UNIX, the heap is handled by the ``brk()`` and ``sbrk()`` system
calls, and it is contiguous. On Windows, the heap is handled by
``HeapAlloc()`` and may be discontiguous. Memory mappings are handled by
``mmap()`` on UNIX and ``VirtualAlloc()`` on Windows, they may be
discontiguous.
Releasing a memory mapping immediately gives the memory back to the system. On
UNIX, heap memory is only given back to the system if it is at the end of the
heap. Otherwise, the memory will only be given back to the system when all the
memory located after the released memory is also released.
Releasing a memory mapping immediately gives the memory back to the
system. On UNIX, heap memory is only given back to the system if it is
at the end of the heap. Otherwise, the memory will only be given back to
the system when all the memory located after the released memory is
also released.
To allocate memory in the heap, the allocator tries to reuse free space. If
there is no contiguous space big enough, the heap must be increased, even if we
have more free space than the required size. This issue is called "memory
fragmentation": the memory usage seen by the system may be much higher than
real usage. On Windows, ``HeapAlloc()`` creates a new memory mapping with
``VirtualAlloc()`` if there is not enough free contiguous memory.
To allocate memory in the heap, the allocator tries to reuse free space.
If there is no contiguous space big enough, the heap must be increased,
even if we have more free space than the required size. This issue is
called "memory fragmentation": the memory usage seen by the system
may be much higher than real usage. On Windows, ``HeapAlloc()`` creates
a new memory mapping with ``VirtualAlloc()`` if there is not enough free
contiguous memory.
CPython has a *pymalloc* allocator for allocations smaller than 512 bytes. This
allocator is optimized for small objects with a short lifetime. It uses memory
mappings called "arenas" with a fixed size of 256 KB.
CPython has a *pymalloc* allocator for allocations smaller than 512
bytes. This allocator is optimized for small objects with a short
lifetime. It uses memory mappings called "arenas" with a fixed size of
256 KB.
Other allocators:
@ -641,10 +659,10 @@ CPython issues related to memory allocation:
<http://bugs.python.org/issue13483>`_
* `Issue #16742: PyOS_Readline drops GIL and calls PyOS_StdioReadline, which
isn't thread safe <http://bugs.python.org/issue16742>`_
* `Issue #18203: Replace calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc()
<http://bugs.python.org/issue18203>`_
* `Issue #18227: Use Python memory allocators in external libraries like zlib
or OpenSSL <http://bugs.python.org/issue18227>`_
* `Issue #18203: Replace calls to malloc() with PyMem_Malloc() or
PyMem_RawMalloc() <http://bugs.python.org/issue18203>`_
* `Issue #18227: Use Python memory allocators in external libraries like
zlib or OpenSSL <http://bugs.python.org/issue18227>`_
Projects analyzing the memory usage of Python applications: