PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4
Abstract
========
Add a new ``tracemalloc`` module to trace memory blocks allocated by Python.
Rationale
=========
Common debug tools tracing memory allocations read the C filename and
line number. Using such a tool to analyze Python memory allocations does
not help because most memory blocks are allocated in the same C function,
in ``PyMem_Malloc()`` for example.
There are debug tools dedicated to the Python language like ``Heapy``
and ``PySizer``. These projects analyze object types and/or content.
These tools are useful when most memory leaks are instances of the
same type and this type is only instantiated in a few functions. The
problem arises when the object type is very common, like ``str`` or
``tuple``, and it is hard to identify where these objects are
instantiated.
Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools cannot
be used on large applications with thousands of objects because the
diagram is too large to be analyzed manually.
Proposal
========
With PEP 445, it becomes easy to set up a hook on Python memory
allocators. The hook can inspect the current Python frame to get the
Python filename and line number.
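As a rough illustration of the frame inspection described above, the
following pure Python sketch reads the filename and line number of the
calling frame. It is not the proposed hook, which would be installed on
the C allocators; ``caller_location()`` and ``allocate_something()`` are
hypothetical names, and ``sys._getframe()`` is specific to CPython.

```python
import sys


def caller_location():
    """Return (filename, lineno) of the Python code calling this function."""
    # Depth 0 is this function, depth 1 is its caller.
    frame = sys._getframe(1)
    return frame.f_code.co_filename, frame.f_lineno


def allocate_something():
    # Stands in for the place where a memory block would be allocated.
    return caller_location()


filename, lineno = allocate_something()
```

A real hook would record this pair alongside the size of each allocated
memory block instead of returning it.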
This PEP proposes to add a new ``tracemalloc`` module. It is a debug
tool to trace memory allocations made by Python. The module provides the
following information:
* Compute the differences between two snapshots to detect memory leaks
* Statistics on allocated memory blocks per filename and per line number:
total size, number and average size of allocated memory blocks
* For each allocated memory block: its size and the traceback where the block
was allocated
The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), a ``-X`` command line option (``-X
faulthandler`` and ``-X tracemalloc``). See the
`documentation of the faulthandler module
<http://docs.python.org/dev/library/faulthandler.html>`_.
The tracemalloc module has been written for CPython. Other
implementations of Python may not provide it.
API
===
To trace most memory blocks allocated by Python, the module should be
enabled as early as possible by calling the ``tracemalloc.enable()``
function, by setting the ``PYTHONTRACEMALLOC`` environment variable to
``1``, or by using the ``-X tracemalloc`` command line option.
By default, the ``Trace.traceback`` attribute only stores one ``Frame``
instance per allocated memory block. Use ``set_traceback_limit()`` to
store more frames.
Functions
---------
``add_filter(filter)`` function:
Add a new filter on Python memory allocations, *filter* is a
``Filter`` instance.
All inclusive filters are applied at once: a memory allocation is
ignored only if no inclusive filter matches its trace. A memory
allocation is also ignored if at least one exclusive filter matches its
trace.
The new filter is not applied to traces that were already collected. Use
``clear_traces()`` to ensure that all traces match the new filter.
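The way inclusive and exclusive filters combine can be sketched in pure
Python. ``SimpleFilter`` and ``is_traced()`` are hypothetical stand-ins
that compare filenames for equality only, and the sketch assumes that an
empty list of inclusive filters keeps every allocation, which the text
above does not state explicitly.

```python
from collections import namedtuple

# Hypothetical stand-in for Filter: only the fields needed for this sketch.
SimpleFilter = namedtuple("SimpleFilter", "include filename")


def is_traced(filename, filters):
    """Return True if an allocation made in *filename* would be kept."""
    inclusive = [f for f in filters if f.include]
    exclusive = [f for f in filters if not f.include]
    # One matching exclusive filter is enough to ignore the allocation.
    if any(f.filename == filename for f in exclusive):
        return False
    # Assumption: without any inclusive filter, everything is kept.
    if not inclusive:
        return True
    # Otherwise at least one inclusive filter has to match.
    return any(f.filename == filename for f in inclusive)


filters = [
    SimpleFilter(True, "app.py"),
    SimpleFilter(True, "lib.py"),
    SimpleFilter(False, "lib.py"),
]
```

With these filters, an allocation in ``app.py`` is kept, while
allocations in ``lib.py`` (excluded) and ``other.py`` (matched by no
inclusive filter) are ignored.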
``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:
Add an inclusive filter: helper for ``add_filter()`` creating a
``Filter`` instance with ``include`` attribute set to ``True``.
Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)``
only includes memory blocks allocated by the ``tracemalloc`` module.
``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:
Add an exclusive filter: helper for ``add_filter()`` creating a
``Filter`` instance with ``include`` attribute set to ``False``.
Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)``
ignores memory blocks allocated by the ``tracemalloc`` module.
``clear_filters()`` function:
Reset the filter list.
``clear_traces()`` function:
Clear all traces and statistics on Python memory allocations, and
reset the ``get_traced_memory()`` counter.
``disable()`` function:
Stop tracing Python memory allocations and stop the timer started by
``start_timer()``.
See also ``enable()`` and ``is_enabled()`` functions.
``enable()`` function:
Start tracing Python memory allocations.
See also ``disable()`` and ``is_enabled()`` functions.
``get_filters()`` function:
Get the filters on Python memory allocations as a list of ``Filter``
instances.
``get_traceback_limit()`` function:
Get the maximum number of ``Frame`` instances stored in the
``traceback`` attribute of a ``Trace`` instance.
Use ``set_traceback_limit()`` to change the limit.
``get_object_address(obj)`` function:
Get the address of the memory block of the specified Python object.
``get_object_trace(obj)`` function:
Get the trace of a Python object *obj* as a ``Trace`` instance.
The function only returns the trace of the memory block directly
holding the object. The ``size`` attribute of the trace is smaller
than the total size of the object if the object is composed of more
than one memory block.
Return ``None`` if the ``tracemalloc`` module did not trace the
allocation of the object.
See also ``gc.get_referrers()`` and ``sys.getsizeof()`` functions.
``get_process_memory()`` function:
Get the memory usage of the current process as a meminfo namedtuple
with two attributes:
* ``rss``: Resident Set Size in bytes
* ``vms``: size of the virtual memory in bytes
Return ``None`` if the platform is not supported.
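A minimal sketch of how such a function could be written on Linux,
assuming the ``/proc/self/statm`` file is available. The ``MemInfo`` and
``read_process_memory()`` names are hypothetical, and other platforms
would need their own implementation; here they simply get ``None``.

```python
import os
from collections import namedtuple

# Hypothetical equivalent of the meminfo namedtuple described above.
MemInfo = namedtuple("MemInfo", "rss vms")


def read_process_memory(statm_path="/proc/self/statm"):
    """Return MemInfo(rss, vms) in bytes, or None if it cannot be read."""
    try:
        with open(statm_path) as f:
            fields = f.read().split()
        page_size = os.sysconf("SC_PAGE_SIZE")
        # statm counts pages: total program size first, resident set second.
        vms_pages, rss_pages = int(fields[0]), int(fields[1])
    except (OSError, ValueError, IndexError, AttributeError):
        return None
    return MemInfo(rss=rss_pages * page_size, vms=vms_pages * page_size)


info = read_process_memory()
```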
``get_stats()`` function:
Get statistics on traced Python memory blocks as a dictionary
``{filename (str): {line_number (int): stats}}`` where *stats* is a
``TraceStats`` instance, *filename* and *line_number* can be
``None``.
Return an empty dictionary if the ``tracemalloc`` module is
disabled.
``get_traced_memory()`` function:
Get the total size of all traced memory blocks allocated by Python.
``get_tracemalloc_size()`` function:
Get the memory usage in bytes of the ``tracemalloc`` module.
``get_traces()`` function:
Get all traces of Python memory allocations as a dictionary
``{address (int): trace}`` where *trace* is a ``Trace`` instance.
Return an empty dictionary if the ``tracemalloc`` module is
disabled.
``is_enabled()`` function:
``True`` if the ``tracemalloc`` module is tracing Python memory
allocations, ``False`` otherwise.
See also ``enable()`` and ``disable()`` functions.
``start_timer(delay: int, func: callable, args: tuple=(), kwargs: dict={})`` function:
Start a timer calling ``func(*args, **kwargs)`` every *delay*
seconds. Enable the ``tracemalloc`` module if it is disabled. The
timer is driven by the Python memory allocator, not by real time:
*func* is called after at least *delay* seconds, and it is called
later than that if no Python memory allocation occurs in the
meantime. The timer has a resolution of 1 second.
If the ``start_timer()`` function is called twice, previous
parameters are replaced. Call the ``stop_timer()`` function to stop
the timer.
The ``DisplayTopTask.start()`` and ``TakeSnapshot.start()`` methods
use the ``start_timer()`` function to run a task regularly.
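The "at least *delay* seconds" behaviour can be modelled with a timer
that is only checked when an allocation happens. This is a sketch of the
idea rather than the proposed implementation: ``AllocationTimer`` and its
``on_allocation()`` method are hypothetical, and the current time is
passed in explicitly so that the example is deterministic.

```python
class AllocationTimer:
    """Timer that is only checked when it is notified of an allocation."""

    def __init__(self, delay, func, args=(), kwargs=None):
        self.delay = delay
        self.func = func
        self.args = args
        self.kwargs = kwargs or {}
        self.next_call = None

    def on_allocation(self, now):
        """Hypothetical hook, called with the current time in seconds."""
        if self.next_call is None:
            # First allocation seen: schedule the first call.
            self.next_call = now + self.delay
        elif now >= self.next_call:
            self.next_call = now + self.delay
            self.func(*self.args, **self.kwargs)


calls = []
timer = AllocationTimer(5, calls.append, args=("tick",))
# No allocation happens between t=4 and t=12, so the first call is late.
for now in (0, 1, 4, 12, 13, 30):
    timer.on_allocation(now)
```

In this run the callback fires at ``t=12`` and ``t=30``, never at the
exact 5 second marks, because nothing checks the timer between
allocations.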
``set_traceback_limit(limit: int)`` function:
Set the maximum number of ``Frame`` instances stored in the
``traceback`` attribute of a ``Trace`` instance. Clear all traces
and statistics on Python memory allocations if the ``tracemalloc``
module is enabled.
Storing the traceback of each memory allocation adds significant
overhead to the memory usage. Example with the Python test suite:
tracing all memory allocations increases the memory usage by
``+50%`` when storing only 1 frame and ``+150%`` when storing 10
frames. Use ``get_tracemalloc_size()`` to measure the overhead and
``add_filter()`` to select which memory allocations are traced.
Use ``get_traceback_limit()`` to get the current limit.
``stop_timer()`` function:
Stop the timer started by ``start_timer()``.
DisplayTop class
----------------
``DisplayTop()`` class:
Display the top of allocated memory blocks.
``display_snapshot(snapshot, count=10, group_by="filename_lineno", cumulative=False, file=None)`` method:
Display a snapshot of memory blocks allocated by Python, *snapshot*
is a ``Snapshot`` instance.
``display_top_diff(top_diff, count=10, file=None)`` method:
Display differences between two ``GroupedStats`` instances,
*top_diff* is a ``StatsDiff`` instance.
``display_top_stats(top_stats, count=10, file=None)`` method:
Display the top of allocated memory blocks grouped by the
``group_by`` attribute of *top_stats*, *top_stats* is a
``GroupedStats`` instance.
``color`` attribute:
If ``True``, always use colors. If ``False``, never use colors. The
default value is ``None``: use colors if the *file* parameter is a
TTY device.
``compare_with_previous`` attribute:
If ``True`` (default value), compare with the previous snapshot. If
``False``, compare with the first snapshot.
``filename_parts`` attribute:
Number of displayed filename parts (int, default: ``3``). Extra
parts are replaced with ``'...'``.
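One possible reading of this attribute is sketched below. It assumes that
the last parts of the path are the ones kept and that ``/`` is the
separator; ``shorten_filename()`` is a hypothetical helper, not part of
the proposed API.

```python
def shorten_filename(filename, parts=3, sep="/"):
    """Keep the last *parts* components, replacing the rest with '...'."""
    pieces = filename.split(sep)
    if len(pieces) <= parts:
        return filename
    return sep.join(["..."] + pieces[-parts:])


short = shorten_filename("/usr/lib/python3.4/json/decoder.py")
```

Here ``short`` is ``'.../python3.4/json/decoder.py'``, while a filename
with three parts or fewer is returned unchanged.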
``show_average`` attribute:
If ``True`` (default value), display the average size of memory blocks.
``show_count`` attribute:
If ``True`` (default value), display the number of allocated memory
blocks.
``show_size`` attribute:
If ``True`` (default value), display the size of memory blocks.
DisplayTopTask class
--------------------
``DisplayTopTask(count=10, group_by="filename_lineno", cumulative=False, file=sys.stdout, user_data_callback=None)`` class:
Task taking temporary snapshots and displaying the top *count*
memory allocations grouped by *group_by*.
Call the ``start()`` method to start the task.
``display()`` method:
Take a snapshot and display the top *count* biggest allocated memory
blocks grouped by *group_by* using the ``display_top`` attribute.
Return the snapshot, a ``Snapshot`` instance.
``start(delay: int)`` method:
Start a task using the ``start_timer()`` function calling the
``display()`` method every *delay* seconds.
``stop()`` method:
Stop the task started by the ``start()`` method using the
``stop_timer()`` function.
``count`` attribute:
Maximum number of displayed memory blocks.
``cumulative`` attribute:
If ``True``, cumulate size and count of memory blocks of all frames
of each ``Trace`` instance, not only the most recent frame. The
default value is ``False``.
The option is ignored if the traceback limit is ``1``, see the
``get_traceback_limit()`` function.
``display_top`` attribute:
Instance of ``DisplayTop``.
``file`` attribute:
The top is written into *file*.
``group_by`` attribute:
Determine how memory allocations are grouped: see
``Snapshot.top_by()`` for the available values.
``user_data_callback`` attribute:
Optional callback collecting user data (callable, default:
``None``). See ``Snapshot.create()``.
Filter class
------------
``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:
Filter to select which memory allocations are traced. Filters can be
used to reduce the memory usage of the ``tracemalloc`` module, which
can be read using ``get_tracemalloc_size()``.
``match_trace(trace)`` method:
Return ``True`` if the ``Trace`` instance must be kept according to
the filter, ``False`` otherwise.
``match(filename: str, lineno: int)`` method:
Return ``True`` if the filename and line number must be kept
according to the filter, ``False`` otherwise.
``match_filename(filename: str)`` method:
Return ``True`` if the filename must be kept according to the
filter, ``False`` otherwise.
``match_lineno(lineno: int)`` method:
Return ``True`` if the line number must be kept according to the
filter, ``False`` otherwise.
``include`` attribute:
If *include* is ``True``, only trace memory blocks allocated in a
file with a name matching filename ``pattern`` at line number
``lineno``. If *include* is ``False``, ignore memory blocks
allocated in a file with a name matching filename ``pattern`` at
line number ``lineno``.
``pattern`` attribute:
The filename *pattern* can contain one or more ``*`` wildcard
characters which match any substring, including an empty string. The
``.pyc`` and ``.pyo`` suffixes are replaced with ``.py``. On
Windows, the comparison is case insensitive and the alternative
separator ``/`` is replaced with the standard separator ``\``.
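The matching rules above can be approximated in pure Python as follows.
``normalize()`` and ``match_filename()`` are hypothetical helpers. The
sketch assumes that the same normalization is applied to both the pattern
and the filename, and it uses ``str.lower()`` as a simple stand-in for the
case insensitive comparison on Windows.

```python
import re


def normalize(filename, windows=False):
    """Apply the suffix and separator rules to a filename or pattern."""
    if windows:
        # Use the standard separator and ignore case.
        filename = filename.replace("/", "\\").lower()
    # The .pyc and .pyo suffixes are replaced with .py.
    if filename.endswith((".pyc", ".pyo")):
        filename = filename[:-1]
    return filename


def match_filename(pattern, filename, windows=False):
    """Return True if *filename* matches *pattern*."""
    pattern = normalize(pattern, windows)
    filename = normalize(filename, windows)
    # Each '*' matches any substring, including the empty string;
    # every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, filename, re.DOTALL) is not None
```

For example, ``match_filename("*/json/*.py", "/usr/lib/json/decoder.pyc")``
is true, and so is ``match_filename("app*.py", "app.py")`` because ``*``
can match an empty string.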
``lineno`` attribute:
Line number (``int``). If it is ``None`` or less than ``1``, it
matches any line number.
``traceback`` attribute:
If *traceback* is ``True``, all frames of the ``traceback``
attribute of ``Trace`` instances are checked. If *traceback* is
``False``, only the most recent frame is checked.
This attribute only has an effect on the ``match_trace()`` method
and only if the traceback limit is greater than ``1``. See the
``get_traceback_limit()`` function.
Frame class
-----------
``Frame`` class:
Trace of a Python frame, used by ``Trace.traceback`` attribute.
``filename`` attribute:
Python filename, ``None`` if unknown.
``lineno`` attribute:
Python line number, ``None`` if unknown.
GroupedStats class
------------------
``GroupedStats(stats: dict, group_by: str, cumulative=False, timestamp=None, process_memory=None, tracemalloc_size=None)`` class:
Top of allocated memory blocks grouped by *group_by* as a
dictionary.
The ``Snapshot.top_by()`` method creates a ``GroupedStats`` instance.
``compare_to(old_stats: GroupedStats=None)`` method:
Compare to an older ``GroupedStats`` instance. Return a
``StatsDiff`` instance.
``cumulative`` attribute:
If ``True``, cumulate size and count of memory blocks of all frames
of ``Trace``, not only the most recent frame.
``group_by`` attribute:
Determine how memory allocations were grouped. The type of ``stats``
keys depends on *group_by*:
=====================  ========================  ==============
group_by               description               key type
=====================  ========================  ==============
``'filename'``         filename                  ``str``
``'filename_lineno'``  filename and line number  ``(str, int)``
``'address'``          memory block address      ``int``
=====================  ========================  ==============
See the *group_by* parameter of the ``Snapshot.top_by()`` method.
``stats`` attribute:
Dictionary ``{key: stats}`` where the *key* type depends on the
``group_by`` attribute and *stats* type is ``TraceStats``.
``process_memory`` attribute:
Result of the ``get_process_memory()`` function, can be ``None``.
``timestamp`` attribute:
Creation date and time of the snapshot, ``datetime.datetime``
instance.
``tracemalloc_size`` attribute:
The memory usage in bytes of the ``tracemalloc`` module, result of
the ``get_tracemalloc_size()`` function.
Snapshot class
--------------
``Snapshot`` class:
Snapshot of memory blocks allocated by Python.
Use ``TakeSnapshot`` to take snapshots regularly.
``apply_filters(filters)`` method:
Apply a list of filters to the ``traces`` and ``stats`` dictionaries,
*filters* is a list of ``Filter`` instances.
``create(*, with_traces=False, with_stats=True, user_data_callback=None)`` classmethod:
Take a snapshot of traces and/or statistics of allocated memory
blocks.
If *with_traces* is ``True``, ``get_traces()`` is called and its
result is stored in the ``traces`` attribute. This attribute
contains more information than ``stats`` and uses more memory and
more disk space. If *with_traces* is ``False``, ``traces`` is set to
``None``.
If *with_stats* is ``True``, ``get_stats()`` is called and its
result is stored in the ``Snapshot.stats`` attribute. If
*with_stats* is ``False``, ``Snapshot.stats`` is set to ``None``.
*with_traces* and *with_stats* cannot be ``False`` at the same time.
*user_data_callback* is an optional callable object. Its result
should be serializable by the ``pickle`` module, or
``Snapshot.write()`` would fail. If *user_data_callback* is set, it
is called and the result is stored in the ``Snapshot.user_data``
attribute. Otherwise, ``Snapshot.user_data`` is set to ``None``.
The ``tracemalloc`` module must be enabled to take a snapshot. See
the ``enable()`` function.
``load(filename)`` classmethod:
Load a snapshot from a file.
``top_by(group_by: str, cumulative: bool=False)`` method:
Compute top statistics grouped by *group_by* as a ``GroupedStats``
instance:
=====================  ========================  ==============
group_by               description               key type
=====================  ========================  ==============
``'filename'``         filename                  ``str``
``'filename_lineno'``  filename and line number  ``(str, int)``
``'address'``          memory block address      ``int``
=====================  ========================  ==============
If *cumulative* is ``True``, cumulate size and count of memory
blocks of all frames of each ``Trace`` instance, not only the most
recent frame. The *cumulative* parameter is ignored if *group_by* is
``'address'`` or if the traceback limit is ``1``. See the
``traceback_limit`` attribute.
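The grouping and the effect of *cumulative* can be sketched from a list
of traces. The ``Frame`` and ``Trace`` namedtuples below are simplified
stand-ins for the classes of the same name, ``top_by()`` is a plain
function rather than the method, and the result is a dictionary of
``[size, count]`` lists instead of a ``GroupedStats`` instance. Counting a
block once per frame in cumulative mode is an assumption of this sketch.

```python
from collections import namedtuple

# Simplified stand-ins for the Frame and Trace classes.
Frame = namedtuple("Frame", "filename lineno")
Trace = namedtuple("Trace", "size traceback")


def top_by(traces, group_by, cumulative=False):
    """Return {key: [size, count]} for 'filename' or 'filename_lineno'."""
    stats = {}
    for trace in traces:
        # Most recent frame only, or every frame when cumulative.
        frames = trace.traceback if cumulative else trace.traceback[:1]
        for frame in frames:
            if group_by == "filename":
                key = frame.filename
            elif group_by == "filename_lineno":
                key = (frame.filename, frame.lineno)
            else:
                raise ValueError("unsupported group_by: %r" % (group_by,))
            entry = stats.setdefault(key, [0, 0])
            entry[0] += trace.size
            entry[1] += 1
    return stats


# Tracebacks are listed most recent frame first.
traces = [
    Trace(100, [Frame("b.py", 7), Frame("a.py", 1)]),
    Trace(50, [Frame("b.py", 9), Frame("a.py", 2)]),
]
```

Without *cumulative*, both blocks are attributed to ``b.py`` only. With
``cumulative=True``, ``a.py`` is also charged 150 bytes because it appears
in both tracebacks.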
``write(filename)`` method:
Write the snapshot into a file.
``pid`` attribute:
Identifier of the process which created the snapshot, result of
``os.getpid()``.
``process_memory`` attribute:
Memory usage of the current process, result of the
``get_process_memory()`` function. It can be ``None``.
``stats`` attribute:
Statistics on traced Python memory, result of the ``get_stats()``
function, if ``create()`` was called with *with_stats* equal to
``True``, ``None`` otherwise.
``tracemalloc_size`` attribute:
The memory usage in bytes of the ``tracemalloc`` module, result of
the ``get_tracemalloc_size()`` function.
``traceback_limit`` attribute:
The maximum number of frames stored in the ``traceback`` attribute
of a ``Trace``, result of the ``get_traceback_limit()`` function.
``traces`` attribute:
Traces of Python memory allocations, result of the ``get_traces()``
function, if ``create()`` was called with *with_traces* equal to
``True``, ``None`` otherwise.
The ``traceback`` attribute of each ``Trace`` instance is limited to
``traceback_limit`` frames.
``timestamp`` attribute:
Creation date and time of the snapshot, ``datetime.datetime``
instance.
``user_data`` attribute:
Result of *user_data_callback* called in ``Snapshot.create()``
(default: ``None``).
StatsDiff class
---------------
``StatsDiff(differences, old_stats, new_stats)`` class:
Differences between two ``GroupedStats`` instances. By default, the
``differences`` list is unsorted: call ``sort()`` to sort it.
The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
instance.
``sort()`` method:
Sort the ``differences`` list from the biggest allocation to the
smallest. Sort by *size_diff*, *size*, *count_diff*, *count* and
then by *key*.
``differences`` attribute:
Differences between ``old_stats`` and ``new_stats`` as a list of
``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
*size*, *count_diff* and *count* are ``int``. The key type depends
on the ``group_by`` attribute of ``new_stats``:
=====================  ========================  ==============
group_by               description               key type
=====================  ========================  ==============
``'filename'``         filename                  ``str``
``'filename_lineno'``  filename and line number  ``(str, int)``
``'address'``          memory block address      ``int``
=====================  ========================  ==============
See the ``group_by`` attribute of the ``GroupedStats`` class.
``old_stats`` attribute:
Old ``GroupedStats`` instance, can be ``None``.
``new_stats`` attribute:
New ``GroupedStats`` instance.
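A sketch of how the ``differences`` list could be computed and sorted,
using plain ``{key: (size, count)}`` dictionaries in place of
``GroupedStats`` instances. ``compare_stats()`` is a hypothetical helper.
Reporting keys that disappeared with a size and count of ``0``, and
sorting with a reversed tuple comparison, are assumptions of this sketch.

```python
def compare_stats(old, new):
    """Diff two {key: (size, count)} mappings into a list of tuples."""
    differences = []
    for key in set(old) | set(new):
        old_size, old_count = old.get(key, (0, 0))
        size, count = new.get(key, (0, 0))
        differences.append(
            (size - old_size, size, count - old_count, count, key))
    return differences


old = {"a.py": (100, 2), "b.py": (300, 3)}
new = {"a.py": (400, 5), "c.py": (50, 1)}
differences = compare_stats(old, new)
# Biggest size_diff first, then size, count_diff, count and key.
differences.sort(reverse=True)
```

After sorting, ``a.py`` (300 bytes more) comes first, followed by the new
``c.py`` entry, and ``b.py`` comes last because its blocks were released.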
Trace class
-----------
``Trace`` class:
Debug information of a memory block allocated by Python.
``size`` attribute:
Size in bytes of the memory block.
``traceback`` attribute:
Traceback where the memory block was allocated as a list of
``Frame`` instances, most recent first.
The list can be empty or incomplete if the ``tracemalloc`` module
was unable to retrieve the full traceback.
The traceback is limited to ``get_traceback_limit()`` frames. Use
``set_traceback_limit()`` to store more frames.
TraceStats class
----------------
``TraceStats`` class:
Statistics on Python memory allocations.
``size`` attribute:
Total size in bytes of allocated memory blocks.
``count`` attribute:
Number of allocated memory blocks.
Links
=====
tracemalloc:
* `#18874: Add a new tracemalloc module to trace Python
memory allocations <http://bugs.python.org/issue18874>`_
* `pytracemalloc on PyPI
<https://pypi.python.org/pypi/pytracemalloc>`_
Similar projects:
* `Meliae: Python Memory Usage Analyzer
<https://pypi.python.org/pypi/meliae>`_
* `Guppy-PE: umbrella package combining Heapy and GSL
<http://guppy-pe.sourceforge.net/>`_
* `PySizer <http://pysizer.8325.org/>`_: developed for Python 2.4
* `memory_profiler <https://pypi.python.org/pypi/memory_profiler>`_
* `pympler <http://code.google.com/p/pympler/>`_
* `Dozer <https://pypi.python.org/pypi/Dozer>`_: WSGI Middleware version of
the CherryPy memory leak debugger
* `objgraph <http://mg.pov.lt/objgraph/>`_
* `caulk <https://github.com/smartfile/caulk/>`_
Copyright
=========
This document has been placed into the public domain.