PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4


Abstract
========
|
This PEP proposes to add a new ``tracemalloc`` module to trace memory
blocks allocated by Python.


Rationale
=========

Classic generic tools like Valgrind can get the C traceback where a
memory block was allocated. Using such tools to analyze Python memory
allocations does not help because most memory blocks are allocated in
the same C function, ``PyMem_Malloc()`` for example. Moreover, Python
has an allocator for small objects called "pymalloc" which keeps free
blocks for efficiency, and these tools do not handle it well.

There are debug tools dedicated to the Python language like ``Heapy``,
``Pympler`` and ``Meliae`` which list all live objects using the
``gc`` (garbage collector) module (functions like ``gc.get_objects()``,
``gc.get_referrers()`` and ``gc.get_referents()``), compute their size
(e.g. using ``sys.getsizeof()``) and group objects by type. These tools
provide a better estimation of the memory usage of an application. They
are useful when most memory leaks are instances of the same type and
this type is only instantiated in a few functions. Problems arise when
the object type is very common like ``str`` or ``tuple``, and it is hard
to identify where these objects are instantiated.
|
Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools
cannot be used on large applications with thousands of objects because
the diagram is too large to be analyzed manually.
|
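The ``gc``-based approach of tools such as Heapy, Pympler and Meliae, described above, can be sketched in a few lines. This is a simplified illustration, not code taken from any of these projects: ``summarize_by_type()`` is a hypothetical helper, ``gc.get_objects()`` only returns objects tracked by the garbage collector, and ``sys.getsizeof()`` only gives the shallow size of each object.

```python
import gc
import sys
from collections import defaultdict


def summarize_by_type(objects=None, top=5):
    """Return the top (type_name, count, total_size) rows, largest total first."""
    if objects is None:
        # Only objects tracked by the garbage collector: atomic objects
        # such as str and int are not tracked, so they are not listed.
        objects = gc.get_objects()
    count = defaultdict(int)
    size = defaultdict(int)
    for obj in objects:
        name = type(obj).__name__
        count[name] += 1
        # Shallow size only; 0 if the size cannot be computed.
        size[name] += sys.getsizeof(obj, 0)
    rows = [(name, count[name], size[name]) for name in count]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


for name, n, total in summarize_by_type():
    print("%-20s %8d objects %10d bytes" % (name, n, total))
```

The output tells how much memory each type uses, but not where the objects were created: this is the gap the ``tracemalloc`` module is meant to fill.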
|
Proposal
========

Using the customized allocation API from PEP 445, it becomes easy to
set up a hook on Python memory allocators. A hook can inspect Python
internals to retrieve Python tracebacks.

This PEP proposes to add a new ``tracemalloc`` module, as a debug tool
to trace memory blocks allocated by Python. The module provides the
following information:

* Computed differences between two snapshots to detect memory leaks
* Statistics on allocated memory blocks per filename and per line
  number: total size, number and average size of allocated memory blocks
* Traceback where a memory block was allocated

The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option
(``-X faulthandler`` and ``-X tracemalloc``). See the
`documentation of the faulthandler module
<http://docs.python.org/3/library/faulthandler.html>`_.

The idea of tracing memory allocations is not new. It was first
implemented in the PySizer project in 2005. PySizer was implemented
differently: the traceback was stored in frame objects and some Python
types linked the trace to the name of the object type. The PySizer
patch on CPython adds an overhead on performance and memory footprint,
even if PySizer is not used. tracemalloc attaches a traceback to the
underlying layer, to memory blocks, and has no overhead when the module
is disabled.

The tracemalloc module has been written for CPython. Other
implementations of Python may not be able to provide it.


API
===

Main Functions
--------------

``clear_traces()`` function:

    Clear traces and statistics on Python memory allocations, and reset
    the ``get_traced_memory()`` counter.


``disable()`` function:

    Stop tracing Python memory allocations.

    See also ``enable()`` and ``is_enabled()`` functions.


``enable()`` function:

    Start tracing Python memory allocations.

    At fork, the module is automatically disabled in the child process.

    See also ``disable()`` and ``is_enabled()`` functions.


``get_stats()`` function:

    Get statistics on traced Python memory blocks as a dictionary
    ``{filename (str): {line_number (int): stats}}`` where *stats* is a
    ``(size: int, count: int)`` tuple; *filename* and *line_number* can
    be ``None``.

    Return an empty dictionary if the ``tracemalloc`` module is
    disabled.

    See also the ``get_traces()`` function.
|
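The shape of the ``get_stats()`` result can be illustrated by deriving it from traces in the ``get_traces()`` format, ``{address: (size, traceback)}``. This is a sketch under assumptions, not the module's implementation: ``stats_from_traces()`` is a hypothetical helper, each block is assumed to be accounted to the most recent frame of its traceback (stored first), and an empty traceback is mapped to a ``None`` filename and line number.

```python
def stats_from_traces(traces):
    """Aggregate {address: (size, traceback)} into {filename: {lineno: (size, count)}}."""
    stats = {}
    for size, traceback in traces.values():
        # Assumption: the block is accounted to the most recent frame,
        # stored first; an empty traceback gives (None, None).
        filename, lineno = traceback[0] if traceback else (None, None)
        per_line = stats.setdefault(filename, {})
        old_size, old_count = per_line.get(lineno, (0, 0))
        per_line[lineno] = (old_size + size, old_count + 1)
    return stats


traces = {
    0x1000: (100, [("a.py", 3)]),
    0x2000: (50, [("a.py", 3), ("b.py", 9)]),
    0x3000: (8, []),
}
print(stats_from_traces(traces))
# {'a.py': {3: (150, 2)}, None: {None: (8, 1)}}
```

The average block size per line mentioned in the proposal follows directly as ``size / count`` (75 bytes for line 3 of ``a.py`` here).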
|
``get_traced_memory()`` function:

    Get the current size and maximum size of memory blocks traced by the
    ``tracemalloc`` module as a tuple: ``(size: int, max_size: int)``.
|
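One plausible way to maintain the ``(size, max_size)`` pair returned by ``get_traced_memory()``, and the reset done by ``clear_traces()``, is sketched below. ``TracedMemoryCounter`` and its ``on_alloc()``/``on_free()`` methods are hypothetical names standing for the allocator hooks installed through the PEP 445 API; the sketch ignores ``realloc()``, and resetting *max_size* to zero in ``clear_traces()`` is an assumption.

```python
class TracedMemoryCounter:
    """Hypothetical model of the counters behind get_traced_memory()."""

    def __init__(self):
        self.traces = {}    # address -> size of each traced block
        self.size = 0       # current total size of traced blocks
        self.max_size = 0   # highest value reached by self.size

    def on_alloc(self, address, size):
        # Called by a (hypothetical) allocator hook after a successful malloc().
        self.traces[address] = size
        self.size += size
        self.max_size = max(self.max_size, self.size)

    def on_free(self, address):
        # Blocks allocated before tracing started are not in self.traces.
        size = self.traces.pop(address, None)
        if size is not None:
            self.size -= size

    def get_traced_memory(self):
        return (self.size, self.max_size)

    def clear_traces(self):
        # Assumption: "reset the counter" sets both values back to zero.
        self.traces.clear()
        self.size = 0
        self.max_size = 0


counter = TracedMemoryCounter()
counter.on_alloc(0x1000, 100)
counter.on_alloc(0x2000, 50)
counter.on_free(0x1000)
print(counter.get_traced_memory())  # (50, 150)
```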
|
``get_tracemalloc_memory()`` function:

    Get the memory usage in bytes of the ``tracemalloc`` module as a
    tuple: ``(size: int, free: int)``.

    * *size*: total number of bytes allocated by the module,
      including *free* bytes
    * *free*: number of free bytes available to store data


``is_enabled()`` function:

    ``True`` if the ``tracemalloc`` module is tracing Python memory
    allocations, ``False`` otherwise.

    See also ``enable()`` and ``disable()`` functions.


Trace Functions
---------------

``get_traceback_limit()`` function:

    Get the maximum number of frames stored in the traceback of a trace
    of a memory block.

    Use the ``set_traceback_limit()`` function to change the limit.


``get_object_address(obj)`` function:

    Get the address of the main memory block of the specified Python object.

    A Python object can be composed of multiple memory blocks; the
    function only returns the address of the main memory block.

    See also ``get_object_trace()`` and ``gc.get_referrers()`` functions.


``get_object_trace(obj)`` function:

    Get the trace of a Python object *obj* as a ``(size: int,
    traceback)`` tuple where *traceback* is a tuple of ``(filename: str,
    lineno: int)`` tuples; *filename* and *lineno* can be ``None``.

    The function only returns the trace of the main memory block of the
    object. The *size* of the trace is smaller than the total size of
    the object if the object is composed of more than one memory block.

    Return ``None`` if the ``tracemalloc`` module did not trace the
    allocation of the object.

    See also ``get_object_address()``, ``get_trace()``,
    ``get_traces()``, ``gc.get_referrers()`` and ``sys.getsizeof()``
    functions.


``get_trace(address)`` function:

    Get the trace of a memory block as a ``(size: int, traceback)``
    tuple where *traceback* is a tuple of ``(filename: str, lineno:
    int)`` tuples; *filename* and *lineno* can be ``None``.

    Return ``None`` if the ``tracemalloc`` module did not trace the
    allocation of the memory block.

    See also ``get_object_trace()``, ``get_stats()`` and
    ``get_traces()`` functions.


``get_traces()`` function:

    Get traces of Python memory allocations as a dictionary ``{address
    (int): trace}`` where *trace* is a ``(size: int, traceback)`` tuple
    and *traceback* is a list of ``(filename: str, lineno: int)`` tuples.
    *traceback* can be empty; *filename* and *lineno* can be ``None``.

    Return an empty dictionary if the ``tracemalloc`` module is disabled.

    See also ``get_object_trace()``, ``get_stats()`` and ``get_trace()``
    functions.


``set_traceback_limit(nframe: int)`` function:

    Set the maximum number of frames stored in the traceback of a trace
    of a memory block.

    Storing the traceback of each memory allocation has a significant
    overhead on memory usage. Use the ``get_tracemalloc_memory()``
    function to measure the overhead and the ``add_filter()`` function
    to select which memory allocations are traced.

    Use the ``get_traceback_limit()`` function to get the current limit.
|
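To make the traceback limit concrete: a traceback here is a tuple of ``(filename, lineno)`` pairs truncated to the *nframe* most recent frames. The sketch below collects such a tuple from Python code with CPython's ``sys._getframe()``; ``capture_traceback()``, ``inner()`` and ``outer()`` are hypothetical helpers, the most-recent-frame-first order is an assumption, and the real module reads the frames from C inside the allocator hook.

```python
import sys


def capture_traceback(nframe):
    """Return up to nframe (filename, lineno) pairs, most recent frame first.

    The frame of capture_traceback() itself is skipped, so the first
    entry is the caller's frame.
    """
    frames = []
    frame = sys._getframe(1)  # CPython-specific: the caller's frame
    while frame is not None and len(frames) < nframe:
        frames.append((frame.f_code.co_filename, frame.f_lineno))
        frame = frame.f_back
    return tuple(frames)


# Two nested calls, to show how the limit truncates the traceback.
def inner(nframe):
    return capture_traceback(nframe)


def outer(nframe):
    return inner(nframe)


print(outer(2))       # frames of inner() and outer()
print(len(outer(1)))  # 1
```

Storing one such tuple per live memory block is why a larger limit increases the memory overhead reported by ``get_tracemalloc_memory()``.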
|
Filter Functions
----------------

``add_filter(filter)`` function:

    Add a new filter on Python memory allocations; *filter* is a
    ``Filter`` instance.

    When inclusive filters are set, all of them are applied at once: a
    memory allocation is ignored if no inclusive filter matches its
    trace. A memory allocation is also ignored if at least one exclusive
    filter matches its trace.

    The new filter is not applied on already collected traces. Use the
    ``clear_traces()`` function to ensure that all traces match the new
    filter.


``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an inclusive filter: helper for the ``add_filter()`` function
    creating a ``Filter`` instance with the ``Filter.include`` attribute
    set to ``True``.

    Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)``
    only includes memory blocks allocated by the ``tracemalloc`` module.


``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an exclusive filter: helper for the ``add_filter()`` function
    creating a ``Filter`` instance with the ``Filter.include`` attribute
    set to ``False``.

    Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)``
    ignores memory blocks allocated by the ``tracemalloc`` module.


``clear_filters()`` function:

    Reset the filter list.

    See also the ``get_filters()`` function.


``get_filters()`` function:

    Get the filters on Python memory allocations as a list of ``Filter``
    instances.

    See also the ``clear_filters()`` function.
|
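The inclusive/exclusive rule of ``add_filter()`` can be expressed as a small predicate. In this sketch ``is_traced()`` is a hypothetical helper and ``SimpleFilter`` is a stand-in carrying only an ``include`` flag and a ``match_traceback`` callable; it is not the ``Filter`` class of the module.

```python
from collections import namedtuple

# Hypothetical stand-in for Filter: an include flag and a match_traceback callable.
SimpleFilter = namedtuple("SimpleFilter", ["include", "match_traceback"])


def is_traced(traceback, filters):
    """Return True if an allocation with this traceback is kept by the filters."""
    inclusive = [f for f in filters if f.include]
    exclusive = [f for f in filters if not f.include]
    # With inclusive filters set, at least one of them must match.
    if inclusive and not any(f.match_traceback(traceback) for f in inclusive):
        return False
    # A single matching exclusive filter is enough to ignore the allocation.
    return not any(f.match_traceback(traceback) for f in exclusive)


in_app = SimpleFilter(True, lambda tb: any(name.startswith("app/") for name, _ in tb))
no_tests = SimpleFilter(False, lambda tb: any("test" in name for name, _ in tb))
print(is_traced((("app/models.py", 12),), [in_app, no_tests]))      # True
print(is_traced((("app/test_models.py", 3),), [in_app, no_tests]))  # False
```

Because one matching exclusive filter discards an allocation, an exclusive filter can carve an exception out of a broad inclusive one, as ``no_tests`` does here.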
|
Filter
------

``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:

    Filter to select which memory allocations are traced. Filters can be
    used to reduce the memory usage of the ``tracemalloc`` module, which
    can be read using the ``get_tracemalloc_memory()`` function.

``match(filename: str, lineno: int)`` method:

    Return ``True`` if the filter matches the filename and line number,
    ``False`` otherwise.

``match_filename(filename: str)`` method:

    Return ``True`` if the filter matches the filename, ``False`` otherwise.

``match_lineno(lineno: int)`` method:

    Return ``True`` if the filter matches the line number, ``False``
    otherwise.

``match_traceback(traceback)`` method:

    Return ``True`` if the filter matches the *traceback*, ``False``
    otherwise.

    *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.

``include`` attribute:

    If *include* is ``True``, only trace memory blocks allocated in a
    file with a name matching filename ``pattern`` at line number
    ``lineno``.

    If *include* is ``False``, ignore memory blocks allocated in a file
    with a name matching filename ``pattern`` at line number ``lineno``.

``lineno`` attribute:

    Line number (``int``). If it is ``None`` or less than ``1``, it
    matches any line number.

``pattern`` attribute:

    The filename *pattern* can contain one or more ``*`` wildcard
    characters which match any substring, including an empty string. The
    ``.pyc`` and ``.pyo`` file extensions are replaced with ``.py``. On
    Windows, the comparison is case insensitive and the alternative
    separator ``/`` is replaced with the standard separator ``\``.

``traceback`` attribute:

    If *traceback* is ``True``, all frames of the traceback are checked.
    If *traceback* is ``False``, only the most recent frame is checked.

    This attribute is ignored if the traceback limit is less than ``2``.
    See the ``get_traceback_limit()`` function.
|
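The matching rules above can be sketched in pure Python. ``FilterSketch`` is a hypothetical name, not the module's ``Filter`` class, and several details are assumptions: the *windows* constructor argument exists only so the Windows rules can be exercised on any platform, extensions are normalized on both the pattern and the filename, a ``None`` filename never matches, and tracebacks store the most recent frame first.

```python
import os
import re


def _normalize(path, windows):
    """Apply the documented normalizations to a filename or a pattern."""
    if windows:
        # Case insensitive comparison; "/" replaced with the standard separator.
        path = path.lower().replace("/", "\\")
    if path.endswith((".pyc", ".pyo")):
        path = path[:-1]  # ".pyc"/".pyo" -> ".py"
    return path


class FilterSketch:
    def __init__(self, include, pattern, lineno=None, traceback=False,
                 windows=(os.name == "nt")):
        self.include = include
        self.pattern = pattern
        self.lineno = lineno
        self.traceback = traceback
        self._windows = windows
        # Each "*" matches any substring (even an empty one); everything
        # else in the pattern is matched literally.
        parts = _normalize(pattern, windows).split("*")
        self._regex = re.compile(".*".join(re.escape(part) for part in parts),
                                 re.DOTALL)

    def match_filename(self, filename):
        if filename is None:  # assumption: an unknown filename never matches
            return False
        return self._regex.fullmatch(_normalize(filename, self._windows)) is not None

    def match_lineno(self, lineno):
        if self.lineno is None or self.lineno < 1:
            return True  # match any line number
        return lineno == self.lineno

    def match(self, filename, lineno):
        return self.match_filename(filename) and self.match_lineno(lineno)

    def match_traceback(self, traceback):
        # Only the most recent frame (assumed to be traceback[0]) is checked
        # unless the traceback attribute is true.
        frames = traceback if self.traceback else traceback[:1]
        return any(self.match(filename, lineno) for filename, lineno in frames)


f = FilterSketch(True, "*/tracemalloc.py", windows=False)
print(f.match_filename("/usr/lib/python3.4/tracemalloc.pyc"))  # True
```

Unlike ``fnmatch``, only ``*`` is special here: ``?`` and ``[...]`` are matched literally, since the ``pattern`` description above only mentions the ``*`` wildcard.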
|
GroupedStats
------------

``GroupedStats(timestamp: datetime.datetime, stats: dict, group_by: str, cumulative=False, metrics: dict=None)`` class:

    Top statistics on allocated memory blocks grouped by *group_by*,
    stored as a dictionary.

    The ``Snapshot.top_by()`` method creates a ``GroupedStats``
    instance.

``compare_to(old_stats: GroupedStats=None)`` method:

    Compare to an older ``GroupedStats`` instance. Return a
    ``StatsDiff`` instance.

    The ``StatsDiff.differences`` list is not sorted: call the
    ``StatsDiff.sort()`` method to sort the list.

    ``None`` values are replaced with an empty string for filenames or
    zero for line numbers, because ``str`` and ``int`` cannot be
    compared to ``None``.

``cumulative`` attribute:

    If ``True``, the size and count of memory blocks are accumulated over
    all frames of the traceback of a trace, not only the most recent
    frame.

``metrics`` attribute:

    Dictionary storing metrics read when the snapshot was created:
    ``{name (str): metric}`` where *metric* type is ``Metric``.

``group_by`` attribute:

    Determine how memory allocations were grouped: see
    ``Snapshot.top_by()`` for the available values.

``stats`` attribute:

    Dictionary ``{key: stats}`` where the *key* type depends on the
    ``group_by`` attribute and *stats* is a ``(size: int, count: int)``
    tuple.

    See the ``Snapshot.top_by()`` method.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.


Metric
------

``Metric(name: str, value: int, format: str)`` class:

    Value of a metric when a snapshot is created.

``name`` attribute:

    Name of the metric.

``value`` attribute:

    Value of the metric.

``format`` attribute:

    Format of the metric (``str``).


Snapshot
--------

``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats: dict=None)`` class:

    Snapshot of traces and statistics on memory blocks allocated by Python.

``add_metric(name: str, value: int, format: str)`` method:

    Helper to add a ``Metric`` instance to ``Snapshot.metrics``. Return
    the newly created ``Metric`` instance.

    Raise an exception if the name is already present in
    ``Snapshot.metrics``.


``apply_filters(filters)`` method:

    Apply filters on the ``traces`` and ``stats`` dictionaries;
    *filters* is a list of ``Filter`` instances.


``create(traces=False)`` classmethod:

    Take a snapshot of traces and/or statistics of allocated memory blocks.

    If *traces* is ``True``, ``get_traces()`` is called and its result
    is stored in the ``Snapshot.traces`` attribute. This attribute
    contains more information than ``Snapshot.stats`` and uses more
    memory and more disk space. If *traces* is ``False``,
    ``Snapshot.traces`` is set to ``None``.

    Tracebacks of traces are limited to ``traceback_limit`` frames. Call
    ``set_traceback_limit()`` before calling ``Snapshot.create()`` to
    store more frames.

    The ``tracemalloc`` module must be enabled to take a snapshot. See
    the ``enable()`` function.

``get_metric(name, default=None)`` method:

    Get the value of the metric called *name*. Return *default* if the
    metric does not exist.
|
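The ``add_metric()``/``get_metric()`` contract can be sketched with a plain dictionary. ``MetricsHolder`` is a hypothetical stand-in for the metrics part of ``Snapshot``; the PEP only says that "an exception" is raised for a duplicate name, so ``ValueError`` is an assumption, and the ``"rss"`` name and ``"size"`` format in the usage are hypothetical values.

```python
from collections import namedtuple

# Same fields as the Metric class described above.
Metric = namedtuple("Metric", ["name", "value", "format"])


class MetricsHolder:
    """Hypothetical stand-in for the metrics part of a Snapshot."""

    def __init__(self):
        self.metrics = {}  # {name (str): Metric}

    def add_metric(self, name, value, format):
        if name in self.metrics:
            # Exception type is an assumption; the PEP only says "an exception".
            raise ValueError("metric %r already exists" % (name,))
        metric = Metric(name, value, format)
        self.metrics[name] = metric
        return metric

    def get_metric(self, name, default=None):
        metric = self.metrics.get(name)
        return default if metric is None else metric.value


holder = MetricsHolder()
holder.add_metric("rss", 12345678, "size")  # hypothetical name and format
print(holder.get_metric("rss"))             # 12345678
print(holder.get_metric("vms", default=0))  # 0
```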
|
``load(filename, traces=True)`` classmethod:

    Load a snapshot from a file.

    If *traces* is ``False``, don't load traces.


``top_by(group_by: str, cumulative: bool=False)`` method:

    Compute top statistics grouped by *group_by* as a ``GroupedStats``
    instance:

    =====================  ========================  ================================
    group_by               description               key type
    =====================  ========================  ================================
    ``'filename'``         filename                  ``str``
    ``'line'``             filename and line number  ``(filename: str, lineno: int)``
    ``'address'``          memory block address      ``int``
    ``'traceback'``        traceback                 ``(address: int, traceback)``
    =====================  ========================  ================================

    The ``traceback`` type is a tuple of ``(filename: str, lineno:
    int)`` tuples; *filename* and *lineno* can be ``None``.

    If *cumulative* is ``True``, the size and count of memory blocks are
    accumulated over all frames of the traceback of a trace, not only
    the most recent frame. The *cumulative* parameter is ignored if
    *group_by* is ``'address'`` or if the traceback limit is less than
    ``2``.
|
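A rough model of ``top_by()`` over traces in the ``get_traces()`` format follows. It is a sketch under stated assumptions, not the module's code: ``top_by_sketch()`` is a hypothetical function, the most recent frame is assumed to be stored first, an empty traceback is grouped under ``None``, and in cumulative mode a block is counted once per key even when a recursive traceback repeats the same frame (the PEP does not specify this case).

```python
def top_by_sketch(traces, group_by, cumulative=False):
    """Group {address: (size, traceback)} traces into {key: (size, count)}."""
    stats = {}

    def add(key, size):
        old_size, old_count = stats.get(key, (0, 0))
        stats[key] = (old_size + size, old_count + 1)

    for address, (size, traceback) in traces.items():
        if group_by == "address":
            add(address, size)
        elif group_by == "traceback":
            add((address, tuple(traceback)), size)
        elif group_by in ("filename", "line"):
            # Most recent frame only, or every frame if cumulative.
            frames = traceback if cumulative else traceback[:1]
            keys = set()
            for filename, lineno in frames or [(None, None)]:
                keys.add(filename if group_by == "filename" else (filename, lineno))
            for key in keys:  # count each block at most once per key
                add(key, size)
        else:
            raise ValueError("unknown group_by: %r" % (group_by,))
    return stats


traces = {
    0x1000: (100, [("a.py", 1), ("main.py", 9)]),
    0x2000: (20, [("b.py", 4), ("main.py", 9)]),
}
print(top_by_sketch(traces, "filename"))  # {'a.py': (100, 1), 'b.py': (20, 1)}
print(top_by_sketch(traces, "filename", cumulative=True)["main.py"])  # (120, 2)
```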
|
``write(filename)`` method:

    Write the snapshot into a file.


``metrics`` attribute:

    Dictionary storing metrics read when the snapshot was created:
    ``{name (str): metric}`` where *metric* type is ``Metric``.

``stats`` attribute:

    Statistics on traced Python memory, result of the ``get_stats()``
    function.

``traceback_limit`` attribute:

    Maximum number of frames stored in a trace of a memory block
    allocated by Python.

``traces`` attribute:

    Traces of Python memory allocations, result of the ``get_traces()``
    function; can be ``None``.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.


StatsDiff
---------

``StatsDiff(differences, old_stats, new_stats)`` class:

    Differences between two ``GroupedStats`` instances.

    The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
    instance.

``sort()`` method:

    Sort the ``differences`` list from the biggest difference to the
    smallest difference. Sort by ``abs(size_diff)``, *size*,
    ``abs(count_diff)``, *count* and then by *key*.

``differences`` attribute:

    Differences between ``old_stats`` and ``new_stats`` as a list of
    ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
    *size*, *count_diff* and *count* are ``int``. The key type depends
    on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the
    ``Snapshot.top_by()`` method.

``old_stats`` attribute:

    Old ``GroupedStats`` instance; can be ``None``.

``new_stats`` attribute:

    New ``GroupedStats`` instance.
|
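The computation behind ``GroupedStats.compare_to()`` followed by ``StatsDiff.sort()`` can be sketched on plain ``{key: (size, count)}`` dictionaries. ``compare_stats()`` and ``_sortable()`` are hypothetical helpers; treating a key missing on one side as ``(0, 0)`` is an assumption, and the ``None`` replacement only covers keys of the ``'filename'`` and ``'line'`` groupings.

```python
def _sortable(key):
    """Replace None with "" (filename) or 0 (line number) so keys can be compared."""
    if isinstance(key, tuple):  # group_by='line': (filename, lineno)
        filename, lineno = key
        return ("" if filename is None else filename, 0 if lineno is None else lineno)
    return "" if key is None else key  # group_by='filename'


def compare_stats(new_stats, old_stats=None):
    """Return sorted (size_diff, size, count_diff, count, key) tuples.

    new_stats and old_stats are {key: (size, count)} dictionaries;
    old_stats may be None, as in compare_to(old_stats=None).
    """
    old_stats = old_stats or {}
    differences = []
    for key in set(new_stats) | set(old_stats):
        size, count = new_stats.get(key, (0, 0))
        old_size, old_count = old_stats.get(key, (0, 0))
        differences.append(
            (size - old_size, size, count - old_count, count, _sortable(key)))
    # StatsDiff.sort(): biggest first by abs(size_diff), size,
    # abs(count_diff), count and finally key.
    differences.sort(key=lambda d: (abs(d[0]), d[1], abs(d[2]), d[3], d[4]),
                     reverse=True)
    return differences


old = {"a.py": (100, 2), "b.py": (10, 1)}
new = {"a.py": (150, 3), "c.py": (500, 5)}
for diff in compare_stats(new, old):
    print(diff)
# (500, 500, 5, 5, 'c.py')
# (50, 150, 1, 3, 'a.py')
# (-10, 0, -1, 0, 'b.py')
```

With *old_stats* omitted, every entry is reported as new, so *size_diff* equals *size*.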
|
Prior Work
==========

* `Python Memory Validator
  <http://www.softwareverify.com/python/memory/index.html>`_ (2005-2013):
  commercial Python memory validator developed by Software Verification.
  It uses the Python Reflection API.
* `PySizer <http://pysizer.8325.org/>`_: Google Summer of Code 2005 project by
  Nick Smallbone.
* `Heapy <http://guppy-pe.sourceforge.net/>`_ (2006-2013):
  part of the Guppy-PE project written by Sverker Nilsson.
* Draft PEP: `Support Tracking Low-Level Memory Usage in CPython
  <http://svn.python.org/projects/python/branches/bcannon-sandboxing/PEP.txt>`_
  (Brett Cannon, 2006)
* Muppy: project developed in 2008 by Robert Schuppenies.
* `asizeof <http://code.activestate.com/recipes/546530/>`_:
  a pure Python module to estimate the size of objects by Jean
  Brouwers (2008).
* `Heapmonitor <http://www.scons.org/wiki/LudwigHaehne/HeapMonitor>`_:
  provides facilities to size individual objects and can track all objects
  of certain classes. It was developed in 2008 by Ludwig Haehne.
* `Pympler <http://code.google.com/p/pympler/>`_ (2008-2011):
  project based on asizeof, muppy and HeapMonitor
* `objgraph <http://mg.pov.lt/objgraph/>`_ (2008-2012)
* `Dozer <https://pypi.python.org/pypi/Dozer>`_: WSGI Middleware version
  of the CherryPy memory leak debugger, written by Marius Gedminas (2008-2013)
* `Meliae <https://pypi.python.org/pypi/meliae>`_:
  Python Memory Usage Analyzer developed by John A Meinel since 2009
* `caulk <https://github.com/smartfile/caulk/>`_: written by Ben Timby in 2012
* `memory_profiler <https://pypi.python.org/pypi/memory_profiler>`_:
  written by Fabian Pedregosa (2011-2013)

See also `Pympler Related Work
<http://pythonhosted.org/Pympler/related.html>`_.


Links
=====

tracemalloc:

* `#18874: Add a new tracemalloc module to trace Python
  memory allocations <http://bugs.python.org/issue18874>`_
* `pytracemalloc on PyPI
  <https://pypi.python.org/pypi/pytracemalloc>`_


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: