PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4
Abstract
========
This PEP proposes to add a new ``tracemalloc`` module to trace memory
blocks allocated by Python.
Rationale
=========
Classic generic tools like Valgrind can get the C traceback where a
memory block was allocated. Using such tools to analyze Python memory
allocations does not help because most memory blocks are allocated in
the same C function, in ``PyMem_Malloc()`` for example. Moreover, Python
has an allocator for small objects called "pymalloc" which keeps free
blocks for efficiency. This is not well handled by these tools.
There are debug tools dedicated to the Python language like ``Heapy``,
``Pympler`` and ``Meliae`` which list all live objects using the
garbage collector module (functions like ``gc.get_objects()``,
``gc.get_referrers()`` and ``gc.get_referents()``), compute their size
(for example, using ``sys.getsizeof()``) and group objects by type. These tools
provide a better estimation of the memory usage of an application. They
are useful when most memory leaks are instances of the same type and
this type is only instantiated in a few functions. Problems arise when
the object type is very common like ``str`` or ``tuple``, and it is hard
to identify where these objects are instantiated.
Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools
cannot be used on large applications with thousands of objects because
the diagram is too large to be analyzed manually.
Proposal
========
Using the customized allocation API from PEP 445, it becomes easy to
set up a hook on Python memory allocators. A hook can inspect Python
internals to retrieve Python tracebacks.
This PEP proposes to add a new ``tracemalloc`` module, as a debug tool
to trace memory blocks allocated by Python. The module provides the
following information:
* Computed differences between two snapshots to detect memory leaks
* Statistics on allocated memory blocks per filename and per line
number: total size, number and average size of allocated memory blocks
* Traceback where a memory block was allocated
The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option (``-X
faulthandler`` and ``-X tracemalloc``). See the
`documentation of the faulthandler module
<http://docs.python.org/3/library/faulthandler.html>`_.
The idea of tracing memory allocations is not new. It was first
implemented in the PySizer project in 2005. PySizer was implemented
differently: the traceback was stored in frame objects and some Python
types linked the trace to the name of the object type. The PySizer patch
on CPython adds an overhead on performance and memory footprint, even if
PySizer is not used. tracemalloc attaches a traceback to the
underlying layer, to memory blocks, and has no overhead when the module
is disabled.
The tracemalloc module has been written for CPython. Other
implementations of Python may not be able to provide it.
API
===
Main Functions
--------------
``clear_traces()`` function:
Clear traces and statistics on Python memory allocations, and reset
the ``get_traced_memory()`` counter.
``disable()`` function:
Stop tracing Python memory allocations.
See also ``enable()`` and ``is_enabled()`` functions.
``enable()`` function:
Start tracing Python memory allocations.
At fork, the module is automatically disabled in the child process.
See also ``disable()`` and ``is_enabled()`` functions.
``get_stats()`` function:
Get statistics on traced Python memory blocks as a dictionary
``{filename (str): {line_number (int): stats}}`` where *stats* is a
``(size: int, count: int)`` tuple. *filename* and *line_number* can
be ``None``.
Return an empty dictionary if the ``tracemalloc`` module is
disabled.
See also the ``get_traces()`` function.
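As an illustration of the dictionary layout described for ``get_stats()``, the following sketch sums the per-line ``(size, count)`` tuples into per-file totals. The ``sample_stats`` data and the ``totals_per_file()`` helper are hypothetical and are not part of the proposed API.

```python
def totals_per_file(stats):
    """Sum per-line (size, count) tuples into per-file totals.

    *stats* uses the layout described for get_stats():
    {filename: {line_number: (size, count)}}.
    """
    totals = {}
    for filename, line_stats in stats.items():
        size = sum(s for s, _ in line_stats.values())
        count = sum(c for _, c in line_stats.values())
        totals[filename] = (size, count)
    return totals


# Hypothetical data; the None filename stands for an unknown file.
sample_stats = {
    "app.py": {10: (4096, 4), 25: (512, 2)},
    None: {None: (64, 1)},
}
per_file = totals_per_file(sample_stats)
```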
``get_traced_memory()`` function:
Get the current size and maximum size of memory blocks traced by the
``tracemalloc`` module as a tuple: ``(size: int, max_size: int)``.
``get_tracemalloc_memory()`` function:
Get the memory usage in bytes of the ``tracemalloc`` module as a
tuple: ``(size: int, free: int)``.
* *size*: total number of bytes allocated by the module,
including *free* bytes
* *free*: number of free bytes available to store data
``is_enabled()`` function:
``True`` if the ``tracemalloc`` module is tracing Python memory
allocations, ``False`` otherwise.
See also ``enable()`` and ``disable()`` functions.
Trace Functions
---------------
``get_traceback_limit()`` function:
Get the maximum number of frames stored in the traceback of a trace
of a memory block.
Use the ``set_traceback_limit()`` function to change the limit.
``get_object_address(obj)`` function:
Get the address of the main memory block of the specified Python object.
A Python object can be composed of multiple memory blocks; the
function only returns the address of the main memory block.
See also ``get_object_trace()`` and ``gc.get_referrers()`` functions.
``get_object_trace(obj)`` function:
Get the trace of a Python object *obj* as a ``(size: int,
traceback)`` tuple where *traceback* is a tuple of ``(filename: str,
lineno: int)`` tuples, *filename* and *lineno* can be ``None``.
The function only returns the trace of the main memory block of the
object. The *size* of the trace is smaller than the total size of
the object if the object is composed of more than one memory block.
Return ``None`` if the ``tracemalloc`` module did not trace the
allocation of the object.
See also ``get_object_address()``, ``get_trace()``,
``get_traces()``, ``gc.get_referrers()`` and ``sys.getsizeof()``
functions.
``get_trace(address)`` function:
Get the trace of a memory block as a ``(size: int, traceback)``
tuple where *traceback* is a tuple of ``(filename: str, lineno:
int)`` tuples, *filename* and *lineno* can be ``None``.
Return ``None`` if the ``tracemalloc`` module did not trace the
allocation of the memory block.
See also ``get_object_trace()``, ``get_stats()`` and
``get_traces()`` functions.
``get_traces()`` function:
Get traces of Python memory allocations as a dictionary ``{address
(int): trace}`` where *trace* is a ``(size: int, traceback)`` tuple and
*traceback* is a list of ``(filename: str, lineno: int)`` tuples.
*traceback* can be empty, *filename* and *lineno* can be ``None``.
Return an empty dictionary if the ``tracemalloc`` module is disabled.
See also ``get_object_trace()``, ``get_stats()`` and ``get_trace()``
functions.
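The relationship between the traces dictionary and per-line statistics can be sketched as follows. This is only an illustration: ``stats_from_traces()`` and the sample data are hypothetical, and the sketch assumes that the first entry of a traceback is the most recent frame, which this section does not specify.

```python
def stats_from_traces(traces):
    """Group traces by the (filename, lineno) of one frame.

    *traces* uses the layout described for get_traces():
    {address: (size, traceback)}.
    Assumption: traceback[0] is the most recent frame.
    """
    stats = {}
    for size, traceback in traces.values():
        # An empty traceback is grouped under (None, None).
        key = traceback[0] if traceback else (None, None)
        old_size, old_count = stats.get(key, (0, 0))
        stats[key] = (old_size + size, old_count + 1)
    return stats


# Hypothetical addresses and frames.
sample_traces = {
    0x1000: (100, [("app.py", 10), ("main.py", 3)]),
    0x2000: (50, [("app.py", 10)]),
    0x3000: (8, []),
}
line_stats = stats_from_traces(sample_traces)
```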
``set_traceback_limit(nframe: int)`` function:
Set the maximum number of frames stored in the traceback of a trace
of a memory block.
Storing the traceback of each memory allocation has a significant
overhead on the memory usage. Use the ``get_tracemalloc_memory()``
function to measure the overhead and the ``add_filter()`` function
to select which memory allocations are traced.
Use the ``get_traceback_limit()`` function to get the current limit.
Filter Functions
----------------
``add_filter(filter)`` function:
Add a new filter on Python memory allocations, *filter* is a
``Filter`` instance.
All inclusive filters are applied at once: a memory allocation is
ignored if no inclusive filter matches its trace. A memory
allocation is also ignored if at least one exclusive filter matches its
trace.
The new filter is not applied on already collected traces. Use the
``clear_traces()`` function to ensure that all traces match the new
filter.
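The way inclusive and exclusive filters combine can be sketched as a small decision function. This is a hedged illustration, not the module's implementation: plain callables stand in for ``Filter.match()``, the helper names are hypothetical, and the sketch assumes that an empty list of inclusive filters means every allocation is included.

```python
def is_traced(frame, include_filters, exclude_filters):
    """Decide whether an allocation made at *frame* is traced.

    Each filter is a callable taking a (filename, lineno) tuple and
    returning a bool; these callables stand in for Filter.match().
    """
    # Ignored if inclusive filters exist and none of them matches.
    # Assumption: with no inclusive filter, everything is included.
    if include_filters and not any(f(frame) for f in include_filters):
        return False
    # Ignored if at least one exclusive filter matches.
    if any(f(frame) for f in exclude_filters):
        return False
    return True


def in_app(frame):
    """Hypothetical inclusive filter: files under app/."""
    return frame[0].startswith("app/")


def in_tests(frame):
    """Hypothetical exclusive filter: files under app/tests/."""
    return frame[0].startswith("app/tests/")


traced = is_traced(("app/models.py", 5), [in_app], [in_tests])
```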
``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:
Add an inclusive filter: helper for the ``add_filter()`` function
creating a ``Filter`` instance with the ``Filter.include`` attribute
set to ``True``.
Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)``
only includes memory blocks allocated by the ``tracemalloc`` module.
``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:
Add an exclusive filter: helper for the ``add_filter()`` function
creating a ``Filter`` instance with the ``Filter.include`` attribute
set to ``False``.
Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)``
ignores memory blocks allocated by the ``tracemalloc`` module.
``clear_filters()`` function:
Reset the filter list.
See also the ``get_filters()`` function.
``get_filters()`` function:
Get the filters on Python memory allocations as a list of ``Filter``
instances.
See also the ``clear_filters()`` function.
Filter
------
``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:
Filter to select which memory allocations are traced. Filters can be
used to reduce the memory usage of the ``tracemalloc`` module, which
can be read using the ``get_tracemalloc_memory()`` function.
``match(filename: str, lineno: int)`` method:
Return ``True`` if the filter matches the filename and line number,
``False`` otherwise.
``match_filename(filename: str)`` method:
Return ``True`` if the filter matches the filename, ``False`` otherwise.
``match_lineno(lineno: int)`` method:
Return ``True`` if the filter matches the line number, ``False``
otherwise.
``match_traceback(traceback)`` method:
Return ``True`` if the filter matches the *traceback*, ``False``
otherwise.
*traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.
``include`` attribute:
If *include* is ``True``, only trace memory blocks allocated in a
file with a name matching filename ``pattern`` at line number
``lineno``.
If *include* is ``False``, ignore memory blocks allocated in a file
with a name matching filename ``pattern`` at line number ``lineno``.
``lineno`` attribute:
Line number (``int``). If it is ``None`` or less than ``1``, it
matches any line number.
``pattern`` attribute:
The filename *pattern* can contain one or more ``*`` wildcard
characters which match any substring, including an empty string. The
``.pyc`` and ``.pyo`` file extensions are replaced with ``.py``. On
Windows, the comparison is case insensitive and the alternative
separator ``/`` is replaced with the standard separator ``\``.
``traceback`` attribute:
If *traceback* is ``True``, all frames of the traceback are checked.
If *traceback* is ``False``, only the most recent frame is checked.
This attribute is ignored if the traceback limit is less than ``2``.
See the ``get_traceback_limit()`` function.
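The matching rule described for the ``pattern`` attribute could look like the sketch below. It is an assumption-laden illustration: the ``normalize()`` and ``match_filename()`` helpers are hypothetical, the sketch normalizes the extension of both the pattern and the filename, and it leaves out the Windows-specific case folding and separator replacement.

```python
import re


def normalize(filename):
    """Replace a .pyc or .pyo extension with .py."""
    if filename.endswith((".pyc", ".pyo")):
        return filename[:-1]
    return filename


def match_filename(pattern, filename):
    """Return True if *filename* matches *pattern*.

    Only '*' is special; it matches any substring, including an
    empty one.
    """
    # Escape every literal part, then join the parts with '.*'.
    parts = normalize(pattern).split("*")
    regex = ".*".join(re.escape(part) for part in parts)
    return re.fullmatch(regex, normalize(filename), re.DOTALL) is not None


matched = match_filename("/usr/lib/*/os.py", "/usr/lib/python3.4/os.pyc")
```

Building a regular expression by hand, rather than using ``fnmatch``, keeps ``?`` and ``[`` as literal characters, since the text above only gives ``*`` a special meaning.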
GroupedStats
------------
``GroupedStats(timestamp: datetime.datetime, stats: dict, group_by: str, cumulative=False, metrics: dict=None)`` class:
Top of allocated memory blocks grouped by *group_by* as a
dictionary.
The ``Snapshot.top_by()`` method creates a ``GroupedStats``
instance.
``compare_to(old_stats: GroupedStats=None)`` method:
Compare to an older ``GroupedStats`` instance. Return a
``StatsDiff`` instance.
The ``StatsDiff.differences`` list is not sorted: call the
``StatsDiff.sort()`` method to sort the list.
``None`` values are replaced with an empty string for filenames or
zero for line numbers, because ``str`` and ``int`` cannot be
compared to ``None``.
``cumulative`` attribute:
If ``True``, cumulate size and count of memory blocks of all frames
of the traceback of a trace, not only the most recent frame.
``metrics`` attribute:
Dictionary storing metrics read when the snapshot was created:
``{name (str): metric}`` where *metric* type is ``Metric``.
``group_by`` attribute:
Determine how memory allocations were grouped: see
``Snapshot.top_by()`` for the available values.
``stats`` attribute:
Dictionary ``{key: stats}`` where the *key* type depends on the
``group_by`` attribute and *stats* is a ``(size: int, count: int)``
tuple.
See the ``Snapshot.top_by()`` method.
``timestamp`` attribute:
Creation date and time of the snapshot, ``datetime.datetime``
instance.
Metric
------
``Metric(name: str, value: int, format: str)`` class:
Value of a metric when a snapshot is created.
``name`` attribute:
Name of the metric.
``value`` attribute:
Value of the metric.
``format`` attribute:
Format of the metric (``str``).
Snapshot
--------
``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats: dict=None)`` class:
Snapshot of traces and statistics on memory blocks allocated by Python.
``add_metric(name: str, value: int, format: str)`` method:
Helper to add a ``Metric`` instance to ``Snapshot.metrics``. Return
the newly created ``Metric`` instance.
Raise an exception if the name is already present in
``Snapshot.metrics``.
``apply_filters(filters)`` method:
Apply filters on the ``traces`` and ``stats`` dictionaries,
*filters* is a list of ``Filter`` instances.
``create(traces=False)`` classmethod:
Take a snapshot of traces and/or statistics of allocated memory blocks.
If *traces* is ``True``, ``get_traces()`` is called and its result
is stored in the ``Snapshot.traces`` attribute. This attribute
contains more information than ``Snapshot.stats`` and uses more
memory and more disk space. If *traces* is ``False``,
``Snapshot.traces`` is set to ``None``.
Tracebacks of traces are limited to ``traceback_limit`` frames. Call
``set_traceback_limit()`` before calling ``Snapshot.create()`` to
store more frames.
The ``tracemalloc`` module must be enabled to take a snapshot. See
the ``enable()`` function.
``get_metric(name, default=None)`` method:
Get the value of the metric called *name*. Return *default* if the
metric does not exist.
``load(filename, traces=True)`` classmethod:
Load a snapshot from a file.
If *traces* is ``False``, don't load traces.
``top_by(group_by: str, cumulative: bool=False)`` method:
Compute top statistics grouped by *group_by* as a ``GroupedStats``
instance:
===================== ======================== ================================
group_by              description              key type
===================== ======================== ================================
``'filename'``        filename                 ``str``
``'line'``            filename and line number ``(filename: str, lineno: int)``
``'address'``         memory block address     ``int``
``'traceback'``       traceback                ``(address: int, traceback)``
===================== ======================== ================================
The ``traceback`` type is a tuple of ``(filename: str, lineno:
int)`` tuples, *filename* and *lineno* can be ``None``.
If *cumulative* is ``True``, cumulate size and count of memory
blocks of all frames of the traceback of a trace, not only the most
recent frame. The *cumulative* parameter is ignored if *group_by*
is ``'address'`` or if the traceback limit is less than ``2``.
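The effect of the *cumulative* parameter can be sketched for the ``'filename'`` grouping. This is a minimal illustration under stated assumptions: ``top_by_filename()`` and the sample data are hypothetical, ``traceback[0]`` is taken to be the most recent frame, and a file appearing in several frames of one trace is counted only once for that trace.

```python
def top_by_filename(traces, cumulative=False):
    """Group traces by filename into {filename: (size, count)}.

    Assumption: traceback[0] is the most recent frame.
    """
    stats = {}
    for size, traceback in traces.values():
        frames = traceback if cumulative else traceback[:1]
        # Assumption: count each file once per trace.
        for filename in {name for name, _ in frames}:
            old_size, old_count = stats.get(filename, (0, 0))
            stats[filename] = (old_size + size, old_count + 1)
    return stats


# Hypothetical traces: {address: (size, traceback)}.
sample_traces = {
    1: (100, [("a.py", 1), ("main.py", 5)]),
    2: (20, [("b.py", 2), ("main.py", 7)]),
}
plain = top_by_filename(sample_traces)
cumulated = top_by_filename(sample_traces, cumulative=True)
```

With *cumulative* set, ``main.py`` is charged for both allocations because it appears in both tracebacks, even though it is never the most recent frame.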
``write(filename)`` method:
Write the snapshot into a file.
``metrics`` attribute:
Dictionary storing metrics read when the snapshot was created:
``{name (str): metric}`` where *metric* type is ``Metric``.
``stats`` attribute:
Statistics on traced Python memory, result of the ``get_stats()``
function.
``traceback_limit`` attribute:
Maximum number of frames stored in a trace of a memory block
allocated by Python.
``traces`` attribute:
Traces of Python memory allocations, result of the ``get_traces()``
function, can be ``None``.
``timestamp`` attribute:
Creation date and time of the snapshot, ``datetime.datetime``
instance.
StatsDiff
---------
``StatsDiff(differences, old_stats, new_stats)`` class:
Differences between two ``GroupedStats`` instances.
The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
instance.
``sort()`` method:
Sort the ``differences`` list from the biggest difference to the
smallest difference. Sort by ``abs(size_diff)``, *size*,
``abs(count_diff)``, *count* and then by *key*.
``differences`` attribute:
Differences between ``old_stats`` and ``new_stats`` as a list of
``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
*size*, *count_diff* and *count* are ``int``. The key type depends
on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the
``Snapshot.top_by()`` method.
``old_stats`` attribute:
Old ``GroupedStats`` instance, can be ``None``.
``new_stats`` attribute:
New ``GroupedStats`` instance.
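The shape of the ``differences`` list and the sort order described above can be sketched with plain dictionaries. The ``compare_stats()`` and ``sort_key()`` helpers are hypothetical stand-ins for ``GroupedStats.compare_to()`` and ``StatsDiff.sort()``; the sketch assumes that a key missing from one side counts as ``(0, 0)`` and that all keys are mutually comparable.

```python
def compare_stats(old, new):
    """Return a list of (size_diff, size, count_diff, count, key).

    *old* and *new* are {key: (size, count)} dictionaries.
    Assumption: a key missing from one side counts as (0, 0).
    """
    differences = []
    for key in set(old) | set(new):
        old_size, old_count = old.get(key, (0, 0))
        size, count = new.get(key, (0, 0))
        differences.append(
            (size - old_size, size, count - old_count, count, key))
    return differences


def sort_key(diff):
    """Order by abs(size_diff), size, abs(count_diff), count, key."""
    size_diff, size, count_diff, count, key = diff
    return (abs(size_diff), size, abs(count_diff), count, key)


# Hypothetical statistics grouped by filename.
old = {"a.py": (100, 2), "b.py": (50, 1)}
new = {"a.py": (300, 5), "c.py": (10, 1)}
differences = compare_stats(old, new)
# Biggest difference first.
differences.sort(key=sort_key, reverse=True)
```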
Prior Work
==========
* `Python Memory Validator
<http://www.softwareverify.com/python/memory/index.html>`_ (2005-2013):
commercial Python memory validator developed by Software Verification.
It uses the Python Reflection API.
* `PySizer <http://pysizer.8325.org/>`_: Google Summer of Code 2005 project by
Nick Smallbone.
* `Heapy
<http://guppy-pe.sourceforge.net/>`_ (2006-2013):
part of the Guppy-PE project written by Sverker Nilsson.
* Draft PEP: `Support Tracking Low-Level Memory Usage in CPython
<http://svn.python.org/projects/python/branches/bcannon-sandboxing/PEP.txt>`_
(Brett Cannon, 2006)
* Muppy: project developed in 2008 by Robert Schuppenies.
* `asizeof <http://code.activestate.com/recipes/546530/>`_:
a pure Python module to estimate the size of objects by Jean
Brouwers (2008).
* `Heapmonitor <http://www.scons.org/wiki/LudwigHaehne/HeapMonitor>`_:
It provides facilities to size individual objects and can track all objects
of certain classes. It was developed in 2008 by Ludwig Haehne.
* `Pympler <http://code.google.com/p/pympler/>`_ (2008-2011):
project based on asizeof, muppy and HeapMonitor
* `objgraph <http://mg.pov.lt/objgraph/>`_ (2008-2012)
* `Dozer <https://pypi.python.org/pypi/Dozer>`_: WSGI Middleware version
of the CherryPy memory leak debugger, written by Marius Gedminas (2008-2013)
* `Meliae
<https://pypi.python.org/pypi/meliae>`_:
Python Memory Usage Analyzer developed by John A Meinel since 2009
* `caulk <https://github.com/smartfile/caulk/>`_: written by Ben Timby in 2012
* `memory_profiler <https://pypi.python.org/pypi/memory_profiler>`_:
written by Fabian Pedregosa (2011-2013)
See also `Pympler Related Work
<http://pythonhosted.org/Pympler/related.html>`_.
Links
=====
tracemalloc:
* `#18874: Add a new tracemalloc module to trace Python
memory allocations <http://bugs.python.org/issue18874>`_
* `pytracemalloc on PyPI
<https://pypi.python.org/pypi/pytracemalloc>`_
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: