PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4


Abstract
========

This PEP proposes to add a new ``tracemalloc`` module to trace memory
blocks allocated by Python.


Rationale
=========

Common debug tools tracing memory allocations record the C filename
and line number where the allocation occurs. Using such tools to
analyze Python memory allocations does not help because most memory
blocks are allocated in the same C function, in ``PyMem_Malloc()`` for
example.

There are debug tools dedicated to the Python language like ``Heapy``
and ``PySizer``. These tools analyze object types and/or content. They
are useful when most memory leaks are instances of the same type and
this type is only instantiated in a few functions. Problems arise when
the object type is very common like ``str`` or ``tuple``, and it is
hard to identify where these objects are instantiated.

Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools
cannot be used on large applications with thousands of objects because
the diagram is too large to be analyzed manually.


Proposal
========

Using the customized allocation API from PEP 445, it becomes easy to
set up a hook on Python memory allocators. A hook can inspect Python
internals to retrieve Python tracebacks.

This PEP proposes to add a new ``tracemalloc`` module, as a debug tool
to trace memory blocks allocated by Python.

The module provides the following information:

* Computed differences between two snapshots to detect memory leaks
* Statistics on allocated memory blocks per filename and per line
  number: total size, number and average size of allocated memory
  blocks
* Traceback where a memory block was allocated

The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option
(``-X faulthandler`` and ``-X tracemalloc``). See the `documentation of
the faulthandler module `_.

The tracemalloc module has been written for CPython. Other
implementations of Python may not be able to provide it.


API
===

Main Functions
--------------

``clear_traces()`` function:

    Clear traces and statistics on Python memory allocations, and reset
    the ``get_traced_memory()`` counter.

``disable()`` function:

    Stop tracing Python memory allocations.

    See also ``enable()`` and ``is_enabled()`` functions.

``enable()`` function:

    Start tracing Python memory allocations.

    At fork, the module is automatically disabled in the child process.

    See also ``disable()`` and ``is_enabled()`` functions.

``get_stats()`` function:

    Get statistics on traced Python memory blocks as a dictionary
    ``{filename (str): {line_number (int): stats}}`` where *stats* is a
    ``(size: int, count: int)`` tuple. *filename* and *line_number* can
    be ``None``.

    Return an empty dictionary if the ``tracemalloc`` module is
    disabled.

    See also the ``get_traces()`` function.

``get_traced_memory()`` function:

    Get the current size and maximum size of memory blocks traced by
    the ``tracemalloc`` module as a tuple:
    ``(size: int, max_size: int)``.

``get_tracemalloc_memory()`` function:

    Get the memory usage in bytes of the ``tracemalloc`` module as a
    tuple: ``(size: int, free: int)``.

    * *size*: total size of bytes allocated by the module, including
      *free* bytes
    * *free*: number of free bytes available to store data

``is_enabled()`` function:

    ``True`` if the ``tracemalloc`` module is tracing Python memory
    allocations, ``False`` otherwise.

    See also ``enable()`` and ``disable()`` functions.


Trace Functions
---------------

``get_traceback_limit()`` function:

    Get the maximum number of frames stored in the traceback of a trace
    of a memory block.

    Use the ``set_traceback_limit()`` function to change the limit.

``get_object_address(obj)`` function:

    Get the address of the main memory block of the specified Python
    object.

    A Python object can be composed of multiple memory blocks; the
    function only returns the address of the main memory block.

    See also ``get_object_trace()`` and ``gc.get_referrers()``
    functions.

``get_object_trace(obj)`` function:

    Get the trace of a Python object *obj* as a
    ``(size: int, traceback)`` tuple where *traceback* is a tuple of
    ``(filename: str, lineno: int)`` tuples. *filename* and *lineno* can
    be ``None``.

    The function only returns the trace of the main memory block of the
    object. The *size* of the trace is smaller than the total size of
    the object if the object is composed of more than one memory block.

    Return ``None`` if the ``tracemalloc`` module did not trace the
    allocation of the object.

    See also ``get_object_address()``, ``get_trace()``,
    ``get_traces()``, ``gc.get_referrers()`` and ``sys.getsizeof()``
    functions.

``get_trace(address)`` function:

    Get the trace of a memory block as a ``(size: int, traceback)``
    tuple where *traceback* is a tuple of
    ``(filename: str, lineno: int)`` tuples. *filename* and *lineno* can
    be ``None``.

    Return ``None`` if the ``tracemalloc`` module did not trace the
    allocation of the memory block.

    See also ``get_object_trace()``, ``get_stats()`` and
    ``get_traces()`` functions.
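
The relationship between a trace (``(size, traceback)`` per address)
and the per-line statistics returned by ``get_stats()`` can be
illustrated with a minimal pure-Python sketch. It assumes that a block
is accounted to the most recent frame of its traceback, that this frame
is stored first, and that a block with an empty traceback is accounted
under ``None``/``None``. The helper ``stats_from_traces()`` is
hypothetical and not part of the proposed API.

```python
from collections import defaultdict


def stats_from_traces(traces):
    """Build {filename: {lineno: (size, count)}} from {address: (size, traceback)}.

    Hypothetical helper: traceback[0] is assumed to be the most recent frame,
    and an empty traceback is accounted under (None, None).
    """
    stats = defaultdict(dict)
    for size, traceback in traces.values():
        if traceback:
            filename, lineno = traceback[0]
        else:
            filename, lineno = None, None
        old_size, old_count = stats[filename].get(lineno, (0, 0))
        stats[filename][lineno] = (old_size + size, old_count + 1)
    return dict(stats)


# Fabricated example traces: {address: (size, traceback)}
traces = {
    0x1000: (32, (("a.py", 10), ("main.py", 3))),
    0x2000: (64, (("a.py", 10), ("main.py", 4))),
    0x3000: (16, (("b.py", 7),)),
    0x4000: (8, ()),
}
print(stats_from_traces(traces))
# {'a.py': {10: (96, 2)}, 'b.py': {7: (16, 1)}, None: {None: (8, 1)}}
```

Two blocks allocated at ``a.py:10`` through different callers end up in
the same statistic, which is why ``get_traces()`` carries more
information than ``get_stats()``.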

``get_traces()`` function:

    Get traces of Python memory allocations as a dictionary
    ``{address (int): trace}`` where *trace* is a
    ``(size: int, traceback)`` tuple and *traceback* is a list of
    ``(filename: str, lineno: int)`` tuples. *traceback* can be empty;
    *filename* and *lineno* can be ``None``.

    Return an empty dictionary if the ``tracemalloc`` module is
    disabled.

    See also ``get_object_trace()``, ``get_stats()`` and
    ``get_trace()`` functions.

``set_traceback_limit(nframe: int)`` function:

    Set the maximum number of frames stored in the traceback of a trace
    of a memory block.

    Storing the traceback of each memory allocation has a significant
    memory overhead. Use the ``get_tracemalloc_memory()`` function to
    measure the overhead and the ``add_filter()`` function to select
    which memory allocations are traced.

    Use the ``get_traceback_limit()`` function to get the current limit.


Filter Functions
----------------

``add_filter(filter)`` function:

    Add a new filter on Python memory allocations; *filter* is a
    ``Filter`` instance.

    All inclusive filters are applied at once: a memory allocation is
    only ignored if no inclusive filter matches its trace. A memory
    allocation is ignored if at least one exclusive filter matches its
    trace.

    The new filter is not applied on already collected traces. Use the
    ``clear_traces()`` function to ensure that all traces match the new
    filter.

``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an inclusive filter: helper for the ``add_filter()`` function
    creating a ``Filter`` instance with the ``Filter.include`` attribute
    set to ``True``.

    Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)``
    only includes memory blocks allocated by the ``tracemalloc`` module.

``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an exclusive filter: helper for the ``add_filter()`` function
    creating a ``Filter`` instance with the ``Filter.include`` attribute
    set to ``False``.

    Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)``
    ignores memory blocks allocated by the ``tracemalloc`` module.

``clear_filters()`` function:

    Reset the filter list.

    See also the ``get_filters()`` function.

``get_filters()`` function:

    Get the filters on Python memory allocations as a list of ``Filter``
    instances.

    See also the ``clear_filters()`` function.


Filter
------

``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:

    Filter to select which memory allocations are traced. Filters can be
    used to reduce the memory usage of the ``tracemalloc`` module, which
    can be read using the ``get_tracemalloc_memory()`` function.

``match(filename: str, lineno: int)`` method:

    Return ``True`` if the filter matches the filename and line number,
    ``False`` otherwise.

``match_filename(filename: str)`` method:

    Return ``True`` if the filter matches the filename, ``False``
    otherwise.

``match_lineno(lineno: int)`` method:

    Return ``True`` if the filter matches the line number, ``False``
    otherwise.

``match_traceback(traceback)`` method:

    Return ``True`` if the filter matches the *traceback*, ``False``
    otherwise.

    *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.

``include`` attribute:

    If *include* is ``True``, only trace memory blocks allocated in a
    file with a name matching filename ``pattern`` at line number
    ``lineno``.

    If *include* is ``False``, ignore memory blocks allocated in a file
    with a name matching filename ``pattern`` at line number ``lineno``.

``lineno`` attribute:

    Line number (``int``). If it is ``None`` or less than ``1``, it
    matches any line number.

``pattern`` attribute:

    The filename *pattern* can contain one or many ``*`` joker
    characters which match any substring, including an empty string.
    The ``.pyc`` and ``.pyo`` file extensions are replaced with ``.py``.
    On Windows, the comparison is case insensitive and the alternative
    separator ``/`` is replaced with the standard separator ``\``.
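
The matching rules of ``Filter`` and the way ``add_filter()`` combines
inclusive and exclusive filters can be sketched in pure Python. This is
a minimal sketch under stated assumptions, not the proposed C
implementation: ``Filter`` is modelled as a hypothetical namedtuple
with the same field names, the ``.pyc``/``.pyo`` normalization is
assumed to apply to both the pattern and the filename, a frame with a
``None`` filename is assumed never to match, a traceback is assumed to
match when any checked frame matches, and ``traceback[0]`` is assumed
to be the most recent frame. The helpers ``normalize()`` and
``is_traced()`` are hypothetical.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the proposed Filter class (same field names).
Filter = namedtuple("Filter", "include pattern lineno traceback")


def normalize(filename, windows=False):
    # .pyc/.pyo -> .py; on Windows also fold case and use "\" separators.
    if filename.endswith((".pyc", ".pyo")):
        filename = filename[:-1]
    if windows:
        filename = filename.replace("/", "\\").lower()
    return filename


def match_filename(pattern, filename, windows=False):
    # Assumption: an unknown (None) filename never matches.
    if filename is None:
        return False
    # Each "*" joker matches any substring, including an empty one.
    parts = normalize(pattern, windows).split("*")
    regex = ".*".join(re.escape(part) for part in parts)
    return re.fullmatch(regex, normalize(filename, windows), re.DOTALL) is not None


def match_lineno(filter_lineno, lineno):
    # None or a value less than 1 matches any line number.
    return filter_lineno is None or filter_lineno < 1 or filter_lineno == lineno


def match_traceback(flt, traceback, windows=False):
    # All frames if flt.traceback is true, else only the most recent one.
    frames = traceback if flt.traceback else traceback[:1]
    return any(match_filename(flt.pattern, filename, windows)
               and match_lineno(flt.lineno, lineno)
               for filename, lineno in frames)


def is_traced(filters, traceback, windows=False):
    # A single matching exclusive filter ignores the allocation.
    if any(not f.include and match_traceback(f, traceback, windows)
           for f in filters):
        return False
    # If inclusive filters exist, at least one of them must match.
    inclusive = [f for f in filters if f.include]
    return not inclusive or any(match_traceback(f, traceback, windows)
                                for f in inclusive)


filters = [
    Filter(include=True, pattern="/app/*", lineno=None, traceback=False),
    Filter(include=False, pattern="/app/vendor/*", lineno=None, traceback=False),
]
print(is_traced(filters, (("/app/models.pyc", 12),)))              # True
print(is_traced(filters, (("/app/vendor/lib.py", 3),)))            # False
print(is_traced(filters, (("/usr/lib/python3.4/json.py", 5),)))    # False
```

With ``traceback=False`` only the allocating frame is tested, which is
cheaper but misses allocations made by library code on behalf of the
filtered file; ``traceback=True`` trades speed for that coverage.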

``traceback`` attribute:

    If *traceback* is ``True``, all frames of the traceback are checked.
    If *traceback* is ``False``, only the most recent frame is checked.

    This attribute is ignored if the traceback limit is less than ``2``.
    See the ``get_traceback_limit()`` function.


GroupedStats
------------

``GroupedStats(timestamp: datetime.datetime, stats: dict, group_by: str, cumulative=False, metrics: dict=None)`` class:

    Top statistics of allocated memory blocks grouped by *group_by*, as
    a dictionary.

    The ``Snapshot.top_by()`` method creates a ``GroupedStats``
    instance.

``compare_to(old_stats: GroupedStats=None)`` method:

    Compare to an older ``GroupedStats`` instance. Return a
    ``StatsDiff`` instance.

    The ``StatsDiff.differences`` list is not sorted: call the
    ``StatsDiff.sort()`` method to sort the list.

    ``None`` values are replaced with an empty string for filenames or
    zero for line numbers, because ``str`` and ``int`` cannot be
    compared to ``None``.

``cumulative`` attribute:

    If ``True``, accumulate the size and count of memory blocks of all
    frames of the traceback of a trace, not only the most recent frame.

``metrics`` attribute:

    Dictionary storing metrics read when the snapshot was created:
    ``{name (str): metric}`` where *metric* type is ``Metric``.

``group_by`` attribute:

    Determine how memory allocations were grouped: see
    ``Snapshot.top_by()`` for the available values.

``stats`` attribute:

    Dictionary ``{key: stats}`` where the *key* type depends on the
    ``group_by`` attribute and *stats* is a ``(size: int, count: int)``
    tuple.

    See the ``Snapshot.top_by()`` method.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.


Metric
------

``Metric(name: str, value: int, format: str)`` class:

    Value of a metric when a snapshot is created.

``name`` attribute:

    Name of the metric.

``value`` attribute:

    Value of the metric.

``format`` attribute:

    Format of the metric (``str``).

Snapshot
--------

``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats: dict=None)`` class:

    Snapshot of traces and statistics on memory blocks allocated by
    Python.

``add_metric(name: str, value: int, format: str)`` method:

    Helper to add a ``Metric`` instance to ``Snapshot.metrics``. Return
    the newly created ``Metric`` instance.

    Raise an exception if the name is already present in
    ``Snapshot.metrics``.

``apply_filters(filters)`` method:

    Apply filters on the ``traces`` and ``stats`` dictionaries;
    *filters* is a list of ``Filter`` instances.

``create(traces=False)`` classmethod:

    Take a snapshot of traces and/or statistics of allocated memory
    blocks.

    If *traces* is ``True``, ``get_traces()`` is called and its result
    is stored in the ``Snapshot.traces`` attribute. This attribute
    contains more information than ``Snapshot.stats`` and uses more
    memory and more disk space. If *traces* is ``False``,
    ``Snapshot.traces`` is set to ``None``.

    Tracebacks of traces are limited to ``traceback_limit`` frames. Call
    ``set_traceback_limit()`` before calling ``Snapshot.create()`` to
    store more frames.

    The ``tracemalloc`` module must be enabled to take a snapshot. See
    the ``enable()`` function.

``get_metric(name, default=None)`` method:

    Get the value of the metric called *name*. Return *default* if the
    metric does not exist.

``load(filename, traces=True)`` classmethod:

    Load a snapshot from a file.

    If *traces* is ``False``, don't load traces.
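
The grouping modes of the ``top_by()`` method, documented next, can be
sketched on a ``get_traces()``-style dictionary. This minimal sketch
returns a plain ``{key: (size, count)}`` dictionary rather than a
``GroupedStats`` instance. It assumes that ``traceback[0]`` is the most
recent frame, that an empty traceback is grouped under ``None``, and,
when *cumulative* is true, that each distinct key is counted at most
once per trace (so a recursive call does not count the same block
twice). It also ignores *cumulative* for ``'traceback'``, an assumption
since those keys already carry the whole traceback. The function name
``top_by()`` here is a hypothetical stand-in for the method.

```python
def top_by(traces, group_by, cumulative=False):
    """Group {address: (size, traceback)} into {key: (size, count)}."""
    stats = {}

    def add(key, size):
        old_size, old_count = stats.get(key, (0, 0))
        stats[key] = (old_size + size, old_count + 1)

    for address, (size, traceback) in traces.items():
        if group_by == "address":
            add(address, size)
        elif group_by == "traceback":
            add((address, traceback), size)
        elif group_by in ("filename", "line"):
            # Most recent frame only, or all frames when cumulative;
            # an empty traceback is grouped under an unknown frame.
            frames = traceback if cumulative else traceback[:1]
            frames = frames or ((None, None),)
            # A set: each distinct key counts once per trace.
            keys = {frame[0] if group_by == "filename" else frame
                    for frame in frames}
            for key in keys:
                add(key, size)
        else:
            raise ValueError("unknown group_by: %r" % (group_by,))
    return stats


traces = {
    0x10: (100, (("a.py", 1), ("main.py", 9))),
    0x20: (50, (("b.py", 2), ("main.py", 9))),
    0x30: (25, (("a.py", 3), ("a.py", 1))),
}
print(top_by(traces, "filename"))
print(top_by(traces, "filename", cumulative=True))
```

Non-cumulative grouping attributes each block to a single key, so the
sizes add up to the total traced size; cumulative grouping counts a
block under every caller, which highlights the high-level functions
responsible for many allocations but makes the totals overlap.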

``top_by(group_by: str, cumulative: bool=False)`` method:

    Compute top statistics grouped by *group_by* as a ``GroupedStats``
    instance:

    ===================== ======================== ================================
    group_by              description              key type
    ===================== ======================== ================================
    ``'filename'``        filename                 ``str``
    ``'line'``            filename and line number ``(filename: str, lineno: int)``
    ``'address'``         memory block address     ``int``
    ``'traceback'``       traceback                ``(address: int, traceback)``
    ===================== ======================== ================================

    The ``traceback`` type is a tuple of
    ``(filename: str, lineno: int)`` tuples. *filename* and *lineno* can
    be ``None``.

    If *cumulative* is ``True``, accumulate the size and count of memory
    blocks of all frames of the traceback of a trace, not only the most
    recent frame. The *cumulative* parameter is ignored if *group_by* is
    ``'address'`` or if the traceback limit is less than ``2``.

``write(filename)`` method:

    Write the snapshot into a file.

``metrics`` attribute:

    Dictionary storing metrics read when the snapshot was created:
    ``{name (str): metric}`` where *metric* type is ``Metric``.

``stats`` attribute:

    Statistics on traced Python memory, result of the ``get_stats()``
    function.

``traceback_limit`` attribute:

    Maximum number of frames stored in a trace of a memory block
    allocated by Python.

``traces`` attribute:

    Traces of Python memory allocations, result of the ``get_traces()``
    function, can be ``None``.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.


StatsDiff
---------

``StatsDiff(differences, old_stats, new_stats)`` class:

    Differences between two ``GroupedStats`` instances.

    The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
    instance.

``sort()`` method:

    Sort the ``differences`` list from the biggest difference to the
    smallest difference.
    Sort by ``abs(size_diff)``, *size*, ``abs(count_diff)``, *count*
    and then by *key*.

``differences`` attribute:

    Differences between ``old_stats`` and ``new_stats`` as a list of
    ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
    *size*, *count_diff* and *count* are ``int``. The key type depends
    on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the
    ``Snapshot.top_by()`` method.

``old_stats`` attribute:

    Old ``GroupedStats`` instance, can be ``None``.

``new_stats`` attribute:

    New ``GroupedStats`` instance.


Prior Work
==========

* `Python Memory Validator `_ (2005-2013): commercial Python memory
  validator developed by Software Verification. It uses the Python
  Reflection API.
* `PySizer `_: Google Summer of Code 2005 project by Nick Smallbone.
* `Heapy `_ (2006-2013): part of the Guppy-PE project written by
  Sverker Nilsson.
* Draft PEP: `Support Tracking Low-Level Memory Usage in CPython `_
  (Brett Cannon, 2006)
* Muppy: project developed in 2008 by Robert Schuppenies.
* `asizeof `_: a pure Python module to estimate the size of objects by
  Jean Brouwers (2008).
* `Heapmonitor `_: provides facilities to size individual objects and
  can track all objects of certain classes. It was developed in 2008
  by Ludwig Haehne.
* `Pympler `_ (2008-2011): project based on asizeof, muppy and
  HeapMonitor
* `objgraph `_ (2008-2012)
* `Dozer `_: WSGI Middleware version of the CherryPy memory leak
  debugger, written by Marius Gedminas (2008-2013)
* `Meliae `_: Python Memory Usage Analyzer developed by John A Meinel
  since 2009
* `caulk `_: written by Ben Timby in 2012
* `memory_profiler `_: written by Fabian Pedregosa (2011-2013)

See also `Pympler Related Work `_.


Links
=====

tracemalloc:

* `#18874: Add a new tracemalloc module to trace Python memory
  allocations `_
* `pytracemalloc on PyPI `_


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: