PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4


Abstract
========

Add a new ``tracemalloc`` module to trace memory blocks allocated by
Python.


Rationale
=========

Common debug tools tracing memory allocations read the C filename and
line number. Using such a tool to analyze Python memory allocations does
not help because most memory blocks are allocated in the same C
function, in ``PyMem_Malloc()`` for example.

There are debug tools dedicated to the Python language like ``Heapy``
and ``PySizer``. These projects analyze the type and/or content of
objects. These tools are useful when most memory leaks are instances of
the same type and this type is only instantiated in a few functions.
The problem arises when the object type is very common, like ``str`` or
``tuple``, and it is hard to identify where these objects are
instantiated.

Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools cannot
be used on large applications with thousands of objects because the
diagram is too large to be analyzed manually.


Proposal
========

Using PEP 445, it becomes easy to set up a hook on Python memory
allocators. The hook can inspect the current Python frame to get the
Python filename and line number.

This PEP proposes to add a new ``tracemalloc`` module. It is a debug
tool to trace memory allocations made by Python.
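
The allocator hook is written in C, but the information it records for
each memory block can be illustrated in pure Python. The following is a
hypothetical sketch, not part of the proposed API: ``capture_traceback()``
and its *limit* and *skip* parameters are made up for illustration. It
relies on the CPython-specific ``sys._getframe()`` to walk the calling
frames and keep up to *limit* (filename, line number) pairs, most recent
first.

```python
import sys
from collections import namedtuple

# Hypothetical stand-in mirroring the filename/lineno attributes of Frame.
Frame = namedtuple("Frame", "filename lineno")


def capture_traceback(limit=1, skip=1):
    """Return up to *limit* Frame tuples, most recent first.

    *skip* is the number of innermost frames to ignore; the default of 1
    skips capture_traceback() itself so that the caller's frame comes first.
    """
    frames = []
    frame = sys._getframe(skip)  # CPython-specific frame access
    while frame is not None and len(frames) < limit:
        frames.append(Frame(frame.f_code.co_filename, frame.f_lineno))
        frame = frame.f_back
    return frames


def allocate():
    # Stand-in for code that allocates memory: record where it happened.
    return capture_traceback(limit=2)


print(allocate())
```

Keeping a single frame by default, as the module does, bounds the cost
of each trace; every additional frame adds memory to every traced block.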

The module provides the following information:

* Differences between two snapshots, to detect memory leaks
* Statistics on allocated memory blocks per filename and per line
  number: total size, number and average size of allocated memory blocks
* For each allocated memory block: its size and the traceback where the
  block was allocated

The API of the ``tracemalloc`` module is similar to the API of the
``faulthandler`` module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option
(``-X faulthandler`` and ``-X tracemalloc``). See the `documentation of the faulthandler module `_.

The ``tracemalloc`` module has been written for CPython. Other
implementations of Python may not provide it.


API
===

To trace most memory blocks allocated by Python, the module should be
enabled as early as possible: by calling the ``tracemalloc.enable()``
function, by setting the ``PYTHONTRACEMALLOC`` environment variable to
``1``, or by using the ``-X tracemalloc`` command line option.

By default, the ``Trace.traceback`` attribute only stores one ``Frame``
instance per allocated memory block. Use ``set_traceback_limit()`` to
store more frames.


Functions
---------

``add_filter(filter)`` function:

    Add a new filter on Python memory allocations; *filter* is a
    ``Filter`` instance.

    All inclusive filters are applied at once: a memory allocation is
    only ignored if no inclusive filter matches its trace. A memory
    allocation is ignored if at least one exclusive filter matches its
    trace.

    The new filter is not applied to already collected traces. Use
    ``clear_traces()`` to ensure that all traces match the new filter.

``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an inclusive filter: helper for ``add_filter()`` creating a
    ``Filter`` instance with the ``include`` attribute set to ``True``.

    Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)``
    only includes memory blocks allocated by the ``tracemalloc`` module.

``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function:

    Add an exclusive filter: helper for ``add_filter()`` creating a
    ``Filter`` instance with the ``include`` attribute set to ``False``.

    Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)``
    ignores memory blocks allocated by the ``tracemalloc`` module.

``clear_filters()`` function:

    Reset the filter list.

``clear_traces()`` function:

    Clear all traces and statistics on Python memory allocations, and
    reset the ``get_traced_memory()`` counter.

``disable()`` function:

    Stop tracing Python memory allocations and stop the timer started by
    ``start_timer()``.

    See also ``enable()`` and ``is_enabled()`` functions.

``enable()`` function:

    Start tracing Python memory allocations.

    See also ``disable()`` and ``is_enabled()`` functions.

``get_filters()`` function:

    Get the filters on Python memory allocations as a list of ``Filter``
    instances.

``get_traceback_limit()`` function:

    Get the maximum number of ``Frame`` instances stored in the
    ``traceback`` attribute of a ``Trace`` instance.

    Use ``set_traceback_limit()`` to change the limit.

``get_object_address(obj)`` function:

    Get the address of the memory block of the specified Python object.

``get_object_trace(obj)`` function:

    Get the trace of a Python object *obj* as a ``Trace`` instance.

    The function only returns the trace of the memory block directly
    holding the object. The ``size`` attribute of the trace is smaller
    than the total size of the object if the object is composed of more
    than one memory block.

    Return ``None`` if the ``tracemalloc`` module did not trace the
    allocation of the object.

    See also ``gc.get_referrers()`` and ``sys.getsizeof()`` functions.
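
The rules given for ``add_filter()`` on combining inclusive and
exclusive filters can be modelled in a few lines. This is a hypothetical
sketch, not the module's implementation: ``FilterRule`` and
``is_traced()`` are illustrative names, each rule only looks at the most
recent frame (filename and line number), and the sketch assumes that
when no inclusive filter is registered, every allocation passes the
inclusive step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FilterRule:
    """Hypothetical stand-in for a Filter: a flag plus a match predicate."""
    include: bool
    matches: Callable[[str, int], bool]  # (filename, lineno) -> bool


def is_traced(filename: str, lineno: int, rules: List[FilterRule]) -> bool:
    """Return True if an allocation at (filename, lineno) is kept."""
    inclusive = [rule for rule in rules if rule.include]
    exclusive = [rule for rule in rules if not rule.include]
    # Inclusive filters are applied at once: ignore the allocation only if
    # none of them matches (assumed: no inclusive filter means no restriction).
    if inclusive and not any(rule.matches(filename, lineno) for rule in inclusive):
        return False
    # A single matching exclusive filter is enough to ignore the allocation.
    return not any(rule.matches(filename, lineno) for rule in exclusive)


rules = [
    FilterRule(include=True, matches=lambda filename, lineno: filename.startswith("/app/")),
    FilterRule(include=False, matches=lambda filename, lineno: filename == "/app/cache.py"),
]
print(is_traced("/app/models.py", 10, rules))   # True: inclusive match, no exclusion
print(is_traced("/app/cache.py", 3, rules))     # False: an exclusive filter matches
print(is_traced("/usr/lib/json.py", 7, rules))  # False: no inclusive filter matches
```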

``get_process_memory()`` function:

    Get the memory usage of the current process as a ``meminfo``
    namedtuple with two attributes:

    * ``rss``: Resident Set Size in bytes
    * ``vms``: size of the virtual memory in bytes

    Return ``None`` if the platform is not supported.

``get_stats()`` function:

    Get statistics on traced Python memory blocks as a dictionary
    ``{filename (str): {line_number (int): stats}}`` where *stats* is a
    ``TraceStats`` instance; *filename* and *line_number* can be
    ``None``.

    Return an empty dictionary if the ``tracemalloc`` module is
    disabled.

``get_traced_memory()`` function:

    Get the total size of all traced memory blocks allocated by Python.

``get_tracemalloc_size()`` function:

    Get the memory usage in bytes of the ``tracemalloc`` module.

``get_traces()`` function:

    Get all traces of Python memory allocations as a dictionary
    ``{address (int): trace}`` where *trace* is a ``Trace`` instance.

    Return an empty dictionary if the ``tracemalloc`` module is
    disabled.

``is_enabled()`` function:

    ``True`` if the ``tracemalloc`` module is tracing Python memory
    allocations, ``False`` otherwise.

    See also ``enable()`` and ``disable()`` functions.

``start_timer(delay: int, func: callable, args: tuple=(), kwargs: dict={})`` function:

    Start a timer calling ``func(*args, **kwargs)`` every *delay*
    seconds. Enable the ``tracemalloc`` module if it is disabled.

    The timer is based on the Python memory allocator; it is not real
    time. *func* is called after at least *delay* seconds; if no Python
    memory allocation occurs, it is not called exactly after *delay*
    seconds. The timer has a resolution of 1 second.

    If the ``start_timer()`` function is called twice, the previous
    parameters are replaced. Call the ``stop_timer()`` function to stop
    the timer.

    The ``DisplayTopTask.start()`` and ``TakeSnapshot.start()`` methods
    use the ``start_timer()`` function to run a task regularly.
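
The timer semantics described for ``start_timer()`` can be modelled as
follows. This is a hypothetical sketch, not the module's implementation:
``AllocationDrivenTimer`` and ``on_allocation()`` are illustrative names,
and the injectable *clock* only exists to make the example
deterministic. The deadline is only checked when the (simulated)
allocator hook runs, so *func* fires at the first allocation made at
least *delay* seconds later, using whole seconds.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class AllocationDrivenTimer:
    """Hypothetical model of start_timer(): only checked on memory allocations."""

    def __init__(self, delay: int, func: Callable[..., Any], args: Tuple = (),
                 kwargs: Optional[Dict[str, Any]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.func = func
        self.args = args
        self.kwargs = dict(kwargs or {})
        self.clock = clock
        # Whole seconds only: the timer has a resolution of 1 second.
        self.deadline = int(clock()) + delay

    def on_allocation(self) -> None:
        """Called by the (simulated) allocator hook for each allocation."""
        now = int(self.clock())
        if now >= self.deadline:
            # Reschedule first so that allocations made by func itself do
            # not trigger a nested call (assuming delay >= 1).
            self.deadline = now + self.delay
            self.func(*self.args, **self.kwargs)


# Deterministic usage with a fake clock instead of time.monotonic().
fake_now = [100.0]
calls = []
timer = AllocationDrivenTimer(5, calls.append, args=("tick",), clock=lambda: fake_now[0])
timer.on_allocation()  # t=100: deadline is 105, nothing happens
fake_now[0] = 107.0    # 7 seconds later, but no allocation yet: no call
timer.on_allocation()  # t=107: first allocation after the deadline
print(calls)           # ['tick']
```

Because the check piggybacks on allocations, an idle program never runs
*func*; this is why the timer is described as not real time.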

``set_traceback_limit(limit: int)`` function:

    Set the maximum number of ``Frame`` instances stored in the
    ``traceback`` attribute of a ``Trace`` instance. Clear all traces
    and statistics on Python memory allocations if the ``tracemalloc``
    module is enabled.

    Storing the traceback of each memory allocation has a significant
    memory overhead. Example with the Python test suite: tracing all
    memory allocations increases the memory usage by ``+50%`` when
    storing only 1 frame and ``+150%`` when storing 10 frames. Use
    ``get_tracemalloc_size()`` to measure the overhead and
    ``add_filter()`` to select which memory allocations are traced.

    Use ``get_traceback_limit()`` to get the current limit.

``stop_timer()`` function:

    Stop the timer started by ``start_timer()``.


DisplayTop class
----------------

``DisplayTop()`` class:

    Display the top of allocated memory blocks.

``display_snapshot(snapshot, count=10, group_by="filename_lineno", cumulative=False, file=None)`` method:

    Display a snapshot of memory blocks allocated by Python; *snapshot*
    is a ``Snapshot`` instance.

``display_top_diff(top_diff, count=10, file=None)`` method:

    Display differences between two ``GroupedStats`` instances;
    *top_diff* is a ``StatsDiff`` instance.

``display_top_stats(top_stats, count=10, file=None)`` method:

    Display the top of allocated memory blocks grouped by the
    ``group_by`` attribute of *top_stats*; *top_stats* is a
    ``GroupedStats`` instance.

``color`` attribute:

    If ``True``, always use colors. If ``False``, never use colors. The
    default value is ``None``: use colors if the *file* parameter is a
    TTY device.

``compare_with_previous`` attribute:

    If ``True`` (default value), compare with the previous snapshot. If
    ``False``, compare with the first snapshot.

``filename_parts`` attribute:

    Number of displayed filename parts (int, default: ``3``). Extra
    parts are replaced with ``'...'``.

``show_average`` attribute:

    If ``True`` (default value), display the average size of memory
    blocks.

``show_count`` attribute:

    If ``True`` (default value), display the number of allocated memory
    blocks.

``show_size`` attribute:

    If ``True`` (default value), display the size of memory blocks.


DisplayTopTask class
--------------------

``DisplayTopTask(count=10, group_by="filename_lineno", cumulative=False, file=sys.stdout, user_data_callback=None)`` class:

    Task taking temporary snapshots and displaying the top *count*
    memory allocations grouped by *group_by*.

    Call the ``start()`` method to start the task.

``display()`` method:

    Take a snapshot and display the top *count* biggest allocated memory
    blocks grouped by *group_by* using the ``display_top`` attribute.

    Return the snapshot, a ``Snapshot`` instance.

``start(delay: int)`` method:

    Start a task using the ``start_timer()`` function, calling the
    ``display()`` method every *delay* seconds.

``stop()`` method:

    Stop the task started by the ``start()`` method using the
    ``stop_timer()`` function.

``count`` attribute:

    Maximum number of displayed memory blocks.

``cumulative`` attribute:

    If ``True``, accumulate the size and count of memory blocks over all
    frames of each ``Trace`` instance, not only the most recent frame.
    The default value is ``False``.

    The option is ignored if the traceback limit is ``1``; see the
    ``get_traceback_limit()`` function.

``display_top`` attribute:

    Instance of ``DisplayTop``.

``file`` attribute:

    The top is written into *file*.

``group_by`` attribute:

    Determine how memory allocations are grouped: see
    ``Snapshot.top_by()`` for the available values.

``user_data_callback`` attribute:

    Optional callback collecting user data (callable, default:
    ``None``).

    See ``Snapshot.create()``.


Filter class
------------

``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:

    Filter to select which memory allocations are traced. Filters can be
    used to reduce the memory usage of the ``tracemalloc`` module, which
    can be read using ``get_tracemalloc_size()``.

``match_trace(trace)`` method:

    Return ``True`` if the ``Trace`` instance must be kept according to
    the filter, ``False`` otherwise.

``match(filename: str, lineno: int)`` method:

    Return ``True`` if the filename and line number must be kept
    according to the filter, ``False`` otherwise.

``match_filename(filename: str)`` method:

    Return ``True`` if the filename must be kept according to the
    filter, ``False`` otherwise.

``match_lineno(lineno: int)`` method:

    Return ``True`` if the line number must be kept according to the
    filter, ``False`` otherwise.

``include`` attribute:

    If *include* is ``True``, only trace memory blocks allocated in a
    file with a name matching filename ``pattern`` at line number
    ``lineno``.

    If *include* is ``False``, ignore memory blocks allocated in a file
    with a name matching filename ``pattern`` at line number ``lineno``.

``pattern`` attribute:

    The filename *pattern* can contain one or more ``*`` wildcard
    characters which match any substring, including an empty string. The
    ``.pyc`` and ``.pyo`` suffixes are replaced with ``.py``. On
    Windows, the comparison is case insensitive and the alternative
    separator ``/`` is replaced with the standard separator ``\``.

``lineno`` attribute:

    Line number (``int``). If it is ``None`` or less than ``1``, it
    matches any line number.

``traceback`` attribute:

    If *traceback* is ``True``, all frames of the ``traceback``
    attribute of ``Trace`` instances are checked. If *traceback* is
    ``False``, only the most recent frame is checked.

    This attribute only has an effect on the ``match_trace()`` method
    and only if the traceback limit is greater than ``1``. See the
    ``get_traceback_limit()`` function.


Frame class
-----------

``Frame`` class:

    Trace of a Python frame, used by the ``Trace.traceback`` attribute.

``filename`` attribute:

    Python filename, ``None`` if unknown.

``lineno`` attribute:

    Python line number, ``None`` if unknown.
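
The matching rules of the ``Filter`` class can be sketched as follows.
``SketchFilter``, ``filename_matches()`` and the *windows* flag are
hypothetical names. The sketch assumes that the ``.pyc``/``.pyo``
normalization applies to both the pattern and the traced filename, only
checks a single (filename, line number) pair, and ignores the
*traceback* attribute.

```python
import re
from dataclasses import dataclass
from typing import Optional


def _normalize(path: str, windows: bool) -> str:
    # Assumption: ".pyc"/".pyo" become ".py" on both sides of the comparison.
    if path.endswith((".pyc", ".pyo")):
        path = path[:-1]
    if windows:
        # Windows: "/" is treated as "\" and the comparison ignores case.
        path = path.replace("/", "\\").lower()
    return path


def filename_matches(pattern: str, filename: str, windows: bool = False) -> bool:
    pattern = _normalize(pattern, windows)
    filename = _normalize(filename, windows)
    # Each "*" matches any substring (including an empty one); every other
    # character must match literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, filename, re.DOTALL) is not None


@dataclass
class SketchFilter:
    include: bool
    pattern: str
    lineno: Optional[int] = None

    def _lineno_matches(self, lineno: int) -> bool:
        # None or a value less than 1 matches any line number.
        return self.lineno is None or self.lineno < 1 or lineno == self.lineno

    def match(self, filename: str, lineno: int, windows: bool = False) -> bool:
        """Return True if the allocation must be kept according to this filter."""
        hit = filename_matches(self.pattern, filename, windows) and self._lineno_matches(lineno)
        return hit if self.include else not hit


only_tracemalloc = SketchFilter(include=True, pattern="*/tracemalloc.py")
print(only_tracemalloc.match("/usr/lib/python3.4/tracemalloc.pyc", 42))  # True
```

Translating the pattern to an anchored regular expression keeps ``*`` as
the only special character, unlike ``fnmatch``, which also interprets
``?`` and ``[...]``.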

GroupedStats class
------------------

``GroupedStats(stats: dict, group_by: str, cumulative=False, timestamp=None, process_memory=None, tracemalloc_size=None)`` class:

    Top of allocated memory blocks grouped by *group_by* as a
    dictionary.

    The ``Snapshot.top_by()`` method creates a ``GroupedStats``
    instance.

``compare_to(old_stats: GroupedStats=None)`` method:

    Compare to an older ``GroupedStats`` instance. Return a
    ``StatsDiff`` instance.

``cumulative`` attribute:

    If ``True``, accumulate the size and count of memory blocks over all
    frames of ``Trace``, not only the most recent frame.

``group_by`` attribute:

    Determine how memory allocations were grouped. The type of ``stats``
    keys depends on *group_by*:

    ===================== ======================== ==============
    group_by              description              key type
    ===================== ======================== ==============
    ``'filename'``        filename                 ``str``
    ``'filename_lineno'`` filename and line number ``(str, int)``
    ``'address'``         memory block address     ``int``
    ===================== ======================== ==============

    See the *group_by* parameter of the ``Snapshot.top_by()`` method.

``stats`` attribute:

    Dictionary ``{key: stats}`` where the *key* type depends on the
    ``group_by`` attribute and *stats* is a ``TraceStats`` instance.

``process_memory`` attribute:

    Result of the ``get_process_memory()`` function, can be ``None``.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.

``tracemalloc_size`` attribute:

    The memory usage in bytes of the ``tracemalloc`` module, result of
    the ``get_tracemalloc_size()`` function.


Snapshot class
--------------

``Snapshot`` class:

    Snapshot of memory blocks allocated by Python.

    Use ``TakeSnapshot`` to take snapshots regularly.

``apply_filters(filters)`` method:

    Apply a list of filters on the ``traces`` and ``stats``
    dictionaries; *filters* is a list of ``Filter`` instances.

``create(*, with_traces=False, with_stats=True, user_data_callback=None)`` classmethod:

    Take a snapshot of traces and/or statistics of allocated memory
    blocks.

    If *with_traces* is ``True``, ``get_traces()`` is called and its
    result is stored in the ``traces`` attribute. This attribute
    contains more information than ``stats`` and uses more memory and
    more disk space. If *with_traces* is ``False``, ``traces`` is set to
    ``None``.

    If *with_stats* is ``True``, ``get_stats()`` is called and its
    result is stored in the ``Snapshot.stats`` attribute. If
    *with_stats* is ``False``, ``Snapshot.stats`` is set to ``None``.

    *with_traces* and *with_stats* cannot both be ``False``.

    *user_data_callback* is an optional callable object. Its result
    should be serializable by the ``pickle`` module, otherwise
    ``Snapshot.write()`` would fail. If *user_data_callback* is set, it
    is called and the result is stored in the ``Snapshot.user_data``
    attribute. Otherwise, ``Snapshot.user_data`` is set to ``None``.

    The ``tracemalloc`` module must be enabled to take a snapshot. See
    the ``enable()`` function.

``load(filename)`` classmethod:

    Load a snapshot from a file.

``top_by(group_by: str, cumulative: bool=False)`` method:

    Compute top statistics grouped by *group_by* as a ``GroupedStats``
    instance:

    ===================== ======================== ==============
    group_by              description              key type
    ===================== ======================== ==============
    ``'filename'``        filename                 ``str``
    ``'filename_lineno'`` filename and line number ``(str, int)``
    ``'address'``         memory block address     ``int``
    ===================== ======================== ==============

    If *cumulative* is ``True``, accumulate the size and count of memory
    blocks over all frames of each ``Trace`` instance, not only the most
    recent frame. The *cumulative* parameter is ignored if *group_by* is
    ``'address'`` or if the traceback limit is ``1``. See the
    ``traceback_limit`` attribute.

``write(filename)`` method:

    Write the snapshot into a file.
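
To make ``top_by()`` and the comparison of two ``GroupedStats``
instances more concrete, the following hypothetical sketch groups two
dictionaries shaped like the result of ``get_stats()`` and produces
``(size_diff, size, count_diff, count, key)`` tuples ordered as
described for ``StatsDiff.sort()``. ``group_stats()`` and
``sorted_differences()`` are illustrative names; only the
``'filename'`` and ``'filename_lineno'`` groupings are modelled
(``'address'`` needs traces), keys present on only one side are assumed
to count as zero on the other, and filenames are assumed not to be
``None`` so that keys can be compared when sorting.

```python
from collections import namedtuple

# Stand-in for the PEP's TraceStats: total size in bytes and block count.
TraceStats = namedtuple("TraceStats", "size count")


def group_stats(stats, group_by):
    """Group {filename: {lineno: TraceStats}} by 'filename' or 'filename_lineno'."""
    grouped = {}
    for filename, lines in stats.items():
        for lineno, st in lines.items():
            if group_by == "filename":
                key = filename
            elif group_by == "filename_lineno":
                key = (filename, lineno)
            else:
                raise ValueError("unsupported group_by: %r" % (group_by,))
            size, count = grouped.get(key, (0, 0))
            grouped[key] = TraceStats(size + st.size, count + st.count)
    return grouped


def sorted_differences(old, new):
    """List of (size_diff, size, count_diff, count, key), biggest first."""
    empty = TraceStats(0, 0)
    differences = []
    for key in set(old) | set(new):
        before = old.get(key, empty)
        after = new.get(key, empty)
        differences.append((after.size - before.size, after.size,
                            after.count - before.count, after.count, key))
    # Tuples compare field by field: size_diff, size, count_diff, count, key.
    differences.sort(reverse=True)
    return differences


old = {"a.py": {1: TraceStats(100, 2)}, "b.py": {5: TraceStats(50, 1)}}
new = {"a.py": {1: TraceStats(300, 5), 2: TraceStats(40, 1)},
       "b.py": {5: TraceStats(50, 1)}}
for diff in sorted_differences(group_stats(old, "filename"), group_stats(new, "filename")):
    print(diff)
# (240, 340, 4, 6, 'a.py')
# (0, 50, 0, 1, 'b.py')
```

Grouping by ``'filename'`` sums all lines of a file, which hides where
inside ``a.py`` the growth happens; grouping by ``'filename_lineno'``
separates line 1 (+200 bytes) from line 2 (+40 bytes).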

``pid`` attribute:

    Identifier of the process which created the snapshot, result of
    ``os.getpid()``.

``process_memory`` attribute:

    Memory usage of the current process, result of the
    ``get_process_memory()`` function. It can be ``None``.

``stats`` attribute:

    Statistics on traced Python memory, result of the ``get_stats()``
    function, if ``create()`` was called with *with_stats* equal to
    ``True``, ``None`` otherwise.

``tracemalloc_size`` attribute:

    The memory usage in bytes of the ``tracemalloc`` module, result of
    the ``get_tracemalloc_size()`` function.

``traceback_limit`` attribute:

    The maximum number of frames stored in the ``traceback`` attribute
    of a ``Trace``, result of the ``get_traceback_limit()`` function.

``traces`` attribute:

    Traces of Python memory allocations, result of the ``get_traces()``
    function, if ``create()`` was called with *with_traces* equal to
    ``True``, ``None`` otherwise.

    The ``traceback`` attribute of each ``Trace`` instance is limited to
    ``traceback_limit`` frames.

``timestamp`` attribute:

    Creation date and time of the snapshot, ``datetime.datetime``
    instance.

``user_data`` attribute:

    Result of *user_data_callback* called in ``Snapshot.create()``
    (default: ``None``).


StatsDiff class
---------------

``StatsDiff(differences, old_stats, new_stats)`` class:

    Differences between two ``GroupedStats`` instances.

    By default, the ``differences`` list is unsorted: call ``sort()`` to
    sort it.

    The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
    instance.

``sort()`` method:

    Sort the ``differences`` list from the biggest allocation to the
    smallest. Sort by *size_diff*, *size*, *count_diff*, *count* and
    then by *key*.

``differences`` attribute:

    Differences between ``old_stats`` and ``new_stats`` as a list of
    ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
    *size*, *count_diff* and *count* are ``int``.

    The key type depends on the ``group_by`` attribute of
    ``new_stats``:

    ===================== ======================== ==============
    group_by              description              key type
    ===================== ======================== ==============
    ``'filename'``        filename                 ``str``
    ``'filename_lineno'`` filename and line number ``(str, int)``
    ``'address'``         memory block address     ``int``
    ===================== ======================== ==============

    See the ``group_by`` attribute of the ``GroupedStats`` class.

``old_stats`` attribute:

    Old ``GroupedStats`` instance, can be ``None``.

``new_stats`` attribute:

    New ``GroupedStats`` instance.


Trace class
-----------

``Trace`` class:

    Debug information of a memory block allocated by Python.

``size`` attribute:

    Size in bytes of the memory block.

``traceback`` attribute:

    Traceback where the memory block was allocated as a list of
    ``Frame`` instances, most recent first.

    The list can be empty or incomplete if the ``tracemalloc`` module
    was unable to retrieve the full traceback.

    The traceback is limited to ``get_traceback_limit()`` frames. Use
    ``set_traceback_limit()`` to store more frames.


TraceStats class
----------------

``TraceStats`` class:

    Statistics on Python memory allocations.

``size`` attribute:

    Total size in bytes of allocated memory blocks.

``count`` attribute:

    Number of allocated memory blocks.


Links
=====

tracemalloc:

* `#18874: Add a new tracemalloc module to trace Python memory allocations `_
* `pytracemalloc on PyPI `_

Similar projects:

* `Meliae: Python Memory Usage Analyzer `_
* `Guppy-PE: umbrella package combining Heapy and GSL `_
* `PySizer `_: developed for Python 2.4
* `memory_profiler `_
* `pympler `_
* `Dozer `_: WSGI Middleware version of the CherryPy memory leak debugger
* `objgraph `_
* `caulk `_


Copyright
=========

This document has been placed into the public domain.