diff --git a/pep-0454.txt b/pep-0454.txt index 00a8b1daf..a0e0e92a2 100644 --- a/pep-0454.txt +++ b/pep-0454.txt @@ -13,43 +13,44 @@ Python-Version: 3.4 Abstract ======== -Add a new ``tracemalloc`` module to trace memory blocks allocated by Python. - +This PEP proposes to add a new ``tracemalloc`` module to trace memory +blocks allocated by Python. Rationale ========= -Common debug tools tracing memory allocations read the C filename and -line number. Using such tool to analyze Python memory allocations does -not help because most memory block are allocated in the same C function, -in ``PyMem_Malloc()`` for example. +Common debug tools tracing memory allocations record the C filename +and line number where the allocation occurs. Using such a tool to +analyze Python memory allocations does not help because most memory +blocks are allocated in the same C function, in ``PyMem_Malloc()`` for +example. There are debug tools dedicated to the Python language like ``Heapy`` -and ``PySizer``. These tools analyze objects type and/or content. They -are useful when most memory leaks are instances of the same type and -this type is only instantiated in a few functions. The problem is when -the object type is very common like ``str`` or ``tuple``, and it is hard -to identify where these objects are instantiated. +and ``PySizer``. These tools analyze objects type and/or content. +They are useful when most memory leaks are instances of the same type +and this type is only instantiated in a few functions. Problems arise +when the object type is very common like ``str`` or ``tuple``, and it +is hard to identify where these objects are instantiated. -Finding reference cycles is also a difficult problem. There are -different tools to draw a diagram of all references. These tools cannot -be used on large applications with thousands of objects because the -diagram is too huge to be analyzed manually. +Finding reference cycles is also a difficult problem. 
There are +different tools to draw a diagram of all references. These tools +cannot be used on large applications with thousands of objects because +the diagram is too huge to be analyzed manually. Proposal ======== -Using the PEP 445, it becomes easy to setup an hook on Python memory -allocators. A hook can inspect Python internals to retrieve the Python -tracebacks. +Using the customized allocation API from PEP 445, it becomes easy to +set up a hook on Python memory allocators. A hook can inspect Python +internals to retrieve Python tracebacks. -This PEP proposes to add a new ``tracemalloc`` module. It is a debug +This PEP proposes to add a new ``tracemalloc`` module, as a debug tool to trace memory blocks allocated by Python. The module provides the following information: -* Compute the differences between two snapshots to detect memory leaks +* Computed differences between two snapshots to detect memory leaks * Statistics on allocated memory blocks per filename and per line number: total size, number and average size of allocated memory blocks * Traceback where a memory block was allocated @@ -57,13 +58,13 @@ following information: The API of the tracemalloc module is similar to the API of the faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()`` functions, an environment variable (``PYTHONFAULTHANDLER`` and -``PYTHONTRACEMALLOC``), a ``-X`` command line option (``-X +``PYTHONTRACEMALLOC``), and a ``-X`` command line option (``-X faulthandler`` and ``-X tracemalloc``). See the `documentation of the faulthandler module -`_. +`_. The tracemalloc module has been written for CPython. Other -implementations of Python may not provide it. +implementations of Python may not be able to provide it. API @@ -72,15 +73,16 @@ API To trace most memory blocks allocated by Python, the module should be enabled as early as possible by setting the ``PYTHONTRACEMALLOC`` environment variable to ``1``, or by using ``-X tracemalloc`` command -line option. 
The ``tracemalloc.enable()`` function can also be called to -start tracing Python memory allocations. +line option. The ``tracemalloc.enable()`` function can also be called +to start tracing Python memory allocations. -By default, a trace of an allocated memory block only stores one frame. -Use the ``set_traceback_limit()`` function to store more frames. +By default, a trace of an allocated memory block only stores one +frame. Use the ``set_traceback_limit()`` function to store additional +frames. -Python memory blocks allocated in the ``tracemalloc`` module are also -traced by default. Use ``add_exclude_filter(tracemalloc.__file__)`` to -ignore these these memory allocations. +Python memory blocks allocated in the ``tracemalloc`` module itself are +also traced by default. Use ``add_exclude_filter(tracemalloc.__file__)`` +to ignore these memory allocations. At fork, the module is automatically disabled in the child process. @@ -98,7 +100,8 @@ Main Functions ``clear_traces()`` function: Clear all traces and statistics on Python memory allocations, and - reset the ``get_arena_size()`` and ``get_traced_memory()`` counters. + reset the ``get_arena_size()`` and ``get_traced_memory()`` + counters. ``disable()`` function: @@ -148,31 +151,35 @@ Trace Functions ``get_traceback_limit()`` function: - Get the maximum number of frames stored in the traceback of a trace - of a memory block. + Get the maximum number of frames stored in the traceback of a + trace of a memory block. Use the ``set_traceback_limit()`` function to change the limit. ``get_object_address(obj)`` function: - Get the address of the memory block of the specified Python object. + Get the address of the memory block of the specified Python + object. A Python object can be composed by multiple memory blocks, the function only returns the address of the main memory block. - See also ``get_object_trace()`` and ``gc.get_referrers()`` functions. 
+ See also ``get_object_trace()`` and ``gc.get_referrers()`` + functions. ``get_object_trace(obj)`` function: Get the trace of a Python object *obj* as a ``(size: int, - traceback)`` tuple where *traceback* is a tuple of ``(filename: str, - lineno: int)`` tuples, *filename* and *lineno* can be ``None``. + traceback)`` tuple where *traceback* is a tuple of ``(filename: + str, lineno: int)`` tuples, *filename* and *lineno* can be + ``None``. - The function only returns the trace of the main memory block of the - object. The *size* of the trace is smaller than the total size of - the object if the object is composed by more than one memory block. + The function only returns the trace of the main memory block of + the object. The *size* of the trace is smaller than the total + size of the object if the object is composed by more than one + memory block. Return ``None`` if the ``tracemalloc`` module did not trace the allocation of the object. @@ -199,21 +206,21 @@ Trace Functions Get all traces of Python memory allocations as a dictionary ``{address (int): trace}`` where *trace* is a ``(size: int, - traceback)`` and *traceback* is a list of ``(filename: str, lineno: - int)``. *traceback* can be empty, *filename* and *lineno* can be - None. + traceback)`` and *traceback* is a list of ``(filename: str, + lineno: int)``. *traceback* can be empty, *filename* and *lineno* + can be ``None``. Return an empty dictionary if the ``tracemalloc`` module is disabled. - See also ``get_object_trace()``, ``get_stats()`` and ``get_trace()`` - functions. + See also ``get_object_trace()``, ``get_stats()`` and + ``get_trace()`` functions. ``set_traceback_limit(nframe: int)`` function: - Set the maximum number of frames stored in the traceback of a trace - of a memory block. + Set the maximum number of frames stored in the traceback of a + trace of a memory block. Storing the traceback of each memory allocation has an important overhead on the memory usage. 
Use the ``get_tracemalloc_memory()`` @@ -237,24 +244,25 @@ Filter Functions trace. The new filter is not applied on already collected traces. Use the - ``clear_traces()`` function to ensure that all traces match the new - filter. + ``clear_traces()`` function to ensure that all traces match the + new filter. ``add_include_filter(filename: str, lineno: int=None, traceback: bool=False)`` function: Add an inclusive filter: helper for the ``add_filter()`` method - creating a ``Filter`` instance with the ``Filter.include`` attribute - set to ``True``. + creating a ``Filter`` instance with the ``Filter.include`` + attribute set to ``True``. Example: ``tracemalloc.add_include_filter(tracemalloc.__file__)`` - only includes memory blocks allocated by the ``tracemalloc`` module. + only includes memory blocks allocated by the ``tracemalloc`` + module. ``add_exclude_filter(filename: str, lineno: int=None, traceback: bool=False)`` function: Add an exclusive filter: helper for the ``add_filter()`` method - creating a ``Filter`` instance with the ``Filter.include`` attribute - set to ``False``. + creating a ``Filter`` instance with the ``Filter.include`` + attribute set to ``False``. Example: ``tracemalloc.add_exclude_filter(tracemalloc.__file__)`` ignores memory blocks allocated by the ``tracemalloc`` module. @@ -343,15 +351,18 @@ the ``Snapshot.add_metric()`` method. | ``arena_alignment`` | Number of bytes for arena alignment padding | +---------------------+-------------------------------------------------------+ - The function is not available if Python is compiled without ``pymalloc``. + The function is not available if Python is compiled without + ``pymalloc``. - See also ``get_arena_size()`` and ``sys._debugmallocstats()`` functions. + See also the ``get_arena_size()`` and ``sys._debugmallocstats()`` + functions. 
``get_traced_memory()`` function: - Get the current size and maximum size of memory blocks traced by the - ``tracemalloc`` module as a tuple: ``(size: int, max_size: int)``. + Get the current size and maximum size of memory blocks traced by + the ``tracemalloc`` module as a tuple: ``(size: int, max_size: + int)``. ``get_tracemalloc_memory()`` function: @@ -378,12 +389,12 @@ DisplayTop ``DisplayTop()`` class: - Display the top of allocated memory blocks. + Display the "top-n" biggest allocated memory blocks. ``display(count=10, group_by="line", cumulative=False, file=None, callback=None)`` method: - Take a snapshot and display the top *count* biggest allocated memory - blocks grouped by *group_by*. + Take a snapshot and display the top *count* biggest allocated + memory blocks grouped by *group_by*. *callback* is an optional callable object which can be used to add metrics to a snapshot. It is called with only one parameter: the @@ -405,8 +416,8 @@ DisplayTop ``display_top_stats(top_stats, count=10, file=None)`` method: Display the top of allocated memory blocks grouped by the - ``GroupedStats.group_by`` attribute of *top_stats*, *top_stats* is a - ``GroupedStats`` instance. + ``GroupedStats.group_by`` attribute of *top_stats*, *top_stats* is + a ``GroupedStats`` instance. ``average`` attribute: @@ -851,8 +862,8 @@ StatsDiff Differences between ``old_stats`` and ``new_stats`` as a list of ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*, *size*, *count_diff* and *count* are ``int``. The key type depends - on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the - ``Snapshot.top_by()`` method. + on the ``GroupedStats.group_by`` attribute of ``new_stats``: see + the ``Snapshot.top_by()`` method. ``old_stats`` attribute: @@ -869,8 +880,8 @@ Task ``Task(func, *args, **kw)`` class: Task calling ``func(*args, **kw)``. 
When scheduled, the task is - called when the traced memory is increased or decreased by more than - *threshold* bytes, or after *delay* seconds. + called when the traced memory is increased or decreased by more + than *threshold* bytes, or after *delay* seconds. ``call()`` method: @@ -892,10 +903,10 @@ Task ``get_memory_threshold()`` method: - Get the threshold of the traced memory. When scheduled, the task is - called when the traced memory is increased or decreased by more than - *threshold* bytes. The memory threshold is disabled if *threshold* - is ``None``. + Get the threshold of the traced memory. When scheduled, the task + is called when the traced memory is increased or decreased by more + than *threshold* bytes. The memory threshold is disabled if + *threshold* is ``None``. See also the ``set_memory_threshold()`` method and the ``get_traced_memory()`` function. @@ -903,20 +914,20 @@ Task ``schedule(repeat: int=None)`` method: - Schedule the task *repeat* times. If *repeat* is ``None``, the task - is rescheduled after each call until it is cancelled. + Schedule the task *repeat* times. If *repeat* is ``None``, the + task is rescheduled after each call until it is cancelled. - If the method is called twice, the task is rescheduled with the new - *repeat* parameter. + If the method is called twice, the task is rescheduled with the + new *repeat* parameter. The task must have a memory threshold or a delay: see ``set_delay()`` and ``set_memory_threshold()`` methods. The ``tracemalloc`` must be enabled to schedule a task: see the ``enable`` function. - The task is cancelled if the ``call()`` method raises an exception. - The task can be cancelled using the ``cancel()`` method or the - ``cancel_tasks()`` function. + The task is cancelled if the ``call()`` method raises an + exception. The task can be cancelled using the ``cancel()`` + method or the ``cancel_tasks()`` function. 
``set_delay(seconds: int)`` method: @@ -925,18 +936,19 @@ Task delay to ``None`` to disable the timer. The timer is based on the Python memory allocator, it is not real - time. The task is called after at least *delay* seconds, it is not - called exactly after *delay* seconds if no Python memory allocation - occurred. The timer has a resolution of 1 second. + time. The task is called after at least *delay* seconds, it is + not called exactly after *delay* seconds if no Python memory + allocation occurred. The timer has a resolution of 1 second. The task is rescheduled if it was scheduled. ``set_memory_threshold(size: int)`` method: - Set the threshold of the traced memory. When scheduled, the task is - called when the traced memory is increased or decreased by more than - *threshold* bytes. Set the threshold to ``None`` to disable it. + Set the threshold of the traced memory. When scheduled, the task + is called when the traced memory is increased or decreased by more + than *threshold* bytes. Set the threshold to ``None`` to disable + it. The task is rescheduled if it was scheduled. @@ -976,8 +988,8 @@ TakeSnapshotTask ``take_snapshot()`` method: Take a snapshot and write it into a file. Return ``(snapshot, - filename)`` where *snapshot* is a ``Snapshot`` instance and filename - type is ``str``. + filename)`` where *snapshot* is a ``Snapshot`` instance and + filename type is ``str``. ``callback`` attribute: @@ -993,8 +1005,8 @@ TakeSnapshotTask * ``$pid``: identifier of the current process * ``$timestamp``: current date and time - * ``$counter``: counter starting at 1 and incremented at each snapshot, - formatted as 4 decimal digits + * ``$counter``: counter starting at 1 and incremented at each + snapshot, formatted as 4 decimal digits The default template is ``'tracemalloc-$counter.pickle'``. @@ -1034,5 +1046,15 @@ Similar projects: Copyright ========= -This document has been placed into the public domain. +This document has been placed in the public domain. 
+ + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End:
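The snapshot-comparison workflow described in this revision of the PEP (take two snapshots, compute the differences, read the per-line statistics) can be tried with a short script. The sketch below is only an illustration and rests on one assumption that should be stated loudly: it uses the ``tracemalloc`` API as it was eventually released in Python 3.4 (``start()``, ``stop()``, ``take_snapshot()``, ``Snapshot.compare_to()``), not the draft names that appear in this diff (``enable()``, ``disable()``, ``DisplayTop``, ``Task``). The ``build_payload`` helper is hypothetical and exists only to allocate some memory.

```python
import tracemalloc


def build_payload(n):
    # Hypothetical workload: allocate n distinct strings so that the
    # allocation site shows up clearly in the statistics.
    return [("item-%d" % i) * 10 for i in range(n)]


# Released API (assumption: Python 3.4+), not the draft enable().
# The argument plays the role of the draft's set_traceback_limit():
# store up to 5 frames per trace instead of the default single frame.
tracemalloc.start(5)

before = tracemalloc.take_snapshot()
payload = build_payload(10000)
after = tracemalloc.take_snapshot()

# Current and peak size of traced memory blocks, in bytes.
current, peak = tracemalloc.get_traced_memory()

# Differences between the two snapshots, grouped per filename and
# line number, largest absolute size change first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)

limit = tracemalloc.get_traceback_limit()
tracemalloc.stop()
```

Each printed entry names a file and line together with the size and count differences, which corresponds to the "statistics per filename and per line number" item in the Proposal section. Grouping by ``"traceback"`` instead of ``"lineno"`` would use the additional stored frames.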