PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4
Abstract
========
Add a new ``tracemalloc`` module to trace Python memory allocations.
Rationale
=========
Common debug tools tracing memory allocations record the C filename and
line number. Using such tools to analyze Python memory allocations does
not help because most memory blocks are allocated in the same C
function, for example in ``PyMem_Malloc()``.
There are debug tools dedicated to the Python language like ``Heapy``
and ``PySizer``. These projects analyze the type and/or content of
objects. These tools are useful when most memory leaks are instances of
the same type and this type is only instantiated in a few functions.
They are less helpful when the object type is very common, like ``str``
or ``tuple``, because it is hard to identify where these objects are
instantiated.
Finding reference cycles is also a difficult problem. There are
different tools to draw a diagram of all references. These tools cannot
be used on large applications with thousands of objects because the
diagram is too large to be analyzed manually.
Proposal
========
Using PEP 445, it becomes easy to set up a hook on Python memory
allocators. The hook can inspect the current Python frame to get the
Python filename and line number.
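The following pure-Python sketch only illustrates the idea of attributing an allocation to the calling Python frame. The real hook would be written in C on top of the PEP 445 allocator API; ``record_allocation()``, ``make_buffer()`` and the ``stats`` dictionary are hypothetical names, and ``sys._getframe()`` is a CPython implementation detail.

```python
import sys
from collections import defaultdict

# Hypothetical statistics: (filename, lineno) -> [total size in bytes, count]
stats = defaultdict(lambda: [0, 0])

def record_allocation(size):
    """Attribute an allocation of *size* bytes to the caller's frame.

    Stand-in for the C hook: it looks one frame up the stack to get the
    Python filename and line number of the code that "allocated" the block.
    """
    frame = sys._getframe(1)
    key = (frame.f_code.co_filename, frame.f_lineno)
    entry = stats[key]
    entry[0] += size
    entry[1] += 1

def make_buffer():
    record_allocation(64)  # both calls below are attributed to this line
    return bytearray(64)

make_buffer()
make_buffer()
for (filename, lineno), (size, count) in stats.items():
    print(f"{filename}:{lineno}: size={size} B, count={count}")
```

Keying the statistics by ``(filename, lineno)`` is what makes the output useful for Python code, in contrast to C-level tools that would report ``PyMem_Malloc()`` for every block.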
This PEP proposes to add a new ``tracemalloc`` module. It is a debug
tool to trace memory allocations made by Python. The module provides the
following information:
* Statistics on Python memory allocations per Python filename and line
number: size, number, and average size of allocations
* Differences between two snapshots of Python memory allocations
* Location of a Python memory allocation: size in bytes, Python filename
and line number
The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option
(``-X faulthandler`` and ``-X tracemalloc``). See the
`documentation of the faulthandler module
<http://docs.python.org/dev/library/faulthandler.html>`_.
The tracemalloc module has been written for CPython. Other
implementations of Python may not provide it.
API
===
To trace as many Python memory allocations as possible, the module
should be enabled as early as possible in your application: by calling
the ``tracemalloc.enable()`` function, by setting the
``PYTHONTRACEMALLOC`` environment variable to ``1``, or by using the
``-X tracemalloc`` command line option.
By default, tracemalloc only stores one ``frame`` instance per memory
allocation. Use ``tracemalloc.set_number_frame()`` to store more frames.
Functions
---------
``add_filter(include: bool, filename: str, lineno: int=None)`` function:
Add a filter. If *include* is ``True``, only trace memory blocks
allocated in a file with a name matching *filename*. If
*include* is ``False``, don't trace memory blocks allocated in a
file with a name matching *filename*.
The match is done using *filename* as a prefix. For example,
``'/usr/bin/'`` only matches files in the ``/usr/bin`` directory. The
``.pyc`` and ``.pyo`` suffixes are automatically replaced with
``.py`` when matching the filename.
*lineno* is a line number. If *lineno* is ``None`` or less than
``1``, it matches any line number.
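A minimal sketch of the matching rules described above. ``filter_matches()`` and ``should_trace()`` are hypothetical helpers, not part of the proposed API, and the way several include and exclude filters combine is an assumption: here a location is traced if it matches at least one include filter (when there are any) and no exclude filter.

```python
def _normalize(filename):
    # .pyc and .pyo suffixes are matched as if they were .py
    if filename.endswith((".pyc", ".pyo")):
        return filename[:-1]
    return filename

def filter_matches(prefix, lineno_filter, filename, lineno):
    """Return True if (filename, lineno) matches a single filter."""
    # The filter filename is used as a prefix of the traced filename
    if not _normalize(filename).startswith(prefix):
        return False
    # None or a line number lower than 1 matches any line
    if lineno_filter is None or lineno_filter < 1:
        return True
    return lineno == lineno_filter

def should_trace(filters, filename, lineno):
    """Apply (include, filename, lineno) filters (assumed combination rule)."""
    includes = [f for f in filters if f[0]]
    excludes = [f for f in filters if not f[0]]
    if includes and not any(filter_matches(f[1], f[2], filename, lineno)
                            for f in includes):
        return False
    return not any(filter_matches(f[1], f[2], filename, lineno)
                   for f in excludes)

filters = [(True, "/usr/lib/python3.4/", None),
           (False, "/usr/lib/python3.4/tracemalloc.py", None)]
print(should_trace(filters, "/usr/lib/python3.4/os.pyc", 12))          # True
print(should_trace(filters, "/usr/lib/python3.4/tracemalloc.pyc", 3))  # False
print(should_trace(filters, "/home/user/app.py", 7))                   # False
```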
``clear_filters()`` function:
Reset the filter list.
``clear_traces()`` function:
Clear all traces and statistics of memory allocations.
``disable()`` function:
Stop tracing Python memory allocations and stop the timer started by
``start_timer()``.
``enable()`` function:
Start tracing Python memory allocations.
``get_filters()`` function:
Get the filters as a list of
``(include: bool, filename: str, lineno: int)`` tuples.
If *lineno* is ``None``, a filter matches any line number.
By default, the filename of the Python tracemalloc module
(``tracemalloc.py``) is excluded.
``get_number_frame()`` function:
Get the maximum number of frames stored in a trace of a memory
allocation.
``get_object_address(obj)`` function:
Get the address of the memory block of the specified Python object.
``get_object_trace(obj)`` function:
Get the trace of a Python object *obj* as a ``trace`` instance.
Return ``None`` if the tracemalloc module did not save the location
when the object was allocated, for example if the module was
disabled.
``get_process_memory()`` function:
Get the memory usage of the current process as a meminfo namedtuple
with two attributes:
* ``rss``: Resident Set Size in bytes
* ``vms``: size of the virtual memory in bytes
Return ``None`` if the platform is not supported.
The ``psutil`` module is used if it is available.
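One possible shape for ``get_process_memory()``, sketched under assumptions: psutil's ``Process.memory_info()`` (available in psutil 2.0 and later) is preferred, and the fallback that reads ``/proc/self/statm`` (Linux only) is a hypothetical choice, not something this PEP specifies. Any other platform returns ``None``.

```python
import os
from collections import namedtuple

meminfo = namedtuple("meminfo", "rss vms")

def get_process_memory():
    """Return meminfo(rss, vms) in bytes, or None if unsupported (sketch)."""
    try:
        import psutil
    except ImportError:
        psutil = None
    if psutil is not None:
        info = psutil.Process(os.getpid()).memory_info()
        return meminfo(info.rss, info.vms)
    # Assumed fallback: Linux exposes sizes in pages in /proc/self/statm,
    # the first two fields being the total program size and the resident set
    try:
        with open("/proc/self/statm") as fp:
            vms_pages, rss_pages = (int(field) for field in fp.read().split()[:2])
    except OSError:
        return None
    page_size = os.sysconf("SC_PAGE_SIZE")
    return meminfo(rss_pages * page_size, vms_pages * page_size)

print(get_process_memory())
```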
``get_stats()`` function:
Get statistics on Python memory allocations per Python filename and
per Python line number.
Return a dictionary
``{filename: str -> {line_number: int -> stats: line_stat}}``
where *stats* is a ``line_stat`` instance. *filename* and
*line_number* can be ``None``.
Return an empty dictionary if the tracemalloc module is disabled.
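The nested dictionary returned by ``get_stats()`` can be flattened to find the lines that allocated the most memory, which is essentially what a "top" display needs. The sketch below uses a stand-in ``line_stat`` namedtuple with the ``size`` and ``count`` attributes of the ``line_stat`` class; the data and the ``top_lines()`` helper are made up for illustration.

```python
from collections import namedtuple

# Stand-in for the line_stat class: total size in bytes and number of blocks
line_stat = namedtuple("line_stat", "size count")

def top_lines(stats, count=10):
    """Return the *count* (filename, lineno, line_stat) with the largest size."""
    entries = [(filename, lineno, stat)
               for filename, lines in stats.items()
               for lineno, stat in lines.items()]
    # Sort on the size only: filename and lineno may be None
    entries.sort(key=lambda entry: entry[2].size, reverse=True)
    return entries[:count]

# Made-up example of the {filename: {line_number: line_stat}} structure
stats = {
    "app.py": {10: line_stat(size=4096, count=8), 25: line_stat(size=512, count=4)},
    "util.py": {3: line_stat(size=2048, count=1)},
    None: {None: line_stat(size=100, count=2)},  # unknown location
}

for filename, lineno, stat in top_lines(stats, count=2):
    average = stat.size / stat.count
    print(f"{filename}:{lineno}: size={stat.size} B, count={stat.count}, "
          f"average={average:.0f} B")
```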
``get_tracemalloc_size()`` function:
Get the memory usage in bytes of the ``tracemalloc`` module.
``get_traces()`` function:
Get the traces of all Python memory allocations.
Return a dictionary ``{pointer: int -> trace}`` where *trace*
is a ``trace`` instance.
Return an empty dictionary if the ``tracemalloc`` module is disabled.
``is_enabled()`` function:
Get the status of the module: ``True`` if it is enabled, ``False``
otherwise.
``set_number_frame(nframe: int)`` function:
Set the maximum number of frames stored in a trace of a memory
allocation.
All traces and statistics of memory allocations are cleared.
``start_timer(delay: int, func: callable, args: tuple=(), kwargs: dict={})`` function:
Start a timer calling ``func(*args, **kwargs)`` every *delay*
seconds.
The timer is driven by the Python memory allocator, so it is not
real time. *func* is called after at least *delay* seconds, but not
exactly after *delay* seconds if no Python memory allocation
occurred in the meantime.
If ``start_timer()`` is called twice, the previous parameters are
replaced. The timer has a resolution of 1 second.
``start_timer()`` is used by ``DisplayTop`` and ``TakeSnapshot`` to
run a task regularly.
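The sketch below shows why a timer driven by memory allocations is not real time: the elapsed time is only checked when an allocation happens. ``AllocationTimer`` and its ``on_allocation()`` method are hypothetical names; in the proposed module the check would be done by the C allocator hook, and the 1 second resolution is not modeled here.

```python
import time

class AllocationTimer:
    """Hypothetical timer checked on each allocation instead of by a clock."""

    def __init__(self):
        self.func = None

    def start(self, delay, func, args=(), kwargs=None):
        # Calling start() again replaces the previous parameters
        self.delay = delay
        self.func = func
        self.args = args
        self.kwargs = kwargs if kwargs is not None else {}
        self.next_call = time.monotonic() + delay

    def stop(self):
        self.func = None

    def on_allocation(self):
        # Called from the allocator hook: func only runs when an allocation
        # happens at least *delay* seconds after the previous call
        if self.func is None:
            return
        now = time.monotonic()
        if now >= self.next_call:
            self.next_call = now + self.delay
            self.func(*self.args, **self.kwargs)

calls = []
timer = AllocationTimer()
timer.start(0, calls.append, args=("tick",))
timer.on_allocation()
timer.on_allocation()
print(calls)  # ['tick', 'tick']
```

If the program stops allocating memory, ``on_allocation()`` is never called and *func* is not run, however much time passes.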
``stop_timer()`` function:
Stop the timer started by ``start_timer()``.
frame class
-----------
``frame`` class:
Trace of a Python frame.
``filename`` attribute (``str``):
Python filename, ``None`` if unknown.
``lineno`` attribute (``int``):
Python line number, ``None`` if unknown.
trace class
-----------
``trace`` class:
This class represents debug information of an allocated memory block.
``size`` attribute (``int``):
Size in bytes of the memory block.
``frames`` attribute (``list``):
Traceback where the memory block was allocated as a list of
``frame`` instances (most recent first).
The list can be empty or incomplete if the tracemalloc module was
unable to retrieve the full traceback.
For efficiency, the traceback is truncated to 10 frames.
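A sketch of how a traceback limited to a maximum number of frames, most recent first, could be collected. The ``frame`` and ``trace`` namedtuples are stand-ins for the classes above, ``make_trace()`` is a hypothetical helper whose *nframe* parameter mimics ``set_number_frame()``, and ``sys._getframe()`` is CPython-specific.

```python
import sys
from collections import namedtuple

# Stand-ins for the frame and trace classes described in this PEP
frame = namedtuple("frame", "filename lineno")
trace = namedtuple("trace", "size frames")

def make_trace(size, nframe=1):
    """Build a trace of the caller's stack, most recent frame first.

    At most *nframe* frames are stored.
    """
    frames = []
    f = sys._getframe(1)
    while f is not None and len(frames) < nframe:
        frames.append(frame(f.f_code.co_filename, f.f_lineno))
        f = f.f_back
    return trace(size, frames)

def inner():
    return make_trace(32, nframe=2)

def outer():
    return inner()

t = outer()
print([(fr.filename, fr.lineno) for fr in t.frames])
```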
line_stat class
----------------
``line_stat`` class:
Statistics on Python memory allocations of a specific line number.
``size`` attribute (``int``):
Total size in bytes of all memory blocks allocated on the line.
``count`` attribute (``int``):
Number of memory blocks allocated on the line.
DisplayTop class
----------------
``DisplayTop(count: int=10, file=sys.stdout)`` class:
Display the *count* biggest memory allocations, written into
*file*.
``display()`` method:
Display the top once.
``start(delay: int)`` method:
Start a task using ``tracemalloc`` timer to display the top every
*delay* seconds.
``stop()`` method:
Stop the task started by the ``DisplayTop.start()`` method.
``color`` attribute (``bool``, default: ``file.isatty()``):
If ``True``, ``display()`` uses color.
``compare_with_previous`` attribute (``bool``, default: ``True``):
If ``True``, ``display()`` compares with the previous snapshot.
If ``False``, it compares with the first snapshot.
``filename_parts`` attribute (``int``, default: ``3``):
Number of displayed filename parts. Extra parts are replaced
with ``"..."``.
``group_per_file`` attribute (``bool``, default: ``False``):
If ``True``, group memory allocations per Python filename. If
``False``, group allocations per Python line number.
``show_average`` attribute (``bool``, default: ``True``):
If ``True``, ``display()`` shows the average size
of allocations.
``show_count`` attribute (``bool``, default: ``True``):
If ``True``, ``display()`` shows the number of
allocations.
``show_size`` attribute (``bool``, default: ``True``):
If ``True``, ``display()`` shows the size of
allocations.
``user_data_callback`` attribute (``callable``, default: ``None``):
Optional callback collecting user data. See ``Snapshot.create()``.
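A possible way to implement the ``filename_parts`` attribute of ``DisplayTop``, assuming that "parts" are path components separated by ``os.sep`` and that the last parts are the ones kept. The ``format_filename()`` helper is hypothetical.

```python
import os

def format_filename(filename, parts=3, sep=os.sep):
    """Keep the last *parts* path components, replacing the rest with '...'."""
    components = filename.split(sep)
    if len(components) <= parts:
        return filename
    return sep.join(["..."] + components[-parts:])

print(format_filename("/usr/lib/python3.4/json/decoder.py", sep="/"))
# .../python3.4/json/decoder.py
```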
Snapshot class
--------------
``Snapshot()`` class:
Snapshot of Python memory allocations.
Use ``TakeSnapshot`` to take snapshots regularly.
``create(user_data_callback=None)`` method:
Take a snapshot. If *user_data_callback* is specified, it must be a
callable object returning a list of
``(title: str, format: str, value: int)`` tuples.
*format* must be ``'size'``. The list must always have the same
length and the same order to be able to compute differences between
values.
Example: ``[('Video memory', 'size', 234902)]``.
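For example, a user data callback reporting one hypothetical value (the size of an application cache dictionary, measured with ``sys.getsizeof()``) could look like the following. Because the list always has the same length and order, the values of two snapshots can be compared pairwise. ``cache`` and ``user_data_diff()`` are illustrative names, not part of the proposed API.

```python
import sys

cache = {}  # hypothetical application cache whose memory we want to follow

def user_data_callback():
    # Always return the same titles, in the same order, with format 'size'
    return [("Cache", "size", sys.getsizeof(cache))]

def user_data_diff(old, new):
    """Pair entries by position and return (title, new value, difference)."""
    return [(title, new_value, new_value - old_value)
            for (title, _, old_value), (_, _, new_value) in zip(old, new)]

before = user_data_callback()
cache.update((i, str(i)) for i in range(1000))
after = user_data_callback()
print(user_data_diff(before, after))
```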
``filter_filenames(patterns: list, include: bool)`` method:
Remove filenames not matching any pattern of *patterns* if *include*
is ``True``, or remove filenames matching a pattern of *patterns* if
*include* is ``False`` (exclude).
See ``fnmatch.fnmatch()`` for the syntax of a pattern.
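The ``filter_filenames()`` semantics can be sketched with ``fnmatch.fnmatch()`` applied to the keys of a statistics dictionary. Operating on a plain dictionary instead of a ``Snapshot`` instance, and treating an unknown (``None``) filename as never matching, are assumptions of this sketch.

```python
import fnmatch

def filter_filenames(stats, patterns, include):
    """Return a copy of *stats* keeping or dropping filenames matching *patterns*."""
    def matches(filename):
        # Assumption: an unknown (None) filename never matches a pattern
        return filename is not None and any(
            fnmatch.fnmatch(filename, pattern) for pattern in patterns)
    return {filename: lines for filename, lines in stats.items()
            if matches(filename) == include}

stats = {"/app/main.py": {1: 100}, "/usr/lib/python3.4/os.py": {5: 40}, None: {None: 8}}
print(filter_filenames(stats, ["/app/*"], include=True))
# {'/app/main.py': {1: 100}}
print(filter_filenames(stats, ["*/python3.4/*"], include=False))
# {'/app/main.py': {1: 100}, None: {None: 8}}
```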
``load(filename)`` classmethod:
Load a snapshot from a file.
``write(filename)`` method:
Write the snapshot into a file.
``pid`` attribute (``int``):
Identifier of the process which created the snapshot.
``process_memory`` attribute:
Result of the ``get_process_memory()`` function, can be ``None``.
``stats`` attribute (``dict``):
Result of the ``get_stats()`` function.
``tracemalloc_size`` attribute (``int``):
The memory usage in bytes of the ``tracemalloc`` module,
result of the ``get_tracemalloc_size()`` function.
``timestamp`` attribute (``datetime.datetime``):
Creation date and time of the snapshot.
``user_data`` attribute (``list``, default: ``None``):
Optional list of user data, result of *user_data_callback* in
``Snapshot.create()``.
TakeSnapshot class
------------------
``TakeSnapshot`` class:
Task taking snapshots of Python memory allocations and writing them
into files. By default, snapshots are written in the current directory.
``start(delay: int)`` method:
Start a task taking a snapshot every *delay* seconds.
``stop()`` method:
Stop the task started by the ``TakeSnapshot.start()`` method.
``take_snapshot()`` method:
Take a snapshot.
``filename_template`` attribute (``str``,
default: ``'tracemalloc-$counter.pickle'``):
Template used to create a filename. The following variables can be
used in the template:
* ``$pid``: identifier of the current process
* ``$timestamp``: current date and time
* ``$counter``: counter starting at 1 and incremented at each snapshot
``user_data_callback`` attribute (``callable``, default: ``None``):
Optional callback collecting user data. See ``Snapshot.create()``.
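The ``$name`` syntax of ``filename_template`` matches the standard ``string.Template`` class, so a snapshot filename could plausibly be built as follows. Using ``string.Template`` and this particular timestamp format are assumptions of this sketch; the ``make_filename()`` helper is hypothetical.

```python
import datetime
import os
import string

def make_filename(template, counter):
    """Expand $pid, $timestamp and $counter in a snapshot filename template."""
    # Avoid ':' in the timestamp since it is not allowed in Windows filenames
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return string.Template(template).substitute(
        pid=os.getpid(), timestamp=timestamp, counter=counter)

print(make_filename("tracemalloc-$counter.pickle", 1))  # tracemalloc-1.pickle
print(make_filename("tracemalloc-$pid-$counter.pickle", 2))
```

``substitute()`` raises ``KeyError`` for an unknown variable such as ``$user``, which would surface a typo in the template early.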
Command line options
====================
The ``python -m tracemalloc`` command can be used to analyze and compare
snapshots. The command takes a list of snapshot filenames and has the
following options.
``-g``, ``--group-per-file``
Group allocations per filename, instead of grouping per line number.
``-n NTRACES``, ``--number NTRACES``
Number of traces displayed per top (default: 10).
``--first``
Compare with the first snapshot, instead of comparing with the
previous snapshot.
``--include PATTERN``
Only include filenames matching pattern *PATTERN*. The option can be
specified multiple times.
See ``fnmatch.fnmatch()`` for the syntax of patterns.
``--exclude PATTERN``
Exclude filenames matching pattern *PATTERN*. The option can be
specified multiple times.
See ``fnmatch.fnmatch()`` for the syntax of patterns.
``-S``, ``--hide-size``
Hide the size of allocations.
``-C``, ``--hide-count``
Hide the number of allocations.
``-A``, ``--hide-average``
Hide the average size of allocations.
``-P PARTS``, ``--filename-parts=PARTS``
Number of displayed filename parts (default: 3).
``--color``
Force usage of colors even if ``sys.stdout`` is not a TTY device.
``--no-color``
Disable colors if ``sys.stdout`` is a TTY device.
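The difference between the default mode and ``--first`` (or ``compare_with_previous=False`` in ``DisplayTop``) can be sketched on per-line sizes: each snapshot is compared either with the snapshot just before it or with the first one. The ``{(filename, lineno): size}`` dictionaries and the ``size_diff()`` and ``compare_snapshots()`` helpers are simplifications made for this illustration.

```python
def size_diff(old, new):
    """Return {key: new_size - old_size} for every key present in either dict."""
    return {key: new.get(key, 0) - old.get(key, 0)
            for key in old.keys() | new.keys()}

def compare_snapshots(snapshots, first=False):
    """Yield (snapshot index, differences) for each snapshot after the first."""
    for index in range(1, len(snapshots)):
        reference = snapshots[0] if first else snapshots[index - 1]
        yield index, size_diff(reference, snapshots[index])

snapshots = [
    {("app.py", 10): 1000},
    {("app.py", 10): 1500, ("app.py", 20): 200},
    {("app.py", 10): 1800, ("app.py", 20): 100},
]
for index, diff in compare_snapshots(snapshots, first=True):
    print(index, sorted(diff.items()))
```

Comparing with the previous snapshot shows what changed recently, while comparing with the first one shows the cumulative growth since the beginning.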
Links
=====
tracemalloc:
* `#18874: Add a new tracemalloc module to trace Python
memory allocations <http://bugs.python.org/issue18874>`_
* `pytracemalloc on PyPI
<https://pypi.python.org/pypi/pytracemalloc>`_
Similar projects:
* `Meliae: Python Memory Usage Analyzer
<https://pypi.python.org/pypi/meliae>`_
* `Guppy-PE: umbrella package combining Heapy and GSL
<http://guppy-pe.sourceforge.net/>`_
* `PySizer <http://pysizer.8325.org/>`_: developed for Python 2.4
* `memory_profiler <https://pypi.python.org/pypi/memory_profiler>`_
* `pympler <http://code.google.com/p/pympler/>`_
* `Dozer <https://pypi.python.org/pypi/Dozer>`_: WSGI Middleware version of
the CherryPy memory leak debugger
* `objgraph <http://mg.pov.lt/objgraph/>`_
* `caulk <https://github.com/smartfile/caulk/>`_
Copyright
=========
This document has been placed into the public domain.