PEP: 737
Title: Unify type name formatting
Author: Victor Stinner <vstinner@python.org>
Discussions-To: https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872
Status: Draft
Type: Standards Track
Created: 29-Nov-2023
Python-Version: 3.13
Post-History: `29-Nov-2023 <https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`__


Abstract
========

Add new convenient APIs to format type names the same way in Python and
in C. No longer format type names differently depending on how types are
implemented.

Recommend using the type fully qualified name in error messages and in
``__repr__()`` methods in new code. Recommend not truncating type names
in new code.

Add ``N`` and ``#N`` formats to ``type.__format__()`` to format a type
fully qualified name. For example, ``f"{type(obj):N}"`` formats the
fully qualified name of an object *obj*.

Add ``%T``, ``%#T``, ``%N`` and ``%#N`` formats to
``PyUnicode_FromFormat()``: ``%T`` and ``%#T`` format the fully
qualified name of an object's type, whereas ``%N`` and ``%#N`` format
the fully qualified name of a type.

Make C code safer by avoiding borrowed references, which can lead to
crashes. The new C API is compatible with the limited C API.


Rationale
=========

Standard library
----------------

In the Python standard library, formatting a type name or the type name
of an object is a common operation when formatting an error message or
implementing a ``__repr__()`` method. There are different ways to format
a type name, and they give different outputs.

Example with the ``datetime.timedelta`` type:

* The type short name (``type.__name__``) and the type qualified name
  (``type.__qualname__``) are ``'timedelta'``.
* The type module (``type.__module__``) is ``'datetime'``.
* The type fully qualified name is ``'datetime.timedelta'``.
* The type representation (``repr(type)``) contains the fully qualified
  name: ``<class 'datetime.timedelta'>``.
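
The names listed above can be checked with attributes that already exist
on types. The following is a small sketch; the variable names are chosen
for illustration only:

.. code-block:: python

    import datetime

    tp = datetime.timedelta

    short_name = tp.__name__          # type short name
    qualified_name = tp.__qualname__  # type qualified name
    module_name = tp.__module__       # module where the type is defined
    # Fully qualified name, built by hand as existing code does today.
    fully_qualified = f"{module_name}.{qualified_name}"

    print(short_name, qualified_name, module_name, fully_qualified, repr(tp))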


Python code
^^^^^^^^^^^

In Python, ``type.__name__`` gets the type short name, whereas
``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully
qualified name". Usually, ``type(obj)`` or ``obj.__class__`` is used to
get the type of the object *obj*. Sometimes, the type name is put
between quotes.

Examples:

* ``raise TypeError("str expected, not %s" % type(value).__name__)``
* ``raise TypeError("can't serialize %s" % self.__class__.__name__)``
* ``name = "%s.%s" % (obj.__module__, obj.__qualname__)``

Qualified names were added to types (``type.__qualname__``) in Python
3.3 by :pep:`3155` "Qualified name for classes and functions".

C code
^^^^^^

In C, the most common way to format a type name is to get the
``PyTypeObject.tp_name`` member of the type. Example:

.. code-block:: c

    PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
                 Py_TYPE(globals)->tp_name);

The type "fully qualified name" is used in a few places:
``PyErr_Display()``, ``type.__repr__()`` implementation, and
``sys.unraisablehook`` implementation.

Using ``Py_TYPE(obj)->tp_name`` is preferred since it is more convenient
than calling ``PyType_GetQualName()`` which requires ``Py_DECREF()``.
Moreover, ``PyType_GetQualName()`` was only added recently, in Python
3.11.

Some functions use ``%R`` (``repr(type)``) to format a type name; the
output contains the type fully qualified name. Example:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "calling %R should have returned an instance "
                 "of BaseException, not %R",
                 type, Py_TYPE(value));


Using PyTypeObject.tp_name is inconsistent with Python
------------------------------------------------------

The ``PyTypeObject.tp_name`` member is different depending on the type
implementation:

* Static types and heap types in C: *tp_name* is the type fully
  qualified name.
* Python class: *tp_name* is the type short name (``type.__name__``).

So using ``Py_TYPE(obj)->tp_name`` to format an object type name gives
a different output depending on whether the type is implemented in C or
in Python.

This goes against the principles of :pep:`399` "Pure Python/C
Accelerator Module Compatibility Requirements", which recommends that
code behave the same way whether it is written in Python or in C.

Example:

.. code-block:: pycon

    $ python3.12
    >>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
    >>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
    >>> my_list = list(range(3))

    >>> my_list[c_obj]  # C type
    TypeError: list indices must be integers or slices, not datetime.date

    >>> my_list[py_obj]  # Python type
    TypeError: list indices must be integers or slices, not date

The error message contains the type fully qualified name
(``datetime.date``) if the type is implemented in C, or the type short
name (``date``) if the type is implemented in Python.


Limited C API
-------------

The ``Py_TYPE(obj)->tp_name`` code cannot be used with the limited C
API, since the ``PyTypeObject`` members are excluded from the limited C
API.

The type name should be read using ``PyType_GetName()``,
``PyType_GetQualName()`` and ``PyType_GetModule()`` functions which are
less convenient to use.


Truncating type names in C
--------------------------

In 1998, when the ``PyErr_Format()`` function was added, the
implementation used a fixed buffer of 500 bytes. The function had the
following comment:

.. code-block:: c

    /* Caller is responsible for limiting the format */

In 2001, the function was modified to allocate a dynamic buffer on the
heap. By then it was too late: truncating type names, for example with
the ``%.100s`` format, had already become a habit, and developers forgot
why type names were truncated. In Python, type names are not truncated.

Truncating type names in C but not in Python goes against the principles
of :pep:`399` "Pure Python/C Accelerator Module Compatibility
Requirements", which recommends that code behave the same way whether it
is written in Python or in C.

See the issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary
limit of 500 bytes is outdated
<https://github.com/python/cpython/issues/55042>`__ (2011).


Specification
=============

* Add ``type.__fully_qualified_name__`` attribute.
* Add ``type.__format__()`` method.
* Add formats to ``PyUnicode_FromFormat()``.
* Add ``PyType_GetModuleName()`` function.
* Add ``PyType_GetFullyQualifiedName()`` function.
* Recommend using the type fully qualified name in error messages and
  in ``__repr__()`` methods in new code.
* Recommend not truncating type names in new code.


Add type.__fully_qualified_name__ attribute
-------------------------------------------

Add ``type.__fully_qualified_name__`` read-only attribute, the fully
qualified name of a type: similar to
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"`` or is
equal to ``"__main__"``.

``type.__repr__()`` is left unchanged: it only omits the module if the
module is equal to ``"builtins"``.
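
The rule described above can be approximated in pure Python. The sketch
below is not the proposed implementation: ``fully_qualified_name()`` is a
hypothetical helper, and ``ScriptType`` is created with an explicit
``"__main__"`` module only to imitate a class defined in a script:

.. code-block:: python

    import datetime

    def fully_qualified_name(tp):
        """Emulate the proposed attribute (hypothetical helper)."""
        module = tp.__module__
        # Omit the module if it is not a string, or is "builtins" or "__main__".
        if not isinstance(module, str) or module in ("builtins", "__main__"):
            return tp.__qualname__
        return f"{module}.{tp.__qualname__}"

    # Imitate a class defined in a script run with "python script.py".
    ScriptType = type("ScriptType", (), {"__module__": "__main__"})

    print(fully_qualified_name(datetime.timedelta))
    print(fully_qualified_name(int))
    print(fully_qualified_name(ScriptType))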


Add type.__format__() method
----------------------------

Add ``type.__format__()`` method with the following formats:

* ``N`` formats the type **fully qualified name**
  (``type.__fully_qualified_name__``);
  ``N`` stands for **N**\ ame.
* ``#N`` (alternative form) formats the type **fully qualified name**
  using the **colon** (``:``) separator, instead of the dot separator
  (``.``), between the module name and the qualified name.

Examples using f-string::

    >>> import datetime
    >>> f"{datetime.timedelta:N}"  # fully qualified name
    'datetime.timedelta'
    >>> f"{datetime.timedelta:#N}" # fully qualified name, colon separator
    'datetime:timedelta'

The colon (``:``) separator used by the ``#N`` format eliminates
guesswork when you want to import the name, see
``pkgutil.resolve_name()``, ``python -m inspect`` command line
interface, and ``setuptools`` entry points.
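
The following sketch emulates both formats with a hypothetical helper,
``format_type_name()``, since ``type.__format__()`` does not accept these
formats in existing Python versions. It also shows why the colon form is
convenient: the existing ``pkgutil.resolve_name()`` function (Python 3.9
and newer) accepts it directly:

.. code-block:: python

    import datetime
    import pkgutil

    def format_type_name(tp, spec="N"):
        """Emulate the proposed N and #N formats (hypothetical helper)."""
        if spec not in ("N", "#N"):
            raise ValueError(f"unsupported format: {spec!r}")
        module = tp.__module__
        if not isinstance(module, str) or module in ("builtins", "__main__"):
            return tp.__qualname__
        # The alternative form uses a colon between module and qualified name.
        separator = ":" if spec == "#N" else "."
        return f"{module}{separator}{tp.__qualname__}"

    dotted = format_type_name(datetime.timedelta, "N")
    colon = format_type_name(datetime.timedelta, "#N")

    # With the colon form, the boundary between the module and the attribute
    # path is explicit, so the importer does not have to guess it.
    resolved = pkgutil.resolve_name(colon)
    print(dotted, colon, resolved)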


Add formats to PyUnicode_FromFormat()
-------------------------------------

Add the following formats to ``PyUnicode_FromFormat()``:

* ``%N`` formats the **fully qualified name** of a **type**
  (``type.__fully_qualified_name__``); **N** stands for type **N**\ ame.
* ``%T`` formats the type **fully qualified name** of an **object**
  (``type(obj).__fully_qualified_name__``); **T** stands for object
  **T**\ ype.
* ``%#N`` and ``%#T``: the alternative form uses the **colon** separator
  (``:``), instead of the dot separator (``.``), between the module name
  and the qualified name.

For example, the existing code using *tp_name*:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %.200s",
                 Py_TYPE(result)->tp_name);

can be replaced with the ``%T`` format:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %T", result);

Advantages of the updated code:

* Safer C code: avoid ``Py_TYPE()`` which returns a borrowed reference.
* The ``PyTypeObject.tp_name`` member is no longer read explicitly: the
  code becomes compatible with the limited C API.
* The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded
  from UTF-8 at each ``PyErr_Format()`` call, since
  ``type.__fully_qualified_name__`` is already a Unicode string.
* The formatted type name no longer depends on the type implementation.
* The type name is no longer truncated.

Note: The ``%T`` format is used by ``time.strftime()``, but not by
``printf()``.


Formats Summary
---------------

.. list-table::
   :header-rows: 1

   * - C object
     - C type
     - Python
     - Format
   * - ``%T``
     - ``%N``
     - ``:N``
     - Type **fully qualified** name.
   * - ``%#T``
     - ``%#N``
     - ``:#N``
     - Type **fully qualified** name, **colon** separator.

Add PyType_GetModuleName() function
-----------------------------------

Add the ``PyType_GetModuleName()`` function to get the module name of a
type (``type.__module__``). API:

.. code-block:: c

    PyObject* PyType_GetModuleName(PyTypeObject *type)

On success, return a new reference to the string. On error, raise an
exception and return ``NULL``.


Add PyType_GetFullyQualifiedName() function
-------------------------------------------

Add the ``PyType_GetFullyQualifiedName()`` function to get the fully
qualified name of a type (``type.__fully_qualified_name__``). API:

.. code-block:: c

    PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)

On success, return a new reference to the string. On error, raise an
exception and return ``NULL``.


Recommend using the type fully qualified name
---------------------------------------------

The type fully qualified name is recommended in error messages and in
``__repr__()`` methods in new code.

In non-trivial applications, it is likely that two types defined in
different modules share the same short name, especially if the name is
generic. Using the fully qualified name helps identify the type
unambiguously.
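
As an illustration, the sketch below creates two classes that share the
short name ``Config``. The module names ``app.db`` and ``app.web`` are
hypothetical, as are the two helper functions:

.. code-block:: python

    # Two unrelated classes with the same short name (hypothetical modules).
    DbConfig = type("Config", (), {"__module__": "app.db"})
    WebConfig = type("Config", (), {"__module__": "app.web"})

    def short_message(obj):
        # Error message built from the type short name.
        return f"expected str, not {type(obj).__name__}"

    def full_message(obj):
        # Error message built from the module and the qualified name.
        tp = type(obj)
        return f"expected str, not {tp.__module__}.{tp.__qualname__}"

    db_obj, web_obj = DbConfig(), WebConfig()
    print(short_message(db_obj), "|", short_message(web_obj))
    print(full_message(db_obj), "|", full_message(web_obj))

With the short name, both messages are identical and the reader cannot
tell which ``Config`` was passed; with the module included, they differ.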


Recommend not truncating type names
-----------------------------------

Type names should not be truncated in new code. For example, the
``%.100s`` format should be avoided: use the ``%s`` format instead (or
``%T`` format in C).
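
The effect of a precision such as ``%.100s`` can be seen from Python as
well, since ``str % args`` honors it. In this sketch the class name is
artificially long and purely hypothetical:

.. code-block:: python

    # Hypothetical class whose name is longer than 100 characters.
    long_name = "Handler" + "X" * 120
    LongType = type(long_name, (), {})

    prefix = "str expected, not "
    truncated = (prefix + "%.100s") % LongType.__name__  # name cut at 100 chars
    complete = (prefix + "%s") % LongType.__name__       # full name kept

    print(len(truncated) - len(prefix), len(complete) - len(prefix))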


Implementation
==============

* Pull request: `Add type.__fully_qualified_name__ attribute <https://github.com/python/cpython/pull/112133>`_.
* Pull request: `Add %T format to PyUnicode_FromFormat() <https://github.com/python/cpython/pull/111703>`_.


Backwards Compatibility
=======================

Changes proposed in this PEP are backward compatible.

Adding new APIs has no effect on the backward compatibility. Existing
APIs are left unchanged.

Replacing the type short name with the type fully qualified name is only
recommended in new code. No longer truncating type names is only
recommended in new code. Existing code should be left unchanged and so
remains backward compatible.


Rejected Ideas
==============

Change str(type)
----------------

The ``type.__str__()`` method can be modified to format a type name
differently. For example, it can return the type fully qualified name.

The problem is that it's a backward incompatible change. For example,
``enum``, ``functools``, ``optparse``, ``pdb`` and ``xmlrpc.server``
modules of the standard library have to be updated.
``test_dataclasses``, ``test_descrtut`` and ``test_cmd_line_script``
tests have to be updated as well.

See the `pull request: type(str) returns the fully qualified name
<https://github.com/python/cpython/pull/112129>`_.


Add !t formatter to get an object type
--------------------------------------

Use ``f"{obj!t:T}"`` to format ``type(obj).__fully_qualified_name__``,
similar to ``f"{type(obj):T}"``.

When the ``!t`` formatter was proposed in 2018, `Eric Smith was strongly
opposed to this
<https://mail.python.org/archives/list/python-dev@python.org/message/BMIW3FEB77OS7OB3YYUUDUBITPWLRG3U/>`_;
Eric is the author of the f-string :pep:`498` "Literal String Interpolation".


Add formats to str % args
-------------------------

It was proposed to add formats to format a type name in ``str % arg``.
For example, add the ``%T`` format to format a type fully qualified
name.

Nowadays, f-strings are preferred for new code.


Other ways to format type names in C
------------------------------------

The ``printf()`` function supports multiple size modifiers: ``hh``
(``char``), ``h`` (``short``), ``l`` (``long``), ``ll`` (``long long``),
``z`` (``size_t``), ``t`` (``ptrdiff_t``) and ``j`` (``intmax_t``).
The ``PyUnicode_FromFormat()`` function supports most of them.

Proposed formats using ``h`` and ``hh`` length modifiers:

* ``%hhT`` formats ``type.__name__``.
* ``%hT`` formats ``type.__qualname__``.
* ``%T`` formats ``type.__fully_qualified_name__``.

Length modifiers are used to specify the C type of the argument, not to
change how an argument is formatted. The alternate form (``#``) changes
how an argument is formatted. Here the argument C type is always
``PyObject*``.

Other proposed formats:

* ``%Q``.
* ``%t``.
* ``%lT`` formats ``type.__fully_qualified_name__``.
* ``%Tn`` formats ``type.__name__``.
* ``%Tq`` formats ``type.__qualname__``.
* ``%Tf`` formats ``type.__fully_qualified_name__``.

Having more options to format type names can lead to inconsistencies
between different modules and make the API more error prone.

About the ``%t`` format, ``printf()`` now uses ``t`` as a length
modifier for ``ptrdiff_t`` argument.

The following APIs can be used to format a type:

.. list-table::
   :header-rows: 1

   * - C API
     - Python API
     - Format
   * - ``PyType_GetName()``
     - ``type.__name__``
     - Type **short** name.
   * - ``PyType_GetQualName()``
     - ``type.__qualname__``
     - Type **qualified** name.
   * - ``PyType_GetModuleName()``
     - ``type.__module__``
     - Type **module** name.


Use %T format with Py_TYPE(): pass a type
-----------------------------------------

It was proposed to pass a type to the ``%T`` format, like:

.. code-block:: c

    PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));

The ``Py_TYPE()`` function returns a borrowed reference. Using a
borrowed reference to a type just to format an error looks safe. In
practice, it can lead to a crash. Example::

    import gc
    import my_cext

    class ClassA:
        pass

    def create_object():
        class ClassB:
            def __repr__(self):
                self.__class__ = ClassA
                gc.collect()
                return "ClassB repr"
        return ClassB()

    obj = create_object()
    my_cext.func(obj)

where ``my_cext.func()`` is a C function which calls::

    PyErr_Format(PyExc_ValueError,
                 "Unexpected value %R of type %T",
                 obj, Py_TYPE(obj));

``PyErr_Format()`` is called with a borrowed reference to ``ClassB``.
When ``repr(obj)`` is called by the ``%R`` format, the last reference to
``ClassB`` is removed and the class is deallocated. When the ``%T``
format is processed, ``Py_TYPE(obj)`` is already a dangling pointer and
Python crashes.


Other proposed APIs to get a type fully qualified name
------------------------------------------------------

* Add ``type.__fullyqualname__`` attribute: name without underscores
  between words. Several dunders, including some of the most recently
  added ones, use underscores between words:
  ``__class_getitem__``, ``__release_buffer__``, ``__type_params__``,
  ``__init_subclass__`` and ``__text_signature__``.
* Add ``type.__fqn__`` attribute: FQN stands for **F**\ ully
  **Q**\ ualified **N**\ ame.
* Add ``type.fully_qualified_name()`` method. Methods added to ``type``
  are inherited by all types and so can affect existing code.
* Add a function to the ``inspect`` module. Need to import the
  ``inspect`` module to use it.


Include the __main__ module in the type fully qualified name
------------------------------------------------------------

Format ``type.__fully_qualified_name__`` as
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"``.  Do
not treat the ``__main__`` module differently: include it in the name.

Existing code such as ``type.__repr__()``, ``collections.abc`` and
``unittest`` modules format a type name with
``f'{obj.__module__}.{obj.__qualname__}'`` and only omit the module part
if the module is equal to ``builtins``.

Only the ``traceback`` and ``pdb`` modules also omit the module if it's
equal to ``"builtins"`` or ``"__main__"``.

The ``type.__fully_qualified_name__`` attribute omits the ``__main__``
module to produce shorter names for a common case: types defined in a
script run with ``python script.py``. For debugging, the ``repr()``
function can be used on a type: it includes the ``__main__`` module in
the type name. Alternatively, use the
``f"{type.__module__}.{type.__qualname__}"`` format to always include
the module name, even for the ``"builtins"`` module.

Example of script::

    class MyType:
        pass

    print(f"name: {MyType.__fully_qualified_name__}")
    print(f"repr: {repr(MyType)}")

Output::

    name: MyType
    repr: <class '__main__.MyType'>


Discussions
===========

* Discourse: `PEP 737 – Unify type name formatting
  <https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`_
  (2023).
* Discourse: `Enhance type name formatting when raising an exception:
  add %T format in C, and add type.__fullyqualname__
  <https://discuss.python.org/t/enhance-type-name-formatting-when-raising-an-exception-add-t-format-in-c-and-add-type-fullyqualname/38129>`_
  (2023).
* Issue: `PyUnicode_FromFormat(): Add %T format to format the type name
  of an object <https://github.com/python/cpython/issues/111696>`_
  (2023).
* Issue: `C API: Investigate how the PyTypeObject members can be removed
  from the public C API
  <https://github.com/python/cpython/issues/105970>`_ (2023).
* python-dev thread: `bpo-34595: How to format a type name?
  <https://mail.python.org/archives/list/python-dev@python.org/thread/HKYUMTVHNBVB5LJNRMZ7TPUQKGKAERCJ/>`_
  (2018).
* Issue: `PyUnicode_FromFormat(): add %T format for an object type name
  <https://github.com/python/cpython/issues/78776>`_ (2018).
* Issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary limit of
  500 bytes is outdated
  <https://github.com/python/cpython/issues/55042>`__ (2011).


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.