586 lines
19 KiB
ReStructuredText
586 lines
19 KiB
ReStructuredText
PEP: 737
|
||
Title: Unify type name formatting
|
||
Author: Victor Stinner <vstinner@python.org>
|
||
Discussions-To: https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Created: 29-Nov-2023
|
||
Python-Version: 3.13
|
||
Post-History: `29-Nov-2023 <https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`__
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
Add new convenient APIs to format type names the same way in Python and
|
||
in C. No longer format type names differently depending on how types are
|
||
implemented.
|
||
|
||
Recommend using the type fully qualified name in error messages and in
|
||
``__repr__()`` methods in new code. Recommend not truncating type names
|
||
in new code.
|
||
|
||
Add ``N`` and ``#N`` formats to ``type.__format__()`` to format a type
|
||
fully qualified name. For example, ``f"{type(obj):N}"`` formats the
|
||
fully qualified name of an object *obj*.
|
||
|
||
Add ``%T``, ``%#T``, ``%N`` and ``%#N`` formats to
|
||
``PyUnicode_FromFormat()`` to format the fully qualified, respectively,
|
||
of an object type and of a type.
|
||
|
||
Make C code safer by avoiding borrowed reference which can lead to
|
||
crashes. The new C API is compatible with the limited C API.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
Standard library
|
||
----------------
|
||
|
||
In the Python standard library, formatting a type name or the type name
|
||
of an object is a common operation to format an error message and to
|
||
implement a ``__repr__()`` method. There are different ways to format a
|
||
type name which give different outputs.
|
||
|
||
Example with the ``datetime.timedelta`` type:
|
||
|
||
* The type short name (``type.__name__``) and the type qualified name
|
||
(``type.__qualname__``) are ``'timedelta'``.
|
||
* The type module (``type.__module__``) is ``'datetime'``.
|
||
* The type fully qualified name is ``'datetime.timedelta'``.
|
||
* The type representation (``repr(type)``) contains the fully qualified
|
||
name: ``<class 'datetime.timedelta'>``.
|
||
|
||
|
||
Python code
|
||
^^^^^^^^^^^
|
||
|
||
In Python, ``type.__name__`` gets the type short name, whereas
|
||
``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully
|
||
qualified name". Usually, ``type(obj)`` or ``obj.__class__`` are used to
|
||
get the type of the object *obj*. Sometimes, the type name is put
|
||
between quotes.
|
||
|
||
Examples:
|
||
|
||
* ``raise TypeError("str expected, not %s" % type(value).__name__)``
|
||
* ``raise TypeError("can't serialize %s" % self.__class__.__name__)``
|
||
* ``name = "%s.%s" % (obj.__module__, obj.__qualname__)``
|
||
|
||
Qualified names were added to types (``type.__qualname__``) in Python
|
||
3.3 by :pep:`3155` "Qualified name for classes and functions".
|
||
|
||
C code
|
||
^^^^^^
|
||
|
||
In C, the most common way to format a type name is to get the
|
||
``PyTypeObject.tp_name`` member of the type. Example:
|
||
|
||
.. code-block:: c
|
||
|
||
PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
|
||
Py_TYPE(globals)->tp_name);
|
||
|
||
The type "fully qualified name" is used in a few places:
|
||
``PyErr_Display()``, ``type.__repr__()`` implementation, and
|
||
``sys.unraisablehook`` implementation.
|
||
|
||
Using ``Py_TYPE(obj)->tp_name`` is preferred since it is more convenient
|
||
than calling ``PyType_GetQualName()`` which requires ``Py_DECREF()``.
|
||
Moreover, ``PyType_GetQualName()`` was only added recently, in Python
|
||
3.11.
|
||
|
||
Some functions use ``%R`` (``repr(type)``) to format a type name, the
|
||
output contains the type fully qualified name. Example:
|
||
|
||
.. code-block:: c
|
||
|
||
PyErr_Format(PyExc_TypeError,
|
||
"calling %R should have returned an instance "
|
||
"of BaseException, not %R",
|
||
type, Py_TYPE(value));
|
||
|
||
|
||
Using PyTypeObject.tp_name is inconsistent with Python
|
||
------------------------------------------------------
|
||
|
||
The ``PyTypeObject.tp_name`` member is different depending on the type
|
||
implementation:
|
||
|
||
* Static types and heap types in C: *tp_name* is the type fully
|
||
qualified name.
|
||
* Python class: *tp_name* is the type short name (``type.__name__``).
|
||
|
||
So using ``Py_TYPE(obj)->tp_name`` to format an object type name gives
|
||
a different output depending if a type is implemented in C or in Python.
|
||
|
||
It goes against :pep:`399` "Pure Python/C Accelerator Module
|
||
Compatibility Requirements" principles which recommends code behaves
|
||
the same way if written in Python or in C.
|
||
|
||
Example:
|
||
|
||
.. code-block:: pycon
|
||
|
||
$ python3.12
|
||
>>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
|
||
>>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
|
||
>>> my_list = list(range(3))
|
||
|
||
>>> my_list[c_obj] # C type
|
||
TypeError: list indices must be integers or slices, not datetime.date
|
||
|
||
>>> my_list[py_obj] # Python type
|
||
TypeError: list indices must be integers or slices, not date
|
||
|
||
The error message contains the type fully qualified name
|
||
(``datetime.date``) if the type is implemented in C, or the type short
|
||
name (``date``) if the type is implemented in Python.
|
||
|
||
|
||
Limited C API
|
||
-------------
|
||
|
||
The ``Py_TYPE(obj)->tp_name`` code cannot be used with the limited C
|
||
API, since the ``PyTypeObject`` members are excluded from the limited C
|
||
API.
|
||
|
||
The type name should be read using ``PyType_GetName()``,
|
||
``PyType_GetQualName()`` and ``PyType_GetModule()`` functions which are
|
||
less convenient to use.
|
||
|
||
|
||
Truncating type names in C
|
||
--------------------------
|
||
|
||
In 1998, when the ``PyErr_Format()`` function was added, the
|
||
implementation used a fixed buffer of 500 bytes. The function had the
|
||
following comment:
|
||
|
||
.. code-block:: c
|
||
|
||
/* Caller is responsible for limiting the format */
|
||
|
||
In 2001, the function was modified to allocate a dynamic buffer on the
|
||
heap. Too late, the practice of truncating type names, like using the
|
||
``%.100s`` format, already became a habit, and developers forgot why
|
||
type names are truncated. In Python, type names are not truncated.
|
||
|
||
Truncating type names in C but not in Python goes against :pep:`399`
|
||
"Pure Python/C Accelerator Module Compatibility Requirements" principles
|
||
which recommends code behaves the same way if written in Python or in
|
||
C.
|
||
|
||
See the issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary
|
||
limit of 500 bytes is outdated
|
||
<https://github.com/python/cpython/issues/55042>`__ (2011).
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
* Add ``type.__fully_qualified_name__`` attribute.
|
||
* Add ``type.__format__()`` method.
|
||
* Add formats to ``PyUnicode_FromFormat()``.
|
||
* Add ``PyType_GetModuleName()`` function.
|
||
* Add ``PyType_GetFullyQualifiedName()`` function.
|
||
* Recommend using the type fully qualified name in error messages and
|
||
in ``__repr__()`` methods in new code.
|
||
* Recommend not truncating type names in new code.
|
||
|
||
|
||
Add type.__fully_qualified_name__ attribute
|
||
-------------------------------------------
|
||
|
||
Add ``type.__fully_qualified_name__`` read-only attribute, the fully
|
||
qualified name of a type: similar to
|
||
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
|
||
``type.__module__`` is not a string or is equal to ``"builtins"`` or is
|
||
equal to ``"__main__"``.
|
||
|
||
The ``type.__repr__()`` is left unchanged, it only omits the module if
|
||
the module is equal to ``"builtins"``.
|
||
|
||
|
||
Add type.__format__() method
|
||
----------------------------
|
||
|
||
Add ``type.__format__()`` method with the following formats:
|
||
|
||
* ``N`` formats the type **fully qualified name**
|
||
(``type.__fully_qualified_name__``);
|
||
``N`` stands for **N**\ ame.
|
||
* ``#N`` (alternative form) formats the type **fully qualified name**
|
||
using the **colon** (``:``) separator, instead of the dot separator
|
||
(``.``), between the module name and the qualified name.
|
||
|
||
Examples using f-string::
|
||
|
||
>>> import datetime
|
||
>>> f"{datetime.timedelta:N}" # fully qualified name
|
||
'datetime.timedelta'
|
||
>>> f"{datetime.timedelta:#N}" # fully qualified name, colon separator
|
||
'datetime:timedelta'
|
||
|
||
The colon (``:``) separator used by the ``#N`` format eliminates
|
||
guesswork when you want to import the name, see
|
||
``pkgutil.resolve_name()``, ``python -m inspect`` command line
|
||
interface, and ``setuptools`` entry points.
|
||
|
||
|
||
Add formats to PyUnicode_FromFormat()
|
||
-------------------------------------
|
||
|
||
Add the following formats to ``PyUnicode_FromFormat()``:
|
||
|
||
* ``%N`` formats the **fully qualified name** of a **type**
|
||
(``type.__fully_qualified_name__``); **N** stands for type **N**\ ame.
|
||
* ``%T`` formats the type **fully qualified name** of an **object**
|
||
(``type(obj).__fully_qualified_name__``); **T** stands for object
|
||
**T**\ ype.
|
||
* ``%#N`` and ``%#T``: the alternative form uses the **colon** separator
|
||
(``:``), instead of the dot separator (``.``), between the module name
|
||
and the qualified name.
|
||
|
||
For example, the existing code using *tp_name*:
|
||
|
||
.. code-block:: c
|
||
|
||
PyErr_Format(PyExc_TypeError,
|
||
"__format__ must return a str, not %.200s",
|
||
Py_TYPE(result)->tp_name);
|
||
|
||
can be replaced with the ``%T`` format:
|
||
|
||
.. code-block:: c
|
||
|
||
PyErr_Format(PyExc_TypeError,
|
||
"__format__ must return a str, not %T", result);
|
||
|
||
Advantages of the updated code:
|
||
|
||
* Safer C code: avoid ``Py_TYPE()`` which returns a borrowed reference.
|
||
* The ``PyTypeObject.tp_name`` member is no longer read explicitly: the
|
||
code becomes compatible with the limited C API.
|
||
* The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded
|
||
from UTF-8 at each ``PyErr_Format()`` call, since
|
||
``type.__fully_qualified_name__`` is already a Unicode string.
|
||
* The formatted type name no longer depends on the type implementation.
|
||
* The type name is no longer truncated.
|
||
|
||
Note: The ``%T`` format is used by ``time.strftime()``, but not by
|
||
``printf()``.
|
||
|
||
|
||
Formats Summary
|
||
---------------
|
||
|
||
.. list-table::
|
||
:header-rows: 1
|
||
|
||
* - C object
|
||
- C type
|
||
- Python
|
||
- Format
|
||
* - ``%T``
|
||
- ``%N``
|
||
- ``:N``
|
||
- Type **fully qualified** name.
|
||
* - ``%#T``
|
||
- ``%#N``
|
||
- ``:#N``
|
||
- Type **fully qualified** name, **colon** separator.
|
||
|
||
Add PyType_GetModuleName() function
|
||
-----------------------------------
|
||
|
||
Add the ``PyType_GetModuleName()`` function to get the module name of a
|
||
type (``type.__module__``). API:
|
||
|
||
.. code-block:: c
|
||
|
||
PyObject* PyType_GetModuleName(PyTypeObject *type)
|
||
|
||
On success, return a new reference to the string. On error, raise an
|
||
exception and return ``NULL``.
|
||
|
||
|
||
Add PyType_GetFullyQualifiedName() function
|
||
-------------------------------------------
|
||
|
||
Add the ``PyType_GetFullyQualifiedName()`` function to get the fully
|
||
qualified name of a type (``type.__fully_qualified_name__``). API:
|
||
|
||
.. code-block:: c
|
||
|
||
PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)
|
||
|
||
On success, return a new reference to the string. On error, raise an
|
||
exception and return ``NULL``.
|
||
|
||
|
||
Recommend using the type fully qualified name
|
||
---------------------------------------------
|
||
|
||
The type fully qualified name is recommended in error messages and in
|
||
``__repr__()`` methods in new code.
|
||
|
||
In non-trivial applications, it is likely to have two types with the
|
||
same short name defined in two different modules, especially with
|
||
generic names. Using the fully qualified name helps identifying the type
|
||
in an unambiguous way.
|
||
|
||
|
||
Recommend not truncating type names
|
||
-----------------------------------
|
||
|
||
Type names should not be truncated in new code. For example, the
|
||
``%.100s`` format should be avoided: use the ``%s`` format instead (or
|
||
``%T`` format in C).
|
||
|
||
|
||
Implementation
|
||
==============
|
||
|
||
* Pull request: `Add type.__fully_qualified_name__ attribute <https://github.com/python/cpython/pull/112133>`_.
|
||
* Pull request: `Add %T format to PyUnicode_FromFormat() <https://github.com/python/cpython/pull/111703>`_.
|
||
|
||
|
||
Backwards Compatibility
|
||
=======================
|
||
|
||
Changes proposed in this PEP are backward compatible.
|
||
|
||
Adding new APIs has no effect on the backward compatibility. Existing
|
||
APIs are left unchanged.
|
||
|
||
Replacing the type short name with the type fully qualified name is only
|
||
recommended in new code. No longer truncating type names is only
|
||
recommended in new code. Existing code should be left unchanged and so
|
||
remains backward compatible.
|
||
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Change str(type)
|
||
----------------
|
||
|
||
The ``type.__str__()`` method can be modified to format a type name
|
||
differently. For example, it can return the type fully qualified name.
|
||
|
||
The problem is that it's a backward incompatible change. For example,
|
||
``enum``, ``functools``, ``optparse``, ``pdb`` and ``xmlrpc.server``
|
||
modules of the standard library have to be updated.
|
||
``test_dataclasses``, ``test_descrtut`` and ``test_cmd_line_script``
|
||
tests have to be updated as well.
|
||
|
||
See the `pull request: type(str) returns the fully qualified name
|
||
<https://github.com/python/cpython/pull/112129>`_.
|
||
|
||
|
||
Add !t formatter to get an object type
|
||
--------------------------------------
|
||
|
||
Use ``f"{obj!t:T}"`` to format ``type(obj).__fully_qualified_name__``,
|
||
similar to ``f"{type(obj):T}"``.
|
||
|
||
When the ``!t`` formatter was proposed in 2018, `Eric Smith was stronly
|
||
opposed to this
|
||
<https://mail.python.org/archives/list/python-dev@python.org/message/BMIW3FEB77OS7OB3YYUUDUBITPWLRG3U/>`_;
|
||
Eric is the author of the f-string :pep:`498` "Literal String Interpolation".
|
||
|
||
|
||
Add formats to str % args
|
||
-------------------------
|
||
|
||
It was proposed to add formats to format a type name in ``str % arg``.
|
||
For example, add the ``%T`` format to format a type fully qualified
|
||
name.
|
||
|
||
Nowadays, f-strings are preferred for new code.
|
||
|
||
|
||
Other ways to format type names in C
|
||
------------------------------------
|
||
|
||
The ``printf()`` function supports multiple size modifiers: ``hh``
|
||
(``char``), ``h`` (``short``), ``l`` (``long``), ``ll`` (``long long``),
|
||
``z`` (``size_t``), ``t`` (``ptrdiff_t``) and ``j`` (``intmax_t``).
|
||
The ``PyUnicode_FromFormat()`` function supports most of them.
|
||
|
||
Proposed formats using ``h`` and ``hh`` length modifiers:
|
||
|
||
* ``%hhT`` formats ``type.__name__``.
|
||
* ``%hT`` formats ``type.__qualname__``.
|
||
* ``%T`` formats ``type.__fully_qualified_name__``.
|
||
|
||
Length modifiers are used to specify the C type of the argument, not to
|
||
change how an argument is formatted. The alternate form (``#``) changes
|
||
how an argument is formatted. Here the argument C type is always
|
||
``PyObject*``.
|
||
|
||
Other proposed formats:
|
||
|
||
* ``%Q``
|
||
* ``%t``.
|
||
* ``%lT`` formats ``type.__fully_qualified_name__``.
|
||
* ``%Tn`` formats ``type.__name__``.
|
||
* ``%Tq`` formats ``type.__qualname__``.
|
||
* ``%Tf`` formats ``type.__fully_qualified_name__``.
|
||
|
||
Having more options to format type names can lead to inconsistencies
|
||
between different modules and make the API more error prone.
|
||
|
||
About the ``%t`` format, ``printf()`` now uses ``t`` as a length
|
||
modifier for ``ptrdiff_t`` argument.
|
||
|
||
The following APIs to be used to format a type:
|
||
|
||
.. list-table::
|
||
:header-rows: 1
|
||
|
||
* - C API
|
||
- Python API
|
||
- Format
|
||
* - ``PyType_GetName()``
|
||
- ``type.__name__``
|
||
- Type **short** name.
|
||
* - ``PyType_GetQualName()``
|
||
- ``type.__qualname__``
|
||
- Type **qualified** name.
|
||
* - ``PyType_GetModuleName()``
|
||
- ``type.__module__``
|
||
- Type **module** name.
|
||
|
||
|
||
Use %T format with Py_TYPE(): pass a type
|
||
-----------------------------------------
|
||
|
||
It was proposed to pass a type to the ``%T`` format, like:
|
||
|
||
.. code-block:: c
|
||
|
||
PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));
|
||
|
||
The ``Py_TYPE()`` functions returns a borrowed reference. Just to format
|
||
an error, using a borrowed reference to a type looks safe. In practice,
|
||
it can lead to crash. Example::
|
||
|
||
import gc
|
||
import my_cext
|
||
|
||
class ClassA:
|
||
pass
|
||
|
||
def create_object():
|
||
class ClassB:
|
||
def __repr__(self):
|
||
self.__class__ = ClassA
|
||
gc.collect()
|
||
return "ClassB repr"
|
||
return ClassB()
|
||
|
||
obj = create_object()
|
||
my_cext.func(obj)
|
||
|
||
where ``my_cext.func()`` is a C function which calls::
|
||
|
||
PyErr_Format(PyExc_ValueError,
|
||
"Unexpected value %R of type %T",
|
||
obj, Py_TYPE(obj));
|
||
|
||
``PyErr_Format()`` is called with a borrowed reference to ``ClassB``.
|
||
When ``repr(obj)`` is called by the ``%R`` format, the last reference to
|
||
``ClassB`` is removed and the class is deallocated. When the ``%T``
|
||
format is proceed, ``Py_TYPE(obj)`` is already a dangling pointer and
|
||
Python does crash.
|
||
|
||
|
||
Other proposed APIs to get a type fully qualified name
|
||
------------------------------------------------------
|
||
|
||
* Add ``type.__fullyqualname__`` attribute: name without underscore
|
||
between words. Several dunders, including some of the most recently
|
||
added ones, include an underscore in the word:
|
||
``__class_getitem__``, ``__release_buffer__``, ``__type_params__``,
|
||
``__init_subclass__`` and ``__text_signature__``.
|
||
* Add ``type.__fqn__`` attribute: FQN name stands for **F**\ ully
|
||
**Q**\ ualified **N**\ ame.
|
||
* Add ``type.fully_qualified_name()`` method. Methods added to ``type``
|
||
are inherited by all types and so can affect existing code.
|
||
* Add a function to the ``inspect`` module. Need to import the
|
||
``inspect`` module to use it.
|
||
|
||
|
||
Include the __main__ module in the type fully qualified name
|
||
------------------------------------------------------------
|
||
|
||
Format ``type.__fully_qualified_name__`` as
|
||
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
|
||
``type.__module__`` is not a string or is equal to ``"builtins"``. Do
|
||
not treat the ``__main__`` module differently: include it in the name.
|
||
|
||
Existing code such as ``type.__repr__()``, ``collections.abc`` and
|
||
``unittest`` modules format a type name with
|
||
``f'{obj.__module__}.{obj.__qualname__}'`` and only omit the module part
|
||
if the module is equal to ``builtins``.
|
||
|
||
Only the ``traceback`` and ``pdb`` modules also omit the module if it's
|
||
equal to ``"builtins"`` or ``"__main__"``.
|
||
|
||
The ``type.__fully_qualified_name__`` attribute omits the ``__main__``
|
||
module to produce shorter names for a common case: types defined in a
|
||
script run with ``python script.py``. For debugging, the ``repr()``
|
||
function can be used on a type, it includes the ``__main__`` module in
|
||
the type name. Or use ``f"{type.__module__}.{type.__qualname__}"``
|
||
format to always include the module name, even for the ``"builtins"``
|
||
module.
|
||
|
||
Example of script::
|
||
|
||
class MyType:
|
||
pass
|
||
|
||
print(f"name: {MyType.__fully_qualified_name__}")
|
||
print(f"repr: {repr(MyType)}")
|
||
|
||
Output::
|
||
|
||
name: MyType
|
||
repr: <class '__main__.MyType'>
|
||
|
||
|
||
Discussions
|
||
===========
|
||
|
||
* Discourse: `PEP 737 – Unify type name formatting
|
||
<https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`_
|
||
(2023).
|
||
* Discourse: `Enhance type name formatting when raising an exception:
|
||
add %T format in C, and add type.__fullyqualname__
|
||
<https://discuss.python.org/t/enhance-type-name-formatting-when-raising-an-exception-add-t-format-in-c-and-add-type-fullyqualname/38129>`_
|
||
(2023).
|
||
* Issue: `PyUnicode_FromFormat(): Add %T format to format the type name
|
||
of an object <https://github.com/python/cpython/issues/111696>`_
|
||
(2023).
|
||
* Issue: `C API: Investigate how the PyTypeObject members can be removed
|
||
from the public C API
|
||
<https://github.com/python/cpython/issues/105970>`_ (2023).
|
||
* python-dev thread: `bpo-34595: How to format a type name?
|
||
<https://mail.python.org/archives/list/python-dev@python.org/thread/HKYUMTVHNBVB5LJNRMZ7TPUQKGKAERCJ/>`_
|
||
(2018).
|
||
* Issue: `PyUnicode_FromFormat(): add %T format for an object type name
|
||
<https://github.com/python/cpython/issues/78776>`_ (2018).
|
||
* Issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary limit of
|
||
500 bytes is outdated
|
||
<https://github.com/python/cpython/issues/55042>`__ (2011).
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document is placed in the public domain or under the
|
||
CC0-1.0-Universal license, whichever is more permissive.
|