PEP: 737
Title: Unify type name formatting
Author: Victor Stinner <vstinner@python.org>
Discussions-To: https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872
Status: Draft
Type: Standards Track
Created: 29-Nov-2023
Python-Version: 3.13
Post-History: `29-Nov-2023 <https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`__


Abstract
========

Add new convenient APIs to format type names the same way in Python and
in C. No longer format type names differently depending on how types are
implemented. No longer truncate type names in the standard library.

Recommend using the type fully qualified name in error messages and in
``__repr__()`` methods in new code.

Make C code safer by avoiding borrowed references, which can lead to
crashes. The new C API is compatible with the limited C API.


Rationale
=========

Standard library
----------------

In the Python standard library, formatting a type name or the type name
of an object is a common operation when formatting an error message or
implementing a ``__repr__()`` method. There are different ways to format
a type name, and they give different outputs.

Example with the ``datetime.timedelta`` type:

* The type short name (``type.__name__``) and the type qualified name
  (``type.__qualname__``) are ``'timedelta'``.
* The type module (``type.__module__``) is ``'datetime'``.
* The type fully qualified name is ``'datetime.timedelta'``.
* The type representation (``repr(type)``) contains the fully qualified
  name: ``<class 'datetime.timedelta'>``.


Python code
^^^^^^^^^^^

In Python, ``type.__name__`` gets the type short name, whereas
``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully
qualified name". Usually, ``type(obj)`` or ``obj.__class__`` are used to
get the type of the object *obj*. Sometimes, the type name is put
between quotes.

Examples:

* ``raise TypeError("str expected, not %s" % type(value).__name__)``
* ``raise TypeError("can't serialize %s" % self.__class__.__name__)``
* ``name = "%s.%s" % (obj.__module__, obj.__qualname__)``

Qualified names were added to types (``type.__qualname__``) in Python
3.3 by :pep:`3155` "Qualified name for classes and functions".

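The difference between the short name, the qualified name and the dotted
module-plus-qualified-name form is easiest to see on a nested class. The
following is a minimal sketch: ``Outer``, ``Inner`` and ``dotted_name()``
are hypothetical names introduced only for illustration, and the comments
assume the code runs at module level.

```python
import datetime


class Outer:
    class Inner:
        pass


def dotted_name(cls):
    """Join the module and the qualified name, as in the third example."""
    return "%s.%s" % (cls.__module__, cls.__qualname__)


# The short name drops the enclosing class, the qualified name keeps it.
short_name = Outer.Inner.__name__          # "Inner"
qualified_name = Outer.Inner.__qualname__  # "Outer.Inner" at module level

# For a top-level class, the short and qualified names are the same.
timedelta_name = dotted_name(datetime.timedelta)  # "datetime.timedelta"
```

The module prefix of ``dotted_name(Outer.Inner)`` depends on where the
code is run, which is why it is not shown here.
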
C code
^^^^^^

In C, the most common way to format a type name is to get the
``PyTypeObject.tp_name`` member of the type. Example:

.. code-block:: c

    PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
                 Py_TYPE(globals)->tp_name);

The type "fully qualified name" is used in a few places:
``PyErr_Display()``, ``type.__repr__()`` implementation, and
``sys.unraisablehook`` implementation.

Using ``Py_TYPE(obj)->tp_name`` is preferred since it is more convenient
than calling ``PyType_GetQualName()``, which returns a new reference
that has to be released with ``Py_DECREF()``. Moreover,
``PyType_GetQualName()`` was only added recently, in Python 3.11.

Some functions use ``%R`` (``repr(type)``) to format a type name; the
output contains the type fully qualified name. Example:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "calling %R should have returned an instance "
                 "of BaseException, not %R",
                 type, Py_TYPE(value));


Using PyTypeObject.tp_name is inconsistent with Python
------------------------------------------------------

The ``PyTypeObject.tp_name`` member is different depending on the type
implementation:

* Static types and heap types in C: *tp_name* is the type fully
  qualified name.
* Python class: *tp_name* is the type short name (``type.__name__``).

So using ``Py_TYPE(obj)->tp_name`` to format an object type name gives
a different output depending on whether the type is implemented in C or
in Python.

This goes against the principles of :pep:`399` "Pure Python/C Accelerator
Module Compatibility Requirements", which recommends that code behave
the same way whether it is written in Python or in C.

Example:

.. code-block:: pycon

    $ python3.12
    >>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
    >>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
    >>> my_list = list(range(3))

    >>> my_list[c_obj]   # C type
    TypeError: list indices must be integers or slices, not datetime.date

    >>> my_list[py_obj]  # Python type
    TypeError: list indices must be integers or slices, not date

The error message contains the type fully qualified name
(``datetime.date``) if the type is implemented in C, or the type short
name (``date``) if the type is implemented in Python.


Limited C API
-------------

The ``Py_TYPE(obj)->tp_name`` code cannot be used with the limited C
API, since the ``PyTypeObject`` members are excluded from the limited C
API.

The type name should be read using the ``PyType_GetName()``,
``PyType_GetQualName()`` and ``PyType_GetModule()`` functions, which are
less convenient to use.


Truncating type names in C
--------------------------

In 1998, when the ``PyErr_Format()`` function was added, the
implementation used a fixed buffer of 500 bytes. The function had the
following comment:

.. code-block:: c

    /* Caller is responsible for limiting the format */

In 2001, the function was modified to allocate a dynamic buffer on the
heap. By then, the practice of truncating type names, such as with the
``%.100s`` format, had already become a habit, and developers forgot why
type names are truncated. In Python, type names are not truncated.

Truncating type names in C but not in Python goes against the principles
of :pep:`399` "Pure Python/C Accelerator Module Compatibility
Requirements", which recommends that code behave the same way whether it
is written in Python or in C.

See the issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary
limit of 500 bytes is outdated
<https://github.com/python/cpython/issues/55042>`__ (2011).


Specification
=============

* Add ``type.__fully_qualified_name__`` attribute.
* Add ``%T``, ``%#T``, ``%N``, ``%#N`` formats to
  ``PyUnicode_FromFormat()``.
* Add ``PyType_GetFullyQualifiedName()`` function.
* Recommend using the type fully qualified name in error messages and
  in ``__repr__()`` methods in new code.
* Recommend not truncating type names.


Python API
----------

Add ``type.__fully_qualified_name__`` read-only attribute, the fully
qualified name of a type: similar to
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"`` or is
equal to ``"__main__"``.

The ``type.__repr__()`` is left unchanged: it only omits the module if
the module is equal to ``"builtins"``. It includes the module if the
module is equal to ``"__main__"``. Pseudo-code::

    def type_repr(cls):
        # Only the "builtins" module is omitted; "__main__" is kept.
        if isinstance(cls.__module__, str) and cls.__module__ != "builtins":
            name = f"{cls.__module__}.{cls.__qualname__}"
        else:
            name = cls.__qualname__
        return f"<class '{name}'>"


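The rule for the proposed attribute can be sketched in pure Python. This
is only an illustration of the rule as worded in this section, not the
implementation: ``fully_qualified_name()`` and ``ScriptType`` are
hypothetical names, and ``ScriptType`` merely simulates a class defined
in a script by setting its ``__module__`` to ``"__main__"``.

```python
import datetime


def fully_qualified_name(cls):
    """Hypothetical pure-Python equivalent of the proposed attribute."""
    module = cls.__module__
    # Omit the module if it is not a string, or if it is one of the two
    # special-cased modules ("builtins" and "__main__").
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


# A class that reports "__main__" as its module, as in a script.
ScriptType = type("ScriptType", (), {"__module__": "__main__"})

names = [
    fully_qualified_name(int),                 # "int"
    fully_qualified_name(datetime.timedelta),  # "datetime.timedelta"
    fully_qualified_name(ScriptType),          # "ScriptType"
]
```

Unlike ``repr(ScriptType)``, which keeps the ``__main__`` prefix, the
helper returns the bare qualified name for that class.
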
Add PyUnicode_FromFormat() formats
----------------------------------

Add formats to ``PyUnicode_FromFormat()``:

* ``%T`` formats the type fully qualified name of an **object**:
  similar to ``type(obj).__fully_qualified_name__``.
* ``%#T`` formats the type short name of an **object**:
  similar to ``type(obj).__name__``.
* ``%N`` formats the fully qualified name of a **type**:
  similar to ``type.__fully_qualified_name__``.
* ``%#N`` formats the short name of a **type**:
  similar to ``type.__name__``.

The hash character (``#``) in the format string stands for
`alternative format
<https://docs.python.org/3/library/string.html#format-specification-mini-language>`_.
For example, ``f"{123:x}"`` returns ``'7b'`` and ``f"{123:#x}"`` returns
``'0x7b'`` (``#`` adds ``'0x'`` prefix).

The ``%T`` format is used by ``time.strftime()``, but it's not used by
``printf()``.

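The mapping between the four formats and their Python equivalents can be
sketched as follows. This is an illustration only: the real formats are
handled in C by ``PyUnicode_FromFormat()``, and both
``fully_qualified_name()`` and ``format_type_name()`` are hypothetical
helpers written for this example.

```python
import datetime


def fully_qualified_name(cls):
    """Hypothetical stand-in for ``type.__fully_qualified_name__``."""
    module = cls.__module__
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


def format_type_name(spec, arg):
    """Emulate the four proposed formats (illustrative helper only)."""
    if spec == "%T":    # object -> fully qualified name of its type
        return fully_qualified_name(type(arg))
    if spec == "%#T":   # object -> short name of its type
        return type(arg).__name__
    if spec == "%N":    # type -> fully qualified name
        return fully_qualified_name(arg)
    if spec == "%#N":   # type -> short name
        return arg.__name__
    raise ValueError(f"unsupported format: {spec!r}")


delta = datetime.timedelta(days=1)
results = {
    "%T": format_type_name("%T", delta),                 # takes an object
    "%#T": format_type_name("%#T", delta),               # takes an object
    "%N": format_type_name("%N", datetime.timedelta),    # takes a type
    "%#N": format_type_name("%#N", datetime.timedelta),  # takes a type
}
```

In this sketch, ``%T`` and ``%N`` both give ``datetime.timedelta``, while
``%#T`` and ``%#N`` both give ``timedelta``; the only difference within
each pair is whether the argument is an object or a type.
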
For example, the existing code using *tp_name*:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %.200s",
                 Py_TYPE(result)->tp_name);

can be replaced with the ``%T`` format:

.. code-block:: c

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %T", result);

Advantages of the updated code:

* Safer C code: avoid ``Py_TYPE()``, which returns a borrowed reference.
* The ``PyTypeObject.tp_name`` member is no longer read explicitly: the
  code becomes compatible with the limited C API.
* The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded
  from UTF-8 at each ``PyErr_Format()`` call, since
  ``type.__fully_qualified_name__`` is already a Unicode string.
* The type name is no longer truncated.


Add PyType_GetFullyQualifiedName() function
-------------------------------------------

Add the ``PyType_GetFullyQualifiedName()`` function to get the fully
qualified name of a type (``type.__fully_qualified_name__``). API:

.. code-block:: c

    PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)

On success, return a new reference to the string. On error, raise an
exception and return ``NULL``.


Recommend using the type fully qualified name
---------------------------------------------

The type fully qualified name is recommended in error messages and in
``__repr__()`` methods in new code.

In non-trivial applications, it is likely that two types with the same
short name are defined in two different modules, especially with generic
names. Using the fully qualified name helps identify the type
unambiguously.


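The ambiguity described above can be reproduced with two unrelated
classes that share a short name. The following is a minimal sketch: the
module names ``app.db`` and ``app.web``, the class name ``Config`` and
the two message helpers are all hypothetical.

```python
# Two unrelated classes sharing the short name "Config". The module
# names "app.db" and "app.web" are hypothetical.
DbConfig = type("Config", (), {"__module__": "app.db"})
WebConfig = type("Config", (), {"__module__": "app.web"})


def short_message(obj):
    """Error message built from the type short name."""
    return "unexpected argument type: %s" % type(obj).__name__


def full_message(obj):
    """Error message built from the module and the qualified name."""
    cls = type(obj)
    return "unexpected argument type: %s.%s" % (cls.__module__,
                                                cls.__qualname__)


# Both objects produce the same text with the short name...
short_messages = {short_message(DbConfig()), short_message(WebConfig())}
# ...but two distinct texts once the module is included.
full_messages = {full_message(DbConfig()), full_message(WebConfig())}
```

With the short name, a reader of the error message cannot tell which of
the two ``Config`` classes was involved.
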
Recommend not truncating type names
-----------------------------------

Type names should not be truncated. For example, the ``%.100s`` format
should be avoided: use the ``%s`` format instead (or the ``%T`` and
``%#T`` formats in C).

Code in the standard library is updated to no longer truncate type
names.


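The effect of a precision such as ``%.100s`` can be observed from Python,
where the ``%`` operator honours it as well. This is a minimal sketch
with a deliberately long, hypothetical type name; real type names of
that length are rare.

```python
# Build a type whose name is longer than 100 characters (127 here).
long_name = "Handler" + "X" * 120
LongType = type(long_name, (), {})

prefix = "expected str, not "

# "%.100s" keeps only the first 100 characters of the name.
truncated = (prefix + "%.100s") % LongType.__name__
# "%s" keeps the whole name.
complete = (prefix + "%s") % LongType.__name__
```

Only ``complete`` contains the full type name; ``truncated`` silently
loses the last 27 characters.
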
Implementation
==============

* Pull request: `Add type.__fully_qualified_name__ attribute <https://github.com/python/cpython/pull/112133>`_.
* Pull request: `Add %T format to PyUnicode_FromFormat() <https://github.com/python/cpython/pull/111703>`_.


Backwards Compatibility
=======================

Changes proposed in this PEP are backward compatible.

Adding new APIs has no effect on backward compatibility. Existing
APIs are left unchanged.

Replacing the type short name with the type fully qualified name is only
recommended in new code. Existing code should be left
unchanged and so remains backward compatible.

In the standard library, type names are no longer truncated. We believe
that no code should be affected in practice, since type names longer
than 100 characters are rare.


Rejected Ideas
==============

Change str(type)
----------------

The ``type.__str__()`` method could be modified to format a type name
differently. For example, it could return the type fully qualified name.

The problem is that it's a backward incompatible change. For example,
the ``enum``, ``functools``, ``optparse``, ``pdb`` and ``xmlrpc.server``
modules of the standard library would have to be updated. The
``test_dataclasses``, ``test_descrtut`` and ``test_cmd_line_script``
tests would have to be updated as well.

See the `pull request: type(str) returns the fully qualified name
<https://github.com/python/cpython/pull/112129>`_.


Add formats to type.__format__()
--------------------------------

Examples of proposed formats for ``type.__format__()``:

* ``f"{type(obj):z}"`` formats ``type(obj).__name__``.
* ``f"{type(obj):M.T}"`` formats ``type(obj).__fully_qualified_name__``.
* ``f"{type(obj):M:T}"`` formats ``type(obj).__fully_qualified_name__``
  using colon (``:``) separator.
* ``f"{type(obj):T}"`` formats ``type(obj).__name__``.
* ``f"{type(obj):#T}"`` formats ``type(obj).__fully_qualified_name__``.

Using a short format (such as ``z``, a single letter) requires referring
to the format documentation to understand how a type name is formatted,
whereas ``type(obj).__name__`` is explicit.

The dot character (``.``) is already used for the "precision" in format
strings. The colon character (``:``) is already used to separate the
expression from the format specification. For example, ``f"{3.14:g}"``
uses the ``g`` format, which comes after the colon (``:``). Usually, a
format type is a single letter, such as ``g`` in ``f"{3.14:g}"``, not
``M.T`` or ``M:T``. Reusing dot and colon characters for a different
purpose can be misleading and make the format parser more complicated.

Add !t formatter to get an object type
--------------------------------------

Use ``f"{obj!t:T}"`` to format ``type(obj).__name__``, similar to
``f"{type(obj).__name__}"``.

When the ``!t`` formatter was proposed in 2018, `Eric Smith was opposed
to this
<https://mail.python.org/archives/list/python-dev@python.org/message/BMIW3FEB77OS7OB3YYUUDUBITPWLRG3U/>`_;
Eric is the author of the f-string :pep:`498` "Literal String Interpolation".


Add formats to str % args
-------------------------

It was proposed to add formats, such as ``%T`` and ``%#T``, to format a
type name in ``str % args``.

Nowadays, f-strings are preferred for new code.


Use colon separator in fully qualified name
-------------------------------------------

The colon (``:``) separator eliminates guesswork when you want to import
the name, see ``pkgutil.resolve_name()``. A type fully qualified name
can be formatted as ``f"{type.__module__}:{type.__qualname__}"``, or
``type.__qualname__`` if the type module is ``"builtins"``.

In the standard library, no code formats a type fully qualified name
this way.


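The round trip mentioned above, from a type to a colon-separated name and
back through ``pkgutil.resolve_name()``, can be sketched as follows.
``colon_name()`` is a hypothetical helper written for this example; it
follows the formatting rule given in this section, and
``pkgutil.resolve_name()`` requires Python 3.9 or newer.

```python
import datetime
import pkgutil


def colon_name(cls):
    """Format a type name with the colon separator (illustrative)."""
    if cls.__module__ == "builtins":
        return cls.__qualname__
    return f"{cls.__module__}:{cls.__qualname__}"


name = colon_name(datetime.timedelta)  # "datetime:timedelta"

# With the colon form, the part before ":" is imported as a module and
# the part after it is looked up as attributes, without guessing.
resolved = pkgutil.resolve_name(name)
```

Here ``resolved`` is the ``datetime.timedelta`` class itself.
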
Other ways to format type names in C
------------------------------------

The ``printf()`` function supports multiple size modifiers: ``hh``
(``char``), ``h`` (``short``), ``l`` (``long``), ``ll`` (``long long``),
``z`` (``size_t``), ``t`` (``ptrdiff_t``) and ``j`` (``intmax_t``).
The ``PyUnicode_FromFormat()`` function supports most of them.

Proposed formats using ``h`` and ``hh`` length modifiers:

* ``%hhT`` formats ``type.__name__``.
* ``%hT`` formats ``type.__qualname__``.
* ``%T`` formats ``type.__fully_qualified_name__``.

Length modifiers are used to specify the C type of the argument, not to
change how an argument is formatted. The alternate form (``#``) changes
how an argument is formatted. Here the argument C type is always
``PyObject*``.

Other proposed formats:

* ``%Q``.
* ``%t``.
* ``%lT`` formats ``type.__fully_qualified_name__``.
* ``%Tn`` formats ``type.__name__``.
* ``%Tq`` formats ``type.__qualname__``.
* ``%Tf`` formats ``type.__fully_qualified_name__``.

Having more options to format type names can lead to inconsistencies
between different modules and make the API more error prone.

About the ``%t`` format, ``printf()`` now uses ``t`` as a length
modifier for ``ptrdiff_t`` argument.

``type.__qualname__`` can be used in Python and ``PyType_GetQualName()``
can be used in C to format a type qualified name.


Use %T format with Py_TYPE(): pass a type
-----------------------------------------

It was proposed to pass a type to the ``%T`` format, like:

.. code-block:: c

    PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));

The ``Py_TYPE()`` function returns a borrowed reference. Just to format
an error, using a borrowed reference to a type looks safe. In practice,
it can lead to a crash. Example::

    import gc
    import my_cext

    class ClassA:
        pass

    def create_object():
        class ClassB:
            def __repr__(self):
                self.__class__ = ClassA
                gc.collect()
                return "ClassB repr"
        return ClassB()

    obj = create_object()
    my_cext.func(obj)

where ``my_cext.func()`` is a C function which calls::

    PyErr_Format(PyExc_ValueError,
                 "Unexpected value %R of type %T",
                 obj, Py_TYPE(obj));

``PyErr_Format()`` is called with a borrowed reference to ``ClassB``.
When ``repr(obj)`` is called by the ``%R`` format, the last reference to
``ClassB`` is removed and the class is deallocated. When the ``%T``
format is processed, the pointer obtained earlier from ``Py_TYPE(obj)``
is already dangling and Python crashes.


Other proposed APIs to get a type fully qualified name
------------------------------------------------------

* ``type.__fullyqualname__`` attribute name: attribute without an underscore
  between words. Several dunders, including some of the most recently
  added ones, include an underscore between words: ``__class_getitem__``,
  ``__release_buffer__``, ``__type_params__``, ``__init_subclass__`` and
  ``__text_signature__``.
* ``type.__fqn__`` attribute name, where FQN stands for Fully Qualified
  Name.
* Add a function to the ``inspect`` module. Using it requires importing
  the ``inspect`` module.


Include the __main__ module in the type fully qualified name
------------------------------------------------------------

Format ``type.__fully_qualified_name__`` as
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"``. Do
not treat the ``__main__`` module differently: include it in the name.

Existing code such as ``type.__repr__()``, ``collections.abc`` and
``unittest`` modules format a type name with
``f'{obj.__module__}.{obj.__qualname__}'`` and only omit the module part
if the module is equal to ``builtins``. Only the ``traceback`` and
``pdb`` modules also omit the module if it's equal to ``"builtins"`` or
``"__main__"``.

The ``type.__fully_qualified_name__`` attribute omits the ``__main__``
module to produce shorter names for a common case: types defined in a
script run with ``python script.py``. For debugging, the ``repr()``
function can be used on a type: it includes the ``__main__`` module in
the type name. Alternatively, use the
``f"{type.__module__}.{type.__qualname__}"`` format to always include
the module name, even for the ``"builtins"`` module.

Example of script::

    class MyType:
        pass

    print(f"name: {MyType.__fully_qualified_name__}")
    print(f"repr: {repr(MyType)}")

Output::

    name: MyType
    repr: <class '__main__.MyType'>


Discussions
===========

* Discourse: `PEP 737 – Unify type name formatting
  <https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872>`_
  (2023).
* Discourse: `Enhance type name formatting when raising an exception:
  add %T format in C, and add type.__fullyqualname__
  <https://discuss.python.org/t/enhance-type-name-formatting-when-raising-an-exception-add-t-format-in-c-and-add-type-fullyqualname/38129>`_
  (2023).
* Issue: `PyUnicode_FromFormat(): Add %T format to format the type name
  of an object <https://github.com/python/cpython/issues/111696>`_
  (2023).
* Issue: `C API: Investigate how the PyTypeObject members can be removed
  from the public C API
  <https://github.com/python/cpython/issues/105970>`_ (2023).
* python-dev thread: `bpo-34595: How to format a type name?
  <https://mail.python.org/archives/list/python-dev@python.org/thread/HKYUMTVHNBVB5LJNRMZ7TPUQKGKAERCJ/>`_
  (2018).
* Issue: `PyUnicode_FromFormat(): add %T format for an object type name
  <https://github.com/python/cpython/issues/78776>`_ (2018).
* Issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary limit of
  500 bytes is outdated
  <https://github.com/python/cpython/issues/55042>`__ (2011).


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.