PEP 737: Add %N format, recommend fully qualified name (#3560)

* Add %N and %#N formats.
* The %T and %#T formats now expect an object instead of a type.
* Exchange %T and %#T formats: %T now formats the fully qualified
  name.
* Recommend using the type fully qualified name in error messages and
  in __repr__() methods in new code.
* Skip the __main__ module in the fully qualified name.
Victor Stinner 2023-12-05 12:15:09 +01:00 committed by GitHub
parent 15dbd2632c
commit 99825461f8
1 changed files with 140 additions and 66 deletions

@ -14,8 +14,14 @@ Abstract
Add new convenient APIs to format type names the same way in Python and
in C. No longer format type names differently depending on how types are
implemented. Also, put an end to truncating type names in C. The new C
API is compatible with the limited C API.
implemented. No longer truncate type names in the standard library.
Recommend using the type fully qualified name in error messages and in
``__repr__()`` methods in new code.
Make C code safer by avoiding borrowed references, which can lead to
crashes. The new C API is compatible with the limited C API.
Rationale
=========
@ -41,7 +47,7 @@ Example with the ``datetime.timedelta`` type:
Python code
^^^^^^^^^^^
In Python, ``type.__name__`` gets the type "short name", whereas
In Python, ``type.__name__`` gets the type short name, whereas
``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully
qualified name". Usually, ``type(obj)`` or ``obj.__class__`` are used to
get the type of the object *obj*. Sometimes, the type name is put
@ -67,11 +73,14 @@ In C, the most common way to format a type name is to get the
PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
Py_TYPE(globals)->tp_name);
The type qualified name (``type.__qualname__``) is only used at a single
place, by the ``type.__repr__()`` implementation. Using
``Py_TYPE(obj)->tp_name`` is more convenient than calling
``PyType_GetQualName()`` which requires ``Py_DECREF()``. Moreover,
``PyType_GetQualName()`` was only added recently, in Python 3.11.
The type "fully qualified name" is used in a few places:
``PyErr_Display()``, ``type.__repr__()`` implementation, and
``sys.unraisablehook`` implementation.
Using ``Py_TYPE(obj)->tp_name`` is preferred since it is more convenient
than calling ``PyType_GetQualName()`` which requires ``Py_DECREF()``.
Moreover, ``PyType_GetQualName()`` was only added recently, in Python
3.11.
Some functions use ``%R`` (``repr(type)``) to format a type name, the
output contains the type fully qualified name. Example:
@ -163,29 +172,48 @@ Specification
=============
* Add ``type.__fully_qualified_name__`` attribute.
* Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()``.
* Add ``%T``, ``%#T``, ``%N``, ``%#N`` formats to
``PyUnicode_FromFormat()``.
* Add ``PyType_GetFullyQualifiedName()`` function.
* Recommend using the type fully qualified name in error messages and
in ``__repr__()`` methods in new code.
* Recommend not truncating type names.
Python API
----------
Add ``type.__fully_qualified_name__`` read-only attribute, the fully
qualified name of a type: similar to
``f"{type.__module__}.{type.__qualname__}"`` or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"``.
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"`` or is
equal to ``"__main__"``.
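The rule above can be approximated in pure Python. This is only a sketch: ``fully_qualified_name()`` is a hypothetical helper written for illustration, not the proposed attribute itself, and it assumes the three exclusions listed in the new text (non-string module, ``"builtins"``, ``"__main__"``).

```python
import datetime


def fully_qualified_name(cls):
    """Hypothetical helper approximating the proposed attribute."""
    module = cls.__module__
    # Skip the module part when it is not a string, or when it is
    # "builtins" or "__main__", as described in this revision.
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


# A type created as if it were defined in a script (module "__main__").
ScriptType = type("ScriptType", (), {"__module__": "__main__"})

print(fully_qualified_name(int))                 # int
print(fully_qualified_name(datetime.timedelta))  # datetime.timedelta
print(fully_qualified_name(ScriptType))          # ScriptType
```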
``type.__repr__()`` is left unchanged: it only omits the module if
the module is equal to ``"builtins"``. It includes the module if the
module is equal to ``"__main__"``. Pseudo-code::
def type_repr(cls):
    if isinstance(cls.__module__, str) and cls.__module__ != "builtins":
        name = f"{cls.__module__}.{cls.__qualname__}"
    else:
        name = cls.__qualname__
    return f"<class '{name}'>"
Add PyUnicode_FromFormat() formats
----------------------------------
Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()`` to format
a type name:
Add formats to ``PyUnicode_FromFormat()``:
* ``%T`` formats the type "short name" (``type.__name__``).
* ``%#T`` formats the type "fully qualified name"
(``type.__fully_qualified_name__``).
Both formats expect a type as argument.
* ``%T`` formats the type fully qualified name of an **object**:
similar to ``type(obj).__fully_qualified_name__``.
* ``%#T`` formats the type short name of an **object**:
similar to ``type(obj).__name__``.
* ``%N`` formats the fully qualified name of a **type**:
similar to ``type.__fully_qualified_name__``.
* ``%#N`` formats the short name of a **type**:
similar to ``type.__name__``.
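To make the difference between the four formats concrete, the following Python sketch mimics what each one would produce according to the list above. It is not the C implementation: ``FORMAT_EMULATION`` and ``_fqn()`` are hypothetical names used only for illustration, and the behavior follows this revision of the text.

```python
import datetime


def _fqn(cls):
    # Hypothetical stand-in for the proposed fully qualified name rule.
    module = cls.__module__
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


# %T and %#T take an object; %N and %#N take a type.
FORMAT_EMULATION = {
    "%T": lambda obj: _fqn(type(obj)),
    "%#T": lambda obj: type(obj).__name__,
    "%N": lambda cls: _fqn(cls),
    "%#N": lambda cls: cls.__name__,
}

delta = datetime.timedelta(days=1)
print(FORMAT_EMULATION["%T"](delta))                # datetime.timedelta
print(FORMAT_EMULATION["%#T"](delta))               # timedelta
print(FORMAT_EMULATION["%N"](datetime.timedelta))   # datetime.timedelta
print(FORMAT_EMULATION["%#N"](datetime.timedelta))  # timedelta
```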
The hash character (``#``) in the format string stands for
`alternative format
@ -209,11 +237,11 @@ can be replaced with the ``%T`` format:
.. code-block:: c
PyErr_Format(PyExc_TypeError,
"__format__ must return a str, not %T",
Py_TYPE(result));
"__format__ must return a str, not %T", result);
Advantages of the updated code:
* Safer C code: avoid ``Py_TYPE()`` which returns a borrowed reference.
* The ``PyTypeObject.tp_name`` member is no longer read explicitly: the
code becomes compatible with the limited C API.
* The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded
@ -221,6 +249,7 @@ Advantages of the updated code:
``type.__fully_qualified_name__`` is already a Unicode string.
* The type name is no longer truncated.
Add PyType_GetFullyQualifiedName() function
-------------------------------------------
@ -235,6 +264,18 @@ On success, return a new reference to the string. On error, raise an
exception and return ``NULL``.
Recommend using the type fully qualified name
---------------------------------------------
The type fully qualified name is recommended in error messages and in
``__repr__()`` methods in new code.
In non-trivial applications, two types with the same short name are
likely to be defined in two different modules, especially with generic
names. Using the fully qualified name helps identify the type
unambiguously.
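A small illustration of the ambiguity, as a sketch: the module names ``graph.model`` and ``xml.tree`` and the helper ``describe()`` are made up for this example.

```python
# Two unrelated types that share the generic short name "Node".
# The module names are hypothetical and set by hand for illustration.
GraphNode = type("Node", (), {"__module__": "graph.model"})
XmlNode = type("Node", (), {"__module__": "xml.tree"})


def describe(cls):
    # Module plus qualified name: enough to tell the two types apart.
    return f"{cls.__module__}.{cls.__qualname__}"


# The short names collide, so an error message using them is ambiguous.
print(GraphNode.__name__, XmlNode.__name__)    # Node Node
# The fully qualified names do not collide.
print(describe(GraphNode), describe(XmlNode))  # graph.model.Node xml.tree.Node
```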
Recommend not truncating type names
-----------------------------------
@ -242,6 +283,9 @@ Type names must not be truncated. For example, the ``%.100s`` format
should be avoided: use the ``%s`` format instead (or ``%T`` and ``%#T``
formats in C).
Code in the standard library is updated to no longer truncate type
names.
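The effect of a precision such as ``%.100s`` can be reproduced with Python's own ``%`` formatting, which supports the same precision syntax for strings. The long name below is invented for the example.

```python
# A made-up type name longer than 100 characters (141 characters).
long_name = "pkg.module." + "VeryLongGeneratedClassName" * 5

truncated = "must be a dict, not %.100s" % long_name
complete = "must be a dict, not %s" % long_name

# The precision silently drops everything after the first 100 characters.
print(len(long_name))              # 141
print(truncated.endswith("Name"))  # False: the tail of the name is lost
print(complete.endswith("Name"))   # True
```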
Implementation
==============
@ -253,8 +297,18 @@ Implementation
Backwards Compatibility
=======================
Only new APIs are added. No existing API is modified. Changes are fully
backward compatible.
Changes proposed in this PEP are backward compatible.
Adding new APIs has no effect on the backward compatibility. Existing
APIs are left unchanged.
Replacing the type short name with the type fully qualified name is only
recommended in new code. Existing code should be left
unchanged and so remains backward compatible.
In the standard library, type names are no longer truncated. We believe
that no code should be affected in practice, since type names longer
than 100 characters are rare.
Rejected Ideas
@ -332,13 +386,6 @@ can be formatted as ``f"{type.__module__}:{type.__qualname__}"``, or
In the standard library, no code formats a type fully qualified name
this way.
It is already tricky to get a type from its qualified name. The type
qualified name already uses the dot (``.``) separator between different
parts: class name, ``<locals>``, nested class name, etc.
The colon separator is not consistent with dot separator used in a
module fully qualified name (``module.__name__``).
Other ways to format type names in C
------------------------------------
@ -378,35 +425,47 @@ modifier for ``ptrdiff_t`` argument.
can be used in C to format a type qualified name.
Omit Py_TYPE() with %T format: pass an object
-----------------------------------------------
Use %T format with Py_TYPE(): pass a type
-----------------------------------------
It was proposed to format a type name of an object, like:
It was proposed to pass a type to the ``%T`` format, like:
.. code-block:: c
PyErr_Format(PyExc_TypeError, "type name: %T", obj);
PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));
The intent is to avoid ``Py_TYPE()`` which returns a borrowed reference
to the type. Using a borrowed reference can cause a bug or crash if the
type is finalized or deallocated while being used.
The ``Py_TYPE()`` function returns a borrowed reference. Using a
borrowed reference to a type just to format an error looks safe. In
practice, it can lead to a crash. Example::
In practice, it's unlikely that a type is finalized while the error
message is formatted. Instances of static types cannot have their type
deallocated: static types are never deallocated. Since Python 3.8,
instances of heap types hold a strong reference to their type (in
``PyObject.ob_type``) and it's safe to make the assumption that the code
holds a strong reference to the formatted object, so the object type
cannot be deallocated.
import gc
import my_cext
In short, it is safe to use a ``Py_TYPE(obj)`` borrowed reference while
formatting an error message.
class ClassA:
    pass
If the ``%T`` format expects an instance, formatting a type cannot use
the ``%T`` format, whereas it's a common operation in stdlib C
extensions. The ``%T`` format would only cover half of cases (only
instances). If the ``%T`` format takes a type, all cases are covered
(types, and instances using ``Py_TYPE()``).
def create_object():
    class ClassB:
        def __repr__(self):
            self.__class__ = ClassA
            gc.collect()
            return "ClassB repr"
    return ClassB()
obj = create_object()
my_cext.func(obj)
where ``my_cext.func()`` is a C function which calls::
PyErr_Format(PyExc_ValueError,
             "Unexpected value %R of type %T",
             obj, Py_TYPE(obj));
``PyErr_Format()`` is called with a borrowed reference to ``ClassB``.
When ``repr(obj)`` is called by the ``%R`` format, the last reference to
``ClassB`` is removed and the class is deallocated. When the ``%T``
format is processed, ``Py_TYPE(obj)`` is already a dangling pointer and
Python crashes.
Other proposed APIs to get a type fully qualified name
@ -423,26 +482,41 @@ Other proposed APIs to get a type fully qualified name
``inspect`` module to use it.
Omit __main__ module in the type fully qualified name
-----------------------------------------------------
Include the __main__ module in the type fully qualified name
------------------------------------------------------------
The ``pdb`` module formats type fully qualified names in a similar way
to the proposed ``type.__fully_qualified_name__``, but it omits the module
if the module is equal to ``"__main__"``.
Format ``type.__fully_qualified_name__`` as
``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if
``type.__module__`` is not a string or is equal to ``"builtins"``. Do
not treat the ``__main__`` module differently: include it in the name.
The ``unittest`` module and a lot of existing stdlib code format type
fully qualified names the same way as the proposed
``type.__fully_qualified_name__``: they only omit the module if the
module is equal to ``"builtins"``.
Existing code such as ``type.__repr__()``, ``collections.abc`` and
``unittest`` modules format a type name with
``f'{obj.__module__}.{obj.__qualname__}'`` and only omit the module part
if the module is equal to ``builtins``. Only the ``traceback`` and
``pdb`` modules also omit the module if it's equal to ``"builtins"`` or
``"__main__"``.
It's possible to omit the ``"__main__."`` prefix of the ``__main__``
module with::
The ``type.__fully_qualified_name__`` attribute omits the ``__main__``
module to produce shorter names for a common case: types defined in a
script run with ``python script.py``. For debugging, the ``repr()``
function can be used on a type: it includes the ``__main__`` module in
the type name. Alternatively, use the
``f"{type.__module__}.{type.__qualname__}"`` format to always include
the module name, even for the ``"builtins"`` module.
def format_type(cls):
    if cls.__module__ != "__main__":
        return cls.__fully_qualified_name__
    else:
        return cls.__qualname__
Example of script::
class MyType:
    pass
print(f"name: {MyType.__fully_qualified_name__}")
print(f"repr: {repr(MyType)}")
Output::
name: MyType
repr: <class '__main__.MyType'>
Discussions