From 99825461f8662458622f846f5d3b16727a0b2f6d Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Tue, 5 Dec 2023 12:15:09 +0100 Subject: [PATCH] PEP 737: Add %N format, recommend fully qualified name (#3560) * Add %N and %#N formats. * The %T and %#T formats now expect an object instead of a type. * Exchange %T and %#T formats: %T now formats the fully qualified name. * Recommend using the type fully qualified name in error messages and in __repr__() methods in new code. * Skip the __main__ module in the fully qualified name. --- peps/pep-0737.rst | 206 +++++++++++++++++++++++++++++++--------------- 1 file changed, 140 insertions(+), 66 deletions(-) diff --git a/peps/pep-0737.rst b/peps/pep-0737.rst index d3c839ee9..005022c11 100644 --- a/peps/pep-0737.rst +++ b/peps/pep-0737.rst @@ -14,8 +14,14 @@ Abstract Add new convenient APIs to format type names the same way in Python and in C. No longer format type names differently depending on how types are -implemented. Also, put an end to truncating type names in C. The new C -API is compatible with the limited C API. +implemented. No longer truncate type names in the standard library. + +Recommend using the type fully qualified name in error messages and in +``__repr__()`` methods in new code. + +Make C code safer by avoiding borrowed reference which can lead to +crashes. The new C API is compatible with the limited C API. + Rationale ========= @@ -41,7 +47,7 @@ Example with the ``datetime.timedelta`` type: Python code ^^^^^^^^^^^ -In Python, ``type.__name__`` gets the type "short name", whereas +In Python, ``type.__name__`` gets the type short name, whereas ``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully qualified name". Usually, ``type(obj)`` or ``obj.__class__`` are used to get the type of the object *obj*. 
Sometimes, the type name is put @@ -67,11 +73,14 @@ In C, the most common way to format a type name is to get the PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s", Py_TYPE(globals)->tp_name); -The type qualified name (``type.__qualname__``) is only used at a single -place, by the ``type.__repr__()`` implementation. Using -``Py_TYPE(obj)->tp_name`` is more convenient than calling -``PyType_GetQualName()`` which requires ``Py_DECREF()``. Moreover, -``PyType_GetQualName()`` was only added recently, in Python 3.11. +The type "fully qualified name" is used in a few places: +``PyErr_Display()``, ``type.__repr__()`` implementation, and +``sys.unraisablehook`` implementation. + +Using ``Py_TYPE(obj)->tp_name`` is preferred since it is more convenient +than calling ``PyType_GetQualName()`` which requires ``Py_DECREF()``. +Moreover, ``PyType_GetQualName()`` was only added recently, in Python +3.11. Some functions use ``%R`` (``repr(type)``) to format a type name, the output contains the type fully qualified name. Example: @@ -163,29 +172,48 @@ Specification ============= * Add ``type.__fully_qualified_name__`` attribute. -* Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()``. +* Add ``%T``, ``%#T``, ``%N``, ``%#N`` formats to + ``PyUnicode_FromFormat()``. * Add ``PyType_GetFullyQualifiedName()`` function. +* Recommend using the type fully qualified name in error messages and + in ``__repr__()`` methods in new code. * Recommend not truncating type names. + Python API ---------- Add ``type.__fully_qualified_name__`` read-only attribute, the fully qualified name of a type: similar to -``f"{type.__module__}.{type.__qualname__}"`` or ``type.__qualname__`` if -``type.__module__`` is not a string or is equal to ``"builtins"``. +``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if +``type.__module__`` is not a string or is equal to ``"builtins"`` or is +equal to ``"__main__"``. 
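The rule above can be approximated in pure Python. This is a minimal sketch under stated assumptions, not the PEP's implementation: ``fully_qualified_name()`` is a hypothetical helper name introduced here for illustration, whereas the PEP proposes an attribute on ``type``.

```python
import datetime


def fully_qualified_name(cls):
    """Approximate the proposed type.__fully_qualified_name__.

    Hypothetical helper, for illustration only.
    """
    module = cls.__module__
    # Omit the module if it is not a string, or if it is
    # "builtins" or "__main__"; otherwise prefix the qualified name.
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return cls.__qualname__
    return f"{module}.{cls.__qualname__}"


print(fully_qualified_name(int))                 # int
print(fully_qualified_name(datetime.timedelta))  # datetime.timedelta
```

Checking ``isinstance(module, str)`` first matters because ``__module__`` can be set to an arbitrary object on a class, in which case only the qualified name is used.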
+ +The ``type.__repr__()`` is left unchanged: it only omits the module if +the module is equal to ``"builtins"``. It includes the module if the +module is equal to ``"__main__"``. Pseudo-code:: + + def type_repr(cls): + if isinstance(cls.__module__, str) and cls.__module__ != "builtins": + name = f"{cls.__module__}.{cls.__qualname__}" + else: + name = cls.__qualname__ + return f"<class '{name}'>" + Add PyUnicode_FromFormat() formats ---------------------------------- -Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()`` to format -a type name: +Add formats to ``PyUnicode_FromFormat()``: -* ``%T`` formats the type "short name" (``type.__name__``). -* ``%#T`` formats the type "fully qualified name" - (``type.__fully_qualified_name__``). - -Both formats expect a type as argument. +* ``%T`` formats the type fully qualified name of an **object**: + similar to ``type(obj).__fully_qualified_name__``. +* ``%#T`` formats the type short name of an **object**: + similar to ``type(obj).__name__``. +* ``%N`` formats the fully qualified name of a **type**: + similar to ``type.__fully_qualified_name__``. +* ``%#N`` formats the short name of a **type**: + similar to ``type.__name__``. The hash character (``#``) in the format string stands for `alternative format @@ -209,11 +237,11 @@ can be replaced with the ``%T`` format: .. code-block:: c PyErr_Format(PyExc_TypeError, - "__format__ must return a str, not %T", - Py_TYPE(result)); + "__format__ must return a str, not %T", result); Advantages of the updated code: +* Safer C code: avoid ``Py_TYPE()`` which returns a borrowed reference. * The ``PyTypeObject.tp_name`` member is no longer read explicitly: the code becomes compatible with the limited C API. * The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded @@ -221,6 +249,7 @@ Advantages of the updated code: ``type.__fully_qualified_name__`` is already a Unicode string. * The type name is no longer truncated. 
+ Add PyType_GetFullyQualifiedName() function ------------------------------------------- @@ -235,6 +264,18 @@ On success, return a new reference to the string. On error, raise an exception and return ``NULL``. +Recommend using the type fully qualified name +--------------------------------------------- + +The type fully qualified name is recommended in error messages and in +``__repr__()`` methods in new code. + +In non-trivial applications, two types defined in different modules are +likely to share the same short name, especially with +generic names. Using the fully qualified name helps to identify the type +in an unambiguous way. + + Recommend not truncating type names ----------------------------------- @@ -242,6 +283,9 @@ Type names must not be truncated. For example, the ``%.100s`` format should be avoided: use the ``%s`` format instead (or ``%T`` and ``%#T`` formats in C). +Code in the standard library is updated to no longer truncate type +names. + Implementation ============== @@ -253,8 +297,18 @@ Implementation Backwards Compatibility ======================= -Only new APIs are added. No existing API is modified. Changes are fully -backward compatible. +Changes proposed in this PEP are backward compatible. + +Adding new APIs has no effect on backward compatibility. Existing +APIs are left unchanged. + +Replacing the type short name with the type fully qualified name is only +recommended in new code. Existing code should be left +unchanged and so remains backward compatible. + +In the standard library, type names are no longer truncated. We believe +that no code should be affected in practice, since type names longer +than 100 characters are rare. Rejected Ideas @@ -332,13 +386,6 @@ can be formatted as ``f"{type.__module__}:{type.__qualname__}"``, or In the standard library, no code formats a type fully qualified name this way. -It is already tricky to get a type from its qualified name. 
The type -qualified name already uses the dot (``.``) separator between different -parts: class name, ``<locals>``, nested class name, etc. - -The colon separator is not consistent with dot separator used in a -module fully qualified name (``module.__name__``). - Other ways to format type names in C ------------------------------------ @@ -378,35 +425,47 @@ modifier for ``ptrdiff_t`` argument. can be used in C to format a type qualified name. -Omit Py_TYPE() with %T format: pass an object ------------------------------------------------ +Use %T format with Py_TYPE(): pass a type +----------------------------------------- -It was proposed to format a type name of an object, like: +It was proposed to pass a type to the ``%T`` format, like: .. code-block:: c - PyErr_Format(PyExc_TypeError, "type name: %T", obj); + PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj)); -The intent is to avoid ``Py_TYPE()`` which returns a borrowed reference -to the type. Using a borrowed reference can cause a bug or crash if the -type is finalized or deallocated while being used. +The ``Py_TYPE()`` function returns a borrowed reference. Just to format +an error, using a borrowed reference to a type looks safe. In practice, +it can lead to a crash. Example:: -In practice, it's unlikely that a type is finalized while the error -message is formatted. Instances of static types cannot have their type -deallocated: static types are never deallocated. Since Python 3.8, -instances of heap types hold a strong reference to their type (in -``PyObject.ob_type``) and it's safe to make the assumption that the code -holds a strong reference to the formatted object, so the object type -cannot be deallocated. + import gc + import my_cext -In short, it is safe to use a ``Py_TYPE(obj)`` borrowed reference while -formatting an error message. 
+ class ClassA: + pass -If the ``%T`` format expects an instance, formatting a type cannot use -the ``%T`` format, whereas it's a common operation in stdlib C -extensions. The ``%T`` format would only cover half of cases (only -instances). If the ``%T`` format takes a type, all cases are covered -(types, and instances using ``Py_TYPE()``). + def create_object(): + class ClassB: + def __repr__(self): + self.__class__ = ClassA + gc.collect() + return "ClassB repr" + return ClassB() + + obj = create_object() + my_cext.func(obj) + +where ``my_cext.func()`` is a C function which calls:: + + PyErr_Format(PyExc_ValueError, + "Unexpected value %R of type %T", + obj, Py_TYPE(obj)); + +``PyErr_Format()`` is called with a borrowed reference to ``ClassB``. +When ``repr(obj)`` is called by the ``%R`` format, the last reference to +``ClassB`` is removed and the class is deallocated. When the ``%T`` +format is processed, ``Py_TYPE(obj)`` is already a dangling pointer and +Python crashes. Other proposed APIs to get a type fully qualified name @@ -423,26 +482,41 @@ Other proposed APIs to get a type fully qualified name ``inspect`` module to use it. -Omit __main__ module in the type fully qualified name ------------------------------------------------------ +Include the __main__ module in the type fully qualified name +------------------------------------------------------------ -The ``pdb`` module formats a type fully qualified names in a similar way -as the proposed ``type.__fully_qualified_name__``, but it omits the module -if the module is equal to ``"__main__"``. +Format ``type.__fully_qualified_name__`` as +``f"{type.__module__}.{type.__qualname__}"``, or ``type.__qualname__`` if +``type.__module__`` is not a string or is equal to ``"builtins"``. Do +not treat the ``__main__`` module differently: include it in the name. 
-The ``unittest`` module and a lot of existing stdlib code format a type -fully qualified names the same way as the proposed -``type.__fully_qualified_name__``: only omits the module if the module -is equal to ``"builtins"``. +Existing code such as ``type.__repr__()``, ``collections.abc`` and +``unittest`` modules format a type name with +``f'{obj.__module__}.{obj.__qualname__}'`` and only omit the module part +if the module is equal to ``builtins``. Only the ``traceback`` and +``pdb`` modules also omit the module if it's equal to ``"builtins"`` or +``"__main__"``. -It's possible to omit the ``"__main__."`` prefix of the ``__main__`` -module with:: +The ``type.__fully_qualified_name__`` attribute omits the ``__main__`` +module to produce shorter names for a common case: types defined in a +script run with ``python script.py``. For debugging, the ``repr()`` +function can be used on a type: it includes the ``__main__`` module in +the type name. Or use ``f"{type.__module__}.{type.__qualname__}"`` +format to always include the module name, even for the ``"builtins"`` +module. - def format_type(cls): - if cls.__module__ != "__main"__: - return cls.__fully_qualified_name__ - else: - return cls.__qualname__ +Example of script:: + + class MyType: + pass + + print(f"name: {MyType.__fully_qualified_name__}") + print(f"repr: {repr(MyType)}") + +Output:: + + name: MyType + repr: <class '__main__.MyType'> Discussions