PEP 737: Unify type name formatting (#3554)
This commit is contained in:
parent
1a089b850e
commit
c094b50cc7
|
@ -613,6 +613,7 @@ peps/pep-0732.rst @Mariatta
|
|||
peps/pep-0733.rst @encukou @vstinner @zooba @iritkatriel
|
||||
peps/pep-0734.rst @ericsnowcurrently
|
||||
peps/pep-0735.rst @brettcannon
|
||||
peps/pep-0737.rst @vstinner
|
||||
# ...
|
||||
# peps/pep-0754.rst
|
||||
# ...
|
||||
|
|
|
@ -0,0 +1,473 @@
|
|||
PEP: 737
|
||||
Title: Unify type name formatting
|
||||
Author: Victor Stinner <vstinner@python.org>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 29-Nov-2023
|
||||
Python-Version: 3.13
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
Add new convenient APIs to format type names the same way in Python and
|
||||
in C. No longer format type names differently depending on how types are
|
||||
implemented. Also, put an end to truncating type names in C. The new C
|
||||
API is compatible with the limited C API.
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Standard library
|
||||
----------------
|
||||
|
||||
In the Python standard library, formatting a type name or the type name
|
||||
of an object is a common operation to format an error message and to
|
||||
implement a ``__repr__()`` method. There are different ways to format a
|
||||
type name which give different outputs.
|
||||
|
||||
Example with the ``datetime.timedelta`` type:
|
||||
|
||||
* The type short name (``type.__name__``) and the type qualified name
|
||||
(``type.__qualname__``) are ``'timedelta'``.
|
||||
* The type module (``type.__module__``) is ``'datetime'``.
|
||||
* The type fully qualified name is ``'datetime.timedelta'``.
|
||||
* The type representation (``repr(type)``) contains the fully qualified
|
||||
name: ``<class 'datetime.timedelta'>``.
|
||||
|
||||
|
||||
Python code
|
||||
^^^^^^^^^^^
|
||||
|
||||
In Python, ``type.__name__`` gets the type "short name", whereas
|
||||
``f"{type.__module__}.{type.__qualname__}"`` formats the type "fully
|
||||
qualified name". Usually, ``type(obj)`` or ``obj.__class__`` are used to
|
||||
get the type of the object *obj*. Sometimes, the type name is put
|
||||
between quotes.
|
||||
|
||||
Examples:
|
||||
|
||||
* ``raise TypeError("str expected, not %s" % type(value).__name__)``
|
||||
* ``raise TypeError("can't serialize %s" % self.__class__.__name__)``
|
||||
* ``name = "%s.%s" % (obj.__module__, obj.__qualname__)``
|
||||
|
||||
Qualified names were added to types (``type.__qualname__``) in Python
|
||||
3.3 by :pep:`3155` "Qualified name for classes and functions".
|
||||
|
||||
C code
|
||||
^^^^^^
|
||||
|
||||
In C, the most common way to format a type name is to get the
|
||||
``PyTypeObject.tp_name`` member of the type. Example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
|
||||
Py_TYPE(globals)->tp_name);
|
||||
|
||||
The type qualified name (``type.__qualname__``) is only used at a single
|
||||
place, by the ``type.__repr__()`` implementation. Using
|
||||
``Py_TYPE(obj)->tp_name`` is more convenient than calling
|
||||
``PyType_GetQualName()`` which requires ``Py_DECREF()``. Moreover,
|
||||
``PyType_GetQualName()`` was only added recently, in Python 3.11.
|
||||
|
||||
Some functions use ``%R`` (``repr(type)``) to format a type name, the
|
||||
output contains the type fully qualified name. Example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyErr_Format(PyExc_TypeError,
|
||||
"calling %R should have returned an instance "
|
||||
"of BaseException, not %R",
|
||||
type, Py_TYPE(value));
|
||||
|
||||
|
||||
Using PyTypeObject.tp_name is inconsistent with Python
|
||||
------------------------------------------------------
|
||||
|
||||
The ``PyTypeObject.tp_name`` member is different depending on the type
|
||||
implementation:
|
||||
|
||||
* Static types and heap types in C: *tp_name* is the type fully
|
||||
qualified name.
|
||||
* Python class: *tp_name* is the type short name (``type.__name__``).
|
||||
|
||||
So using ``Py_TYPE(obj)->tp_name`` to format an object type name gives
|
||||
a different output depending if a type is implemented in C or in Python.
|
||||
|
||||
It goes against :pep:`399` "Pure Python/C Accelerator Module
|
||||
Compatibility Requirements" principles which recommends code behaves
|
||||
the same way if written in Python or in C.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block:: pycon
|
||||
|
||||
$ python3.12
|
||||
>>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
|
||||
>>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
|
||||
>>> my_list = list(range(3))
|
||||
|
||||
>>> my_list[c_obj] # C type
|
||||
TypeError: list indices must be integers or slices, not datetime.date
|
||||
|
||||
>>> my_list[py_obj] # Python type
|
||||
TypeError: list indices must be integers or slices, not date
|
||||
|
||||
The error message contains the type fully qualified name
|
||||
(``datetime.date``) if the type is implemented in C, or the type short
|
||||
name (``date``) if the type is implemented in Python.
|
||||
|
||||
|
||||
Limited C API
|
||||
-------------
|
||||
|
||||
The ``Py_TYPE(obj)->tp_name`` code cannot be used with the limited C
|
||||
API, since the ``PyTypeObject`` members are excluded from the limited C
|
||||
API.
|
||||
|
||||
The type name should be read using ``PyType_GetName()``,
|
||||
``PyType_GetQualName()`` and ``PyType_GetModule()`` functions which are
|
||||
less convenient to use.
|
||||
|
||||
|
||||
Truncating type names in C
|
||||
--------------------------
|
||||
|
||||
In 1998, when the ``PyErr_Format()`` function was added, the
|
||||
implementation used a fixed buffer of 500 bytes. The function had the
|
||||
following comment:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* Caller is responsible for limiting the format */
|
||||
|
||||
In 2001, the function was modified to allocate a dynamic buffer on the
|
||||
heap. Too late, the practice of truncating type names, like using the
|
||||
``%.100s`` format, already became a habit, and developers forgot why
|
||||
type names are truncated. In Python, type names are not truncated.
|
||||
|
||||
Truncating type names in C but not in Python goes against :pep:`399`
|
||||
"Pure Python/C Accelerator Module Compatibility Requirements" principles
|
||||
which recommends code behaves the same way if written in Python or in
|
||||
C.
|
||||
|
||||
See the issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary
|
||||
limit of 500 bytes is outdated
|
||||
<https://github.com/python/cpython/issues/55042>`__ (2011).
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
* Add ``type.__fully_qualified_name__`` attribute.
|
||||
* Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()``.
|
||||
* Add ``PyType_GetFullyQualifiedName()`` function.
|
||||
* Recommend not truncating type names.
|
||||
|
||||
Python API
|
||||
----------
|
||||
|
||||
Add ``type.__fully_qualified_name__`` read-only attribute, the fully
|
||||
qualified name of a type: similar to
|
||||
``f"{type.__module__}.{type.__qualname__}"`` or ``type.__qualname__`` if
|
||||
``type.__module__`` is not a string or is equal to ``"builtins"``.
|
||||
|
||||
Add PyUnicode_FromFormat() formats
|
||||
----------------------------------
|
||||
|
||||
Add ``%T`` and ``%#T`` formats to ``PyUnicode_FromFormat()`` to format
|
||||
a type name:
|
||||
|
||||
* ``%T`` formats the type "short name" (``type.__name__``).
|
||||
* ``%#T`` formats the type "fully qualified name"
|
||||
(``type.__fully_qualified_name__``).
|
||||
|
||||
Both formats expect a type as argument.
|
||||
|
||||
The hash character (``#``) in the format string stands for
|
||||
`alternative format
|
||||
<https://docs.python.org/3/library/string.html#format-specification-mini-language>`_.
|
||||
For example, ``f"{123:x}"`` returns ``'7b'`` and ``f"{123:#x}"`` returns
|
||||
``'0x7b'`` (``#`` adds ``'0x'`` prefix).
|
||||
|
||||
The ``%T`` format is used by ``time.strftime()``, but it's not used by
|
||||
``printf()``.
|
||||
|
||||
For example, the existing code using *tp_name*:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyErr_Format(PyExc_TypeError,
|
||||
"__format__ must return a str, not %.200s",
|
||||
Py_TYPE(result)->tp_name);
|
||||
|
||||
can be replaced with the ``%T`` format:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyErr_Format(PyExc_TypeError,
|
||||
"__format__ must return a str, not %T",
|
||||
Py_TYPE(result));
|
||||
|
||||
Advantages of the updated code:
|
||||
|
||||
* The ``PyTypeObject.tp_name`` member is no longer read explicitly: the
|
||||
code becomes compatible with the limited C API.
|
||||
* The ``PyTypeObject.tp_name`` bytes string no longer has to be decoded
|
||||
from UTF-8 at each ``PyErr_Format()`` call, since
|
||||
``type.__fully_qualified_name__`` is already a Unicode string.
|
||||
* The type name is no longer truncated.
|
||||
|
||||
Add PyType_GetFullyQualifiedName() function
|
||||
-------------------------------------------
|
||||
|
||||
Add the ``PyType_GetFullyQualifiedName()`` function to get the fully
|
||||
qualified name of a type (``type.__fully_qualified_name__``). API:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)
|
||||
|
||||
On success, return a new reference to the string. On error, raise an
|
||||
exception and return ``NULL``.
|
||||
|
||||
|
||||
Recommend not truncating type names
|
||||
-----------------------------------
|
||||
|
||||
Type names must not be truncated. For example, the ``%.100s`` format
|
||||
should be avoided: use the ``%s`` format instead (or ``%T`` and ``%#T``
|
||||
formats in C).
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
* Pull request: `Add type.__fully_qualified_name__ attribute <https://github.com/python/cpython/pull/112133>`_.
|
||||
* Pull request: `Add %T format to PyUnicode_FromFormat() <https://github.com/python/cpython/pull/111703>`_.
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
=======================
|
||||
|
||||
Only new APIs are added. No existing API is modified. Changes are fully
|
||||
backward compatible.
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
Change str(type)
|
||||
----------------
|
||||
|
||||
The ``type.__str__()`` method can be modified to format a type name
|
||||
differently. For example, it can return the type fully qualified name.
|
||||
|
||||
The problem is that it's a backward incompatible change. For example,
|
||||
``enum``, ``functools``, ``optparse``, ``pdb`` and ``xmlrpc.server``
|
||||
modules of the standard library have to be updated.
|
||||
``test_dataclasses``, ``test_descrtut`` and ``test_cmd_line_script``
|
||||
tests have to be updated as well.
|
||||
|
||||
See the `pull request: type(str) returns the fully qualified name
|
||||
<https://github.com/python/cpython/pull/112129>`_.
|
||||
|
||||
|
||||
Add formats to type.__format__()
|
||||
--------------------------------
|
||||
|
||||
Examples of proposed formats for ``type.__format__()``:
|
||||
|
||||
* ``f"{type(obj):z}"`` formats ``type(obj).__name__``.
|
||||
* ``f"{type(obj):M.T}"`` formats ``type(obj).__fully_qualified_name__``.
|
||||
* ``f"{type(obj):M:T}"`` formats ``type(obj).__fully_qualified_name__``
|
||||
using colon (``:``) separator.
|
||||
* ``f"{type(obj):T}"`` formats ``type(obj).__name__``.
|
||||
* ``f"{type(obj):#T}"`` formats ``type(obj).__fully_qualified_name__``.
|
||||
|
||||
Using short format (such as ``z``, a single letter) requires to refer to
|
||||
format documentation to understand how a type name is formatted, whereas
|
||||
``type(obj).__name__`` is explicit.
|
||||
|
||||
The dot character (``.``) is already used for the "precision" in format
|
||||
strings. The colon character (``:``) is already used to separated the
|
||||
expression from the format specification. For example, ``f"{3.14:g}"``
|
||||
uses ``g`` format which comes after the colon (``:``). Usually, a format
|
||||
type is a single letter, such as ``g`` in ``f"{3.14:g}"``, not ``M.T``
|
||||
or ``M:T``. Reusing dot and colon characters for a different purpose can
|
||||
be misleading and make the format parser more complicated.
|
||||
|
||||
Add !t formatter to get an object type
|
||||
--------------------------------------
|
||||
|
||||
Use ``f"{obj!t:T}"`` to format ``type(obj).__name__``, similar to
|
||||
``f"{type(obj).__name__}"``.
|
||||
|
||||
When the ``!t`` formatter was proposed in 2018, `Eric Smith was opposed
|
||||
to this
|
||||
<https://mail.python.org/archives/list/python-dev@python.org/message/BMIW3FEB77OS7OB3YYUUDUBITPWLRG3U/>`_;
|
||||
Eric is the author of the f-string :pep:`498` "Literal String Interpolation".
|
||||
|
||||
|
||||
Add formats to str % args
|
||||
-------------------------
|
||||
|
||||
It was proposed to add formats to format a type name in ``str % arg``.
|
||||
For example, ``%T`` and ``%#T`` formats.
|
||||
|
||||
Nowadays, f-strings are preferred for new code.
|
||||
|
||||
|
||||
Use colon separator in fully qualified name
|
||||
-------------------------------------------
|
||||
|
||||
The colon (``:``) separator eliminates guesswork when you want to import
|
||||
the name, see ``pkgutil.resolve_name()``. A type fully qualified name
|
||||
can be formatted as ``f"{type.__module__}:{type.__qualname__}"``, or
|
||||
``type.__qualname__`` if the type module is ``"builtins"``.
|
||||
|
||||
In the standard library, no code formats a type fully qualified name
|
||||
this way.
|
||||
|
||||
It is already tricky to get a type from its qualified name. The type
|
||||
qualified name already uses the dot (``.``) separator between different
|
||||
parts: class name, ``<locals>``, nested class name, etc.
|
||||
|
||||
The colon separator is not consistent with dot separator used in a
|
||||
module fully qualified name (``module.__name__``).
|
||||
|
||||
|
||||
Other ways to format type names in C
|
||||
------------------------------------
|
||||
|
||||
The ``printf()`` function supports multiple size modifiers: ``hh``
|
||||
(``char``), ``h`` (``short``), ``l`` (``long``), ``ll`` (``long long``),
|
||||
``z`` (``size_t``), ``t`` (``ptrdiff_t``) and ``j`` (``intmax_t``).
|
||||
The ``PyUnicode_FromFormat()`` function supports most of them.
|
||||
|
||||
Proposed formats using ``h`` and ``hh`` length modifiers:
|
||||
|
||||
* ``%hhT`` formats ``type.__name__``.
|
||||
* ``%hT`` formats ``type.__qualname__``.
|
||||
* ``%T`` formats ``type.__fully_qualified_name__``.
|
||||
|
||||
Length modifiers are used to specify the C type of the argument, not to
|
||||
change how an argument is formatted. The alternate form (``#``) changes
|
||||
how an argument is formatted. Here the argument C type is always
|
||||
``PyObject*``.
|
||||
|
||||
Other proposed formats:
|
||||
|
||||
* ``%Q``
|
||||
* ``%t``.
|
||||
* ``%lT`` formats ``type.__fully_qualified_name__``.
|
||||
* ``%Tn`` formats ``type.__name__``.
|
||||
* ``%Tq`` formats ``type.__qualname__``.
|
||||
* ``%Tf`` formats ``type.__fully_qualified_name__``.
|
||||
|
||||
Having more options to format type names can lead to inconsistencies
|
||||
between different modules and make the API more error prone.
|
||||
|
||||
About the ``%t`` format, ``printf()`` now uses ``t`` as a length
|
||||
modifier for ``ptrdiff_t`` argument.
|
||||
|
||||
``type.__qualname__`` can be used in Python and ``PyType_GetQualName()``
|
||||
can be used in C to format a type qualified name.
|
||||
|
||||
|
||||
Omit Py_TYPE() with %T format: pass an object
|
||||
-----------------------------------------------
|
||||
|
||||
It was proposed to format a type name of an object, like:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
PyErr_Format(PyExc_TypeError, "type name: %T", obj);
|
||||
|
||||
The intent is to avoid ``Py_TYPE()`` which returns a borrowed reference
|
||||
to the type. Using a borrowed reference can cause a bug or crash if the
|
||||
type is finalized or deallocated while being used.
|
||||
|
||||
In practice, it's unlikely that a type is finalized while the error
|
||||
message is formatted. Instances of static types cannot have their type
|
||||
deallocated: static types are never deallocated. Since Python 3.8,
|
||||
instances of heap types hold a strong reference to their type (in
|
||||
``PyObject.ob_type``) and it's safe to make the assumption that the code
|
||||
holds a strong reference to the formatted object, so the object type
|
||||
cannot be deallocated.
|
||||
|
||||
In short, it is safe to use a ``Py_TYPE(obj)`` borrowed reference while
|
||||
formatting an error message.
|
||||
|
||||
If the ``%T`` format expects an instance, formatting a type cannot use
|
||||
the ``%T`` format, whereas it's a common operation in stdlib C
|
||||
extensions. The ``%T`` format would only cover half of cases (only
|
||||
instances). If the ``%T`` format takes a type, all cases are covered
|
||||
(types, and instances using ``Py_TYPE()``).
|
||||
|
||||
|
||||
Other proposed APIs to get a type fully qualified name
|
||||
------------------------------------------------------
|
||||
|
||||
* ``type.__fullyqualname__`` attribute name: attribute without an underscore
|
||||
between words. Several dunders, including some of the most recently
|
||||
added ones, include an underscore in the word: ``__class_getitem__``,
|
||||
``__release_buffer__``, ``__type_params__``, ``__init_subclass__`` and
|
||||
``__text_signature__``.
|
||||
* ``type.__fqn__`` attribute name, where FQN stands for Fully Qualified
|
||||
Name.
|
||||
* Add a function to the ``inspect`` module. Need to import the
|
||||
``inspect`` module to use it.
|
||||
|
||||
|
||||
Omit __main__ module in the type fully qualified name
|
||||
-----------------------------------------------------
|
||||
|
||||
The ``pdb`` module formats a type fully qualified names in a similar way
|
||||
as the proposed ``type.__fully_qualified_name__``, but it omits the module
|
||||
if the module is equal to ``"__main__"``.
|
||||
|
||||
The ``unittest`` module and a lot of existing stdlib code format a type
|
||||
fully qualified names the same way as the proposed
|
||||
``type.__fully_qualified_name__``: only omits the module if the module
|
||||
is equal to ``"builtins"``.
|
||||
|
||||
It's possible to omit the ``"__main__."`` prefix of the ``__main__``
|
||||
module with::
|
||||
|
||||
def format_type(cls):
|
||||
if cls.__module__ != "__main"__:
|
||||
return cls.__fully_qualified_name__
|
||||
else:
|
||||
return cls.__qualname__
|
||||
|
||||
|
||||
Discussions
|
||||
===========
|
||||
|
||||
* Discourse: `Enhance type name formatting when raising an exception:
|
||||
add %T format in C, and add type.__fullyqualname__
|
||||
<https://discuss.python.org/t/enhance-type-name-formatting-when-raising-an-exception-add-t-format-in-c-and-add-type-fullyqualname/38129>`_
|
||||
(2023).
|
||||
* Issue: `PyUnicode_FromFormat(): Add %T format to format the type name
|
||||
of an object <https://github.com/python/cpython/issues/111696>`_
|
||||
(2023).
|
||||
* Issue: `C API: Investigate how the PyTypeObject members can be removed
|
||||
from the public C API
|
||||
<https://github.com/python/cpython/issues/105970>`_ (2023).
|
||||
* python-dev thread: `bpo-34595: How to format a type name?
|
||||
<https://mail.python.org/archives/list/python-dev@python.org/thread/HKYUMTVHNBVB5LJNRMZ7TPUQKGKAERCJ/>`_
|
||||
(2018).
|
||||
* Issue: `PyUnicode_FromFormat(): add %T format for an object type name
|
||||
<https://github.com/python/cpython/issues/78776>`_ (2018).
|
||||
* Issue: `Replace %.100s by %s in PyErr_Format(): the arbitrary limit of
|
||||
500 bytes is outdated
|
||||
<https://github.com/python/cpython/issues/55042>`__ (2011).
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
Loading…
Reference in New Issue