PEP: 712
Title: Adding a "converter" parameter to dataclasses.field
Author: Joshua Cannon <joshdcannon@gmail.com>
Sponsor: Eric V. Smith <eric at trueblade.com>
Discussions-To: https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Jan-2023
Python-Version: 3.13
Post-History: `27-Dec-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/NWZQIINJQZDOCZGO6TGCUP2PNW4PEKNY/>`__,
              `19-Jan-2023 <https://discuss.python.org/t/add-converter-to-dataclass-field/22956>`__,
              `23-Apr-2023 <https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126>`__,

Abstract
========

:pep:`557` added :mod:`dataclasses` to the Python stdlib. :pep:`681` added
:func:`~py3.11:typing.dataclass_transform` to help type checkers understand
several common dataclass-like libraries, such as attrs, Pydantic, and object
relational mapper (ORM) packages such as SQLAlchemy and Django.

A common feature other libraries provide over the standard library
implementation is the ability for the library to convert arguments given at
initialization time into the types expected for each field using a
user-provided conversion function.

Therefore, this PEP adds a ``converter`` parameter to :func:`dataclasses.field`
(along with the requisite changes to :class:`dataclasses.Field` and
:func:`~py3.11:typing.dataclass_transform`) to specify the function to use to
convert the input value for each field to the representation to be stored in
the dataclass.

Motivation
==========

There is no existing, standard way for :mod:`dataclasses` or third-party
dataclass-like libraries to support argument conversion in a type-checkable
way. To work around this limitation, library authors and users are forced to
choose one of the following:

* Opt in to a custom Mypy plugin. These plugins help Mypy understand the
  conversion semantics, but not other tools.
* Shift conversion responsibility onto the caller of the dataclass
  constructor. This can make constructing certain dataclasses unnecessarily
  verbose and repetitive.
* Provide a custom ``__init__`` which declares "wider" parameter types and
  converts them when setting the appropriate attribute. This not only duplicates
  the typing annotations between the converter and ``__init__``, but also opts
  the user out of many of the features :mod:`dataclasses` provides.
* Provide a custom ``__init__`` but without meaningful type annotations
  for the parameter types requiring conversion.

None of these choices are ideal.

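As a rough illustration of the third workaround, the sketch below hand-writes
an ``__init__`` with a wider parameter type. The ``Asset`` class and its
``path`` field are hypothetical names chosen for this example only.

.. code-block:: python

    import dataclasses
    import pathlib


    # init=False: the synthesized __init__ is given up in favor of a custom one.
    @dataclasses.dataclass(init=False)
    class Asset:
        path: pathlib.PurePosixPath

        # The parameter type is "wider" than the field type, so the
        # annotation has to be repeated and the conversion written by hand.
        def __init__(self, path: str | pathlib.PurePosixPath) -> None:
            self.path = pathlib.PurePosixPath(path)


    asset = Asset("assets/unknown.png")
    print(asset.path)  # assets/unknown.png

Every additional field would need its own parameter and assignment in this
``__init__``, which is the duplication described above.
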
Rationale
=========

Adding argument conversion semantics is useful and beneficial enough that most
dataclass-like libraries provide support. Adding this feature to the standard
library means more users are able to opt in to these benefits without requiring
third-party libraries. Additionally, third-party libraries are able to clue
type-checkers into their own conversion semantics through added support in
:func:`~py3.11:typing.dataclass_transform`, meaning users of those libraries
benefit as well.

Specification
=============

New ``converter`` parameter
---------------------------

This specification introduces a new parameter named ``converter`` to the
:func:`dataclasses.field` function. If provided, it represents a single-argument
callable used to convert all values when assigning to the associated attribute.

For frozen dataclasses, the converter is only used inside a ``dataclass``-synthesized
``__init__`` when setting the attribute. For non-frozen dataclasses, the converter
is used for all attribute assignment (e.g. ``obj.attr = value``), which includes
assignment of default values.

The converter is not used when reading attributes, as the attributes should already
have been converted.

Adding this parameter also implies the following changes:

* A ``converter`` attribute will be added to :class:`dataclasses.Field`.
* ``converter`` will be added to :func:`~py3.11:typing.dataclass_transform`'s
  list of supported field specifier parameters.

Example
'''''''

.. code-block:: python

    import dataclasses
    import pathlib


    # Helper assumed by this example: passes ``None`` through and converts
    # everything else to ``str``.
    def str_or_none(x: object) -> str | None:
        return str(x) if x is not None else None


    @dataclasses.dataclass
    class InventoryItem:
        # `converter` as a type (including a GenericAlias).
        id: int = dataclasses.field(converter=int)
        skus: tuple[int, ...] = dataclasses.field(converter=tuple[int, ...])
        # `converter` as a callable.
        vendor: str | None = dataclasses.field(converter=str_or_none)
        names: tuple[str, ...] = dataclasses.field(
            converter=lambda names: tuple(map(str.lower, names))
        )  # Note that lambdas are supported, but discouraged as they are untyped.

        # The default value is also converted; therefore the following is not a
        # type error.
        stock_image_path: pathlib.PurePosixPath = dataclasses.field(
            converter=pathlib.PurePosixPath, default="assets/unknown.png"
        )

        # Default value conversion extends to `default_factory`;
        # therefore the following is also not a type error.
        shelves: tuple = dataclasses.field(
            converter=tuple, default_factory=list
        )

    item1 = InventoryItem(
        "1",
        [234, 765],
        None,
        ["PYTHON PLUSHIE", "FLUFFY SNAKE"]
    )
    # item1's repr would be (with added newlines for readability):
    #   InventoryItem(
    #       id=1,
    #       skus=(234, 765),
    #       vendor=None,
    #       names=('python plushie', 'fluffy snake'),
    #       stock_image_path=PurePosixPath('assets/unknown.png'),
    #       shelves=()
    #   )

    # Attribute assignment also participates in conversion.
    item1.skus = [555]
    # item1's skus attribute is now (555,).


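The assignment-time behavior described for non-frozen dataclasses can be
approximated in current Python with a descriptor. The following is only a
sketch under that assumption: ``Converted`` and ``Item`` are hypothetical
names, and this is not necessarily how the proposed feature is implemented.

.. code-block:: python

    import dataclasses


    class Converted:
        """Hypothetical descriptor that runs a converter on every assignment."""

        def __init__(self, converter):
            self._converter = converter

        def __set_name__(self, owner, name):
            # Store the converted value under a private instance attribute.
            self._storage = "_" + name

        def __get__(self, obj, objtype=None):
            if obj is None:
                # Class-level access: tells dataclasses there is no default.
                raise AttributeError(self._storage)
            # Reads return the stored value as-is; nothing is converted here.
            return getattr(obj, self._storage)

        def __set__(self, obj, value):
            setattr(obj, self._storage, self._converter(value))


    @dataclasses.dataclass
    class Item:
        skus: tuple = Converted(tuple)


    item = Item([234, 765])   # the synthesized __init__ assigns via __set__
    print(item.skus)          # (234, 765)
    item.skus = [555]         # plain assignment is converted as well
    print(item.skus)          # (555,)

Because the synthesized ``__init__`` performs an ordinary attribute
assignment, the same ``__set__`` hook covers both construction and later
assignment.
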
Impact on typing
----------------

A ``converter`` must be a callable that accepts a single positional argument, and
the parameter type corresponding to this positional argument provides the type
of the synthesized ``__init__`` parameter associated with the field.

In other words, the argument provided for the converter parameter must be
compatible with ``Callable[[T], X]`` where ``T`` is the input type for
the converter and ``X`` is the output type of the converter.

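To make the relationship between the converter's parameter and the
``__init__`` parameter concrete, the sketch below inspects a converter at
runtime. This only mimics what a type checker would do statically;
``init_parameter_type`` is a hypothetical helper, and ``str_or_none`` is an
assumed typed converter.

.. code-block:: python

    import inspect
    import typing


    def str_or_none(value: object) -> str | None:
        """Typed converter: accepts any object, returns ``str`` or ``None``."""
        return None if value is None else str(value)


    def init_parameter_type(converter):
        """Hypothetical helper: annotation of the converter's first parameter.

        Falls back to ``typing.Any`` when the parameter is unannotated.
        """
        first_name = next(iter(inspect.signature(converter).parameters))
        return typing.get_type_hints(converter).get(first_name, typing.Any)


    print(init_parameter_type(str_or_none))          # <class 'object'>
    print(init_parameter_type(lambda names: names))  # typing.Any

The second call shows why an unannotated lambda gives the synthesized
``__init__`` parameter no useful type.
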
Type-checking ``default`` and ``default_factory``
'''''''''''''''''''''''''''''''''''''''''''''''''

Because default values are unconditionally converted using ``converter``, if
an argument for ``converter`` is provided alongside either ``default`` or
``default_factory``, the type of the default (the ``default`` argument if
provided, otherwise the return value of ``default_factory``) should be checked
using the type of the single argument to the ``converter`` callable.

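The effect of converting defaults can be emulated today in a frozen dataclass
with ``__post_init__``. This is a sketch of the resulting behavior only, not
of the proposed mechanism; the class and field names are borrowed from the
example earlier in this PEP.

.. code-block:: python

    import dataclasses
    import pathlib


    @dataclasses.dataclass(frozen=True)
    class InventoryItem:
        # A type checker would currently flag this ``str`` default on a
        # ``PurePosixPath`` field. Under this PEP the default would instead
        # be checked against the converter's parameter type.
        stock_image_path: pathlib.PurePosixPath = "assets/unknown.png"
        shelves: tuple = dataclasses.field(default_factory=list)

        def __post_init__(self):
            # Frozen dataclasses reject normal assignment, so conversion is
            # done once here, mirroring "only inside __init__".
            object.__setattr__(
                self, "stock_image_path", pathlib.PurePosixPath(self.stock_image_path)
            )
            object.__setattr__(self, "shelves", tuple(self.shelves))


    item = InventoryItem()
    print(item.shelves)  # ()
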
Converter return type
'''''''''''''''''''''

The return type of the callable must be a type that's compatible with the
field's declared type. This includes the field's type exactly, but can also be
a type that's more specialized (such as a converter returning a ``list[int]``
for a field annotated as ``list``, or a converter returning an ``int`` for a
field annotated as ``int | str``).

Indirection of allowable argument types
---------------------------------------

One downside introduced by this PEP is that knowing what argument types are
allowed in the dataclass' ``__init__`` and during attribute assignment is not
immediately obvious from reading the dataclass. The allowable types are defined
by the converter.

This is true when reading code from source; however, typing-related aids such
as ``typing.reveal_type`` and "IntelliSense" in an IDE should make it easy to know
exactly what types are allowed without having to read any source code.


Backward Compatibility
======================

These changes don't introduce any compatibility problems since they
only introduce opt-in new features.

Security Implications
======================

There are no direct security concerns with these changes.

How to Teach This
=================

Documentation and examples explaining the new parameter and behavior will be
added to the relevant sections of the docs site (primarily on
:mod:`dataclasses`) and linked from the *What's New* document.

The added documentation/examples will also cover the "common pitfalls" that
users of converters are likely to encounter. Such pitfalls include:

* Needing to handle ``None``/sentinel values.
* Needing to handle values that are already of the correct type.
* Avoiding lambdas for converters, as the synthesized ``__init__``
  parameter's type will become ``Any``.
* Forgetting to convert values in the bodies of user-defined ``__init__`` in
  frozen dataclasses.
* Forgetting to convert values in the bodies of user-defined ``__setattr__`` in
  non-frozen dataclasses.

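The first three pitfalls above can be addressed in the converter itself. A
minimal sketch, assuming a field typed ``pathlib.PurePosixPath | None``
(``to_posix_path_or_none`` is a hypothetical name):

.. code-block:: python

    import pathlib


    # A named, annotated function rather than a lambda, so the synthesized
    # ``__init__`` parameter would get a real type instead of ``Any``.
    def to_posix_path_or_none(
        value: str | pathlib.PurePosixPath | None,
    ) -> pathlib.PurePosixPath | None:
        if value is None:
            # Pass the "no value" sentinel through untouched.
            return None
        if isinstance(value, pathlib.PurePosixPath):
            # Already the right type: return it rather than rebuilding it.
            return value
        return pathlib.PurePosixPath(value)


    print(to_posix_path_or_none("assets/unknown.png"))  # assets/unknown.png
    print(to_posix_path_or_none(None))                  # None
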
Reference Implementation
========================

The attrs library `already includes <attrs-converters_>`__ a ``converter``
parameter containing converter semantics.

CPython support is implemented on `a branch in the author's fork <cpython-branch_>`__.

Rejected Ideas
==============

Just adding "converter" to ``typing.dataclass_transform``'s ``field_specifiers``
--------------------------------------------------------------------------------

The idea of isolating this addition to
:func:`~py3.11:typing.dataclass_transform` was briefly
`discussed on Typing-SIG <only-dataclass-transform_>`__ where it was suggested
to broaden this to :mod:`dataclasses` more generally.

Additionally, adding this to :mod:`dataclasses` ensures anyone can reap the
benefits without requiring additional libraries.

Not converting default values
-----------------------------

There are pros and cons with both converting and not converting default values.
Leaving default values as-is allows type-checkers and dataclass authors to
expect that the type of the default matches the type of the field. However,
converting default values has three large advantages:

1. Consistency. Unconditionally converting all values that are assigned to the
   attribute involves fewer "special rules" that users must remember.

2. Simpler defaults. Allowing the default value to have the same type as
   user-provided values means dataclass authors get the same conveniences as
   their callers.

3. Compatibility with attrs. Attrs unconditionally uses the converter to
   convert default values.

Automatic conversion using the field's type
-------------------------------------------

One idea could be to allow the type of the field specified (e.g. ``str`` or
``int``) to be used as a converter for each argument provided.
`Pydantic's data conversion <pydantic-data-conversion_>`__ has semantics which
appear to be similar to this approach.

This works well for fairly simple types, but leads to ambiguity in expected
behavior for complex types such as generics. For example, for
``tuple[int, ...]`` it is ambiguous whether the converter is supposed to simply
convert an iterable to a tuple, or whether it is additionally supposed to
convert each element to ``int``. Other annotations, such as ``int | None``,
are not callable at all.

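Both problems can be observed directly in current Python. The snippet below
only demonstrates existing runtime behavior of these annotations when they
are called as if they were converters:

.. code-block:: python

    # Calling a GenericAlias instantiates the origin type; the elements are
    # left untouched, so the strings are not converted to ``int``.
    converted = tuple[int, ...](["1", "2"])
    print(converted)  # ('1', '2')

    # A union annotation cannot be called at all.
    try:
        (int | None)("1")
        union_is_callable = True
    except TypeError:
        union_is_callable = False
    print(union_is_callable)  # False
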
Deducing the attribute type from the return type of the converter
-----------------------------------------------------------------

Another idea would be to allow the user to omit the attribute's type annotation
if providing a ``field`` with a ``converter`` argument. Although this would
reduce the common repetition this PEP introduces (e.g. ``x: str = field(converter=str)``),
it isn't clear how to best support this while maintaining the current dataclass
semantics (namely, that the attribute order is preserved for things like the
synthesized ``__init__``, or ``dataclasses.fields``). This is because there isn't
an easy way in Python (today) to get the annotation-only attributes interspersed
with un-annotated attributes in the order they were defined.

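The ordering problem can be seen with a plain class. In this sketch
(``Example`` is a hypothetical class), annotated and un-annotated names end
up in two separate mappings, and neither records their relative order:

.. code-block:: python

    class Example:
        a: int      # annotation only
        b = 0       # assignment only, no annotation
        c: str      # annotation only


    annotated = list(Example.__annotations__)
    assigned = [name for name in vars(Example) if not name.startswith("__")]
    print(annotated)  # ['a', 'c']
    print(assigned)   # ['b']

Nothing in either mapping says that ``b`` was defined between ``a`` and
``c``.
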
A sentinel annotation could be applied (e.g. ``x: FromConverter = ...``);
however, this breaks a fundamental assumption of type annotations.

Lastly, this is feasible if *all* fields (including those without a converter)
were assigned to ``dataclasses.field``, which means the class' own namespace
holds the order; however, this trades repetition of type+converter with
repetition of field assignment. The end result is no gain or loss of repetition,
but with the added complexity of dataclasses semantics.

This PEP doesn't suggest it can't or shouldn't be done, just that it isn't
included in this PEP.


References
==========

.. _attrs-converters: https://www.attrs.org/en/21.2.0/examples.html#conversion
.. _cpython-branch: https://github.com/thejcannon/cpython/tree/converter
.. _only-dataclass-transform: https://mail.python.org/archives/list/typing-sig@python.org/thread/NWZQIINJQZDOCZGO6TGCUP2PNW4PEKNY/
.. _pydantic-data-conversion: https://docs.pydantic.dev/usage/models/#data-conversion


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.