612 lines
22 KiB
ReStructuredText
612 lines
22 KiB
ReStructuredText
PEP: 557
|
||
Title: Data Classes
|
||
Author: Eric V. Smith <eric@trueblade.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 02-Jun-2017
|
||
Python-Version: 3.7
|
||
Post-History: 08-Sep-2017
|
||
|
||
Notice for Reviewers
|
||
====================
|
||
|
||
This PEP and the initial implementation were drafted in a separate
|
||
repo: https://github.com/ericvsmith/dataclasses. Before commenting in
|
||
a public forum please at least read the `discussion`_ listed at the
|
||
end of this PEP.
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP describes an addition to the standard library called Data
|
||
Classes. Although they use a very different mechanism, Data Classes
|
||
can be thought of as "mutable namedtuples with defaults".
|
||
|
||
A class decorator is provided which inspects a class definition for
|
||
variables with type annotations as defined in PEP 526, "Syntax for
|
||
Variable Annotations". In this document, such variables are called
|
||
fields. Using these fields, the decorator adds generated method
|
||
definitions to the class to support instance initialization, a repr,
|
||
and comparisons methods. Such a class is called a Data Class, but
|
||
there's really nothing special about the class: it is the same class
|
||
but with the generated methods added.
|
||
|
||
As an example::
|
||
|
||
@dataclass
|
||
class InventoryItem:
|
||
name: str
|
||
unit_price: float
|
||
quantity_on_hand: int = 0
|
||
|
||
def total_cost(self) -> float:
|
||
return self.unit_price * self.quantity_on_hand
|
||
|
||
The ``@dataclass`` decorator will add the equivalent of these methods
|
||
to the InventoryItem class::
|
||
|
||
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
|
||
self.name = name
|
||
self.unit_price = unit_price
|
||
self.quantity_on_hand = quantity_on_hand
|
||
def __repr__(self):
|
||
return f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})'
|
||
def __eq__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
def __ne__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
def __lt__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
def __le__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
def __gt__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
def __ge__(self, other):
|
||
if other.__class__ is self.__class__:
|
||
return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
|
||
return NotImplemented
|
||
|
||
Data Classes saves you from writing and maintaining these functions.
|
||
|
||
Rationale
|
||
=========
|
||
|
||
There have been numerous attempts to define classes which exist
|
||
primarily to store values which are accessible by attribute lookup.
|
||
Some examples include:
|
||
|
||
- collection.namedtuple in the standard library.
|
||
|
||
- typing.NamedTuple in the standard library.
|
||
|
||
- The popular attrs [#]_ project.
|
||
|
||
- Many example online recipes [#]_, packages [#]_, and questions [#]_.
|
||
David Beazley used a form of data classes as the motivating example
|
||
in a PyCon 2013 metaclass talk [#]_.
|
||
|
||
So, why is this PEP needed?
|
||
|
||
With the addition of PEP 526, Python has a concise way to specify the
|
||
type of class members. This PEP leverages that syntax to provide a
|
||
simple, unobtrusive way to describe Data Classes. With one exception,
|
||
the specified attribute type annotation is completely ignored by Data
|
||
Classes.
|
||
|
||
No base classes or metaclasses are used by Data Classes. Users of
|
||
these classes are free to use inheritance and metaclasses without any
|
||
interference from Data Classes. The decorated classes are truly
|
||
"normal" Python classes. The Data Class decorator should not
|
||
interfere with any usage of the class.
|
||
|
||
Data Classes are not, and are not intended to be, a replacement
|
||
mechanism for all of the above libraries. But being in the standard
|
||
library will allow many of the simpler use cases to instead leverage
|
||
Data Classes. Many of the libraries listed have different feature
|
||
sets, and will of course continue to exist and prosper.
|
||
|
||
Where is it not appropriate to use Data Classes?
|
||
|
||
- Compatibility with tuples is required.
|
||
|
||
- True immutability is required.
|
||
|
||
- Type validation beyond that provided by PEPs 484 and 526 is
|
||
required, or value validation is required.
|
||
|
||
XXX Motivation for each dataclass() and field() parameter
|
||
|
||
Specification
|
||
=============
|
||
|
||
All of the functions described in this PEP will live in a module named
|
||
``dataclasses``.
|
||
|
||
A function ``dataclass`` which is typically used as a class decorator
|
||
is provided to post-process classes and add generated member
|
||
functions, described below.
|
||
|
||
The ``dataclass`` decorator examines the class to find ``field``'s. A
|
||
``field`` is defined as any variable identified in
|
||
``__annotations__``. That is, a variable that is decorated with a
|
||
type annotation. With a single exception described below, none of the
|
||
Data Class machinery examines the type specified in the annotation.
|
||
|
||
Note that ``__annotations__`` is guaranteed to be an ordered mapping,
|
||
in class declaration order. The order of the fields in all of the
|
||
generated methods is the order in which they appear in the class.
|
||
|
||
The ``dataclass`` decorator is typically used with no parameters and
|
||
no parentheses. However, it also supports the following logical
|
||
signature::
|
||
|
||
def dataclass(*, init=True, repr=True, hash=None, cmp=True, frozen=False)
|
||
|
||
If ``dataclass`` is used just as a simple decorator with no
|
||
parameters, it acts as if it has the default values documented in this
|
||
signature. That is, these three uses of ``@dataclass`` are equivalent::
|
||
|
||
@dataclass
|
||
class C:
|
||
...
|
||
|
||
@dataclass()
|
||
class C:
|
||
...
|
||
|
||
@dataclass(init=True, repr=True, hash=None, cmp=True, frozen=False)
|
||
class C:
|
||
...
|
||
|
||
The parameters to ``dataclass`` are:
|
||
|
||
- ``init``: If true, a ``__init__`` method will be generated.
|
||
|
||
- ``repr``: If true, a ``__repr__`` function will be generated. The
|
||
generated repr string will have the class name and the name and repr
|
||
of each field, in the order they are defined in the class. Fields
|
||
that are marked as being excluded from the repr are not included.
|
||
For example:
|
||
``InventoryItem(name='widget',unit_price=3.0,quantity_on_hand=10)``.
|
||
|
||
- ``cmp``: If true, ``__eq__``, ``__ne__``, ``__lt__``, ``__le__``,
|
||
``__gt__``, and ``__ge__`` methods will be generated. These compare
|
||
the class as if it were a tuple of its fields, in order. Both
|
||
instances in the comparison must be of the identical type.
|
||
|
||
- ``hash``: Either a bool or ``None``. If ``None`` (the default), the
|
||
``__hash__`` method is generated according to how cmp and frozen are
|
||
set.
|
||
|
||
If ``cmp`` and ``hash`` are both true, Data Classes will generate a
|
||
``__hash__`` for you. If ``cmp`` is true and ``frozen`` is false,
|
||
``__hash__`` will be set to ``None``, marking it unhashable (which
|
||
it is). If cmp is false, ``__hash__`` will be left untouched
|
||
meaning the ``__hash__`` method of the superclass will be used (if
|
||
superclass is object, this means it will fall back to id-based
|
||
hashing).
|
||
|
||
Although not recommended, you can force Data Classes to create a
|
||
``__hash__`` method with ``hash=True``. This might be the case if your
|
||
class is logically immutable but can nonetheless be mutated. This
|
||
is a specialized use case and should be considered carefully.
|
||
|
||
See the Python documentation [#]_ for more information.
|
||
|
||
- ``frozen``: If True, assigning to fields will generate an exception.
|
||
This emulates read-only frozen instances. See the discussion below.
|
||
|
||
``field``'s may optionally specify a default value, using normal
|
||
Python syntax::
|
||
|
||
@dataclass
|
||
class C:
|
||
a: int # 'a' has no default value
|
||
b: int = 0 # assign a default value for 'b'
|
||
|
||
For common and simple use cases, no other functionality is required.
|
||
There are, however, some Data Class features that require additional
|
||
per-field information. To satisfy this need for additional
|
||
information, you can replace the default field value with a call to
|
||
the provided ``field()`` function. The signature of ``field()`` is::
|
||
|
||
def field(*, default=_MISSING, default_factory=_MISSING, repr=True,
|
||
hash=None, init=True, cmp=True)
|
||
|
||
The ``_MISSING`` value is a sentinel object used to detect if the
|
||
``default`` and ``default_factory`` parameters are provided. Users
|
||
should never use ``_MISSING`` or depend on its value. This sentinel
|
||
is used because ``None`` is a valid value for ``default``.
|
||
|
||
The parameters to ``field()`` are:
|
||
|
||
- ``default``: If provided, this will be the default value for this
|
||
field. This is needed because the ``field`` call itself replaces
|
||
the normal position of the default value.
|
||
|
||
- ``default_factory``: If provided, it must be a zero-argument
|
||
callable that will be called when a default value is needed for this
|
||
field. Among other purposes, this can be used to specify fields
|
||
with mutable default values, as discussed below. It is an error to
|
||
specify both ``default`` and ``default_factory``.
|
||
|
||
- ``init``: If True, this field is included as a parameter to the
|
||
generated ``__init__`` function.
|
||
|
||
- ``repr``: If True, this field is included in the string returned by
|
||
the generated ``__repr__`` function.
|
||
|
||
- ``cmp``: If True, this field is included in the generated comparison
|
||
methods (``__eq__`` et al).
|
||
|
||
- ``hash``: This can be a bool or ``None``. If True, this field is
|
||
included in the generated ``__hash__`` method. If ``None`` (the
|
||
default), use the value of ``cmp``: this would normally be the
|
||
expected behavior. A field needs to be considered in the hash if
|
||
it's used for comparisons. Setting this value to anything other
|
||
than ``None`` is discouraged.
|
||
|
||
``Field`` objects
|
||
-----------------
|
||
|
||
``Field`` objects describe each defined field. These objects are
|
||
created internally, and are returned by the ``fields()`` module-level
|
||
method (see below). Users should never instantiate a ``Field``
|
||
object directly. Its attributes are:
|
||
|
||
- ``name``: The name of the field.
|
||
|
||
- ``type``: The type of the field.
|
||
|
||
- ``default``, ``default_factory``, ``init``, ``repr``, ``hash``, and
|
||
``cmp`` have the identical meaning as they do in the ``field()``
|
||
declaration.
|
||
|
||
post-init processing
|
||
--------------------
|
||
|
||
The generated ``__init__`` code will call a method named
|
||
``__dataclass_post_init__``, if it is defined on the class. It will
|
||
be called as ``self.__dataclass_post_init__()``.
|
||
|
||
Among other uses, this allows for initializing field values that
|
||
depend on one or more other fields.
|
||
|
||
Class variables
|
||
---------------
|
||
|
||
The one place where ``dataclass`` actually inspects the type of a
|
||
field is to determine if a field is a class variable. It does this by
|
||
seeing if the type of the field is given as of type
|
||
``typing.ClassVar``. If a field is a ``ClassVar``, it is excluded
|
||
from consideration as a field and is ignored by the Data Class
|
||
mechanisms.
|
||
|
||
Frozen instances
|
||
----------------
|
||
|
||
It is not possible to create truly immutable Python objects. However,
|
||
by passing ``frozen=True`` to the ``@dataclass`` decorator you can
|
||
emulate immutability. In that case, Data Classes will add
|
||
``__setattr__`` and ``__delattr__`` member functions to the class.
|
||
These functions will raise a ``FrozenInstanceError`` when invoked.
|
||
|
||
There is a tiny performance penalty when using ``frozen=True``:
|
||
``__init__`` cannot use simple assignment to initialize fields, and
|
||
must use ``object.__setattr__``.
|
||
|
||
Mutable default values
|
||
----------------------
|
||
|
||
Python stores the default field values in class attributes.
|
||
Consider this example, not using Data Classes::
|
||
|
||
class C:
|
||
x = []
|
||
def __init__(self, x=x):
|
||
self.x = x
|
||
|
||
assert C().x is C().x
|
||
assert C().x is not C([]).x
|
||
|
||
That is, two instances of class ``C`` that do not not specify a value
|
||
for ``x`` when creating a class instance will share the same copy of
|
||
the list. Because Data Classes just use normal Python class creation,
|
||
they also share this problem. There is no general way for Data
|
||
Classes to detect this condition. Instead, Data Classes will raise a
|
||
``TypeError`` if it detects a default parameter of type ``list``,
|
||
``dict``, or ``set``. This is a partial solution, but it does protect
|
||
against many common errors. See `How to support mutable default
|
||
values`_ in the Discussion section for more details.
|
||
|
||
Using default factory functions is a way to create new instances of
|
||
mutable types as default values for fields::
|
||
|
||
@dataclass
|
||
class C:
|
||
x: list = field(default_factory=list)
|
||
|
||
assert C().x is not C().x
|
||
|
||
Inheritance
|
||
-----------
|
||
|
||
When the Data Class is being created by the ``@dataclass`` decorator,
|
||
it looks through all of the class's base classes in reverse MRO (that
|
||
is, starting at ``object``) and, for each Data Class that it finds,
|
||
adds the fields from that base class to an ordered mapping of fields.
|
||
After all of the base classes, it adds its own fields to the ordered
|
||
mapping. Because the fields are in insertion order, derived classes
|
||
override base classes. An example::
|
||
|
||
@dataclass
|
||
class Base:
|
||
x: float = 15.0
|
||
y: int = 0
|
||
|
||
@dataclass
|
||
class C(Base):
|
||
z: int = 10
|
||
x: int = 15
|
||
|
||
The final list of fields is, in order, ``x``, ``y``, ``z``. The final
|
||
type of ``x`` is ``int``, as specified in class ``C``.
|
||
|
||
Default factory functions
|
||
-------------------------
|
||
|
||
If a field specifies a ``default_factory``, it is called with zero
|
||
arguments when a default value for the field is needed. For example,
|
||
to create a new instance of a list, use::
|
||
|
||
l: list = field(default_factory=list)
|
||
|
||
If a field is excluded from ``__init__`` (using ``init=False``) and
|
||
the field also specifies ``default_factory``, then the default factory
|
||
function will always be called from the generated ``__init__``
|
||
function. This happens because there is no other way to give the
|
||
field a default value.
|
||
|
||
Module level helper functions
|
||
-----------------------------
|
||
|
||
- ``fields(class_or_instance)``: Returns a list of ``Field`` objects
|
||
that define the fields for this Data Class. Accepts either a Data
|
||
Class, or an instance of a Data Class.
|
||
|
||
- ``asdict(instance)``: todo: recursion, class factories, etc.
|
||
|
||
- ``astuple(instance)``: todo: recursion, class factories, etc.
|
||
|
||
.. _discussion:
|
||
|
||
Discussion
|
||
==========
|
||
|
||
python-ideas discussion
|
||
-----------------------
|
||
|
||
This discussion started on python-ideas [#]_ and was moved to a GitHub
|
||
repo [#]_ for further discussion. As part of this discussion, we made
|
||
the decision to use PEP 526 syntax to drive the discovery of fields.
|
||
|
||
Support for automatically setting ``__slots__``?
|
||
------------------------------------------------
|
||
|
||
At least for the initial release, ``__slots__`` will not be supported.
|
||
``__slots__`` needs to be added at class creation time. The Data
|
||
Class decorator is called after the class is created, so in order to
|
||
add ``__slots__`` the decorator would have to create a new class, set
|
||
``__slots__``, and return it. Because this behavior is somewhat
|
||
surprising, the initial version of Data Classes will not support
|
||
automatically setting ``__slots__``. There are a number of
|
||
workarounds:
|
||
|
||
- Manually add ``__slots__`` in the class definition.
|
||
|
||
- Write a function (which could be used as a decorator) that
|
||
inspects the class using ``fields()`` and creates a new class with
|
||
``__slots__`` set.
|
||
|
||
For more discussion, see [#]_.
|
||
|
||
Should post-init take params?
|
||
-----------------------------
|
||
|
||
The post-init function ``__dataclass_post_init__`` takes no
|
||
parameters. This was deemed to be simpler than trying to find a
|
||
mechanism to optionally pass a parameter to the
|
||
``__dataclass_post_init__`` function.
|
||
|
||
|
||
Why not just use namedtuple
|
||
---------------------------
|
||
|
||
- Any namedtuple can be compared to any other with the same number of
|
||
fields. For example: ``Point3D(2017, 6, 2) == Date(2017, 6, 2)``.
|
||
With Data Classes, this would return False.
|
||
|
||
- A namedtuple can be compared to a tuple. For example ``Point2D(1,
|
||
10) == (1, 10)``. With Data Classes, this would return False.
|
||
|
||
- Instances are always iterable, which can make it difficult to add
|
||
fields. If a library defines::
|
||
|
||
Time = namedtuple('Time', ['hour', 'minute'])
|
||
def get_time():
|
||
return Time(12, 0)
|
||
|
||
Then if a user uses this code as::
|
||
|
||
hour, minute = get_time()
|
||
|
||
then it would not be possible to add a ``second`` field to ``Time``
|
||
without breaking the user's code.
|
||
|
||
- No option for mutable instances.
|
||
|
||
- Cannot specify default values.
|
||
|
||
- Cannot control which fields are used for ``__init__``, ``__repr__``,
|
||
etc.
|
||
|
||
Why not just use typing.NamedTuple
|
||
----------------------------------
|
||
|
||
For classes with statically defined fields, it does support similar
|
||
syntax to Data Classes, using type annotations. This produces a
|
||
namedtuple, so it shares ``namedtuple``'s benefits and some of its
|
||
downsides.
|
||
|
||
Why not just use attrs
|
||
----------------------
|
||
|
||
- attrs moves faster than could be accommodated if it were moved in to
|
||
the standard library.
|
||
|
||
- attrs supports additional features not being proposed here:
|
||
validators, converters, metadata, etc. Data Classes makes a
|
||
tradeoff to achieve simplicity by not implementing these
|
||
features.
|
||
|
||
For more discussion, see [#]_.
|
||
|
||
Dynamic creation of classes
|
||
---------------------------
|
||
|
||
An earlier version of this PEP and the sample implementation provided
|
||
a ``make_class`` function that dynamically created Data Classes. This
|
||
functionality was later dropped, although it might be added at a later
|
||
time as a helper function. The ``@dataclass`` decorator does not care
|
||
how classes are created, so they could be either statically defined or
|
||
dynamically defined. For this Data Class::
|
||
|
||
@dataclass
|
||
class C:
|
||
x: int
|
||
y: int = field(init=False, default=0)
|
||
|
||
Here is one way of dynamically creating the same Data Class::
|
||
|
||
cls_dict = {'__annotations__': OrderedDict(x=int, y=int),
|
||
'y': field(init=False, default=0),
|
||
}
|
||
C = dataclass(type('C', (object,), cls_dict))
|
||
|
||
How to support mutable default values
|
||
-------------------------------------
|
||
|
||
One proposal was to automatically copy defaults, so that if a literal
|
||
list ``[]`` was a default value, each instance would get a new list.
|
||
There were undesirable side effects of this decision, so the final
|
||
decision is to disallow the 3 known built-in mutable types: list,
|
||
dict, and set. For a complete discussion of this and other options,
|
||
see [#]_.
|
||
|
||
Examples
|
||
========
|
||
|
||
This code exists in a closed source project::
|
||
|
||
class Application:
|
||
def __init__(self, name, requirements, constraints=None, path='', executable_links=None, executables_dir=()):
|
||
self.name = name
|
||
self.requirements = requirements
|
||
self.constraints = {} if constraints is None else constraints
|
||
self.path = path
|
||
self.executable_links = [] if executable_links is None else executable_links
|
||
self.executables_dir = executables_dir
|
||
self.additional_items = []
|
||
|
||
def __repr__(self):
|
||
return f'Application({self.name!r},{self.requirements!r},{self.constraints!r},{self.path!r},{self.executable_links!r},{self.executables_dir!r},{self.additional_items!r})'
|
||
|
||
This can be replaced by::
|
||
|
||
@dataclass
|
||
class Application:
|
||
name: Str
|
||
requirements: List
|
||
constraints: List[str] = field(default_factory=list)
|
||
path: Str = ''
|
||
executable_links: List[str] = field(default_factory=list)
|
||
executable_dir: Tuple[str] = ()
|
||
additional_items: List[str] = field(init=False, default_factory=list)
|
||
|
||
The Data Class version is more declarative, has less code, supports
|
||
``typing``, and includes the other generated functions.
|
||
|
||
Acknowledgements
|
||
================
|
||
|
||
The following people provided invaluable input during the development
|
||
of this PEP and code: Ivan Levkivskyi, Guido van Rossum, Hynek
|
||
Schlawack, Raymond Hettinger, and Lisa Roach. I thank them for their
|
||
time and expertise.
|
||
|
||
A special mention must be made about the attrs project. It was a true
|
||
inspiration for this PEP, and I respect the design decisions they
|
||
made.
|
||
|
||
References
|
||
==========
|
||
|
||
.. [#] attrs project on github
|
||
(https://github.com/python-attrs/attrs)
|
||
|
||
.. [#] DictDotLookup recipe
|
||
(http://code.activestate.com/recipes/576586-dot-style-nested-lookups-over-dictionary-based-dat/)
|
||
|
||
.. [#] attrdict package
|
||
(https://pypi.python.org/pypi/attrdict)
|
||
|
||
.. [#] StackOverflow question about data container classes
|
||
(https://stackoverflow.com/questions/3357581/using-python-class-as-a-data-container)
|
||
|
||
.. [#] David Beazley metaclass talk featuring data classes
|
||
(https://www.youtube.com/watch?v=sPiWg5jSoZI)
|
||
|
||
.. [#] Python documentation for __hash__
|
||
(https://docs.python.org/3/reference/datamodel.html#object.__hash__)
|
||
|
||
.. [#] Start of python-ideas discussion
|
||
(https://mail.python.org/pipermail/python-ideas/2017-May/045618.html)
|
||
|
||
.. [#] GitHub repo where discussions and initial development took place
|
||
(https://github.com/ericvsmith/dataclasses)
|
||
|
||
.. [#] Support __slots__?
|
||
(https://github.com/ericvsmith/dataclasses/issues/28)
|
||
|
||
.. [#] why not just attrs?
|
||
(https://github.com/ericvsmith/dataclasses/issues/19)
|
||
|
||
.. [#] Copying mutable defaults
|
||
(https://github.com/ericvsmith/dataclasses/issues/3)
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|