From 6be5d81449d53b8e155cf1f4366ce756082037c1 Mon Sep 17 00:00:00 2001 From: "Eric V. Smith" Date: Fri, 8 Sep 2017 07:48:26 -0700 Subject: [PATCH] Added PEP 557: Data Classes --- pep-0557.rst | 611 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 611 insertions(+) create mode 100644 pep-0557.rst diff --git a/pep-0557.rst b/pep-0557.rst new file mode 100644 index 000000000..38aaf6e21 --- /dev/null +++ b/pep-0557.rst @@ -0,0 +1,611 @@ +PEP: 557 +Title: Data Classes +Author: Eric V. Smith +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 02-Jun-2017 +Python-Version: 3.7 +Post-History: 08-Sep-2017 + +Notice for Reviewers +==================== + +This PEP and the initial implementation were drafted in a separate +repo: https://github.com/ericvsmith/dataclasses. Before commenting in +a public forum please at least read the `discussion`_ listed at the +end of this PEP. + +Abstract +======== + +This PEP describes an addition to the standard library called Data +Classes. Although they use a very different mechanism, Data Classes +can be thought of as "mutable namedtuples with defaults". + +A class decorator is provided which inspects a class definition for +variables with type annotations as defined in PEP 526, "Syntax for +Variable Annotations". In this document, such variables are called +fields. Using these fields, the decorator adds generated method +definitions to the class to support instance initialization, a repr, +and comparisons methods. Such a class is called a Data Class, but +there's really nothing special about the class: it is the same class +but with the generated methods added. + +As an example:: + + @dataclass + class InventoryItem: + name: str + unit_price: float + quantity_on_hand: int = 0 + + def total_cost(self) -> float: + return self.unit_price * self.quantity_on_hand + +The ``@dataclass`` decorator will add the equivalent of these methods +to the InventoryItem class:: + + def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None: + self.name = name + self.unit_price = unit_price + self.quantity_on_hand = quantity_on_hand + def __repr__(self): + return f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})' + def __eq__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + def __ne__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + def __lt__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + def __le__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + def __gt__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + def __ge__(self, other): + if other.__class__ is self.__class__: + return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand) + return NotImplemented + +Data Classes saves you from writing and maintaining these functions. + +Rationale +========= + +There have been numerous attempts to define classes which exist +primarily to store values which are accessible by attribute lookup. +Some examples include: + +- collection.namedtuple in the standard library. + +- typing.NamedTuple in the standard library. + +- The popular attrs [#]_ project. + +- Many example online recipes [#]_, packages [#]_, and questions [#]_. + David Beazley used a form of data classes as the motivating example + in a PyCon 2013 metaclass talk [#]_. + +So, why is this PEP needed? + +With the addition of PEP 526, Python has a concise way to specify the +type of class members. This PEP leverages that syntax to provide a +simple, unobtrusive way to describe Data Classes. With one exception, +the specified attribute type annotation is completely ignored by Data +Classes. + +No base classes or metaclasses are used by Data Classes. Users of +these classes are free to use inheritance and metaclasses without any +interference from Data Classes. The decorated classes are truly +"normal" Python classes. The Data Class decorator should not +interfere with any usage of the class. + +Data Classes are not, and are not intended to be, a replacement +mechanism for all of the above libraries. But being in the standard +library will allow many of the simpler use cases to instead leverage +Data Classes. Many of the libraries listed have different feature +sets, and will of course continue to exist and prosper. + +Where is it not appropriate to use Data Classes? + +- Compatibility with tuples is required. + +- True immutability is required. + +- Type validation beyond that provided by PEPs 484 and 526 is + required, or value validation is required. + +XXX Motivation for each dataclass() and field() parameter + +Specification +============= + +All of the functions described in this PEP will live in a module named +``dataclasses``. + +A function ``dataclass`` which is typically used as a class decorator +is provided to post-process classes and add generated member +functions, described below. + +The ``dataclass`` decorator examines the class to find ``field``'s. A +``field`` is defined as any variable identified in +``__annotations__``. That is, a variable that is decorated with a +type annotation. With a single exception described below, none of the +Data Class machinery examines the type specified in the annotation. + +Note that ``__annotations__`` is guaranateed to be an ordered mapping, +in class declaration order. The order of the fields in all of the +generated methods is the order in which they appear in the class. + +The ``dataclass`` decorator is typically used with no parameters and +no parenthesis. However, it also supports the following logical +signature:: + + def dataclass(*, init=True, repr=True, hash=None, cmp=True, frozen=False) + +If ``dataclass`` is used just as a simple decorator with no +parameters, it acts as if it has the default values documented in this +signature. That is, these three uses of ``@dataclass`` are equivalent:: + + @dataclass + class C: + ... + + @dataclass() + class C: + ... + + @dataclass(init=True, repr=True, hash=None, cmp=True, frozen=False) + class C: + ... + +The parameters to ``dataclass`` are: + +- ``init``: If true, a ``__init__`` method will be generated. + +- ``repr``: If true, a ``__repr__`` function will be generated. The + generated repr string will have the class name and the name and repr + of each field, in the order they are defined in the class. Fields + that are marked as being excluded from the repr are not included. + For example: + ``InventoryItem(name='widget',unit_price=3.0,quantity_on_hand=10)``. + +- ``cmp``: If true, ``__eq__``, ``__ne__``, ``__lt__``, ``__le__``, + ``__gt__``, and ``__ge__`` methods will be generated. These compare + the class as if it were a tuple of its fields, in order. Both + instances in the comparison must be of the identical type. + +- ``hash``: Either a bool or ``None``. If ``None`` (the default), the + ``__hash__`` method is generated according to how cmp and frozen are + set. + + If ``cmp`` and ``hash`` are both true, Data Classes will generate a + ``__hash__`` for you. If ``cmp`` is true and ``frozen`` is false, + ``__hash__`` will be set to ``None``, marking it unhashable (which + it is). If cmp is false, ``__hash__`` will be left untouched + meaning the ``__hash__`` method of the superclass will be used (if + superclass is object, this means it will fall back to id-based + hashing). + + Although not recommended, you force Data Classes to create a + ``__hash__`` method ``hash=True``. This might be the case if your + class is logically immutable but can none the less be mutated. This + is a specialized use case and should be considered carefully. + + See the Python documentation [#]_ for more information. + +- ``frozen``: If true, assigning to fields will generate an exception. + This emulates read-only frozen instances. See the discussion below. + +``field``'s may optionally specify a default value, using normal +Python syntax:: + + @dataclass + class C: + int a # 'a' has no default value + int b = 0 # assign a default value for 'b' + +For common and simple use cases, no other functionality is required. +There are, however, some Data Class features that require additional +per-field information. To satisfy this need for additional +information, you can replace the default field value with a call to +the provided ``field()`` function. The signature of ``field()`` is:: + + def field(*, default=_MISSING, default_factory=_MISSING, repr=True, + hash=None, init=True, cmp=True) + +The ``_MISSING`` value is a sentinel object used to detect if the +``default`` and ``default_factory`` parameters are provided. Users +should never use ``_MISSING`` or depend on its value. This sentinel +is used because ``None`` is a valid value for ``default``. + +The parameters to ``field()`` are: + +- ``default``: If provided, this will be the default value for this + field. This is needed because the ``field`` call itself replaces + the normal position of the default value. + +- ``default_factory``: If provided, a zero-argument callable that will + be called when a default value is needed for this field. Among + other purposes, this can be used to specify fields with mutable + default values, discussed below. It is an error to specify both + ``default`` and ``default_factory``. + +- ``init``: If true, this field is included as a parameter to the + generated ``__init__`` function. + +- ``repr``: If true, this field is included in the string returned by + the generated ``__repr__`` function. + +- ``cmp``: If true, this field is included in the generated comparison + methods (``__eq__`` et al). + +- ``hash``: This can be a bool or ``None``. If true, this field is + included in the generated ``__hash__`` method. If ``None`` (the + default), use the value of ``cmp``: this would normally be the + expected behavior. A field needs to be considered in the hash if + it's used for comparisons. Setting this value to anything other + than ``None`` is discouraged. + +``Field`` objects +----------------- + +``Field`` objects describe each defined field. These objects are +created internally, and are returned by the ``fields()`` module-level +method (see below). Users should never instantiate a ``Field`` +object directly. Its attributes are: + + - ``name``: The name of the field. + + - ``type``: The type of the field. + + - ``default``, ``default_factory``, ``init``, ``repr``, ``hash``, and + ``cmp`` have the identical meaning as they do in the ``field()`` + declaration. + +post-init processing +-------------------- + +The generated ``__init__`` code will call a method named +``__dataclass_post_init__``, if it is defined on the class. It will +be called as ``self.__dataclass_post_init__()``. + +Among other uses, this allows for initializing field values that +depend on one or more other fields. + +Class variables +--------------- + +The one place where ``dataclass`` actually inspects the type of a +field is to determine if a field is a class variable. It does this by +seeing if the type of the field is given as of type +``typing.ClassVar``. If a field is a ``ClassVar``, it is excluded +from consideration as a field and is ignored by the Data Class +mechanisms. + +Frozen instances +---------------- + +It is not possible to create truly immutable Python objects. However, +by passing ``frozen=True`` to the ``@dataclass`` decorator you can +emulate immutability. In that case, Data Classes will add +``__setattr__`` and ``__delattr__`` member functions to the class. +These functions will raise a ``FrozenInstanceError`` when invoked. + +There is a tiny performance penalty when using ``frozen=True``: +``__init__`` cannot use simple assignment to initialize fields, and +must use ``object.__setattr__``. + +Mutable default values +---------------------- + +Python stores the default field values in class attributes. +Consider this example, not using Data Classes:: + + class C: + x = [] + def __init__(self, x=x): + self.x = x + + assert C().x is C().x + assert C().x is not C([]).x + +That is, two instances of class ``C`` that do not not specify a value +for ``x`` when creating a class instance will share the same copy of +the list. Because Data Classes just use normal Python class creation, +they also share this problem. There is no general way for Data +Classes to detect this condition. Instead, Data Classes will raise a +``TypeError`` if it detects a default parameter of type ``list``, +``dict``, or ``set``. This is a partial solution, but it does protect +against many common errors. See `How to support mutable default +values`_ in the Discussion section for more details. + +Using default factory functions is a way to create new instances of +mutable types as default values for fields:: + + @dataclass + class C: + x: list = field(default_factory=list) + + assert C().x is not C().x + +Inheritance +----------- + +When the Data Class is being created by the ``@dataclass`` decorator, +it looks through all of the class's base classes in reverse MRO (that +is, starting at ``object``) and, for each Data Class that it finds, +adds the fields from that base class to an ordered mapping of fields. +After all of the base classes, it adds its own fields to the ordered +mapping. Because the fields are in insertion order, derived classes +override base classes. An example:: + + @dataclass + class Base: + x: float = 15.0 + y: int = 0 + + @dataclass + class C(Base): + z: int = 10 + x: int = 15 + +The final list of fields is, in order, ``x``, ``y``, ``z``. The final +type of ``x`` is ``int``, as specified in class ``C``. + +Default factory functions +------------------------- + +If a field specifies a ``default_factory``, it is called with zero +arguments when a default value for the field is needed. For example, +to create a new instance of a list, use:: + + l: list = field(default_factory=list) + +If a field is excluded from ``__init__`` (using ``init=False``) and +the field also specifies ``default_factory``, then the default factory +function will always be called from the generated ``__init__`` +function. This happens because there is no other way to give the +field a default value. + +Module level helper functions +----------------------------- + +- ``fields(class_or_instance)``: Returns a list of ``Field`` objects + that define the fields for this Data Class. Accepts either a Data + Class, or an instance of a Data Class. + +- ``asdict(instance)``: todo: recursion, class factories, etc. + +- ``astuple(instance)``: todo: recursion, class factories, etc. + +.. _discussion: + +Discussion +========== + +python-ideas discussion +----------------------- + +This discussion started on python-ideas [#]_ and was moved to a GitHub +repo [#]_ for further discussion. As part of this discussion, we made +the decision to use PEP 526 syntax to drive the discovery of fields. + +Support for automatically setting ``__slots__``? +------------------------------------------------ + +At least for the initial release, ``__slots__`` will not be supported. +``__slots__`` needs to be added at class creation time. The Data +Class decorator is called after the class is created, so in order to +add ``__slots__`` the decorator would have to create a new class, set +``__slots__``, and return it. Because this behavior is somewhat +surprising, the initial version of Data Classes will not support +automatically setting ``__slots__``. There are a number of +workarounds: + + - Manually add ``__slots__`` in the class definition. + + - Write a function (which could be used as a decorator) that + inspects the class using ``fields()`` and creates a new class with + ``__slots__`` set. + +For more discussion, see [#]_. + +Should post-init take params? +----------------------------- + +The post-init function ``__dataclass_post_init__`` takes no +parameters. This was deemed to be simpler than trying to find a +mechanism to optionally pass a parameter to the +``__dataclass_post_init__`` function. + + +Why not just use namedtuple +--------------------------- + +- Any namedtuple can be compared to any other with the same number of + fields. For example: ``Point3D(2017, 6, 2) == Date(2017, 6, 2)``. + With Data Classes, this would return False. + +- A namedtuple can be compared to a tuple. For example ``Point2D(1, + 10) == (1, 10)``. With Data Classes, this would return False. + +- Instances are always iterable, which can make it difficult to add + fields. If a library defines:: + + Time = namedtuple('Time', ['hour', 'minute']) + def get_time(): + return Time(12, 0) + + Then if a user uses this code as:: + + hour, minute = get_time() + + then it would not be possible to add a ``second`` field to ``Time`` + without breaking the user's code. + +- No option for mutable instances. + +- Cannot specify default values. + +- Cannot control which fields are used for ``__init__``, ``__repr__``, + etc. + +Why not just use typing.NamedTuple +---------------------------------- + +For classes with statically defined fields, it does support similar +syntax to Data Classes, using type annotations. This produces a +namedtuple, so it shares ``namedtuple``'s benefits and some of its +downsides. + +Why not just use attrs +---------------------- + +- attrs moves faster than could be accommodated if it were moved in to + the standard library. + +- attrs supports additional features not being proposed here: + validators, converters, metadata, etc. Data Classes makes a + tradeoff to achieve simplicity by not implementing these + features. + +For more discussion, see [#]_. + +Dynamic creation of classes +--------------------------- + +An earlier version of this PEP and the sample implementation provided +a ``make_class`` function that dynamically created Data Classes. This +functionality was later dropped, although it might be added at a later +time as a helper function. The ``@dataclass`` decorator does not care +how classes are created, so they could be either statically defined or +dynamically defined. For this Data Class:: + + @dataclass + class C: + x: int + y: int = field(init=False, default=0) + +Here is one way of dynamically creating the same Data Class:: + + cls_dict = {'__annotations__': OrderedDict(x=int, y=int), + 'y': field(init=False, default=0), + } + C = dataclass(type('C', (object,), cls_dict)) + +How to support mutable default values +------------------------------------- + +One proposal was to automatically copy defaults, so that if a literal +list ``[]`` was a default value, each instance would get a new list. +There were undesirable side effects of this decision, so the final +decision is to disallow the 3 known built-in mutable types: list, +dict, and set. For a complete discussion of this and other options, +see [#]_. + +Examples +======== + +This code exists in a closed source project:: + + class Application: + def __init__(self, name, requirements, constraints=None, path='', executable_links=None, executables_dir=()): + self.name = name + self.requirements = requirements + self.constraints = {} if constraints is None else constraints + self.path = path + self.executable_links = [] if executable_links is None else executable_links + self.executables_dir = executables_dir + self.additional_items = [] + + def __repr__(self): + return f'Application({self.name!r},{self.requirements!r},{self.constraints!r},{self.path!r},{self.executable_links!r},{self.executables_dir!r},{self.additional_items!r})' + +This can be replaced by:: + + @dataclass + class Application: + name: Str + requirements: List + constraints: List[str] = field(default_factory=list) + path: Str = '' + executable_links: List[str] = field(default_factory=list) + executable_dir: Tuple[str] = () + additional_items: List[str] = field(init=False, default_factory=list) + +The Data Class version is more declarative, has less code, supports +``typing``, and includes the other generated functions. + +Acknowledgements +================ + +The following people provided invaluable input during the development +of this PEP and code: Ivan Levkivskyi, Guido van Rossum, Hynek +Schlawack, Raymond Hettinger, and Lisa Roach. I thank them for their +time and expertise. + +A special mention must be made about the attrs project. It was a true +inspiration for this PEP, and I respect the design decisions they +made. + +References +========== + +.. [#] attrs project on github + (https://github.com/python-attrs/attrs) + +.. [#] DictDotLookup recipe + (http://code.activestate.com/recipes/576586-dot-style-nested-lookups-over-dictionary-based-dat/) + +.. [#] attrdict package + (https://pypi.python.org/pypi/attrdict) + +.. [#] StackOverflow question about data container classes + (https://stackoverflow.com/questions/3357581/using-python-class-as-a-data-container) + +.. [#] David Beazley metaclass talk featuring data classes + (https://www.youtube.com/watch?v=sPiWg5jSoZI) + +.. [#] Python documentation for __hash__ + (https://docs.python.org/3/reference/datamodel.html#object.__hash__) + +.. [#] Start of python-ideas discussion + (https://mail.python.org/pipermail/python-ideas/2017-May/045618.html) + +.. [#] GitHub repo where discussions and initial development took place + (https://github.com/ericvsmith/dataclasses) + +.. [#] Support __slots__? + (https://github.com/ericvsmith/dataclasses/issues/28) + +.. [#] why not just attrs? + (https://github.com/ericvsmith/dataclasses/issues/19) + +.. [#] Copying mutable defaults + (https://github.com/ericvsmith/dataclasses/issues/3) + +Copyright +========= + +This document has been placed in the public domain. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: