PEP: 584
Title: Add + and - operators to the built-in dict class.
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Mar-2019
Post-History:

=====================================
PEP-584 Dict addition and subtraction
=====================================

**DRAFT** -- This is a draft document for discussion.

Abstract
--------

This PEP suggests adding merge ``+`` and difference ``-`` operators to
the built-in ``dict`` class.

The merge operator will have the same relationship to the
``dict.update`` method as the list concatenation operator has to
``list.extend``, with dict difference being defined analogously.

Examples
--------

Dict addition will return a new dict containing the left operand
merged with the right operand.

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d + e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e + d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

The augmented assignment version operates in-place.

    >>> d += e
    >>> print(d)
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Analogously with list addition, the operator version is more
restrictive, and requires that both arguments are dicts, while the
augmented assignment version allows anything the ``update`` method
allows, such as iterables of key/value pairs.

    >>> d + [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict

    >>> d += [('spam', 999)]
    >>> print(d)
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

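The list behaviour that this analogy relies on can be checked in
current Python. The following is a minimal sketch using only existing
built-ins; the names ``numbers`` and ``prices`` are illustrative.

```python
# Binary ``+`` on a list insists on another list ...
numbers = [1, 2]
try:
    numbers + (3, 4)
except TypeError as exc:
    binary_error = exc
else:
    binary_error = None

# ... while augmented assignment accepts any iterable, like ``extend``.
numbers += (3, 4)

# ``dict.update`` already accepts an iterable of key/value pairs, which is
# what the proposed ``+=`` would delegate to.
prices = {'spam': 1}
prices.update([('spam', 999)])
```

The proposed dict operators are intended to follow the same split
between a strict binary operator and a permissive in-place one.
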
Dict difference ``-`` will return a new dict containing the items from
the left operand which are not in the right operand.

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d - e
    {'spam': 1, 'eggs': 2}
    >>> e - d
    {'aardvark': 'Ethel'}

Augmented assignment will operate in place.

    >>> d -= e
    >>> print(d)
    {'spam': 1, 'eggs': 2}

Like the merge operator and list concatenation, the difference
operator requires both operands to be dicts, while the augmented
version allows any iterable of keys.

    >>> d - {'spam', 'parrot'}
    Traceback (most recent call last):
      ...
    TypeError: cannot take the difference of dict and set

    >>> d -= {'spam', 'parrot'}
    >>> print(d)
    {'eggs': 2}

Semantics
---------

For the merge operator, if a key appears in both operands, the
last-seen value (i.e. that from the right-hand operand) wins. This
means that dict addition is not commutative: in general ``d + e`` will
not equal ``e + d``. This joins a number of other non-commutative
addition operators among the builtins, including lists, tuples,
strings and bytes.

Having the last-seen value win makes the merge operator match the
semantics of the ``update`` method, so that ``d + e`` is an operator
version of ``d.update(e)`` that returns a new dict instead of
modifying ``d`` in place.

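The relationship to ``update`` can be sketched in current Python with
a helper function. The name ``merged`` is hypothetical and only stands
in for the proposed ``d + e``; it assumes both arguments are plain
dicts.

```python
def merged(d, e):
    """Return a new dict: a copy of ``d`` updated with ``e``."""
    new = d.copy()
    new.update(e)   # values from ``e`` win on duplicate keys
    return new

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

left_then_right = merged(d, e)   # stands in for ``d + e``
right_then_left = merged(e, d)   # stands in for ``e + d``
```

The two results differ in the value stored under ``'cheese'``, which
is the non-commutativity described above, and neither ``d`` nor ``e``
is modified.
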
The error messages shown above are not part of the API, and may change
at any time.

Rejected semantics
~~~~~~~~~~~~~~~~~~

Rejected alternative semantics for ``d + e`` include:

- Add only new keys from ``e``, without overwriting existing keys in
  ``d``. This may be done by reversing the operands ``e + d``, or
  using dict difference first, ``d + (e - d)``. The latter is
  especially useful for the in-place version ``d += (e - d)``.

- Raise an exception if there are duplicate keys. This seems
  unnecessarily restrictive and is not likely to be useful in
  practice. For example, updating default configuration values with
  user-supplied values would most often fail under the requirement
  that keys are unique::

      prefs = site_defaults + user_defaults + document_prefs

- Add the values of ``e`` to the corresponding values of ``d``. This
  is the behaviour implemented by ``collections.Counter``.

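For comparison, the three rejected behaviours can be approximated in
current Python. This is only a sketch: the helper names
``merge_keep_existing`` and ``merge_strict`` are hypothetical, and the
choice of ``ValueError`` for duplicate keys is an assumption rather
than anything the proposal specifies.

```python
from collections import Counter

def merge_keep_existing(d, e):
    """Add only new keys from ``e``; values already in ``d`` are kept."""
    new = dict(e)
    new.update(d)   # ``d`` is applied last, so its values win
    return new

def merge_strict(d, e):
    """Merge, but refuse to proceed if any key appears in both dicts."""
    duplicates = d.keys() & e.keys()
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates!r}")
    new = dict(d)
    new.update(e)
    return new

# Counter addition sums the values stored under matching keys.
totals = Counter(spam=1, eggs=2) + Counter(eggs=3, cheese=4)
```

With ``merge_strict``, the configuration example above would raise as
soon as a user overrides a site default, which is why that behaviour
was considered too restrictive.
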
Syntax
------

An alternative to the ``+`` operator is the pipe ``|`` operator, which
is used for set union. This suggestion did not receive much support
on Python-Ideas.

The ``+`` operator was strongly preferred on Python-Ideas.[1] It is
more familiar than the pipe operator, matches nicely with ``-`` as a
pair, and the Counter subclass already uses ``+`` for merging.

Current Alternatives
--------------------

To create a new dict containing the merged items of two (or more)
dicts, one can currently write::

    {**d1, **d2}

but this is neither obvious nor easily discoverable. It is only
guaranteed to work if the keys are all strings. If the keys are not
strings, it currently works in CPython, but it may not work with other
implementations, or future versions of CPython[2].

It is also limited to returning a built-in dict, not a subclass,
unless re-written as ``MyDict(**d1, **d2)``, in which case non-string
keys will raise TypeError.

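The points about result type and non-string keys can be observed
directly. The sketch below assumes CPython, and ``MyDict`` is a
trivial subclass defined here purely for illustration.

```python
class MyDict(dict):
    """A do-nothing dict subclass, used only to observe result types."""

d1 = MyDict({'spam': 1})
d2 = MyDict({'eggs': 2})

plain = {**d1, **d2}           # a dict display always builds a plain dict
rebuilt = MyDict(**d1, **d2)   # keeps the subclass, but keys become keywords

# Keyword unpacking in a call requires string keys.
try:
    MyDict(**{1: 'one'})
except TypeError as exc:
    keyword_error = exc
else:
    keyword_error = None
```

Here ``plain`` loses the subclass while ``rebuilt`` keeps it, at the
cost of rejecting the integer key in the last call.
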
There is currently no way to perform dict subtraction except through a
manual loop.

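The manual loop is usually written as a dict comprehension. A minimal
sketch follows; the function name ``difference`` is hypothetical and
stands in for the proposed ``d - e``.

```python
def difference(d, e):
    """Return the items of ``d`` whose keys do not appear in ``e``."""
    return {k: v for k, v in d.items() if k not in e}

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

d_minus_e = difference(d, e)
e_minus_d = difference(e, d)
```

Only the keys of the right-hand argument matter; its values are
ignored, as in the difference examples earlier in this document.
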
Implementation
--------------

The implementation will be in C. (The author of this PEP would like
to make it known that he is not able to write the implementation.)

An approximate pure-Python implementation of the merge operator will
be::

    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()  # May be a subclass of dict.
            new.update(self)
            new.update(other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            new.update(other)
            new.update(self)
            return new
        return NotImplemented

Note that the result type will be the type of the left operand; in the
event of matching keys, the winner is the right operand.

Augmented assignment will just call the ``update`` method. This is
analogous to the way ``list +=`` calls the ``extend`` method, which
accepts any iterable, not just lists::

    def __iadd__(self, other):
        self.update(other)
        # The name on the left of ``+=`` is rebound to the return value.
        return self

An approximate pure-Python implementation of the difference operator
will be::

    def __sub__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            for k in self:
                if k not in other:
                    new[k] = self[k]
            return new
        return NotImplemented

    def __rsub__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            for k in other:
                if k not in self:
                    new[k] = other[k]
            return new
        return NotImplemented

Augmented assignment will operate on equivalent terms to ``update``.
If the operand has a ``keys`` method, it will be used, otherwise the
operand will be iterated over::

    def __isub__(self, other):
        if hasattr(other, 'keys'):
            for k in other.keys():
                if k in self:
                    del self[k]
        else:
            for k in other:
                if k in self:
                    del self[k]
        # The name on the left of ``-=`` is rebound to the return value.
        return self

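To experiment with the proposed behaviour before a C implementation
exists, the methods above can be collected into a dict subclass. The
class name ``AddableDict`` is hypothetical, the reflected methods are
omitted for brevity, and this is a sketch rather than the proposed
implementation. The in-place methods return ``self``, as Python's
augmented assignment protocol requires.

```python
class AddableDict(dict):
    """Hypothetical dict subclass emulating the proposed operators."""

    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            new.update(self)
            new.update(other)
            return new
        return NotImplemented

    def __iadd__(self, other):
        self.update(other)
        return self

    def __sub__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            for k in self:
                if k not in other:
                    new[k] = self[k]
            return new
        return NotImplemented

    def __isub__(self, other):
        keys = other.keys() if hasattr(other, 'keys') else other
        # Snapshot the keys first so that ``d -= d`` does not mutate the
        # dict while it is being iterated over.
        for k in list(keys):
            self.pop(k, None)
        return self


d = AddableDict(spam=1, eggs=2, cheese=3)
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

merged = d + e                # new AddableDict, ``d`` unchanged
remaining = d - e             # new AddableDict without the key 'cheese'
d += [('spam', 999)]          # in place, accepts key/value pairs
d -= {'spam', 'parrot'}       # in place, accepts any iterable of keys
```

Because ``__add__`` returns ``NotImplemented`` for non-dict operands,
an expression such as ``d + [('spam', 1)]`` raises ``TypeError``,
matching the restriction shown in the examples.
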
These semantics are intended to match those of ``update`` as closely
as possible. For the dict built-in itself, calling ``keys`` is
redundant as iteration over a dict iterates over its keys; but for
subclasses or other mappings, ``update`` prefers to use the keys
method.

.. attention:: The above paragraph may be inaccurate.
   Although the dict docstring states that ``keys``
   will be called if it exists, this does not seem to
   be the case for dict subclasses. Bug or feature?

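For objects that are not dicts at all, the use of ``keys`` by
``update`` can be observed with a small probe. The class ``KeysOnly``
below is hypothetical and provides only ``keys`` and ``__getitem__``.
The sketch deliberately does not cover dict subclasses, which are the
uncertain case raised in the note above.

```python
class KeysOnly:
    """Hypothetical non-dict mapping that counts calls to ``keys``."""

    def __init__(self, data):
        self._data = dict(data)
        self.keys_calls = 0

    def keys(self):
        self.keys_calls += 1
        return list(self._data)

    def __getitem__(self, key):
        return self._data[key]


source = KeysOnly({'spam': 1, 'eggs': 2})
target = {}
target.update(source)   # succeeds even though ``source`` is not iterable
```

After the call, ``source.keys_calls`` records whether ``keys`` was
consulted. The same probe could be adapted to a dict subclass to
investigate the open question, though the outcome there may depend on
the interpreter version.
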
Contra-indications
------------------

(Or when to avoid using these new operators.)

For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will
suffer from the same unfortunate O(N\*\*2) Big Oh performance as does
list and tuple addition, and for similar reasons. If one expects to
be merging a large number of dicts where performance is an issue, it
may be better to use an explicit loop and in-place merging::

    new = {}
    for d in many_dicts:
        new += d

This is unlikely to be a problem in practice as most uses of the merge
operator are expected to only involve a small number of dicts.
Similarly, most uses of list and tuple concatenation only use a few
objects.

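The difference in cost can be made concrete by counting how many items
get copied under each strategy. This is a sketch in current Python:
the helper names are hypothetical, ``update`` stands in for the
proposed ``+=``, and the counts are a simplified cost model rather
than a benchmark.

```python
def merge_by_repeated_addition(dicts):
    """Emulate chained ``+``: each step copies everything merged so far."""
    result = {}
    copied = 0
    for d in dicts:
        new = dict(result)            # copy of the accumulated result
        new.update(d)
        copied += len(result) + len(d)
        result = new
    return result, copied

def merge_in_place(dicts):
    """Emulate the loop with ``+=``: each item is copied once."""
    result = {}
    copied = 0
    for d in dicts:
        result.update(d)
        copied += len(d)
    return result, copied

many_dicts = [{i: i} for i in range(100)]
slow_result, slow_copied = merge_by_repeated_addition(many_dicts)
fast_result, fast_copied = merge_in_place(many_dicts)
```

For 100 single-item dicts the first strategy copies 5050 items and the
second copies 100, while both produce the same merged dict.
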
Using the dict augmented assignment operators on a dict inside a tuple
(or other immutable data structure) will lead to the same problem that
occurs with list concatenation[3], namely the in-place addition will
succeed, but the operation will raise an exception.

    >>> a_tuple = ({'spam': 1, 'eggs': 2}, None)
    >>> a_tuple[0] += {'spam': 999}
    Traceback (most recent call last):
      ...
    TypeError: 'tuple' object does not support item assignment
    >>> a_tuple[0]
    {'spam': 999, 'eggs': 2}

Similar remarks apply to the ``-`` operator.

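The existing list behaviour that this mirrors can be reproduced today.
The sketch below uses a list inside a tuple; the variable names are
illustrative.

```python
a_tuple = ([1, 2], None)

# The list is extended in place first; the failure comes from the
# assignment back into the tuple that follows.
try:
    a_tuple[0] += [3]
except TypeError as exc:
    error = exc
else:
    error = None

# Calling the mutating method directly avoids the item assignment.
a_tuple[0].extend([4])
```

By analogy, calling ``update`` on a dict stored in a tuple would avoid
the exception that the proposed ``+=`` would raise.
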
Other discussions
-----------------

`Latest discussion which motivated this PEP
<https://mail.python.org/pipermail/python-ideas/2019-February/055509.html>`_

`Ticket on the bug tracker <https://bugs.python.org/issue36144>`_

`A previous discussion
<https://mail.python.org/pipermail/python-ideas/2015-February/031748.html>`_
and `commentary on it <https://lwn.net/Articles/635397/>`_. Note that
the author of this PEP was skeptical of this proposal at the time.

`How to merge dictionaries
<https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/>`_
in idiomatic Python.

Open questions
--------------

Should these operators be part of the ABC ``Mapping`` API?

References
----------

[1] Guido's declaration that plus wins over pipe:
https://mail.python.org/pipermail/python-ideas/2019-February/055519.html

[2] Non-string keys: https://bugs.python.org/issue35105 and
https://mail.python.org/pipermail/python-dev/2018-October/155435.html

[3] Behaviour in tuples:
https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works

Copyright
---------

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: