python-peps/pep-0584.rst

1103 lines
36 KiB
ReStructuredText
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

PEP: 584
Title: Add + and += operators to the built-in dict class.
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>,
Brandt Bucher <brandtbucher@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Mar-2019
Post-History: 01-Mar-2019, 16-Oct-2019
Abstract
--------
This PEP suggests adding merge (``+``) and update (``+=``) operators to
the built-in ``dict`` class.
The new operators will have the same relationship to the ``dict.update`` method
as the list concatenate and extend operators have to ``list.extend``.
Examples
--------
Dict addition will return a new dict containing the left operand
merged with the right operand::
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d + e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> e + d
{'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}
The augmented assignment version operates in-place::
>>> d += e
>>> d
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
Analogously with list addition, the operator version is more
restrictive, and requires that both arguments are dicts, while the
augmented assignment version allows anything the ``update`` method
allows, such as iterables of key/value pairs. Continued from above::
>>> d + [('spam', 999)]
Traceback (most recent call last):
...
TypeError: can only merge dict (not "list") to dict
>>> d += [('spam', 999)]
>>> d
{'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
Semantics
---------
For the merge operator, if a key appears in both operands, the
last-seen value (i.e. that from the right-hand operand) wins. This
shows that dict addition is not commutative, in general ``d + e`` will
not equal ``e + d``. This joins a number of other non-commutative
addition operators among the builtins, including lists, tuples,
strings and bytes.
Having the last-seen value wins makes the merge operator match the
semantics of the ``update`` method, so that ``d += e`` is an operator
version of ``d.update(e)``.
The error messages shown above are not part of the API, and may change
at any time.
Rejected semantics
~~~~~~~~~~~~~~~~~~
What should happen when keys conflict? There are at least five obvious
things that could happen:
1. **Right-most (last seen) value wins.** This matches the existing
behaviour of similar dict expressions, where the last seen
value always wins::
{'a': 1, 'a': 2} # last seen value wins
{**d, **e} # values in e win over values in d
d.update(e) # values in e overwrite existing values
d[k] = v # v over-writes any existing value
{k:v for x in (d, e) for (k, v) in x.items()}
all follow the same rule. This PEP takes the position that this
behaviour is simple, obvious, and usually the behaviour we want,
and should be the default behaviour for dicts.
2. **Raise on conflicting keys.** It isn't clear that this behaviour has
many use-cases or will be often useful, but it will likely be annoying
as any use of the dict plus operator would have to be guarded with a
``try...except`` clause.
But for those who want it, it would be easy to override the method in a
subclass to get the desired behaviour::
def __add__(self, other):
if self.keys() & other.keys():
raise KeyError('duplicate key')
return super().__add__(other)
3. **Add the values (as Counter does).** Too specialised to be used as
the default behaviour.
4. **First seen wins: value-preserving semantics.** It isn't clear that
this behaviour has many use-cases. Probably too specialised to be used
as the default, best to leave this for subclasses, or simply reverse
the order of the arguments::
# d1 merged with d2, keeping existing values in d1
d2 + d1
5. **Concatenate values in a list**::
{'a': 1} + {'a': 2} == {'a': [1, 2]}
This is likely to be too specialised to be the default. It is not clear
what to do if the values are already lists::
{'a': [1, 2]} + {'a': [3, 4]}
Should this give ``{'a': [1, 2, 3, 4]}`` or ``{'a': [[1, 2], [3, 4]]}``?
To summarise this section: this PEP proposes option 1, **last seen wins** as
the default behaviour, with alternatives left to subclasses of dict.
Syntax
------
An alternative to the ``+`` operator is the pipe ``|`` operator, which
is used for set union. This suggestion did not receive much support
on Python-Ideas.
The ``+`` operator was strongly preferred on Python-Ideas[1]. It is
more familiar than the pipe operator, and the ``collections.Counter`` subclass
already uses ``+`` for merging.
Use of plus sign
~~~~~~~~~~~~~~~~
In mathematics, there is a strong but not universal convention for the
``+`` sign to be used for operations that are commutative, such as
ordinary addition: ``2 + 6 == 6 + 2``. Three prominent exceptions are
concatenation, `near rings <https://en.wikipedia.org/wiki/Near-ring>`_
and `ordinal arithmetic <https://en.wikipedia.org/wiki/Ordinal_arithmetic>`_.
The ``+`` sign is occasionally used for the
`disjoint union <https://en.wikipedia.org/wiki/Disjoint_union>`_
set operation.
In `boolean algebra <https://www.nayuki.io/page/boolean-algebra-laws>`_,
logical disjunction ``OR`` normally uses either the ```` or ``+``
symbol. George Boole originally used ``+`` for ``XOR``, but in modern
notation ```` is normally used instead. Logical disjunction is analogous
to set union, usually spelled as ````. In Python, set union uses the
``|`` operator, suggesting an alternative operator instead of ``+``. See
section "Alternative proposals".
In English, "plus" can be used as an additive conjunction similar to *and*,
*together with*, *in addition to* etc:
"The function of education is to teach one to think intensively and
to think critically. Intelligence plus character - that is the goal
of true education." -- Martin Luther King, Jr.
This suggests a connection to *union* or *merge*, not just numeric addition.
Partial survey of other languages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Inada Naoki did `a survey of language support for dict merging <https://discuss.python.org/t/pep-584-survey-of-other-languages-operator-overload/977>`_
and found that Scala uses ``++`` and Kotlin uses ``+``.
An example of dict joining from Kotlin::
fun main() {
var a = mutableMapOf<String,Int>("a" to 1, "b" to 2)
var b = mutableMapOf<String,Int>("c" to 1, "b" to 3)
println(a)
println(b)
println(a + b)
println(b + a)
}
which gives the output::
{a=1, b=2}
{c=1, b=3}
{a=1, b=3, c=1}
{c=1, b=2, a=1}
YAML uses ``<<`` as the `dict merge operator <https://yaml.org/type/merge.html>`_.
`Elixir <https://hexdocs.pm/elixir/Map.html>`_ uses ``|`` to update mappings::
iex> map = %{one: 1, two: 2}
iex> %{map | one: "one"}
%{one: "one", two: 2}
but has the restriction that keys on the right hand side of the ``|`` symbol
must already exist in the map on the left.
`Groovy <https://stackoverflow.com/questions/13326943/does-groovy-have-method-to-merge-2-maps>`_
uses ``+`` to merge two maps into a new map, or ``<<`` to merge the second
into the first.
Current Alternatives
--------------------
To create a new dict containing the merged items of two (or more)
dicts, one can currently write::
{**d1, **d2}
but this is neither obvious nor easily discoverable. It is only
guaranteed to work if the keys are all strings. If the keys are not
strings, it currently works in CPython, but it may not work with other
implementations, or future versions of CPython[2].
It is also limited to returning a built-in dict, not a subclass,
unless re-written as ``MyDict(**d1, **d2)``, in which case non-string
keys will raise a TypeError.
Alternative Proposals
---------------------
At the time of writing the initial version of this PEP, ``+`` was by far the
most popular choice for operator. However further discussion found that many
people are deeply uncomfortable or outright hostile to using the plus symbol,
preferring an alternative.
Use the Pipe operator
~~~~~~~~~~~~~~~~~~~~~
Many people who like the proposed functionality strongly dislike the ``+``
operator but prefer the ``|`` operator.
Advantages
* Avoids the frequent objections to ``+``.
* Similar to the use of ``|`` for set union.
* Using ``|`` leaves the door open for dicts to support the full set API.
Disadvantages
* Using ``|`` encourages people to suggest dicts should support the full
set API.
* Not as intuitive or obvious as ``+``.
* Like ``+`` the union operator ``|`` is normally commutative. But many
people seem to be less disturbed by the idea of using ``|`` for a
non-commutative operation than they are by the idea of using ``+``.
* `Mike Selik and Guido van Rossum
<https://mail.python.org/archives/list/python-ideas@python.org/message/PL3OWY7MIYKAJGXXBTDTLNAREBP2OCZY/>`_
summarized the advantages of ``+`` over ``|``
- Plus is already used in contexts where the operation is not symmetric
such as concatentation; the pipe operator is always symmetric.
- The dict subclass ``collections.Counter`` already implements plus as a
merge operator, treating it as equivalent to ``update``.
Use the Left Shift operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``<<`` operator didn't seem to get much support on Python-Ideas, but no
major objections either. Perhaps the strongest objection was Chris Angelico's
comment
The "cuteness" value of abusing the operator to indicate
information flow got old shortly after C++ did it.
Use a new Left Arrow operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another suggestion was to create a new operator ``<-``. Unfortunately
this would be ambiguous, ``d<-e`` could mean ``d merge e`` or
``d less-than minus e``.
Use a merged method instead of an operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A ``dict.merged()`` method would avoid the need for an operator at all. One
subtlety is that it would likely need slightly different implementations
when called as an unbound method versus as a bound method.
As an unbound method, the behaviour could be similar to::
def merged(cls, *mappings, **kw):
new = cls() # Will this work for defaultdict?
for m in mappings:
new.update(m)
new.update(kw)
return new
As a bound method, the behaviour could be similar to::
def merged(self, *mappings, **kw):
new = self.copy()
for m in mappings:
new.update(m)
new.update(kw)
return new
Advantages
* Arguably, methods are more discoverable than operators.
* The method could accept any number of positional and keyword arguments,
avoiding the inefficiency of creating temporary dicts.
* Accepts sequences of ``(key, value)`` pairs like the ``update`` method.
* Being a method, it is easily to override in a subclass if you need
alternative behaviours such as "first wins", "unique keys", etc.
Disadvantages
* Would likely require a new kind of method decorator which combined the
behaviour of regular instance methods and ``classmethod``. It would need
to be public (but not necessarily a builtin) for those needing to override
the method. There is a `proof of concept <http://code.activestate.com/recipes/577030>`_.
* It isn't an operator. Guido discusses `why operators are useful
<https://mail.python.org/archives/list/python-ideas@python.org/message/52DLME5DKNZYFEETCTRENRNKWJ2B4DD5/>`_.
For another viewpoint, see `Nick Coghlan's blog post
<https://www.curiousefficiency.org/posts/2019/03/what-does-x-equals-a-plus-b-mean.html>`_.
Use a merged function
~~~~~~~~~~~~~~~~~~~~~
Instead of a method, use a new built-in function ``merged()``. One possible
implementation could be something like this::
def merged(*mappings, **kw):
if mappings and isinstance(mappings[0], dict):
# If the first argument is a dict, use its type.
new = mappings[0].copy()
mappings = mappings[1:]
else:
# No positional arguments, or the first argument is a
# sequence of (key, value) pairs.
new = dict()
for m in mappings:
new.update(m)
new.update(kw)
return new
Disadvantages
* May not be important enough to be a builtin.
* Hard to override behaviour if you need something like "first wins".
An alternative might be to forgo the arbitrary keywords, and take a single
keyword parameter that specifies the behaviour on collisions::
def merged(*mappings, *, on_collision=lambda k, v1, v2: v2):
# implementation left as an exercise to the reader
Advantages
* Most of the same advantages of the method or function solutions above.
* Doesn't require a subclass to implement alternative behaviour on collisions,
just a function.
Disadvantages
* Same as function above.
* Cannot use arbitrary keyword arguments.
Do nothing
~~~~~~~~~~
"Status quo wins a stalemate."
We could do nothing, as there are already three possible ways to solve the
problem of merging two dicts:
* ``dict.update``.
* Dict unpacking using ``{**d1, **d2}``.
* Chain maps.
Advantage
* Nothing needs to change.
Disadvantages
* None of the three alternatives match the desired behaviour:
- ``d1.update(d2)`` modifies the first mapping in place.
- ``e = d1.copy(); e.update(d2)`` is not an expression and needs a temporary
variable.
- ``{**d1, **d2}`` ignores the types of the mappings and always returns a
builtin dict.
- Dict unpacking looks ugly and is not easily discoverable. Few people would
be able to guess what it means the first time they see it, or think of it
as the "obvious way" to merge two dicts.
`As Guido said
<https://mail.python.org/archives/list/python-ideas@python.org/message/K4IC74IXE23K4KEL7OUFK3VBC62HGGVF/>`_:
"I'm sorry for PEP 448, but even if you know about ``**d`` in simpler
contexts, if you were to ask a typical Python user how to combine two
dicts into a new one, I doubt many people would think of ``{**d1, **d2}``.
I know I myself had forgotten about it when this thread started!"
- ``type(d1)({**d1, **d2})`` fails for dict subclasses such as
``defaultdict`` that have an incompatible ``__init__`` method.
- ChainMap is unfortunately poorly-known and doesn't qualify as "obvious".
- ChainMap resolves duplicate keys in the opposite order to that expected
("first seen wins" instead of "last seen wins").
- Like dict unpacking, it is tricky to get it to honour the desired subclass,
for the same reason, ``type(d1)(ChainMap(d2, d1))`` fails for some
subclasses of dict.
- ChainMaps wrap their underlying dicts, so writes to the ChainMap will
modify the original dict::
>>> d1 = {'spam': 1}
>>> d2 = {'eggs': 2}
>>> merged = ChainMap(d2, d1)
>>> merged['eggs'] = 999
>>> d2
{'eggs': 999}
Implementation
--------------
One of the authors has `drafted a C implementation
<https://github.com/brandtbucher/cpython/tree/addiction>`_.
An approximate pure-Python implementation of the merge operator will
be::
def __add__(self, other):
if not isinstance(other, dict):
return NotImplemented
new = self.copy()
new.update(other)
return new
def __radd__(self, other):
if not isinstance(other, dict):
return NotImplemented
new = other.copy()
new.update(self)
return new
Note that the result type will be the type of the left operand; in the
event of matching keys, the winner is the right operand.
Augmented assignment will just call the ``update`` method. This is
analogous to the way ``list +=`` calls the ``extend`` method, which
accepts any iterable, not just lists::
def __iadd__(self, other):
self.update(other)
return self
These semantics are intended to match those of ``update`` as closely
as possible.
Contra-indications
------------------
(Or when to avoid using these new operators.)
For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will
suffer from the same unfortunate O(N\*\*2) Big Oh performance as does
list and tuple addition, and for similar reasons. If one expects to
be merging a large number of dicts where performance is an issue, it
may be better to use an explicit loop and in-place merging::
new = {}
for d in many_dicts:
new += d
This is unlikely to be a problem in practice as most uses of the merge
operator are expected to only involve a small number of dicts.
Similarly, most uses of list and tuple concatenation only use a few
objects.
Using the dict augmented assignment operators on a dict inside a tuple
(or other immutable data structure) will lead to the same problem that
occurs with list concatenation[3], namely the in-place addition will
succeed, but the operation will raise an exception::
>>> a_tuple = ({'spam': 1, 'eggs': 2}, None)
>>> a_tuple[0] += {'spam': 999}
Traceback (most recent call last):
...
TypeError: 'tuple' object does not support item assignment
>>> a_tuple[0]
{'spam': 999, 'eggs': 2}
Major Objections
----------------
Dict addition is not commutative
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Addition is commutative, but dict addition will not be (``d + e != e + d``).
Response:
* Neither are list or string concatentation, both of which use the ``+``
operator.
* Dict addition (merge/update) is commutative with regard to the keys (although
not with regard to the values).
* Mathematically, the + operator is usually commutative, but it is not
mandatory. Perhaps the best known example of non-commutative addition
is that of `ordinal numbers
<https://en.wikipedia.org/wiki/Ordinal_arithmetic>`_, where ``ω + 1`` is a
strictly larger ordinal than ``ω`` but ``1 + ω = ω``.
* For non-numbers, `we only require addition to be associative
<https://mail.python.org/archives/list/python-ideas@python.org/message/TZ5POQOB7KTUWQQPLNIC323ZIWOCWHBF/>`_,
that is, ``a + b + c == (a + b) + c == a + (b + c)``. This is satisfied by
the proposed dict merging behaviour.
Dict addition will be inefficient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Giving a plus-operator to mappings is an invitation to writing code that
doesn't scale well. Repeated dict addition is inefficient:
``d + e + f + g + h`` creates and destroys three temporary mappings.
Response:
* The same argument applies to sequence concatenation. Unlike string
concatenation, it is rare for people to concatenate large numbers of lists or
tuples, and the authors of this PEP believe that it will be rare for people
to add large numbers of dicts.
* A survey of the standard library by the authors found no examples of merging
more than two dicts. This is unlikely to be a performance problem:
"Everything is fast for small enough N".
* ``collections.Counter`` is a dict subclass that supports the ``+`` operator.
There are no known examples of people having performance issues due to adding
large numbers of Counters.
* Sequence concatenation grows with the total number of items in the sequences,
leading to O(N**2) (quadratic) performance. Dict addition is likely to
involve duplicate keys, and so the temporary mappings will not grow as fast.
Repeated addition should be equivalent to multiplication
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The star operator ``*`` represents repeated addition across multiple data
types. ``a * 5 == a + a + a + a + a`` where ``a`` is a number (int, float,
complex) str, bytes, tuple, or list. Dict addition breaks this pattern.
Response:
* "Multiplication is repeated addition" only applies to positive integer
arguments, and breaks down as soon as you start to consider signed or
non-integer multiplicands. Consider ``a * -3.5`` -- how do you interpret
that as ``a`` added to itself negative three and a half times?
Teaching multiplication as repeated addition is something that many educators
and mathematicians stongly oppose. Some discussion on the issue:
- https://www.maa.org/external_archive/devlin/devlin_06_08.html
- https://www.maa.org/external_archive/devlin/devlin_0708_08.html
- https://math.stackexchange.com/questions/64488/if-multiplication-is-not-repeated-addition
- https://denisegaskins.com/2008/07/01/if-it-aint-repeated-addition/
- https://en.wikipedia.org/wiki/Multiplication_and_repeated_addition
* ``collections.Counter`` already supports addition, and already breaks this
pattern.
* Even if we find it useful to demonstrate "multiplication is (sometimes)
repeated addition" for sequences and numbers, that doesn't make it mandatory
for all data types. It is a very weak "nice to have", not a "must have".
Dict addition is lossy
~~~~~~~~~~~~~~~~~~~~~~
Dict addition can lose data (values may disappear); no other form of addition
is lossy.
Response:
* It isn't clear why the first part of this argument is a problem.
``dict.update()`` may throw away values, but not keys; that is expected
behaviour, and will remain expected behaviour regardless of whether it is
spelled as ``update()`` or ``+``.
* Floating point addition is lossy in the sense that the result may depend on
only one addend, even when the other is non-zero::
>>> 1e20 + 1000.1
1e+20
* Integer addition and concatenation are also lossy, in the sense of not being
reversable: you cannot get back the two addends given only the sum.
Two numbers add to give 356; what are the two numbers?
Dict contains tests will fail
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The invariant ``a in (a + b)`` holds for collections like list, tuple, str, and
bytes.
Response:
* This invariant only applies when ``+`` implements concatenation, not numeric
addition, boolean AND, or set union. There is no reason to expect it to
apply to dict union/merge.
* This invariant doesn't apply to other collections, such as arrays, deques or
Counters. For example::
>>> from array import array
>>> a = array("i", [1, 2, 3])
>>> b = array("i", [4, 5, 6])
>>> a in (a + b)
False
Only One Way To Do It
~~~~~~~~~~~~~~~~~~~~~
Dict addition will violate the Only One Way koan from the Zen.
Response:
* There is no such koan. "Only One Way" is a calumny about Python originating
long ago from the Perl community.
More Than One Way To Do It
~~~~~~~~~~~~~~~~~~~~~~~~~~
Okay, the Zen doesn't say that there should be Only One Way To Do It. But it
does have a prohibition against allowing "more than one way to do it".
Response:
* There is no such prohibition. The "Zen of Python" merely expresses a
*preference* for "only one *obvious* way"::
There should be one-- and preferably only one --obvious way to do it.
* The emphasis here is that there should be an obvious way to do "it". In the
case of dict update operations, there are at least two different operations
that we might wish to do:
- *update a dict in place*, in which place the Obvious Way is to use the
``update()`` method. If this proposal is accepted, the ``+=`` augmented
assignment operator will also work, but that is a side-effect of how
augmented assignments are defined. Which you choose is a matter of taste.
- *merge two existing dicts into a third, new dict*, in which case this PEP
proposes that the Obvious Way is to use the ``+`` merge operator.
* In practice, this preference for "only one way" is frequently violated in
Python. For example, every for loop could be re-written as a while loop;
every if-expression could be written as an if-else statement. List, set and
dict comprehensions could all be replaced by generator comprehensions. Lists
offer no fewer than five ways to implement concatenation:
- Addition operator: ``a + b``
- In-place addition operator: ``a += b``
- Slice assignment: ``a[len(a):] = b``
- Sequence unpacking: ``[*a, *b]``
- Extend method: ``a.extend(b)``
We should not be too strict about rejecting useful functionality because it
violates "only one way".
Dict addition is not like concatenation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dict addition is not like concatenation, which obeys the invariant
``len(d + e) == len(d) + len(e)``.
Response:
* Numeric, vector and matrix addition don't obey this invariant either, it
isn't clear why dict *merging* should be expected to obey it. And in the
case of numeric addition, ``len(x)`` is not even defined.
* In a sense, dict addition can be considered to be a kind of concatenation::
{a:v, b:v, c:v} + {b:v, c:v, d:v} => {a:v, b:v, c:v, b:v, c:v, d:v}
Since dicts don't take duplicate keys, standard dict behaviour occurs and the
last-seen (right-most) wins.
Dict addition makes code harder to understand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dict addition makes it harder to tell what code means. To paraphrase the
objection rather than quote anyone in specific: "If I see ``spam + eggs``,
I can't tell what it does unless I know what ``spam`` and ``eggs`` are".
Response:
* This is very true. But it is equally true today, where the use of the ``+``
operator could mean any of:
- numeric addition
- sequence concatenation
- ``Counter`` merging
- any other overloaded operation
Adding dict merging to the set of possibilities doesn't seem to make it
*harder* to understand the code. No more work is required to determine that
``spam`` and ``eggs`` are mappings than it would take to determine that they
are lists, or numbers. And good naming conventions will help::
width + margin # probably numeric addition
prefix + word # probably string concatenation
settings + user_prefs # probably mapping addition
What about the full set API?
----------------------------
Some people have suggested that dicts are "set like", and should support the
full collection of set operators ``|``, ``&``, ``^`` and ``-``.
This PEP does not take a position on whether dicts should support the full
collection of set operators, and would prefer to leave that for a later PEP
(one of the authors is interested in drafting such a PEP). For the benefit of
any later PEP, a brief summary follows.
Set union, ``|``, has a natural analogy to dict update operation, and the pipe
operator is strongly prefered over ``+`` by many people. As described in the
section "Rejected semantics", the most natural behaviour is for the last value
seen to win.
Set intersection ``&`` is more problematic. While it is easy to determine the
intersection of *keys* in two dicts, it is not clear what to do with the
*values*. For example, given two dicts::
d1 = {"spam": 1, "eggs": 2}
d2 = {"ham": 3, "eggs": 4}
it is obvious that the only key of ``d1 & d2`` must be ``"eggs"``. But there
are at least five obvious ways to choose the values:
- first (left-most) value wins: ``2``
- last (right-most) value wins: ``4``
- add/concatenate the values: ``6``
- keep a list of both values: ``[2, 4]``
- raise an exception
but none of them are obviously correct or more useful than the others. "Last
seen wins" has the advantage of consistency with union, but it isn't clear if
that alone is reason enough to choose it.
Set symmetric difference ``^`` is also obvious and natural. Given the two
dicts above, the symmetric difference ``d1 ^ d2`` would be
``{"spam": 1, "ham": 3}``.
Set difference ``-`` is also obvious and natural, and an earlier version of
this PEP included it in the proposal. Given the dicts above, we would have
``d1 - d2`` return ``{"spam": 1}`` and ``d2 - d1`` return ``{"ham": 1}``.
Examples of candidates for the dict merging operator
----------------------------------------------------
The authors of this PEP did a survey of third party libraries for dictionary
merging which might be candidates for dict addition.
(This is a cursory list based on a subset of whatever arbitrary third-party
packages happened to be installed on the author's computer, and may not reflect
the current state of any package.)
From **sympy/abc.py**::
clash = {}
clash.update(clash1)
clash.update(clash2)
return clash1, clash2, clash
Rewrite as ``return clash1, clash2, clash1+clash2``.
From **sympy/utilities/runtests.py**::
globs = globs.copy()
if extraglobs is not None:
globs.update(extraglobs)
Rewrite as ``globs = globs + ({} if extraglobs is None else extraglobs)``
From **sympy/vector/functions.py**::
subs_dict = {}
for f in system_set:
subs_dict.update(f.scalar_map(system))
return expr.subs(subs_dict)
Rewrite as
``return expr.subs(sum((f.scalar_map(system) for f in system_set), {}))``
From **sympy/printing/fcode.py** and **sympy/printing/ccode.py**::
self.known_functions = dict(known_functions)
userfuncs = settings.get('user_functions', {})
self.known_functions.update(userfuncs)
Rewrite as
``self.known_functions = dict(known_functions) + settings.get('user_functions', {})``
From **sympy/parsing/maxima.py**::
dct = MaximaHelpers.__dict__.copy()
dct.update(name_dict)
obj = sympify(str, locals=dct)
Rewrite as ``obj = sympify(str, locals= MaximaHelpers.__dict__ + name_dict)``
From **sphinx/quickstart.py**::
d.setdefault('release', d['version'])
d2 = DEFAULT_VALUE.copy()
d2.update(dict(("ext_"+ext, False) for ext in EXTENSIONS))
d2.update(d)
d = d2
Rewrite as
``d = DEFAULT_VALUE + dict(("ext_"+ext, False) for ext in EXTENSIONS) + d``
From **sphinx/highlighting.py**::
def get_formatter(self, **kwargs):
kwargs.update(self.formatter_args)
return self.formatter(**kwargs)
Rewrite as ``return self.formatter(**(kwargs + self.formatter_args))``
From **sphinx/ext/inheritance_diagram.py**::
n_attrs = self.default_node_attrs.copy()
e_attrs = self.default_edge_attrs.copy()
g_attrs.update(graph_attrs)
n_attrs.update(node_attrs)
e_attrs.update(edge_attrs)
Rewrite as::
g_attrs.update(graph_attrs)
n_attrs = self.default_node_attrs + node_attrs
e_attrs = self.default_edge_attrs + edge_attrs
From **sphinx/ext/doctest.py**::
new_opt = code[0].options.copy()
new_opt.update(example.options)
example.options = new_opt
Rewrite as ``example.options = code[0].options + example.options``
From **sphinx/domains/__init__.py**::
self.attrs = self.known_attrs.copy()
self.attrs.update(attrs)
Rewrite as ``self.attrs = self.known_attrs + attrs``
From **requests/sessions.py**::
merged_setting = dict_class(to_key_val_list(session_setting))
merged_setting.update(to_key_val_list(request_setting))
Rewrite as
``merged_setting = dict_class(to_key_val_list(session_setting)) + to_key_val_list(request_setting)``
From **matplotlib/legend.py**::
if self._handler_map:
hm = default_handler_map.copy()
hm.update(self._handler_map)
return hm
Rewrite as ``return default_handler_map + self._handler_map``
From **pygments/lexer.py**::
if kwargs:
kwargs.update(lexer.options)
lx = lexer.__class__(**kwargs)
Rewrite as ``lx = lexer.__class__(**(kwargs + lexer.options))``
From **praw/internal.py**::
data = {'name': six.text_type(user), 'type': relationship}
data.update(kwargs)
Rewrite as
``data = {'name': six.text_type(user), 'type': relationship} + kwargs``
From **IPython/zmq/ipkernel.py**::
aliases = dict(kernel_aliases)
aliases.update(shell_aliases)
Rewrite as ``aliases = dict(kernel_aliases) + shell_aliases``
From **matplotlib/backends/backend_svg.py**::
attrib = attrib.copy()
attrib.update(extra)
attrib = attrib.items()
Rewrite as ``attrib = (attrib + extra).items()``
From **matplotlib/delaunay/triangulate.py**::
edges = {}
edges.update(dict(zip(self.triangle_nodes[border[:,0]][:,1],
self.triangle_nodes[border[:,0]][:,2])))
edges.update(dict(zip(self.triangle_nodes[border[:,1]][:,2],
self.triangle_nodes[border[:,1]][:,0])))
edges.update(dict(zip(self.triangle_nodes[border[:,2]][:,0],
self.triangle_nodes[border[:,2]][:,1])))
Rewrite as::
edges = (dict(zip(self.triangle_nodes[border[:,0]][:,1],
self.triangle_nodes[border[:,0]][:,2]))
+ dict(zip(self.triangle_nodes[border[:,1]][:,2],
self.triangle_nodes[border[:,1]][:,0]))
+ dict(zip(self.triangle_nodes[border[:,2]][:,0],
self.triangle_nodes[border[:,2]][:,1]))
)
From **numpy/ma/core.py**::
# We need to copy the _basedict to avoid backward propagation
_optinfo = {}
_optinfo.update(getattr(obj, '_optinfo', {}))
_optinfo.update(getattr(obj, '_basedict', {}))
if not isinstance(obj, MaskedArray):
_optinfo.update(getattr(obj, '__dict__', {}))
Rewrite as::
_optinfo = getattr(obj, '_optinfo', {}) + getattr(obj, '_basedict', {})
if not isinstance(obj, MaskedArray):
_optinfo += getattr(obj, '__dict__', {})
The above examples show that sometimes the ``+`` operator leads to a clear
increase in readability, reducing the number of lines of code and improving
clarity. However other examples using the ``+`` operator lead to long, complex
single expressions, possibly well over the PEP 8 maximum line length of 80
columns. As with any other language feature, the programmer should use their
own judgement about whether ``+`` improves their code.
Related discussions
-------------------
`Latest discussion which motivated this PEP
<https://mail.python.org/archives/list/python-ideas@python.org/thread/BHIJX6MHGMMD3S6D7GVTPZQL4N5V7T42>`_
`Ticket on the bug tracker <https://bugs.python.org/issue36144>`_
Merging two dictionaries in an expression is a frequently requested feature.
For example:
https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression
https://stackoverflow.com/questions/1781571/how-to-concatenate-two-dictionaries-to-create-a-new-one-in-python
https://stackoverflow.com/questions/6005066/adding-dictionaries-together-python
Occasionally people request alternative behaviour for the merge:
https://stackoverflow.com/questions/1031199/adding-dictionaries-in-python
https://stackoverflow.com/questions/877295/python-dict-add-by-valuedict-2
...including one proposal to treat dicts as `sets of keys
<https://mail.python.org/archives/list/python-ideas@python.org/message/YY3KZZGEX6VEFX5QZJ33P7NTTXGPZQ7N/>`_.
`Ian Lee's proto-PEP <https://lwn.net/Articles/635444/>`_, and `discussion
<https://lwn.net/Articles/635397/>`_ in 2015. Further discussion took place on
`Python-Ideas <https://mail.python.org/archives/list/python-ideas@python.org/thread/43OZV3MR4XLFRPCI27I7BB6HVBD25M2E/>`_.
(Observant readers will notice that one of the authors of this PEP was more
skeptical of the idea in 2015.)
Adding `a full complement of operators to dicts
<https://mail.python.org/archives/list/python-ideas@python.org/thread/EKWMDJKMVOJCOROQVHJFQX7W2L55I5RA/>`_.
`Discussion on Y-Combinator <https://news.ycombinator.com/item?id=19314646>`_.
https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/
https://code.tutsplus.com/tutorials/how-to-merge-two-python-dictionaries--cms-26230
In direct response to an earlier draft of this PEP, Serhiy Storchaka raised `a
ticket in the bug tracker <https://bugs.python.org/issue36431>`_ to replace the
``copy(); update()`` idiom with dict unpacking.
Open questions
--------------
Should these operators be part of the ABC ``Mapping`` API?
References
----------
[1] Guido's declaration that plus wins over pipe:
https://mail.python.org/pipermail/python-ideas/2019-February/055519.html
[2] Non-string keys: https://bugs.python.org/issue35105 and
https://mail.python.org/pipermail/python-dev/2018-October/155435.html
[3] Behaviour in tuples:
https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works
Copyright
---------
This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: