PEP: 584
Title: Add + and += operators to the built-in dict class.
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>,
        Brandt Bucher <brandtbucher@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Mar-2019
Post-History:

**DRAFT** -- This is a draft document for discussion.

Abstract
--------

This PEP suggests adding merge (``+``) and update (``+=``) operators to
the built-in ``dict`` class.

The new operators will have the same relationship to the ``dict.update`` method
as the list concatenate and extend operators have to ``list.extend``.


Examples
--------

Dict addition will return a new dict containing the left operand
merged with the right operand::

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d + e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e + d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

The augmented assignment version operates in-place::

    >>> d += e
    >>> d
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Analogously with list addition, the operator version is more
restrictive, and requires that both arguments are dicts, while the
augmented assignment version allows anything the ``update`` method
allows, such as iterables of key/value pairs. Continued from above::

    >>> d + [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict

    >>> d += [('spam', 999)]
    >>> d
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}


Semantics
---------

For the merge operator, if a key appears in both operands, the
last-seen value (i.e. that from the right-hand operand) wins. This
means that dict addition is not commutative: in general, ``d + e`` will
not equal ``e + d``. This joins a number of other non-commutative
addition operators among the builtins, including lists, tuples,
strings and bytes.

Having the last-seen value win makes the merge operator match the
semantics of the ``update`` method, so that ``d += e`` is an operator
version of ``d.update(e)``.

The error messages shown above are not part of the API, and may change
at any time.


Rejected semantics
~~~~~~~~~~~~~~~~~~

What should happen when keys conflict? There are at least five obvious
things that could happen:

1. **Right-most (last seen) value wins.** This matches the existing
   behaviour of similar dict expressions, where the last seen
   value always wins::

       {'a': 1, 'a': 2}    # last seen value wins
       {**d, **e}          # values in e win over values in d
       d.update(e)         # values in e overwrite existing values
       d[k] = v            # v over-writes any existing value
       {k:v for x in (d, e) for (k, v) in x.items()}

   all follow the same rule. This PEP takes the position that this
   behaviour is simple, obvious, and usually the behaviour we want,
   and should be the default behaviour for dicts.

2. **Raise on conflicting keys.** It isn't clear that this behaviour has
   many use-cases or will often be useful, but it will likely be annoying
   as any use of the dict plus operator would have to be guarded with a
   ``try...except`` clause.

   But for those who want it, it would be easy to override the method in a
   subclass to get the desired behaviour::

       def __add__(self, other):
           if self.keys() & other.keys():
               raise KeyError('duplicate key')
           return super().__add__(other)

3. **Add the values (as Counter does).** Too specialised to be used as
   the default behaviour.

4. **First seen wins: value-preserving semantics.** It isn't clear that
   this behaviour has many use-cases. It is probably too specialised to be
   used as the default; it is best to leave this for subclasses, or simply
   reverse the order of the arguments::

       # d1 merged with d2, keeping existing values in d1
       d2 + d1

5. **Concatenate values in a list**::

       {'a': 1} + {'a': 2} == {'a': [1, 2]}

   This is likely to be too specialised to be the default. It is not clear
   what to do if the values are already lists::

       {'a': [1, 2]} + {'a': [3, 4]}

   Should this give ``{'a': [1, 2, 3, 4]}`` or ``{'a': [[1, 2], [3, 4]]}``?

To summarise this section: this PEP proposes option 1, **last seen wins**, as
the default behaviour, with alternatives left to subclasses of dict.
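
Several of the options above can be approximated today without any new
operator. The following is a minimal sketch and not part of the proposal:
the helper names ``merge_last_wins``, ``merge_strict`` and
``merge_first_wins`` are hypothetical, and ``collections.Counter`` stands in
for option 3.

```python
from collections import Counter


def merge_last_wins(d, e):
    """Option 1: the right-hand value wins, as with d.update(e) on a copy."""
    return {**d, **e}


def merge_strict(d, e):
    """Option 2: raise if any key appears in both operands."""
    duplicates = d.keys() & e.keys()
    if duplicates:
        raise KeyError(f"duplicate keys: {sorted(duplicates)!r}")
    return {**d, **e}


def merge_first_wins(d, e):
    """Option 4: keep existing values by reversing the operands."""
    return {**e, **d}


d = {"a": 1, "b": 2}
e = {"b": 20, "c": 30}

last = merge_last_wins(d, e)       # 'b' comes from e
first = merge_first_wins(d, e)     # 'b' comes from d
summed = Counter(d) + Counter(e)   # option 3: values for 'b' are added

try:
    merge_strict(d, e)
except KeyError as exc:
    strict_error = exc             # 'b' is present in both operands
else:
    strict_error = None
```

Each policy is a one-line change to the same merge, which is the argument
for leaving them to subclasses or helper functions.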


Syntax
------

An alternative to the ``+`` operator is the pipe ``|`` operator, which
is used for set union. This suggestion did not receive much support
on Python-Ideas.

The ``+`` operator was strongly preferred on Python-Ideas[1]. It is
more familiar than the pipe operator, and the ``collections.Counter`` subclass
already uses ``+`` for merging.


Use of plus sign
~~~~~~~~~~~~~~~~

In mathematics, there is a strong but not universal convention for the
``+`` sign to be used for operations that are commutative, such as
ordinary addition: ``2 + 6 == 6 + 2``. Three prominent exceptions are
concatenation, `near rings <https://en.wikipedia.org/wiki/Near-ring>`_
and `ordinal arithmetic <https://en.wikipedia.org/wiki/Ordinal_arithmetic>`_.

The ``+`` sign is occasionally used for the
`disjoint union <https://en.wikipedia.org/wiki/Disjoint_union>`_
set operation.

In `boolean algebra <https://www.nayuki.io/page/boolean-algebra-laws>`_,
logical disjunction ``OR`` normally uses either the ``∨`` or ``+``
symbol. George Boole originally used ``+`` for ``XOR``, but in modern
notation ``⊕`` is normally used instead. Logical disjunction is analogous
to set union, usually spelled as ``∪``. In Python, set union uses the
``|`` operator, suggesting an alternative operator instead of ``+``. See
section "Alternative proposals".

In English, "plus" can be used as an additive conjunction similar to *and*,
*together with*, *in addition to* etc:

    "The function of education is to teach one to think intensively and
    to think critically. Intelligence plus character - that is the goal
    of true education." -- Martin Luther King, Jr.

This suggests a connection to *union* or *merge*, not just numeric addition.

Partial survey of other languages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Inada Naoki did `a survey of language support for dict merging <https://discuss.python.org/t/pep-584-survey-of-other-languages-operator-overload/977>`_
and found that Scala uses ``++`` and Kotlin uses ``+``.

An example of dict joining from Kotlin::

    fun main() {
        var a = mutableMapOf<String,Int>("a" to 1, "b" to 2)
        var b = mutableMapOf<String,Int>("c" to 1, "b" to 3)
        println(a)
        println(b)
        println(a + b)
        println(b + a)
    }

which gives the output::

    {a=1, b=2}
    {c=1, b=3}
    {a=1, b=3, c=1}
    {c=1, b=2, a=1}

YAML uses ``<<`` as the `dict merge operator <https://yaml.org/type/merge.html>`_.

`Elixir <https://hexdocs.pm/elixir/Map.html>`_ uses ``|`` to update mappings::

    iex> map = %{one: 1, two: 2}
    iex> %{map | one: "one"}
    %{one: "one", two: 2}

but has the restriction that keys on the right hand side of the ``|`` symbol
must already exist in the map on the left.

`Groovy <https://stackoverflow.com/questions/13326943/does-groovy-have-method-to-merge-2-maps>`_
uses ``+`` to merge two maps into a new map, or ``<<`` to merge the second
into the first.


Current Alternatives
--------------------

To create a new dict containing the merged items of two (or more)
dicts, one can currently write::

    {**d1, **d2}

but this is neither obvious nor easily discoverable. It is only
guaranteed to work if the keys are all strings. If the keys are not
strings, it currently works in CPython, but it may not work with other
implementations, or future versions of CPython[2].

It is also limited to returning a built-in dict, not a subclass,
unless re-written as ``MyDict(**d1, **d2)``, in which case non-string
keys will raise a TypeError.
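
Both limitations are easy to observe in current CPython. This is a small
illustration only; ``MyDict`` is a placeholder subclass, as in the text
above.

```python
class MyDict(dict):
    """A trivial dict subclass, used only for illustration."""


d1 = MyDict({1: "one"})
d2 = MyDict({2: "two"})

# Dict unpacking merges the items but always builds a plain dict,
# so the subclass of the operands is lost.
merged = {**d1, **d2}
result_type = type(merged)

# Calling the subclass with ** treats keys as keyword argument names,
# which must be strings; integer keys are rejected.
try:
    MyDict(**d1, **d2)
except TypeError as exc:
    error = exc
else:
    error = None
```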


Alternative Proposals
---------------------

At the time of writing the initial version of this PEP, ``+`` was by far the
most popular choice for operator. However, further discussion found that many
people are deeply uncomfortable with or outright hostile to using the plus
symbol, preferring an alternative.


Use the Pipe operator
~~~~~~~~~~~~~~~~~~~~~

Many people who like the proposed functionality strongly dislike the ``+``
operator but prefer the ``|`` operator.

Advantages

* Avoids the frequent objections to ``+``.

* Similar to the use of ``|`` for set union.

* Using ``|`` leaves the door open for dicts to support the full set API.

Disadvantages

* Using ``|`` encourages people to suggest dicts should support the full
  set API.

* Not as intuitive or obvious as ``+``.

* Like ``+``, the union operator ``|`` is normally commutative. But many
  people seem to be less disturbed by the idea of using ``|`` for a
  non-commutative operation than they are by the idea of using ``+``.

* `Mike Selik and Guido van Rossum
  <https://mail.python.org/archives/list/python-ideas@python.org/message/PL3OWY7MIYKAJGXXBTDTLNAREBP2OCZY/>`_
  summarized the advantages of ``+`` over ``|``:

  - Plus is already used in contexts where the operation is not symmetric
    such as concatenation; the pipe operator is always symmetric.

  - The dict subclass ``collections.Counter`` already implements plus as a
    merge operator, treating it as equivalent to ``update``.


Use the Left Shift operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``<<`` operator didn't seem to get much support on Python-Ideas, but no
major objections either. Perhaps the strongest objection was Chris Angelico's
comment:

    The "cuteness" value of abusing the operator to indicate
    information flow got old shortly after C++ did it.


Use a new Left Arrow operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another suggestion was to create a new operator ``<-``. Unfortunately
this would be ambiguous: ``d<-e`` could mean ``d merge e`` or
``d less-than minus e``.


Use a merged method instead of an operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A ``dict.merged()`` method would avoid the need for an operator at all. One
subtlety is that it would likely need slightly different implementations
when called as an unbound method versus as a bound method.

As an unbound method, the behaviour could be similar to::

    def merged(cls, *mappings, **kw):
        new = cls()  # Will this work for defaultdict?
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

As a bound method, the behaviour could be similar to::

    def merged(self, *mappings, **kw):
        new = self.copy()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new
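
One way the two call styles could share a single definition is a small
descriptor that passes either the class alone or the class plus the
instance. The sketch below is an assumption-laden illustration, not the
proof of concept referenced in this section: ``instance_or_class_method``
and ``MergeableDict`` are hypothetical names, and ``cls(start)`` is used in
place of ``self.copy()`` because ``dict.copy()`` returns a plain dict even
for subclass instances.

```python
import functools


class instance_or_class_method:
    """Hypothetical descriptor; not a builtin or standard library decorator."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: there is no starting mapping.
            return functools.partial(self.func, owner, None)
        # Accessed on an instance: start from that instance's items.
        return functools.partial(self.func, type(instance), instance)


class MergeableDict(dict):
    @instance_or_class_method
    def merged(cls, start, *mappings, **kw):
        # Build an empty result for the unbound call, or a same-class
        # copy of the instance for the bound call.
        new = cls() if start is None else cls(start)
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new


from_class = MergeableDict.merged({"a": 1}, {"a": 2}, b=3)
from_instance = MergeableDict({"x": 0}).merged([("x", 1)], y=2)
```

A subclass such as ``defaultdict``, whose constructor takes different
arguments, would still need its own override.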

Advantages

* Arguably, methods are more discoverable than operators.

* The method could accept any number of positional and keyword arguments,
  avoiding the inefficiency of creating temporary dicts.

* Accepts sequences of ``(key, value)`` pairs like the ``update`` method.

* Being a method, it is easy to override in a subclass if you need
  alternative behaviours such as "first wins", "unique keys", etc.

Disadvantages

* Would likely require a new kind of method decorator which combined the
  behaviour of regular instance methods and ``classmethod``. It would need
  to be public (but not necessarily a builtin) for those needing to override
  the method. There is a `proof of concept <http://code.activestate.com/recipes/577030>`_.

* It isn't an operator. Guido discusses `why operators are useful
  <https://mail.python.org/archives/list/python-ideas@python.org/message/52DLME5DKNZYFEETCTRENRNKWJ2B4DD5/>`_.
  For another viewpoint, see `Nick Coghlan's blog post
  <https://www.curiousefficiency.org/posts/2019/03/what-does-x-equals-a-plus-b-mean.html>`_.


Use a merged function
~~~~~~~~~~~~~~~~~~~~~

Instead of a method, use a new built-in function ``merged()``. One possible
implementation could be something like this::

    def merged(*mappings, **kw):
        if mappings and isinstance(mappings[0], dict):
            # If the first argument is a dict, use its type.
            new = mappings[0].copy()
            mappings = mappings[1:]
        else:
            # No positional arguments, or the first argument is a
            # sequence of (key, value) pairs.
            new = dict()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

Disadvantages

* May not be important enough to be a builtin.

* Hard to override behaviour if you need something like "first wins".

An alternative might be to forgo the arbitrary keywords, and take a single
keyword-only parameter that specifies the behaviour on collisions::

    def merged(*mappings, on_collision=lambda k, v1, v2: v2):
        # implementation left as an exercise to the reader
        ...
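
For concreteness, one possible reading of that signature is sketched below.
It is an assumption rather than part of the proposal: the callback is taken
to receive the key, the value already in the result, and the incoming
value, and to return the value to keep.

```python
def merged(*mappings, on_collision=lambda key, old, new: new):
    """Merge mappings left to right, resolving clashes with on_collision."""
    result = {}
    for mapping in mappings:
        # dict() accepts both mappings and iterables of (key, value) pairs.
        for key, value in dict(mapping).items():
            if key in result:
                value = on_collision(key, result[key], value)
            result[key] = value
    return result


defaults = {"colour": "red", "size": 1}
overrides = {"size": 5}

last_wins = merged(defaults, overrides)
first_wins = merged(defaults, overrides,
                    on_collision=lambda key, old, new: old)
added = merged(defaults, overrides,
               on_collision=lambda key, old, new: old + new)
```

The default callback reproduces "last seen wins"; the other two calls show
"first seen wins" and Counter-style addition without any subclassing.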

Advantages

* Most of the same advantages as the method or function solutions above.

* Doesn't require a subclass to implement alternative behaviour on collisions,
  just a function.

Disadvantages

* Same as function above.

* Cannot use arbitrary keyword arguments.


Do nothing
~~~~~~~~~~

"Status quo wins a stalemate."

We could do nothing, as there are already three possible ways to solve the
problem of merging two dicts:

* ``dict.update``.

* Dict unpacking using ``{**d1, **d2}``.

* Chain maps.

Advantage

* Nothing needs to change.

Disadvantages

* None of the three alternatives match the desired behaviour:

  - ``d1.update(d2)`` modifies the first mapping in place.

  - ``e = d1.copy(); e.update(d2)`` is not an expression and needs a temporary
    variable.

  - ``{**d1, **d2}`` ignores the types of the mappings and always returns a
    builtin dict.

  - Dict unpacking looks ugly and is not easily discoverable. Few people would
    be able to guess what it means the first time they see it, or think of it
    as the "obvious way" to merge two dicts.

    `As Guido said
    <https://mail.python.org/archives/list/python-ideas@python.org/message/K4IC74IXE23K4KEL7OUFK3VBC62HGGVF/>`_:

      "I'm sorry for PEP 448, but even if you know about ``**d`` in simpler
      contexts, if you were to ask a typical Python user how to combine two
      dicts into a new one, I doubt many people would think of ``{**d1, **d2}``.
      I know I myself had forgotten about it when this thread started!"

  - ``type(d1)({**d1, **d2})`` fails for dict subclasses such as
    ``defaultdict`` that have an incompatible ``__init__`` method.

  - ChainMap is unfortunately poorly-known and doesn't qualify as "obvious".

  - ChainMap resolves duplicate keys in the opposite order to that expected
    ("first seen wins" instead of "last seen wins").

  - Like dict unpacking, it is tricky to get it to honour the desired subclass,
    for the same reason, ``type(d1)(ChainMap(d2, d1))`` fails for some
    subclasses of dict.

  - ChainMaps wrap their underlying dicts, so writes to the ChainMap will
    modify the original dict::

        >>> d1 = {'spam': 1}
        >>> d2 = {'eggs': 2}
        >>> merged = ChainMap(d2, d1)
        >>> merged['eggs'] = 999
        >>> d2
        {'eggs': 999}
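
Two of the points above, the reversed lookup order of ``ChainMap`` and the
``defaultdict`` constructor problem, can be checked with the standard
library alone. The variable names here are illustrative only.

```python
from collections import ChainMap, defaultdict

d1 = {"spam": 1, "eggs": 0}
d2 = {"eggs": 2}

# ChainMap searches its maps left to right ("first seen wins"), so the
# dict whose values should win has to be listed first.
chained = ChainMap(d2, d1)
eggs = chained["eggs"]

# Flattening the ChainMap gives the same items as the unpacking merge.
flattened = dict(chained)

# defaultdict expects a callable (or None) as its first argument, so
# passing the merged dict to type(dd1) is rejected.
dd1 = defaultdict(int, d1)
try:
    type(dd1)({**dd1, **d2})
except TypeError as exc:
    error = exc
else:
    error = None
```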


Implementation
--------------

One of the authors has `drafted a C implementation
<https://github.com/brandtbucher/cpython/tree/addiction>`_.

An approximate pure-Python implementation of the merge operator will
be::

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = self.copy()
        new.update(other)
        return new

    def __radd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = other.copy()
        new.update(self)
        return new

Note that the result type will be the type of the left operand; in the
event of matching keys, the winner is the right operand.

Augmented assignment will just call the ``update`` method. This is
analogous to the way ``list +=`` calls the ``extend`` method, which
accepts any iterable, not just lists::

    def __iadd__(self, other):
        self.update(other)
        return self

These semantics are intended to match those of ``update`` as closely
as possible.
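
The methods above can be tried out today on a subclass. The following is a
sketch under stated assumptions, not the draft C implementation:
``AddableDict`` is a hypothetical name, and ``copy`` is overridden because
``dict.copy()`` returns a plain dict even when called on a subclass
instance.

```python
class AddableDict(dict):
    """Hypothetical dict subclass carrying the three proposed methods."""

    def copy(self):
        # Preserve the subclass so that d + e has the left operand's type.
        return type(self)(self)

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = self.copy()
        new.update(other)
        return new

    def __radd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = other.copy()
        new.update(self)
        return new

    def __iadd__(self, other):
        self.update(other)
        return self


d = AddableDict(spam=1, eggs=2, cheese=3)
e = {"cheese": "cheddar", "aardvark": "Ethel"}

left = d + e     # __add__: e's value for 'cheese' wins
right = e + d    # __radd__: d's value for 'cheese' wins

try:
    d + [("spam", 999)]
except TypeError:
    rejected = True    # both operands of + must be dicts
else:
    rejected = False

d += [("spam", 999)]   # += accepts anything update() accepts
```

Here ``e + d`` is a plain dict because its left operand is one, which is
consistent with the note on result types above.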


Contra-indications
------------------

(Or when to avoid using these new operators.)

For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will
suffer from the same unfortunate O(N\*\*2) Big Oh performance as does
list and tuple addition, and for similar reasons. If one expects to
be merging a large number of dicts where performance is an issue, it
may be better to use an explicit loop and in-place merging::

    new = {}
    for d in many_dicts:
        new += d

This is unlikely to be a problem in practice as most uses of the merge
operator are expected to only involve a small number of dicts.
Similarly, most uses of list and tuple concatenation only use a few
objects.

Using the dict augmented assignment operators on a dict inside a tuple
(or other immutable data structure) will lead to the same problem that
occurs with list concatenation[3], namely the in-place addition will
succeed, but the operation will raise an exception::

    >>> a_tuple = ({'spam': 1, 'eggs': 2}, None)
    >>> a_tuple[0] += {'spam': 999}
    Traceback (most recent call last):
      ...
    TypeError: 'tuple' object does not support item assignment
    >>> a_tuple[0]
    {'spam': 999, 'eggs': 2}
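
The list behaviour that this mirrors can be reproduced in current Python,
which shows that the surprise comes from augmented assignment itself
rather than from dicts:

```python
a_tuple = ([1, 2], None)

try:
    # list.__iadd__ extends the list in place, then the tuple rejects
    # the assignment of the result back to index 0.
    a_tuple[0] += [3]
except TypeError as exc:
    error = exc
else:
    error = None

extended = a_tuple[0]   # the list was modified despite the exception
```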


Major Objections
----------------


Dict addition is not commutative
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Addition is commutative, but dict addition will not be (``d + e != e + d``).

Response:

* Neither are list or string concatenation, both of which use the ``+``
  operator.

* Dict addition (merge/update) is commutative with regard to the keys (although
  not with regard to the values).

* Mathematically, the + operator is usually commutative, but it is not
  mandatory. Perhaps the best known example of non-commutative addition
  is that of `ordinal numbers
  <https://en.wikipedia.org/wiki/Ordinal_arithmetic>`_, where ``ω + 1`` is a
  strictly larger ordinal than ``ω`` but ``1 + ω = ω``.

* For non-numbers, `we only require addition to be associative
  <https://mail.python.org/archives/list/python-ideas@python.org/message/TZ5POQOB7KTUWQQPLNIC323ZIWOCWHBF/>`_,
  that is, ``a + b + c == (a + b) + c == a + (b + c)``. This is satisfied by
  the proposed dict merging behaviour.
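
These properties can be checked for a particular case with dict unpacking,
which has the same "last seen wins" rule. The ``merge`` helper is a
hypothetical stand-in for the proposed operator, and a single example is an
illustration rather than a proof.

```python
def merge(d, e):
    """Stand-in for the proposed d + e: values from e win."""
    return {**d, **e}


a = {"x": 1, "y": 1}
b = {"y": 2, "z": 2}
c = {"z": 3, "x": 3}

# Associative: the grouping does not change the result.
left_grouped = merge(merge(a, b), c)
right_grouped = merge(a, merge(b, c))

# Not commutative in the values, but commutative in the keys.
ab = merge(a, b)
ba = merge(b, a)
```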


Dict addition will be inefficient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Giving a plus-operator to mappings is an invitation to writing code that
doesn't scale well. Repeated dict addition is inefficient:
``d + e + f + g + h`` creates and destroys three temporary mappings.

Response:

* The same argument applies to sequence concatenation. Unlike string
  concatenation, it is rare for people to concatenate large numbers of lists or
  tuples, and the authors of this PEP believe that it will be rare for people
  to add large numbers of dicts.

* A survey of the standard library by the authors found no examples of merging
  more than two dicts. This is unlikely to be a performance problem:
  "Everything is fast for small enough N".

* ``collections.Counter`` is a dict subclass that supports the ``+`` operator.
  There are no known examples of people having performance issues due to adding
  large numbers of Counters.

* Sequence concatenation grows with the total number of items in the sequences,
  leading to O(N\*\*2) (quadratic) performance. Dict addition is likely to
  involve duplicate keys, and so the temporary mappings will not grow as fast.


Repeated addition should be equivalent to multiplication
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The star operator ``*`` represents repeated addition across multiple data
types. ``a * 5 == a + a + a + a + a`` where ``a`` is a number (int, float,
complex), str, bytes, tuple, or list. Dict addition breaks this pattern.

Response:

* "Multiplication is repeated addition" only applies to positive integer
  arguments, and breaks down as soon as you start to consider signed or
  non-integer multiplicands. Consider ``a * -3.5`` -- how do you interpret
  that as ``a`` added to itself negative three and a half times?

  Teaching multiplication as repeated addition is something that many educators
  and mathematicians strongly oppose. Some discussion on the issue:

  - https://www.maa.org/external_archive/devlin/devlin_06_08.html
  - https://www.maa.org/external_archive/devlin/devlin_0708_08.html
  - https://math.stackexchange.com/questions/64488/if-multiplication-is-not-repeated-addition
  - https://denisegaskins.com/2008/07/01/if-it-aint-repeated-addition/
  - https://en.wikipedia.org/wiki/Multiplication_and_repeated_addition

* ``collections.Counter`` already supports addition, and already breaks this
  pattern.

* Even if we find it useful to demonstrate "multiplication is (sometimes)
  repeated addition" for sequences and numbers, that doesn't make it mandatory
  for all data types. It is a very weak "nice to have", not a "must have".


Dict addition is lossy
~~~~~~~~~~~~~~~~~~~~~~

Dict addition can lose data (values may disappear); no other form of addition
is lossy.

Response:

* It isn't clear why the first part of this argument is a problem.
  ``dict.update()`` may throw away values, but not keys; that is expected
  behaviour, and will remain expected behaviour regardless of whether it is
  spelled as ``update()`` or ``+``.

* Floating point addition is lossy in the sense that the result may depend on
  only one addend, even when the other is non-zero::

      >>> 1e20 + 1000.1
      1e+20

* Integer addition and concatenation are also lossy, in the sense of not being
  reversible: you cannot get back the two addends given only the sum.
  Two numbers add to give 356; what are the two numbers?


Dict contains tests will fail
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The invariant ``a in (a + b)`` holds for collections like list, tuple, str, and
bytes.

Response:

* This invariant only applies when ``+`` implements concatenation, not numeric
  addition, boolean AND, or set union. There is no reason to expect it to
  apply to dict union/merge.

* This invariant doesn't apply to other collections, such as arrays, deques or
  Counters. For example::

      >>> from array import array
      >>> a = array("i", [1, 2, 3])
      >>> b = array("i", [4, 5, 6])
      >>> a in (a + b)
      False


Only One Way To Do It
~~~~~~~~~~~~~~~~~~~~~

Dict addition will violate the Only One Way koan from the Zen.

Response:

* There is no such koan. "Only One Way" is a calumny about Python originating
  long ago from the Perl community.


More Than One Way To Do It
~~~~~~~~~~~~~~~~~~~~~~~~~~

Okay, the Zen doesn't say that there should be Only One Way To Do It. But it
does have a prohibition against allowing "more than one way to do it".

Response:

* There is no such prohibition. The "Zen of Python" merely expresses a
  *preference* for "only one *obvious* way"::

      There should be one-- and preferably only one --obvious way to do it.

* The emphasis here is that there should be an obvious way to do "it". In the
  case of dict update operations, there are at least two different operations
  that we might wish to do:

  - *update a dict in place*, in which case the Obvious Way is to use the
    ``update()`` method. If this proposal is accepted, the ``+=`` augmented
    assignment operator will also work, but that is a side-effect of how
    augmented assignments are defined. Which you choose is a matter of taste.

  - *merge two existing dicts into a third, new dict*, in which case this PEP
    proposes that the Obvious Way is to use the ``+`` merge operator.

* In practice, this preference for "only one way" is frequently violated in
  Python. For example, every for loop could be re-written as a while loop;
  every if-expression could be written as an if-else statement. List, set and
  dict comprehensions could all be replaced by generator expressions. Lists
  offer no fewer than five ways to implement concatenation:

  - Addition operator: ``a + b``
  - In-place addition operator: ``a += b``
  - Slice assignment: ``a[len(a):] = b``
  - Sequence unpacking: ``[*a, *b]``
  - Extend method: ``a.extend(b)``

  We should not be too strict about rejecting useful functionality because it
  violates "only one way".


Dict addition is not like concatenation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dict addition is not like concatenation, which obeys the invariant
``len(d + e) == len(d) + len(e)``.

Response:

* Numeric, vector and matrix addition don't obey this invariant either; it
  isn't clear why dict *merging* should be expected to obey it. And in the
  case of numeric addition, ``len(x)`` is not even defined.

* In a sense, dict addition can be considered to be a kind of concatenation::

      {a:v, b:v, c:v} + {b:v, c:v, d:v} => {a:v, b:v, c:v, b:v, c:v, d:v}

  Since dicts don't take duplicate keys, standard dict behaviour occurs and the
  last-seen (right-most) wins.
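
This reading can be made literal in current Python by concatenating the
item lists and letting the ``dict`` constructor resolve the duplicates. The
example values are arbitrary.

```python
d = {"a": 1, "b": 2, "c": 3}
e = {"b": 20, "c": 30, "d": 40}

# Plain list concatenation keeps all six (key, value) pairs...
concatenated = list(d.items()) + list(e.items())

# ...and building a dict from them keeps the last value seen per key.
merged = dict(concatenated)
```

The length invariant holds for the concatenated item list; it is the final
conversion to a dict that removes the duplicates.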


Dict addition makes code harder to understand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dict addition makes it harder to tell what code means. To paraphrase the
objection rather than quote anyone in particular: "If I see ``spam + eggs``,
I can't tell what it does unless I know what ``spam`` and ``eggs`` are".

Response:

* This is very true. But it is equally true today, where the use of the ``+``
  operator could mean any of:

  - numeric addition
  - sequence concatenation
  - ``Counter`` merging
  - any other overloaded operation

  Adding dict merging to the set of possibilities doesn't seem to make it
  *harder* to understand the code. No more work is required to determine that
  ``spam`` and ``eggs`` are mappings than it would take to determine that they
  are lists, or numbers. And good naming conventions will help::

      width + margin  # probably numeric addition
      prefix + word  # probably string concatenation
      settings + user_prefs  # probably mapping addition


What about the full set API?
----------------------------

Some people have suggested that dicts are "set like", and should support the
full collection of set operators ``|``, ``&``, ``^`` and ``-``.

This PEP does not take a position on whether dicts should support the full
collection of set operators, and would prefer to leave that for a later PEP
(one of the authors is interested in drafting such a PEP). For the benefit of
any later PEP, a brief summary follows.

Set union, ``|``, has a natural analogy to the dict update operation, and the
pipe operator is strongly preferred over ``+`` by many people. As described in
the section "Rejected semantics", the most natural behaviour is for the last
value seen to win.

Set intersection ``&`` is more problematic. While it is easy to determine the
intersection of *keys* in two dicts, it is not clear what to do with the
*values*. For example, given two dicts::

    d1 = {"spam": 1, "eggs": 2}
    d2 = {"ham": 3, "eggs": 4}

it is obvious that the only key of ``d1 & d2`` must be ``"eggs"``. But there
are at least five obvious ways to choose the values:

- first (left-most) value wins: ``2``
- last (right-most) value wins: ``4``
- add/concatenate the values: ``6``
- keep a list of both values: ``[2, 4]``
- raise an exception

but none of them are obviously correct or more useful than the others. "Last
seen wins" has the advantage of consistency with union, but it isn't clear if
that alone is reason enough to choose it.

Set symmetric difference ``^`` is also obvious and natural. Given the two
dicts above, the symmetric difference ``d1 ^ d2`` would be
``{"spam": 1, "ham": 3}``.

Set difference ``-`` is also obvious and natural, and an earlier version of
this PEP included it in the proposal. Given the dicts above, we would have
``d1 - d2`` return ``{"spam": 1}`` and ``d2 - d1`` return ``{"ham": 3}``.
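
The key-level results described above can be computed today from dict key
views, which already support the set operators. This sketch only spells
out the expected values; it does not settle how a dict-level ``&`` should
choose between ``2`` and ``4``.

```python
d1 = {"spam": 1, "eggs": 2}
d2 = {"ham": 3, "eggs": 4}

# Key views behave like sets, so the shared keys are unambiguous.
common_keys = d1.keys() & d2.keys()

# Difference: items whose key is missing from the other dict.
difference = {k: v for k, v in d1.items() if k not in d2}
reverse_difference = {k: v for k, v in d2.items() if k not in d1}

# Symmetric difference: items whose key is in exactly one dict.
only_one_side = d1.keys() ^ d2.keys()
symmetric_difference = {
    k: v for k, v in {**d1, **d2}.items() if k in only_one_side
}
```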
|
||
|
||
|
||
Examples of candidates for the dict merging operator
|
||
----------------------------------------------------
|
||
|
||
The authors of this PEP did a survey of third party libraries for dictionary
|
||
merging which might be candidates for dict addition.
|
||
|
||
(This is a cursory list based on a subset of whatever arbitrary third-party
|
||
packages happened to be installed on the author's computer, and may not reflect
|
||
the current state of any package.)
|
||
|
||
|
||
From **sympy/abc.py**::
|
||
|
||
clash = {}
|
||
clash.update(clash1)
|
||
clash.update(clash2)
|
||
return clash1, clash2, clash
|
||
|
||
Rewrite as ``return clash1, clash2, clash1+clash2``.
|
||
|
||
|
||
From **sympy/utilities/runtests.py**::
|
||
|
||
globs = globs.copy()
|
||
if extraglobs is not None:
|
||
globs.update(extraglobs)
|
||
|
||
Rewrite as ``globs = globs + ({} if extraglobs is None else extraglobs)``
|
||
|
||
|
||
From **sympy/vector/functions.py**::
|
||
|
||
subs_dict = {}
|
||
for f in system_set:
|
||
subs_dict.update(f.scalar_map(system))
|
||
return expr.subs(subs_dict)
|
||
|
||
Rewrite as
|
||
``return expr.subs(sum((f.scalar_map(system) for f in system_set), {}))``
|
||
|
||
|
||
From **sympy/printing/fcode.py** and **sympy/printing/ccode.py**::

    self.known_functions = dict(known_functions)
    userfuncs = settings.get('user_functions', {})
    self.known_functions.update(userfuncs)

Rewrite as
``self.known_functions = dict(known_functions) + settings.get('user_functions', {})``.

From **sympy/parsing/maxima.py**::

    dct = MaximaHelpers.__dict__.copy()
    dct.update(name_dict)
    obj = sympify(str, locals=dct)

Rewrite as ``obj = sympify(str, locals=MaximaHelpers.__dict__ + name_dict)``.

From **sphinx/quickstart.py**::

    d.setdefault('release', d['version'])
    d2 = DEFAULT_VALUE.copy()
    d2.update(dict(("ext_"+ext, False) for ext in EXTENSIONS))
    d2.update(d)
    d = d2

Rewrite as
``d = DEFAULT_VALUE + dict(("ext_"+ext, False) for ext in EXTENSIONS) + d``.

From **sphinx/highlighting.py**::

    def get_formatter(self, **kwargs):
        kwargs.update(self.formatter_args)
        return self.formatter(**kwargs)

Rewrite as ``return self.formatter(**(kwargs + self.formatter_args))``.

From **sphinx/ext/inheritance_diagram.py**::

    n_attrs = self.default_node_attrs.copy()
    e_attrs = self.default_edge_attrs.copy()
    g_attrs.update(graph_attrs)
    n_attrs.update(node_attrs)
    e_attrs.update(edge_attrs)

Rewrite as::

    g_attrs.update(graph_attrs)
    n_attrs = self.default_node_attrs + node_attrs
    e_attrs = self.default_edge_attrs + edge_attrs

From **sphinx/ext/doctest.py**::

    new_opt = code[0].options.copy()
    new_opt.update(example.options)
    example.options = new_opt

Rewrite as ``example.options = code[0].options + example.options``.

From **sphinx/domains/__init__.py**::

    self.attrs = self.known_attrs.copy()
    self.attrs.update(attrs)

Rewrite as ``self.attrs = self.known_attrs + attrs``.

From **requests/sessions.py**::

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting.update(to_key_val_list(request_setting))

Rewrite as
``merged_setting = dict_class(to_key_val_list(session_setting)) + dict_class(to_key_val_list(request_setting))``.

From **matplotlib/legend.py**::

    if self._handler_map:
        hm = default_handler_map.copy()
        hm.update(self._handler_map)
        return hm

Rewrite as ``return default_handler_map + self._handler_map``.

From **pygments/lexer.py**::

    if kwargs:
        kwargs.update(lexer.options)
        lx = lexer.__class__(**kwargs)

Rewrite as ``lx = lexer.__class__(**(kwargs + lexer.options))``.

From **praw/internal.py**::

    data = {'name': six.text_type(user), 'type': relationship}
    data.update(kwargs)

Rewrite as
``data = {'name': six.text_type(user), 'type': relationship} + kwargs``.

From **IPython/zmq/ipkernel.py**::

    aliases = dict(kernel_aliases)
    aliases.update(shell_aliases)

Rewrite as ``aliases = dict(kernel_aliases) + shell_aliases``.

From **matplotlib/backends/backend_svg.py**::

    attrib = attrib.copy()
    attrib.update(extra)
    attrib = attrib.items()

Rewrite as ``attrib = (attrib + extra).items()``.

From **matplotlib/delaunay/triangulate.py**::

    edges = {}
    edges.update(dict(zip(self.triangle_nodes[border[:,0]][:,1],
                          self.triangle_nodes[border[:,0]][:,2])))
    edges.update(dict(zip(self.triangle_nodes[border[:,1]][:,2],
                          self.triangle_nodes[border[:,1]][:,0])))
    edges.update(dict(zip(self.triangle_nodes[border[:,2]][:,0],
                          self.triangle_nodes[border[:,2]][:,1])))

Rewrite as::

    edges = (dict(zip(self.triangle_nodes[border[:,0]][:,1],
                      self.triangle_nodes[border[:,0]][:,2]))
             + dict(zip(self.triangle_nodes[border[:,1]][:,2],
                        self.triangle_nodes[border[:,1]][:,0]))
             + dict(zip(self.triangle_nodes[border[:,2]][:,0],
                        self.triangle_nodes[border[:,2]][:,1]))
             )

From **numpy/ma/core.py**::

    # We need to copy the _basedict to avoid backward propagation
    _optinfo = {}
    _optinfo.update(getattr(obj, '_optinfo', {}))
    _optinfo.update(getattr(obj, '_basedict', {}))
    if not isinstance(obj, MaskedArray):
        _optinfo.update(getattr(obj, '__dict__', {}))

Rewrite as::

    _optinfo = getattr(obj, '_optinfo', {}) + getattr(obj, '_basedict', {})
    if not isinstance(obj, MaskedArray):
        _optinfo += getattr(obj, '__dict__', {})

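The comment in the numpy code explains why the first statement must build a
new dict. In the rewrite, ``+`` returns a new dict, so the later ``+=`` does
not write back into the dict stored on the object. The following minimal
sketch shows that distinction using ``dict.update`` and dict unpacking as
stand-ins for the proposed in-place and copying operators; the variable names
and data are made up.

```python
# Made-up data standing in for an option dict stored on an object.
stored = {"fill_value": 0}
extra = {"hardmask": True}

# In-place update (what ``alias += extra`` would do under the proposal):
# the alias and the stored dict are the same object, so both change.
alias = stored
alias.update(extra)
print(stored)   # {'fill_value': 0, 'hardmask': True}

# Copying merge (what ``stored + extra`` would do): a new dict is built,
# so later in-place changes to it leave the stored dict alone.
stored = {"fill_value": 0}
merged = {**stored, **extra}
merged.update({"baseclass": "ndarray"})
print(stored)   # {'fill_value': 0}
print(merged)   # {'fill_value': 0, 'hardmask': True, 'baseclass': 'ndarray'}
```
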
The above examples show that sometimes the ``+`` operator leads to a clear
increase in readability, reducing the number of lines of code and improving
clarity. However, other examples using the ``+`` operator lead to long, complex
single expressions, possibly well over the PEP 8 maximum line length of 79
characters. As with any other language feature, the programmer should use their
own judgement about whether ``+`` improves their code.

Related discussions
-------------------

`Latest discussion which motivated this PEP
<https://mail.python.org/archives/list/python-ideas@python.org/thread/BHIJX6MHGMMD3S6D7GVTPZQL4N5V7T42>`_

`Ticket on the bug tracker <https://bugs.python.org/issue36144>`_

Merging two dictionaries in an expression is a frequently requested feature.
For example:

https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression

https://stackoverflow.com/questions/1781571/how-to-concatenate-two-dictionaries-to-create-a-new-one-in-python

https://stackoverflow.com/questions/6005066/adding-dictionaries-together-python

Occasionally people request alternative behaviour for the merge:

https://stackoverflow.com/questions/1031199/adding-dictionaries-in-python

https://stackoverflow.com/questions/877295/python-dict-add-by-valuedict-2

...including one proposal to treat dicts as `sets of keys
<https://mail.python.org/archives/list/python-ideas@python.org/message/YY3KZZGEX6VEFX5QZJ33P7NTTXGPZQ7N/>`_.

`Ian Lee's proto-PEP <https://lwn.net/Articles/635444/>`_, and `discussion
<https://lwn.net/Articles/635397/>`_ in 2015. Further discussion took place on
`Python-Ideas <https://mail.python.org/archives/list/python-ideas@python.org/thread/43OZV3MR4XLFRPCI27I7BB6HVBD25M2E/>`_.

(Observant readers will notice that one of the authors of this PEP was more
skeptical of the idea in 2015.)

Adding `a full complement of operators to dicts
<https://mail.python.org/archives/list/python-ideas@python.org/thread/EKWMDJKMVOJCOROQVHJFQX7W2L55I5RA/>`_.

`Discussion on Hacker News <https://news.ycombinator.com/item?id=19314646>`_.

https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/

https://code.tutsplus.com/tutorials/how-to-merge-two-python-dictionaries--cms-26230

In direct response to an earlier draft of this PEP, Serhiy Storchaka raised `a
ticket in the bug tracker <https://bugs.python.org/issue36431>`_ to replace the
``copy(); update()`` idiom with dict unpacking.

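For plain dicts, the two spellings mentioned in that ticket produce equal
results. A minimal sketch with made-up data:

```python
defaults = {"colour": "red", "size": 10}
overrides = {"size": 12, "shape": "round"}

# The copy-then-update idiom.
merged_update = defaults.copy()
merged_update.update(overrides)

# Dict unpacking (available since Python 3.5): later mappings win.
merged_unpack = {**defaults, **overrides}

print(merged_update == merged_unpack)  # True
print(merged_unpack)  # {'colour': 'red', 'size': 12, 'shape': 'round'}
```

One difference is that dict unpacking always builds a plain ``dict``, whereas
``copy()`` on a subclass such as ``collections.OrderedDict`` returns an
instance of that subclass.
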
Open questions
--------------

Should these operators be part of the ABC ``Mapping`` API?

References
----------

[1] Guido's declaration that plus wins over pipe:
https://mail.python.org/pipermail/python-ideas/2019-February/055519.html

[2] Non-string keys: https://bugs.python.org/issue35105 and
https://mail.python.org/pipermail/python-dev/2018-October/155435.html

[3] Behaviour in tuples:
https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works

Copyright
---------

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: