413 lines
16 KiB
Plaintext
413 lines
16 KiB
Plaintext
PEP: 469
|
|
Title: Migration of dict iteration code to Python 3
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Nick Coghlan <ncoghlan@gmail.com>
|
|
Status: Withdrawn
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 18-Apr-2014
|
|
Python-Version: 3.5
|
|
Post-History: 18-Apr-2014, 21-Apr-2014
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
For Python 3, :pep:`3106` changed the design of the ``dict`` builtin and the
|
|
mapping API in general to replace the separate list based and iterator based
|
|
APIs in Python 2 with a merged, memory efficient set and multiset view
|
|
based API. This new style of dict iteration was also added to the Python 2.7
|
|
``dict`` type as a new set of iteration methods.
|
|
|
|
This means that there are now 3 different kinds of dict iteration that may
|
|
need to be migrated to Python 3 when an application makes the transition:
|
|
|
|
* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
|
|
* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
|
|
* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``
|
|
|
|
There is currently no widely agreed best practice on how to reliably convert
|
|
all Python 2 dict iteration code to the common subset of Python 2 and 3,
|
|
especially when test coverage of the ported code is limited. This PEP
|
|
reviews the various ways the Python 2 iteration APIs may be accessed, and
|
|
looks at the available options for migrating that code to Python 3 by way of
|
|
the common subset of Python 2.6+ and Python 3.0+.
|
|
|
|
The PEP also considers the question of whether or not there are any
|
|
additions that may be worth making to Python 3.5 that may ease the
|
|
transition process for application code that doesn't need to worry about
|
|
supporting earlier versions when eventually making the leap to Python 3.
|
|
|
|
|
|
PEP Withdrawal
|
|
==============
|
|
|
|
In writing the second draft of this PEP, I came to the conclusion that
|
|
the readability of hybrid Python 2/3 mapping code can actually be best
|
|
enhanced by better helper functions rather than by making changes to
|
|
Python 3.5+. The main value I now see in this PEP is as a clear record
|
|
of the recommended approaches to migrating mapping iteration code from
|
|
Python 2 to Python 3, as well as suggesting ways to keep things readable
|
|
and maintainable when writing hybrid code that supports both versions.
|
|
|
|
Notably, I recommend that hybrid code avoid calling mapping iteration
|
|
methods directly, and instead rely on builtin functions where possible,
|
|
and some additional helper functions for cases that would be a simple
|
|
combination of a builtin and a mapping method in pure Python 3 code, but
|
|
need to be handled slightly differently to get the exact same semantics in
|
|
Python 2.
|
|
|
|
Static code checkers like pylint could potentially be extended with an
|
|
optional warning regarding direct use of the mapping iteration methods in
|
|
a hybrid code base.
|
|
|
|
|
|
Mapping iteration models
|
|
========================
|
|
|
|
Python 2.7 provides three different sets of methods to extract the keys,
|
|
values and items from a ``dict`` instance, accounting for 9 out of the
|
|
18 public methods of the ``dict`` type.
|
|
|
|
In Python 3, this has been rationalised to just 3 out of 11 public methods
|
|
(as the ``has_key`` method has also been removed).
|
|
|
|
|
|
Lists as mutable snapshots
|
|
--------------------------
|
|
|
|
This is the oldest of the three styles of dict iteration, and hence the
|
|
one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
|
|
methods in Python 2.
|
|
|
|
These methods all return lists that are snapshots of the state of the
|
|
mapping at the time the method was called. This has a few consequences:
|
|
|
|
* the original object can be mutated freely without affecting iteration
|
|
over the snapshot
|
|
* the snapshot can be modified independently of the original object
|
|
* the snapshot consumes memory proportional to the size of the original
|
|
mapping
|
|
|
|
The semantic equivalent of these operations in Python 3 are
|
|
``list(d.keys())``, ``list(d.values())`` and ``list(d.iteritems())``.
|
|
|
|
|
|
Iterator objects
|
|
----------------
|
|
|
|
In Python 2.2, ``dict`` objects gained support for the then-new iterator
|
|
protocol, allowing direct iteration over the keys stored in the dictionary,
|
|
thus avoiding the need to build a list just to iterate over the dictionary
|
|
contents one entry at a time. ``iter(d)`` provides direct access to the
|
|
iterator object for the keys.
|
|
|
|
Python 2 also provides a ``d.iterkeys()`` method that is essentially
|
|
synonymous with ``iter(d)``, along with ``d.itervalues()`` and
|
|
``d.iteritems()`` methods.
|
|
|
|
These iterators provide live views of the underlying object, and hence may
|
|
fail if the set of keys in the underlying object is changed during
|
|
iteration::
|
|
|
|
>>> d = dict(a=1)
|
|
>>> for k in d:
|
|
... del d[k]
|
|
...
|
|
Traceback (most recent call last):
|
|
File "<stdin>", line 1, in <module>
|
|
RuntimeError: dictionary changed size during iteration
|
|
|
|
As iterators, iteration over these objects is also a one-time operation:
|
|
once the iterator is exhausted, you have to go back to the original mapping
|
|
in order to iterate again.
|
|
|
|
In Python 3, direct iteration over mappings works the same way as it does
|
|
in Python 2. There are no method based equivalents - the semantic equivalents
|
|
of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
|
|
``iter(d.values())`` and ``iter(d.items())``.
|
|
|
|
The ``six`` and ``future.utils`` compatibility modules also both provide
|
|
``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions that
|
|
provide efficient iterator semantics in both Python 2 and 3.
|
|
|
|
|
|
Set based dynamic views
|
|
-----------------------
|
|
|
|
The model that is provided in Python 3 as a method based API is that of set
|
|
based dynamic views (technically multisets in the case of the ``values()``
|
|
view).
|
|
|
|
In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
|
|
``d. items()`` provide a live view of the current state of
|
|
the underlying object, rather than taking a full snapshot of the current
|
|
state as they did in Python 2. This change is safe in many circumstances,
|
|
but does mean that, as with the direct iteration API, it is necessary to
|
|
avoid adding or removing keys during iteration, in order to avoid
|
|
encountering the following error::
|
|
|
|
>>> d = dict(a=1)
|
|
>>> for k, v in d.items():
|
|
... del d[k]
|
|
...
|
|
Traceback (most recent call last):
|
|
File "<stdin>", line 1, in <module>
|
|
RuntimeError: dictionary changed size during iteration
|
|
|
|
Unlike the iteration API, these objects are iterables, rather than iterators:
|
|
you can iterate over them multiple times, and each time they will iterate
|
|
over the entire underlying mapping.
|
|
|
|
These semantics are also available in Python 2.7 as the ``d.viewkeys()``,
|
|
``d.viewvalues()`` and ``d.viewitems()`` methods.
|
|
|
|
The ``future.utils`` compatibility module also provides
|
|
``viewkeys()``, ``viewvalues()`` and ``viewitems()`` helper functions
|
|
when running on Python 2.7 or Python 3.x.
|
|
|
|
|
|
Migrating directly to Python 3
|
|
==============================
|
|
|
|
The ``2to3`` migration tool handles direct migrations to Python 3 in
|
|
accordance with the semantic equivalents described above:
|
|
|
|
* ``d.keys()`` -> ``list(d.keys())``
|
|
* ``d.values()`` -> ``list(d.values())``
|
|
* ``d.items()`` -> ``list(d.items())``
|
|
* ``d.iterkeys()`` -> ``iter(d.keys())``
|
|
* ``d.itervalues()`` -> ``iter(d.values())``
|
|
* ``d.iteritems()`` -> ``iter(d.items())``
|
|
* ``d.viewkeys()`` -> ``d.keys()``
|
|
* ``d.viewvalues()`` -> ``d.values()``
|
|
* ``d.viewitems()`` -> ``d.items()``
|
|
|
|
Rather than 9 distinct mapping methods for iteration, there are now only the
|
|
3 view methods, which combine in straightforward ways with the two relevant
|
|
builtin functions to cover all of the behaviours that are available as
|
|
``dict`` methods in Python 2.7.
|
|
|
|
Note that in many cases ``d.keys()`` can be replaced by just ``d``, but the
|
|
``2to3`` migration tool doesn't attempt that replacement.
|
|
|
|
The ``2to3`` migration tool also *does not* provide any automatic assistance
|
|
for migrating references to these objects as bound or unbound methods - it
|
|
only automates conversions where the API is called immediately.
|
|
|
|
|
|
Migrating to the common subset of Python 2 and 3
|
|
================================================
|
|
|
|
When migrating to the common subset of Python 2 and 3, the above
|
|
transformations are not generally appropriate, as they all either result in
|
|
the creation of a redundant list in Python 2, have unexpectedly different
|
|
semantics in at least some cases, or both.
|
|
|
|
Since most code running in the common subset of Python 2 and 3 supports
|
|
at least as far back as Python 2.6, the currently recommended approach to
|
|
conversion of mapping iteration operation depends on two helper functions
|
|
for efficient iteration over mapping values and mapping item tuples:
|
|
|
|
* ``d.keys()`` -> ``list(d)``
|
|
* ``d.values()`` -> ``list(itervalues(d))``
|
|
* ``d.items()`` -> ``list(iteritems(d))``
|
|
* ``d.iterkeys()`` -> ``iter(d)``
|
|
* ``d.itervalues()`` -> ``itervalues(d)``
|
|
* ``d.iteritems()`` -> ``iteritems(d)``
|
|
|
|
Both ``six`` and ``future.utils`` provide appropriate definitions of
|
|
``itervalues()`` and ``iteritems()`` (along with essentially redundant
|
|
definitions of ``iterkeys()``). Creating your own definitions of these
|
|
functions in a custom compatibility module is also relatively
|
|
straightforward::
|
|
|
|
try:
|
|
dict.iteritems
|
|
except AttributeError:
|
|
# Python 3
|
|
def itervalues(d):
|
|
return iter(d.values())
|
|
def iteritems(d):
|
|
return iter(d.items())
|
|
else:
|
|
# Python 2
|
|
def itervalues(d):
|
|
return d.itervalues()
|
|
def iteritems(d):
|
|
return d.iteritems()
|
|
|
|
The greatest loss of readability currently arises when converting code that
|
|
actually *needs* the list based snapshots that were the default in Python
|
|
2. This readability loss could likely be mitigated by also providing
|
|
``listvalues`` and ``listitems`` helper functions, allowing the affected
|
|
conversions to be simplified to:
|
|
|
|
* ``d.values()`` -> ``listvalues(d)``
|
|
* ``d.items()`` -> ``listitems(d)``
|
|
|
|
The corresponding compatibility function definitions are as straightforward
|
|
as their iterator counterparts::
|
|
|
|
try:
|
|
dict.iteritems
|
|
except AttributeError:
|
|
# Python 3
|
|
def listvalues(d):
|
|
return list(d.values())
|
|
def listitems(d):
|
|
return list(d.items())
|
|
else:
|
|
# Python 2
|
|
def listvalues(d):
|
|
return d.values()
|
|
def listitems(d):
|
|
return d.items()
|
|
|
|
With that expanded set of compatibility functions, Python 2 code would
|
|
then be converted to "idiomatic" hybrid 2/3 code as:
|
|
|
|
* ``d.keys()`` -> ``list(d)``
|
|
* ``d.values()`` -> ``listvalues(d)``
|
|
* ``d.items()`` -> ``listitems(d)``
|
|
* ``d.iterkeys()`` -> ``iter(d)``
|
|
* ``d.itervalues()`` -> ``itervalues(d)``
|
|
* ``d.iteritems()`` -> ``iteritems(d)``
|
|
|
|
This compares well for readability with the idiomatic pure Python 3
|
|
code that uses the mapping methods and builtins directly:
|
|
|
|
* ``d.keys()`` -> ``list(d)``
|
|
* ``d.values()`` -> ``list(d.values())``
|
|
* ``d.items()`` -> ``list(d.items())``
|
|
* ``d.iterkeys()`` -> ``iter(d)``
|
|
* ``d.itervalues()`` -> ``iter(d.values())``
|
|
* ``d.iteritems()`` -> ``iter(d.items())``
|
|
|
|
It's also notable that when using this approach, hybrid code would *never*
|
|
invoke the mapping methods directly: it would always invoke either a
|
|
builtin or helper function instead, in order to ensure the exact same
|
|
semantics on both Python 2 and 3.
|
|
|
|
|
|
Migrating from Python 3 to the common subset with Python 2.7
|
|
============================================================
|
|
|
|
While the majority of migrations are currently from Python 2 either directly
|
|
to Python 3 or to the common subset of Python 2 and Python 3, there are also
|
|
some migrations of newer projects that start in Python 3 and then later
|
|
add Python 2 support, either due to user demand, or to gain access to
|
|
Python 2 libraries that are not yet available in Python 3 (and porting them
|
|
to Python 3 or creating a Python 3 compatible replacement is not a trivial
|
|
exercise).
|
|
|
|
In these cases, Python 2.7 compatibility is often sufficient, and the 2.7+
|
|
only view based helper functions provided by ``future.utils`` allow the bare
|
|
accesses to the Python 3 mapping view methods to be replaced with code that
|
|
is compatible with both Python 2.7 and Python 3 (note, this is the only
|
|
migration chart in the PEP that has Python 3 code on the left of the
|
|
conversion):
|
|
|
|
* ``d.keys()`` -> ``viewkeys(d)``
|
|
* ``d.values()`` -> ``viewvalues(d)``
|
|
* ``d.items()`` -> ``viewitems(d)``
|
|
* ``list(d.keys())`` -> ``list(d)``
|
|
* ``list(d.values())`` -> ``listvalues(d)``
|
|
* ``list(d.items())`` -> ``listitems(d)``
|
|
* ``iter(d.keys())`` -> ``iter(d)``
|
|
* ``iter(d.values())`` -> ``itervalues(d)``
|
|
* ``iter(d.items())`` -> ``iteritems(d)``
|
|
|
|
As with migrations from Python 2 to the common subset, note that the hybrid
|
|
code ends up never invoking the mapping methods directly - it only calls
|
|
builtins and helper methods, with the latter addressing the semantic
|
|
differences between Python 2 and Python 3.
|
|
|
|
|
|
Possible changes to Python 3.5+
|
|
===============================
|
|
|
|
The main proposal put forward to potentially aid migration of existing
|
|
Python 2 code to Python 3 is the restoration of some or all of the
|
|
alternate iteration APIs to the Python 3 mapping API. In particular,
|
|
the initial draft of this PEP proposed making the following conversions
|
|
possible when migrating to the common subset of Python 2 and Python 3.5+:
|
|
|
|
* ``d.keys()`` -> ``list(d)``
|
|
* ``d.values()`` -> ``list(d.itervalues())``
|
|
* ``d.items()`` -> ``list(d.iteritems())``
|
|
* ``d.iterkeys()`` -> ``d.iterkeys()``
|
|
* ``d.itervalues()`` -> ``d.itervalues()``
|
|
* ``d.iteritems()`` -> ``d.iteritems()``
|
|
|
|
Possible mitigations of the additional language complexity in Python 3
|
|
created by restoring these methods included immediately deprecating them,
|
|
as well as potentially hiding them from the ``dir()`` function (or perhaps
|
|
even defining a way to make ``pydoc`` aware of function deprecations).
|
|
|
|
However, in the case where the list output is actually desired, the end
|
|
result of that proposal is actually less readable than an appropriately
|
|
defined helper function, and the function and method forms of the iterator
|
|
versions are pretty much equivalent from a readability perspective.
|
|
|
|
So unless I've missed something critical, readily available ``listvalues()``
|
|
and ``listitems()`` helper functions look like they will improve the
|
|
readability of hybrid code more than anything we could add back to the
|
|
Python 3.5+ mapping API, and won't have any long-term impact on the
|
|
complexity of Python 3 itself.
|
|
|
|
|
|
Discussion
|
|
==========
|
|
|
|
The fact that 5 years in to the Python 3 migration we still have users
|
|
considering the dict API changes a significant barrier to migration suggests
|
|
that there are problems with previously recommended approaches. This PEP
|
|
attempts to explore those issues and tries to isolate those cases where
|
|
previous advice (such as it was) could prove problematic.
|
|
|
|
My assessment (largely based on feedback from Twisted devs) is that
|
|
problems are most likely to arise when attempting to use ``d.keys()``,
|
|
``d.values()``, and ``d.items()`` in hybrid code. While superficially it
|
|
seems as though there should be cases where it is safe to ignore the
|
|
semantic differences, in practice, the change from "mutable snapshot" to
|
|
"dynamic view" is significant enough that it is likely better
|
|
to just force the use of either list or iterator semantics for hybrid code,
|
|
and leave the use of the view semantics to pure Python 3 code.
|
|
|
|
This approach also creates rules that are simple enough and safe enough that
|
|
it should be possible to automate them in code modernisation scripts that
|
|
target the common subset of Python 2 and Python 3, just as ``2to3`` converts
|
|
them automatically when targeting pure Python 3 code.
|
|
|
|
|
|
Acknowledgements
|
|
================
|
|
|
|
Thanks to the folks at the Twisted sprint table at PyCon for a very
|
|
vigorous discussion of this idea (and several other topics), and especially
|
|
to Hynek Schlawack for acting as a moderator when things got a little too
|
|
heated :)
|
|
|
|
Thanks also to JP Calderone and Itamar Turner-Trauring for their email
|
|
feedback, as well to the participants in the `python-dev review
|
|
<https://mail.python.org/pipermail/python-dev/2014-April/134168.html>`__ of
|
|
the initial version of the PEP.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|