PEP 465: updated & withdrawn based on feedback
This PEP now reviews exactly what is involved in migrating mapping iteration code to Python 3, as well as to the hybrid 2/3 subset. It is now withdrawn, as I now believe enhancements to migration tools and libraries are a better option than making changes to Python 3.5+
This commit is contained in:
parent
c245d58ca5
commit
e58c3947d5
388
pep-0469.txt
388
pep-0469.txt
|
@ -1,14 +1,14 @@
|
||||||
PEP: 469
|
PEP: 469
|
||||||
Title: Simplified migration of iterator-based mapping code to Python 3
|
Title: Migration of dict iteration code to Python 3
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Nick Coghlan <ncoghlan@gmail.com>
|
Author: Nick Coghlan <ncoghlan@gmail.com>
|
||||||
Status: Draft
|
Status: Withdrawn
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
Created: 2014-04-18
|
Created: 2014-04-18
|
||||||
Python-Version: 3.5
|
Python-Version: 3.5
|
||||||
Post-History: 2014-04-18
|
Post-History: 2014-04-18, 2014-04-21
|
||||||
|
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
|
@ -17,88 +17,335 @@ Abstract
|
||||||
For Python 3, PEP 3106 changed the design of the ``dict`` builtin and the
|
For Python 3, PEP 3106 changed the design of the ``dict`` builtin and the
|
||||||
mapping API in general to replace the separate list based and iterator based
|
mapping API in general to replace the separate list based and iterator based
|
||||||
APIs in Python 2 with a merged, memory efficient set and multiset view
|
APIs in Python 2 with a merged, memory efficient set and multiset view
|
||||||
based API.
|
based API. This new style of dict iteration was also added to the Python 2.7
|
||||||
|
``dict`` type as a new set of iteration methods.
|
||||||
|
|
||||||
This means that Python 3 code always requires an additional qualifier to
|
This means that there are now 3 different kinds of dict iteration that may
|
||||||
reliably reproduce classic Python 2 mapping semantics:
|
need to be migrated to Python 3 when an application makes the transition:
|
||||||
|
|
||||||
* List based (e.g. ``d.keys()``): ``list(d.keys())``
|
* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
|
||||||
* Iterator based (e.g. ``d.iterkeys()``): ``iter(d.keys())``
|
* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
|
||||||
|
* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``
|
||||||
|
|
||||||
Some Python 2 code that uses ``d.keys()`` may be migrated to Python 3
|
There is currently no widely agreed best practice on how to reliably convert
|
||||||
(or the common subset of Python 2 and Python 3) without alteration, but
|
all Python 2 dict iteration code to the common subset of Python 2 and 3,
|
||||||
*all* code using the iterator based API requires modification. Code that
|
especially when test coverage of the ported code is limited. This PEP
|
||||||
is migrating to the common subset of Python 2 and 3 and needs to retain the
|
reviews the various ways the Python 2 iteration APIs may be accessed, and
|
||||||
memory efficient implementation that avoids creating an unnecessary list
|
looks at the available options for migrating that code to Python 3 by way of
|
||||||
object must switch away from using a method to instead using a helper
|
the common subset of Python 2.6+ and Python 3.0+.
|
||||||
function (such as those provided by the ``six`` module)
|
|
||||||
|
|
||||||
To simplify the process of migrating Python 2 code that uses the existing
|
The PEP also considers the question of whether or not there are any
|
||||||
iterator based APIs to Python 3, this PEP proposes the reintroduction
|
additions that may be worth making to Python 3.5 that may ease the
|
||||||
of the Python 2 spelling of the iterator based semantics in Python 3.5, by
|
transition process for application code that doesn't need to worry about
|
||||||
restoring the following methods to the builtin ``dict`` API and the
|
supporting earlier versions when eventually making the leap to Python 3.
|
||||||
``collections.abc.Mapping`` ABC definition:
|
|
||||||
|
|
||||||
* ``iterkeys()``
|
|
||||||
* ``itervalues()``
|
|
||||||
* ``iteritems()``
|
|
||||||
|
|
||||||
|
|
||||||
Proposal
|
PEP Withdrawal
|
||||||
========
|
==============
|
||||||
|
|
||||||
Methods with the following exact semantics will be added to the builtin
|
In writing the second draft of this PEP, I came to the conclusion that
|
||||||
``dict`` type and ``collections.abc.Mapping`` ABC::
|
the readability of hybrid Python 2/3 mapping code can actually be best
|
||||||
|
enhanced by better helper functions rather than by making changes to
|
||||||
|
Python 3.5+. The main value I now see in this PEP is as a clear record
|
||||||
|
of the recommended approaches to migrating mapping iteration code from
|
||||||
|
Python 2 to Python 3, as well as suggesting ways to keep things readable
|
||||||
|
and maintainable when writing hybrid code that supports both versions.
|
||||||
|
|
||||||
def iterkeys(self):
|
Notably, I recommend that hybrid code avoid calling mapping iteration
|
||||||
return iter(self.keys())
|
methods directly, and instead rely on builtin functions where possible,
|
||||||
|
and some additional helper functions for cases that would be a simple
|
||||||
|
combination of a builtin and a mapping method in pure Python 3 code, but
|
||||||
|
need to be handled slightly differently to get the exact same semantics in
|
||||||
|
Python 2.
|
||||||
|
|
||||||
def itervalues(self):
|
Static code checkers like pylint could potentially be extended with an
|
||||||
return iter(self.values())
|
optional warning regarding direct use of the mapping iteration methods in
|
||||||
|
a hybrid code base.
|
||||||
def iteritems(self):
|
|
||||||
return iter(self.items())
|
|
||||||
|
|
||||||
These semantics ensure that the methods also work as expected for subclasses
|
|
||||||
of these base types.
|
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Mapping iteration models
|
||||||
=========
|
========================
|
||||||
|
|
||||||
Similar in spirit to PEP 414 (which restored explicit Unicode literal
|
Python 2.7 provides three different sets of methods to extract the keys,
|
||||||
support in Python 3.3), this PEP is aimed primarily at helping users
|
values and items from a ``dict`` instance, accounting for 9 out of the
|
||||||
that currently feel punished for making use of a feature that needed to be
|
18 public methods of the ``dict`` type.
|
||||||
requested explicitly in Python 2, but was effectively made the default
|
|
||||||
behaviour in Python 3.
|
|
||||||
|
|
||||||
Users of list-based iteration in Python 2 that aren't actually relying on
|
In Python 3, this has been rationalised to just 3 out of 11 public methods
|
||||||
those semantics get a free memory efficiency improvement when migrating to
|
(as the ``has_key`` method has also been removed).
|
||||||
Python 3, and face no additional difficulties when migrating via the common
|
|
||||||
subset of Python 2 and 3.
|
|
||||||
|
|
||||||
By contrast, users that actually want the increased efficiency may have
|
|
||||||
faced a three phase migration process by the time they have fully migrated
|
|
||||||
to Python 3:
|
|
||||||
|
|
||||||
* original migration to the iterator based APIs after they were added in
|
Lists as mutable snapshots
|
||||||
Python 2.2
|
--------------------------
|
||||||
* migration to a separate function based API in order to run in the common
|
|
||||||
subset of Python 2 and 3
|
|
||||||
* eventual migration back to unprefixed method APIs when finally dropping
|
|
||||||
Python 2.7 support at some point in the future
|
|
||||||
|
|
||||||
The view based APIs that were added to Python 2.7 don't actually help with
|
This is the oldest of the three styles of dict iteration, and hence the
|
||||||
the transition process, as they don't exist in Python 3 and hence aren't
|
one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
|
||||||
part of the common subset of Python 2 and Python 3, and also aren't supported
|
methods in Python 2.
|
||||||
by most Python 2 mappings (including the collection ABCs).
|
|
||||||
|
|
||||||
This PEP proposes to just eliminate all that annoyance by making the iterator
|
These methods all return lists that are snapshots of the state of the
|
||||||
based APIs work again in Python 3.5+. As with the restoration of Unicode
|
mapping at the time the method was called. This has a few consequences:
|
||||||
literals, it does add a bit of additional noise to the definition of Python
|
|
||||||
3, but it does so while bringing a significant benefit in increasing the size
|
* the original object can be mutated freely without affecting iteration
|
||||||
of the common subset of Python 2 and Python 3 and so simplifying the process
|
over the snapshot
|
||||||
of migrating to Python 3 for affected Python 2 users.
|
* the snapshot can be modified independently of the original object
|
||||||
|
* the snapshot consumes memory proportional to the size of the original
|
||||||
|
mapping
|
||||||
|
|
||||||
|
The semantic equivalent of these operations in Python 3 are
|
||||||
|
``list(d.keys())``, ``list(d.values())`` and ``list(d.iteritems())``.
|
||||||
|
|
||||||
|
|
||||||
|
Iterator objects
|
||||||
|
----------------
|
||||||
|
|
||||||
|
In Python 2.2, ``dict`` objects gained support for the then-new iterator
|
||||||
|
protocol, allowing direct iteration over the keys stored in the dictionary,
|
||||||
|
thus avoiding the need to build a list just to iterate over the dictionary
|
||||||
|
contents one entry at a time. ``iter(d)`` provides direct access to the
|
||||||
|
iterator object for the keys.
|
||||||
|
|
||||||
|
Python 2 also provides a ``d.iterkeys()`` method that is essentially
|
||||||
|
synonymous with ``iter(d)``, along with ``d.itervalues()`` and
|
||||||
|
``d.iteritems()`` methods.
|
||||||
|
|
||||||
|
These iterators provide live views of the underlying object, and hence may
|
||||||
|
fail if the set of keys in the underlying object is changed during
|
||||||
|
iteration::
|
||||||
|
|
||||||
|
>>> d = dict(a=1)
|
||||||
|
>>> for k in d:
|
||||||
|
... del d[k]
|
||||||
|
...
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
RuntimeError: dictionary changed size during iteration
|
||||||
|
|
||||||
|
As iterators, iteration over these objects is also a one-time operation:
|
||||||
|
once the iterator is exhausted, you have to go back to the original mapping
|
||||||
|
in order to iterate again.
|
||||||
|
|
||||||
|
In Python 3, direct iteration over mappings works the same way as it does
|
||||||
|
in Python 2. There are no method based equivalents - the semantic equivalents
|
||||||
|
of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
|
||||||
|
``iter(d.values())`` and ``iter(d.iteritems())``.
|
||||||
|
|
||||||
|
The ``six`` and ``future.utils`` compatibility modules also both provide
|
||||||
|
``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions that
|
||||||
|
provide efficient iterator semantics in both Python 2 and 3.
|
||||||
|
|
||||||
|
|
||||||
|
Set based dynamic views
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
The model that is provided in Python 3 as a method based API is that of set
|
||||||
|
based dynamic views (technically multisets in the case of the ``values()``
|
||||||
|
view).
|
||||||
|
|
||||||
|
In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
|
||||||
|
``d. items()`` provide a live view of the current state of
|
||||||
|
the underlying object, rather than taking a full snapshot of the current
|
||||||
|
state as they did in Python 2. This change is safe in many circumstances,
|
||||||
|
but does mean that, as with the direct iteration API, it is necessary to
|
||||||
|
avoid adding or removing keys during iteration, in order to avoid
|
||||||
|
encountering the following error::
|
||||||
|
|
||||||
|
>>> d = dict(a=1)
|
||||||
|
>>> for k, v in d.items():
|
||||||
|
... del d[k]
|
||||||
|
...
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<stdin>", line 1, in <module>
|
||||||
|
RuntimeError: dictionary changed size during iteration
|
||||||
|
|
||||||
|
Unlike the iteration API, these objects are iterables, rather than iterators:
|
||||||
|
you can iterate over them multiple times, and each time they will iterate
|
||||||
|
over the entire underlying mapping.
|
||||||
|
|
||||||
|
These semantics are also available in Python 2.7 as the ``d.viewkeys()``,
|
||||||
|
``d.viewvalues()`` and ```d.viewitems()`` methods.
|
||||||
|
|
||||||
|
The ``future.utils`` compatibility module also provides
|
||||||
|
``viewkeys()``, ``viewvalues()`` and ``viewitems()`` helper functions
|
||||||
|
when running on Python 2.7 or Python 3.x.
|
||||||
|
|
||||||
|
|
||||||
|
Migrating directly to Python 3
|
||||||
|
==============================
|
||||||
|
|
||||||
|
The ``2to3`` migration tool handles direct migrations to Python 3 in
|
||||||
|
accordance with the semantic equivalents described above:
|
||||||
|
|
||||||
|
* ``d.keys()`` -> ``list(d.keys())``
|
||||||
|
* ``d.values()`` -> ``list(d.values())``
|
||||||
|
* ``d.items()`` -> ``list(d.items())``
|
||||||
|
* ``d.iterkeys()`` -> ``iter(d.keys())``
|
||||||
|
* ``d.itervalues()`` -> ``iter(d.values())``
|
||||||
|
* ``d.iteritems()`` -> ``iter(d.items())``
|
||||||
|
* ``d.viewkeys()`` -> ``d.keys()``
|
||||||
|
* ``d.viewvalues()`` -> ``d.values()``
|
||||||
|
* ``d.viewitems()`` -> ``d.items()``
|
||||||
|
|
||||||
|
Rather than 9 distinct mapping methods for iteration, there are now only the
|
||||||
|
3 view methods, which combine in straightforward ways with the two relevant
|
||||||
|
builtin functions to cover all of the behaviours that are available as
|
||||||
|
``dict`` methods in Python 2.7.
|
||||||
|
|
||||||
|
Note that in many cases ``d.keys()`` can be replaced by just ``d``, but the
|
||||||
|
``2to3`` migration tool doesn't attempt that replacement.
|
||||||
|
|
||||||
|
The ``2to3`` migration tool also *does not* provide any automatic assistance
|
||||||
|
for migrating references to these objects as bound or unbound methods - it
|
||||||
|
only automates conversions where the API is called immediately.
|
||||||
|
|
||||||
|
|
||||||
|
Migrating to the common subset of Python 2 and 3
|
||||||
|
================================================
|
||||||
|
|
||||||
|
When migrating to the common subset of Python 2 and 3, the above
|
||||||
|
transformations are not generally appropriate, as they all either result in
|
||||||
|
the creation of a redundant list in Python 2, have unexpectedly different
|
||||||
|
semantics in at least some cases, or both.
|
||||||
|
|
||||||
|
Since most code running in the common subset of Python 2 and 3 supports
|
||||||
|
at least as far back as Python 2.6, the currently recommended approach to
|
||||||
|
conversion of mapping iteration operation depends on two helper functions
|
||||||
|
for efficient iteration over mapping values and mapping item tuples:
|
||||||
|
|
||||||
|
* ``d.keys()`` -> ``list(d)``
|
||||||
|
* ``d.values()`` -> ``list(itervalues(d))``
|
||||||
|
* ``d.items()`` -> ``list(iteritems(d))``
|
||||||
|
* ``d.iterkeys()`` -> ``iter(d)``
|
||||||
|
* ``d.itervalues()`` -> ``itervalues(d)``
|
||||||
|
* ``d.iteritems()`` -> ``iteritems(d)``
|
||||||
|
|
||||||
|
Both ``six`` and ``future.utils`` provide appropriate definitions of
|
||||||
|
``itervalues()`` and ``iteritems()`` (along with essentially redundant
|
||||||
|
definitions of ``iterkeys()``). Creating your own definitions of these
|
||||||
|
functions in a custom compatibility module is also relatively
|
||||||
|
straightforward::
|
||||||
|
|
||||||
|
try:
|
||||||
|
dict.iteritems
|
||||||
|
except AttributeError:
|
||||||
|
# Python 3
|
||||||
|
def itervalues(d):
|
||||||
|
return iter(d.values())
|
||||||
|
def iteritems(d):
|
||||||
|
return iter(d.items())
|
||||||
|
else:
|
||||||
|
# Python 2
|
||||||
|
def itervalues(d):
|
||||||
|
return d.itervalues()
|
||||||
|
def iteritems(d):
|
||||||
|
return d.iteritems()
|
||||||
|
|
||||||
|
The greatest loss of readability currently arises when converting code that
|
||||||
|
actually *needs* the list based snapshots that were the default in Python
|
||||||
|
2. This readability loss could likely be mitigated by also providing
|
||||||
|
``listvalues`` and ``listitems`` helper functions, allowing the affected
|
||||||
|
conversions to be simplified to:
|
||||||
|
|
||||||
|
* ``d.values()`` -> ``listvalues(d)``
|
||||||
|
* ``d.items()`` -> ``listitems(d)``
|
||||||
|
|
||||||
|
The corresponding compatibility function definitions are as straightforward
|
||||||
|
as their iterator counterparts::
|
||||||
|
|
||||||
|
try:
|
||||||
|
dict.iteritems
|
||||||
|
except AttributeError:
|
||||||
|
# Python 3
|
||||||
|
def listvalues(d):
|
||||||
|
return list(d.values())
|
||||||
|
def listitems(d):
|
||||||
|
return list(d.items())
|
||||||
|
else:
|
||||||
|
# Python 2
|
||||||
|
def listvalues(d):
|
||||||
|
return d.values()
|
||||||
|
def listitems(d):
|
||||||
|
return d.items()
|
||||||
|
|
||||||
|
With that expanded set of compatibility functions, Python 2 code would
|
||||||
|
then be converted to "idiomatic" hybrid 2/3 code as:
|
||||||
|
|
||||||
|
* ``d.keys()`` -> ``list(d)``
|
||||||
|
* ``d.values()`` -> ``listvalues(d)``
|
||||||
|
* ``d.items()`` -> ``listitems(d)``
|
||||||
|
* ``d.iterkeys()`` -> ``iter(d)``
|
||||||
|
* ``d.itervalues()`` -> ``itervalues(d)``
|
||||||
|
* ``d.iteritems()`` -> ``iteritems(d)``
|
||||||
|
|
||||||
|
This compares well for readability with the idiomatic pure Python 3
|
||||||
|
code that uses the mapping methods and builtins directly:
|
||||||
|
|
||||||
|
* ``d.keys()`` -> ``list(d)``
|
||||||
|
* ``d.values()`` -> ``list(d.values())``
|
||||||
|
* ``d.items()`` -> ``list(d.items())``
|
||||||
|
* ``d.iterkeys()`` -> ``iter(d)``
|
||||||
|
* ``d.itervalues()`` -> ``iter(d.values())``
|
||||||
|
* ``d.iteritems()`` -> ``iter(d.items())``
|
||||||
|
|
||||||
|
It's also notable that when using this approach, hybrid code would *never*
|
||||||
|
invoke the mapping methods directly: it would always invoke either a
|
||||||
|
builtin or helper function instead, in order to ensure the exact same
|
||||||
|
semantics on both Python 2 and 3.
|
||||||
|
|
||||||
|
|
||||||
|
Possible changes to Python 3.5+
|
||||||
|
===============================
|
||||||
|
|
||||||
|
The main proposal put forward to potentially aid migration of existing
|
||||||
|
Python 2 code to Python 3 is the restoration of some or all of the
|
||||||
|
alternate iteration APIs to the Python 3 mapping API. In particular,
|
||||||
|
the initial draft of this PEP proposed making the following conversions
|
||||||
|
possible when migrating to the common subset of Python 2 and Python 3.5+:
|
||||||
|
|
||||||
|
* ``d.keys()`` -> ``list(d)``
|
||||||
|
* ``d.values()`` -> ``list(d.itervalues())``
|
||||||
|
* ``d.items()`` -> ``list(d.iteritems())``
|
||||||
|
* ``d.iterkeys()`` -> ``d.iterkeys()``
|
||||||
|
* ``d.itervalues()`` -> ``d.itervalues()``
|
||||||
|
* ``d.iteritems()`` -> ``d.iteritems()``
|
||||||
|
|
||||||
|
Possible mitigations of the additional language complexity in Python 3
|
||||||
|
created by restoring these methods included immediately deprecating them,
|
||||||
|
as well as potentially hiding them from the ``dir()`` function (or perhaps
|
||||||
|
even defining a way to make ``pydoc`` aware of function deprecations).
|
||||||
|
|
||||||
|
However, in the case where the list output is actually desired, the end
|
||||||
|
result of that proposal is actually less readable than an appropriately
|
||||||
|
defined helper function, and the function and method forms of the iterator
|
||||||
|
versions are pretty much equivalent from a readability perspective.
|
||||||
|
|
||||||
|
So unless I've missed something critical, readily available ``listvalues()``
|
||||||
|
and ``listitems()`` helper functions look like they will improve the
|
||||||
|
readability of hybrid code more than anything we could add back to the
|
||||||
|
Python 3.5+ mapping API, and won't have any long term impact on the
|
||||||
|
complexity of Python 3 itself.
|
||||||
|
|
||||||
|
|
||||||
|
Discussion
|
||||||
|
==========
|
||||||
|
|
||||||
|
The fact that 5 years in to the Python 3 migration we still have users
|
||||||
|
considering the dict API changes a significant barrier to migration suggests
|
||||||
|
that there are problems with previously recommended approaches. This PEP
|
||||||
|
attempts to explore those issues and tries to isolate those cases where
|
||||||
|
previous advice (such as it was) could prove problematic.
|
||||||
|
|
||||||
|
My assessment (largely based on feedback from Twisted devs) is that
|
||||||
|
problems are most likely to arise when attempting to use ``d.keys()``,
|
||||||
|
``d.values()``, and ``d.items()`` in hybrid code. While superficially it
|
||||||
|
seems as though there should be cases where it is safe to ignore the
|
||||||
|
semantic differences, in practice, the change from "mutable snapshot" to
|
||||||
|
"dynamic view" is significant enough that it is likely better
|
||||||
|
to just force the use of either list or iterator semantics for hybrid code,
|
||||||
|
and leave the use of the view semantics to pure Python 3 code.
|
||||||
|
|
||||||
|
This approach also creates rules that are simple enough and safe enough that
|
||||||
|
it should be possible to automate them in code modernisation scripts that
|
||||||
|
target the common subset of Python 2 and Python 3, just as ``2to3`` converts
|
||||||
|
them automatically when targeting pure Python 3 code.
|
||||||
|
|
||||||
|
|
||||||
Acknowledgements
|
Acknowledgements
|
||||||
|
@ -109,6 +356,11 @@ vigorous discussion of this idea (and several other topics), and especially
|
||||||
to Hynek Schlawack for acting as a moderator when things got a little too
|
to Hynek Schlawack for acting as a moderator when things got a little too
|
||||||
heated :)
|
heated :)
|
||||||
|
|
||||||
|
Thanks also to JP Calderone and Itamar Turner-Trauring for their email
|
||||||
|
feedback, as well to the participants in the `python-dev review
|
||||||
|
<https://mail.python.org/pipermail/python-dev/2014-April/134168.html>`__ of
|
||||||
|
the initial version of the PEP.
|
||||||
|
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
=========
|
=========
|
||||||
|
|
Loading…
Reference in New Issue