PEP 469: updated & withdrawn based on feedback
This PEP now reviews exactly what is involved in migrating mapping iteration code to Python 3, as well as to the hybrid 2/3 subset. It is now withdrawn, as I now believe enhancements to migration tools and libraries are a better option than making changes to Python 3.5+.
This commit is contained in:
parent
c245d58ca5
commit
e58c3947d5
388
pep-0469.txt
|
@ -1,14 +1,14 @@
|
|||
PEP: 469
|
||||
Title: Simplified migration of iterator-based mapping code to Python 3
|
||||
Title: Migration of dict iteration code to Python 3
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Nick Coghlan <ncoghlan@gmail.com>
|
||||
Status: Draft
|
||||
Status: Withdrawn
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 2014-04-18
|
||||
Python-Version: 3.5
|
||||
Post-History: 2014-04-18
|
||||
Post-History: 2014-04-18, 2014-04-21
|
||||
|
||||
|
||||
Abstract
|
||||
|
@ -17,88 +17,335 @@ Abstract
|
|||
For Python 3, PEP 3106 changed the design of the ``dict`` builtin and the
|
||||
mapping API in general to replace the separate list based and iterator based
|
||||
APIs in Python 2 with a merged, memory efficient set and multiset view
|
||||
based API.
|
||||
based API. This new style of dict iteration was also added to the Python 2.7
|
||||
``dict`` type as a new set of iteration methods.
|
||||
|
||||
This means that Python 3 code always requires an additional qualifier to
|
||||
reliably reproduce classic Python 2 mapping semantics:
|
||||
This means that there are now 3 different kinds of dict iteration that may
|
||||
need to be migrated to Python 3 when an application makes the transition:
|
||||
|
||||
* List based (e.g. ``d.keys()``): ``list(d.keys())``
|
||||
* Iterator based (e.g. ``d.iterkeys()``): ``iter(d.keys())``
|
||||
* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
|
||||
* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
|
||||
* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``
|
||||
|
||||
Some Python 2 code that uses ``d.keys()`` may be migrated to Python 3
|
||||
(or the common subset of Python 2 and Python 3) without alteration, but
|
||||
*all* code using the iterator based API requires modification. Code that
|
||||
is migrating to the common subset of Python 2 and 3 and needs to retain the
|
||||
memory efficient implementation that avoids creating an unnecessary list
|
||||
object must switch away from using a method to instead using a helper
|
||||
function (such as those provided by the ``six`` module)
|
||||
There is currently no widely agreed best practice on how to reliably convert
|
||||
all Python 2 dict iteration code to the common subset of Python 2 and 3,
|
||||
especially when test coverage of the ported code is limited. This PEP
|
||||
reviews the various ways the Python 2 iteration APIs may be accessed, and
|
||||
looks at the available options for migrating that code to Python 3 by way of
|
||||
the common subset of Python 2.6+ and Python 3.0+.
|
||||
|
||||
To simplify the process of migrating Python 2 code that uses the existing
|
||||
iterator based APIs to Python 3, this PEP proposes the reintroduction
|
||||
of the Python 2 spelling of the iterator based semantics in Python 3.5, by
|
||||
restoring the following methods to the builtin ``dict`` API and the
|
||||
``collections.abc.Mapping`` ABC definition:
|
||||
|
||||
* ``iterkeys()``
|
||||
* ``itervalues()``
|
||||
* ``iteritems()``
|
||||
The PEP also considers the question of whether or not there are any
|
||||
additions that may be worth making to Python 3.5 that may ease the
|
||||
transition process for application code that doesn't need to worry about
|
||||
supporting earlier versions when eventually making the leap to Python 3.
|
||||
|
||||
|
||||
Proposal
|
||||
========
|
||||
PEP Withdrawal
|
||||
==============
|
||||
|
||||
Methods with the following exact semantics will be added to the builtin
|
||||
``dict`` type and ``collections.abc.Mapping`` ABC::
|
||||
In writing the second draft of this PEP, I came to the conclusion that
|
||||
the readability of hybrid Python 2/3 mapping code can actually be best
|
||||
enhanced by better helper functions rather than by making changes to
|
||||
Python 3.5+. The main value I now see in this PEP is as a clear record
|
||||
of the recommended approaches to migrating mapping iteration code from
|
||||
Python 2 to Python 3, as well as suggesting ways to keep things readable
|
||||
and maintainable when writing hybrid code that supports both versions.
|
||||
|
||||
def iterkeys(self):
|
||||
return iter(self.keys())
|
||||
Notably, I recommend that hybrid code avoid calling mapping iteration
|
||||
methods directly, and instead rely on builtin functions where possible,
|
||||
and some additional helper functions for cases that would be a simple
|
||||
combination of a builtin and a mapping method in pure Python 3 code, but
|
||||
need to be handled slightly differently to get the exact same semantics in
|
||||
Python 2.
|
||||
|
||||
def itervalues(self):
|
||||
return iter(self.values())
|
||||
|
||||
def iteritems(self):
|
||||
return iter(self.items())
|
||||
|
||||
These semantics ensure that the methods also work as expected for subclasses
|
||||
of these base types.
|
||||
Static code checkers like pylint could potentially be extended with an
|
||||
optional warning regarding direct use of the mapping iteration methods in
|
||||
a hybrid code base.
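As a rough illustration of what such an optional warning might look for, the
sketch below uses the standard library ``ast`` module rather than pylint's own
plugin API. The function name ``find_direct_mapping_calls`` and the list of
flagged method names are assumptions made for this example. The check cannot
tell whether the receiver is really a mapping, so it will over-report.

```python
import ast

# Method names whose direct use is discouraged in hybrid code (assumed list).
FLAGGED = {
    "keys", "values", "items",
    "iterkeys", "itervalues", "iteritems",
    "viewkeys", "viewvalues", "viewitems",
}

def find_direct_mapping_calls(source):
    """Return sorted (line, method name) pairs for calls like ``obj.items()``."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in FLAGGED):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)

sample = "for k, v in d.iteritems():\n    print(k, v)\nn = len(d.keys())\n"
warnings = find_direct_mapping_calls(sample)
```

Because it matches on attribute names only, a real checker would want type
inference or an explicit opt-in before reporting these as errors.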
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
Mapping iteration models
|
||||
========================
|
||||
|
||||
Similar in spirit to PEP 414 (which restored explicit Unicode literal
|
||||
support in Python 3.3), this PEP is aimed primarily at helping users
|
||||
that currently feel punished for making use of a feature that needed to be
|
||||
requested explicitly in Python 2, but was effectively made the default
|
||||
behaviour in Python 3.
|
||||
Python 2.7 provides three different sets of methods to extract the keys,
|
||||
values and items from a ``dict`` instance, accounting for 9 out of the
|
||||
18 public methods of the ``dict`` type.
|
||||
|
||||
Users of list-based iteration in Python 2 that aren't actually relying on
|
||||
those semantics get a free memory efficiency improvement when migrating to
|
||||
Python 3, and face no additional difficulties when migrating via the common
|
||||
subset of Python 2 and 3.
|
||||
In Python 3, this has been rationalised to just 3 out of 11 public methods
|
||||
(as the ``has_key`` method has also been removed).
|
||||
|
||||
By contrast, users that actually want the increased efficiency may have
|
||||
faced a three phase migration process by the time they have fully migrated
|
||||
to Python 3:
|
||||
|
||||
* original migration to the iterator based APIs after they were added in
|
||||
Python 2.2
|
||||
* migration to a separate function based API in order to run in the common
|
||||
subset of Python 2 and 3
|
||||
* eventual migration back to unprefixed method APIs when finally dropping
|
||||
Python 2.7 support at some point in the future
|
||||
Lists as mutable snapshots
|
||||
--------------------------
|
||||
|
||||
The view based APIs that were added to Python 2.7 don't actually help with
|
||||
the transition process, as they don't exist in Python 3 and hence aren't
|
||||
part of the common subset of Python 2 and Python 3, and also aren't supported
|
||||
by most Python 2 mappings (including the collection ABCs).
|
||||
This is the oldest of the three styles of dict iteration, and hence the
|
||||
one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
|
||||
methods in Python 2.
|
||||
|
||||
This PEP proposes to just eliminate all that annoyance by making the iterator
|
||||
based APIs work again in Python 3.5+. As with the restoration of Unicode
|
||||
literals, it does add a bit of additional noise to the definition of Python
|
||||
3, but it does so while bringing a significant benefit in increasing the size
|
||||
of the common subset of Python 2 and Python 3 and so simplifying the process
|
||||
of migrating to Python 3 for affected Python 2 users.
|
||||
These methods all return lists that are snapshots of the state of the
|
||||
mapping at the time the method was called. This has a few consequences:
|
||||
|
||||
* the original object can be mutated freely without affecting iteration
|
||||
over the snapshot
|
||||
* the snapshot can be modified independently of the original object
|
||||
* the snapshot consumes memory proportional to the size of the original
|
||||
mapping
|
||||
|
||||
The semantic equivalents of these operations in Python 3 are
|
||||
``list(d.keys())``, ``list(d.values())`` and ``list(d.items())``.
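The snapshot properties listed above can be seen in a short Python 3 example.
This is a minimal sketch and the variable names are illustrative only.

```python
d = {"a": 1, "b": 2}

snapshot = list(d.items())   # independent list, copied at call time
d["c"] = 3                   # mutating the mapping leaves the snapshot alone
snapshot.append(("z", 26))   # mutating the snapshot leaves the mapping alone

# Safe: iterate over a snapshot of the keys while deleting from the original.
for key in list(d):
    if key != "a":
        del d[key]
```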
|
||||
|
||||
|
||||
Iterator objects
|
||||
----------------
|
||||
|
||||
In Python 2.2, ``dict`` objects gained support for the then-new iterator
|
||||
protocol, allowing direct iteration over the keys stored in the dictionary,
|
||||
thus avoiding the need to build a list just to iterate over the dictionary
|
||||
contents one entry at a time. ``iter(d)`` provides direct access to the
|
||||
iterator object for the keys.
|
||||
|
||||
Python 2 also provides a ``d.iterkeys()`` method that is essentially
|
||||
synonymous with ``iter(d)``, along with ``d.itervalues()`` and
|
||||
``d.iteritems()`` methods.
|
||||
|
||||
These iterators provide live views of the underlying object, and hence may
|
||||
fail if the set of keys in the underlying object is changed during
|
||||
iteration::
|
||||
|
||||
>>> d = dict(a=1)
|
||||
>>> for k in d:
|
||||
... del d[k]
|
||||
...
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
RuntimeError: dictionary changed size during iteration
|
||||
|
||||
As iterators, iteration over these objects is also a one-time operation:
|
||||
once the iterator is exhausted, you have to go back to the original mapping
|
||||
in order to iterate again.
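The one-time nature of iterators can be sketched as follows. The example uses
the Python 3 spelling ``iter(d.values())`` so that it runs unchanged on both
Python 2 and 3.

```python
d = {"a": 1, "b": 2}

it = iter(d.values())
first_pass = sorted(it)    # consumes the iterator
second_pass = sorted(it)   # nothing left: iterators are single use

# Go back to the mapping to obtain a fresh iterator.
fresh_pass = sorted(iter(d.values()))
```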
|
||||
|
||||
In Python 3, direct iteration over mappings works the same way as it does
|
||||
in Python 2. There are no method based equivalents - the semantic equivalents
|
||||
of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
|
||||
``iter(d.values())`` and ``iter(d.items())``.
|
||||
|
||||
The ``six`` and ``future.utils`` compatibility modules also both provide
|
||||
``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions that
|
||||
provide efficient iterator semantics in both Python 2 and 3.
|
||||
|
||||
|
||||
Set based dynamic views
|
||||
-----------------------
|
||||
|
||||
The model that is provided in Python 3 as a method based API is that of set
|
||||
based dynamic views (technically multisets in the case of the ``values()``
|
||||
view).
|
||||
|
||||
In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
|
||||
``d.items()`` provide a live view of the current state of
|
||||
the underlying object, rather than taking a full snapshot of the current
|
||||
state as they did in Python 2. This change is safe in many circumstances,
|
||||
but does mean that, as with the direct iteration API, it is necessary to
|
||||
avoid adding or removing keys during iteration, in order to avoid
|
||||
encountering the following error::
|
||||
|
||||
>>> d = dict(a=1)
|
||||
>>> for k, v in d.items():
|
||||
... del d[k]
|
||||
...
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
RuntimeError: dictionary changed size during iteration
|
||||
|
||||
Unlike the iteration API, these objects are iterables, rather than iterators:
|
||||
you can iterate over them multiple times, and each time they will iterate
|
||||
over the entire underlying mapping.
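A brief Python 3 sketch of the view behaviour described above: repeated
iteration, tracking of later changes to the mapping, and set operations on the
keys view.

```python
d = {"a": 1, "b": 2}
keys = d.keys()

first = sorted(keys)
second = sorted(keys)             # a view can be iterated repeatedly

d["c"] = 3                        # the view tracks the underlying mapping
after_insert = sorted(keys)

common = keys & {"a", "c", "x"}   # key views support set operations
```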
|
||||
|
||||
These semantics are also available in Python 2.7 as the ``d.viewkeys()``,
|
||||
``d.viewvalues()`` and ``d.viewitems()`` methods.
|
||||
|
||||
The ``future.utils`` compatibility module also provides
|
||||
``viewkeys()``, ``viewvalues()`` and ``viewitems()`` helper functions
|
||||
when running on Python 2.7 or Python 3.x.
|
||||
|
||||
|
||||
Migrating directly to Python 3
|
||||
==============================
|
||||
|
||||
The ``2to3`` migration tool handles direct migrations to Python 3 in
|
||||
accordance with the semantic equivalents described above:
|
||||
|
||||
* ``d.keys()`` -> ``list(d.keys())``
|
||||
* ``d.values()`` -> ``list(d.values())``
|
||||
* ``d.items()`` -> ``list(d.items())``
|
||||
* ``d.iterkeys()`` -> ``iter(d.keys())``
|
||||
* ``d.itervalues()`` -> ``iter(d.values())``
|
||||
* ``d.iteritems()`` -> ``iter(d.items())``
|
||||
* ``d.viewkeys()`` -> ``d.keys()``
|
||||
* ``d.viewvalues()`` -> ``d.values()``
|
||||
* ``d.viewitems()`` -> ``d.items()``
|
||||
|
||||
Rather than 9 distinct mapping methods for iteration, there are now only the
|
||||
3 view methods, which combine in straightforward ways with the two relevant
|
||||
builtin functions to cover all of the behaviours that are available as
|
||||
``dict`` methods in Python 2.7.
|
||||
|
||||
Note that in many cases ``d.keys()`` can be replaced by just ``d``, but the
|
||||
``2to3`` migration tool doesn't attempt that replacement.
|
||||
|
||||
The ``2to3`` migration tool also *does not* provide any automatic assistance
|
||||
for migrating references to these objects as bound or unbound methods - it
|
||||
only automates conversions where the API is called immediately.
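The difference between the two forms can be sketched as follows. The fallback
in the ``except`` branch is a hypothetical manual fix, not something ``2to3``
generates.

```python
d = {"a": 1}

# Called immediately: this is the form the tool rewrites.
#     for k, v in d.iteritems(): ...

# Referenced without being called: the attribute lookup itself fails on
# Python 3, so this kind of code has to be found and fixed by hand.
try:
    get_items = d.iteritems
except AttributeError:
    def get_items():
        return iter(d.items())

pairs = list(get_items())
```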
|
||||
|
||||
|
||||
Migrating to the common subset of Python 2 and 3
|
||||
================================================
|
||||
|
||||
When migrating to the common subset of Python 2 and 3, the above
|
||||
transformations are not generally appropriate, as they all either result in
|
||||
the creation of a redundant list in Python 2, have unexpectedly different
|
||||
semantics in at least some cases, or both.
|
||||
|
||||
Since most code running in the common subset of Python 2 and 3 supports
|
||||
at least as far back as Python 2.6, the currently recommended approach to
|
||||
conversion of mapping iteration operations depends on two helper functions
|
||||
for efficient iteration over mapping values and mapping item tuples:
|
||||
|
||||
* ``d.keys()`` -> ``list(d)``
|
||||
* ``d.values()`` -> ``list(itervalues(d))``
|
||||
* ``d.items()`` -> ``list(iteritems(d))``
|
||||
* ``d.iterkeys()`` -> ``iter(d)``
|
||||
* ``d.itervalues()`` -> ``itervalues(d)``
|
||||
* ``d.iteritems()`` -> ``iteritems(d)``
|
||||
|
||||
Both ``six`` and ``future.utils`` provide appropriate definitions of
|
||||
``itervalues()`` and ``iteritems()`` (along with essentially redundant
|
||||
definitions of ``iterkeys()``). Creating your own definitions of these
|
||||
functions in a custom compatibility module is also relatively
|
||||
straightforward::
|
||||
|
||||
try:
|
||||
dict.iteritems
|
||||
except AttributeError:
|
||||
# Python 3
|
||||
def itervalues(d):
|
||||
return iter(d.values())
|
||||
def iteritems(d):
|
||||
return iter(d.items())
|
||||
else:
|
||||
# Python 2
|
||||
def itervalues(d):
|
||||
return d.itervalues()
|
||||
def iteritems(d):
|
||||
return d.iteritems()
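Calling code then uses the helpers as plain functions. The sketch below
repeats the definitions in condensed form so that it is self-contained; the
use of ``hasattr`` instead of ``try``/``except`` is a stylistic choice of this
example, and the ``inventory`` data is made up.

```python
if hasattr(dict, "iteritems"):
    # Python 2
    def itervalues(d):
        return d.itervalues()
    def iteritems(d):
        return d.iteritems()
else:
    # Python 3
    def itervalues(d):
        return iter(d.values())
    def iteritems(d):
        return iter(d.items())

inventory = {"apples": 3, "pears": 5}

# Neither call builds an intermediate list on Python 2 or Python 3.
total = sum(itervalues(inventory))
labels = sorted("%s=%d" % (k, v) for k, v in iteritems(inventory))
```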
|
||||
|
||||
The greatest loss of readability currently arises when converting code that
|
||||
actually *needs* the list based snapshots that were the default in Python
|
||||
2. This readability loss could likely be mitigated by also providing
|
||||
``listvalues`` and ``listitems`` helper functions, allowing the affected
|
||||
conversions to be simplified to:
|
||||
|
||||
* ``d.values()`` -> ``listvalues(d)``
|
||||
* ``d.items()`` -> ``listitems(d)``
|
||||
|
||||
The corresponding compatibility function definitions are as straightforward
|
||||
as their iterator counterparts::
|
||||
|
||||
try:
|
||||
dict.iteritems
|
||||
except AttributeError:
|
||||
# Python 3
|
||||
def listvalues(d):
|
||||
return list(d.values())
|
||||
def listitems(d):
|
||||
return list(d.items())
|
||||
else:
|
||||
# Python 2
|
||||
def listvalues(d):
|
||||
return d.values()
|
||||
def listitems(d):
|
||||
return d.items()
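A typical case that needs the snapshot semantics is mutating the mapping while
looping over its contents. The sketch below repeats the helper definitions in
condensed form (again using ``hasattr`` as a stylistic choice) and applies
them to made-up ``scores`` data.

```python
if hasattr(dict, "iteritems"):
    # Python 2: the methods already return lists
    def listvalues(d):
        return d.values()
    def listitems(d):
        return d.items()
else:
    # Python 3: copy the view into a list
    def listvalues(d):
        return list(d.values())
    def listitems(d):
        return list(d.items())

scores = {"ann": 3, "bob": 0, "cy": 7}

# Deleting during the loop is safe because we iterate over a snapshot.
for name, score in listitems(scores):
    if score == 0:
        del scores[name]

ranked = listvalues(scores)
ranked.sort(reverse=True)   # the snapshot is an ordinary mutable list
```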
|
||||
|
||||
With that expanded set of compatibility functions, Python 2 code would
|
||||
then be converted to "idiomatic" hybrid 2/3 code as:
|
||||
|
||||
* ``d.keys()`` -> ``list(d)``
|
||||
* ``d.values()`` -> ``listvalues(d)``
|
||||
* ``d.items()`` -> ``listitems(d)``
|
||||
* ``d.iterkeys()`` -> ``iter(d)``
|
||||
* ``d.itervalues()`` -> ``itervalues(d)``
|
||||
* ``d.iteritems()`` -> ``iteritems(d)``
|
||||
|
||||
This compares well for readability with the idiomatic pure Python 3
|
||||
code that uses the mapping methods and builtins directly:
|
||||
|
||||
* ``d.keys()`` -> ``list(d)``
|
||||
* ``d.values()`` -> ``list(d.values())``
|
||||
* ``d.items()`` -> ``list(d.items())``
|
||||
* ``d.iterkeys()`` -> ``iter(d)``
|
||||
* ``d.itervalues()`` -> ``iter(d.values())``
|
||||
* ``d.iteritems()`` -> ``iter(d.items())``
|
||||
|
||||
It's also notable that when using this approach, hybrid code would *never*
|
||||
invoke the mapping methods directly: it would always invoke either a
|
||||
builtin or helper function instead, in order to ensure the exact same
|
||||
semantics on both Python 2 and 3.
|
||||
|
||||
|
||||
Possible changes to Python 3.5+
|
||||
===============================
|
||||
|
||||
The main proposal put forward to potentially aid migration of existing
|
||||
Python 2 code to Python 3 is the restoration of some or all of the
|
||||
alternate iteration APIs to the Python 3 mapping API. In particular,
|
||||
the initial draft of this PEP proposed making the following conversions
|
||||
possible when migrating to the common subset of Python 2 and Python 3.5+:
|
||||
|
||||
* ``d.keys()`` -> ``list(d)``
|
||||
* ``d.values()`` -> ``list(d.itervalues())``
|
||||
* ``d.items()`` -> ``list(d.iteritems())``
|
||||
* ``d.iterkeys()`` -> ``d.iterkeys()``
|
||||
* ``d.itervalues()`` -> ``d.itervalues()``
|
||||
* ``d.iteritems()`` -> ``d.iteritems()``
|
||||
|
||||
Possible mitigations of the additional language complexity in Python 3
|
||||
created by restoring these methods included immediately deprecating them,
|
||||
as well as potentially hiding them from the ``dir()`` function (or perhaps
|
||||
even defining a way to make ``pydoc`` aware of function deprecations).
|
||||
|
||||
However, in the case where the list output is actually desired, the end
|
||||
result of that proposal is actually less readable than an appropriately
|
||||
defined helper function, and the function and method forms of the iterator
|
||||
versions are pretty much equivalent from a readability perspective.
|
||||
|
||||
So unless I've missed something critical, readily available ``listvalues()``
|
||||
and ``listitems()`` helper functions look like they will improve the
|
||||
readability of hybrid code more than anything we could add back to the
|
||||
Python 3.5+ mapping API, and won't have any long term impact on the
|
||||
complexity of Python 3 itself.
|
||||
|
||||
|
||||
Discussion
|
||||
==========
|
||||
|
||||
The fact that 5 years into the Python 3 migration we still have users
|
||||
considering the dict API changes a significant barrier to migration suggests
|
||||
that there are problems with previously recommended approaches. This PEP
|
||||
attempts to explore those issues and tries to isolate those cases where
|
||||
previous advice (such as it was) could prove problematic.
|
||||
|
||||
My assessment (largely based on feedback from Twisted devs) is that
|
||||
problems are most likely to arise when attempting to use ``d.keys()``,
|
||||
``d.values()``, and ``d.items()`` in hybrid code. While superficially it
|
||||
seems as though there should be cases where it is safe to ignore the
|
||||
semantic differences, in practice, the change from "mutable snapshot" to
|
||||
"dynamic view" is significant enough that it is likely better
|
||||
to just force the use of either list or iterator semantics for hybrid code,
|
||||
and leave the use of the view semantics to pure Python 3 code.
|
||||
|
||||
This approach also creates rules that are simple enough and safe enough that
|
||||
it should be possible to automate them in code modernisation scripts that
|
||||
target the common subset of Python 2 and Python 3, just as ``2to3`` converts
|
||||
them automatically when targeting pure Python 3 code.
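To give a feel for how mechanical these rules are, here is a toy rewriter based
on a regular expression. It is only a sketch: the names ``REWRITES``, ``CALL``
and ``modernise_line`` are invented for this example, it handles only
argument-free calls on simple dotted names, and a real modernisation tool
would work on the syntax tree instead of on raw text.

```python
import re

# Hybrid 2/3 replacements for each method; "{0}" stands for the mapping.
REWRITES = {
    "keys": "list({0})",
    "values": "listvalues({0})",
    "items": "listitems({0})",
    "iterkeys": "iter({0})",
    "itervalues": "itervalues({0})",
    "iteritems": "iteritems({0})",
}

# A dotted name followed by an immediate, argument-free method call.
CALL = re.compile(r"\b([A-Za-z_][\w.]*)\.(%s)\(\)" % "|".join(REWRITES))

def modernise_line(line):
    """Rewrite simple ``name.method()`` calls to their hybrid 2/3 form."""
    def replace(match):
        target, method = match.group(1), match.group(2)
        return REWRITES[method].format(target)
    return CALL.sub(replace, line)

converted = modernise_line("for k, v in d.iteritems():")
```

Like ``2to3``, this only touches immediate calls. References to the bound or
unbound methods would still need manual attention.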
|
||||
|
||||
|
||||
Acknowledgements
|
||||
|
@ -109,6 +356,11 @@ vigorous discussion of this idea (and several other topics), and especially
|
|||
to Hynek Schlawack for acting as a moderator when things got a little too
|
||||
heated :)
|
||||
|
||||
Thanks also to JP Calderone and Itamar Turner-Trauring for their email
|
||||
feedback, as well as to the participants in the `python-dev review
|
||||
<https://mail.python.org/pipermail/python-dev/2014-April/134168.html>`__ of
|
||||
the initial version of the PEP.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|