PEP 465: updated & withdrawn based on feedback

This PEP now reviews exactly what is involved in migrating
mapping iteration code to Python 3, as well as to the
hybrid 2/3 subset.

It is now withdrawn, as I now believe enhancements to
migration tools and libraries are a better option than
making changes to Python 3.5+
This commit is contained in:
Nick Coghlan 2014-04-21 00:19:58 -04:00
parent c245d58ca5
commit e58c3947d5
1 changed files with 320 additions and 68 deletions

View File

@ -1,14 +1,14 @@
PEP: 469
Title: Simplified migration of iterator-based mapping code to Python 3
Title: Migration of dict iteration code to Python 3
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-04-18
Python-Version: 3.5
Post-History: 2014-04-18
Post-History: 2014-04-18, 2014-04-21
Abstract
@ -17,88 +17,335 @@ Abstract
For Python 3, PEP 3106 changed the design of the ``dict`` builtin and the
mapping API in general to replace the separate list based and iterator based
APIs in Python 2 with a merged, memory efficient set and multiset view
based API.
based API. This new style of dict iteration was also added to the Python 2.7
``dict`` type as a new set of iteration methods.
This means that Python 3 code always requires an additional qualifier to
reliably reproduce classic Python 2 mapping semantics:
This means that there are now 3 different kinds of dict iteration that may
need to be migrated to Python 3 when an application makes the transition:
* List based (e.g. ``d.keys()``): ``list(d.keys())``
* Iterator based (e.g. ``d.iterkeys()``): ``iter(d.keys())``
* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``
Some Python 2 code that uses ``d.keys()`` may be migrated to Python 3
(or the common subset of Python 2 and Python 3) without alteration, but
*all* code using the iterator based API requires modification. Code that
is migrating to the common subset of Python 2 and 3 and needs to retain the
memory efficient implementation that avoids creating an unnecessary list
object must switch away from using a method to instead using a helper
function (such as those provided by the ``six`` module)
There is currently no widely agreed best practice on how to reliably convert
all Python 2 dict iteration code to the common subset of Python 2 and 3,
especially when test coverage of the ported code is limited. This PEP
reviews the various ways the Python 2 iteration APIs may be accessed, and
looks at the available options for migrating that code to Python 3 by way of
the common subset of Python 2.6+ and Python 3.0+.
To simplify the process of migrating Python 2 code that uses the existing
iterator based APIs to Python 3, this PEP proposes the reintroduction
of the Python 2 spelling of the iterator based semantics in Python 3.5, by
restoring the following methods to the builtin ``dict`` API and the
``collections.abc.Mapping`` ABC definition:
* ``iterkeys()``
* ``itervalues()``
* ``iteritems()``
The PEP also considers the question of whether or not there are any
additions that may be worth making to Python 3.5 that may ease the
transition process for application code that doesn't need to worry about
supporting earlier versions when eventually making the leap to Python 3.
Proposal
========
PEP Withdrawal
==============
Methods with the following exact semantics will be added to the builtin
``dict`` type and ``collections.abc.Mapping`` ABC::
In writing the second draft of this PEP, I came to the conclusion that
the readability of hybrid Python 2/3 mapping code can actually be best
enhanced by better helper functions rather than by making changes to
Python 3.5+. The main value I now see in this PEP is as a clear record
of the recommended approaches to migrating mapping iteration code from
Python 2 to Python 3, as well as suggesting ways to keep things readable
and maintainable when writing hybrid code that supports both versions.
def iterkeys(self):
return iter(self.keys())
Notably, I recommend that hybrid code avoid calling mapping iteration
methods directly, and instead rely on builtin functions where possible,
and some additional helper functions for cases that would be a simple
combination of a builtin and a mapping method in pure Python 3 code, but
need to be handled slightly differently to get the exact same semantics in
Python 2.
def itervalues(self):
return iter(self.values())
def iteritems(self):
return iter(self.items())
These semantics ensure that the methods also work as expected for subclasses
of these base types.
Static code checkers like pylint could potentially be extended with an
optional warning regarding direct use of the mapping iteration methods in
a hybrid code base.
Rationale
=========
Mapping iteration models
========================
Similar in spirit to PEP 414 (which restored explicit Unicode literal
support in Python 3.3), this PEP is aimed primarily at helping users
that currently feel punished for making use of a feature that needed to be
requested explicitly in Python 2, but was effectively made the default
behaviour in Python 3.
Python 2.7 provides three different sets of methods to extract the keys,
values and items from a ``dict`` instance, accounting for 9 out of the
18 public methods of the ``dict`` type.
Users of list-based iteration in Python 2 that aren't actually relying on
those semantics get a free memory efficiency improvement when migrating to
Python 3, and face no additional difficulties when migrating via the common
subset of Python 2 and 3.
In Python 3, this has been rationalised to just 3 out of 11 public methods
(as the ``has_key`` method has also been removed).
By contrast, users that actually want the increased efficiency may have
faced a three phase migration process by the time they have fully migrated
to Python 3:
* original migration to the iterator based APIs after they were added in
Python 2.2
* migration to a separate function based API in order to run in the common
subset of Python 2 and 3
* eventual migration back to unprefixed method APIs when finally dropping
Python 2.7 support at some point in the future
Lists as mutable snapshots
--------------------------
The view based APIs that were added to Python 2.7 don't actually help with
the transition process, as they don't exist in Python 3 and hence aren't
part of the common subset of Python 2 and Python 3, and also aren't supported
by most Python 2 mappings (including the collection ABCs).
This is the oldest of the three styles of dict iteration, and hence the
one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
methods in Python 2.
This PEP proposes to just eliminate all that annoyance by making the iterator
based APIs work again in Python 3.5+. As with the restoration of Unicode
literals, it does add a bit of additional noise to the definition of Python
3, but it does so while bringing a significant benefit in increasing the size
of the common subset of Python 2 and Python 3 and so simplifying the process
of migrating to Python 3 for affected Python 2 users.
These methods all return lists that are snapshots of the state of the
mapping at the time the method was called. This has a few consequences:
* the original object can be mutated freely without affecting iteration
over the snapshot
* the snapshot can be modified independently of the original object
* the snapshot consumes memory proportional to the size of the original
mapping
The semantic equivalent of these operations in Python 3 are
``list(d.keys())``, ``list(d.values())`` and ``list(d.iteritems())``.
Iterator objects
----------------
In Python 2.2, ``dict`` objects gained support for the then-new iterator
protocol, allowing direct iteration over the keys stored in the dictionary,
thus avoiding the need to build a list just to iterate over the dictionary
contents one entry at a time. ``iter(d)`` provides direct access to the
iterator object for the keys.
Python 2 also provides a ``d.iterkeys()`` method that is essentially
synonymous with ``iter(d)``, along with ``d.itervalues()`` and
``d.iteritems()`` methods.
These iterators provide live views of the underlying object, and hence may
fail if the set of keys in the underlying object is changed during
iteration::
>>> d = dict(a=1)
>>> for k in d:
... del d[k]
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
As iterators, iteration over these objects is also a one-time operation:
once the iterator is exhausted, you have to go back to the original mapping
in order to iterate again.
In Python 3, direct iteration over mappings works the same way as it does
in Python 2. There are no method based equivalents - the semantic equivalents
of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
``iter(d.values())`` and ``iter(d.iteritems())``.
The ``six`` and ``future.utils`` compatibility modules also both provide
``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions that
provide efficient iterator semantics in both Python 2 and 3.
Set based dynamic views
-----------------------
The model that is provided in Python 3 as a method based API is that of set
based dynamic views (technically multisets in the case of the ``values()``
view).
In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
``d. items()`` provide a live view of the current state of
the underlying object, rather than taking a full snapshot of the current
state as they did in Python 2. This change is safe in many circumstances,
but does mean that, as with the direct iteration API, it is necessary to
avoid adding or removing keys during iteration, in order to avoid
encountering the following error::
>>> d = dict(a=1)
>>> for k, v in d.items():
... del d[k]
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Unlike the iteration API, these objects are iterables, rather than iterators:
you can iterate over them multiple times, and each time they will iterate
over the entire underlying mapping.
These semantics are also available in Python 2.7 as the ``d.viewkeys()``,
``d.viewvalues()`` and ```d.viewitems()`` methods.
The ``future.utils`` compatibility module also provides
``viewkeys()``, ``viewvalues()`` and ``viewitems()`` helper functions
when running on Python 2.7 or Python 3.x.
Migrating directly to Python 3
==============================
The ``2to3`` migration tool handles direct migrations to Python 3 in
accordance with the semantic equivalents described above:
* ``d.keys()`` -> ``list(d.keys())``
* ``d.values()`` -> ``list(d.values())``
* ``d.items()`` -> ``list(d.items())``
* ``d.iterkeys()`` -> ``iter(d.keys())``
* ``d.itervalues()`` -> ``iter(d.values())``
* ``d.iteritems()`` -> ``iter(d.items())``
* ``d.viewkeys()`` -> ``d.keys()``
* ``d.viewvalues()`` -> ``d.values()``
* ``d.viewitems()`` -> ``d.items()``
Rather than 9 distinct mapping methods for iteration, there are now only the
3 view methods, which combine in straightforward ways with the two relevant
builtin functions to cover all of the behaviours that are available as
``dict`` methods in Python 2.7.
Note that in many cases ``d.keys()`` can be replaced by just ``d``, but the
``2to3`` migration tool doesn't attempt that replacement.
The ``2to3`` migration tool also *does not* provide any automatic assistance
for migrating references to these objects as bound or unbound methods - it
only automates conversions where the API is called immediately.
Migrating to the common subset of Python 2 and 3
================================================
When migrating to the common subset of Python 2 and 3, the above
transformations are not generally appropriate, as they all either result in
the creation of a redundant list in Python 2, have unexpectedly different
semantics in at least some cases, or both.
Since most code running in the common subset of Python 2 and 3 supports
at least as far back as Python 2.6, the currently recommended approach to
conversion of mapping iteration operation depends on two helper functions
for efficient iteration over mapping values and mapping item tuples:
* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(itervalues(d))``
* ``d.items()`` -> ``list(iteritems(d))``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``itervalues(d)``
* ``d.iteritems()`` -> ``iteritems(d)``
Both ``six`` and ``future.utils`` provide appropriate definitions of
``itervalues()`` and ``iteritems()`` (along with essentially redundant
definitions of ``iterkeys()``). Creating your own definitions of these
functions in a custom compatibility module is also relatively
straightforward::
try:
dict.iteritems
except AttributeError:
# Python 3
def itervalues(d):
return iter(d.values())
def iteritems(d):
return iter(d.items())
else:
# Python 2
def itervalues(d):
return d.itervalues()
def iteritems(d):
return d.iteritems()
The greatest loss of readability currently arises when converting code that
actually *needs* the list based snapshots that were the default in Python
2. This readability loss could likely be mitigated by also providing
``listvalues`` and ``listitems`` helper functions, allowing the affected
conversions to be simplified to:
* ``d.values()`` -> ``listvalues(d)``
* ``d.items()`` -> ``listitems(d)``
The corresponding compatibility function definitions are as straightforward
as their iterator counterparts::
try:
dict.iteritems
except AttributeError:
# Python 3
def listvalues(d):
return list(d.values())
def listitems(d):
return list(d.items())
else:
# Python 2
def listvalues(d):
return d.values()
def listitems(d):
return d.items()
With that expanded set of compatibility functions, Python 2 code would
then be converted to "idiomatic" hybrid 2/3 code as:
* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``listvalues(d)``
* ``d.items()`` -> ``listitems(d)``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``itervalues(d)``
* ``d.iteritems()`` -> ``iteritems(d)``
This compares well for readability with the idiomatic pure Python 3
code that uses the mapping methods and builtins directly:
* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(d.values())``
* ``d.items()`` -> ``list(d.items())``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``iter(d.values())``
* ``d.iteritems()`` -> ``iter(d.items())``
It's also notable that when using this approach, hybrid code would *never*
invoke the mapping methods directly: it would always invoke either a
builtin or helper function instead, in order to ensure the exact same
semantics on both Python 2 and 3.
Possible changes to Python 3.5+
===============================
The main proposal put forward to potentially aid migration of existing
Python 2 code to Python 3 is the restoration of some or all of the
alternate iteration APIs to the Python 3 mapping API. In particular,
the initial draft of this PEP proposed making the following conversions
possible when migrating to the common subset of Python 2 and Python 3.5+:
* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(d.itervalues())``
* ``d.items()`` -> ``list(d.iteritems())``
* ``d.iterkeys()`` -> ``d.iterkeys()``
* ``d.itervalues()`` -> ``d.itervalues()``
* ``d.iteritems()`` -> ``d.iteritems()``
Possible mitigations of the additional language complexity in Python 3
created by restoring these methods included immediately deprecating them,
as well as potentially hiding them from the ``dir()`` function (or perhaps
even defining a way to make ``pydoc`` aware of function deprecations).
However, in the case where the list output is actually desired, the end
result of that proposal is actually less readable than an appropriately
defined helper function, and the function and method forms of the iterator
versions are pretty much equivalent from a readability perspective.
So unless I've missed something critical, readily available ``listvalues()``
and ``listitems()`` helper functions look like they will improve the
readability of hybrid code more than anything we could add back to the
Python 3.5+ mapping API, and won't have any long term impact on the
complexity of Python 3 itself.
Discussion
==========
The fact that 5 years in to the Python 3 migration we still have users
considering the dict API changes a significant barrier to migration suggests
that there are problems with previously recommended approaches. This PEP
attempts to explore those issues and tries to isolate those cases where
previous advice (such as it was) could prove problematic.
My assessment (largely based on feedback from Twisted devs) is that
problems are most likely to arise when attempting to use ``d.keys()``,
``d.values()``, and ``d.items()`` in hybrid code. While superficially it
seems as though there should be cases where it is safe to ignore the
semantic differences, in practice, the change from "mutable snapshot" to
"dynamic view" is significant enough that it is likely better
to just force the use of either list or iterator semantics for hybrid code,
and leave the use of the view semantics to pure Python 3 code.
This approach also creates rules that are simple enough and safe enough that
it should be possible to automate them in code modernisation scripts that
target the common subset of Python 2 and Python 3, just as ``2to3`` converts
them automatically when targeting pure Python 3 code.
Acknowledgements
@ -109,6 +356,11 @@ vigorous discussion of this idea (and several other topics), and especially
to Hynek Schlawack for acting as a moderator when things got a little too
heated :)
Thanks also to JP Calderone and Itamar Turner-Trauring for their email
feedback, as well to the participants in the `python-dev review
<https://mail.python.org/pipermail/python-dev/2014-April/134168.html>`__ of
the initial version of the PEP.
Copyright
=========