PEP: 469
Title: Migration of dict iteration code to Python 3
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2014
Python-Version: 3.5
Post-History: 18-Apr-2014, 21-Apr-2014


Abstract
========

For Python 3, :pep:`3106` changed the design of the ``dict`` builtin and
the mapping API in general to replace the separate list based and
iterator based APIs in Python 2 with a merged, memory efficient set and
multiset view based API. This new style of dict iteration was also added
to the Python 2.7 ``dict`` type as a new set of iteration methods.

This means that there are now 3 different kinds of dict iteration that
may need to be migrated to Python 3 when an application makes the
transition:

* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``

There is currently no widely agreed best practice on how to reliably
convert all Python 2 dict iteration code to the common subset of Python 2
and 3, especially when test coverage of the ported code is limited. This
PEP reviews the various ways the Python 2 iteration APIs may be accessed,
and looks at the available options for migrating that code to Python 3
by way of the common subset of Python 2.6+ and Python 3.0+.

The PEP also considers the question of whether or not there are any
additions that may be worth making to Python 3.5 that may ease the
transition process for application code that doesn't need to worry about
supporting earlier versions when eventually making the leap to Python 3.


PEP Withdrawal
==============

In writing the second draft of this PEP, I came to the conclusion that
the readability of hybrid Python 2/3 mapping code can actually be best
enhanced by better helper functions rather than by making changes to
Python 3.5+.

The main value I now see in this PEP is as a clear record of the
recommended approaches to migrating mapping iteration code from Python 2
to Python 3, as well as suggesting ways to keep things readable and
maintainable when writing hybrid code that supports both versions.

Notably, I recommend that hybrid code avoid calling mapping iteration
methods directly, and instead rely on builtin functions where possible,
plus some additional helper functions for cases that would be a simple
combination of a builtin and a mapping method in pure Python 3 code, but
that need to be handled slightly differently to get exactly the same
semantics in Python 2.

Static code checkers like pylint could potentially be extended with an
optional warning regarding direct use of the mapping iteration methods in
a hybrid code base.


Mapping iteration models
========================

Python 2.7 provides three different sets of methods to extract the keys,
values and items from a ``dict`` instance, accounting for 9 out of the 18
public methods of the ``dict`` type.

In Python 3, this has been rationalised to just 3 out of 11 public
methods (as the ``has_key`` method has also been removed).


Lists as mutable snapshots
--------------------------

This is the oldest of the three styles of dict iteration, and hence the
one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
methods in Python 2.

These methods all return lists that are snapshots of the state of the
mapping at the time the method was called. This has a few consequences:

* the original object can be mutated freely without affecting iteration
  over the snapshot
* the snapshot can be modified independently of the original object
* the snapshot consumes memory proportional to the size of the original
  mapping

The semantic equivalents of these operations in Python 3 are
``list(d.keys())``, ``list(d.values())`` and ``list(d.items())``.
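
A minimal Python 3 sketch of the first two consequences listed above.
The helper name ``drop_falsy_values`` is hypothetical: it deletes entries
from the original mapping while looping over a ``list(d.items())``
snapshot, and a separately saved snapshot is then modified without
touching the mapping. The printed output assumes Python 3.7+, where dicts
preserve insertion order::

    def drop_falsy_values(d):
        """Delete every entry of d whose value is falsy (hypothetical helper)."""
        # list(d.items()) copies the items up front, so deleting keys from d
        # inside the loop cannot invalidate the iteration.
        for key, value in list(d.items()):
            if not value:
                del d[key]
        return d


    counts = {"a": 1, "b": 0, "c": 3}
    saved = list(counts.items())   # an independent snapshot of the items
    drop_falsy_values(counts)      # mutating the original...
    saved.append(("d", 4))         # ...and the snapshot, independently

    print(counts)  # {'a': 1, 'c': 3}
    print(saved)   # [('a', 1), ('b', 0), ('c', 3), ('d', 4)]

The price is the third consequence: ``saved`` holds a full copy of the
items, so its memory use grows with the size of ``counts``.
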

Iterator objects
----------------

In Python 2.2, ``dict`` objects gained support for the then-new iterator
protocol, allowing direct iteration over the keys stored in the
dictionary, thus avoiding the need to build a list just to iterate over
the dictionary contents one entry at a time. ``iter(d)`` provides direct
access to the iterator object for the keys.

Python 2 also provides a ``d.iterkeys()`` method that is essentially
synonymous with ``iter(d)``, along with ``d.itervalues()`` and
``d.iteritems()`` methods.

These iterators provide live views of the underlying object, and hence
may fail if the set of keys in the underlying object is changed during
iteration::

    >>> d = dict(a=1)
    >>> for k in d:
    ...     del d[k]
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: dictionary changed size during iteration

Because these objects are iterators, iteration over them is also a
one-time operation: once the iterator is exhausted, you have to go back
to the original mapping in order to iterate again.

In Python 3, direct iteration over mappings works the same way as it does
in Python 2. There are no method based equivalents: the semantic
equivalents of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
``iter(d.values())`` and ``iter(d.items())``.

The ``six`` and ``future.utils`` compatibility modules also both provide
``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions
that provide efficient iterator semantics in both Python 2 and 3.


Set based dynamic views
-----------------------

The model that is provided in Python 3 as a method based API is that of
set based dynamic views (technically multisets in the case of the
``values()`` view).

In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
``d.items()`` provide a live view of the current state of the underlying
object, rather than taking a full snapshot of the current state as they
did in Python 2.
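
A minimal Python 3 sketch contrasting the iterator and view models
described so far, using only builtins: ``iter(d.items())`` (the Python 3
spelling of ``d.iteritems()``) is used up after a single pass, while the
``d.items()`` view can be iterated repeatedly and reflects a key added
after the view was created::

    d = {"a": 1}

    # Iterator semantics: a single pass, after which the iterator is exhausted.
    items_iter = iter(d.items())
    first_pass = list(items_iter)    # [('a', 1)]
    second_pass = list(items_iter)   # [] because the iterator is used up

    # View semantics: a live, re-iterable window onto the current contents.
    items_view = d.items()
    d["b"] = 2                       # added *after* the view was created
    print(list(items_view))          # [('a', 1), ('b', 2)]
    print(list(items_view))          # [('a', 1), ('b', 2)] again
    print(("b", 2) in items_view)    # True
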

The switch from snapshots to live views is safe in many circumstances,
but does mean that, as with the direct iteration API, it is necessary to
avoid adding or removing keys during iteration, in order to avoid
encountering the following error::

    >>> d = dict(a=1)
    >>> for k, v in d.items():
    ...     del d[k]
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: dictionary changed size during iteration

Unlike the iteration API, these objects are iterables, rather than
iterators: you can iterate over them multiple times, and each time they
will iterate over the entire underlying mapping.

These semantics are also available in Python 2.7 as the
``d.viewkeys()``, ``d.viewvalues()`` and ``d.viewitems()`` methods.

The ``future.utils`` compatibility module also provides ``viewkeys()``,
``viewvalues()`` and ``viewitems()`` helper functions when running on
Python 2.7 or Python 3.x.


Migrating directly to Python 3
==============================

The ``2to3`` migration tool handles direct migrations to Python 3 in
accordance with the semantic equivalents described above:

* ``d.keys()`` -> ``list(d.keys())``
* ``d.values()`` -> ``list(d.values())``
* ``d.items()`` -> ``list(d.items())``
* ``d.iterkeys()`` -> ``iter(d.keys())``
* ``d.itervalues()`` -> ``iter(d.values())``
* ``d.iteritems()`` -> ``iter(d.items())``
* ``d.viewkeys()`` -> ``d.keys()``
* ``d.viewvalues()`` -> ``d.values()``
* ``d.viewitems()`` -> ``d.items()``

Rather than 9 distinct mapping methods for iteration, there are now only
the 3 view methods, which combine in straightforward ways with the two
relevant builtin functions (``list()`` and ``iter()``) to cover all of
the behaviours that are available as ``dict`` methods in Python 2.7.

Note that in many cases ``d.keys()`` can be replaced by just ``d``, but
the ``2to3`` migration tool doesn't attempt that replacement.
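
A minimal Python 3 sketch of the note above. It checks that ``list(d)``
and ``iter(d)`` produce the same keys as ``list(d.keys())`` and
``iter(d.keys())``, and then shows one case where the replacement does
*not* work. Treating set operations as an example of what "many cases"
excludes is an assumption made here for illustration: key views support
operators such as ``&``, whereas the ``dict`` itself does not::

    d = {"a": 1, "b": 2}

    # Iterating over a dict yields its keys, so these pairs are equivalent.
    assert list(d) == list(d.keys())
    assert list(iter(d)) == list(iter(d.keys()))

    # Key views are set-like, so they support set operations directly...
    shared = d.keys() & {"b", "c"}
    print(shared)                    # {'b'}

    # ...but the dict itself does not support "&", so d.keys() is needed.
    try:
        d & {"b", "c"}
        supports_and = True
    except TypeError:
        supports_and = False
    print(supports_and)              # False
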

The ``2to3`` migration tool also *does not* provide any automatic
assistance for migrating references to these objects as bound or unbound
methods: it only automates conversions where the API is called
immediately.


Migrating to the common subset of Python 2 and 3
================================================

When migrating to the common subset of Python 2 and 3, the above
transformations are not generally appropriate, as they all either result
in the creation of a redundant list in Python 2, have unexpectedly
different semantics in at least some cases, or both.

Since most code running in the common subset of Python 2 and 3 supports
at least as far back as Python 2.6, the currently recommended approach to
converting mapping iteration operations depends on two helper functions
for efficient iteration over mapping values and mapping item tuples:

* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(itervalues(d))``
* ``d.items()`` -> ``list(iteritems(d))``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``itervalues(d)``
* ``d.iteritems()`` -> ``iteritems(d)``

Both ``six`` and ``future.utils`` provide appropriate definitions of
``itervalues()`` and ``iteritems()`` (along with essentially redundant
definitions of ``iterkeys()``). Creating your own definitions of these
functions in a custom compatibility module is also relatively
straightforward::

    try:
        # dict.iteritems only exists on Python 2, so probing for it
        # tells us which version we are running on.
        dict.iteritems
    except AttributeError:
        # Python 3
        def itervalues(d):
            return iter(d.values())
        def iteritems(d):
            return iter(d.items())
    else:
        # Python 2
        def itervalues(d):
            return d.itervalues()
        def iteritems(d):
            return d.iteritems()

The greatest loss of readability currently arises when converting code
that actually *needs* the list based snapshots that were the default in
Python 2.

This readability loss could likely be mitigated by also providing
``listvalues`` and ``listitems`` helper functions, allowing the affected
conversions to be simplified to:

* ``d.values()`` -> ``listvalues(d)``
* ``d.items()`` -> ``listitems(d)``

The corresponding compatibility function definitions are as
straightforward as their iterator counterparts::

    try:
        dict.iteritems
    except AttributeError:
        # Python 3
        def listvalues(d):
            return list(d.values())
        def listitems(d):
            return list(d.items())
    else:
        # Python 2
        def listvalues(d):
            return d.values()
        def listitems(d):
            return d.items()

With that expanded set of compatibility functions, Python 2 code would
then be converted to "idiomatic" hybrid 2/3 code as:

* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``listvalues(d)``
* ``d.items()`` -> ``listitems(d)``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``itervalues(d)``
* ``d.iteritems()`` -> ``iteritems(d)``

This compares well for readability with the idiomatic pure Python 3 code
that uses the mapping methods and builtins directly:

* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(d.values())``
* ``d.items()`` -> ``list(d.items())``
* ``d.iterkeys()`` -> ``iter(d)``
* ``d.itervalues()`` -> ``iter(d.values())``
* ``d.iteritems()`` -> ``iter(d.items())``

It's also notable that when using this approach, hybrid code would
*never* invoke the mapping methods directly: it would always invoke
either a builtin or helper function instead, in order to ensure the exact
same semantics on both Python 2 and 3.
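
To make the hybrid rules concrete, here is a sketch of a hypothetical
Python 2 function converted to hybrid 2/3 code. The function name
``prune_and_total`` and its behaviour are illustrative assumptions; only
the two compatibility helpers this example needs are redefined, following
the pattern of the definitions above, so that the block runs on its own.
As recommended, the converted body calls only builtins and helpers, never
the mapping methods::

    # Compatibility helpers following the pattern shown above.
    try:
        dict.iteritems
    except AttributeError:
        # Python 3
        def itervalues(d):
            return iter(d.values())
        def listitems(d):
            return list(d.items())
    else:
        # Python 2
        def itervalues(d):
            return d.itervalues()
        def listitems(d):
            return d.items()


    def prune_and_total(d):
        """Remove entries whose value is None, then sum the remaining values.

        Hypothetical Python 2 original:

            for k, v in d.items():
                if v is None:
                    del d[k]
            return sum(d.itervalues())
        """
        # d.items() -> listitems(d): a snapshot, so deleting keys is safe.
        for k, v in listitems(d):
            if v is None:
                del d[k]
        # d.itervalues() -> itervalues(d): no intermediate list on Python 2.
        return sum(itervalues(d))


    data = {"x": 2, "y": None, "z": 5}
    print(prune_and_total(data))  # 7
    print(data)                   # {'x': 2, 'z': 5}
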

Migrating from Python 3 to the common subset with Python 2.7
============================================================

While the majority of migrations are currently from Python 2 either
directly to Python 3 or to the common subset of Python 2 and Python 3,
there are also some migrations of newer projects that start in Python 3
and then later add Python 2 support, either due to user demand, or to
gain access to Python 2 libraries that are not yet available in Python 3
(and porting them to Python 3 or creating a Python 3 compatible
replacement is not a trivial exercise).

In these cases, Python 2.7 compatibility is often sufficient, and the
2.7+ only view based helper functions provided by ``future.utils`` allow
the bare accesses to the Python 3 mapping view methods to be replaced
with code that is compatible with both Python 2.7 and Python 3 (note,
this is the only migration chart in the PEP that has Python 3 code on the
left of the conversion):

* ``d.keys()`` -> ``viewkeys(d)``
* ``d.values()`` -> ``viewvalues(d)``
* ``d.items()`` -> ``viewitems(d)``
* ``list(d.keys())`` -> ``list(d)``
* ``list(d.values())`` -> ``listvalues(d)``
* ``list(d.items())`` -> ``listitems(d)``
* ``iter(d.keys())`` -> ``iter(d)``
* ``iter(d.values())`` -> ``itervalues(d)``
* ``iter(d.items())`` -> ``iteritems(d)``

As with migrations from Python 2 to the common subset, note that the
hybrid code ends up never invoking the mapping methods directly: it only
calls builtins and helper functions, with the latter addressing the
semantic differences between Python 2 and Python 3.


Possible changes to Python 3.5+
===============================

The main proposal put forward to potentially aid migration of existing
Python 2 code to Python 3 is the restoration of some or all of the
alternate iteration APIs to the Python 3 mapping API.

In particular, the initial draft of this PEP proposed making the
following conversions possible when migrating to the common subset of
Python 2 and Python 3.5+:

* ``d.keys()`` -> ``list(d)``
* ``d.values()`` -> ``list(d.itervalues())``
* ``d.items()`` -> ``list(d.iteritems())``
* ``d.iterkeys()`` -> ``d.iterkeys()``
* ``d.itervalues()`` -> ``d.itervalues()``
* ``d.iteritems()`` -> ``d.iteritems()``

Possible mitigations of the additional language complexity in Python 3
created by restoring these methods included immediately deprecating them,
as well as potentially hiding them from the ``dir()`` function (or
perhaps even defining a way to make ``pydoc`` aware of function
deprecations).

However, in the case where the list output is actually desired, the end
result of that proposal is actually less readable than an appropriately
defined helper function, and the function and method forms of the
iterator versions are pretty much equivalent from a readability
perspective.

So unless I've missed something critical, readily available
``listvalues()`` and ``listitems()`` helper functions look like they will
improve the readability of hybrid code more than anything we could add
back to the Python 3.5+ mapping API, and won't have any long-term impact
on the complexity of Python 3 itself.


Discussion
==========

The fact that five years into the Python 3 migration we still have users
considering the dict API changes a significant barrier to migration
suggests that there are problems with previously recommended approaches.
This PEP attempts to explore those issues and tries to isolate those
cases where previous advice (such as it was) could prove problematic.

My assessment (largely based on feedback from Twisted devs) is that
problems are most likely to arise when attempting to use ``d.keys()``,
``d.values()``, and ``d.items()`` in hybrid code.
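
A minimal illustration of that risk, assuming Python 2 code that treated
the result of ``d.keys()`` as a list snapshot and is then run unchanged
on Python 3. The sketch shows a loud failure (views lack list methods
such as ``sort()``) and a silent behaviour change (the view tracks later
mutations), followed by the hybrid spelling ``list(d)`` from the
conversion rules above::

    d = {"b": 2, "a": 1}

    keys = d.keys()          # Python 2: a list snapshot; Python 3: a live view

    # 1. Loud failure: views are not lists, so list methods are missing.
    try:
        keys.sort()
        sorted_in_place = True
    except AttributeError:
        sorted_in_place = False
    print(sorted_in_place)   # False on Python 3

    # 2. Silent change: the view tracks later mutations of the mapping.
    d["c"] = 3
    view_len = len(keys)
    print(view_len)          # 3 on Python 3 (a Python 2 snapshot still has 2)

    # Hybrid spelling of d.keys(): list(d) builds an independent list.
    safe_keys = list(d)
    safe_keys.sort()
    d["d"] = 4               # mutating d no longer affects the snapshot
    print(safe_keys)         # ['a', 'b', 'c']
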

While superficially it seems as though there should be cases where it is
safe to ignore the semantic differences, in practice, the change from
"mutable snapshot" to "dynamic view" is significant enough that it is
likely better to just force the use of either list or iterator semantics
for hybrid code, and leave the use of the view semantics to pure Python 3
code.

This approach also creates rules that are simple enough and safe enough
that it should be possible to automate them in code modernisation scripts
that target the common subset of Python 2 and Python 3, just as ``2to3``
converts them automatically when targeting pure Python 3 code.


Acknowledgements
================

Thanks to the folks at the Twisted sprint table at PyCon for a very
vigorous discussion of this idea (and several other topics), and
especially to Hynek Schlawack for acting as a moderator when things got a
little too heated :)

Thanks also to JP Calderone and Itamar Turner-Trauring for their email
feedback, as well as to the participants in the `python-dev review `__ of
the initial version of the PEP.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: