PEP: 443
Title: Single-dispatch generic functions
Version: $Revision$
Last-Modified: $Date$
Author: Łukasz Langa <lukasz@langa.pl>
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-May-2013
Post-History: 22-May-2013, 25-May-2013, 31-May-2013
Replaces: 245, 246, 3124


Abstract
========

This PEP proposes a new mechanism in the ``functools`` standard library
module that provides a simple form of generic programming known as
single-dispatch generic functions.

A **generic function** is composed of multiple functions implementing
the same operation for different types. Which implementation should be
used during a call is determined by the dispatch algorithm. When the
implementation is chosen based on the type of a single argument, this is
known as **single dispatch**.


Rationale and Goals
===================

Python has always provided a variety of built-in and standard-library
generic functions, such as ``len()``, ``iter()``, ``pprint.pprint()``,
``copy.copy()``, and most of the functions in the ``operator`` module.
However, it currently:

1. does not have a simple or straightforward way for developers to
   create new generic functions,

2. does not have a standard way for methods to be added to existing
   generic functions (i.e., some are added using registration
   functions, others require defining ``__special__`` methods, possibly
   by monkeypatching).

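As an illustration of the second point, the following sketch extends two
existing generic operations in two different ways. The ``Inventory`` and
``Celsius`` classes and the ``reduce_celsius()`` helper are hypothetical
names invented for this example; ``len()``, ``copy.copy()`` and
``copyreg.pickle()`` are the standard library facilities::

  import copy
  import copyreg


  class Inventory:
      """Hypothetical container; supports len() via a special method."""

      def __init__(self, items):
          self.items = list(items)

      def __len__(self):
          # len() finds this through the __len__ special method.
          return len(self.items)


  class Celsius:
      """Hypothetical value type; copied via a registered function."""

      def __init__(self, degrees):
          self.degrees = degrees


  def reduce_celsius(obj):
      # Describe how to rebuild a Celsius instance: callable plus args.
      return Celsius, (obj.degrees,)


  # copy.copy() and pickle consult the copyreg registry for this type.
  copyreg.pickle(Celsius, reduce_celsius)

  size = len(Inventory(["spam", "eggs"]))
  clone = copy.copy(Celsius(21.5))

Both classes end up supported by a generic operation, but a reader has
to know which of the two unrelated mechanisms each operation expects.
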
In addition, it is currently a common anti-pattern for Python code to
inspect the types of received arguments, in order to decide what to do
with the objects. For example, code may wish to accept either an object
of some type, or a sequence of objects of that type.

Currently, the "obvious way" to do this is by type inspection, but this
is brittle and closed to extension. Abstract Base Classes make it easier
to discover present behaviour, but don't help adding new behaviour.

A developer using an already-written library may be unable to change how
their objects are treated by such code, especially if the objects they
are using were created by a third party.

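A minimal sketch of the contrast, using a hypothetical ``total()``
helper that accepts either a number or a list of numbers. Neither
function comes from an existing library, and the second form assumes
a Python version in which ``functools.singledispatch`` is available::

  from functools import singledispatch


  def total_by_inspection(value):
      # Type inspection: supporting a new type means editing this body.
      if isinstance(value, list):
          return sum(value)
      if isinstance(value, (int, float)):
          return value
      raise TypeError("unsupported type: %r" % type(value))


  @singledispatch
  def total(value):
      # Fallback for types without a registered implementation.
      raise TypeError("unsupported type: %r" % type(value))


  @total.register(int)
  @total.register(float)
  def _(value):
      return value


  @total.register(list)
  def _(value):
      return sum(value)


  # Third-party code can add tuple support without touching total().
  @total.register(tuple)
  def _(value):
      return sum(value)

With the first form, tuple support requires a change to
``total_by_inspection()`` itself; with the second, it is one more
registration that can live in any module.
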
Therefore, this PEP proposes a uniform API to address dynamic
overloading using decorators.


User API
========

To define a generic function, decorate it with the ``@singledispatch``
decorator. Note that the dispatch happens on the type of the first
argument, so create your function accordingly::

  >>> from functools import singledispatch
  >>> @singledispatch
  ... def fun(arg, verbose=False):
  ...     if verbose:
  ...         print("Let me just say,", end=" ")
  ...     print(arg)

To add overloaded implementations to the function, use the
``register()`` attribute of the generic function. It is a decorator,
taking a type parameter and decorating a function implementing the
operation for that type::

  >>> @fun.register(int)
  ... def _(arg, verbose=False):
  ...     if verbose:
  ...         print("Strength in numbers, eh?", end=" ")
  ...     print(arg)
  ...
  >>> @fun.register(list)
  ... def _(arg, verbose=False):
  ...     if verbose:
  ...         print("Enumerate this:")
  ...     for i, elem in enumerate(arg):
  ...         print(i, elem)

To enable registering lambdas and pre-existing functions, the
``register()`` attribute can be used in a functional form::

  >>> def nothing(arg, verbose=False):
  ...     print("Nothing.")
  ...
  >>> fun.register(type(None), nothing)

The ``register()`` attribute returns the undecorated function, which
enables decorator stacking, pickling, as well as creating unit tests for
each variant independently::

  >>> from decimal import Decimal
  >>> @fun.register(float)
  ... @fun.register(Decimal)
  ... def fun_num(arg, verbose=False):
  ...     if verbose:
  ...         print("Half of your number:", end=" ")
  ...     print(arg / 2)
  ...
  >>> fun_num is fun
  False

When called, the generic function dispatches on the type of the first
argument::

  >>> fun("Hello, world.")
  Hello, world.
  >>> fun("test.", verbose=True)
  Let me just say, test.
  >>> fun(42, verbose=True)
  Strength in numbers, eh? 42
  >>> fun(['spam', 'spam', 'eggs', 'spam'], verbose=True)
  Enumerate this:
  0 spam
  1 spam
  2 eggs
  3 spam
  >>> fun(None)
  Nothing.
  >>> fun(1.23)
  0.615

Where there is no registered implementation for a specific type, its
method resolution order is used to find a more generic implementation.
To check which implementation the generic function will choose for
a given type, use the ``dispatch()`` attribute::

  >>> fun.dispatch(float)
  <function fun_num at 0x104319058>
  >>> fun.dispatch(dict)
  <function fun at 0x103fe4788>

To access all registered implementations, use the read-only ``registry``
attribute::

  >>> fun.registry.keys()
  dict_keys([<class 'NoneType'>, <class 'int'>, <class 'object'>,
             <class 'decimal.Decimal'>, <class 'list'>,
             <class 'float'>])
  >>> fun.registry[float]
  <function fun_num at 0x1035a2840>
  >>> fun.registry[object]
  <function fun at 0x103170788>

The proposed API is intentionally limited and opinionated, so as to
ensure it is easy to explain and use, as well as to maintain consistency
with existing members in the ``functools`` module.

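The fallback along the method resolution order described earlier in this
section can be checked with a short, self-contained sketch. The
``describe()`` function and the ``Playlist`` class are hypothetical
names used only for illustration::

  from functools import singledispatch


  @singledispatch
  def describe(arg):
      return "object"


  @describe.register(int)
  def describe_int(arg):
      return "int"


  @describe.register(list)
  def describe_list(arg):
      return "list"


  class Playlist(list):
      """Hypothetical list subclass with no implementation of its own."""


  # bool has no registered implementation; its MRO is (bool, int,
  # object), so the int implementation is the closest match.
  bool_impl = describe.dispatch(bool)

  # Playlist falls back to the implementation registered for list.
  playlist_result = describe(Playlist())

  # str has no registered ancestor other than object.
  str_result = describe("spam")
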
Implementation Notes
====================

The functionality described in this PEP is already implemented in the
``pkgutil`` standard library module as ``simplegeneric``. Because this
implementation is mature, the goal is to move it largely as-is. The
reference implementation is available on hg.python.org [#ref-impl]_.

The dispatch type is specified as a decorator argument. An alternative
form using function annotations has been considered but its inclusion
has been deferred. As of May 2013, this usage pattern is out of scope
for the standard library [#pep-0008]_ and the best practices for
annotation usage are still debated.

Based on the current ``pkgutil.simplegeneric`` implementation and
following the convention on registering virtual subclasses on Abstract
Base Classes, the dispatch registry will not be thread-safe.


Abstract Base Classes
---------------------

The ``pkgutil.simplegeneric`` implementation relied on several forms of
method resolution order (MRO). ``@singledispatch`` removes special
handling of old-style classes and Zope's ExtensionClasses. More
importantly, it introduces support for Abstract Base Classes (ABC).

When a generic function implementation is registered for an ABC, the
dispatch algorithm switches to a mode of MRO calculation for the
provided argument which includes the relevant ABCs. The algorithm is as
follows::

  def _compose_mro(cls, haystack):
      """Calculates the MRO for a given class `cls`, including relevant
      abstract base classes from `haystack`."""
      bases = set(cls.__mro__)
      mro = list(cls.__mro__)
      for regcls in haystack:
          if regcls in bases or not issubclass(cls, regcls):
              continue   # either present in the __mro__ or unrelated
          for index, base in enumerate(mro):
              if not issubclass(base, regcls):
                  break
          if base in bases and not issubclass(regcls, base):
              # Conflict resolution: put classes present in __mro__
              # and their subclasses first.
              index += 1
          mro.insert(index, regcls)
      return mro

In its most basic form, it returns the MRO for the given type::

  >>> _compose_mro(dict, [])
  [<class 'dict'>, <class 'object'>]

When the haystack consists of ABCs that the specified type is a subclass
of, they are inserted in a predictable order::

  >>> from collections.abc import (Iterable, MutableMapping,
  ...                              Sequence, Sized)
  >>> _compose_mro(dict, [Sized, MutableMapping, str,
  ...                     Sequence, Iterable])
  [<class 'dict'>, <class 'collections.abc.MutableMapping'>,
   <class 'collections.abc.Iterable'>, <class 'collections.abc.Sized'>,
   <class 'object'>]

While this mode of operation is significantly slower, all dispatch
decisions are cached. The cache is invalidated on registering new
implementations on the generic function or when user code calls
``register()`` on an ABC to register a new virtual subclass. In the
latter case, it is possible to create a situation with ambiguous
dispatch, for instance::

  >>> from collections.abc import Iterable, Container
  >>> class P:
  ...     pass
  ...
  >>> Iterable.register(P)
  <class '__main__.P'>
  >>> Container.register(P)
  <class '__main__.P'>

Faced with ambiguity, ``@singledispatch`` refuses the temptation to
guess::

  >>> @singledispatch
  ... def g(arg):
  ...     return "base"
  ...
  >>> g.register(Iterable, lambda arg: "iterable")
  <function <lambda> at 0x108b49110>
  >>> g.register(Container, lambda arg: "container")
  <function <lambda> at 0x108b491c8>
  >>> g(P())
  Traceback (most recent call last):
  ...
  RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
  or <class 'collections.abc.Iterable'>

Note that this exception would not be raised if ``Iterable`` and
``Container`` had been provided as base classes during class definition.
In this case dispatch happens in the MRO order::

  >>> class Ten(Iterable, Container):
  ...     def __iter__(self):
  ...         for i in range(10):
  ...             yield i
  ...     def __contains__(self, value):
  ...         return value in range(10)
  ...
  >>> g(Ten())
  'iterable'

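The cache invalidation described earlier in this section can be observed
directly. The following is a minimal sketch, assuming a Python version
that ships ``functools.singledispatch``; the ``kind()`` function and the
``Box`` and ``Crate`` classes are hypothetical names used only for
illustration::

  from collections.abc import Sized
  from functools import singledispatch


  @singledispatch
  def kind(arg):
      return "object"


  class Box:
      """Hypothetical class without any registered implementation."""


  class Crate:
      """Hypothetical class later registered as a virtual Sized."""


  # The first call caches the fallback implementation for Box.
  before_register = kind(Box())

  # Registering a new implementation invalidates the dispatch cache.
  kind.register(Box, lambda arg: "box")
  after_register = kind(Box())

  # With a Sized implementation in place, Crate still uses the fallback
  # because it is not yet related to Sized; that decision is cached.
  kind.register(Sized, lambda arg: "sized")
  before_abc = kind(Crate())

  # Registering a virtual subclass on the ABC invalidates the cache too.
  Sized.register(Crate)
  after_abc = kind(Crate())
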
Usage Patterns
==============

This PEP proposes extending behaviour only of functions specifically
marked as generic. Just as a base class method may be overridden by
a subclass, so too may a function be overloaded to provide custom
functionality for a given type.

Universal overloading does not equal *arbitrary* overloading, in the
sense that we need not expect people to randomly redefine the behavior
of existing functions in unpredictable ways. To the contrary, generic
function usage in actual programs tends to follow very predictable
patterns and registered implementations are highly discoverable in the
common case.

If a module is defining a new generic operation, it will usually also
define any required implementations for existing types in the same
place. Likewise, if a module is defining a new type, then it will
usually define implementations there for any generic functions that it
knows or cares about. As a result, the vast majority of registered
implementations can be found adjacent to either the function being
overloaded, or to a newly-defined type for which the implementation is
adding support.

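A minimal sketch of this pattern, collapsed into a single file for
brevity. The ``area()`` generic function and the ``Square`` and
``Circle`` classes are hypothetical; the comments mark where a module
boundary would normally fall::

  import math
  from functools import singledispatch


  # --- module defining the generic operation -------------------------
  class Square:
      def __init__(self, side):
          self.side = side


  @singledispatch
  def area(shape):
      raise TypeError("no area() implementation for %r" % type(shape))


  # Implementations for existing types sit next to the function.
  @area.register(Square)
  def _(shape):
      return shape.side ** 2


  # --- module defining a new type --------------------------------------
  class Circle:
      def __init__(self, radius):
          self.radius = radius


  # The new type registers its implementation next to its definition.
  @area.register(Circle)
  def _(shape):
      return math.pi * shape.radius ** 2
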
It is only in rather infrequent cases that one will have implementations
registered in a module that contains neither the function nor the
type(s) for which the implementation is added. In the absence of
incompetence or deliberate intention to be obscure, the few
implementations that are not registered adjacent to the relevant type(s)
or function(s), will generally not need to be understood or known about
outside the scope where those implementations are defined. (Except in
the "support modules" case, where best practice suggests naming them
accordingly.)

As mentioned earlier, single-dispatch generics are already prolific
throughout the standard library. A clean, standard way of doing them
provides a way forward to refactor those custom implementations to use
a common one, opening them up for user extensibility at the same time.


Alternative approaches
======================

In PEP 3124 [#pep-3124]_ Phillip J. Eby proposes a full-grown solution
with overloading based on arbitrary rule sets (with the default
implementation dispatching on argument types), as well as interfaces,
adaptation and method combining. PEAK-Rules [#peak-rules]_ is
a reference implementation of the concepts described in PJE's PEP.

Such a broad approach is inherently complex, which makes reaching
a consensus hard. In contrast, this PEP focuses on a single piece of
functionality that is simple to reason about. It's important to note
this does not preclude the use of other approaches now or in the future.

In a 2005 article on Artima [#artima2005]_ Guido van Rossum presents
a generic function implementation that dispatches on the types of all
arguments of a function. The same approach was chosen in Andrey Popp's
``generic`` package available on PyPI [#pypi-generic]_, as well as David
Mertz's ``gnosis.magic.multimethods`` [#gnosis-multimethods]_.

While this seems desirable at first, I agree with Fredrik Lundh's
comment that "if you design APIs with pages of logic just to sort out
what code a function should execute, you should probably hand over the
API design to someone else". In other words, the single argument
approach proposed in this PEP is not only easier to implement but also
clearly communicates that dispatching on a more complex state is an
anti-pattern. It also has the virtue of corresponding directly with the
familiar method dispatch mechanism in object oriented programming. The
only difference is whether the custom implementation is associated more
closely with the data (object-oriented methods) or the algorithm
(single-dispatch overloading).

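The correspondence can be illustrated with a short sketch. The ``Dog``
and ``Cat`` classes and the ``speak()`` function are hypothetical names,
not part of any library::

  from functools import singledispatch


  class Dog:
      # Object-oriented method: the implementation lives with the data.
      def speak(self):
          return "Woof"


  class Cat:
      """Hypothetical class that defines no speak() method of its own."""


  @singledispatch
  def speak(animal):
      # Default implementation: defer to the object's own method.
      return animal.speak()


  # Single-dispatch overload: the implementation lives with the
  # algorithm, so Cat gains the behaviour without being modified.
  @speak.register(Cat)
  def _(animal):
      return "Meow"


  dog_sound = speak(Dog())
  cat_sound = speak(Cat())
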
PyPy's RPython offers ``extendabletype`` [#pairtype]_, a metaclass which
enables classes to be externally extended. In combination with
``pairtype()`` and ``pair()`` factories, this offers a form of
single-dispatch generics.


Acknowledgements
================

Apart from Phillip J. Eby's work on PEP 3124 [#pep-3124]_ and
PEAK-Rules, influences include Paul Moore's original issue
[#issue-5135]_ that proposed exposing ``pkgutil.simplegeneric`` as part
of the ``functools`` API, Guido van Rossum's article on multimethods
[#artima2005]_, and discussions with Raymond Hettinger on a general
pprint rewrite. Huge thanks to Nick Coghlan for encouraging me to create
this PEP and providing initial feedback.


References
==========

.. [#ref-impl]
   http://hg.python.org/features/pep-443/file/tip/Lib/functools.py#l359

.. [#pep-0008] PEP 8 states in the "Programming Recommendations"
   section that "the Python standard library will not use function
   annotations as that would result in a premature commitment to
   a particular annotation style".
   (http://www.python.org/dev/peps/pep-0008)

.. [#pep-3124] http://www.python.org/dev/peps/pep-3124/

.. [#peak-rules] http://peak.telecommunity.com/DevCenter/PEAK_2dRules

.. [#artima2005]
   http://www.artima.com/weblogs/viewpost.jsp?thread=101605

.. [#pypi-generic] http://pypi.python.org/pypi/generic

.. [#gnosis-multimethods]
   http://gnosis.cx/publish/programming/charming_python_b12.html

.. [#pairtype]
   https://bitbucket.org/pypy/pypy/raw/default/rpython/tool/pairtype.py

.. [#issue-5135] http://bugs.python.org/issue5135


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: