Restify PEP 307 (#229)

Serhiy Storchaka 2017-03-27 22:18:45 +03:00 committed by Brett Cannon
parent a53392a0f0
commit a3062a8744
1 changed file with 620 additions and 579 deletions


@ -5,12 +5,12 @@ Last-Modified: $Date$
Author: Guido van Rossum, Tim Peters Author: Guido van Rossum, Tim Peters
Status: Final Status: Final
Type: Standards Track Type: Standards Track
Content-Type: text/plain Content-Type: text/x-rst
Created: 31-Jan-2003 Created: 31-Jan-2003
Post-History: 7-Feb-2003 Post-History: 7-Feb-2003
Introduction Introduction
============
Pickling new-style objects in Python 2.2 is done somewhat clumsily Pickling new-style objects in Python 2.2 is done somewhat clumsily
and causes pickle size to bloat compared to classic class and causes pickle size to bloat compared to classic class
@ -23,7 +23,7 @@ Introduction
must be specified. This PEP focuses on API issues, although it must be specified. This PEP focuses on API issues, although it
may occasionally touch on byte stream format details to motivate a may occasionally touch on byte stream format details to motivate a
choice. The pickle byte stream format is documented formally by choice. The pickle byte stream format is documented formally by
the standard library module pickletools.py (already checked into the standard library module ``pickletools.py`` (already checked into
CVS for Python 2.3). CVS for Python 2.3).
This PEP attempts to fully document the interface between pickled This PEP attempts to fully document the interface between pickled
@ -34,9 +34,10 @@ Introduction
Motivation Motivation
==========
Pickling new-style objects causes serious pickle bloat. For Pickling new-style objects causes serious pickle bloat. For
example, example::
class C(object): # Omit "(object)" for classic class class C(object): # Omit "(object)" for classic class
pass pass
@ -48,7 +49,7 @@ Motivation
the new-style object 86 bytes. the new-style object 86 bytes.
The reasons for the bloat are complex, but are mostly caused by The reasons for the bloat are complex, but are mostly caused by
the fact that new-style objects use __reduce__ in order to be the fact that new-style objects use ``__reduce__`` in order to be
picklable at all. After ample consideration we've concluded that picklable at all. After ample consideration we've concluded that
the only way to reduce pickle sizes for new-style objects is to the only way to reduce pickle sizes for new-style objects is to
add new opcodes to the pickle protocol. The net result is that add new opcodes to the pickle protocol. The net result is that
@ -58,6 +59,7 @@ Motivation
Protocol versions Protocol versions
=================
Previously, pickling (but not unpickling) distinguished between Previously, pickling (but not unpickling) distinguished between
text mode and binary mode. By design, binary mode is a text mode and binary mode. By design, binary mode is a
@ -74,8 +76,8 @@ Protocol versions
inserted at the start of a protocol 2 pickle indicating that it is inserted at the start of a protocol 2 pickle indicating that it is
using protocol 2. To date, each release of Python has been able to using protocol 2. To date, each release of Python has been able to
read pickles written by all previous releases. Of course pickles read pickles written by all previous releases. Of course pickles
written under protocol N can't be read by versions of Python written under protocol *N* can't be read by versions of Python
earlier than the one that introduced protocol N. earlier than the one that introduced protocol *N*.
Several functions, methods and constructors used for pickling used Several functions, methods and constructors used for pickling used
to take a positional argument named 'bin' which was a flag, to take a positional argument named 'bin' which was a flag,
@ -90,32 +92,33 @@ Protocol versions
This works in previous Python versions, too, and so can be used to This works in previous Python versions, too, and so can be used to
select the highest protocol available in a way that's both backward select the highest protocol available in a way that's both backward
and forward compatible. In addition, a new module constant and forward compatible. In addition, a new module constant
HIGHEST_PROTOCOL is supplied by both pickle and cPickle, equal to ``HIGHEST_PROTOCOL`` is supplied by both ``pickle`` and ``cPickle``, equal to
the highest protocol number the module can read. This is cleaner the highest protocol number the module can read. This is cleaner
than passing -1, but cannot be used before Python 2.3. than passing -1, but cannot be used before Python 2.3.
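
As a brief illustration of the protocol selection described above, the following sketch pickles a small dictionary three ways. It assumes a current Python 3 interpreter, where ``cPickle`` is no longer a separate module; the variable names are invented for this example.

```python
import pickle

data = {"spam": [1, 2, 3], "eggs": ("a", "b")}

# A negative protocol number selects the highest protocol the module supports.
newest = pickle.dumps(data, -1)

# HIGHEST_PROTOCOL names the same choice explicitly.
explicit = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# Asking for protocol 2 produces a pickle that starts with the PROTO opcode
# (byte 0x80) followed by the protocol number.
proto2 = pickle.dumps(data, 2)

# Unpickling needs no protocol argument; the stream identifies itself.
roundtrip_ok = pickle.loads(newest) == data and pickle.loads(proto2) == data
```

Passing ``-1`` is the spelling that also works on interpreters that predate ``HIGHEST_PROTOCOL``.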
The pickle.py module has supported passing the 'bin' value as a The ``pickle.py`` module has supported passing the 'bin' value as a
keyword argument rather than a positional argument. (This is not keyword argument rather than a positional argument. (This is not
recommended, since cPickle only accepts positional arguments, but recommended, since ``cPickle`` only accepts positional arguments, but
it works...) Passing 'bin' as a keyword argument is deprecated, it works...) Passing 'bin' as a keyword argument is deprecated,
and a PendingDeprecationWarning is issued in this case. You have and a ``PendingDeprecationWarning`` is issued in this case. You have
to invoke the Python interpreter with -Wa or a variation on that to invoke the Python interpreter with ``-Wa`` or a variation on that
to see PendingDeprecationWarning messages. In Python 2.4, the to see ``PendingDeprecationWarning`` messages. In Python 2.4, the
warning class may be upgraded to DeprecationWarning. warning class may be upgraded to ``DeprecationWarning``.
Security issues Security issues
===============
In previous versions of Python, unpickling would do a "safety In previous versions of Python, unpickling would do a "safety
check" on certain operations, refusing to call functions or check" on certain operations, refusing to call functions or
constructors that weren't marked as "safe for unpickling" by constructors that weren't marked as "safe for unpickling" by
either having an attribute __safe_for_unpickling__ set to 1, or by either having an attribute ``__safe_for_unpickling__`` set to 1, or by
being registered in a global registry, copy_reg.safe_constructors. being registered in a global registry, ``copy_reg.safe_constructors``.
This feature gives a false sense of security: nobody has ever done This feature gives a false sense of security: nobody has ever done
the necessary, extensive, code audit to prove that unpickling the necessary, extensive, code audit to prove that unpickling
untrusted pickles cannot invoke unwanted code, and in fact bugs in untrusted pickles cannot invoke unwanted code, and in fact bugs in
the Python 2.2 pickle.py module make it easy to circumvent these the Python 2.2 ``pickle.py`` module make it easy to circumvent these
security measures. security measures.
We firmly believe that, on the Internet, it is better to know that We firmly believe that, on the Internet, it is better to know that
@ -127,144 +130,158 @@ Security issues
Therefore, as of Python 2.3, all safety checks on unpickling are Therefore, as of Python 2.3, all safety checks on unpickling are
officially removed, and replaced with this warning: officially removed, and replaced with this warning:
*** Do not unpickle data received from an untrusted or .. warning::
unauthenticated source ***
Do not unpickle data received from an untrusted or
unauthenticated source.
The same warning applies to previous Python versions, despite the The same warning applies to previous Python versions, despite the
presence of safety checks there. presence of safety checks there.
Extended __reduce__ API Extended ``__reduce__`` API
===========================
There are several APIs that a class can use to control pickling. There are several APIs that a class can use to control pickling.
Perhaps the most popular of these are __getstate__ and Perhaps the most popular of these are ``__getstate__`` and
__setstate__; but the most powerful one is __reduce__. (There's ``__setstate__``; but the most powerful one is ``__reduce__``. (There's
also __getinitargs__, and we're adding __getnewargs__ below.) also ``__getinitargs__``, and we're adding ``__getnewargs__`` below.)
There are several ways to provide __reduce__ functionality: a There are several ways to provide ``__reduce__`` functionality: a
class can implement a __reduce__ method or a __reduce_ex__ method class can implement a ``__reduce__`` method or a ``__reduce_ex__`` method
(see next section), or a reduce function can be declared in (see next section), or a reduce function can be declared in
copy_reg (copy_reg.dispatch_table maps classes to functions). The ``copy_reg`` (``copy_reg.dispatch_table`` maps classes to functions). The
return values are interpreted exactly the same, though, and we'll return values are interpreted exactly the same, though, and we'll
refer to these collectively as __reduce__. refer to these collectively as ``__reduce__``.
IMPORTANT: pickling of classic class instances does not look for a **Important:** pickling of classic class instances does not look for a
__reduce__ or __reduce_ex__ method or a reduce function in the ``__reduce__`` or ``__reduce_ex__`` method or a reduce function in the
copy_reg dispatch table, so that a classic class cannot provide ``copy_reg`` dispatch table, so that a classic class cannot provide
__reduce__ functionality in the sense intended here. A classic ``__reduce__`` functionality in the sense intended here. A classic
class must use __getinitargs__ and/or __getstate__ to customize class must use ``__getinitargs__`` and/or ``__getstate__`` to customize
pickling. These are described below. pickling. These are described below.
__reduce__ must return either a string or a tuple. If it returns ``__reduce__`` must return either a string or a tuple. If it returns
a string, this is an object whose state is not to be pickled, but a string, this is an object whose state is not to be pickled, but
instead a reference to an equivalent object referenced by name. instead a reference to an equivalent object referenced by name.
Surprisingly, the string returned by __reduce__ should be the Surprisingly, the string returned by ``__reduce__`` should be the
object's local name (relative to its module); the pickle module object's local name (relative to its module); the ``pickle`` module
searches the module namespace to determine the object's module. searches the module namespace to determine the object's module.
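
A minimal sketch of the string form, assuming a Python 3 interpreter and that the code runs as an importable module or as the main script, so that the name can be looked up again. ``_Sentinel`` and ``MISSING`` are hypothetical names chosen for illustration.

```python
import pickle


class _Sentinel(object):
    def __reduce__(self):
        # Returning a string asks pickle to store a reference to the
        # module-level name "MISSING" instead of the object's state.
        return "MISSING"


# The module-level object that the string refers to.
MISSING = _Sentinel()

# Unpickling looks the name up again, so the very same object comes back.
restored = pickle.loads(pickle.dumps(MISSING, 2))
```

This form suits singletons: no state is written, and identity is preserved across a round trip within the same program.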
The rest of this section is concerned with the tuple returned by The rest of this section is concerned with the tuple returned by
__reduce__. It is a variable size tuple, of length 2 through 5. ``__reduce__``. It is a variable size tuple, of length 2 through 5.
The first two items (function and arguments) are required. The The first two items (function and arguments) are required. The
remaining items are optional and may be left off from the end; remaining items are optional and may be left off from the end;
giving None for the value of an optional item acts the same as giving ``None`` for the value of an optional item acts the same as
leaving it off. The last two items are new in this PEP. The items leaving it off. The last two items are new in this PEP. The items
are, in order: are, in order:

+-----------+---------------------------------------------------------------+
| function  | Required.                                                     |
|           |                                                               |
|           | A callable object (not necessarily a function) called         |
|           | to create the initial version of the object; state            |
|           | may be added to the object later to fully reconstruct         |
|           | the pickled state. This function must itself be               |
|           | picklable. See the section about ``__newobj__`` for a         |
|           | special case (new in this PEP) here.                          |
+-----------+---------------------------------------------------------------+
| arguments | Required.                                                     |
|           |                                                               |
|           | A tuple giving the argument list for the function.            |
|           | As a special case, designed for Zope 2's                      |
|           | ``ExtensionClass``, this may be ``None``; in that case,       |
|           | function should be a class or type, and                       |
|           | ``function.__basicnew__()`` is called to create the           |
|           | initial version of the object. This exception is              |
|           | deprecated.                                                   |
+-----------+---------------------------------------------------------------+

Unpickling invokes function(*arguments) to create an initial object, Unpickling invokes ``function(*arguments)`` to create an initial object,
called obj below. If the remaining items are left off, that's the end called *obj* below. If the remaining items are left off, that's the end
of unpickling for this object and obj is the result. Else obj is of unpickling for this object and *obj* is the result. Else *obj* is
modified at unpickling time by each item specified, as follows. modified at unpickling time by each item specified, as follows.

+-----------+---------------------------------------------------------------+
| state     | Optional.                                                     |
|           |                                                               |
|           | Additional state. If this is not ``None``, the state is       |
|           | pickled, and ``obj.__setstate__(state)`` will be called       |
|           | when unpickling. If no ``__setstate__`` method is             |
|           | defined, a default implementation is provided, which          |
|           | assumes that state is a dictionary mapping instance           |
|           | variable names to their values. The default                   |
|           | implementation calls ::                                       |
|           |                                                               |
|           |     obj.__dict__.update(state)                                |
|           |                                                               |
|           | or, if the ``update()`` call fails, ::                        |
|           |                                                               |
|           |     for k, v in state.items():                                |
|           |         setattr(obj, k, v)                                    |
+-----------+---------------------------------------------------------------+
| listitems | Optional, and new in this PEP.                                |
|           |                                                               |
|           | If this is not ``None``, it should be an iterator (not a      |
|           | sequence!) yielding successive list items. These list         |
|           | items will be pickled, and appended to the object using       |
|           | either ``obj.append(item)`` or ``obj.extend(list_of_items)``. |
|           | This is primarily used for ``list`` subclasses, but may       |
|           | be used by other classes as long as they have ``append()``    |
|           | and ``extend()`` methods with the appropriate signature.      |
|           | (Whether ``append()`` or ``extend()`` is used depends on      |
|           | which pickle protocol version is used as well as the          |
|           | number of items to append, so both must be supported.)        |
+-----------+---------------------------------------------------------------+
| dictitems | Optional, and new in this PEP.                                |
|           |                                                               |
|           | If this is not ``None``, it should be an iterator (not a      |
|           | sequence!) yielding successive dictionary items, which        |
|           | should be tuples of the form ``(key, value)``. These items    |
|           | will be pickled, and stored to the object using               |
|           | ``obj[key] = value``. This is primarily used for ``dict``     |
|           | subclasses, but may be used by other classes as long          |
|           | as they implement ``__setitem__``.                            |
+-----------+---------------------------------------------------------------+

Note: in Python 2.2 and before, when using cPickle, state would be Note: in Python 2.2 and before, when using ``cPickle``, state would be
pickled if present even if it is None; the only safe way to avoid pickled if present even if it is ``None``; the only safe way to avoid
the __setstate__ call was to return a two-tuple from __reduce__. the ``__setstate__`` call was to return a two-tuple from ``__reduce__``.
(But pickle.py would not pickle state if it was None.) In Python (But ``pickle.py`` would not pickle state if it was ``None``.) In Python
2.3, __setstate__ will never be called at unpickling time when 2.3, ``__setstate__`` will never be called at unpickling time when
__reduce__ returns a state with value None at pickling time. ``__reduce__`` returns a state with value ``None`` at pickling time.
A __reduce__ implementation that needs to work both under Python A ``__reduce__`` implementation that needs to work both under Python
2.2 and under Python 2.3 could check the variable 2.2 and under Python 2.3 could check the variable
pickle.format_version to determine whether to use the listitems ``pickle.format_version`` to determine whether to use the *listitems*
and dictitems features. If this value is >= "2.0" then they are and *dictitems* features. If this value is ``>= "2.0"`` then they are
supported. If not, any list or dict items should be incorporated supported. If not, any list or dict items should be incorporated
somehow in the 'state' return value, and the __setstate__ method somehow in the 'state' return value, and the ``__setstate__`` method
should be prepared to accept list or dict items as part of the should be prepared to accept list or dict items as part of the
state (how this is done is up to the application). state (how this is done is up to the application).
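
The tuple form can be sketched as follows. This is only an illustration, assuming a Python 3 interpreter and code that runs as the main script or an importable module; ``TaggedList`` is a hypothetical class that is not part of the PEP. It returns a *state* dictionary and a *listitems* iterator, and passes ``None`` for *dictitems*.

```python
import pickle


class TaggedList(list):
    """Hypothetical list subclass carrying one extra attribute."""

    def __init__(self, tag="", items=()):
        list.__init__(self, items)
        self.tag = tag

    def __reduce__(self):
        # (function, arguments, state, listitems, dictitems)
        # - TaggedList() creates an empty initial object,
        # - the state dict is applied to the instance __dict__,
        # - the iterator supplies the list items to append or extend,
        # - None for dictitems acts the same as leaving it off.
        return (TaggedList, (), {"tag": self.tag}, iter(self), None)


original = TaggedList("demo", [1, 2, 3])
restored = pickle.loads(pickle.dumps(original, 2))
```

Note that *listitems* is an iterator (``iter(self)``), not the list itself, as the table above requires.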
The __reduce_ex__ API The ``__reduce_ex__`` API
=========================
It is sometimes useful to know the protocol version when It is sometimes useful to know the protocol version when
implementing __reduce__. This can be done by implementing a implementing ``__reduce__``. This can be done by implementing a
method named __reduce_ex__ instead of __reduce__. __reduce_ex__, method named ``__reduce_ex__`` instead of ``__reduce__``. ``__reduce_ex__``,
when it exists, is called in preference over __reduce__ (you may when it exists, is called in preference over ``__reduce__`` (you may
still provide __reduce__ for backwards compatibility). The still provide ``__reduce__`` for backwards compatibility). The
__reduce_ex__ method will be called with a single integer ``__reduce_ex__`` method will be called with a single integer
argument, the protocol version. argument, the protocol version.
The 'object' class implements both __reduce__ and __reduce_ex__; The 'object' class implements both ``__reduce__`` and ``__reduce_ex__``;
however, if a subclass overrides __reduce__ but not __reduce_ex__, however, if a subclass overrides ``__reduce__`` but not ``__reduce_ex__``,
the __reduce_ex__ implementation detects this and calls the ``__reduce_ex__`` implementation detects this and calls
__reduce__. ``__reduce__``.
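
A small sketch of how the protocol number reaches ``__reduce_ex__``, assuming a Python 3 interpreter; ``Point`` and ``seen_protocols`` are names invented for this example.

```python
import pickle

# Records every protocol number that the pickler hands to __reduce_ex__.
seen_protocols = []


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce_ex__(self, protocol):
        # Called in preference to __reduce__, with the protocol in use.
        seen_protocols.append(protocol)
        return (Point, (self.x, self.y))

    def __reduce__(self):
        # Kept only for callers that do not know about __reduce_ex__.
        return (Point, (self.x, self.y))


for proto in (0, 1, 2):
    clone = pickle.loads(pickle.dumps(Point(1, 2), proto))
```

A real implementation would typically branch on ``protocol`` to return a more compact tuple for newer protocols.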
Customizing pickling absent a __reduce__ implementation Customizing pickling absent a ``__reduce__`` implementation
===========================================================
If no __reduce__ implementation is available for a particular If no ``__reduce__`` implementation is available for a particular
class, there are three cases that need to be considered class, there are three cases that need to be considered
separately, because they are handled differently: separately, because they are handled differently:
@ -276,108 +293,113 @@ Customizing pickling absent a __reduce__ implementation
Types implemented in C are considered new-style classes. However, Types implemented in C are considered new-style classes. However,
except for the common built-in types, these need to provide a except for the common built-in types, these need to provide a
__reduce__ implementation in order to be picklable with protocols ``__reduce__`` implementation in order to be picklable with protocols
0 or 1. Protocol 2 supports built-in types providing 0 or 1. Protocol 2 supports built-in types providing
__getnewargs__, __getstate__ and __setstate__ as well. ``__getnewargs__``, ``__getstate__`` and ``__setstate__`` as well.
Case 1: pickling classic class instances Case 1: pickling classic class instances
----------------------------------------
This case is the same for all protocols, and is unchanged from This case is the same for all protocols, and is unchanged from
Python 2.1. Python 2.1.
For classic classes, __reduce__ is not used. Instead, classic For classic classes, ``__reduce__`` is not used. Instead, classic
classes can customize their pickling by providing methods named classes can customize their pickling by providing methods named
__getstate__, __setstate__ and __getinitargs__. Absent these, a ``__getstate__``, ``__setstate__`` and ``__getinitargs__``. Absent these, a
default pickling strategy for classic class instances is default pickling strategy for classic class instances is
implemented that works as long as all instance variables are implemented that works as long as all instance variables are
picklable. This default strategy is documented in terms of picklable. This default strategy is documented in terms of
default implementations of __getstate__ and __setstate__. default implementations of ``__getstate__`` and ``__setstate__``.
The primary ways to customize pickling of classic class instances The primary ways to customize pickling of classic class instances
is by specifying __getstate__ and/or __setstate__ methods. It is is by specifying ``__getstate__`` and/or ``__setstate__`` methods. It is
fine if a class implements one of these but not the other, as long fine if a class implements one of these but not the other, as long
as it is compatible with the default version. as it is compatible with the default version.
The __getstate__ method The ``__getstate__`` method
'''''''''''''''''''''''''''
The __getstate__ method should return a picklable value The ``__getstate__`` method should return a picklable value
representing the object's state without referencing the object representing the object's state without referencing the object
itself. If no __getstate__ method exists, a default itself. If no ``__getstate__`` method exists, a default
implementation is used that returns self.__dict__. implementation is used that returns ``self.__dict__``.
The __setstate__ method The ``__setstate__`` method
'''''''''''''''''''''''''''
The __setstate__ method should take one argument; it will be The ``__setstate__`` method should take one argument; it will be
called with the value returned by __getstate__ (or its default called with the value returned by ``__getstate__`` (or its default
implementation). implementation).
If no __setstate__ method exists, a default implementation is If no ``__setstate__`` method exists, a default implementation is
provided that assumes the state is a dictionary mapping instance provided that assumes the state is a dictionary mapping instance
variable names to values. The default implementation tries two variable names to values. The default implementation tries two
things: things:
- First, it tries to call self.__dict__.update(state). - First, it tries to call ``self.__dict__.update(state)``.
- If the update() call fails with a RuntimeError exception, it - If the ``update()`` call fails with a ``RuntimeError`` exception, it
calls setattr(self, key, value) for each (key, value) pair in calls ``setattr(self, key, value)`` for each ``(key, value)`` pair in
the state dictionary. This only happens when unpickling in the state dictionary. This only happens when unpickling in
restricted execution mode (see the rexec standard library restricted execution mode (see the ``rexec`` standard library
module). module).
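
The two hooks can be sketched as below. Classic classes no longer exist in current Python, so this example assumes Python 3 and an ordinary class; ``Counter`` and its ``cache`` attribute are hypothetical. The pattern of dropping a transient attribute in ``__getstate__`` and rebuilding it in ``__setstate__`` is the same.

```python
import pickle


class Counter(object):
    def __init__(self, start):
        self.count = start
        self.cache = {}  # transient data that should not be pickled

    def __getstate__(self):
        # Start from the default state (the instance __dict__) and
        # leave out the transient attribute.
        state = self.__dict__.copy()
        del state["cache"]
        return state

    def __setstate__(self, state):
        # Mirror the default behaviour, then rebuild what was left out.
        self.__dict__.update(state)
        self.cache = {}


original = Counter(5)
original.cache["expensive"] = 42
clone = pickle.loads(pickle.dumps(original, 2))
```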
The __getinitargs__ method The ``__getinitargs__`` method
''''''''''''''''''''''''''''''
The __setstate__ method (or its default implementation) requires The ``__setstate__`` method (or its default implementation) requires
that a new object already exists so that its __setstate__ method that a new object already exists so that its ``__setstate__`` method
can be called. The point is to create a new object that isn't can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's __init__ method fully initialized; in particular, the class's ``__init__`` method
should not be called if possible. should not be called if possible.
These are the possibilities: These are the possibilities:
- Normally, the following trick is used: create an instance of a - Normally, the following trick is used: create an instance of a
trivial classic class (one without any methods or instance trivial classic class (one without any methods or instance
variables) and then use __class__ assignment to change its variables) and then use ``__class__`` assignment to change its
class to the desired class. This creates an instance of the class to the desired class. This creates an instance of the
desired class with an empty __dict__ whose __init__ has not desired class with an empty ``__dict__`` whose ``__init__`` has not
been called. been called.
- However, if the class has a method named __getinitargs__, the - However, if the class has a method named ``__getinitargs__``, the
above trick is not used, and a class instance is created by above trick is not used, and a class instance is created by
using the tuple returned by __getinitargs__ as an argument using the tuple returned by ``__getinitargs__`` as an argument
list to the class constructor. This is done even if list to the class constructor. This is done even if
__getinitargs__ returns an empty tuple -- a __getinitargs__ ``__getinitargs__`` returns an empty tuple --- a ``__getinitargs__``
method that returns () is not equivalent to not having method that returns ``()`` is not equivalent to not having
__getinitargs__ at all. __getinitargs__ *must* return a ``__getinitargs__`` at all. ``__getinitargs__`` *must* return a
tuple. tuple.
- In restricted execution mode, the trick from the first bullet - In restricted execution mode, the trick from the first bullet
doesn't work; in this case, the class constructor is called doesn't work; in this case, the class constructor is called
with an empty argument list if no __getinitargs__ method with an empty argument list if no ``__getinitargs__`` method
exists. This means that in order for a classic class to be exists. This means that in order for a classic class to be
unpicklable in restricted execution mode, it must either unpicklable in restricted execution mode, it must either
implement __getinitargs__ or its constructor (i.e., its implement ``__getinitargs__`` or its constructor (i.e., its
__init__ method) must be callable without arguments. ``__init__`` method) must be callable without arguments.
Case 2: pickling new-style class instances using protocols 0 or 1 Case 2: pickling new-style class instances using protocols 0 or 1
-----------------------------------------------------------------
This case is unchanged from Python 2.2. For better pickling of This case is unchanged from Python 2.2. For better pickling of
new-style class instances when backwards compatibility is not an new-style class instances when backwards compatibility is not an
issue, protocol 2 should be used; see case 3 below. issue, protocol 2 should be used; see case 3 below.
New-style classes, whether implemented in C or in Python, inherit New-style classes, whether implemented in C or in Python, inherit
a default __reduce__ implementation from the universal base class a default ``__reduce__`` implementation from the universal base class
'object'. 'object'.
This default __reduce__ implementation is not used for those This default ``__reduce__`` implementation is not used for those
built-in types for which the pickle module has built-in support. built-in types for which the ``pickle`` module has built-in support.
Here's a full list of those types: Here's a full list of those types:
- Concrete built-in types: NoneType, bool, int, float, complex, - Concrete built-in types: ``NoneType``, ``bool``, ``int``, ``float``, ``complex``,
str, unicode, tuple, list, dict. (Complex is supported by ``str``, ``unicode``, ``tuple``, ``list``, ``dict``. (Complex is supported by
virtue of a __reduce__ implementation registered in copy_reg.) virtue of a ``__reduce__`` implementation registered in ``copy_reg``.)
In Jython, PyStringMap is also included in this list. In Jython, ``PyStringMap`` is also included in this list.
- Classic instances. - Classic instances.
@ -388,71 +410,72 @@ Case 2: pickling new-style class instances using protocols 0 or 1
same name (the fully qualified module name plus the variable same name (the fully qualified module name plus the variable
name in that module) is substituted. name in that module) is substituted.
The default __reduce__ implementation will fail at pickling time The default ``__reduce__`` implementation will fail at pickling time
for built-in types not mentioned above, and for new-style classes for built-in types not mentioned above, and for new-style classes
implemented in C: if they want to be picklable, they must supply implemented in C: if they want to be picklable, they must supply
a custom __reduce__ implementation under protocols 0 and 1. a custom ``__reduce__`` implementation under protocols 0 and 1.
For new-style classes implemented in Python, the default For new-style classes implemented in Python, the default
__reduce__ implementation (copy_reg._reduce) works as follows: ``__reduce__`` implementation (``copy_reg._reduce``) works as follows:
Let D be the class on the object to be pickled. First, find the Let ``D`` be the class on the object to be pickled. First, find the
nearest base class that is implemented in C (either as a nearest base class that is implemented in C (either as a
built-in type or as a type defined by an extension class). Call built-in type or as a type defined by an extension class). Call
this base class B, and the class of the object to be pickled D. this base class ``B``, and the class of the object to be pickled ``D``.
Unless B is the class 'object', instances of class B must be Unless ``B`` is the class 'object', instances of class ``B`` must be
picklable, either by having built-in support (as defined in the picklable, either by having built-in support (as defined in the
above three bullet points), or by having a non-default above three bullet points), or by having a non-default
__reduce__ implementation. B must not be the same class as D ``__reduce__`` implementation. ``B`` must not be the same class as ``D``
(if it were, it would mean that D is not implemented in Python). (if it were, it would mean that ``D`` is not implemented in Python).

The callable produced by the default ``__reduce__`` is
``copy_reg._reconstructor``, and its arguments tuple is
``(D, B, basestate)``, where ``basestate`` is ``None`` if ``B`` is the builtin
object class, and ``basestate`` is ::

    basestate = B(obj)

if ``B`` is not the builtin object class. This is geared toward
pickling subclasses of builtin types, where, for example,
``list(some_list_subclass_instance)`` produces "the list part" of
the ``list`` subclass instance.

The object is recreated at unpickling time by
``copy_reg._reconstructor``, like so::

    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
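
The two reconstruction steps can be tried out by hand. The sketch below is written for Python 3, where every class is new-style; ``TaggedList`` and ``reconstructor_sketch`` are hypothetical names used only for illustration, and the function is a stand-in rather than the actual ``copy_reg`` code.

```python
class TaggedList(list):
    """Hypothetical list subclass; plays the role of D, with B = list."""


def reconstructor_sketch(D, B, basestate):
    # Hand-written stand-in for the reconstructor described above;
    # this is not the real copy_reg implementation.
    if B is object:
        # There is no C-level base state to restore.
        return object.__new__(D)
    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
    return obj


original = TaggedList([1, 2, 3])
basestate = list(original)          # "the list part" of the instance
rebuilt = reconstructor_sketch(TaggedList, list, basestate)
```

The rebuilt object has class ``TaggedList`` and the same list contents as the original, without ``TaggedList.__init__`` having been called.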

Objects using the default ``__reduce__`` implementation can customize
it by defining ``__getstate__`` and/or ``__setstate__`` methods. These
work almost the same as described for classic classes above, except
that if ``__getstate__`` returns an object (of any type) whose value is
considered false (e.g. ``None``, or a number that is zero, or an empty
sequence or mapping), this state is not pickled and ``__setstate__``
will not be called at all. If ``__getstate__`` exists and returns a
true value, that value becomes the third element of the tuple
returned by the default ``__reduce__``, and at unpickling time the
value is passed to ``__setstate__``. If ``__getstate__`` does not exist,
but ``obj.__dict__`` exists, then ``obj.__dict__`` becomes the third
element of the tuple returned by ``__reduce__``, and again at
unpickling time the value is passed to ``obj.__setstate__``. The
default ``__setstate__`` is the same as that for classic classes,
described above.
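
The effect of a false state can be observed by asking an object for its reduction directly. This is a small sketch for Python 3, which is assumed here to keep the same rule for protocols 0 and 1; ``NoState`` and ``WithState`` are hypothetical classes.

```python
class NoState:
    """Hypothetical class whose __getstate__ returns a false value."""

    def __getstate__(self):
        return {}


class WithState:
    """Hypothetical class whose __getstate__ returns a true value."""

    def __getstate__(self):
        return {"x": 1}

    def __setstate__(self, state):
        self.__dict__.update(state)


# Protocol 1 selects the default __reduce__ path described above.
empty_reduction = NoState().__reduce_ex__(1)
full_reduction = WithState().__reduce_ex__(1)

has_state_slot = len(empty_reduction) > 2    # False: the state is omitted
pickled_state = full_reduction[2]            # the value from __getstate__
```

Because the first reduction has no third element, an unpickler has nothing to pass to ``__setstate__`` and never calls it.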

Note that this strategy ignores slots. Instances of new-style
classes that have slots but no ``__getstate__`` method cannot be
pickled by protocols 0 and 1; the code explicitly checks for
this condition.
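
The explicit check can be triggered as follows. The sketch assumes Python 3, where the same check is still made for protocols 0 and 1; ``Slotted`` is a hypothetical class.

```python
class Slotted:
    """Hypothetical class with slots and no __getstate__."""

    __slots__ = ("a",)


obj = Slotted()
obj.a = 1

try:
    obj.__reduce_ex__(1)      # the default reduction under protocol 1
except TypeError as exc:
    error = exc               # the check rejects the object
else:
    error = None
```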

Note that pickling new-style class instances ignores ``__getinitargs__``
if it exists (and under all protocols). ``__getinitargs__`` is
useful only for classic classes.


Case 3: pickling new-style class instances using protocol 2
-----------------------------------------------------------

Under protocol 2, the default ``__reduce__`` implementation inherited
from the 'object' base class is *ignored*. Instead, a different
default implementation is used, which allows more efficient
pickling of new-style class instances than possible with protocols
@ -460,102 +483,107 @@ Case 3: pickling new-style class instances using protocol 2
(meaning no more than that a protocol 2 pickle cannot be unpickled
before Python 2.3).

The customization uses three special methods: ``__getstate__``,
``__setstate__`` and ``__getnewargs__`` (note that ``__getinitargs__`` is again
ignored). It is fine if a class implements one or more but not all
of these, as long as it is compatible with the default
implementations.

The ``__getstate__`` method
'''''''''''''''''''''''''''

The ``__getstate__`` method should return a picklable value
representing the object's state without referencing the object
itself. If no ``__getstate__`` method exists, a default
implementation is used which is described below.

There's a subtle difference between classic and new-style
classes here: if a classic class's ``__getstate__`` returns ``None``,
``self.__setstate__(None)`` will be called as part of unpickling.
But if a new-style class's ``__getstate__`` returns ``None``, its
``__setstate__`` won't be called at all as part of unpickling.

If no ``__getstate__`` method exists, a default state is computed.
There are several cases:

- For a new-style class that has no instance ``__dict__`` and no
  ``__slots__``, the default state is ``None``.

- For a new-style class that has an instance ``__dict__`` and no
  ``__slots__``, the default state is ``self.__dict__``.

- For a new-style class that has an instance ``__dict__`` and
  ``__slots__``, the default state is a tuple consisting of two
  dictionaries: ``self.__dict__``, and a dictionary mapping slot
  names to slot values. Only slots that have a value are
  included in the latter.

- For a new-style class that has ``__slots__`` and no instance
  ``__dict__``, the default state is a tuple whose first item is
  ``None`` and whose second item is a dictionary mapping slot names
  to slot values, as described in the previous bullet.
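
The four cases can be illustrated with a small sketch. It is written for Python 3 and assumes that the third item of ``obj.__reduce_ex__(2)`` is the default state; all class names and the ``default_state`` helper are hypothetical.

```python
class NoDictNoSlots:
    __slots__ = ()            # no instance __dict__, no slots


class DictOnly:
    pass


class SlotsOnly:
    __slots__ = ("x", "y")


class SlotsAndDict(SlotsOnly):
    pass                      # the subclass regains an instance __dict__


def default_state(obj):
    # The third item of the protocol 2 reduction is the state.
    return obj.__reduce_ex__(2)[2]


d = DictOnly()
d.name = "spam"

s = SlotsOnly()
s.x = 1                       # slot "y" is left without a value

sd = SlotsAndDict()
sd.x = 1
sd.name = "spam"

states = [default_state(o) for o in (NoDictNoSlots(), d, s, sd)]
```

The resulting states follow the bullets in order: ``None``, a dictionary, a ``(None, slots)`` tuple, and a ``(dict, slots)`` tuple. The unset slot ``y`` does not appear in either slot dictionary.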

The ``__setstate__`` method
'''''''''''''''''''''''''''

The ``__setstate__`` method should take one argument; it will be
called with the value returned by ``__getstate__`` or with the
default state described above if no ``__getstate__`` method is
defined.

If no ``__setstate__`` method exists, a default implementation is
provided that can handle the state returned by the default
``__getstate__``, described above.

The ``__getnewargs__`` method
'''''''''''''''''''''''''''''

As for classic classes, the ``__setstate__`` method (or its
default implementation) requires that a new object already
exists so that its ``__setstate__`` method can be called.

In protocol 2, a new pickling opcode is used that causes a new
object to be created as follows::

    obj = C.__new__(C, *args)

where ``C`` is the class of the pickled object, and ``args`` is either
the empty tuple, or the tuple returned by the ``__getnewargs__``
method, if defined. ``__getnewargs__`` must return a tuple. The
absence of a ``__getnewargs__`` method is equivalent to the existence
of one that returns ``()``.
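
A minimal sketch of a class that supplies ``__getnewargs__``, with the object creation step performed by hand. ``Point`` and its attributes are hypothetical; the code is written for Python 3.

```python
class Point:
    """Hypothetical class whose __new__ requires arguments."""

    def __new__(cls, x, y):
        self = object.__new__(cls)
        self.x = x
        self.y = y
        return self

    def __getnewargs__(self):
        # Must be a tuple; it supplies *args for C.__new__(C, *args).
        return (self.x, self.y)


p = Point(3, 4)
args = p.__getnewargs__()
rebuilt = Point.__new__(Point, *args)   # the step the new opcode performs
```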

The ``__newobj__`` unpickling function
======================================

When the unpickling function returned by ``__reduce__`` (the first
item of the returned tuple) has the name ``__newobj__``, something
special happens for pickle protocol 2. An unpickling function
named ``__newobj__`` is assumed to have the following semantics::

    def __newobj__(cls, *args):
        return cls.__new__(cls, *args)

Pickle protocol 2 special-cases an unpickling function with this
name, and emits a pickling opcode that, given 'cls' and 'args',
will return ``cls.__new__(cls, *args)`` without also pickling a
reference to ``__newobj__`` (this is the same pickling opcode used by
protocol 2 for a new-style class instance when no ``__reduce__``
implementation exists). This is the main reason why protocol 2
pickles are much smaller than classic pickles. Of course, the
pickling code cannot verify that a function named ``__newobj__``
actually has the expected semantics. If you use an unpickling
function named ``__newobj__`` that returns something different, you
deserve what you get.

It is safe to use this feature under Python 2.2; there's nothing
in the recommended implementation of ``__newobj__`` that depends on
Python 2.3.
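
The size effect can be observed with a plain ``object()`` instance, which takes the default reduction path under each protocol. This is a sketch for Python 3; it assumes the ``pickle`` module exposes the new opcode as ``pickle.NEWOBJ``.

```python
import pickle

# A plain object() instance takes the default reduction paths.
classic_style = pickle.dumps(object(), 1)   # protocol 1
compact = pickle.dumps(object(), 2)         # protocol 2

uses_newobj = pickle.NEWOBJ in compact
saved_bytes = len(classic_style) - len(compact)
restored = pickle.loads(compact)
```

The protocol 1 pickle names the reconstructor function explicitly, while the protocol 2 pickle contains only the class reference and the new opcode.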


The extension registry
======================

Protocol 2 supports a new mechanism to reduce the size of pickles.
@ -571,8 +599,8 @@ The extension registry
The extension registry allows one to represent the most frequently
used names by small integers, which are pickled very efficiently:
an extension code in the range 1--255 requires only two bytes
including the opcode, one in the range 256--65535 requires only
three bytes including the opcode.
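
The two-byte encoding can be checked with the registry functions of the ``copy_reg`` module. The sketch below is written for Python 3, where that module is named ``copyreg``; the choice of ``fractions.Fraction`` and of code 240 (taken from the private-use range) is arbitrary and only for illustration.

```python
import copyreg      # Python 3 name of the copy_reg module
import pickle
from fractions import Fraction

CODE = 240          # from the range reserved for private use

plain = pickle.dumps(Fraction, 2)           # class pickled by name
copyreg.add_extension("fractions", "Fraction", CODE)
try:
    short = pickle.dumps(Fraction, 2)       # class pickled by code
    restored = pickle.loads(short)
finally:
    # Undo the registration so the sketch leaves no global state behind.
    copyreg.remove_extension("fractions", "Fraction", CODE)
    copyreg.clear_extension_cache()

ext_reference = pickle.EXT1 + bytes([CODE])  # opcode byte + code byte
```

Note that the pickle is only readable by a process that has registered the same code for the same ``(module, name)`` pair, which is why the load happens before the registration is removed.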

One of the design goals of the pickle protocol is to make pickles
@ -616,43 +644,48 @@ The extension registry
Here is the proposed initial assignment of extension code ranges:

=====  =====  =====  =================================================
First  Last   Count  Purpose
=====  =====  =====  =================================================
0      0      1      Reserved --- will never be used
1      127    127    Reserved for Python standard library
128    191    64     Reserved for Zope
192    239    48     Reserved for 3rd parties
240    255    16     Reserved for private use (will never be assigned)
256    *MAX*  *MAX*  Reserved for future assignment
=====  =====  =====  =================================================

*MAX* stands for 2147483647, or ``2**31-1``. This is a hard limitation
of the protocol as currently defined.

At the moment, no specific extension codes have been assigned yet.


Extension registry API
----------------------

The extension registry is maintained as private global variables
in the ``copy_reg`` module. The following three functions are defined
in this module to manipulate the registry:

``add_extension(module, name, code)``
    Register an extension code. The *module* and *name* arguments
    must be strings; *code* must be an ``int`` in the inclusive range 1
    through *MAX*. This must either register a new ``(module, name)``
    pair to a new code, or be a redundant repeat of a previous
    call that was not canceled by a ``remove_extension()`` call; a
    ``(module, name)`` pair may not be mapped to more than one code,
    nor may a code be mapped to more than one ``(module, name)``
    pair.

    .. XXX Aliasing may actually cause a problem for this
       requirement; we'll see as we go.

``remove_extension(module, name, code)``
    Arguments are as for ``add_extension()``. Remove a previously
    registered mapping between ``(module, name)`` and *code*.

``clear_extension_cache()``
    The implementation of extension codes may use a cache to speed
    up loading objects that are named frequently. This cache can
    be emptied (removing references to cached objects) by calling
@ -663,54 +696,56 @@ Extension registry API
The copy module
===============

Traditionally, the ``copy`` module has supported an extended subset of
the pickling APIs for customizing the ``copy()`` and ``deepcopy()``
operations.

In particular, besides checking for a ``__copy__`` or ``__deepcopy__``
method, ``copy()`` and ``deepcopy()`` have always looked for ``__reduce__``,
and for classic classes, have looked for ``__getinitargs__``,
``__getstate__`` and ``__setstate__``.

In Python 2.2, the default ``__reduce__`` inherited from 'object' made
copying simple new-style classes possible, but slots and various
other special cases were not covered.

In Python 2.3, several changes are made to the ``copy`` module:

- ``__reduce_ex__`` is supported (and always called with 2 as the
  protocol version argument).

- The four- and five-argument return values of ``__reduce__`` are
  supported.

- Before looking for a ``__reduce__`` method, the
  ``copy_reg.dispatch_table`` is consulted, just like for pickling.

- When the ``__reduce__`` method is inherited from object, it is
  (unconditionally) replaced by a better one that uses the same
  APIs as pickle protocol 2: ``__getnewargs__``, ``__getstate__``, and
  ``__setstate__``, handling ``list`` and ``dict`` subclasses, and handling
  slots.

As a consequence of the latter change, certain new-style classes
that were copyable under Python 2.2 are not copyable under Python
2.3. (These classes are also not picklable using pickle protocol
2.) A minimal example of such a class::

    class C(object):
        def __new__(cls, a):
            return object.__new__(cls)

The problem only occurs when ``__new__`` is overridden and has at
least one mandatory argument in addition to the class argument.

To fix this, a ``__getnewargs__`` method should be added that returns
the appropriate argument tuple (excluding the class).
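
The failure and the fix can be sketched as follows for Python 3, where the ``copy`` module is assumed to behave the same way. ``Broken`` and ``Fixed`` are hypothetical names, and storing the argument as ``self.a`` is an assumption made so that ``__getnewargs__`` has something to return.

```python
import copy


class Broken:
    """The minimal example above, under a throwaway name."""

    def __new__(cls, a):
        return object.__new__(cls)


class Fixed:
    """Same shape, plus __getnewargs__ (storing `a` is an assumption)."""

    def __new__(cls, a):
        self = object.__new__(cls)
        self.a = a
        return self

    def __getnewargs__(self):
        return (self.a,)      # the argument tuple, excluding the class


try:
    copy.copy(Broken(1))
except TypeError:
    broken_copy_failed = True     # __new__ was called without `a`
else:
    broken_copy_failed = False

duplicate = copy.copy(Fixed(1))
```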


Pickling Python longs
=====================

Pickling and unpickling Python longs takes time quadratic in
the number of digits, in protocols 0 and 1. Under protocol 2,
@ -718,8 +753,9 @@ Pickling Python longs
Pickling bools
==============

Protocol 2 introduces new opcodes for pickling ``True`` and ``False``
directly. Under protocols 0 and 1, bools are pickled as integers,
using a trick in the representation of the integer in the pickle
so that an unpickler can recognize that a bool was intended. That
@ -728,6 +764,7 @@ Pickling bools
Pickling small tuples
=====================

Protocol 2 introduces new opcodes for more-compact pickling of
tuples of lengths 1, 2 and 3. Protocol 1 previously introduced
@ -735,6 +772,7 @@ Pickling small tuples
Protocol identification
=======================

Protocol 2 introduces a new opcode, with which all protocol 2
pickles begin, identifying that the pickle is protocol 2.
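
The leading opcode is easy to inspect. A short sketch for Python 3, assuming the ``pickle`` module exposes the opcode as ``pickle.PROTO``:

```python
import pickle

proto2 = pickle.dumps(None, 2)
proto1 = pickle.dumps(None, 1)

# A protocol 2 pickle starts with the PROTO opcode and the version byte.
starts_with_proto = proto2[:2] == pickle.PROTO + bytes([2])
```

The protocol 1 pickle of the same value carries no such prefix.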
@ -744,6 +782,7 @@ Protocol identification
Pickling of large lists and dicts
=================================

Protocol 1 pickles large lists and dicts "in one piece", which
minimizes pickle size, but requires that unpickling create a temp
@ -752,17 +791,19 @@ Pickling of large lists and dicts
more than 1000 elements each, so that unpickling needn't create
a temp object larger than needed to hold 1000 elements. This
isn't part of protocol 2, however: the opcodes produced are still
part of protocol 1. ``__reduce__`` implementations that return the
optional new listitems or dictitems iterators also benefit from
this unpickling temp-space optimization.
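
The chunking can be made visible by counting opcodes with ``pickletools``. The sketch below is for Python 3 and assumes the pickler still uses a batch size of 1000; the list length of 2500 is arbitrary.

```python
import pickle
import pickletools

data = pickle.dumps(list(range(2500)), 1)   # protocol 1 opcodes only

# Count the APPENDS opcodes; each one closes a batch of list items.
batches = sum(1 for opcode, arg, pos in pickletools.genops(data)
              if opcode.name == "APPENDS")
```

With a batch size of 1000, a list of 2500 items is written as three batches, so the unpickler never needs to hold more than 1000 pending items at once.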


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil