Restify PEP 307 (#229)
parent a53392a0f0
commit a3062a8744
581
pep-0307.txt
|
@@ -5,12 +5,12 @@ Last-Modified: $Date$
|
|||
Author: Guido van Rossum, Tim Peters
|
||||
Status: Final
|
||||
Type: Standards Track
|
||||
Content-Type: text/plain
|
||||
Content-Type: text/x-rst
|
||||
Created: 31-Jan-2003
|
||||
Post-History: 7-Feb-2003
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Pickling new-style objects in Python 2.2 is done somewhat clumsily
|
||||
and causes pickle size to bloat compared to classic class
|
||||
|
@@ -23,7 +23,7 @@ Introduction
|
|||
must be specified. This PEP focuses on API issues, although it
|
||||
may occasionally touch on byte stream format details to motivate a
|
||||
choice. The pickle byte stream format is documented formally by
|
||||
the standard library module pickletools.py (already checked into
|
||||
the standard library module ``pickletools.py`` (already checked into
|
||||
CVS for Python 2.3).
|
||||
|
||||
This PEP attempts to fully document the interface between pickled
|
||||
|
@@ -34,9 +34,10 @@ Introduction
|
|||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Pickling new-style objects causes serious pickle bloat. For
|
||||
example,
|
||||
example::
|
||||
|
||||
class C(object): # Omit "(object)" for classic class
|
||||
pass
|
||||
|
@@ -48,7 +49,7 @@ Motivation
|
|||
the new-style object 86 bytes.
|
||||
|
||||
The reasons for the bloat are complex, but are mostly caused by
|
||||
the fact that new-style objects use __reduce__ in order to be
|
||||
the fact that new-style objects use ``__reduce__`` in order to be
|
||||
picklable at all. After ample consideration we've concluded that
|
||||
the only way to reduce pickle sizes for new-style objects is to
|
||||
add new opcodes to the pickle protocol. The net result is that
|
||||
|
@@ -58,6 +59,7 @@ Motivation
|
|||
|
||||
|
||||
Protocol versions
|
||||
=================
|
||||
|
||||
Previously, pickling (but not unpickling) distinguished between
|
||||
text mode and binary mode. By design, binary mode is a
|
||||
|
@@ -74,8 +76,8 @@ Protocol versions
|
|||
inserted at the start of a protocol 2 pickle indicating that it is
|
||||
using protocol 2. To date, each release of Python has been able to
|
||||
read pickles written by all previous releases. Of course pickles
|
||||
written under protocol N can't be read by versions of Python
|
||||
earlier than the one that introduced protocol N.
|
||||
written under protocol *N* can't be read by versions of Python
|
||||
earlier than the one that introduced protocol *N*.
|
||||
|
||||
Several functions, methods and constructors used for pickling used
|
||||
to take a positional argument named 'bin' which was a flag,
|
||||
|
@@ -90,32 +92,33 @@ Protocol versions
|
|||
This works in previous Python versions, too, and so can be used to
|
||||
select the highest protocol available in a way that's both backward
|
||||
and forward compatible. In addition, a new module constant
|
||||
HIGHEST_PROTOCOL is supplied by both pickle and cPickle, equal to
|
||||
``HIGHEST_PROTOCOL`` is supplied by both ``pickle`` and ``cPickle``, equal to
|
||||
the highest protocol number the module can read. This is cleaner
|
||||
than passing -1, but cannot be used before Python 2.3.
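A minimal sketch of the two ways to select the highest protocol. It assumes Python 3, where ``cPickle`` no longer exists as a separate module; the variable names are illustrative only.

```python
import pickle

data = {"spam": [1, 2, 3]}

# Passing -1 selects the highest protocol the running module supports.
newest_by_flag = pickle.dumps(data, -1)

# HIGHEST_PROTOCOL names the same protocol explicitly.
newest_by_name = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# A protocol 2 pickle begins with the PROTO opcode and the version byte.
proto2 = pickle.dumps(data, 2)
```

Both calls should produce the same byte stream, and the protocol 2 pickle can be recognized by its leading opcode.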
|
||||
|
||||
The pickle.py module has supported passing the 'bin' value as a
|
||||
The ``pickle.py`` module has supported passing the 'bin' value as a
|
||||
keyword argument rather than a positional argument. (This is not
|
||||
recommended, since cPickle only accepts positional arguments, but
|
||||
recommended, since ``cPickle`` only accepts positional arguments, but
|
||||
it works...) Passing 'bin' as a keyword argument is deprecated,
|
||||
and a PendingDeprecationWarning is issued in this case. You have
|
||||
to invoke the Python interpreter with -Wa or a variation on that
|
||||
to see PendingDeprecationWarning messages. In Python 2.4, the
|
||||
warning class may be upgraded to DeprecationWarning.
|
||||
and a ``PendingDeprecationWarning`` is issued in this case. You have
|
||||
to invoke the Python interpreter with ``-Wa`` or a variation on that
|
||||
to see ``PendingDeprecationWarning`` messages. In Python 2.4, the
|
||||
warning class may be upgraded to ``DeprecationWarning``.
|
||||
|
||||
|
||||
Security issues
|
||||
===============
|
||||
|
||||
In previous versions of Python, unpickling would do a "safety
|
||||
check" on certain operations, refusing to call functions or
|
||||
constructors that weren't marked as "safe for unpickling" by
|
||||
either having an attribute __safe_for_unpickling__ set to 1, or by
|
||||
being registered in a global registry, copy_reg.safe_constructors.
|
||||
either having an attribute ``__safe_for_unpickling__`` set to 1, or by
|
||||
being registered in a global registry, ``copy_reg.safe_constructors``.
|
||||
|
||||
This feature gives a false sense of security: nobody has ever done
|
||||
the necessary, extensive, code audit to prove that unpickling
|
||||
untrusted pickles cannot invoke unwanted code, and in fact bugs in
|
||||
the Python 2.2 pickle.py module make it easy to circumvent these
|
||||
the Python 2.2 ``pickle.py`` module make it easy to circumvent these
|
||||
security measures.
|
||||
|
||||
We firmly believe that, on the Internet, it is better to know that
|
||||
|
@@ -127,144 +130,158 @@ Security issues
|
|||
Therefore, as of Python 2.3, all safety checks on unpickling are
|
||||
officially removed, and replaced with this warning:
|
||||
|
||||
*** Do not unpickle data received from an untrusted or
|
||||
unauthenticated source ***
|
||||
.. warning::
|
||||
|
||||
Do not unpickle data received from an untrusted or
|
||||
unauthenticated source.
|
||||
|
||||
The same warning applies to previous Python versions, despite the
|
||||
presence of safety checks there.
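To illustrate why the warning matters: a pickle can name any importable callable together with its arguments, and unpickling calls it. The sketch below assumes Python 3 and uses a hypothetical class with the harmless ``sorted``; a hostile pickle could name something destructive instead.

```python
import pickle

class Innocuous:
    # Hypothetical class: its pickle asks the unpickler to call sorted().
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Innocuous())

# The callable named in the pickle runs here, during loading.
result = pickle.loads(payload)
```

The loaded value is whatever the named callable returned, not an instance of the class that was pickled.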
|
||||
|
||||
|
||||
Extended __reduce__ API
|
||||
Extended ``__reduce__`` API
|
||||
===========================
|
||||
|
||||
There are several APIs that a class can use to control pickling.
|
||||
Perhaps the most popular of these are __getstate__ and
|
||||
__setstate__; but the most powerful one is __reduce__. (There's
|
||||
also __getinitargs__, and we're adding __getnewargs__ below.)
|
||||
Perhaps the most popular of these are ``__getstate__`` and
|
||||
``__setstate__``; but the most powerful one is ``__reduce__``. (There's
|
||||
also ``__getinitargs__``, and we're adding ``__getnewargs__`` below.)
|
||||
|
||||
There are several ways to provide __reduce__ functionality: a
|
||||
class can implement a __reduce__ method or a __reduce_ex__ method
|
||||
There are several ways to provide ``__reduce__`` functionality: a
|
||||
class can implement a ``__reduce__`` method or a ``__reduce_ex__`` method
|
||||
(see next section), or a reduce function can be declared in
|
||||
copy_reg (copy_reg.dispatch_table maps classes to functions). The
|
||||
``copy_reg`` (``copy_reg.dispatch_table`` maps classes to functions). The
|
||||
return values are interpreted exactly the same, though, and we'll
|
||||
refer to these collectively as __reduce__.
|
||||
refer to these collectively as ``__reduce__``.
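As a sketch of the registration route, assuming Python 3 (where the module is spelled ``copyreg``) and a hypothetical ``Point`` class, a reduce function can be declared without touching the class itself:

```python
import copyreg  # spelled copy_reg in Python 2
import pickle

class Point:
    # Hypothetical class used only for illustration.
    def __init__(self, x, y):
        self.x = x
        self.y = y

def reduce_point(point):
    # Same return convention as __reduce__: (callable, arguments).
    return (Point, (point.x, point.y))

# Register the reduce function in the dispatch table for Point.
copyreg.pickle(Point, reduce_point)

restored = pickle.loads(pickle.dumps(Point(1, 2)))
```

This route is convenient when the class comes from code that cannot be modified.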
|
||||
|
||||
IMPORTANT: pickling of classic class instances does not look for a
|
||||
__reduce__ or __reduce_ex__ method or a reduce function in the
|
||||
copy_reg dispatch table, so that a classic class cannot provide
|
||||
__reduce__ functionality in the sense intended here. A classic
|
||||
class must use __getinitargs__ and/or __getstate__ to customize
|
||||
**Important:** pickling of classic class instances does not look for a
|
||||
``__reduce__`` or ``__reduce_ex__`` method or a reduce function in the
|
||||
``copy_reg`` dispatch table, so that a classic class cannot provide
|
||||
``__reduce__`` functionality in the sense intended here. A classic
|
||||
class must use ``__getinitargs__`` and/or ``__getstate__`` to customize
|
||||
pickling. These are described below.
|
||||
|
||||
__reduce__ must return either a string or a tuple. If it returns
|
||||
``__reduce__`` must return either a string or a tuple. If it returns
|
||||
a string, this is an object whose state is not to be pickled, but
|
||||
instead a reference to an equivalent object referenced by name.
|
||||
Surprisingly, the string returned by __reduce__ should be the
|
||||
object's local name (relative to its module); the pickle module
|
||||
Surprisingly, the string returned by ``__reduce__`` should be the
|
||||
object's local name (relative to its module); the ``pickle`` module
|
||||
searches the module namespace to determine the object's module.
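A small sketch of the string form, assuming Python 3 and a hypothetical module-level singleton. The string is the variable's local name, so unpickling yields the existing object rather than a copy.

```python
import pickle

class _Missing:
    # Hypothetical singleton type; the string names the module-level variable.
    def __reduce__(self):
        return "MISSING"

MISSING = _Missing()

# The pickle stores a reference by name, not the object's state.
restored = pickle.loads(pickle.dumps(MISSING))
```

This only works when the name really is bound to that object in its module at both pickling and unpickling time.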
|
||||
|
||||
The rest of this section is concerned with the tuple returned by
|
||||
__reduce__. It is a variable size tuple, of length 2 through 5.
|
||||
``__reduce__``. It is a variable size tuple, of length 2 through 5.
|
||||
The first two items (function and arguments) are required. The
|
||||
remaining items are optional and may be left off from the end;
|
||||
giving None for the value of an optional item acts the same as
|
||||
giving ``None`` for the value of an optional item acts the same as
|
||||
leaving it off. The last two items are new in this PEP. The items
|
||||
are, in order:
|
||||
|
||||
function Required.
|
||||
A callable object (not necessarily a function) called
|
||||
to create the initial version of the object; state
|
||||
may be added to the object later to fully reconstruct
|
||||
the pickled state. This function must itself be
|
||||
picklable. See the section about __newobj__ for a
|
||||
special case (new in this PEP) here.
|
||||
+-----------+---------------------------------------------------------------+
|
||||
| function | Required. |
|
||||
| | |
|
||||
| | A callable object (not necessarily a function) called |
|
||||
| | to create the initial version of the object; state |
|
||||
| | may be added to the object later to fully reconstruct |
|
||||
| | the pickled state. This function must itself be |
|
||||
| | picklable. See the section about ``__newobj__`` for a |
|
||||
| | special case (new in this PEP) here. |
|
||||
+-----------+---------------------------------------------------------------+
|
||||
| arguments | Required. |
|
||||
| | |
|
||||
| | A tuple giving the argument list for the function. |
|
||||
| | As a special case, designed for Zope 2's |
|
||||
| | ``ExtensionClass``, this may be ``None``; in that case, |
|
||||
| | function should be a class or type, and |
|
||||
| | ``function.__basicnew__()`` is called to create the |
|
||||
| | initial version of the object. This exception is |
|
||||
| | deprecated. |
|
||||
+-----------+---------------------------------------------------------------+
|
||||
|
||||
arguments Required.
|
||||
A tuple giving the argument list for the function.
|
||||
As a special case, designed for Zope 2's
|
||||
ExtensionClass, this may be None; in that case,
|
||||
function should be a class or type, and
|
||||
function.__basicnew__() is called to create the
|
||||
initial version of the object. This exception is
|
||||
deprecated.
|
||||
|
||||
Unpickling invokes function(*arguments) to create an initial object,
|
||||
called obj below. If the remaining items are left off, that's the end
|
||||
of unpickling for this object and obj is the result. Else obj is
|
||||
Unpickling invokes ``function(*arguments)`` to create an initial object,
|
||||
called *obj* below. If the remaining items are left off, that's the end
|
||||
of unpickling for this object and *obj* is the result. Else *obj* is
|
||||
modified at unpickling time by each item specified, as follows.
|
||||
|
||||
state Optional.
|
||||
Additional state. If this is not None, the state is
|
||||
pickled, and obj.__setstate__(state) will be called
|
||||
when unpickling. If no __setstate__ method is
|
||||
defined, a default implementation is provided, which
|
||||
assumes that state is a dictionary mapping instance
|
||||
variable names to their values. The default
|
||||
implementation calls
|
||||
+-----------+---------------------------------------------------------------+
|
||||
| state | Optional. |
|
||||
| | |
|
||||
| | Additional state. If this is not ``None``, the state is |
|
||||
| | pickled, and ``obj.__setstate__(state)`` will be called |
|
||||
| | when unpickling. If no ``__setstate__`` method is |
|
||||
| | defined, a default implementation is provided, which |
|
||||
| | assumes that state is a dictionary mapping instance |
|
||||
| | variable names to their values. The default |
|
||||
| | implementation calls :: |
|
||||
| | |
|
||||
| | obj.__dict__.update(state) |
|
||||
| | |
|
||||
| | or, if the ``update()`` call fails, :: |
|
||||
| | |
|
||||
| | for k, v in state.items(): |
|
||||
| | setattr(obj, k, v) |
|
||||
+-----------+---------------------------------------------------------------+
|
||||
| listitems | Optional, and new in this PEP. |
|
||||
| | |
|
||||
| | If this is not ``None``, it should be an iterator (not a |
|
||||
| | sequence!) yielding successive list items. These list |
|
||||
| | items will be pickled, and appended to the object using |
|
||||
| | either ``obj.append(item)`` or ``obj.extend(list_of_items)``. |
|
||||
| | This is primarily used for ``list`` subclasses, but may |
|
||||
| | be used by other classes as long as they have ``append()`` |
|
||||
| | and ``extend()`` methods with the appropriate signature. |
|
||||
| | (Whether ``append()`` or ``extend()`` is used depends on which|
|
||||
| | pickle protocol version is used as well as the number |
|
||||
| | of items to append, so both must be supported.) |
|
||||
+-----------+---------------------------------------------------------------+
|
||||
| dictitems | Optional, and new in this PEP. |
|
||||
| | |
|
||||
| | If this is not ``None``, it should be an iterator (not a |
|
||||
| | sequence!) yielding successive dictionary items, which |
|
||||
| | should be tuples of the form ``(key, value)``. These items |
|
||||
| | will be pickled, and stored to the object using |
|
||||
| | ``obj[key] = value``. This is primarily used for ``dict`` |
|
||||
| | subclasses, but may be used by other classes as long |
|
||||
| | as they implement ``__setitem__``. |
|
||||
+-----------+---------------------------------------------------------------+
|
||||
|
||||
obj.__dict__.update(state)
|
||||
Note: in Python 2.2 and before, when using ``cPickle``, state would be
|
||||
pickled if present even if it is ``None``; the only safe way to avoid
|
||||
the ``__setstate__`` call was to return a two-tuple from ``__reduce__``.
|
||||
(But ``pickle.py`` would not pickle state if it was ``None``.) In Python
|
||||
2.3, ``__setstate__`` will never be called at unpickling time when
|
||||
``__reduce__`` returns a state with value ``None`` at pickling time.
|
||||
|
||||
or, if the update() call fails,
|
||||
|
||||
for k, v in state.items():
|
||||
setattr(obj, k, v)
|
||||
|
||||
listitems Optional, and new in this PEP.
|
||||
If this is not None, it should be an iterator (not a
|
||||
sequence!) yielding successive list items. These list
|
||||
items will be pickled, and appended to the object using
|
||||
either obj.append(item) or obj.extend(list_of_items).
|
||||
This is primarily used for list subclasses, but may
|
||||
be used by other classes as long as they have append()
|
||||
and extend() methods with the appropriate signature.
|
||||
(Whether append() or extend() is used depends on which
|
||||
pickle protocol version is used as well as the number
|
||||
of items to append, so both must be supported.)
|
||||
|
||||
dictitems Optional, and new in this PEP.
|
||||
If this is not None, it should be an iterator (not a
|
||||
sequence!) yielding successive dictionary items, which
|
||||
should be tuples of the form (key, value). These items
|
||||
will be pickled, and stored to the object using
|
||||
obj[key] = value. This is primarily used for dict
|
||||
subclasses, but may be used by other classes as long
|
||||
as they implement __setitem__.
|
||||
|
||||
Note: in Python 2.2 and before, when using cPickle, state would be
|
||||
pickled if present even if it is None; the only safe way to avoid
|
||||
the __setstate__ call was to return a two-tuple from __reduce__.
|
||||
(But pickle.py would not pickle state if it was None.) In Python
|
||||
2.3, __setstate__ will never be called at unpickling time when
|
||||
__reduce__ returns a state with value None at pickling time.
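The tuple layout can be sketched as follows, assuming Python 3 and two hypothetical subclasses. The first passes its elements through *listitems* and an extra attribute through *state*; the second passes its contents through *dictitems*.

```python
import pickle

class TaggedList(list):
    # Hypothetical list subclass carrying one extra attribute.
    def __init__(self, tag=None):
        list.__init__(self)
        self.tag = tag

    def __reduce__(self):
        # (function, arguments, state, listitems, dictitems)
        return (TaggedList, (), {"tag": self.tag}, iter(self), None)

class Registry(dict):
    # Hypothetical dict subclass; its items travel through dictitems.
    def __reduce__(self):
        return (Registry, (), None, None, iter(self.items()))

numbers = TaggedList("primes")
numbers.extend([2, 3, 5])
names = Registry(a=1, b=2)

numbers_copy = pickle.loads(pickle.dumps(numbers, 2))
names_copy = pickle.loads(pickle.dumps(names, 2))
```

Because ``TaggedList`` defines no ``__setstate__``, the state dictionary is applied by the default implementation, which updates the instance ``__dict__``.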
|
||||
|
||||
A __reduce__ implementation that needs to work both under Python
|
||||
A ``__reduce__`` implementation that needs to work both under Python
|
||||
2.2 and under Python 2.3 could check the variable
|
||||
pickle.format_version to determine whether to use the listitems
|
||||
and dictitems features. If this value is >= "2.0" then they are
|
||||
``pickle.format_version`` to determine whether to use the *listitems*
|
||||
and *dictitems* features. If this value is ``>= "2.0"`` then they are
|
||||
supported. If not, any list or dict items should be incorporated
|
||||
somehow in the 'state' return value, and the __setstate__ method
|
||||
somehow in the 'state' return value, and the ``__setstate__`` method
|
||||
should be prepared to accept list or dict items as part of the
|
||||
state (how this is done is up to the application).
|
||||
|
||||
|
||||
The __reduce_ex__ API
|
||||
The ``__reduce_ex__`` API
|
||||
=========================
|
||||
|
||||
It is sometimes useful to know the protocol version when
|
||||
implementing __reduce__. This can be done by implementing a
|
||||
method named __reduce_ex__ instead of __reduce__. __reduce_ex__,
|
||||
when it exists, is called in preference over __reduce__ (you may
|
||||
still provide __reduce__ for backwards compatibility). The
|
||||
__reduce_ex__ method will be called with a single integer
|
||||
implementing ``__reduce__``. This can be done by implementing a
|
||||
method named ``__reduce_ex__`` instead of ``__reduce__``. ``__reduce_ex__``,
|
||||
when it exists, is called in preference over ``__reduce__`` (you may
|
||||
still provide ``__reduce__`` for backwards compatibility). The
|
||||
``__reduce_ex__`` method will be called with a single integer
|
||||
argument, the protocol version.
|
||||
|
||||
The 'object' class implements both __reduce__ and __reduce_ex__;
|
||||
however, if a subclass overrides __reduce__ but not __reduce_ex__,
|
||||
the __reduce_ex__ implementation detects this and calls
|
||||
__reduce__.
|
||||
The 'object' class implements both ``__reduce__`` and ``__reduce_ex__``;
|
||||
however, if a subclass overrides ``__reduce__`` but not ``__reduce_ex__``,
|
||||
the ``__reduce_ex__`` implementation detects this and calls
|
||||
``__reduce__``.
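A minimal sketch, assuming Python 3 and a hypothetical class, showing that ``__reduce_ex__`` receives the protocol number and is preferred when both methods exist:

```python
import pickle

class Versioned:
    # Hypothetical class that records which protocol it was asked to use.
    seen_protocols = []

    def __init__(self, value):
        self.value = value

    def __reduce_ex__(self, protocol):
        Versioned.seen_protocols.append(protocol)
        return (Versioned, (self.value,))

    def __reduce__(self):
        # Kept only for callers that do not know about __reduce_ex__.
        return (Versioned, (self.value,))

for protocol in (0, 1, 2):
    copy = pickle.loads(pickle.dumps(Versioned(42), protocol))
```

A real implementation would typically branch on the protocol argument, for example to return a more compact tuple under protocol 2.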
|
||||
|
||||
|
||||
Customizing pickling absent a __reduce__ implementation
|
||||
Customizing pickling absent a ``__reduce__`` implementation
|
||||
===========================================================
|
||||
|
||||
If no __reduce__ implementation is available for a particular
|
||||
If no ``__reduce__`` implementation is available for a particular
|
||||
class, there are three cases that need to be considered
|
||||
separately, because they are handled differently:
|
||||
|
||||
|
@@ -276,108 +293,113 @@ Customizing pickling absent a __reduce__ implementation
|
|||
|
||||
Types implemented in C are considered new-style classes. However,
|
||||
except for the common built-in types, these need to provide a
|
||||
__reduce__ implementation in order to be picklable with protocols
|
||||
``__reduce__`` implementation in order to be picklable with protocols
|
||||
0 or 1. Protocol 2 supports built-in types providing
|
||||
__getnewargs__, __getstate__ and __setstate__ as well.
|
||||
``__getnewargs__``, ``__getstate__`` and ``__setstate__`` as well.
|
||||
|
||||
|
||||
Case 1: pickling classic class instances
|
||||
----------------------------------------
|
||||
|
||||
This case is the same for all protocols, and is unchanged from
|
||||
Python 2.1.
|
||||
|
||||
For classic classes, __reduce__ is not used. Instead, classic
|
||||
For classic classes, ``__reduce__`` is not used. Instead, classic
|
||||
classes can customize their pickling by providing methods named
|
||||
__getstate__, __setstate__ and __getinitargs__. Absent these, a
|
||||
``__getstate__``, ``__setstate__`` and ``__getinitargs__``. Absent these, a
|
||||
default pickling strategy for classic class instances is
|
||||
implemented that works as long as all instance variables are
|
||||
picklable. This default strategy is documented in terms of
|
||||
default implementations of __getstate__ and __setstate__.
|
||||
default implementations of ``__getstate__`` and ``__setstate__``.
|
||||
|
||||
The primary way to customize pickling of classic class instances
|
||||
is by specifying __getstate__ and/or __setstate__ methods. It is
|
||||
is by specifying ``__getstate__`` and/or ``__setstate__`` methods. It is
|
||||
fine if a class implements one of these but not the other, as long
|
||||
as it is compatible with the default version.
|
||||
|
||||
The __getstate__ method
|
||||
The ``__getstate__`` method
|
||||
'''''''''''''''''''''''''''
|
||||
|
||||
The __getstate__ method should return a picklable value
|
||||
The ``__getstate__`` method should return a picklable value
|
||||
representing the object's state without referencing the object
|
||||
itself. If no __getstate__ method exists, a default
|
||||
implementation is used that returns self.__dict__.
|
||||
itself. If no ``__getstate__`` method exists, a default
|
||||
implementation is used that returns ``self.__dict__``.
|
||||
|
||||
The __setstate__ method
|
||||
The ``__setstate__`` method
|
||||
'''''''''''''''''''''''''''
|
||||
|
||||
The __setstate__ method should take one argument; it will be
|
||||
called with the value returned by __getstate__ (or its default
|
||||
The ``__setstate__`` method should take one argument; it will be
|
||||
called with the value returned by ``__getstate__`` (or its default
|
||||
implementation).
|
||||
|
||||
If no __setstate__ method exists, a default implementation is
|
||||
If no ``__setstate__`` method exists, a default implementation is
|
||||
provided that assumes the state is a dictionary mapping instance
|
||||
variable names to values. The default implementation tries two
|
||||
things:
|
||||
|
||||
- First, it tries to call self.__dict__.update(state).
|
||||
- First, it tries to call ``self.__dict__.update(state)``.
|
||||
|
||||
- If the update() call fails with a RuntimeError exception, it
|
||||
calls setattr(self, key, value) for each (key, value) pair in
|
||||
- If the ``update()`` call fails with a ``RuntimeError`` exception, it
|
||||
calls ``setattr(self, key, value)`` for each ``(key, value)`` pair in
|
||||
the state dictionary. This only happens when unpickling in
|
||||
restricted execution mode (see the rexec standard library
|
||||
restricted execution mode (see the ``rexec`` standard library
|
||||
module).
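The two methods can be sketched together. This example assumes Python 3, which has no classic classes but uses the same method names for ordinary classes; the ``Connection`` class and its attributes are hypothetical.

```python
import pickle

class Connection:
    # Hypothetical class with one attribute that should not be pickled.
    def __init__(self, host):
        self.host = host
        self.handle = object()  # stands in for a resource left out of the pickle

    def __getstate__(self):
        # Return a picklable value that leaves out the handle.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        # Called with whatever __getstate__ returned.
        self.__dict__.update(state)
        self.handle = None  # to be reopened by the application

restored = pickle.loads(pickle.dumps(Connection("example.org")))
```

Note that ``__init__`` is not run on the restored object, so ``__setstate__`` is responsible for any attribute that was left out of the state.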
|
||||
|
||||
The __getinitargs__ method
|
||||
The ``__getinitargs__`` method
|
||||
''''''''''''''''''''''''''''''
|
||||
|
||||
The __setstate__ method (or its default implementation) requires
|
||||
that a new object already exists so that its __setstate__ method
|
||||
The ``__setstate__`` method (or its default implementation) requires
|
||||
that a new object already exists so that its ``__setstate__`` method
|
||||
can be called. The point is to create a new object that isn't
|
||||
fully initialized; in particular, the class's __init__ method
|
||||
fully initialized; in particular, the class's ``__init__`` method
|
||||
should not be called if possible.
|
||||
|
||||
These are the possibilities:
|
||||
|
||||
- Normally, the following trick is used: create an instance of a
|
||||
trivial classic class (one without any methods or instance
|
||||
variables) and then use __class__ assignment to change its
|
||||
variables) and then use ``__class__`` assignment to change its
|
||||
class to the desired class. This creates an instance of the
|
||||
desired class with an empty __dict__ whose __init__ has not
|
||||
desired class with an empty ``__dict__`` whose ``__init__`` has not
|
||||
been called.
|
||||
|
||||
- However, if the class has a method named __getinitargs__, the
|
||||
- However, if the class has a method named ``__getinitargs__``, the
|
||||
above trick is not used, and a class instance is created by
|
||||
using the tuple returned by __getinitargs__ as an argument
|
||||
using the tuple returned by ``__getinitargs__`` as an argument
|
||||
list to the class constructor. This is done even if
|
||||
__getinitargs__ returns an empty tuple -- a __getinitargs__
|
||||
method that returns () is not equivalent to not having
|
||||
__getinitargs__ at all. __getinitargs__ *must* return a
|
||||
``__getinitargs__`` returns an empty tuple --- a ``__getinitargs__``
|
||||
method that returns ``()`` is not equivalent to not having
|
||||
``__getinitargs__`` at all. ``__getinitargs__`` *must* return a
|
||||
tuple.
|
||||
|
||||
- In restricted execution mode, the trick from the first bullet
|
||||
doesn't work; in this case, the class constructor is called
|
||||
with an empty argument list if no __getinitargs__ method
|
||||
with an empty argument list if no ``__getinitargs__`` method
|
||||
exists. This means that in order for a classic class to be
|
||||
unpicklable in restricted execution mode, it must either
|
||||
implement __getinitargs__ or its constructor (i.e., its
|
||||
__init__ method) must be callable without arguments.
|
||||
implement ``__getinitargs__`` or its constructor (i.e., its
|
||||
``__init__`` method) must be callable without arguments.
|
||||
|
||||
|
||||
Case 2: pickling new-style class instances using protocols 0 or 1
|
||||
-----------------------------------------------------------------
|
||||
|
||||
This case is unchanged from Python 2.2. For better pickling of
|
||||
new-style class instances when backwards compatibility is not an
|
||||
issue, protocol 2 should be used; see case 3 below.
|
||||
|
||||
New-style classes, whether implemented in C or in Python, inherit
|
||||
a default __reduce__ implementation from the universal base class
|
||||
a default ``__reduce__`` implementation from the universal base class
|
||||
'object'.
|
||||
|
||||
This default __reduce__ implementation is not used for those
|
||||
built-in types for which the pickle module has built-in support.
|
||||
This default ``__reduce__`` implementation is not used for those
|
||||
built-in types for which the ``pickle`` module has built-in support.
|
||||
Here's a full list of those types:
|
||||
|
||||
- Concrete built-in types: NoneType, bool, int, float, complex,
|
||||
str, unicode, tuple, list, dict. (Complex is supported by
|
||||
virtue of a __reduce__ implementation registered in copy_reg.)
|
||||
In Jython, PyStringMap is also included in this list.
|
||||
- Concrete built-in types: ``NoneType``, ``bool``, ``int``, ``float``, ``complex``,
|
||||
``str``, ``unicode``, ``tuple``, ``list``, ``dict``. (Complex is supported by
|
||||
virtue of a ``__reduce__`` implementation registered in ``copy_reg``.)
|
||||
In Jython, ``PyStringMap`` is also included in this list.
|
||||
|
||||
- Classic instances.
|
||||
|
||||
|
@@ -388,71 +410,72 @@ Case 2: pickling new-style class instances using protocols 0 or 1
|
|||
same name (the fully qualified module name plus the variable
|
||||
name in that module) is substituted.
|
||||
|
||||
The default __reduce__ implementation will fail at pickling time
|
||||
The default ``__reduce__`` implementation will fail at pickling time
|
||||
for built-in types not mentioned above, and for new-style classes
|
||||
implemented in C: if they want to be picklable, they must supply
|
||||
a custom __reduce__ implementation under protocols 0 and 1.
|
||||
a custom ``__reduce__`` implementation under protocols 0 and 1.
|
||||
|
||||
For new-style classes implemented in Python, the default
|
||||
__reduce__ implementation (copy_reg._reduce) works as follows:
|
||||
``__reduce__`` implementation (``copy_reg._reduce``) works as follows:
|
||||
|
||||
Let D be the class on the object to be pickled. First, find the
|
||||
Let ``D`` be the class on the object to be pickled. First, find the
|
||||
nearest base class that is implemented in C (either as a
|
||||
built-in type or as a type defined by an extension class). Call
|
||||
this base class B, and the class of the object to be pickled D.
|
||||
Unless B is the class 'object', instances of class B must be
|
||||
this base class ``B``, and the class of the object to be pickled ``D``.
|
||||
Unless ``B`` is the class 'object', instances of class ``B`` must be
|
||||
picklable, either by having built-in support (as defined in the
|
||||
above three bullet points), or by having a non-default
|
||||
__reduce__ implementation. B must not be the same class as D
|
||||
(if it were, it would mean that D is not implemented in Python).
|
||||
``__reduce__`` implementation. ``B`` must not be the same class as ``D``
|
||||
(if it were, it would mean that ``D`` is not implemented in Python).
|
||||
|
||||
The callable produced by the default __reduce__ is
|
||||
copy_reg._reconstructor, and its arguments tuple is
|
||||
(D, B, basestate), where basestate is None if B is the builtin
|
||||
object class, and basestate is
|
||||
The callable produced by the default ``__reduce__`` is
|
||||
``copy_reg._reconstructor``, and its arguments tuple is
|
||||
``(D, B, basestate)``, where ``basestate`` is ``None`` if ``B`` is the builtin
|
||||
object class, and ``basestate`` is ::
|
||||
|
||||
basestate = B(obj)
|
||||
|
||||
if B is not the builtin object class. This is geared toward
|
||||
if ``B`` is not the builtin object class. This is geared toward
|
||||
pickling subclasses of builtin types, where, for example,
|
||||
list(some_list_subclass_instance) produces "the list part" of
|
||||
the list subclass instance.
|
||||
``list(some_list_subclass_instance)`` produces "the list part" of
|
||||
the ``list`` subclass instance.
|
||||
|
||||
The object is recreated at unpickling time by
|
||||
copy_reg._reconstructor, like so:
|
||||
``copy_reg._reconstructor``, like so::
|
||||
|
||||
obj = B.__new__(D, basestate)
|
||||
B.__init__(obj, basestate)
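The reconstruction step can be sketched in isolation. This is an illustration under Python 3, not the library's actual code: the helper name ``reconstruct`` and both classes are hypothetical, and the branch for ``object`` follows the rule above that *basestate* is ``None`` in that case.

```python
def reconstruct(cls, base, basestate):
    # Hypothetical helper mirroring the two calls shown above.
    if base is object:
        # No base state to restore when the nearest C base is object.
        return object.__new__(cls)
    obj = base.__new__(cls, basestate)
    base.__init__(obj, basestate)
    return obj

class MyList(list):
    pass

class Plain(object):
    pass

original = MyList([1, 2, 3])
basestate = list(original)  # "the list part" of the subclass instance

copy = reconstruct(MyList, list, basestate)
plain = reconstruct(Plain, object, None)
```

Here ``B`` is ``list`` and ``D`` is ``MyList``; the subclass's own ``__init__`` is never called.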
|
||||
|
||||
-Objects using the default __reduce__ implementation can customize
-it by defining __getstate__ and/or __setstate__ methods. These
+Objects using the default ``__reduce__`` implementation can customize
+it by defining ``__getstate__`` and/or ``__setstate__`` methods. These
 work almost the same as described for classic classes above, except
-that if __getstate__ returns an object (of any type) whose value is
-considered false (e.g. None, or a number that is zero, or an empty
-sequence or mapping), this state is not pickled and __setstate__
-will not be called at all. If __getstate__ exists and returns a
+that if ``__getstate__`` returns an object (of any type) whose value is
+considered false (e.g. ``None``, or a number that is zero, or an empty
+sequence or mapping), this state is not pickled and ``__setstate__``
+will not be called at all. If ``__getstate__`` exists and returns a
 true value, that value becomes the third element of the tuple
-returned by the default __reduce__, and at unpickling time the
-value is passed to __setstate__. If __getstate__ does not exist,
-but obj.__dict__ exists, then obj.__dict__ becomes the third
-element of the tuple returned by __reduce__, and again at
-unpickling time the value is passed to obj.__setstate__. The
-default __setstate__ is the same as that for classic classes,
+returned by the default ``__reduce__``, and at unpickling time the
+value is passed to ``__setstate__``. If ``__getstate__`` does not exist,
+but ``obj.__dict__`` exists, then ``obj.__dict__`` becomes the third
+element of the tuple returned by ``__reduce__``, and again at
+unpickling time the value is passed to ``obj.__setstate__``. The
+default ``__setstate__`` is the same as that for classic classes,
 described above.

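The false-state rule can be seen in the reduce tuple itself. This is a minimal sketch, assuming Python 3 and protocol 1; the class names are hypothetical. A missing third element means there is no state to pass to ``__setstate__``.

```python
class EmptyState(object):
    def __getstate__(self):
        return 0           # false value: left out of the reduce tuple

class RealState(object):
    def __getstate__(self):
        return {"x": 1}    # true value: becomes the third element

# Ask the default implementation for its protocol 1 reduce tuples.
empty_reduced = EmptyState().__reduce_ex__(1)
real_reduced = RealState().__reduce_ex__(1)
```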
 Note that this strategy ignores slots. Instances of new-style
-classes that have slots but no __getstate__ method cannot be
+classes that have slots but no ``__getstate__`` method cannot be
 pickled by protocols 0 and 1; the code explicitly checks for
 this condition.

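The explicit check mentioned above can be triggered without writing a pickle. This is a minimal sketch, assuming Python 3; the class ``Slotted`` is hypothetical.

```python
class Slotted(object):
    __slots__ = ("x",)     # non-empty slots and no __getstate__

point = Slotted()
point.x = 1

# The default protocol 1 reduce refuses this object with a TypeError.
try:
    point.__reduce_ex__(1)
    protocol_1_error = None
except TypeError as exc:
    protocol_1_error = exc
```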
-Note that pickling new-style class instances ignores __getinitargs__
-if it exists (and under all protocols). __getinitargs__ is
+Note that pickling new-style class instances ignores ``__getinitargs__``
+if it exists (and under all protocols). ``__getinitargs__`` is
 useful only for classic classes.


 Case 3: pickling new-style class instances using protocol 2
+-----------------------------------------------------------

-Under protocol 2, the default __reduce__ implementation inherited
+Under protocol 2, the default ``__reduce__`` implementation inherited
 from the 'object' base class is *ignored*. Instead, a different
 default implementation is used, which allows more efficient
 pickling of new-style class instances than possible with protocols
@@ -460,102 +483,107 @@ Case 3: pickling new-style class instances using protocol 2
 (meaning no more than that a protocol 2 pickle cannot be unpickled
 before Python 2.3).

-The customization uses three special methods: __getstate__,
-__setstate__ and __getnewargs__ (note that __getinitargs__ is again
+The customization uses three special methods: ``__getstate__``,
+``__setstate__`` and ``__getnewargs__`` (note that ``__getinitargs__`` is again
 ignored). It is fine if a class implements one or more but not all
 of these, as long as it is compatible with the default
 implementations.

-The __getstate__ method
+The ``__getstate__`` method
+'''''''''''''''''''''''''''

-The __getstate__ method should return a picklable value
+The ``__getstate__`` method should return a picklable value
 representing the object's state without referencing the object
-itself. If no __getstate__ method exists, a default
+itself. If no ``__getstate__`` method exists, a default
 implementation is used which is described below.

 There's a subtle difference between classic and new-style
-classes here: if a classic class's __getstate__ returns None,
-self.__setstate__(None) will be called as part of unpickling.
-But if a new-style class's __getstate__ returns None, its
-__setstate__ won't be called at all as part of unpickling.
+classes here: if a classic class's ``__getstate__`` returns ``None``,
+``self.__setstate__(None)`` will be called as part of unpickling.
+But if a new-style class's ``__getstate__`` returns ``None``, its
+``__setstate__`` won't be called at all as part of unpickling.

-If no __getstate__ method exists, a default state is computed.
+If no ``__getstate__`` method exists, a default state is computed.
 There are several cases:

-- For a new-style class that has no instance __dict__ and no
-  __slots__, the default state is None.
+- For a new-style class that has no instance ``__dict__`` and no
+  ``__slots__``, the default state is ``None``.

-- For a new-style class that has an instance __dict__ and no
-  __slots__, the default state is self.__dict__.
+- For a new-style class that has an instance ``__dict__`` and no
+  ``__slots__``, the default state is ``self.__dict__``.

-- For a new-style class that has an instance __dict__ and
-  __slots__, the default state is a tuple consisting of two
-  dictionaries: self.__dict__, and a dictionary mapping slot
+- For a new-style class that has an instance ``__dict__`` and
+  ``__slots__``, the default state is a tuple consisting of two
+  dictionaries: ``self.__dict__``, and a dictionary mapping slot
   names to slot values. Only slots that have a value are
   included in the latter.

-- For a new-style class that has __slots__ and no instance
-  __dict__, the default state is a tuple whose first item is
-  None and whose second item is a dictionary mapping slot names
+- For a new-style class that has ``__slots__`` and no instance
+  ``__dict__``, the default state is a tuple whose first item is
+  ``None`` and whose second item is a dictionary mapping slot names
   to slot values described in the previous bullet.

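The default states listed above can be inspected through the third element of the protocol 2 reduce tuple. This is a minimal sketch, assuming Python 3; all class names and the helper ``default_state`` are hypothetical. The ``Neither`` class uses an empty ``__slots__`` only to suppress the instance ``__dict__``, which is an approximation of the first case.

```python
class DictOnly(object):
    pass

class SlotsOnly(object):
    __slots__ = ("s", "unset")

class DictAndSlots(SlotsOnly):
    pass                   # no __slots__ here, so instances gain a __dict__

class Neither(object):
    __slots__ = ()         # assumption: empty slots stand in for "no __dict__"

def default_state(obj):
    """Return the state element of the protocol 2 reduce tuple."""
    return obj.__reduce_ex__(2)[2]

a = DictOnly()
a.x = 1
b = SlotsOnly()
b.s = 2                    # the slot named "unset" is never assigned
c = DictAndSlots()
c.x = 1
c.s = 2
d = Neither()
```

The slot that was never assigned does not appear in the slot dictionary, matching "only slots that have a value are included".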
-The __setstate__ method
+The ``__setstate__`` method
+'''''''''''''''''''''''''''

-The __setstate__ method should take one argument; it will be
-called with the value returned by __getstate__ or with the
-default state described above if no __getstate__ method is
+The ``__setstate__`` method should take one argument; it will be
+called with the value returned by ``__getstate__`` or with the
+default state described above if no ``__getstate__`` method is
 defined.

-If no __setstate__ method exists, a default implementation is
+If no ``__setstate__`` method exists, a default implementation is
 provided that can handle the state returned by the default
-__getstate__, described above.
+``__getstate__``, described above.

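Both paths, an explicit ``__getstate__``/``__setstate__`` pair and the default state handling, can be exercised with ``copy.copy()``. This is a minimal sketch, assuming Python 3 and assuming that ``copy`` follows the same state rules as pickling; the classes are hypothetical.

```python
import copy

class Interval(object):
    """Hypothetical class whose pickled state differs from its attributes."""

    def __init__(self, start, stop):
        self.start = start
        self.length = stop - start

    def __getstate__(self):
        return (self.start, self.start + self.length)

    def __setstate__(self, state):
        start, stop = state
        self.start = start
        self.length = stop - start

class Base(object):
    __slots__ = ("slot_value",)

class Mixed(Base):
    """Hypothetical class with a __dict__ and a slot, using the defaults."""

interval_copy = copy.copy(Interval(2, 5))

mixed = Mixed()
mixed.slot_value = 1
mixed.dict_value = 2
mixed_copy = copy.copy(mixed)
```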
-The __getnewargs__ method
+The ``__getnewargs__`` method
+'''''''''''''''''''''''''''''

-Like for classic classes, the __setstate__ method (or its
+Like for classic classes, the ``__setstate__`` method (or its
 default implementation) requires that a new object already
-exists so that its __setstate__ method can be called.
+exists so that its ``__setstate__`` method can be called.

 In protocol 2, a new pickling opcode is used that causes a new
-object to be created as follows:
+object to be created as follows::

     obj = C.__new__(C, *args)

-where C is the class of the pickled object, and args is either
-the empty tuple, or the tuple returned by the __getnewargs__
-method, if defined. __getnewargs__ must return a tuple. The
-absence of a __getnewargs__ method is equivalent to the existence
-of one that returns ().
+where ``C`` is the class of the pickled object, and ``args`` is either
+the empty tuple, or the tuple returned by the ``__getnewargs__``
+method, if defined. ``__getnewargs__`` must return a tuple. The
+absence of a ``__getnewargs__`` method is equivalent to the existence
+of one that returns ``()``.


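The role of ``__getnewargs__`` shows up in the argument tuple of the protocol 2 reduce value. This is a minimal sketch, assuming Python 3 (``copyreg`` instead of ``copy_reg``); the class ``Celsius`` is hypothetical.

```python
import copyreg

class Celsius(float):
    """Hypothetical immutable value whose __new__ needs an argument."""

    def __new__(cls, degrees):
        return float.__new__(cls, degrees)

    def __getnewargs__(self):
        return (float(self),)

reduced = Celsius(21.5).__reduce_ex__(2)
func, args = reduced[0], reduced[1]

# args is (C,) + __getnewargs__(); recreate the object as C.__new__(C, *newargs).
rebuilt = args[0].__new__(*args)
```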
-The __newobj__ unpickling function
+The ``__newobj__`` unpickling function
+======================================

-When the unpickling function returned by __reduce__ (the first
-item of the returned tuple) has the name __newobj__, something
+When the unpickling function returned by ``__reduce__`` (the first
+item of the returned tuple) has the name ``__newobj__``, something
 special happens for pickle protocol 2. An unpickling function
-named __newobj__ is assumed to have the following semantics:
+named ``__newobj__`` is assumed to have the following semantics::

     def __newobj__(cls, *args):
         return cls.__new__(cls, *args)

 Pickle protocol 2 special-cases an unpickling function with this
 name, and emits a pickling opcode that, given 'cls' and 'args',
-will return cls.__new__(cls, *args) without also pickling a
-reference to __newobj__ (this is the same pickling opcode used by
-protocol 2 for a new-style class instance when no __reduce__
+will return ``cls.__new__(cls, *args)`` without also pickling a
+reference to ``__newobj__`` (this is the same pickling opcode used by
+protocol 2 for a new-style class instance when no ``__reduce__``
 implementation exists). This is the main reason why protocol 2
 pickles are much smaller than classic pickles. Of course, the
-pickling code cannot verify that a function named __newobj__
+pickling code cannot verify that a function named ``__newobj__``
 actually has the expected semantics. If you use an unpickling
-function named __newobj__ that returns something different, you
+function named ``__newobj__`` that returns something different, you
 deserve what you get.

 It is safe to use this feature under Python 2.2; there's nothing
-in the recommended implementation of __newobj__ that depends on
+in the recommended implementation of ``__newobj__`` that depends on
 Python 2.3.


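The special-casing can be observed by disassembling pickles of the same object under protocols 1 and 2. This is a minimal sketch, assuming Python 3; ``argparse.Namespace`` is used only as a convenient plain class that is importable by name, and ``opcode_names`` is a hypothetical helper.

```python
import argparse
import copyreg
import pickle
import pickletools

def opcode_names(data):
    """List the opcode names in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

obj = argparse.Namespace(x=1)

data_v1 = pickle.dumps(obj, protocol=1)
data_v2 = pickle.dumps(obj, protocol=2)
names_v1 = opcode_names(data_v1)
names_v2 = opcode_names(data_v2)

# The semantics assumed for a function named __newobj__.
fresh = copyreg.__newobj__(argparse.Namespace)
```

Under protocol 2 the pickle carries a ``NEWOBJ`` opcode and no reference to ``__newobj__`` by name, while the protocol 1 pickle names the reconstructor function explicitly.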
 The extension registry
+======================

 Protocol 2 supports a new mechanism to reduce the size of pickles.

@@ -571,8 +599,8 @@ The extension registry

 The extension registry allows one to represent the most frequently
 used names by small integers, which are pickled very efficiently:
-an extension code in the range 1-255 requires only two bytes
-including the opcode, one in the range 256-65535 requires only
+an extension code in the range 1--255 requires only two bytes
+including the opcode, one in the range 256--65535 requires only
 three bytes including the opcode.

 One of the design goals of the pickle protocol is to make pickles
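The byte counts quoted above follow from the opcode layout. This is a minimal sketch, assuming Python 3 and the ``EXT1``, ``EXT2`` and ``EXT4`` opcode constants exposed by the ``pickle`` module; the helper ``encode_extension_code`` is hypothetical, and the four-byte branch goes beyond the two ranges the text mentions.

```python
import pickle
import struct

def encode_extension_code(code):
    """Hypothetical helper: bytes for one extension code, opcode included."""
    if not 1 <= code <= 0x7FFFFFFF:
        raise ValueError("extension code out of range")
    if code <= 0xFF:
        return pickle.EXT1 + struct.pack("<B", code)   # opcode + 1 byte
    if code <= 0xFFFF:
        return pickle.EXT2 + struct.pack("<H", code)   # opcode + 2 bytes
    return pickle.EXT4 + struct.pack("<i", code)       # opcode + 4 bytes
```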
@@ -616,43 +644,48 @@ The extension registry

 Here is the proposed initial assignment of extension code ranges:

+=====  =====  =====  =================================================
 First  Last   Count  Purpose
-
-    0      0      1  Reserved -- will never be used
+=====  =====  =====  =================================================
+    0      0      1  Reserved --- will never be used
     1    127    127  Reserved for Python standard library
   128    191     64  Reserved for Zope
   192    239     48  Reserved for 3rd parties
   240    255     16  Reserved for private use (will never be assigned)
-  256    MAX    MAX  Reserved for future assignment
+  256  *MAX*  *MAX*  Reserved for future assignment
+=====  =====  =====  =================================================

-MAX stands for 2147483647, or 2**31-1. This is a hard limitation
+*MAX* stands for 2147483647, or ``2**31-1``. This is a hard limitation
 of the protocol as currently defined.

 At the moment, no specific extension codes have been assigned yet.


 Extension registry API
+----------------------

 The extension registry is maintained as private global variables
-in the copy_reg module. The following three functions are defined
+in the ``copy_reg`` module. The following three functions are defined
 in this module to manipulate the registry:

-add_extension(module, name, code)
-    Register an extension code. The module and name arguments
-    must be strings; code must be an int in the inclusive range 1
-    through MAX. This must either register a new (module, name)
+``add_extension(module, name, code)``
+    Register an extension code. The *module* and *name* arguments
+    must be strings; *code* must be an ``int`` in the inclusive range 1
+    through *MAX*. This must either register a new ``(module, name)``
     pair to a new code, or be a redundant repeat of a previous
-    call that was not canceled by a remove_extension() call; a
-    (module, name) pair may not be mapped to more than one code,
-    nor may a code be mapped to more than one (module, name)
-    pair. (XXX Aliasing may actually cause a problem for this
-    requirement; we'll see as we go.)
+    call that was not canceled by a ``remove_extension()`` call; a
+    ``(module, name)`` pair may not be mapped to more than one code,
+    nor may a code be mapped to more than one ``(module, name)``
+    pair.

-remove_extension(module, name, code)
-    Arguments are as for add_extension(). Remove a previously
-    registered mapping between (module, name) and code.
+    .. XXX Aliasing may actually cause a problem for this
+       requirement; we'll see as we go.

-clear_extension_cache()
+``remove_extension(module, name, code)``
+    Arguments are as for ``add_extension()``. Remove a previously
+    registered mapping between ``(module, name)`` and *code*.
+
+``clear_extension_cache()``
     The implementation of extension codes may use a cache to speed
     up loading objects that are named frequently. This cache can
     be emptied (removing references to cached objects) by calling
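The three registry functions can be tried end to end. This is a minimal sketch, assuming Python 3, where the module is named ``copyreg``; the choice of ``collections.OrderedDict`` and of code 240 (from the private-use range) is arbitrary, and ``opcode_names`` is a hypothetical helper.

```python
import collections
import copyreg
import pickle
import pickletools

MODULE, NAME, CODE = "collections", "OrderedDict", 240  # private-use code

def opcode_names(data):
    """List the opcode names in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

copyreg.add_extension(MODULE, NAME, CODE)
try:
    copyreg.add_extension(MODULE, NAME, CODE)   # redundant repeat is allowed
    try:
        copyreg.add_extension(MODULE, NAME, CODE + 1)   # second code: rejected
        remap_error = None
    except ValueError as exc:
        remap_error = exc
    with_code = pickle.dumps(collections.OrderedDict, protocol=2)
    loaded = pickle.loads(with_code)
finally:
    # Undo the registration so the global registry is left untouched.
    copyreg.remove_extension(MODULE, NAME, CODE)
    copyreg.clear_extension_cache()

without_code = pickle.dumps(collections.OrderedDict, protocol=2)
```

While the code is registered, the class is pickled as an ``EXT1`` opcode instead of its module and name, so the pickle is shorter.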
@@ -663,54 +696,56 @@ Extension registry API


 The copy module
+===============

-Traditionally, the copy module has supported an extended subset of
-the pickling APIs for customizing the copy() and deepcopy()
+Traditionally, the ``copy`` module has supported an extended subset of
+the pickling APIs for customizing the ``copy()`` and ``deepcopy()``
 operations.

-In particular, besides checking for a __copy__ or __deepcopy__
-method, copy() and deepcopy() have always looked for __reduce__,
-and for classic classes, have looked for __getinitargs__,
-__getstate__ and __setstate__.
+In particular, besides checking for a ``__copy__`` or ``__deepcopy__``
+method, ``copy()`` and ``deepcopy()`` have always looked for ``__reduce__``,
+and for classic classes, have looked for ``__getinitargs__``,
+``__getstate__`` and ``__setstate__``.

-In Python 2.2, the default __reduce__ inherited from 'object' made
+In Python 2.2, the default ``__reduce__`` inherited from 'object' made
 copying simple new-style classes possible, but slots and various
 other special cases were not covered.

-In Python 2.3, several changes are made to the copy module:
+In Python 2.3, several changes are made to the ``copy`` module:

-- __reduce_ex__ is supported (and always called with 2 as the
+- ``__reduce_ex__`` is supported (and always called with 2 as the
   protocol version argument).

-- The four- and five-argument return values of __reduce__ are
+- The four- and five-argument return values of ``__reduce__`` are
   supported.

-- Before looking for a __reduce__ method, the
-  copy_reg.dispatch_table is consulted, just like for pickling.
+- Before looking for a ``__reduce__`` method, the
+  ``copy_reg.dispatch_table`` is consulted, just like for pickling.

-- When the __reduce__ method is inherited from object, it is
+- When the ``__reduce__`` method is inherited from object, it is
   (unconditionally) replaced by a better one that uses the same
-  APIs as pickle protocol 2: __getnewargs__, __getstate__, and
-  __setstate__, handling list and dict subclasses, and handling
+  APIs as pickle protocol 2: ``__getnewargs__``, ``__getstate__``, and
+  ``__setstate__``, handling ``list`` and ``dict`` subclasses, and handling
   slots.

 As a consequence of the latter change, certain new-style classes
 that were copyable under Python 2.2 are not copyable under Python
 2.3. (These classes are also not picklable using pickle protocol
-2.) A minimal example of such a class:
+2.) A minimal example of such a class::

     class C(object):
         def __new__(cls, a):
             return object.__new__(cls)

-The problem only occurs when __new__ is overridden and has at
+The problem only occurs when ``__new__`` is overridden and has at
 least one mandatory argument in addition to the class argument.

-To fix this, a __getnewargs__ method should be added that returns
+To fix this, a ``__getnewargs__`` method should be added that returns
 the appropriate argument tuple (excluding the class).


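The failure and its fix can be reproduced with ``copy.copy()``. This is a minimal sketch, assuming Python 3; the class names are hypothetical, and storing the argument on the instance is an addition made only so that ``__getnewargs__`` has something to return.

```python
import copy

class NeedsArg(object):
    def __new__(cls, a):
        self = object.__new__(cls)
        self.a = a             # remember the argument for later
        return self

class NeedsArgFixed(NeedsArg):
    def __getnewargs__(self):
        return (self.a,)       # argument tuple, excluding the class

# Without __getnewargs__, copying calls __new__ with no arguments and fails.
try:
    copy.copy(NeedsArg(1))
    copy_error = None
except TypeError as exc:
    copy_error = exc

duplicate = copy.copy(NeedsArgFixed(1))
```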
 Pickling Python longs
+=====================

 Pickling and unpickling Python longs takes time quadratic in
 the number of digits, in protocols 0 and 1. Under protocol 2,
@@ -718,8 +753,9 @@ Pickling Python longs


 Pickling bools
+==============

-Protocol 2 introduces new opcodes for pickling True and False
+Protocol 2 introduces new opcodes for pickling ``True`` and ``False``
 directly. Under protocols 0 and 1, bools are pickled as integers,
 using a trick in the representation of the integer in the pickle
 so that an unpickler can recognize that a bool was intended. That
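The difference between the two encodings is visible in the raw bytes. This is a minimal sketch, assuming Python 3 and the opcode constants exposed by the ``pickle`` module.

```python
import pickle

# Protocol 2: a protocol header followed by a dedicated one-byte opcode.
protocol_2_true = pickle.dumps(True, protocol=2)
protocol_2_false = pickle.dumps(False, protocol=2)

# Protocol 1: an integer opcode whose text argument "01" marks a bool.
protocol_1_true = pickle.dumps(True, protocol=1)
```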
@@ -728,6 +764,7 @@ Pickling bools


 Pickling small tuples
+=====================

 Protocol 2 introduces new opcodes for more-compact pickling of
 tuples of lengths 1, 2 and 3. Protocol 1 previously introduced
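The small-tuple opcodes can be listed with ``pickletools``. This is a minimal sketch, assuming Python 3; ``opcode_names`` is a hypothetical helper.

```python
import pickle
import pickletools

def opcode_names(data):
    """List the opcode names in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

names_len_1 = opcode_names(pickle.dumps((1,), protocol=2))
names_len_2 = opcode_names(pickle.dumps((1, 2), protocol=2))
names_len_3 = opcode_names(pickle.dumps((1, 2, 3), protocol=2))

# Protocol 1 has no such shortcut and brackets the items with MARK ... TUPLE.
names_protocol_1 = opcode_names(pickle.dumps((1, 2), protocol=1))
```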
@@ -735,6 +772,7 @@ Pickling small tuples


 Protocol identification
+=======================

 Protocol 2 introduces a new opcode, with which all protocol 2
 pickles begin, identifying that the pickle is protocol 2.
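The identifying opcode can be read back from the start of any protocol 2 pickle. This is a minimal sketch, assuming Python 3; the pickled value is arbitrary.

```python
import pickle
import pickletools

# Take only the first opcode of each pickle.
first_v2, arg_v2, _ = next(pickletools.genops(pickle.dumps([1, 2], protocol=2)))
first_v1, arg_v1, _ = next(pickletools.genops(pickle.dumps([1, 2], protocol=1)))
```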
@@ -744,6 +782,7 @@ Protocol identification


 Pickling of large lists and dicts
+=================================

 Protocol 1 pickles large lists and dicts "in one piece", which
 minimizes pickle size, but requires that unpickling create a temp
@@ -752,17 +791,19 @@ Pickling of large lists and dicts
 more than 1000 elements each, so that unpickling needn't create
 a temp object larger than needed to hold 1000 elements. This
 isn't part of protocol 2, however: the opcodes produced are still
-part of protocol 1. __reduce__ implementations that return the
+part of protocol 1. ``__reduce__`` implementations that return the
 optional new listitems or dictitems iterators also benefit from
 this unpickling temp-space optimization.


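The batching can be counted in the opcode stream. This is a minimal sketch, assuming Python 3 and assuming the implementation still uses a batch size of 1000; the list length of 2500 is arbitrary.

```python
import pickle
import pickletools

data = pickle.dumps(list(range(2500)), protocol=1)

# Each batch of list items is closed by one APPENDS opcode.
batch_count = sum(
    1 for opcode, arg, pos in pickletools.genops(data)
    if opcode.name == "APPENDS"
)
```

With 2500 items this yields batches of 1000, 1000 and 500 elements.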
 Copyright
+=========

 This document has been placed in the public domain.



+..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil