Refactored according to 3 main cases.

Guido van Rossum 2003-02-04 17:53:55 +00:00
parent 6b1fc6f8c0
commit eb93e1fcdc
1 changed files with 243 additions and 148 deletions


@ -216,16 +216,249 @@ Extended __reduce__ API
state (how this is done is up to the application).
XXX Refactoring needed
Customizing pickling absent a __reduce__ implementation
The following sections should really be reorganized according to
the following cases:
If no __reduce__ implementation is available for a particular
class, there are three cases that need to be considered
separately, because they are handled differently:
1. classic classes, all protocols
1. classic class instances, all protocols
2. new-style classes, protocols 0 and 1
2. new-style class instances, protocols 0 and 1
3. new-style classes, protocol 2
3. new-style class instances, protocol 2
Types implemented in C are considered new-style classes. However,
except for the common built-in types, these need to provide a
__reduce__ implementation in order to be picklable with protocols
0 or 1. Protocol 2 supports built-in types providing
__getnewargs__, __getstate__ and __setstate__ as well.
Case 1: pickling classic class instances
This case is the same for all protocols, and is unchanged from
Python 2.1.
For classic classes, __reduce__ is not used. Instead, classic
classes can customize their pickling by providing methods named
__getstate__, __setstate__ and __getinitargs__. Absent these, a
default pickling strategy for classic class instances is
implemented that works as long as all instance variables are
picklable. This default strategy is documented in terms of
default implementations of __getstate__ and __setstate__.
The primary way to customize pickling of classic class instances
is by specifying __getstate__ and/or __setstate__ methods. It is
fine if a class implements one of these but not the other, as long
as it is compatible with the default version.
The __getstate__ method
The __getstate__ method should return a picklable value
representing the object's state without referencing the object
itself. If no __getstate__ method exists, a default
implementation is used that returns self.__dict__.
The __setstate__ method
The __setstate__ method should take one argument; it will be
called with the value returned by __getstate__ (or its default
implementation).
If no __setstate__ method exists, a default implementation is
provided that assumes the state is a dictionary mapping instance
variable names to values. The default implementation tries two
things:
- First, it tries to call self.__dict__.update(state).
- If the update() call fails with a RuntimeError exception, it
calls setattr(self, key, value) for each (key, value) pair in
the state dictionary. This only happens when unpickling in
restricted execution mode (see the rexec standard library
module).
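The __getstate__/__setstate__ contract described above can be sketched as
follows. This is an illustration only: it assumes a current Python 3
interpreter, where classic classes no longer exist, so an ordinary class
stands in for one. The class name Connection and its attributes are
hypothetical.

```python
import pickle


class Connection:
    """Hypothetical class with one attribute that must not be pickled."""

    def __init__(self, host):
        self.host = host
        self.handle = lambda: None  # a lambda cannot be pickled

    def __getstate__(self):
        # Return a picklable value that does not reference self:
        # a copy of the instance dictionary without the handle.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        # Called with whatever __getstate__ returned.
        self.__dict__.update(state)
        self.handle = None  # left for the application to re-create


# Assumption: run as a script so that pickle can find the class by name.
original = Connection("example.org")
restored = pickle.loads(pickle.dumps(original))
```

Dropping the __setstate__ method here would still work, because the state
returned by __getstate__ is a dictionary that the default implementation
can handle.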
The __getinitargs__ method
The __setstate__ method (or its default implementation) requires
that a new object already exists so that its __setstate__ method
can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's __init__ method
should not be called if possible.
These are the possibilities:
- Normally, the following trick is used: create an instance of a
trivial classic class (one without any methods or instance
variables) and then use __class__ assignment to change its
class to the desired class. This creates an instance of the
desired class with an empty __dict__ whose __init__ has not
been called.
- However, if the class has a method named __getinitargs__, the
above trick is not used, and a class instance is created by
using the tuple returned by __getinitargs__ as an argument
list to the class constructor. This is done even if
__getinitargs__ returns an empty tuple -- a __getinitargs__
method that returns () is not equivalent to not having
__getinitargs__ at all. __getinitargs__ *must* return a
tuple.
- In restricted execution mode, the trick from the first bullet
doesn't work; in this case, the class constructor is called
with an empty argument list if no __getinitargs__ method
exists. This means that in order for a classic class instance to
be unpickled in restricted execution mode, the class must either
implement __getinitargs__ or its constructor (i.e., its
__init__ method) must be callable without arguments.
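The first two possibilities above can be sketched like this. It is a
minimal sketch, assuming Python 3, where __class__ assignment between
ordinary Python classes behaves like the classic-class trick. The helper
names make_uninitialized and make_instance are hypothetical and are not
part of the pickle module.

```python
class _Empty:
    """Trivial class with no methods or instance variables."""


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def make_uninitialized(cls):
    # Create a trivial instance, then switch its class. The result is
    # an instance of cls with an empty __dict__; __init__ never ran.
    obj = _Empty()
    obj.__class__ = cls
    return obj


def make_instance(cls, initargs=None):
    # initargs plays the role of the tuple returned by __getinitargs__.
    # None models "no such method", which is not the same as ().
    if initargs is None:
        return make_uninitialized(cls)
    return cls(*initargs)


blank = make_uninitialized(Point)
built = make_instance(Point, (1, 2))
```

Note that make_instance(Point, ()) would call Point() and fail, which
mirrors the point that returning () differs from not defining
__getinitargs__ at all.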
Case 2: pickling new-style class instances using protocols 0 or 1
This case is unchanged from Python 2.2. For better pickling of
new-style class instances when backwards compatibility is not an
issue, protocol 2 should be used; see case 3 below.
New-style classes, whether implemented in C or in Python, inherit
a default __reduce__ implementation from the universal base class
'object'.
This default __reduce__ implementation is not used for those
built-in types for which the pickle module has built-in support.
Here's a full list of those types:
- Concrete built-in types: NoneType, bool, int, float, complex,
str, unicode, tuple, list, dict. (Complex is supported by
virtue of a __reduce__ implementation registered in copy_reg.)
In Jython, PyStringMap is also included in this list.
- Classic instances.
- Classic class objects, Python function objects, built-in
function and method objects, and new-style type objects (==
new-style class objects). These are pickled by name, not by
value: at unpickling time, a reference to an object with the
same name (the fully qualified module name plus the variable
name in that module) is substituted.
The default __reduce__ implementation will fail at pickling time
for built-in types not mentioned above.
For new-style classes implemented in Python, the default
__reduce__ implementation works as follows:
Let D be the class of the object to be pickled. First, find the
nearest base class that is implemented in C (either as a
built-in type or as a type defined by an extension class). Call
this base class B.
Unless B is the class 'object', instances of class B must be
picklable, either by having built-in support (as defined in the
above three bullet points), or by having a non-default
__reduce__ implementation. B must not be the same class as D
(if it were, it would mean that D is not implemented in Python).
The new object is created at unpickling time using the following
code:
obj = B.__new__(D, state)
B.__init__(obj, state)
where state is a value computed at pickling time as follows:
state = B(obj)
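The two code fragments above can be tried out directly. The sketch below
assumes Python 3 and hard-codes B as list instead of searching the base
classes; the subclass name Stack is hypothetical.

```python
import pickle


class Stack(list):
    """Hypothetical Python-level subclass (D) of the C type list (B)."""


B, D = list, Stack
original = Stack([1, 2, 3])

# Pickling time: the state is a plain B built from the object.
state = B(original)

# Unpickling time: allocate a D using B's __new__, then let B's
# __init__ fill it in from the state.
obj = B.__new__(D, state)
B.__init__(obj, state)

# Assumption: run as a script so that pickle can find Stack by name.
round_tripped = pickle.loads(pickle.dumps(original, 1))
```

Because state is a plain list, it is picklable through the built-in
support, which is the requirement placed on B above.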
Objects for which this default __reduce__ implementation is used
can customize it by defining __getstate__ and/or __setstate__
methods. These work almost the same as described for classic
classes above, except that if __getstate__ returns an object (of
any type) whose value is considered false (e.g. None, or a number
that is zero, or an empty sequence or mapping), this state is not
pickled and __setstate__ will not be called at all.
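The false-state rule can be checked with a small probe. This is a
sketch, assuming a current CPython 3 interpreter and requesting protocol
1 explicitly, since the rule belongs to this case; behavior under other
protocols is not claimed here. Probe and setstate_was_called are
hypothetical names.

```python
import pickle


class Probe:
    """Hypothetical class that records whether __setstate__ ran."""

    def __init__(self, state):
        self._state = state

    def __getstate__(self):
        return self._state

    def __setstate__(self, state):
        self.restored_with = state


def setstate_was_called(state, protocol=1):
    # Assumption: run as a script so that pickle can find Probe by name.
    clone = pickle.loads(pickle.dumps(Probe(state), protocol))
    return hasattr(clone, "restored_with")


called_for_dict = setstate_was_called({"a": 1})
called_for_empty = setstate_was_called({})
called_for_none = setstate_was_called(None)
```

A class that relies on __setstate__ for side effects therefore has to make
sure its __getstate__ never returns a false value under these protocols.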
Note that this strategy ignores slots. New-style classes that
define slots and don't define __getstate__ in the same class that
defines the slots automatically have a __getstate__ method added
that raises TypeError.
Case 3: pickling new-style class instances using protocol 2
Under protocol 2, the default __reduce__ implementation inherited
from the 'object' base class is *ignored*. Instead, a different
default implementation is used, which allows more efficient
pickling of new-style class instances than is possible with protocols
0 or 1, at the cost of backward incompatibility with Python 2.2.
The customization uses three special methods: __getstate__,
__setstate__ and __getnewargs__. It is fine if a class implements
one or more but not all of these, as long as it is compatible with
the default implementations.
The __getstate__ method
The __getstate__ method should return a picklable value
representing the object's state without referencing the object
itself. If no __getstate__ method exists, a default
implementation is used which is described below.
There's a subtle difference between classic and new-style
classes here: if a classic class's __getstate__ returns None,
self.__setstate__(None) will be called as part of unpickling.
But if a new-style class's __getstate__ returns None, its
__setstate__ won't be called at all as part of unpickling.
If no __getstate__ method exists, a default state is assumed.
There are several cases:
- For a new-style class that has an instance __dict__ and no
__slots__, the default state is self.__dict__.
- For a new-style class that has no instance __dict__ and no
__slots__, the default state is None.
- For a new-style class that has an instance __dict__ and
__slots__, the default state is a tuple consisting of two
dictionaries: the first being self.__dict__, and the second
being a dictionary mapping slot names to slot values. Only
slots that have a value are included in the latter.
- For a new-style class that has __slots__ and no instance
__dict__, the default state is a tuple whose first item is
None and whose second item is a dictionary mapping slot names
to slot values, as described in the previous bullet.
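The four default states listed above can be inspected through
__reduce_ex__(2), whose third item is the state. This is a sketch,
assuming Python 3, where the same defaults are produced by the 'object'
base class; all class names are hypothetical.

```python
class WithDict:
    def __init__(self):
        self.a = 1


class Bare:
    __slots__ = ()  # no instance __dict__ and no slots with values


class SlotsOnly:
    __slots__ = ("x", "y")

    def __init__(self):
        self.x = 1  # y is deliberately left unset


class SlotsAndDict(SlotsOnly):
    # No __slots__ declared here, so instances also get a __dict__.
    def __init__(self):
        super().__init__()
        self.b = 2


def default_state(obj):
    # __reduce_ex__(2) returns (callable, args, state, ...).
    return obj.__reduce_ex__(2)[2]


states = {
    "dict only": default_state(WithDict()),
    "neither": default_state(Bare()),
    "both": default_state(SlotsAndDict()),
    "slots only": default_state(SlotsOnly()),
}
```

The unset slot y does not appear in either slot dictionary, matching the
rule that only slots that have a value are included.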
Note that new-style classes that define slots and don't define
__getstate__ in the same class that defines the slots
automatically have a __getstate__ method added that raises
TypeError. Protocol 2 ignores this __getstate__ method
(recognized by the specific text of the error message).
The __setstate__ method
The __setstate__ method should take one argument; it will be called
with the value returned by __getstate__ or with the default
state described above if no __getstate__ method is defined.
If no __setstate__ method exists, a default implementation is
provided that can handle the state returned by the default
__getstate__, described above.
The __getnewargs__ method
As for classic classes, the __setstate__ method (or its
default implementation) requires that a new object already
exists so that its __setstate__ method can be called.
In protocol 2, a new pickling opcode is used that causes a new
object to be created as follows:
obj = C.__new__(C, *args)
where args is either the empty tuple, or the tuple returned by
the __getnewargs__ method, if defined. __getnewargs__ must
return a tuple. The absence of a __getnewargs__ method is
equivalent to the existence of one that returns ().
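A class whose __new__ requires arguments shows why __getnewargs__ exists.
The following is a minimal sketch, assuming Python 3 and protocol 2; the
class name Version is hypothetical.

```python
import pickle


class Version:
    """Hypothetical class whose __new__ requires arguments."""

    def __new__(cls, major, minor):
        self = super().__new__(cls)
        self.major = major
        self.minor = minor
        return self

    def __getnewargs__(self):
        # Must return a tuple; it becomes *args in C.__new__(C, *args).
        return (self.major, self.minor)


# Assumption: run as a script so that pickle can find Version by name.
original = Version(2, 3)
restored = pickle.loads(pickle.dumps(original, 2))
```

Without __getnewargs__, unpickling would call Version.__new__(Version)
with no arguments, which this __new__ does not accept.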
The __newobj__ unpickling function
@ -241,8 +474,10 @@ The __newobj__ unpickling function
Pickle protocol 2 special-cases an unpickling function with this
name, and emits a pickling opcode that, given 'cls' and 'args',
will return cls.__new__(cls, *args) without also pickling a
reference to __newobj__. This is the main reason why protocol 2
pickles are so much smaller than classic pickles. Of course, the
reference to __newobj__ (this is the same pickling opcode used by
protocol 2 for a new-style class instance when no __reduce__
implementation exists). This is the main reason why protocol 2
pickles are much smaller than classic pickles. Of course, the
pickling code cannot verify that a function named __newobj__
actually has the expected semantics. If you use an unpickling
function named __newobj__ that returns something different, you
@ -253,146 +488,6 @@ The __newobj__ unpickling function
Python 2.3.
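The special-casing of __newobj__ can be observed by comparing the opcodes
of a protocol 2 pickle with those of a protocol 1 pickle. This is a
sketch, assuming Python 3, where the function is available as
copyreg.__newobj__; the class Token and the helper opcode_names are
hypothetical.

```python
import copyreg
import pickle
import pickletools


class Token:
    """Hypothetical class that reduces itself via __newobj__."""

    def __new__(cls, text):
        self = super().__new__(cls)
        self.text = text
        return self

    def __reduce__(self):
        # (unpickling function, (cls, *args))
        return copyreg.__newobj__, (Token, self.text)


def opcode_names(data):
    # List the opcode names found in a pickle byte string.
    return [op.name for op, arg, pos in pickletools.genops(data)]


# Assumption: run as a script so that pickle can find Token by name.
proto2 = pickle.dumps(Token("spam"), 2)
proto1 = pickle.dumps(Token("spam"), 1)
```

In the protocol 2 pickle the name __newobj__ is absent and a NEWOBJ opcode
takes its place; the protocol 1 pickle stores a reference to the function
and calls it with REDUCE.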
The __getstate__ and __setstate__ methods
When there is no __reduce__ for an object, the primary way to
customize pickling is by specifying __getstate__ and/or
__setstate__ methods. These are supported for classic classes as
well as for new-style classes for which no __reduce__ exists.
When __reduce__ exists, __getstate__ is not called (unless your
__reduce__ implementation calls it), but __setstate__ will be
called with the third item from the tuple returned by __reduce__,
if not None.
There's a subtle difference between classic and new-style classes
here: if a classic class's __getstate__ returns None,
self.__setstate__(None) will be called as part of unpickling. But
if a new-style class's __getstate__ returns None, its __setstate__
won't be called at all as part of unpickling.
The __getstate__ method is supposed to return a picklable version
of an object's state that does not reference the object itself.
If no __getstate__ method exists, a default state is assumed.
There are several cases:
- For a classic class, the default state is self.__dict__.
- For a new-style class that has an instance __dict__ and no
__slots__, the default state is self.__dict__.
- For a new-style class that has no instance __dict__ and no
__slots__, the default state is None.
- For a new-style class that has an instance __dict__ and
__slots__, the default state is a tuple consisting of two
dictionaries: the first being self.__dict__, and the second
being a dictionary mapping slot names to slot values. Only
slots that have a value are included in the latter.
- For a new-style class that has __slots__ and no instance
__dict__, the default state is a tuple whose first item is None
and whose second item is a dictionary mapping slot names to slot
values described in the previous bullet.
The __setstate__ method should take one argument; it will be called with
the value returned by __getstate__ or with the default state
described above if no __getstate__ method is defined.
If no __setstate__ method exists, a default implementation is
provided that can handle the state returned by the default
__getstate__.
It is fine if a class implements one of these but not the other,
as long as it is compatible with the default version.
New-style classes inherit a default __reduce__ implementation
from the ultimate base class 'object'. This implementation is not
used for protocol 2, and then the last four bullets above apply. For
protocols 0 and 1, the default implementation looks for a
__getstate__ method, and if none exists, it uses a simpler default
strategy:
- If there is an instance __dict__, the state is self.__dict__.
- Otherwise, the state is None (and __setstate__ will not be
called).
Note that this strategy ignores slots. New-style classes that
define slots and don't define __getstate__ in the same class that
defines the slots automatically have a __getstate__ method added
that raises TypeError. Protocol 2 ignores this __getstate__
method (recognized by the specific text of the error message).
The __getinitargs__ and __getnewargs__ methods
The __setstate__ method (or its default implementation) requires
that a new object already exists so that its __setstate__ method
can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's __init__ method
should not be called if possible.
The way this is done differs between classic and new-style
classes.
For classic classes, these are the possibilities:
- Normally, the following trick is used: create an instance of a
trivial classic class (one without any methods or instance
variables) and then use __class__ assignment to change its class
to the desired class. This creates an instance of the desired
class with an empty __dict__ whose __init__ has not been called.
- However, if the class has a method named __getinitargs__, the
above trick is not used, and a class instance is created by
using the tuple returned by __getinitargs__ as an argument list
to the class constructor. This is done even if __getinitargs__
returns an empty tuple -- a __getinitargs__ method that returns
() is not equivalent to not having __getinitargs__ at all.
__getinitargs__ *must* return a tuple.
- In restricted execution mode, the trick from the first bullet
doesn't work; in this case, the class constructor is called with
an empty argument list if no __getinitargs__ method exists.
This means that in order for a classic class to be unpicklable
in restricted mode, it must either implement __getinitargs__ or
its constructor (i.e., its __init__ method) must be callable
without arguments.
For new-style classes, these are the possibilities:
- When using protocol 0 or 1, a default __reduce__ implementation
is normally inherited from the ultimate base class
'object'. This implementation finds the nearest base class that
is implemented in C (either as a built-in type or as a type
defined by an extension class). Calling this base class B and
the class of the object to be pickled C, the new object is
created at unpickling time using the following code:
obj = B.__new__(C, state)
B.__init__(obj, state)
where state is a value computed at pickling time as follows:
state = B(obj)
This only works when B is not C, and only for certain classes
B. It does work for the following built-in classes: int, long,
float, complex, str, unicode, tuple, list, dict; and this is its
main redeeming factor.
- When using protocol 2, the default __reduce__ implementation
inherited from 'object' is ignored. Instead, a new pickling
opcode is generated that causes a new object to be created as
follows:
obj = C.__new__(C, *args)
where args is either the empty tuple, or the tuple returned by
the __getnewargs__ method, if defined.
TBD
The rest of this PEP is still under construction!