Refactored according to 3 main cases.
commit eb93e1fcdc (parent 6b1fc6f8c0)

pep-0307.txt

@@ -216,16 +216,249 @@ Extended __reduce__ API

state (how this is done is up to the application).


Customizing pickling absent a __reduce__ implementation

If no __reduce__ implementation is available for a particular
class, there are three cases that need to be considered
separately, because they are handled differently:

1. classic class instances, all protocols
2. new-style class instances, protocols 0 and 1
3. new-style class instances, protocol 2

Types implemented in C are considered new-style classes. However,
except for the common built-in types, these need to provide a
__reduce__ implementation in order to be picklable with protocols
0 or 1. Protocol 2 supports built-in types providing
__getnewargs__, __getstate__ and __setstate__ as well.


Case 1: pickling classic class instances

This case is the same for all protocols, and is unchanged from
Python 2.1.

For classic classes, __reduce__ is not used. Instead, classic
classes can customize their pickling by providing methods named
__getstate__, __setstate__ and __getinitargs__. Absent these, a
default pickling strategy for classic class instances is
implemented that works as long as all instance variables are
picklable. This default strategy is documented in terms of
default implementations of __getstate__ and __setstate__.

The primary way to customize pickling of classic class instances
is by specifying __getstate__ and/or __setstate__ methods. It is
fine if a class implements one of these but not the other, as long
as it is compatible with the default version.

The __getstate__ method

The __getstate__ method should return a picklable value
representing the object's state without referencing the object
itself. If no __getstate__ method exists, a default
implementation is used that returns self.__dict__.

The __setstate__ method

The __setstate__ method should take one argument; it will be
called with the value returned by __getstate__ (or its default
implementation).

If no __setstate__ method exists, a default implementation is
provided that assumes the state is a dictionary mapping instance
variable names to values. The default implementation tries two
things:

- First, it tries to call self.__dict__.update(state).

- If the update() call fails with a RuntimeError exception, it
  calls setattr(self, key, value) for each (key, value) pair in
  the state dictionary. This only happens when unpickling in
  restricted execution mode (see the rexec standard library
  module).
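Classic classes no longer exist in Python 3, so these defaults cannot be observed directly today. The sketch below simulates them in Python 3 under stated assumptions. The helper names `_user_defined`, `get_state` and `set_state` are hypothetical. `object.__new__` stands in for the uninitialized instance the unpickler would create. The `RuntimeError` branch only mirrors the restricted-execution fallback and does not model rexec itself.

```python
# Hypothetical helpers simulating the Case 1 defaults in Python 3.
# This is an illustration, not the pickle module's own code.

def _user_defined(cls, name):
    """Return cls.<name> if some class other than `object` defines it."""
    for klass in cls.__mro__:
        if klass is not object and name in vars(klass):
            return getattr(cls, name)
    return None


def get_state(obj):
    getstate = _user_defined(type(obj), "__getstate__")
    if getstate is not None:
        return getstate(obj)
    return obj.__dict__  # default __getstate__


def set_state(obj, state):
    setstate = _user_defined(type(obj), "__setstate__")
    if setstate is not None:
        setstate(obj, state)  # a user-defined __setstate__ always wins
        return
    try:
        obj.__dict__.update(state)  # default __setstate__, first attempt
    except RuntimeError:
        # Fallback the text describes for restricted execution mode.
        for key, value in state.items():
            setattr(obj, key, value)


class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance


original = Account("ada", 10)
clone = object.__new__(Account)  # stand-in for an uninitialized instance
set_state(clone, get_state(original))
print(clone.owner, clone.balance)  # ada 10
```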

The __getinitargs__ method

The __setstate__ method (or its default implementation) requires
that a new object already exists so that its __setstate__ method
can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's __init__ method
should not be called if possible.

These are the possibilities:

- Normally, the following trick is used: create an instance of a
  trivial classic class (one without any methods or instance
  variables) and then use __class__ assignment to change its
  class to the desired class. This creates an instance of the
  desired class with an empty __dict__ whose __init__ has not
  been called.

- However, if the class has a method named __getinitargs__, the
  above trick is not used, and a class instance is created by
  using the tuple returned by __getinitargs__ as an argument
  list to the class constructor. This is done even if
  __getinitargs__ returns an empty tuple: a __getinitargs__
  method that returns () is not equivalent to not having
  __getinitargs__ at all. __getinitargs__ *must* return a
  tuple.

- In restricted execution mode, the trick from the first bullet
  doesn't work; in this case, the class constructor is called
  with an empty argument list if no __getinitargs__ method
  exists. This means that for instances of a classic class to be
  unpickled in restricted execution mode, the class must either
  implement __getinitargs__ or have a constructor (i.e., an
  __init__ method) that is callable without arguments.
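The first two possibilities can be sketched in Python 3 as follows. This is an illustration under assumptions, not the pickle module's code:

- `capture_initargs` and `make_instance` are hypothetical names.
- The pickling and unpickling halves are collapsed into two plain functions.
- Restricted execution is not modeled.
- In Python 3, the `__class__` assignment trick works only between classes with compatible layouts, such as plain classes with an instance `__dict__`.

```python
# Hypothetical simulation of how Case 1 creates the object to restore.

class _Trivial:
    """A trivial class with no methods or instance variables."""


def capture_initargs(obj):
    """Pickling side: record __getinitargs__() if the class defines it."""
    getinitargs = getattr(obj, "__getinitargs__", None)
    if getinitargs is None:
        return None
    args = getinitargs()
    if not isinstance(args, tuple):
        raise TypeError("__getinitargs__ must return a tuple")
    return args


def make_instance(cls, initargs):
    """Unpickling side: create the object that __setstate__ will fill in."""
    if initargs is not None:
        # The constructor runs even for (): () differs from "no method".
        return cls(*initargs)
    obj = _Trivial()
    obj.__class__ = cls  # __init__ is never run
    return obj


class Plain:
    init_calls = 0

    def __init__(self, name):
        Plain.init_calls += 1
        self.name = name


class WithInitArgs(Plain):
    def __getinitargs__(self):
        return (self.name,)


p = Plain("a")
blank = make_instance(Plain, capture_initargs(p))
print(type(blank).__name__, vars(blank), Plain.init_calls)  # Plain {} 1

w = WithInitArgs("b")
rebuilt = make_instance(WithInitArgs, capture_initargs(w))
print(rebuilt.name, Plain.init_calls)  # b 3
```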


Case 2: pickling new-style class instances using protocols 0 or 1

This case is unchanged from Python 2.2. For better pickling of
new-style class instances when backwards compatibility is not an
issue, protocol 2 should be used; see case 3 below.

New-style classes, whether implemented in C or in Python, inherit
a default __reduce__ implementation from the universal base class
'object'.

This default __reduce__ implementation is not used for those
built-in types for which the pickle module has built-in support.
Here's a full list of those types:

- Concrete built-in types: NoneType, bool, int, long, float,
  complex, str, unicode, tuple, list, dict. (Complex is supported
  by virtue of a __reduce__ implementation registered in
  copy_reg.) In Jython, PyStringMap is also included in this
  list.

- Classic instances.

- Classic class objects, Python function objects, built-in
  function and method objects, and new-style type objects (i.e.,
  new-style class objects). These are pickled by name, not by
  value: at unpickling time, a reference to an object with the
  same name (the fully qualified module name plus the variable
  name in that module) is substituted.

The default __reduce__ implementation will fail at pickling time
for built-in types not mentioned above.
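Both the by-name rule and the failure rule still hold in Python 3, so both can be checked with the standard pickle module. The PEP itself describes Python 2.3, where type names and error messages differ. In this sketch, `roundtrip` and `gen` are illustrative names. A generator object stands in for "a built-in type not mentioned above". `PicklingError` is caught alongside `TypeError` only as a precaution, in case newer releases wrap the error.

```python
import collections
import pickle


def roundtrip(obj, protocol):
    """Illustrative helper: pickle and immediately unpickle."""
    return pickle.loads(pickle.dumps(obj, protocol))


# Functions and classes are pickled by name: unpickling yields the
# very same object that the name refers to.
for proto in (0, 1, 2):
    assert roundtrip(len, proto) is len
    assert roundtrip(collections.OrderedDict, proto) is collections.OrderedDict


def gen():
    yield 1


# A built-in type with no pickling support fails at pickling time.
failures = {}
for proto in (0, 1, 2):
    try:
        pickle.dumps(gen(), proto)
    except (TypeError, pickle.PicklingError) as exc:
        failures[proto] = exc

print(sorted(failures))  # [0, 1, 2]
```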

For new-style classes implemented in Python, the default
__reduce__ implementation works as follows:

Let D be the class of the object to be pickled. First, find the
nearest base class that is implemented in C (either as a
built-in type or as a type defined by an extension class). Call
this base class B. Unless B is the class 'object', instances of
class B must be picklable, either by having built-in support (as
defined in the above three bullet points), or by having a
non-default __reduce__ implementation. B must not be the same
class as D (if it were, it would mean that D is not implemented
in Python).

The new object is created at unpickling time using the following
code:

    obj = B.__new__(D, state)
    B.__init__(obj, state)

where state is a value computed at pickling time as follows:

    state = B(obj)
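A sketch of this procedure in Python 3 follows. It is an approximation, not the pickle module's own code, and it rests on these assumptions:

- `nearest_c_base` relies on the CPython-specific `Py_TPFLAGS_HEAPTYPE` bit (`1 << 9`) of `type.__flags__` to tell classes defined in Python from types implemented in C.
- `reduce_v01` and `reconstruct` are hypothetical names.
- `reconstruct` adds two guards that the formula above leaves implicit: B is 'object', and B has no `__init__` of its own. Both follow the spirit of Python 3's `copyreg._reconstructor`.

```python
# Hypothetical simulation of the protocol 0/1 default __reduce__.
# CPython sets this flag on classes created by a class statement.
_HEAPTYPE = 1 << 9


def nearest_c_base(cls):
    """First class in the MRO that is implemented in C."""
    for base in cls.__mro__:
        if not base.__flags__ & _HEAPTYPE:
            return base
    return object


def reduce_v01(obj):
    """Pickling side: return (D, B, state)."""
    D = type(obj)
    B = nearest_c_base(D)
    if B is D:
        raise TypeError(f"cannot pickle {D.__name__!r} object")
    state = None if B is object else B(obj)
    return D, B, state


def reconstruct(D, B, state):
    """Unpickling side: obj = B.__new__(D, state); B.__init__(obj, state)."""
    if B is object:
        return object.__new__(D)
    obj = B.__new__(D, state)
    if B.__init__ is not object.__init__:
        B.__init__(obj, state)
    return obj


class Tags(list):
    """A Python subclass of the C-implemented list type."""


original = Tags(["a", "b"])
D, B, state = reduce_v01(original)
clone = reconstruct(D, B, state)
print(B.__name__, state, type(clone).__name__, clone)  # list ['a', 'b'] Tags ['a', 'b']
```

The sketch leaves out the instance __dict__, which the default implementation pickles separately as the object's state.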

Objects for which this default __reduce__ implementation is used
can customize it by defining __getstate__ and/or __setstate__
methods. These work almost the same as described for classic
classes above, except that if __getstate__ returns an object (of
any type) whose value is considered false (e.g., None, or a number
that is zero, or an empty sequence or mapping), this state is not
pickled and __setstate__ will not be called at all.
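Python 3 keeps this rule for protocols 0 and 1, which still go through a default __reduce__ of this kind, so it can be observed directly. In the sketch below, the Gauge class is illustrative. When its state is 0, nothing is stored. On unpickling, neither __init__ nor __setstate__ runs, which leaves an instance with no attributes at all.

```python
import pickle


class Gauge:
    """Illustrative class whose whole state is a single number."""

    def __init__(self, level):
        self.level = level

    def __getstate__(self):
        return self.level  # a plain number; 0 counts as false

    def __setstate__(self, state):
        self.level = state


full = pickle.loads(pickle.dumps(Gauge(5), protocol=1))
empty = pickle.loads(pickle.dumps(Gauge(0), protocol=1))
print(full.level, hasattr(empty, "level"))  # 5 False
```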

Note that this strategy ignores slots. New-style classes that
define slots and don't define __getstate__ in the same class that
defines the slots automatically have a __getstate__ method added
that raises TypeError.
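In current Python 3 this TypeError is raised from within the copyreg module rather than by an added __getstate__ method. The observable outcome under protocols 0 and 1 is the same. The two classes below are illustrative: defining __getstate__ (here paired with __setstate__) in the slotted class makes it picklable again.

```python
import pickle


class Slotted:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


class SlottedWithState:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x

    def __getstate__(self):
        return {"x": self.x}

    def __setstate__(self, state):
        self.x = state["x"]


outcomes = {}
for proto in (0, 1):
    try:
        pickle.dumps(Slotted(1), proto)
        outcomes[proto] = "pickled"
    except (TypeError, pickle.PicklingError):
        outcomes[proto] = "refused"
print(outcomes)  # {0: 'refused', 1: 'refused'}

restored = pickle.loads(pickle.dumps(SlottedWithState(2), 1))
print(restored.x)  # 2
```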


Case 3: pickling new-style class instances using protocol 2

Under protocol 2, the default __reduce__ implementation inherited
from the 'object' base class is *ignored*. Instead, a different
default implementation is used, which allows more efficient
pickling of new-style class instances than is possible with
protocols 0 or 1, at the cost of backward incompatibility with
Python 2.2.

The customization uses three special methods: __getstate__,
__setstate__ and __getnewargs__. It is fine if a class implements
one or more but not all of these, as long as it is compatible with
the default implementations.

The __getstate__ method

The __getstate__ method should return a picklable value
representing the object's state without referencing the object
itself. If no __getstate__ method exists, a default
implementation is used that is described below.

There's a subtle difference between classic and new-style
classes here: if a classic class's __getstate__ returns None,
self.__setstate__(None) will be called as part of unpickling.
But if a new-style class's __getstate__ returns None, its
__setstate__ won't be called at all as part of unpickling.
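The new-style half of this contrast can still be checked in Python 3. Classic classes are gone, so the other half cannot. In this illustrative sketch, Ephemeral's __setstate__ records every call in a module-level list. After a protocol 2 round trip the list stays empty, because __getstate__ returned None.

```python
import pickle

setstate_calls = []  # every __setstate__ invocation is recorded here


class Ephemeral:
    """Illustrative class that reports no state at all."""

    def __getstate__(self):
        return None  # nothing worth saving

    def __setstate__(self, state):
        setstate_calls.append(state)


clone = pickle.loads(pickle.dumps(Ephemeral(), protocol=2))
print(type(clone).__name__, setstate_calls)  # Ephemeral []
```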

If no __getstate__ method exists, a default state is assumed.
There are several cases:

- For a new-style class that has an instance __dict__ and no
  __slots__, the default state is self.__dict__.

- For a new-style class that has no instance __dict__ and no
  __slots__, the default state is None.

- For a new-style class that has an instance __dict__ and
  __slots__, the default state is a tuple consisting of two
  dictionaries: the first being self.__dict__, and the second
  being a dictionary mapping slot names to slot values. Only
  slots that have a value are included in the latter.

- For a new-style class that has __slots__ and no instance
  __dict__, the default state is a tuple whose first item is
  None and whose second item is a dictionary mapping slot names
  to slot values, as described in the previous bullet.

Note that new-style classes that define slots and don't define
__getstate__ in the same class that defines the slots
automatically have a __getstate__ method added that raises
TypeError. Protocol 2 ignores this __getstate__ method
(recognized by the specific text of the error message).
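Python 3's object.__reduce_ex__(2) still computes a default state along these lines, and the third item of the tuple it returns is that state. The bullets above can therefore be inspected directly. The classes below are illustrative. Each instance is given a non-empty __dict__ where it has one, because recent CPython releases may report an empty __dict__ differently. The slotted classes also pickle without complaint under protocol 2.

```python
import pickle


def default_state(obj):
    """Illustrative helper: the state item of the protocol 2 reduce tuple."""
    return obj.__reduce_ex__(2)[2]


class DictOnly:
    def __init__(self):
        self.x = 1


class SlotsOnly:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 1  # b is left unset, so it is omitted from the state


class DictAndSlots:
    __slots__ = ("a", "__dict__")

    def __init__(self):
        self.a = 1
        self.x = 2


print(default_state(DictOnly()))      # {'x': 1}
print(default_state(SlotsOnly()))     # (None, {'a': 1})
print(default_state(DictAndSlots()))  # ({'x': 2}, {'a': 1})

restored = pickle.loads(pickle.dumps(SlotsOnly(), protocol=2))
print(restored.a, hasattr(restored, "b"))  # 1 False
```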

The __setstate__ method

The __setstate__ method should take one argument; it will be
called with the value returned by __getstate__, or with the
default state described above if no __getstate__ method is
defined.

If no __setstate__ method exists, a default implementation is
provided that can handle the state returned by the default
__getstate__, described above.
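A common reason to define both methods is to leave out an attribute that cannot be pickled and rebuild it afterwards. The Counter class below is an illustrative sketch in Python 3. Its __getstate__ returns a copy of the instance __dict__ without the lock, because thread locks cannot be pickled. Its __setstate__ restores the remaining attributes and creates a fresh lock.

```python
import pickle
import threading


class Counter:
    """Illustrative class holding an unpicklable lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Picklable snapshot that does not include the lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # rebuild what was left out


counter = Counter()
counter.increment()
counter.increment()

clone = pickle.loads(pickle.dumps(counter, protocol=2))
clone.increment()
print(counter.value, clone.value)  # 2 3
```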

The __getnewargs__ method

As for classic classes, the __setstate__ method (or its
default implementation) requires that a new object already
exists so that its __setstate__ method can be called.

In protocol 2, a new pickling opcode is used that causes a new
object to be created as follows:

    obj = C.__new__(C, *args)

where C is the class of the pickled object, and args is either
the empty tuple or the tuple returned by the __getnewargs__
method, if defined. __getnewargs__ must return a tuple. The
absence of a __getnewargs__ method is equivalent to the
existence of one that returns ().
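The sketch below shows why this matters for a class whose __new__ needs an argument. Symbol is an illustrative Python 3 class whose __getnewargs__ supplies the name. Unpickling can then call Symbol.__new__(Symbol, 'x') exactly as in the formula above, and __init__ is not run again. Without __getnewargs__, args would be () and Symbol.__new__ would fail for lack of the name.

```python
import pickle


class Symbol:
    """Illustrative class whose __new__ requires a name."""

    init_calls = 0

    def __new__(cls, name):
        self = super().__new__(cls)
        self.name = name
        return self

    def __init__(self, name):
        Symbol.init_calls += 1

    def __getnewargs__(self):
        return (self.name,)  # must be a tuple


sym = Symbol("x")

# The formula from the text, spelled out by hand.
args = sym.__getnewargs__()
manual = Symbol.__new__(Symbol, *args)

# What protocol 2 does on its own.
clone = pickle.loads(pickle.dumps(sym, protocol=2))
print(manual.name, clone.name, Symbol.init_calls)  # x x 1
```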


The __newobj__ unpickling function

@@ -241,8 +474,10 @@ The __newobj__ unpickling function

Pickle protocol 2 special-cases an unpickling function with this
name, and emits a pickling opcode that, given 'cls' and 'args',
will return cls.__new__(cls, *args) without also pickling a
reference to __newobj__ (this is the same pickling opcode used by
protocol 2 for a new-style class instance when no __reduce__
implementation exists). This is the main reason why protocol 2
pickles are much smaller than classic pickles. Of course, the
pickling code cannot verify that a function named __newobj__
actually has the expected semantics. If you use an unpickling
function named __newobj__ that returns something different, you

@@ -253,146 +488,6 @@ The __newobj__ unpickling function

Python 2.3.
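The special-casing by name can be observed with Python 3's pickle module, which provides copyreg.__newobj__ with the semantics described in this section. In this illustrative sketch, Pair.__reduce__ names copyreg.__newobj__ as its unpickling function. Under protocol 1, the function is pickled by reference like any other. Under protocol 2, the name __newobj__ does not appear in the pickle at all, because the NEWOBJ opcode is emitted instead.

```python
import copyreg
import pickle


class Pair:
    """Illustrative class that reduces itself via copyreg.__newobj__."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __reduce__(self):
        # (unpickling function, (cls, *args), state)
        return (copyreg.__newobj__, (type(self),), self.__dict__)


pair = Pair(1, 2)
old_style = pickle.dumps(pair, protocol=1)
new_style = pickle.dumps(pair, protocol=2)

print(b"__newobj__" in old_style, b"__newobj__" in new_style)  # True False

clone = pickle.loads(new_style)
print(clone.left, clone.right)  # 1 2
```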

TBD

The rest of this PEP is still under construction!