PEP: 307
Title: Extensions to the pickle protocol
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Tim Peters
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 31-Jan-2003
Post-History: 7-Feb-2003

Introduction
============

Pickling new-style objects in Python 2.2 is done somewhat clumsily
and causes pickle size to bloat compared to classic class
instances. This PEP documents a new pickle protocol in Python 2.3
that takes care of this and many other pickle issues.

There are two sides to specifying a new pickle protocol: the byte
stream constituting pickled data must be specified, and the
interface between objects and the pickling and unpickling engines
must be specified. This PEP focuses on API issues, although it
may occasionally touch on byte stream format details to motivate a
choice. The pickle byte stream format is documented formally by
the standard library module ``pickletools.py`` (already checked into
CVS for Python 2.3).

This PEP attempts to fully document the interface between pickled
objects and the pickling process, highlighting additions by
specifying "new in this PEP". (The interface to invoke pickling
or unpickling is not covered fully, except for the changes to the
API for specifying the pickling protocol to picklers.)

Motivation
==========

Pickling new-style objects causes serious pickle bloat. For
example::

    import pickle

    class C(object):  # Omit "(object)" for classic class
        pass

    x = C()
    x.foo = 42
    print len(pickle.dumps(x, 1))

The binary pickle for the classic object consumed 33 bytes, and for
the new-style object 86 bytes.

The reasons for the bloat are complex, but are mostly caused by
the fact that new-style objects use ``__reduce__`` in order to be
picklable at all. After ample consideration we've concluded that
the only way to reduce pickle sizes for new-style objects is to
add new opcodes to the pickle protocol. The net result is that
with the new protocol, the pickle size in the above example is 35
bytes (two extra bytes are used at the start to indicate the protocol
version, although this isn't strictly necessary).

Protocol versions
=================

Previously, pickling (but not unpickling) distinguished between
text mode and binary mode. By design, binary mode is a
superset of text mode, and unpicklers don't need to know in
advance whether an incoming pickle uses text mode or binary mode.
The virtual machine used for unpickling is the same regardless of
the mode; certain opcodes simply aren't used in text mode.

Retroactively, text mode is now called protocol 0, and binary mode
protocol 1. The new protocol is called protocol 2. In the
tradition of pickling protocols, protocol 2 is a superset of
protocol 1. But just so that future pickling protocols aren't
required to be supersets of the oldest protocols, a new opcode is
inserted at the start of a protocol 2 pickle indicating that it is
using protocol 2. To date, each release of Python has been able to
read pickles written by all previous releases. Of course pickles
written under protocol *N* can't be read by versions of Python
earlier than the one that introduced protocol *N*.

Several functions, methods and constructors used for pickling used
to take a positional argument named 'bin' which was a flag,
defaulting to 0, indicating binary mode. This argument is renamed
to 'protocol' and now gives the protocol number, still defaulting
to 0.

It so happens that passing 2 for the 'bin' argument in previous
Python versions had the same effect as passing 1. Nevertheless, a
special case is added here: passing a negative number selects the
highest protocol version supported by a particular implementation.
This works in previous Python versions, too, and so can be used to
select the highest protocol available in a way that's both backward
and forward compatible. In addition, a new module constant
``HIGHEST_PROTOCOL`` is supplied by both ``pickle`` and ``cPickle``, equal to
the highest protocol number the module can read. This is cleaner
than passing -1, but cannot be used before Python 2.3.
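
The selection rules above can be sketched as follows. This is a minimal
illustration, assuming a modern Python 3 interpreter (where ``cPickle`` has
been folded into ``pickle``); the ``data`` value is purely illustrative.

```python
import pickle

data = {"answer": 42, "items": [1, 2, 3]}

# A negative protocol number selects the highest protocol available.
newest = pickle.dumps(data, -1)
explicit = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# A protocol 2 pickle starts with the PROTO opcode (0x80) followed by
# the protocol number; protocol 0 and 1 pickles carry no such marker.
proto2 = pickle.dumps(data, 2)
proto0 = pickle.dumps(data, 0)

# The unpickler does not need to be told which protocol was used.
restored = [pickle.loads(pickle.dumps(data, p)) for p in (0, 1, 2, -1)]
```

Passing ``-1`` remains the portable spelling for code that must also run on
interpreters that predate ``HIGHEST_PROTOCOL``.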

The ``pickle.py`` module has supported passing the 'bin' value as a
keyword argument rather than a positional argument. (This is not
recommended, since ``cPickle`` only accepts positional arguments, but
it works...) Passing 'bin' as a keyword argument is deprecated,
and a ``PendingDeprecationWarning`` is issued in this case. You have
to invoke the Python interpreter with ``-Wa`` or a variation on that
to see ``PendingDeprecationWarning`` messages. In Python 2.4, the
warning class may be upgraded to ``DeprecationWarning``.

Security issues
===============

In previous versions of Python, unpickling would do a "safety
check" on certain operations, refusing to call functions or
constructors that weren't marked as "safe for unpickling" by
either having an attribute ``__safe_for_unpickling__`` set to 1, or by
being registered in a global registry, ``copy_reg.safe_constructors``.

This feature gives a false sense of security: nobody has ever done
the necessary, extensive, code audit to prove that unpickling
untrusted pickles cannot invoke unwanted code, and in fact bugs in
the Python 2.2 ``pickle.py`` module make it easy to circumvent these
security measures.

We firmly believe that, on the Internet, it is better to know that
you are using an insecure protocol than to trust a protocol to be
secure whose implementation hasn't been thoroughly checked. Even
high quality implementations of widely used protocols are
routinely found flawed; Python's pickle implementation simply
cannot make such guarantees without a much larger time investment.
Therefore, as of Python 2.3, all safety checks on unpickling are
officially removed, and replaced with this warning:

.. warning::

   Do not unpickle data received from an untrusted or
   unauthenticated source.

The same warning applies to previous Python versions, despite the
presence of safety checks there.
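
To make the warning concrete, here is a deliberately harmless sketch, assuming
a modern Python 3 interpreter. The class name ``Innocent`` is hypothetical; the
point is only that a pickle can name any importable callable and have it called
at load time.

```python
import pickle

class Innocent:
    """Hypothetical class whose pickle names an arbitrary callable."""

    def __reduce__(self):
        # The pickle records "call sorted([3, 1, 2])"; a hostile pickle
        # could just as easily name a far more dangerous callable.
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Innocent(), 2)

# Loading the payload calls sorted() and returns its result; no
# Innocent instance is created, and no safety check intervenes.
result = pickle.loads(payload)
```

The loaded value is whatever the named callable returns, which is why the
source of a pickle has to be trusted before it is loaded.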

Extended ``__reduce__`` API
===========================

There are several APIs that a class can use to control pickling.
Perhaps the most popular of these are ``__getstate__`` and
``__setstate__``; but the most powerful one is ``__reduce__``. (There's
also ``__getinitargs__``, and we're adding ``__getnewargs__`` below.)

There are several ways to provide ``__reduce__`` functionality: a
class can implement a ``__reduce__`` method or a ``__reduce_ex__`` method
(see next section), or a reduce function can be declared in
``copy_reg`` (``copy_reg.dispatch_table`` maps classes to functions). The
return values are interpreted exactly the same, though, and we'll
refer to these collectively as ``__reduce__``.

**Important:** pickling of classic class instances does not look for a
``__reduce__`` or ``__reduce_ex__`` method or a reduce function in the
``copy_reg`` dispatch table, so that a classic class cannot provide
``__reduce__`` functionality in the sense intended here. A classic
class must use ``__getinitargs__`` and/or ``__getstate__`` to customize
pickling. These are described below.

``__reduce__`` must return either a string or a tuple. If it returns
a string, the object's state is not pickled; instead, the pickle
stores a reference by name to an equivalent object.
Surprisingly, the string returned by ``__reduce__`` should be the
object's local name (relative to its module); the ``pickle`` module
searches the module namespace to determine the object's module.
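
The string form of ``__reduce__`` can be sketched with a module-level
singleton. This is an illustration only, assuming a modern Python 3
interpreter and a script run as the main module; ``_Sentinel`` and ``MISSING``
are hypothetical names.

```python
import pickle

class _Sentinel:
    """Hypothetical singleton type; only one instance should ever exist."""

    def __reduce__(self):
        # Return the module-level name of the instance, not its state.
        return "MISSING"

MISSING = _Sentinel()

# The pickle stores only a reference by name, so loading it yields the
# very same object that the module-level name refers to.
restored = pickle.loads(pickle.dumps(MISSING, 2))
```

Because only the name is stored, the unpickling side must have the same
module-level name available, just as it must for classes and functions.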

The rest of this section is concerned with the tuple returned by
``__reduce__``. It is a variable-size tuple, of length 2 through 5.
The first two items (function and arguments) are required. The
remaining items are optional and may be left off from the end;
giving ``None`` for the value of an optional item acts the same as
leaving it off. The last two items are new in this PEP. The items
are, in order:

+-----------+---------------------------------------------------------------+
| function  | Required.                                                     |
|           |                                                               |
|           | A callable object (not necessarily a function) called         |
|           | to create the initial version of the object; state            |
|           | may be added to the object later to fully reconstruct         |
|           | the pickled state. This function must itself be               |
|           | picklable. See the section about ``__newobj__`` for a         |
|           | special case (new in this PEP) here.                          |
+-----------+---------------------------------------------------------------+
| arguments | Required.                                                     |
|           |                                                               |
|           | A tuple giving the argument list for the function.            |
|           | As a special case, designed for Zope 2's                      |
|           | ``ExtensionClass``, this may be ``None``; in that case,       |
|           | function should be a class or type, and                       |
|           | ``function.__basicnew__()`` is called to create the           |
|           | initial version of the object. This exception is              |
|           | deprecated.                                                   |
+-----------+---------------------------------------------------------------+

Unpickling invokes ``function(*arguments)`` to create an initial object,
called *obj* below. If the remaining items are left off, that's the end
of unpickling for this object and *obj* is the result. Otherwise *obj* is
modified at unpickling time by each item specified, as follows.

+-----------+---------------------------------------------------------------+
| state     | Optional.                                                     |
|           |                                                               |
|           | Additional state. If this is not ``None``, the state is       |
|           | pickled, and ``obj.__setstate__(state)`` will be called       |
|           | when unpickling. If no ``__setstate__`` method is             |
|           | defined, a default implementation is provided, which          |
|           | assumes that state is a dictionary mapping instance           |
|           | variable names to their values. The default                   |
|           | implementation calls ::                                       |
|           |                                                               |
|           |     obj.__dict__.update(state)                                |
|           |                                                               |
|           | or, if the ``update()`` call fails, ::                        |
|           |                                                               |
|           |     for k, v in state.items():                                |
|           |         setattr(obj, k, v)                                    |
+-----------+---------------------------------------------------------------+
| listitems | Optional, and new in this PEP.                                |
|           |                                                               |
|           | If this is not ``None``, it should be an iterator (not a      |
|           | sequence!) yielding successive list items. These list         |
|           | items will be pickled, and appended to the object using       |
|           | either ``obj.append(item)`` or ``obj.extend(list_of_items)``. |
|           | This is primarily used for ``list`` subclasses, but may       |
|           | be used by other classes as long as they have ``append()``    |
|           | and ``extend()`` methods with the appropriate signature.      |
|           | (Whether ``append()`` or ``extend()`` is used depends on which|
|           | pickle protocol version is used as well as the number         |
|           | of items to append, so both must be supported.)               |
+-----------+---------------------------------------------------------------+
| dictitems | Optional, and new in this PEP.                                |
|           |                                                               |
|           | If this is not ``None``, it should be an iterator (not a      |
|           | sequence!) yielding successive dictionary items, which        |
|           | should be tuples of the form ``(key, value)``. These items    |
|           | will be pickled, and stored to the object using               |
|           | ``obj[key] = value``. This is primarily used for ``dict``     |
|           | subclasses, but may be used by other classes as long          |
|           | as they implement ``__setitem__``.                            |
+-----------+---------------------------------------------------------------+
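
The five-item form can be sketched with a container that is neither a ``list``
nor a ``dict`` subclass. This is an illustration under stated assumptions: a
modern Python 3 interpreter, and a hypothetical ``Bag`` class that supplies the
``append()``, ``extend()`` and ``__setitem__`` methods the tables above require.

```python
import pickle

class Bag:
    """Hypothetical container with a list-like and a dict-like part."""

    def __init__(self, label):
        self.label = label
        self.items = []
        self.index = {}

    # Both append() and extend() are provided, since the unpickler may
    # use either one to deliver the list items.
    def append(self, item):
        self.items.append(item)

    def extend(self, items):
        self.items.extend(items)

    def __setitem__(self, key, value):
        self.index[key] = value

    def __reduce__(self):
        # (function, arguments, state, listitems, dictitems); the last
        # two must be iterators, not sequences.
        return (Bag, (self.label,), None,
                iter(self.items), iter(self.index.items()))

bag = Bag("demo")
bag.append(1)
bag.append(2)
bag["k"] = "v"
copy = pickle.loads(pickle.dumps(bag, 2))
```

Here the state item is ``None``, so no ``__setstate__`` call is made; the
object is rebuilt from the constructor call plus the two item streams.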

Note: in Python 2.2 and before, when using ``cPickle``, state would be
pickled if present even if it was ``None``; the only safe way to avoid
the ``__setstate__`` call was to return a two-tuple from ``__reduce__``.
(But ``pickle.py`` would not pickle state if it was ``None``.) In Python
2.3, ``__setstate__`` will never be called at unpickling time when
``__reduce__`` returns a state with value ``None`` at pickling time.

A ``__reduce__`` implementation that needs to work both under Python
2.2 and under Python 2.3 could check the variable
``pickle.format_version`` to determine whether to use the *listitems*
and *dictitems* features. If this value is ``>= "2.0"`` then they are
supported. If not, any list or dict items should be incorporated
somehow in the 'state' return value, and the ``__setstate__`` method
should be prepared to accept list or dict items as part of the
state (how this is done is up to the application).

The ``__reduce_ex__`` API
=========================

It is sometimes useful to know the protocol version when
implementing ``__reduce__``. This can be done by implementing a
method named ``__reduce_ex__`` instead of ``__reduce__``. ``__reduce_ex__``,
when it exists, is called in preference over ``__reduce__`` (you may
still provide ``__reduce__`` for backwards compatibility). The
``__reduce_ex__`` method will be called with a single integer
argument, the protocol version.

The 'object' class implements both ``__reduce__`` and ``__reduce_ex__``;
however, if a subclass overrides ``__reduce__`` but not ``__reduce_ex__``,
the ``__reduce_ex__`` implementation detects this and calls
``__reduce__``.
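
A minimal sketch of ``__reduce_ex__`` receiving the protocol number, assuming a
modern Python 3 interpreter; the ``Point`` class and its ``seen_protocols``
list are hypothetical and exist only to make the call visible.

```python
import pickle

class Point:
    """Hypothetical class that records which protocol the pickler used."""

    seen_protocols = []

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce_ex__(self, protocol):
        # The pickler passes the protocol number as the only argument.
        Point.seen_protocols.append(protocol)
        return (Point, (self.x, self.y))

p = Point(1, 2)
copies = [pickle.loads(pickle.dumps(p, proto)) for proto in (0, 1, 2)]
```

A real implementation would typically branch on ``protocol`` to choose between
a compact representation and one that older unpicklers can read.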

Customizing pickling absent a ``__reduce__`` implementation
===========================================================

If no ``__reduce__`` implementation is available for a particular
class, there are three cases that need to be considered
separately, because they are handled differently:

1. classic class instances, all protocols

2. new-style class instances, protocols 0 and 1

3. new-style class instances, protocol 2

Types implemented in C are considered new-style classes. However,
except for the common built-in types, these need to provide a
``__reduce__`` implementation in order to be picklable with protocols
0 or 1. Protocol 2 supports built-in types providing
``__getnewargs__``, ``__getstate__`` and ``__setstate__`` as well.

Case 1: pickling classic class instances
----------------------------------------

This case is the same for all protocols, and is unchanged from
Python 2.1.

For classic classes, ``__reduce__`` is not used. Instead, classic
classes can customize their pickling by providing methods named
``__getstate__``, ``__setstate__`` and ``__getinitargs__``. Absent these, a
default pickling strategy for classic class instances is
implemented that works as long as all instance variables are
picklable. This default strategy is documented in terms of
default implementations of ``__getstate__`` and ``__setstate__``.

The primary way to customize pickling of classic class instances
is by specifying ``__getstate__`` and/or ``__setstate__`` methods. It is
fine if a class implements one of these but not the other, as long
as it is compatible with the default version.

The ``__getstate__`` method
'''''''''''''''''''''''''''

The ``__getstate__`` method should return a picklable value
representing the object's state without referencing the object
itself. If no ``__getstate__`` method exists, a default
implementation is used that returns ``self.__dict__``.

The ``__setstate__`` method
'''''''''''''''''''''''''''

The ``__setstate__`` method should take one argument; it will be
called with the value returned by ``__getstate__`` (or its default
implementation).

If no ``__setstate__`` method exists, a default implementation is
provided that assumes the state is a dictionary mapping instance
variable names to values. The default implementation tries two
things:

- First, it tries to call ``self.__dict__.update(state)``.

- If the ``update()`` call fails with a ``RuntimeError`` exception, it
  calls ``setattr(self, key, value)`` for each ``(key, value)`` pair in
  the state dictionary. This only happens when unpickling in
  restricted execution mode (see the ``rexec`` standard library
  module).
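
The ``__getstate__``/``__setstate__`` contract can be sketched as follows.
Note the assumption: this runs on a modern Python 3 interpreter, where every
class is new-style, so it illustrates only the method contract and not the
classic-class machinery; the ``Session`` class is hypothetical.

```python
import pickle

class Session:
    """Hypothetical class that keeps a cache out of its pickles."""

    def __init__(self, user):
        self.user = user
        self.cache = {"hits": 0}  # derived data, not worth persisting

    def __getstate__(self):
        # Return a picklable value that does not reference self.
        state = self.__dict__.copy()
        del state["cache"]
        return state

    def __setstate__(self, state):
        # Called with whatever __getstate__ returned at pickling time.
        self.__dict__.update(state)
        self.cache = {"hits": 0}

s = Session("guido")
s.cache["hits"] = 7
t = pickle.loads(pickle.dumps(s))
```

Defining only ``__getstate__`` would also work here, provided the returned
dictionary is acceptable to the default ``__setstate__``.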

The ``__getinitargs__`` method
''''''''''''''''''''''''''''''

The ``__setstate__`` method (or its default implementation) requires
that a new object already exists so that its ``__setstate__`` method
can be called. The point is to create a new object that isn't
fully initialized; in particular, the class's ``__init__`` method
should not be called if possible.

These are the possibilities:

- Normally, the following trick is used: create an instance of a
  trivial classic class (one without any methods or instance
  variables) and then use ``__class__`` assignment to change its
  class to the desired class. This creates an instance of the
  desired class with an empty ``__dict__`` whose ``__init__`` has not
  been called.

- However, if the class has a method named ``__getinitargs__``, the
  above trick is not used, and a class instance is created by
  using the tuple returned by ``__getinitargs__`` as an argument
  list to the class constructor. This is done even if
  ``__getinitargs__`` returns an empty tuple --- a ``__getinitargs__``
  method that returns ``()`` is not equivalent to not having
  ``__getinitargs__`` at all. ``__getinitargs__`` *must* return a
  tuple.

- In restricted execution mode, the trick from the first bullet
  doesn't work; in this case, the class constructor is called
  with an empty argument list if no ``__getinitargs__`` method
  exists. This means that in order for a classic class to be
  unpicklable in restricted execution mode, it must either
  implement ``__getinitargs__`` or its constructor (i.e., its
  ``__init__`` method) must be callable without arguments.

Case 2: pickling new-style class instances using protocols 0 or 1
-----------------------------------------------------------------

This case is unchanged from Python 2.2. For better pickling of
new-style class instances when backwards compatibility is not an
issue, protocol 2 should be used; see case 3 below.

New-style classes, whether implemented in C or in Python, inherit
a default ``__reduce__`` implementation from the universal base class
'object'.

This default ``__reduce__`` implementation is not used for those
built-in types for which the ``pickle`` module has built-in support.
Here's a full list of those types:

- Concrete built-in types: ``NoneType``, ``bool``, ``int``, ``float``, ``complex``,
  ``str``, ``unicode``, ``tuple``, ``list``, ``dict``. (Complex is supported by
  virtue of a ``__reduce__`` implementation registered in ``copy_reg``.)
  In Jython, ``PyStringMap`` is also included in this list.

- Classic instances.

- Classic class objects, Python function objects, built-in
  function and method objects, and new-style type objects (==
  new-style class objects). These are pickled by name, not by
  value: at unpickling time, a reference to an object with the
  same name (the fully qualified module name plus the variable
  name in that module) is substituted.

The default ``__reduce__`` implementation will fail at pickling time
for built-in types not mentioned above, and for new-style classes
implemented in C: if they want to be picklable, they must supply
a custom ``__reduce__`` implementation under protocols 0 and 1.

For new-style classes implemented in Python, the default
``__reduce__`` implementation (``copy_reg._reduce``) works as follows:

Let ``D`` be the class of the object to be pickled. First, find the
nearest base class that is implemented in C (either as a
built-in type or as a type defined by an extension class). Call
this base class ``B``.
Unless ``B`` is the class 'object', instances of class ``B`` must be
picklable, either by having built-in support (as defined in the
above three bullet points), or by having a non-default
``__reduce__`` implementation. ``B`` must not be the same class as ``D``
(if it were, it would mean that ``D`` is not implemented in Python).

The callable produced by the default ``__reduce__`` is
``copy_reg._reconstructor``, and its arguments tuple is
``(D, B, basestate)``, where ``basestate`` is ``None`` if ``B`` is the builtin
object class, and ``basestate`` is ::

    basestate = B(obj)

if ``B`` is not the builtin object class. This is geared toward
pickling subclasses of builtin types, where, for example,
``list(some_list_subclass_instance)`` produces "the list part" of
the ``list`` subclass instance.

The object is recreated at unpickling time by
``copy_reg._reconstructor``, like so::

    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
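
The ``(D, B, basestate)`` scheme can be observed directly. The sketch below
assumes a modern Python 3 interpreter, where ``copy_reg`` is spelled
``copyreg`` and the default reduction for protocols 0 and 1 still follows this
description; ``Tagged`` and ``reconstruct_sketch`` are hypothetical names, and
the helper only mirrors the two steps quoted above rather than reproducing the
library's own code.

```python
import copyreg

class Tagged(list):
    """Hypothetical list subclass implemented in Python."""

def reconstruct_sketch(D, B, basestate):
    # Two-step rebuild for a base class B other than object.
    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
    return obj

original = Tagged([1, 2, 3])

# Under protocol 1 the default reduction carries (D, B, basestate).
func, args = original.__reduce_ex__(1)[:2]
rebuilt = reconstruct_sketch(*args)
```

Here ``B`` is ``list`` and ``basestate`` is a plain list holding "the list
part" of the instance.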

Objects using the default ``__reduce__`` implementation can customize
it by defining ``__getstate__`` and/or ``__setstate__`` methods. These
work almost the same as described for classic classes above, except
that if ``__getstate__`` returns an object (of any type) whose value is
considered false (e.g. ``None``, or a number that is zero, or an empty
sequence or mapping), this state is not pickled and ``__setstate__``
will not be called at all. If ``__getstate__`` exists and returns a
true value, that value becomes the third element of the tuple
returned by the default ``__reduce__``, and at unpickling time the
value is passed to ``__setstate__``. If ``__getstate__`` does not exist,
but ``obj.__dict__`` exists, then ``obj.__dict__`` becomes the third
element of the tuple returned by ``__reduce__``, and again at
unpickling time the value is passed to ``obj.__setstate__``. The
default ``__setstate__`` is the same as that for classic classes,
described above.

Note that this strategy ignores slots. Instances of new-style
classes that have slots but no ``__getstate__`` method cannot be
pickled by protocols 0 and 1; the code explicitly checks for
this condition.

Note that pickling new-style class instances ignores ``__getinitargs__``
if it exists, under all protocols. ``__getinitargs__`` is
useful only for classic classes.

Case 3: pickling new-style class instances using protocol 2
-----------------------------------------------------------

Under protocol 2, the default ``__reduce__`` implementation inherited
from the 'object' base class is *ignored*. Instead, a different
default implementation is used, which allows more efficient
pickling of new-style class instances than possible with protocols
0 or 1, at the cost of backward incompatibility with Python 2.2
(meaning no more than that a protocol 2 pickle cannot be unpickled
before Python 2.3).

The customization uses three special methods: ``__getstate__``,
``__setstate__`` and ``__getnewargs__`` (note that ``__getinitargs__`` is again
ignored). It is fine if a class implements one or more but not all
of these, as long as it is compatible with the default
implementations.

The ``__getstate__`` method
'''''''''''''''''''''''''''

The ``__getstate__`` method should return a picklable value
representing the object's state without referencing the object
itself. If no ``__getstate__`` method exists, a default
implementation is used which is described below.

There's a subtle difference between classic and new-style
classes here: if a classic class's ``__getstate__`` returns ``None``,
``self.__setstate__(None)`` will be called as part of unpickling.
But if a new-style class's ``__getstate__`` returns ``None``, its
``__setstate__`` won't be called at all as part of unpickling.

If no ``__getstate__`` method exists, a default state is computed.
There are several cases:

- For a new-style class that has no instance ``__dict__`` and no
  ``__slots__``, the default state is ``None``.

- For a new-style class that has an instance ``__dict__`` and no
  ``__slots__``, the default state is ``self.__dict__``.

- For a new-style class that has an instance ``__dict__`` and
  ``__slots__``, the default state is a tuple consisting of two
  dictionaries: ``self.__dict__``, and a dictionary mapping slot
  names to slot values. Only slots that have a value are
  included in the latter.

- For a new-style class that has ``__slots__`` and no instance
  ``__dict__``, the default state is a tuple whose first item is
  ``None`` and whose second item is a dictionary mapping slot names
  to slot values, as described in the previous bullet.
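
The slot-related cases above can be checked with a short sketch. It assumes a
modern Python 3 interpreter, where the protocol 2 reduction of a plain object
still carries the default state as its third item; ``Slotted`` and ``Mixed``
are hypothetical classes.

```python
import pickle

class Slotted:
    __slots__ = ("x", "y")

class Mixed(Slotted):
    # No __slots__ here, so instances also get a __dict__.
    pass

a = Slotted()
a.x = 1  # "y" is left unset, so it is omitted from the state

b = Mixed()
b.x = 1
b.z = 3

# The third item of the protocol 2 reduction is the default state.
state_a = a.__reduce_ex__(2)[2]
state_b = b.__reduce_ex__(2)[2]

a2 = pickle.loads(pickle.dumps(a, 2))
b2 = pickle.loads(pickle.dumps(b, 2))
```

For ``a`` the state is a ``(None, slots)`` pair, and for ``b`` it is a
``(__dict__, slots)`` pair, matching the last two bullets.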

The ``__setstate__`` method
'''''''''''''''''''''''''''

The ``__setstate__`` method should take one argument; it will be
called with the value returned by ``__getstate__`` or with the
default state described above if no ``__getstate__`` method is
defined.

If no ``__setstate__`` method exists, a default implementation is
provided that can handle the state returned by the default
``__getstate__``, described above.

The ``__getnewargs__`` method
'''''''''''''''''''''''''''''

As for classic classes, the ``__setstate__`` method (or its
default implementation) requires that a new object already
exists so that its ``__setstate__`` method can be called.

In protocol 2, a new pickling opcode is used that causes a new
object to be created as follows::

    obj = C.__new__(C, *args)

where ``C`` is the class of the pickled object, and ``args`` is either
the empty tuple, or the tuple returned by the ``__getnewargs__``
method, if defined. ``__getnewargs__`` must return a tuple. The
absence of a ``__getnewargs__`` method is equivalent to the existence
of one that returns ``()``.
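
``__getnewargs__`` matters most for subclasses of immutable types, whose
contents must be supplied to ``__new__``. A minimal sketch, assuming a modern
Python 3 interpreter; the ``Version`` class is hypothetical.

```python
import pickle

class Version(tuple):
    """Hypothetical immutable (major, minor) pair."""

    def __new__(cls, major, minor):
        return super().__new__(cls, (major, minor))

    def __getnewargs__(self):
        # Must be a tuple; it becomes *args in C.__new__(C, *args).
        return (self[0], self[1])

v = Version(2, 3)
w = pickle.loads(pickle.dumps(v, 2))
```

Since ``Version.__new__`` takes two required arguments, the unpickler needs
this tuple in order to recreate the object.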
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The ``__newobj__`` unpickling function
======================================

When the unpickling function returned by ``__reduce__`` (the first
item of the returned tuple) has the name ``__newobj__``, something
special happens for pickle protocol 2. An unpickling function
named ``__newobj__`` is assumed to have the following semantics::

    def __newobj__(cls, *args):
        return cls.__new__(cls, *args)

Pickle protocol 2 special-cases an unpickling function with this
name, and emits a pickling opcode that, given ``cls`` and ``args``,
will return ``cls.__new__(cls, *args)`` without also pickling a
reference to ``__newobj__`` (this is the same pickling opcode used by
protocol 2 for a new-style class instance when no ``__reduce__``
implementation exists). This is the main reason why protocol 2
pickles are much smaller than classic pickles. Of course, the
pickling code cannot verify that a function named ``__newobj__``
actually has the expected semantics. If you use an unpickling
function named ``__newobj__`` that returns something different, you
deserve what you get.

It is safe to use this feature under Python 2.2; there's nothing
in the recommended implementation of ``__newobj__`` that depends on
Python 2.3.
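As a rough illustration of the special case described in this section,
the sketch below uses the ``__newobj__`` function that Python 3 ships in
``copyreg`` (the module is called ``copy_reg`` in Python 2). The ``Point``
class is hypothetical. The opcode names are read back with
``pickletools.genops()`` rather than assumed:

```python
import copyreg
import pickle
import pickletools


class Point(object):
    """Hypothetical class that names __newobj__ in its __reduce__."""

    def __new__(cls, x, y):
        self = object.__new__(cls)
        self.x, self.y = x, y
        return self

    def __reduce__(self):
        # First item: a function named __newobj__.  Second item: the
        # class followed by the arguments for cls.__new__.
        return (copyreg.__newobj__, (type(self), self.x, self.y))


data = pickle.dumps(Point(1, 2), 2)
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
restored = pickle.loads(data)
```

Under protocol 2 the stream is expected to contain a ``NEWOBJ`` opcode
and no textual reference to ``__newobj__`` itself.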
|
The extension registry
======================

Protocol 2 supports a new mechanism to reduce the size of pickles.

When class instances (classic or new-style) are pickled, the full
name of the class (module name including package name, and class
name) is included in the pickle. Especially for applications that
generate many small pickles, this is a lot of overhead that has to
be repeated in each pickle. For large pickles, when using
protocol 1, repeated references to the same class name are
compressed using the "memo" feature; but each class name must be
spelled in full at least once per pickle, and this causes a lot of
overhead for small pickles.

The extension registry allows one to represent the most frequently
used names by small integers, which are pickled very efficiently:
an extension code in the range 1--255 requires only two bytes
including the opcode, one in the range 256--65535 requires only
three bytes including the opcode.

One of the design goals of the pickle protocol is to make pickles
"context-free": as long as you have installed the modules
containing the classes referenced by a pickle, you can unpickle
it, without needing to import any of those classes ahead of time.

Unbridled use of extension codes could jeopardize this desirable
property of pickles. Therefore, the main use of extension codes
is reserved for a set of codes to be standardized by some
standard-setting body. This being Python, the standard-setting
body is the PSF. From time to time, the PSF will decide on a
table mapping extension codes to class names (or occasionally
names of other global objects; functions are also eligible). This
table will be incorporated in the next Python release(s).

However, for some applications, like Zope, context-free pickles
are not a requirement, and waiting for the PSF to standardize
some codes may not be practical. Two solutions are offered for
such applications.

First, a few ranges of extension codes are reserved for private
use. Any application can register codes in these ranges.
Two applications exchanging pickles using codes in these ranges
need to have some out-of-band mechanism to agree on the mapping
between extension codes and names.

Second, some large Python projects (e.g. Zope) can be assigned a
range of extension codes outside the "private use" range that they
can assign as they see fit.

The extension registry is defined as a mapping between extension
codes and names. When an extension code is unpickled, it ends up
producing an object, but this object is obtained by interpreting the
name as a module name followed by a class (or function) name. The
mapping from names to objects is cached. It is quite possible
that certain names cannot be imported; that should not be a
problem as long as no pickle containing a reference to such names
has to be unpickled. (The same issue already exists for direct
references to such names in pickles that use protocols 0 or 1.)

Here is the proposed initial assignment of extension code ranges:

=====  =====  ===========  =================================================
First  Last   Count        Purpose
=====  =====  ===========  =================================================
0      0      1            Reserved --- will never be used
1      127    127          Reserved for Python standard library
128    191    64           Reserved for Zope
192    239    48           Reserved for 3rd parties
240    255    16           Reserved for private use (will never be assigned)
256    *MAX*  *MAX* - 255  Reserved for future assignment
=====  =====  ===========  =================================================

*MAX* stands for 2147483647, or ``2**31-1``. This is a hard limitation
of the protocol as currently defined.

At the moment, no specific extension codes have been assigned.
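The range table and the byte counts quoted earlier in this section can
be restated as a small lookup, shown here as a sketch only. The names
``classify_extension_code`` and ``encoded_size`` are hypothetical helpers,
not part of any module. The five-byte case for codes above 65535 is an
addition based on the ``EXT4`` opcode documented in ``pickletools``, which
takes a four-byte argument:

```python
# Hypothetical helpers; the boundaries are copied from the range table.
MAX_CODE = 2**31 - 1

_RANGES = [
    (0, 0, "reserved, never used"),
    (1, 127, "Python standard library"),
    (128, 191, "Zope"),
    (192, 239, "3rd parties"),
    (240, 255, "private use"),
    (256, MAX_CODE, "future assignment"),
]


def classify_extension_code(code):
    """Return the purpose of the range that contains `code`."""
    for first, last, purpose in _RANGES:
        if first <= code <= last:
            return purpose
    raise ValueError("extension code out of range: %r" % (code,))


def encoded_size(code):
    """Return the number of bytes used in the pickle, opcode included."""
    if not 1 <= code <= MAX_CODE:
        raise ValueError("extension code out of range: %r" % (code,))
    if code <= 0xFF:
        return 2  # one opcode byte + one byte of code
    if code <= 0xFFFF:
        return 3  # one opcode byte + two bytes of code
    return 5      # one opcode byte + four bytes of code
```

For example, ``classify_extension_code(240)`` falls in the private-use
range, and ``encoded_size(240)`` is two bytes.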
|
Extension registry API
----------------------

The extension registry is maintained as private global variables
in the ``copy_reg`` module. The following three functions are defined
in this module to manipulate the registry:

``add_extension(module, name, code)``
    Register an extension code. The *module* and *name* arguments
    must be strings; *code* must be an ``int`` in the inclusive range 1
    through *MAX*. This must either register a new ``(module, name)``
    pair to a new code, or be a redundant repeat of a previous
    call that was not canceled by a ``remove_extension()`` call; a
    ``(module, name)`` pair may not be mapped to more than one code,
    nor may a code be mapped to more than one ``(module, name)``
    pair.

    .. XXX Aliasing may actually cause a problem for this
       requirement; we'll see as we go.

``remove_extension(module, name, code)``
    Arguments are as for ``add_extension()``. Remove a previously
    registered mapping between ``(module, name)`` and *code*.

``clear_extension_cache()``
    The implementation of extension codes may use a cache to speed
    up loading objects that are named frequently. This cache can
    be emptied (removing references to cached objects) by calling
    this function.

Note that the API does not enforce the standard range assignments.
It is up to applications to respect these.
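The three functions can be exercised as in the following sketch. It
assumes Python 3, where the module is named ``copyreg`` instead of
``copy_reg``. It registers ``fractions.Fraction`` under code 240, the
first code of the private-use range; both choices are arbitrary and for
illustration only:

```python
import copyreg  # named copy_reg in Python 2
import fractions
import pickle
import pickletools

CODE = 240  # first code of the private-use range

# Pickle a reference to the class before any code is registered.
plain = pickle.dumps(fractions.Fraction, 2)

copyreg.add_extension("fractions", "Fraction", CODE)
try:
    # The same reference is now written as an extension code.
    short = pickle.dumps(fractions.Fraction, 2)
    names = [op.name for op, arg, pos in pickletools.genops(short)]
    restored = pickle.loads(short)
finally:
    # Undo the registration so that other code is unaffected.
    copyreg.remove_extension("fractions", "Fraction", CODE)
    copyreg.clear_extension_cache()
```

With the code registered, the module and class names no longer appear
in the pickle. A process that unpickles ``short`` must have registered
the same mapping beforehand, which is the out-of-band agreement that
private-use codes require.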
|
The copy module
===============

Traditionally, the ``copy`` module has supported an extended subset of
the pickling APIs for customizing the ``copy()`` and ``deepcopy()``
operations.

In particular, besides checking for a ``__copy__`` or ``__deepcopy__``
method, ``copy()`` and ``deepcopy()`` have always looked for ``__reduce__``,
and for classic classes, have looked for ``__getinitargs__``,
``__getstate__`` and ``__setstate__``.

In Python 2.2, the default ``__reduce__`` inherited from ``object`` made
copying simple new-style classes possible, but slots and various
other special cases were not covered.

In Python 2.3, several changes are made to the ``copy`` module:

- ``__reduce_ex__`` is supported (and always called with 2 as the
  protocol version argument).

- The four- and five-argument return values of ``__reduce__`` are
  supported.

- Before looking for a ``__reduce__`` method, the
  ``copy_reg.dispatch_table`` is consulted, just like for pickling.

- When the ``__reduce__`` method is inherited from ``object``, it is
  (unconditionally) replaced by a better one that uses the same
  APIs as pickle protocol 2: ``__getnewargs__``, ``__getstate__``, and
  ``__setstate__``, handling ``list`` and ``dict`` subclasses, and handling
  slots.

As a consequence of the last change, certain new-style classes
that were copyable under Python 2.2 are not copyable under Python
2.3. (These classes are also not picklable using pickle protocol
2.) A minimal example of such a class::

    class C(object):
        def __new__(cls, a):
            return object.__new__(cls)

The problem only occurs when ``__new__`` is overridden and has at
least one mandatory argument in addition to the class argument.

To fix this, a ``__getnewargs__`` method should be added that returns
the appropriate argument tuple (excluding the class).
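The fix can be sketched as follows, assuming Python 3, where ``copy()``
goes through the same ``__reduce_ex__`` machinery. The class names
``Broken`` and ``Fixed`` are hypothetical. ``Broken`` has the same shape as
class ``C`` above, except that it stores its argument so that the copy can
be checked:

```python
import copy


class Broken(object):
    # Same shape as class C: __new__ has a mandatory argument.
    def __new__(cls, a):
        self = object.__new__(cls)
        self.a = a
        return self


class Fixed(Broken):
    def __getnewargs__(self):
        # Argument tuple for __new__, excluding the class itself.
        return (self.a,)


try:
    copy.copy(Broken(1))
    broken_is_copyable = True
except TypeError:
    # __new__ was called without its mandatory argument.
    broken_is_copyable = False

fixed_copy = copy.copy(Fixed(1))
```

Copying ``Broken`` is expected to fail because ``__new__`` is called with
only the class, while ``Fixed`` supplies the missing argument.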
|
Pickling Python longs
=====================

Pickling and unpickling Python longs takes time quadratic in
the number of digits, in protocols 0 and 1. Under protocol 2,
new opcodes support linear-time pickling and unpickling of longs.
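The difference in opcodes can be observed with ``pickletools``, as in
this sketch. It assumes Python 3, where ``long`` has been merged into
``int``; the helper name ``opcode_names`` is hypothetical:

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


big = 1 << 200  # too large for the fixed-size integer opcodes

# Protocol 1 stores the value as decimal text.
text_names = opcode_names(pickle.dumps(big, 1))
# Protocol 2 stores the value as a counted string of bytes.
binary_names = opcode_names(pickle.dumps(big, 2))
```

Converting between a long and its decimal text is the quadratic step
that the binary opcodes avoid.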
|
Pickling bools
==============

Protocol 2 introduces new opcodes for pickling ``True`` and ``False``
directly. Under protocols 0 and 1, bools are pickled as integers,
using a trick in the representation of the integer in the pickle
so that an unpickler can recognize that a bool was intended. That
trick consumed 4 bytes per bool pickled. The new bool opcodes
consume 1 byte per bool.
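The byte counts can be checked directly. This is a small sketch run
against Python 3's ``pickle`` module, which still supports these
protocols:

```python
import pickle

# Protocol 1: the integer trick, four bytes for the bool, then STOP.
old_true = pickle.dumps(True, 1)
old_false = pickle.dumps(False, 1)

# Protocol 2: two-byte protocol header, one-byte bool opcode, then STOP.
new_true = pickle.dumps(True, 2)
new_false = pickle.dumps(False, 2)
```

Here ``old_true`` is ``b"I01\n."`` and ``new_true`` is ``b"\x80\x02\x88."``.
The two-byte header is paid once per pickle, not once per bool.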
|
Pickling small tuples
=====================

Protocol 2 introduces new opcodes for more-compact pickling of
tuples of lengths 1, 2 and 3. Protocol 1 previously introduced
an opcode for more-compact pickling of empty tuples.
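As an illustration, the opcodes chosen for a three-element tuple and
for the empty tuple can be listed with ``pickletools``. This sketch
assumes Python 3; the helper name ``opcode_names`` is hypothetical:

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


triple = (1, 2, 3)

# Protocol 1 brackets the items with MARK ... TUPLE.
names_proto1 = opcode_names(pickle.dumps(triple, 1))
# Protocol 2 needs no MARK: a single opcode encodes the length.
names_proto2 = opcode_names(pickle.dumps(triple, 2))
# The empty tuple already has its own opcode in protocol 1.
names_empty = opcode_names(pickle.dumps((), 1))
```

The saving is one byte per small tuple, since the ``MARK`` opcode is no
longer needed.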
|
Protocol identification
=======================

Protocol 2 introduces a new opcode, with which all protocol 2
pickles begin, identifying that the pickle is protocol 2.
Attempting to unpickle a protocol 2 pickle under older versions
of Python will therefore raise an "unknown opcode" exception
immediately.
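The identifying opcode can be inspected as in this sketch, which
assumes Python 3 and uses an arbitrary sample dictionary:

```python
import pickle
import pickletools

data = pickle.dumps({"answer": 42}, 2)

# The first opcode of a protocol 2 pickle carries the protocol number.
first_op, first_arg, first_pos = next(pickletools.genops(data))

# A protocol 1 pickle of the same object has no such header.
legacy = pickle.dumps({"answer": 42}, 1)
```

Here ``first_op.name`` is ``PROTO`` and ``first_arg`` is ``2``. An older
unpickler fails on this very first byte instead of partway through the
stream.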
|
Pickling of large lists and dicts
=================================

Protocol 1 pickles large lists and dicts "in one piece", which
minimizes pickle size, but requires that unpickling create a temp
object as large as the object being unpickled. As part of the
protocol 2 changes, large lists and dicts are now broken into pieces
of no more than 1000 elements each, so that unpickling needn't create
a temp object larger than needed to hold 1000 elements. Strictly
speaking, this isn't part of protocol 2: the opcodes produced are
still part of protocol 1. ``__reduce__`` implementations that return the
optional new listitems or dictitems iterators also benefit from
this unpickling temp-space optimization.
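The batching can be observed by counting the items between each
``MARK`` and ``APPENDS`` opcode, as in this sketch. It assumes CPython 3
and a list of small integers, so that each item is exactly one opcode:

```python
import pickle
import pickletools

data = pickle.dumps(list(range(2500)), 2)
names = [op.name for op, arg, pos in pickletools.genops(data)]

# Count the item opcodes between each MARK and its closing APPENDS.
batch_sizes = []
count = None
for name in names:
    if name == "MARK":
        count = 0
    elif name == "APPENDS":
        batch_sizes.append(count)
        count = None
    elif count is not None:
        count += 1
```

Each entry of ``batch_sizes`` is the number of elements the unpickler
must hold on its stack before extending the list.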
|
Copyright
=========

This document has been placed in the public domain.
|
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: