Extend intro; add motivation and a section on protocol versions.
parent f739e21311
commit 584036ddd6
pep-0307.txt: 63 lines changed
@@ -15,9 +15,68 @@ Introduction
 Pickling new-style objects in Python 2.2 is done somewhat clumsily
 and causes pickle size to bloat compared to classic class
 instances. This PEP documents a new pickle protocol that takes
-care of this and many other pickling issues.
+care of this and many other pickle issues.
 
-(XXX The rest of this PEP is TBD.)
+There are two sides to specifying a new pickle protocol: the byte
+stream constituting pickled data must be specified, and the
+interface between objects and the pickling and unpickling engines
+must be specified. This PEP focuses on API issues, although it
+may occasionally touch on byte stream format details to motivate a
+choice. The pickle byte stream format is documented formally by
+the standard library module pickletools.py (already checked into
+CVS for Python 2.3).
+
+
+Motivation
+
+Pickling new-style objects causes serious pickle bloat. For
+example, the binary pickle for a classic object with one instance
+variable takes up 33 bytes; a new-style object with one instance
+variable takes up 86 bytes. This was measured as follows:
+
+    class C(object): # Omit "(object)" for classic class
+        pass
+    x = C()
+    x.foo = 42
+    print len(pickle.dumps(x, 1))
+
+The reasons for the bloat are complex, but it is mostly due to
+the fact that new-style objects use __reduce__ in order to be
+picklable at all. After ample consideration we've concluded that
+the only way to reduce pickle sizes for new-style objects is to
+add new opcodes to the pickle protocol. The net result is that
+with the new protocol, the pickle size in the above example is 35
+bytes (two extra bytes are used at the start to indicate the protocol
+version, although this isn't strictly necessary).
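
The measurement above is Python 2 code and depends on classic classes, which no longer exist in Python 3, so the 33-byte figure cannot be reproduced on a current interpreter. As a rough sketch under that limitation, the per-protocol sizes for a plain instance can still be compared. Here `argparse.Namespace` is only a stand-in for class C (assumed to be an ordinary importable class with no pickling hooks of its own), and `pickle_sizes` is a hypothetical helper name. Exact byte counts depend on the interpreter and on the class's module and name, so none are quoted.

```python
import argparse
import pickle


def pickle_sizes(obj, protocols=(0, 1, 2)):
    """Return {protocol: pickle length in bytes} for obj."""
    return {proto: len(pickle.dumps(obj, proto)) for proto in protocols}


# Stand-in for "x = C(); x.foo = 42" (assumption: any plain class will do).
x = argparse.Namespace()
x.foo = 42

sizes = pickle_sizes(x)
for proto, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (proto, size))
```

The protocol 2 figure should come out clearly smaller than the protocol 1 figure, which is the effect the paragraph above describes.
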
+
+
+Protocol versions
+
+Previously, pickling (but not unpickling) has distinguished
+between text mode and binary mode. By design, text mode is a
+subset of binary mode, and unpicklers don't need to know in
+advance whether an incoming pickle uses text mode or binary mode.
+The virtual machine used for unpickling is the same regardless of
+the mode; certain opcodes simply aren't used in text mode.
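
A small sketch of this point, written for a current Python 3 interpreter: the same `pickle.loads` call accepts a text-mode and a binary-mode pickle, and `pickletools.genops` can list which opcodes each one uses. The sample data and the helper name `opcode_names` are illustrative assumptions.

```python
import pickle
import pickletools

data = {"name": "spam", "values": [1, 2, 3]}

text_pickle = pickle.dumps(data, 0)    # text mode
binary_pickle = pickle.dumps(data, 1)  # binary mode

# One loads() call handles both; no mode argument is passed.
assert pickle.loads(text_pickle) == data
assert pickle.loads(binary_pickle) == data


def opcode_names(pickled):
    """Return the set of opcode names that appear in a pickle."""
    return {op.name for op, arg, pos in pickletools.genops(pickled)}


# Opcodes that the binary pickle uses but the text pickle does not.
binary_only = opcode_names(binary_pickle) - opcode_names(text_pickle)
print(sorted(binary_only))
```
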
+
+Retroactively, text mode is called protocol 0, and binary mode is
+called protocol 1. The new protocol is called protocol 2. In the
+tradition of pickling protocols, protocol 2 is a superset of
+protocol 1. But just so that future pickling protocols aren't
+required to be supersets of the oldest protocols, a new opcode is
+inserted at the start of a protocol 2 pickle indicating that it is
+using protocol 2.
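
In the pickle module as it eventually shipped, this leading opcode is named PROTO (byte 0x80) and takes the version number as a one-byte argument. The following sketch, assuming a current Python 3 interpreter, inspects the start of a protocol 2 pickle with `pickletools.genops`; the sample list is arbitrary.

```python
import pickle
import pickletools

payload = pickle.dumps([1, 2, 3], 2)

# The first opcode names the protocol; its argument is the version number.
first_op, first_arg, first_pos = next(pickletools.genops(payload))
print(first_op.name, first_arg)

# Protocol 0 and 1 pickles carry no such marker.
legacy_first_op = next(pickletools.genops(pickle.dumps([1, 2, 3], 1)))[0]
print(legacy_first_op.name)
```
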
+
+Several functions, methods and constructors used for pickling
+formerly took a positional argument named 'bin', a flag
+defaulting to 0 that indicated binary mode. This argument is renamed
+to 'proto' and now gives the protocol number, defaulting to 0.
+
+It so happens that passing 2 for the 'bin' argument in previous
+Python versions had the same effect as passing 1. Nevertheless, a
+special case is added here: passing a negative number selects the
+highest protocol version supported by a particular
+implementation. This works in previous Python versions, too.
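
A brief sketch of the positional protocol argument and the negative-number special case, assuming a current Python 3 interpreter. Two differences from the text above are worth stating: in the released module the argument is spelled `protocol` rather than 'proto', and the default in current Python 3 releases is no longer 0. The sample object is arbitrary.

```python
import pickle

obj = {"answer": 42}

explicit = pickle.dumps(obj, 2)   # protocol number passed positionally
highest = pickle.dumps(obj, -1)   # negative: highest supported protocol

# From protocol 2 on, the second byte records the version actually used.
assert explicit[1] == 2
assert highest[1] == pickle.HIGHEST_PROTOCOL
assert highest == pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
```

Passing -1 keeps calling code independent of whichever protocol number a given implementation treats as its newest.
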
 
 
 Copyright