Extend intro; add motivation and a section on protocol versions.

This commit is contained in:
Guido van Rossum 2003-01-31 19:56:32 +00:00
parent f739e21311
commit 584036ddd6
1 changed files with 61 additions and 2 deletions

View File

@ -15,9 +15,68 @@ Introduction
Pickling new-style objects in Python 2.2 is done somewhat clumsily
and causes pickle size to bloat compared to classic class
instances. This PEP documents a new pickle protocol that takes
care of this and many other pickling issues.
care of this and many other pickle issues.
(XXX The rest of this PEP is TBD.)
There are two sides to specifying a new pickle protocol: the byte
stream constituting pickled data must be specified, and the
interface between objects and the pickling and unpickling engines
must be specified. This PEP focuses on API issues, although it
may occasionally touch on byte stream format details to motivate a
choice. The pickle byte stream format is documented formally by
the standard library module pickletools.py (already checked into
CVS for Python 2.3).
Motivation
Pickling new-style objects causes serious pickle bloat. For
example, the binary pickle for a classic object with one instance
variable takes up 33 bytes; a new-style object with one instance
variable takes up 86 bytes. This was measured as follows:
class C(object): # Omit "(object)" for classic class
pass
x = C()
x.foo = 42
print len(pickle.dumps(x, 1))
The reasons for the bloat are complex, but are mostly caused by
the fact that new-style objects use __reduce__ in order to be
picklable at all. After ample consideration we've concluded that
the only way to reduce pickle sizes for new-style objects is to
add new opcodes to the pickle protocol. The net result is that
with the new protocol, the pickle size in the above example is 35
(two extra bytes are used at the start to indicate the protocol
version, although this isn't strictly necessary).
Protocol versions
Previously, pickling (but not unpickling) has distinguished
between text mode and binary mode. By design, text mode is a
subset of binary mode, and unpicklers don't need to know in
advance whether an incoming pickle uses text mode or binary mode.
The virtual machine used for unpickling is the same regardless of
the mode; certain opcode simply aren't used in text mode.
Retroactively, text mode is called protocol 0, and binary mode is
called protocol 1. The new protocol is called protocol 2. In the
tradition of pickling protocols, protocol 2 is a superset of
protocol 1. But just so that future pickling protocols aren't
required to be supersets of the oldest protocols, a new opcode is
inserted at the start of a protocol 2 pickle indicating that it is
using protocol 2.
Several functions, methods and constructors used for pickling used
to take a positional argument named 'bin' which was a flag,
defaulting to 0, indicating binary mode. This argument is renamed
to 'proto' and now gives the protocol number, defaulting to 0.
It so happens that passing 2 for the 'bin' argument in previous
Python versions had the same effect as passing 1. Nevertheless, a
special case is added here: passing a negative number selects the
highest protocol version supported by a particular
implementation. This works in previous Python versions, too.
Copyright