PEP: 307
Title: Extensions to the pickle protocol
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Tim Peters
Status: Active
Type: Standards Track
Content-Type: text/plain
Created: 31-Jan-2003
Post-History: None


Introduction

Pickling new-style objects in Python 2.2 is done somewhat clumsily
and causes pickle size to bloat compared to classic class
instances. This PEP documents a new pickle protocol that takes
care of this and many other pickle issues.

There are two sides to specifying a new pickle protocol: the byte
stream constituting pickled data must be specified, and the
interface between objects and the pickling and unpickling engines
must be specified. This PEP focuses on API issues, although it
may occasionally touch on byte stream format details to motivate a
choice. The pickle byte stream format is documented formally by
the standard library module pickletools.py (already checked into
CVS for Python 2.3).


Motivation

Pickling new-style objects causes serious pickle bloat. For
example, the binary pickle for a classic object with one instance
variable takes up 33 bytes; a new-style object with one instance
variable takes up 86 bytes. This was measured as follows:

    import pickle

    class C(object): # Omit "(object)" for classic class
        pass
    x = C()
    x.foo = 42
    print len(pickle.dumps(x, 1))

The reasons for the bloat are complex, but are mostly caused by
the fact that new-style objects use __reduce__ in order to be
picklable at all. After ample consideration we've concluded that
the only way to reduce pickle sizes for new-style objects is to
add new opcodes to the pickle protocol. The net result is that
with the new protocol, the pickle size in the above example is 35
bytes (two extra bytes are used at the start to indicate the
protocol version, although this isn't strictly necessary).


Protocol versions

Previously, pickling (but not unpickling) has distinguished
between text mode and binary mode. By design, text mode is a
subset of binary mode, and unpicklers don't need to know in
advance whether an incoming pickle uses text mode or binary mode.
The virtual machine used for unpickling is the same regardless of
the mode; certain opcodes simply aren't used in text mode.
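
As a minimal sketch of this property, assuming a current Python 3
interpreter rather than the Python 2.3 this PEP targets, the same
pickle.loads call below reads a text-mode and a binary-mode pickle
without being told which is which. The sample value `data` is
illustrative only.

```python
import pickle

# Illustrative sample value; any picklable object would do.
data = {"spam": [1, 2, 3], "eggs": ("a", "b")}

text_pickle = pickle.dumps(data, 0)    # text mode
binary_pickle = pickle.dumps(data, 1)  # binary mode

# The unpickler is not told which mode was used to write each pickle.
from_text = pickle.loads(text_pickle)
from_binary = pickle.loads(binary_pickle)

print(from_text == data, from_binary == data)
```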

Retroactively, text mode is called protocol 0, and binary mode is
called protocol 1. The new protocol is called protocol 2. In the
tradition of pickling protocols, protocol 2 is a superset of
protocol 1. But just so that future pickling protocols aren't
required to be supersets of the oldest protocols, a new opcode is
inserted at the start of a protocol 2 pickle indicating that it is
using protocol 2.
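
The leading opcode can be inspected with pickletools. The sketch
below assumes a current Python 3 interpreter, where that opcode is
named PROTO; the helper `first_opcode` and the sample `value` are
hypothetical names introduced only for this illustration.

```python
import pickle
import pickletools


def first_opcode(pickled):
    """Return (name, argument) of the first opcode in a pickle."""
    opcode, arg, _pos = next(pickletools.genops(pickled))
    return opcode.name, arg


value = [1, 2]  # illustrative sample value

# Map each protocol number to the first opcode its pickle starts with.
first_by_protocol = {
    proto: first_opcode(pickle.dumps(value, proto)) for proto in (0, 1, 2)
}

for proto, (name, arg) in sorted(first_by_protocol.items()):
    print(proto, name, arg)
```

Only the protocol 2 pickle should report PROTO with argument 2 as
its first opcode; protocols 0 and 1 start directly with data
opcodes.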

Several functions, methods and constructors used for pickling
formerly took a positional argument named 'bin', a flag defaulting
to 0 that selected binary mode. This argument is renamed to
'proto' and now gives the protocol number, defaulting to 0.

It so happens that passing 2 for the 'bin' argument in previous
Python versions had the same effect as passing 1. Nevertheless, a
special case is added here: passing a negative number selects the
highest protocol version supported by a particular implementation.
This works in previous Python versions, too.
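
A short sketch of the negative-number special case, assuming a
current Python 3 interpreter: the protocol is passed positionally,
and the result is compared against pickle.HIGHEST_PROTOCOL. The
sample `value` is illustrative only.

```python
import pickle

value = {"answer": 42}  # illustrative sample value

# A negative protocol number requests the highest supported protocol.
highest = pickle.dumps(value, -1)

# Pickles using protocol 2 or later begin with the protocol opcode
# (byte 0x80) followed by a one-byte protocol number.
declared_protocol = highest[1]

print(declared_protocol == pickle.HIGHEST_PROTOCOL)
print(pickle.loads(highest) == value)
```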


Security issues

In previous versions of Python, unpickling would do a "safety
check" on certain operations, refusing to call functions or
constructors that weren't marked as "safe for unpickling" by
either having an attribute __safe_for_unpickling__ set to 1, or by
being registered in a global registry, copy_reg.safe_constructors.

This feature gives a false sense of security: nobody has ever done
the necessary, extensive, code audit to prove that unpickling
untrusted pickles cannot invoke unwanted code, and in fact bugs in
the Python 2.2 pickle.py module make it easy to circumvent these
security measures.

We firmly believe that, on the Internet, it is better to know that
you are using an insecure protocol than to trust a protocol to be
secure whose implementation hasn't been thoroughly checked. Even
high-quality implementations of widely used protocols are
routinely found to be flawed; Python's pickle implementation simply
cannot make such guarantees without a much larger time investment.
Therefore, as of Python 2.3, all safety checks on unpickling are
officially removed, and replaced with this warning:

    *** Do not unpickle data received from an untrusted or
    unauthenticated source ***
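
To illustrate why the warning is worded so strongly, here is a
deliberately harmless sketch, assuming a current Python 3
interpreter. The class `Payload` is hypothetical: its __reduce__
asks the unpickler to call len("abc"), and a hostile pickle could
name any other importable callable in the same position.

```python
import pickle


class Payload(object):
    """Hypothetical class whose pickle requests a call to len("abc")."""

    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) on load.
        return (len, ("abc",))


untrusted_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance; it calls len("abc")
# and returns whatever that call returns.
result = pickle.loads(untrusted_bytes)
print(result)
```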


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: