diff --git a/pep-0307.txt b/pep-0307.txt
index 4f616bb9d..09678056b 100644
--- a/pep-0307.txt
+++ b/pep-0307.txt
@@ -15,9 +15,68 @@ Introduction
     Pickling new-style objects in Python 2.2 is done somewhat clumsily
     and causes pickle size to bloat compared to classic class
     instances.  This PEP documents a new pickle protocol that takes
-    care of this and many other pickling issues.
+    care of this and many other pickle issues.
 
-    (XXX The rest of this PEP is TBD.)
+    There are two sides to specifying a new pickle protocol: the byte
+    stream constituting pickled data must be specified, and the
+    interface between objects and the pickling and unpickling engines
+    must be specified.  This PEP focuses on API issues, although it
+    may occasionally touch on byte stream format details to motivate a
+    choice.  The pickle byte stream format is documented formally by
+    the standard library module pickletools.py (already checked into
+    CVS for Python 2.3).
+
+
+Motivation
+
+    Pickling new-style objects causes serious pickle bloat.  For
+    example, the binary pickle for a classic object with one instance
+    variable takes up 33 bytes; a new-style object with one instance
+    variable takes up 86 bytes.  This was measured as follows:
+
+        class C(object):  # Omit "(object)" for classic class
+            pass
+        x = C()
+        x.foo = 42
+        print len(pickle.dumps(x, 1))
+
+    The reasons for the bloat are complex, but are mostly caused by
+    the fact that new-style objects use __reduce__ in order to be
+    picklable at all.  After ample consideration we've concluded that
+    the only way to reduce pickle sizes for new-style objects is to
+    add new opcodes to the pickle protocol.  The net result is that
+    with the new protocol, the pickle size in the above example is 35
+    (two extra bytes are used at the start to indicate the protocol
+    version, although this isn't strictly necessary).
+
+
+Protocol versions
+
+    Previously, pickling (but not unpickling) has distinguished
+    between text mode and binary mode.  By design, text mode is a
+    subset of binary mode, and unpicklers don't need to know in
+    advance whether an incoming pickle uses text mode or binary mode.
+    The virtual machine used for unpickling is the same regardless of
+    the mode; certain opcodes simply aren't used in text mode.
+
+    Retroactively, text mode is called protocol 0, and binary mode is
+    called protocol 1.  The new protocol is called protocol 2.  In the
+    tradition of pickling protocols, protocol 2 is a superset of
+    protocol 1.  But just so that future pickling protocols aren't
+    required to be supersets of the oldest protocols, a new opcode is
+    inserted at the start of a protocol 2 pickle indicating that it is
+    using protocol 2.
+
+    Several functions, methods and constructors used for pickling used
+    to take a positional argument named 'bin' which was a flag,
+    defaulting to 0, indicating binary mode.  This argument is renamed
+    to 'proto' and now gives the protocol number, defaulting to 0.
+
+    It so happens that passing 2 for the 'bin' argument in previous
+    Python versions had the same effect as passing 1.  Nevertheless, a
+    special case is added here: passing a negative number selects the
+    highest protocol version supported by a particular
+    implementation.  This works in previous Python versions, too.
 
 
 Copyright
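
Note on the "Protocol versions" text added by this patch: the behaviour it describes can be checked with a short sketch. This is not part of the patch. It assumes a current Python 3 interpreter (the PEP's own sample is Python 2), passes the protocol number positionally, and pickles a plain dict instead of a class instance so that the result does not depend on where a class is defined. The variable names are illustrative only.

```python
import pickle

# Assumption: a plain dict stands in for the PEP's instance of class C.
obj = {"foo": 42}

# Pickle the same object under protocols 0 (text), 1 (binary) and 2.
pickles = {proto: pickle.dumps(obj, proto) for proto in (0, 1, 2)}

for proto, data in sorted(pickles.items()):
    # Sizes vary between interpreter versions, so they are only printed.
    print(proto, len(data), data[:2])

# A protocol 2 pickle starts with the PROTO opcode (0x80) and the version byte.
starts_with_proto_opcode = pickles[2][:2] == b"\x80\x02"

# A negative protocol number selects the highest protocol supported.
same_as_highest = pickle.dumps(obj, -1) == pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)

# The unpickler is not told which protocol produced each pickle.
restored = [pickle.loads(data)["foo"] for data in pickles.values()]
```

Protocols 0 and 1 carry no version marker, which is why the leading opcode is checked only for protocol 2.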