178 lines
5.6 KiB
Plaintext
178 lines
5.6 KiB
Plaintext
PEP: 3154
|
|
Title: Pickle protocol version 4
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Antoine Pitrou <solipsis@pitrou.net>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 2011-08-11
|
|
Python-Version: 3.3
|
|
Post-History:
|
|
Resolution: TBD
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Data serialized using the pickle module must be portable across Python
|
|
versions. It should also support the latest language features as well
|
|
as implementation-specific features. For this reason, the pickle
|
|
module knows about several protocols (currently numbered from 0 to 3),
|
|
each of which appeared in a different Python version. Using a
|
|
low-numbered protocol version allows to exchange data with old Python
|
|
versions, while using a high-numbered protocol allows access to newer
|
|
features and sometimes more efficient resource use (both CPU time
|
|
required for (de)serializing, and disk size / network bandwidth
|
|
required for data transfer).
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The latest current protocol, coincidentally named protocol 3, appeared
|
|
with Python 3.0 and supports the new incompatible features in the
|
|
language (mainly, unicode strings by default and the new bytes
|
|
object). The opportunity was not taken at the time to improve the
|
|
protocol in other ways.
|
|
|
|
This PEP is an attempt to foster a number of small incremental
|
|
improvements in a future new protocol version. The PEP process is
|
|
used in order to gather as many improvements as possible, because the
|
|
introduction of a new protocol version should be a rare occurrence.
|
|
|
|
|
|
Proposed changes
|
|
================
|
|
|
|
Framing
|
|
-------
|
|
|
|
Traditionally, when unpickling an object from a stream (by calling
|
|
``load()`` rather than ``loads()``), many small ``read()``
|
|
calls can be issued on the file-like object, with a potentially huge
|
|
performance impact.
|
|
|
|
Protocol 4, by contrast, features binary framing. The general structure
|
|
of a pickle is thus the following::
|
|
|
|
+------+------+
|
|
| 0x80 | 0x03 | protocol header (2 bytes)
|
|
+------+------+-----------+
|
|
| AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
|
|
+------+------------------+
|
|
| .... | first frame contents (N bytes)
|
|
+------+------+-----------+
|
|
| AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
|
|
+------+------------------+
|
|
| .... | second frame contents (N bytes)
|
|
+------+
|
|
etc.
|
|
|
|
To keep the implementation simple, it is forbidden for a pickle opcode
|
|
to overlap frame boundaries. The pickler takes care not to produce such
|
|
pickles, and the unpickler refuses them.
|
|
|
|
How the pickler decides frame sizes is an implementation detail.
|
|
A simple heuristic committing the current frame as soon as it reaches
|
|
64 KiB seems sufficient.
|
|
|
|
Binary encoding for all opcodes
|
|
-------------------------------
|
|
|
|
The GLOBAL opcode, which is still used in protocol 3, uses the
|
|
so-called "text" mode of the pickle protocol, which involves looking
|
|
for newlines in the pickle stream. It also complicates the implementation
|
|
of binary framing.
|
|
|
|
Protocol 4 forbids use of the GLOBAL opcode and replaces it with
|
|
GLOBAL_STACK, a new opcode which takes its operand from the stack.
|
|
|
|
Serializing more "lookupable" objects
|
|
-------------------------------------
|
|
|
|
By default, pickle is only able to serialize module-global functions and
|
|
classes. Supporting other kinds of objects, such as unbound methods [4]_,
|
|
is a common request. Actually, third-party support for some of them, such
|
|
as bound methods, is implemented in the multiprocessing module [5]_.
|
|
|
|
The ``__qualname__`` attribute from :pep:`3155` makes it possible to
|
|
lookup many more objects by name. Making the GLOBAL_STACK opcode accept
|
|
dot-separated names, or adding a special GETATTR opcode, would allow the
|
|
standard pickle implementation to support all those kinds of objects.
|
|
|
|
64-bit opcodes for large objects
|
|
--------------------------------
|
|
|
|
Current protocol versions export object sizes for various built-in
|
|
types (str, bytes) as 32-bit ints. This forbids serialization of
|
|
large data [1]_. New opcodes are required to support very large bytes
|
|
and str objects.
|
|
|
|
Native opcodes for sets and frozensets
|
|
--------------------------------------
|
|
|
|
Many common built-in types (such as str, bytes, dict, list, tuple)
|
|
have dedicated opcodes to improve resource consumption when
|
|
serializing and deserializing them; however, sets and frozensets
|
|
don't. Adding such opcodes would be an obvious improvement. Also,
|
|
dedicated set support could help remove the current impossibility of
|
|
pickling self-referential sets [2]_.
|
|
|
|
Calling __new__ with keyword arguments
|
|
--------------------------------------
|
|
|
|
Currently, classes whose __new__ mandates the use of keyword-only
|
|
arguments can not be pickled (or, rather, unpickled) [3]_. Both a new
|
|
special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
|
|
are needed.
|
|
|
|
Better string encoding
|
|
----------------------
|
|
|
|
Short str objects currently have their length coded as a 4-bytes
|
|
integer, which is wasteful. A specific opcode with a 1-byte length
|
|
would make many pickles smaller.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
(...)
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] "pickle not 64-bit ready":
|
|
http://bugs.python.org/issue11564
|
|
|
|
.. [2] "Cannot pickle self-referencing sets":
|
|
http://bugs.python.org/issue9269
|
|
|
|
.. [3] "pickle/copyreg doesn't support keyword only arguments in __new__":
|
|
http://bugs.python.org/issue4727
|
|
|
|
.. [4] "pickle should support methods":
|
|
http://bugs.python.org/issue9276
|
|
|
|
.. [5] Lib/multiprocessing/forking.py:
|
|
http://hg.python.org/cpython/file/baea9f5f973c/Lib/multiprocessing/forking.py#l54
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|