149 lines
4.7 KiB
Plaintext
149 lines
4.7 KiB
Plaintext
PEP: 3154
|
|
Title: Pickle protocol version 4
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Antoine Pitrou <solipsis@pitrou.net>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 2011-08-11
|
|
Python-Version: 3.3
|
|
Post-History:
|
|
Resolution: TBD
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Data serialized using the pickle module must be portable across Python
|
|
versions. It should also support the latest language features as well
|
|
as implementation-specific features. For this reason, the pickle
|
|
module knows about several protocols (currently numbered from 0 to 3),
|
|
each of which appeared in a different Python version. Using a
|
|
low-numbered protocol version allows to exchange data with old Python
|
|
versions, while using a high-numbered protocol allows access to newer
|
|
features and sometimes more efficient resource use (both CPU time
|
|
required for (de)serializing, and disk size / network bandwidth
|
|
required for data transfer).
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The latest current protocol, coincidentally named protocol 3, appeared
|
|
with Python 3.0 and supports the new incompatible features in the
|
|
language (mainly, unicode strings by default and the new bytes
|
|
object). The opportunity was not taken at the time to improve the
|
|
protocol in other ways.
|
|
|
|
This PEP is an attempt to foster a number of small incremental
|
|
improvements in a future new protocol version. The PEP process is
|
|
used in order to gather as many improvements as possible, because the
|
|
introduction of a new protocol version should be a rare occurrence.
|
|
|
|
|
|
Improvements in discussion
|
|
==========================
|
|
|
|
64-bit compatibility for large objects
|
|
--------------------------------------
|
|
|
|
Current protocol versions export object sizes for various built-in
|
|
types (str, bytes) as 32-bit ints. This forbids serialization of
|
|
large data [1]_. New opcodes are required to support very large bytes
|
|
and str objects.
|
|
|
|
Native opcodes for sets and frozensets
|
|
--------------------------------------
|
|
|
|
Many common built-in types (such as str, bytes, dict, list, tuple)
|
|
have dedicated opcodes to improve resource consumption when
|
|
serializing and deserializing them; however, sets and frozensets
|
|
don't. Adding such opcodes would be an obvious improvement. Also,
|
|
dedicated set support could help remove the current impossibility of
|
|
pickling self-referential sets [2]_.
|
|
|
|
Calling __new__ with keyword arguments
|
|
--------------------------------------
|
|
|
|
Currently, classes whose __new__ mandates the use of keyword-only
|
|
arguments can not be pickled (or, rather, unpickled) [3]_. Both a new
|
|
special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
|
|
are needed.
|
|
|
|
Serializing more "lookupable" objects
|
|
-------------------------------------
|
|
|
|
For some kinds of objects, it only makes sense to serialize them by name
|
|
(for example classes and functions). By default, pickle is only able to
|
|
serialize module-global functions and classes by name. Supporting other
|
|
kinds of objects, such as unbound methods [4]_, is a common request.
|
|
Actually, third-party support for some of them, such as bound methods,
|
|
is implemented in the multiprocessing module [5]_.
|
|
|
|
:pep:`3155` now makes it possible to lookup many more objects by name.
|
|
Generalizing the GLOBAL opcode to accept dot-separated names, or adding
|
|
a special GETATTR opcode, would allow the standard pickle implementation
|
|
to support, in an efficient way, all those kinds of objects.
|
|
|
|
Binary encoding for all opcodes
|
|
-------------------------------
|
|
|
|
The GLOBAL opcode, which is still used in protocol 3, uses the
|
|
so-called "text" mode of the pickle protocol, which involves looking
|
|
for newlines in the pickle stream. Looking for newlines is difficult
|
|
to optimize on a non-seekable stream, and therefore a new version of
|
|
GLOBAL (BINGLOBAL?) could use a binary encoding instead.
|
|
|
|
It seems that all other opcodes emitted when using protocol 3 already
|
|
use binary encoding.
|
|
|
|
Better string encoding
|
|
----------------------
|
|
|
|
Short str objects currently have their length coded as a 4-bytes
|
|
integer, which is wasteful. A specific opcode with a 1-byte length
|
|
would make many pickles smaller.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
(...)
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] "pickle not 64-bit ready":
|
|
http://bugs.python.org/issue11564
|
|
|
|
.. [2] "Cannot pickle self-referencing sets":
|
|
http://bugs.python.org/issue9269
|
|
|
|
.. [3] "pickle/copyreg doesn't support keyword only arguments in __new__":
|
|
http://bugs.python.org/issue4727
|
|
|
|
.. [4] "pickle should support methods":
|
|
http://bugs.python.org/issue9276
|
|
|
|
.. [5] Lib/multiprocessing/forking.py:
|
|
http://hg.python.org/cpython/file/baea9f5f973c/Lib/multiprocessing/forking.py#l54
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|