PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD


Abstract
========

Data serialized using the pickle module must be portable across Python
versions. It should also support the latest language features as well as
implementation-specific features. For this reason, the pickle module knows
about several protocols (currently numbered from 0 to 3), each of which
appeared in a different Python version. Using a low-numbered protocol
version allows data to be exchanged with older Python versions, while using a
high-numbered protocol allows access to newer features and sometimes more
efficient resource use (both the CPU time required for (de)serialization and
the disk size or network bandwidth required for data transfer).

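The sketch below illustrates protocol selection. It assumes a Python 3 interpreter whose pickle module supports at least protocols 0 to 3. It serializes the same object with each of those protocols, checks that every version round-trips, and reports the size of each pickle. The helper ``compare_protocols`` and the sample data are hypothetical. Pickles written with protocol 2 or higher start with a PROTO opcode (byte ``0x80``) followed by the protocol number. This lets an unpickler reject a version it does not support.

```python
import pickle

def compare_protocols(obj, protocols=(0, 1, 2, 3)):
    """Pickle obj with each protocol and return {protocol: pickled bytes}."""
    results = {}
    for proto in protocols:
        data = pickle.dumps(obj, protocol=proto)
        # Every protocol must reconstruct an equal object.
        if pickle.loads(data) != obj:
            raise ValueError("protocol {} did not round-trip".format(proto))
        results[proto] = data
    return results

sample = {"name": "spam", "values": list(range(10)), "ratio": 0.5}
for proto, data in compare_protocols(sample).items():
    # Protocols >= 2 begin with the PROTO opcode (0x80) and the version byte.
    header = data[:2] if proto >= 2 else b"(no PROTO opcode)"
    print(proto, len(data), header)
```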

Rationale
=========

The latest protocol, coincidentally numbered 3, appeared with Python 3.0
and supports the language's new backward-incompatible features (mainly,
unicode strings by default and the new bytes object). The opportunity was
not taken at the time to improve the protocol in other ways.

This PEP aims to gather a number of small, incremental improvements into a
new protocol version. The PEP process is used in order to collect as many
improvements as possible, because introducing a new protocol version should
be a rare occurrence.


Improvements under discussion
=============================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions encode object sizes for various built-in types
(str, bytes) as 32-bit ints. This makes it impossible to serialize objects
whose size does not fit in 32 bits [1]_. New opcodes are required to
support very large bytes and str objects.

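The sketch below shows where the limit lies. It assumes CPython's pickle module. With protocol 3, a bytes object of 256 bytes or more is written with the BINBYTES opcode (``B``). That opcode's length field is a 4-byte little-endian unsigned integer. The helpers ``binbytes_length`` and ``fits_in_32_bits`` are hypothetical. The first decodes that field from a real pickle. The second uses ``struct`` to show that the same field cannot hold a size of 2**32 bytes.

```python
import pickle
import struct

def binbytes_length(data):
    """Return the length field of a protocol 3 BINBYTES record.

    Assumes data is pickle.dumps(b, protocol=3) for a bytes object of at
    least 256 bytes: PROTO (2 bytes), then b"B" and a 4-byte length.
    """
    if data[:2] != b"\x80\x03" or data[2:3] != b"B":
        raise ValueError("not a protocol 3 BINBYTES pickle")
    (length,) = struct.unpack("<I", data[3:7])
    return length

def fits_in_32_bits(size):
    """Check whether size can be stored in a 4-byte unsigned length field."""
    try:
        struct.pack("<I", size)
    except struct.error:
        return False
    return True

payload = b"x" * 300
print(binbytes_length(pickle.dumps(payload, protocol=3)))  # 300
print(fits_in_32_bits(2**32 - 1), fits_in_32_bits(2**32))  # True False
```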

Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't. Adding such opcodes
would be an obvious improvement. Also, dedicated set support could help
make it possible to pickle self-referential sets [2]_.

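The sketch below disassembles a protocol 3 pickle of a set, assuming CPython's pickle and pickletools modules. Protocol 3 has no set opcode, so the set is rebuilt in two steps. GLOBAL looks up ``builtins.set``, and REDUCE then calls it on a list of the elements. The helper name ``opcode_names`` is hypothetical.

```python
import pickle
import pickletools

def opcode_names(data):
    """Return the opcode names of a pickle, in stream order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]

data = pickle.dumps({1, 2, 3}, protocol=3)
print(opcode_names(data))
# Print a full annotated disassembly for inspection.
pickletools.dis(data)
```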

Calling __new__ with keyword arguments
--------------------------------------

Currently, classes whose __new__ mandates the use of keyword-only arguments
cannot be pickled (or, rather, unpickled) [3]_. Both a new special method
(``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?) are needed.

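The sketch below illustrates both the problem and the proposed hook. It assumes the special method is spelled ``__getnewargs_ex__`` and returns an ``(args, kwargs)`` pair. CPython 3.4 and later accept a method with exactly that name and shape, so the sketch needs such an interpreter. The classes ``PlainPoint`` and ``Point`` and the helper ``try_roundtrip`` are hypothetical.

```python
import pickle

class PlainPoint:
    """__new__ requires keyword-only arguments and no pickling hook is given."""
    def __new__(cls, *, x, y):
        self = super().__new__(cls)
        self.x, self.y = x, y
        return self

class Point(PlainPoint):
    """Same constructor, plus a hook that supplies the keyword arguments."""
    def __getnewargs_ex__(self):
        # (positional args, keyword args) passed to __new__ on unpickling.
        return (), {"x": self.x, "y": self.y}

def try_roundtrip(obj, protocol):
    try:
        clone = pickle.loads(pickle.dumps(obj, protocol=protocol))
    except TypeError as exc:
        return "failed: {}".format(exc)
    return (clone.x, clone.y)

# Unpickling calls PlainPoint.__new__(PlainPoint) without x and y.
print(try_roundtrip(PlainPoint(x=1, y=2), protocol=3))
print(try_roundtrip(Point(x=1, y=2), protocol=4))  # (1, 2)
```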

Serializing more callable objects
---------------------------------

Currently, only module-global functions are serializable. Multiprocessing
has custom support for pickling other callables such as bound methods [4]_.
This support could be folded into the protocol, and made more efficient
through a new GETATTR opcode.

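Today's machinery can already express the reduction that a GETATTR opcode would encode. A bound method is reduced to a call ``getattr(obj, name)``, which is similar in spirit to the multiprocessing support mentioned above. The sketch below assumes Python 3.3 or later for the per-pickler ``dispatch_table`` attribute. It registers such a reducer on one pickler only. The names ``Counter``, ``reduce_method`` and ``dumps_with_methods`` are hypothetical. Recent CPython versions can pickle bound methods natively, so here the custom reducer mainly makes the ``getattr`` call visible in the stream.

```python
import copyreg
import io
import pickle
import pickletools
import types

class Counter:
    def __init__(self, start):
        self.value = start

    def increment(self, step=1):
        self.value += step
        return self.value

def reduce_method(method):
    """Rebuild a bound method as getattr(instance, method_name)."""
    return getattr, (method.__self__, method.__func__.__name__)

def dumps_with_methods(obj, protocol=3):
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=protocol)
    # Private copy of the global table, so copyreg itself is left untouched.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.MethodType] = reduce_method
    pickler.dump(obj)
    return buf.getvalue()

data = dumps_with_methods(Counter(10).increment)
restored = pickle.loads(data)
print(restored(5))  # 15
# The disassembly includes GLOBAL 'builtins getattr' and, later, REDUCE.
pickletools.dis(data)
```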

Serializing "pseudo-global" objects
-----------------------------------

Objects which are not module-global, but should be treated in a similar
fashion, such as unbound methods [5]_ or nested classes, cannot currently
be pickled (or, rather, unpickled) because the pickle protocol does not
correctly specify how to retrieve them. One solution would be to add a
``__namespace__`` (or ``__qualname__``) attribute to all class and
function objects, specifying the full "path" by which they can be retrieved.
For globals, this would generally be ``"{}.{}".format(obj.__module__, obj.__name__)``.
Then a new opcode can resolve that path and push the object on the stack,
similarly to the GLOBAL opcode.

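The sketch below shows the lookup such an opcode could perform. It assumes the path is stored as a module name plus a dotted ``__qualname__``, the attribute introduced in Python 3.3. The resolver imports the module and then follows each dotted component with ``getattr``. The function ``resolve_path`` and the nested classes are hypothetical.

```python
import importlib
from functools import reduce

def resolve_path(module_name, qualname):
    """Import module_name, then follow the dotted qualname with getattr."""
    module = importlib.import_module(module_name)
    return reduce(getattr, qualname.split("."), module)

class Outer:
    class Inner:
        def method(self):
            return "found"

target = Outer.Inner.method
print(target.__qualname__)  # Outer.Inner.method
obj = resolve_path(target.__module__, target.__qualname__)
print(obj is target)  # True
```

For a plain global the qualname has a single component, so the same function covers the existing GLOBAL case, e.g. ``resolve_path("functools", "reduce")``.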

Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream. Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?)
could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already use
binary encoding.

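The sketch below assumes CPython's pickle module. It first confirms that protocol 3 writes a class reference as GLOBAL followed by two newline-terminated names. It then shows one possible length-prefixed layout for a binary variant. The opcode byte, the 1-byte length fields, and the helpers ``encode_binglobal`` and ``decode_binglobal`` are all hypothetical. With explicit lengths, a reader can request exactly the number of bytes it needs from a non-seekable stream instead of scanning for newlines.

```python
import io
import pickle
import struct
from collections import OrderedDict

# Protocol 3 writes the GLOBAL opcode b"c", then "module\nname\n" as text.
text_form = pickle.dumps(OrderedDict, protocol=3)
print(text_form)

# Hypothetical BINGLOBAL layout: opcode byte, then each name as a
# 1-byte length followed by its UTF-8 bytes (names limited to 255 bytes).
BINGLOBAL = b"\xf0"  # placeholder value chosen for illustration only

def encode_binglobal(module, name):
    out = [BINGLOBAL]
    for part in (module, name):
        raw = part.encode("utf-8")
        out.append(struct.pack("<B", len(raw)))  # raises struct.error above 255
        out.append(raw)
    return b"".join(out)

def decode_binglobal(stream):
    """Read one record using only reads of known size; no newline scanning."""
    if stream.read(1) != BINGLOBAL:
        raise ValueError("not a BINGLOBAL record")
    parts = []
    for _ in range(2):
        (size,) = struct.unpack("<B", stream.read(1))
        parts.append(stream.read(size).decode("utf-8"))
    return tuple(parts)

record = encode_binglobal("collections", "OrderedDict")
print(decode_binglobal(io.BytesIO(record)))  # ('collections', 'OrderedDict')
```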

Better string encoding
----------------------

Short str objects currently have their length encoded as a 4-byte integer,
which is wasteful. A specific opcode with a 1-byte length would make
many pickles smaller.

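The sketch below quantifies the waste, assuming CPython's pickle module. It rebuilds the BINUNICODE record that protocol 3 emits for a str: the ``X`` opcode, a 4-byte little-endian length, then the UTF-8 payload. It then compares that record with a hypothetical variant that uses a 1-byte length. The helper names and the variant's opcode byte are assumptions for illustration.

```python
import pickle
import struct

def binunicode_record(s):
    """BINUNICODE as used by protocol 3: b"X", 4-byte length, UTF-8 data."""
    raw = s.encode("utf-8")
    return b"X" + struct.pack("<I", len(raw)) + raw

def short_unicode_record(s, opcode=b"\xf1"):
    """Hypothetical short form: opcode, 1-byte length, UTF-8 data."""
    raw = s.encode("utf-8")
    if len(raw) > 0xFF:
        raise ValueError("payload too long for a 1-byte length")
    return opcode + struct.pack("<B", len(raw)) + raw

word = "abc"
# The record built by hand appears verbatim inside the real protocol 3 pickle.
print(binunicode_record(word) in pickle.dumps(word, protocol=3))  # True
print(len(binunicode_record(word)) - len(short_unicode_record(word)))  # 3
```

The saving is a constant 3 bytes per string, which matters most when a pickle contains many short strings such as attribute names and dictionary keys.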


Acknowledgments
===============

(...)


References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269

.. [3] "pickle/copyreg doesn't support keyword only arguments in __new__":
   http://bugs.python.org/issue4727

.. [4] Lib/multiprocessing/forking.py:
   http://hg.python.org/cpython/file/baea9f5f973c/Lib/multiprocessing/forking.py#l54

.. [5] "pickle should support methods":
   http://bugs.python.org/issue9276


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: