156 lines
5.0 KiB
Plaintext
156 lines
5.0 KiB
Plaintext
PEP: 3154
|
|
Title: Pickle protocol version 4
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Antoine Pitrou <solipsis@pitrou.net>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 2011-08-11
|
|
Python-Version: 3.3
|
|
Post-History:
|
|
Resolution: TBD
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Data serialized using the pickle module must be portable across Python
|
|
versions. It should also support the latest language features as well
|
|
as implementation-specific features. For this reason, the pickle
|
|
module knows about several protocols (currently numbered from 0 to 3),
|
|
each of which appeared in a different Python version. Using a
|
|
low-numbered protocol version allows to exchange data with old Python
|
|
versions, while using a high-numbered protocol allows access to newer
|
|
features and sometimes more efficient resource use (both CPU time
|
|
required for (de)serializing, and disk size / network bandwidth
|
|
required for data transfer).
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The latest current protocol, coincidentally named protocol 3, appeared
|
|
with Python 3.0 and supports the new incompatible features in the
|
|
language (mainly, unicode strings by default and the new bytes
|
|
object). The opportunity was not taken at the time to improve the
|
|
protocol in other ways.
|
|
|
|
This PEP is an attempt to foster a number of small incremental
|
|
improvements in a future new protocol version. The PEP process is
|
|
used in order to gather as many improvements as possible, because the
|
|
introduction of a new protocol version should be a rare occurrence.
|
|
|
|
|
|
Improvements in discussion
|
|
==========================
|
|
|
|
64-bit compatibility for large objects
|
|
--------------------------------------
|
|
|
|
Current protocol versions export object sizes for various built-in
|
|
types (str, bytes) as 32-bit ints. This forbids serialization of
|
|
large data [1]_. New opcodes are required to support very large bytes
|
|
and str objects.
|
|
|
|
Native opcodes for sets and frozensets
|
|
--------------------------------------
|
|
|
|
Many common built-in types (such as str, bytes, dict, list, tuple)
|
|
have dedicated opcodes to improve resource consumption when
|
|
serializing and deserializing them; however, sets and frozensets
|
|
don't. Adding such opcodes would be an obvious improvement. Also,
|
|
dedicated set support could help remove the current impossibility of
|
|
pickling self-referential sets [2]_.
|
|
|
|
Calling __new__ with keyword arguments
|
|
--------------------------------------
|
|
|
|
Currently, classes whose __new__ mandates the use of keyword-only
|
|
arguments can not be pickled (or, rather, unpickled) [3]_. Both a new
|
|
special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
|
|
are needed.
|
|
|
|
Serializing more callable objects
|
|
---------------------------------
|
|
|
|
Currently, only module-global functions are serializable.
|
|
Multiprocessing has custom support for pickling other callables such
|
|
as bound methods [4]_. This support could be folded in the protocol,
|
|
and made more efficient through a new GETATTR opcode.
|
|
|
|
Serializing "pseudo-global" objects
|
|
-----------------------------------
|
|
|
|
Objects which are not module-global, but should be treated in a
|
|
similar fashion -- such as unbound methods [5]_ or nested classes --
|
|
cannot currently be pickled (or, rather, unpickled) because the pickle
|
|
protocol does not correctly specify how to retrieve them. One
|
|
solution would be through the adjunction of a ``__namespace__`` (or
|
|
``__qualname__``) to all class and function objects, specifying the
|
|
full "path" by which they can be retrieved. For globals, this would
|
|
generally be ``"{}.{}".format(obj.__module__, obj.__name__)``. Then a
|
|
new opcode can resolve that path and push the object on the stack,
|
|
similarly to the GLOBAL opcode.
|
|
|
|
Binary encoding for all opcodes
|
|
-------------------------------
|
|
|
|
The GLOBAL opcode, which is still used in protocol 3, uses the
|
|
so-called "text" mode of the pickle protocol, which involves looking
|
|
for newlines in the pickle stream. Looking for newlines is difficult
|
|
to optimize on a non-seekable stream, and therefore a new version of
|
|
GLOBAL (BINGLOBAL?) could use a binary encoding instead.
|
|
|
|
It seems that all other opcodes emitted when using protocol 3 already
|
|
use binary encoding.
|
|
|
|
Better string encoding
|
|
----------------------
|
|
|
|
Short str objects currently have their length coded as a 4-bytes
|
|
integer, which is wasteful. A specific opcode with a 1-byte length
|
|
would make many pickles smaller.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
(...)
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] "pickle not 64-bit ready":
|
|
http://bugs.python.org/issue11564
|
|
|
|
.. [2] "Cannot pickle self-referencing sets":
|
|
http://bugs.python.org/issue9269
|
|
|
|
.. [3] "pickle/copyreg doesn't support keyword only arguments in __new__":
|
|
http://bugs.python.org/issue4727
|
|
|
|
.. [4] Lib/multiprocessing/forking.py:
|
|
http://hg.python.org/cpython/file/baea9f5f973c/Lib/multiprocessing/forking.py#l54
|
|
|
|
.. [5] "pickle should support methods":
|
|
http://bugs.python.org/issue9276
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|