PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD


Abstract
========

Data serialized using the pickle module must be portable across Python
versions.  It should also support the latest language features as well as
implementation-specific features.  For this reason, the pickle module knows
about several protocols (currently numbered from 0 to 3), each of which
appeared in a different Python version.  Using a low-numbered protocol
version allows data to be exchanged with old Python versions, while using a
high-numbered protocol allows access to newer features and sometimes more
efficient resource use (both CPU time required for (de)serializing, and
disk size / network bandwidth required for data transfer).


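As a minimal sketch of how a protocol is chosen in practice, the snippet
below pickles one object with each of the protocols 0 to 3 and loads every
stream back.  The ``record`` payload is a hypothetical example, not
something this PEP defines; the reading side needs no protocol argument
because ``pickle.loads`` detects the protocol from the stream.

```python
import pickle

# Hypothetical sample payload, used only for illustration.
record = {"name": "spam", "values": [1, 2, 3], "blob": b"\x00\x01"}

# Serialize the same object with each protocol named in the text (0 to 3).
streams = {proto: pickle.dumps(record, protocol=proto) for proto in range(4)}

# Every stream loads back to an equal object, whatever protocol wrote it.
restored = {proto: pickle.loads(data) for proto, data in streams.items()}

# The serialized size usually differs from one protocol to the next.
sizes = {proto: len(data) for proto, data in streams.items()}
```

Comparing the entries of ``sizes`` gives a rough feel for the trade-off
between compatibility and compactness on a given payload.
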
Rationale
=========

The most recent protocol, coincidentally named protocol 3, appeared with
Python 3.0 and supports the new incompatible features in the language
(mainly, unicode strings by default and the new bytes object).  The
opportunity was not taken at the time to improve the protocol in other ways.

This PEP is an attempt to foster a number of small incremental improvements
in a future new protocol version.  The PEP process is used in order to gather
as many improvements as possible, because the introduction of a new protocol
version should be a rare occurrence.


Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions encode object sizes for various built-in types
(str, bytes) as 32-bit ints.  This makes it impossible to serialize objects
whose size does not fit in 32 bits [1]_.  New opcodes are required to
support very large bytes and str objects.

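The 32-bit size field can be observed directly in a protocol 3 stream.
The following is only an illustrative sketch, assuming a recent CPython
whose ``pickletools`` module reports the ``BINBYTES`` opcode; the
300-byte ``payload`` is an arbitrary choice that is just long enough to
require the 4-byte length form.

```python
import pickle
import pickletools
import struct

payload = b"x" * 300  # arbitrary size, too long for the 1-byte length form
data = pickle.dumps(payload, protocol=3)

# Locate the opcode that carries the bytes object in the stream.
pos = next(p for op, arg, p in pickletools.genops(data)
           if op.name == "BINBYTES")

# The four bytes after the opcode hold the length (little-endian).
(length,) = struct.unpack("<I", data[pos + 1:pos + 5])

# Largest length that a 4-byte unsigned field can express.
max_length = 2**32 - 1
```

Any object longer than ``max_length`` bytes has no representation in this
field, which is why the text above calls for new opcodes.
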
Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't.  Adding such opcodes
would be an obvious improvement.  Also, dedicated set support could help
remove the current impossibility of pickling self-referential sets
[2]_.

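The difference is visible in the opcodes that protocol 3 emits.  The
sketch below, which assumes a recent CPython, lists the opcode names used
for a list and for a set; ``opcode_names`` is a hypothetical helper written
for this illustration, not part of the pickle API.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes used to pickle obj."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


# A list is built with its dedicated opcodes (EMPTY_LIST, APPENDS).
list_ops = opcode_names([1, 2, 3])

# A set has none, so protocol 3 looks up the set type with GLOBAL and
# calls it on a list of items with REDUCE.
set_ops = opcode_names({1, 2, 3})
```

Here ``set_ops`` contains everything needed to build a list, plus the
extra lookup and call required to turn that list into a set.
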
Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream.  Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?)
could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already use
binary encoding.


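To make the "text" mode concrete, the following sketch (assuming a recent
CPython) pickles a reference to the built-in ``len`` with protocol 3 and
splits out the two newline-terminated fields that follow the GLOBAL
opcode.  The choice of ``len`` is arbitrary; any object pickled by
reference would do.

```python
import pickle
import pickletools

# Pickling a built-in function by reference emits GLOBAL under protocol 3.
data = pickle.dumps(len, protocol=3)

# Find where the GLOBAL opcode starts in the stream.
pos = next(p for op, arg, p in pickletools.genops(data)
           if op.name == "GLOBAL")

# After the one-byte opcode come two newline-terminated text fields, so a
# reader has to scan for b"\n" to find where each field ends.
module_name, qualified_name, rest = data[pos + 1:].split(b"\n", 2)
```

Because neither field is preceded by its length, a reader cannot know in
advance how many bytes to request from the stream.
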
Acknowledgments
===============

(...)


References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: