Add draft for PEP 3154, "Pickle protocol version 4"
parent cb20624396
commit b1b6360f57
@ -0,0 +1,107 @@
PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD


Abstract
========

Data serialized using the pickle module must be portable across Python
versions. It should also support the latest language features as well as
implementation-specific features. For this reason, the pickle module knows
about several protocols (currently numbered from 0 to 3), each of which
appeared in a different Python version. Using a low-numbered protocol
version makes it possible to exchange data with old Python versions, while
using a high-numbered protocol gives access to newer features and sometimes
more efficient resource use (both the CPU time required for
(de)serializing, and the disk size / network bandwidth required for data
transfer).


Rationale
=========

The most recent protocol, coincidentally named protocol 3, appeared with
Python 3.0 and supports the new incompatible features in the language
(mainly, Unicode strings by default and the new bytes object). The
opportunity was not taken at the time to improve the protocol in other ways.

This PEP is an attempt to foster a number of small incremental improvements
in a future new protocol version. The PEP process is used in order to gather
as many improvements as possible, because the introduction of a new protocol
version should be a rare occurrence.


Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions export object sizes for various built-in types
(str, bytes) as 32-bit ints. This prevents serialization of very large
data [1]_. New opcodes are required to support very large bytes and str
objects.

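As a rough illustration of this limitation, the sketch below inspects a
protocol 3 pickle of a short str using the standard ``pickletools`` module.
The sample value and the variable names are chosen only for illustration;
the point is that the BINUNICODE opcode is followed by a 4-byte length
field, which cannot represent a length of 2**32 or more.

```python
import pickle
import pickletools
import struct

# Pickle a short str with protocol 3 (sample value chosen for illustration).
data = pickle.dumps("abc", protocol=3)

# Collect the names of the opcodes found in the stream.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]

# After the 2-byte PROTO header, the BINUNICODE opcode byte is followed by
# a 4-byte little-endian length, then the UTF-8 payload.
length_field = data[3:7]
decoded_length = struct.unpack("<I", length_field)[0]

# A length of 2**32 does not fit into such a 32-bit field.
try:
    struct.pack("<I", 2**32)
    fits_in_32_bits = True
except struct.error:
    fits_in_32_bits = False
```

Here ``struct`` only stands in for the length encoding; it is not how the
pickle module itself is implemented.
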
Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't. Adding such opcodes
would be an obvious improvement. Also, dedicated set support could help
remove the current impossibility of pickling self-referential sets [2]_.

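The difference can be observed with ``pickletools``. The following sketch
(the helper name ``opcode_names`` is made up for this example) compares the
opcodes emitted under protocol 3 for a list and for a set: the list uses a
dedicated opcode, while the set goes through the generic GLOBAL and REDUCE
mechanism.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes used to pickle obj (illustrative helper)."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


# A list is built from a dedicated EMPTY_LIST opcode.
list_ops = opcode_names([1, 2])

# A set has no dedicated opcode in protocol 3: the stream looks up the
# set type with GLOBAL and calls it with REDUCE.
set_ops = opcode_names({1, 2})
```
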
Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream. Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?)
could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already use
binary encoding.

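To make the "text" mode concrete, the sketch below pickles a built-in type
with protocol 3 and looks at the raw bytes. The choice of ``int`` as the
sample object and the variable names are assumptions made for this example;
the GLOBAL opcode byte is followed by the module name and the attribute
name, each terminated by a newline.

```python
import pickle
import pickletools

# Pickling a class by reference emits a GLOBAL opcode under protocol 3.
data = pickle.dumps(int, protocol=3)

# pickletools decodes the two newline-terminated strings into a single
# "module name" argument.
global_args = [
    arg for op, arg, pos in pickletools.genops(data) if op.name == "GLOBAL"
]

# In the raw stream, the opcode byte b"c" is followed by the two text
# fields, so a reader has to scan for b"\n" to find where each one ends.
has_text_global = b"cbuiltins\nint\n" in data
```

A length-prefixed encoding would let a reader know in advance how many
bytes to consume instead of scanning for the terminator.

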
Acknowledgments
===============

(...)


References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: