Add draft for PEP 3154, "Pickle protocol version 4"

Antoine Pitrou 2011-08-11 20:10:41 +02:00
parent cb20624396
commit b1b6360f57
1 changed files with 107 additions and 0 deletions

pep-3154.txt Normal file

@ -0,0 +1,107 @@
PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD

Abstract
========

Data serialized using the pickle module must be portable across Python
versions. It should also support the latest language features as well as
implementation-specific features. For this reason, the pickle module knows
about several protocols (currently numbered from 0 to 3), each of which
appeared in a different Python version. Using a low-numbered protocol
version allows exchanging data with old Python versions, while using a
high-numbered protocol allows access to newer features and sometimes more
efficient resource use (both CPU time required for (de)serializing, and
disk size / network bandwidth required for data transfer).
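
As a rough illustration of that trade-off, the sketch below serializes one
object with each of the protocols 0 to 3 and records the resulting sizes.
The ``payload`` value is an arbitrary example, not something taken from
this PEP, and the exact sizes may vary between Python versions.

```python
import pickle

# Arbitrary sample object, chosen only for illustration.
payload = {"name": "spam", "values": [1, 2, 3], "blob": b"\x00\x01"}

# Serialize the same object with each protocol from 0 to 3 and check
# that every pickle loads back to an equal object.
sizes = {}
for protocol in range(4):
    data = pickle.dumps(payload, protocol=protocol)
    assert pickle.loads(data) == payload
    sizes[protocol] = len(data)

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```

Passing ``protocol`` explicitly keeps the output readable by older
interpreters, at the cost of the more compact encodings.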

Rationale
=========

The most recent protocol, coincidentally named protocol 3, appeared with
Python 3.0 and supports the new incompatible features in the language
(mainly, unicode strings by default and the new bytes object). The
opportunity was not taken at the time to improve the protocol in other ways.
This PEP is an attempt to foster a number of small incremental improvements
in a future new protocol version. The PEP process is used in order to gather
as many improvements as possible, because the introduction of a new protocol
version should be a rare occurrence.

Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions encode object sizes for various built-in types
(str, bytes) as 32-bit ints. This prevents serialization of large data [1]_.
New opcodes are required to support very large bytes and str objects.
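
The size field can be observed directly. The following sketch, which
assumes the standard ``pickletools`` module, pickles a 300-byte ``bytes``
object with protocol 3 and decodes the four length bytes that follow the
opcode. A field of that width cannot describe a payload of 2**32 bytes or
more.

```python
import pickle
import pickletools
import struct

data = pickle.dumps(b"x" * 300, protocol=3)

# Locate the opcode that carries the bytes payload.
opcode, arg, pos = next(
    item for item in pickletools.genops(data) if item[0].name == "BINBYTES"
)

# The four bytes after the opcode hold the payload length, little-endian.
(length,) = struct.unpack("<I", data[pos + 1 : pos + 5])
print(opcode.name, length)

# Number of distinct values a 32-bit length field can represent.
field_capacity = 2 ** 32
```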

Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't. Adding such opcodes
would be an obvious improvement. Also, dedicated set support could help
remove the current impossibility of pickling self-referential sets
[2]_.
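
The difference shows up in the opcode stream. In the sketch below, the
helper ``opcode_names`` is a hypothetical name introduced for this
example. It lists the opcodes emitted under protocol 3 for a list and for
a set: the list gets dedicated opcodes, while the set goes through the
generic global lookup and call mechanism.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes used to pickle obj."""
    data = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


list_ops = opcode_names([1, 2])
set_ops = opcode_names({1, 2})

print("list:", list_ops)
print("set: ", set_ops)
```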

Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream. Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?)
could use a binary encoding instead.
It seems that all other opcodes emitted when using protocol 3 already use
binary encoding.
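
The newline-delimited form of GLOBAL is easy to see in a small pickle.
This sketch pickles a reference to ``collections.OrderedDict`` (an
arbitrary choice for illustration) with protocol 3 and recovers the
module and attribute names by scanning for the two newlines, which is the
kind of search a reader of the stream has to perform.

```python
import collections
import pickle

data = pickle.dumps(collections.OrderedDict, protocol=3)
print(data)

# The GLOBAL opcode is the single byte b"c", followed by the module name
# and the attribute name, each terminated by a newline.
start = data.index(b"c")
module_end = data.index(b"\n", start)
name_end = data.index(b"\n", module_end + 1)

module = data[start + 1 : module_end].decode("utf-8")
name = data[module_end + 1 : name_end].decode("utf-8")
print(module, name)
```

A length-prefixed encoding would let a reader fetch each name with one
fixed-size read instead of searching for the delimiter.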

Acknowledgments
===============

(...)

References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: