Pickle 4 changes:

- add framing
- change BINGLOBAL to Alexandre Vassalotti's GLOBAL_STACK

parent 2d013833a9
commit b26857967d

pep-3154.txt

@@ -42,11 +42,67 @@ used in order to gather as many improvements as possible, because the
introduction of a new protocol version should be a rare occurrence.


Proposed changes
================

Framing
-------

Traditionally, when unpickling an object from a stream (by calling
``load()`` rather than ``loads()``), many small ``read()`` calls can be
issued on the file-like object, with a potentially huge performance
impact.

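
To make the cost concrete, the sketch below counts how many ``read()``, ``readinto()`` and ``readline()`` calls ``pickle.load()`` makes on a minimal file-like object. It runs against CPython's pickle module, where protocol 4 eventually shipped (in 3.4) with framing signalled by a FRAME opcode rather than the exact header-then-frames layout described in this section. Protocol 2 has no framing at all. ``CountingReader`` and ``count_calls`` are hypothetical helpers, and the exact counts are an implementation detail, so only their relative size is meaningful.

```python
import io
import pickle


class CountingReader:
    """Hypothetical file-like wrapper that counts calls made by the unpickler."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = 0

    def read(self, n=-1):
        self.calls += 1
        return self._buf.read(n)

    def readinto(self, b):
        self.calls += 1
        return self._buf.readinto(b)

    def readline(self):
        self.calls += 1
        return self._buf.readline()


def count_calls(obj, protocol):
    reader = CountingReader(pickle.dumps(obj, protocol=protocol))
    assert pickle.load(reader) == obj
    return reader.calls


payload = list(range(1000))
unframed = count_calls(payload, protocol=2)  # protocol 2: no framing
framed = count_calls(payload, protocol=4)    # CPython's protocol 4: framed
print(unframed, framed)
```

Without framing, the unpickler has to go back to the file for nearly every opcode; with framing, it fetches a frame once and decodes the opcodes from memory.
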
Protocol 4, by contrast, features binary framing. The general structure
of a pickle is thus the following::

    +------+------+
    | 0x80 | 0x04 |             protocol header (2 bytes)
    +------+------+-----------+
    | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
    +------+------------------+
    | .... |                    first frame contents (N bytes)
    +------+------+-----------+
    | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
    +------+------------------+
    | .... |                    second frame contents (N bytes)
    +------+
    etc.

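
The layout above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: it assumes the header is the PROTO opcode followed by the protocol number 4, that frame sizes are unsigned 64-bit little-endian integers, and that frames follow one another until the end of the data. ``join_frames`` and ``split_frames`` are hypothetical names.

```python
import struct

HEADER = b"\x80\x04"           # assumed: PROTO opcode + protocol number 4
LENGTH = struct.Struct("<Q")   # 8-byte unsigned little-endian frame size


def join_frames(frames):
    """Build a pickle from already-encoded frame payloads."""
    out = bytearray(HEADER)
    for payload in frames:
        out += LENGTH.pack(len(payload))
        out += payload
    return bytes(out)


def split_frames(data):
    """Return the list of frame payloads contained in *data*."""
    if data[:2] != HEADER:
        raise ValueError("missing protocol 4 header")
    frames = []
    pos = len(HEADER)
    while pos < len(data):
        if pos + LENGTH.size > len(data):
            raise ValueError("truncated frame size")
        (size,) = LENGTH.unpack_from(data, pos)
        pos += LENGTH.size
        if pos + size > len(data):
            raise ValueError("truncated frame contents")
        frames.append(data[pos:pos + size])
        pos += size
    return frames


blob = join_frames([b"first", b"second"])
assert split_frames(blob) == [b"first", b"second"]
```

Because every frame announces its size up front, a streaming reader can fetch each frame with a single ``read()`` and decode the opcodes inside it from memory.
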
To keep the implementation simple, it is forbidden for a pickle opcode
to overlap frame boundaries. The pickler takes care not to produce such
pickles, and the unpickler refuses them.

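
On the unpickler side, the rule can be enforced by reading one whole frame at a time and decoding opcodes from memory. In the hypothetical ``FramedInput`` class below, a new frame may only be fetched when the next opcode starts, and an opcode argument that would run past the end of the current frame is rejected. The sketch assumes the stream is positioned just after the 2-byte protocol header and uses BININT1 (``K``, one argument byte) as a sample opcode.

```python
import io
import struct

LENGTH = struct.Struct("<Q")  # 8-byte little-endian frame size


class FramedInput:
    """Hypothetical unpickler input layer that works frame by frame."""

    def __init__(self, stream):
        self._stream = stream   # positioned just after the protocol header
        self._frame = b""
        self._pos = 0

    def _next_frame(self):
        raw = self._stream.read(LENGTH.size)
        if len(raw) != LENGTH.size:
            raise EOFError("no more frames")
        (size,) = LENGTH.unpack(raw)
        self._frame = self._stream.read(size)   # one read() per frame
        if len(self._frame) != size:
            raise EOFError("truncated frame")
        self._pos = 0

    def _take(self, n):
        end = self._pos + n
        if end > len(self._frame):
            raise ValueError("opcode overlaps a frame boundary")
        chunk = self._frame[self._pos:end]
        self._pos = end
        return chunk

    def read_opcode(self):
        # Moving to the next frame is only allowed between opcodes.
        while self._pos == len(self._frame):
            self._next_frame()
        return self._take(1)

    def read_arg(self, n):
        # Arguments must come from the same frame as their opcode.
        return self._take(n)


def frame(payload):
    return LENGTH.pack(len(payload)) + payload


# Two frames, each holding one complete BININT1 opcode ("K" + 1 byte).
good = FramedInput(io.BytesIO(frame(b"K\x01") + frame(b"K\x02")))
values = []
for _ in range(2):
    assert good.read_opcode() == b"K"
    values.append(good.read_arg(1)[0])
print(values)  # [1, 2]

# The opcode byte ends one frame and its argument starts the next: refused.
bad = FramedInput(io.BytesIO(frame(b"K") + frame(b"\x01")))
bad.read_opcode()
try:
    bad.read_arg(1)
except ValueError as exc:
    print(exc)  # opcode overlaps a frame boundary
```
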
How the pickler decides frame sizes is an implementation detail.
A simple heuristic, committing the current frame as soon as it reaches
64 KiB, seems sufficient.

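
The 64 KiB heuristic could look like the following pickler-side sketch. ``FramingOutput`` and ``write_opcode`` are hypothetical. The key assumption is that the pickler hands over each opcode together with its arguments in a single call, so committing only between calls is enough to keep opcodes from straddling frames.

```python
import io
import struct

FRAME_TARGET = 64 * 1024       # commit threshold from the heuristic above
LENGTH = struct.Struct("<Q")   # 8-byte little-endian frame size


class FramingOutput:
    """Hypothetical pickler output layer that groups opcodes into frames."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()
        self._stream.write(b"\x80\x04")   # protocol header (PROTO 4)

    def write_opcode(self, op_bytes):
        # op_bytes is one complete opcode with its arguments, so a frame
        # boundary can only ever fall between two opcodes.
        self._buf += op_bytes
        if len(self._buf) >= FRAME_TARGET:
            self.commit()

    def commit(self):
        # Emit the pending frame (size prefix, then contents), if any.
        if self._buf:
            self._stream.write(LENGTH.pack(len(self._buf)))
            self._stream.write(self._buf)
            self._buf.clear()


out = io.BytesIO()
writer = FramingOutput(out)
for i in range(40000):
    writer.write_opcode(b"K" + bytes([i % 256]))   # BININT1: 2 bytes each
writer.commit()                                    # flush the final partial frame
```

With 40,000 two-byte opcodes, the first frame is committed at exactly 65,536 bytes, and the remaining 14,464 bytes form a second frame when ``commit()`` is called at the end. A frame can overshoot the threshold by at most the size of the opcode that crossed it.
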
Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the
so-called "text" mode of the pickle protocol, which involves looking
for newlines in the pickle stream. It also complicates the implementation
of binary framing.

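
The text mode is easy to observe with CPython's own pickle module. At protocol 3, a reference to a global such as the builtin ``len`` is written as the GLOBAL opcode ``c`` followed by the module and the name, each terminated by a newline. Because the operand length is not known in advance, a reader on a non-seekable stream that does no buffering ends up fetching it one byte at a time, as the hypothetical ``read_text_operand`` helper below does.

```python
import io
import pickle

# Protocol 3 still emits the text-mode GLOBAL opcode ("c"): module and
# name are each terminated by a newline rather than length-prefixed.
data = pickle.dumps(len, protocol=3)
assert b"cbuiltins\nlen\n" in data


def read_text_operand(stream):
    """Read one newline-terminated operand, issuing one read() per byte."""
    chunks = []
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated GLOBAL operand")
        if b == b"\n":
            return b"".join(chunks).decode("utf-8")
        chunks.append(b)


stream = io.BytesIO(data)
stream.read(2)                  # skip PROTO opcode and version byte
assert stream.read(1) == b"c"   # GLOBAL opcode
module = read_text_operand(stream)
name = read_text_operand(stream)
print(module, name)  # builtins len
```
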
Protocol 4 forbids use of the GLOBAL opcode and replaces it with
GLOBAL_STACK, a new opcode which takes its operand from the stack.

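
A stack-based opcode only needs its arguments to have been pushed by earlier, length-prefixed opcodes, so no newline search is involved. The toy interpreter below handles four opcodes. It borrows the byte values CPython eventually assigned in protocol 4, where the opcode drafted here as GLOBAL_STACK shipped as STACK_GLOBAL (``\x93``). ``toy_load`` and ``short_str`` are hypothetical helpers, and only simple (undotted) names are resolved.

```python
import importlib
import io
import pickle
import struct

# Byte values as assigned in CPython's final protocol 4.
PROTO = b"\x80"
SHORT_BINUNICODE = b"\x8c"   # 1-byte length prefix, then UTF-8 text
STACK_GLOBAL = b"\x93"       # the draft's GLOBAL_STACK
STOP = b"."


def toy_load(data):
    """Interpret a tiny subset of protocol 4 opcodes with a value stack."""
    stream = io.BytesIO(data)
    stack = []
    while True:
        op = stream.read(1)
        if op == PROTO:
            stream.read(1)                    # protocol version, ignored here
        elif op == SHORT_BINUNICODE:
            (size,) = struct.unpack("<B", stream.read(1))
            stack.append(stream.read(size).decode("utf-8"))
        elif op == STACK_GLOBAL:
            name = stack.pop()                # pushed last
            module = stack.pop()
            stack.append(getattr(importlib.import_module(module), name))
        elif op == STOP:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op!r}")


def short_str(s):
    raw = s.encode("utf-8")                   # assumes fewer than 256 bytes
    return SHORT_BINUNICODE + bytes([len(raw)]) + raw


data = PROTO + b"\x04" + short_str("builtins") + short_str("len") + STACK_GLOBAL + STOP
assert toy_load(data) is len
assert pickle.loads(data) is len   # the real unpickler accepts it too
```

Since the bytes follow the final protocol 4 encoding, the standard ``pickle.loads`` accepts the same input.
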
Serializing more "lookupable" objects
-------------------------------------

By default, pickle is only able to serialize module-global functions and
classes. Supporting other kinds of objects, such as unbound methods [4]_,
is a common request. In fact, support for some of them, such as bound
methods, is already implemented outside the pickle module, in
multiprocessing [5]_.

The ``__qualname__`` attribute from :pep:`3155` makes it possible to
look up many more objects by name. Making the GLOBAL_STACK opcode accept
dot-separated names, or adding a special GETATTR opcode, would allow the
standard pickle implementation to support all those kinds of objects.

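
Resolving a dotted ``__qualname__`` amounts to a chain of ``getattr()`` calls starting from the module, which is roughly what either proposed design would have to perform at unpickling time. The sketch below assumes the object is reachable from its module through ordinary attribute access (qualified names containing ``<locals>`` are not). ``resolve_qualname`` and the ``demo`` module object are hypothetical.

```python
import functools
import types


def resolve_qualname(module, qualname):
    """Fetch each dot-separated component of *qualname* with getattr()."""
    return functools.reduce(getattr, qualname.split("."), module)


class Outer:
    class Inner:
        def method(self):
            return "called"


# Hypothetical module object standing in for the defining module.
demo = types.ModuleType("demo")
demo.Outer = Outer

func = Outer.Inner.method
assert resolve_qualname(demo, func.__qualname__) is func
```
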
64-bit opcodes for large objects
--------------------------------

Current protocol versions encode object sizes for various built-in
types (str, bytes) as 32-bit ints. This forbids serialization of

@@ -71,33 +127,6 @@ arguments can not be pickled (or, rather, unpickled) [3]_. Both a new
special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
are needed.

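
For illustration, here is how the two pieces fit together under the names CPython eventually adopted in 3.4: ``__getnewargs_ex__`` returns an ``(args, kwargs)`` pair, and protocol 4 records it with the NEWOBJ_EX opcode. ``Point`` is a hypothetical class whose ``__new__`` only accepts keyword arguments, so a positional-only ``__getnewargs__`` could not describe how to recreate it.

```python
import pickle


class Point:
    """Hypothetical class whose constructor takes keyword-only arguments."""

    def __new__(cls, *, x, y):
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def __getnewargs_ex__(self):
        # (positional args, keyword args) passed to __new__ on unpickling
        return (), {"x": self.x, "y": self.y}


p = pickle.loads(pickle.dumps(Point(x=1, y=2), protocol=4))
print(p.x, p.y)  # 1 2
```
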
Better string encoding
----------------------