Pickle 4 changes:
- add framing
- change BINGLOBAL to Alexandre Vassalotti's GLOBAL_STACK
parent 2d013833a9
commit b26857967d
91
pep-3154.txt
@@ -42,11 +42,67 @@ used in order to gather as many improvements as possible, because the
 introduction of a new protocol version should be a rare occurrence.
 
 
-Improvements in discussion
-==========================
+Proposed changes
+================
 
-64-bit compatibility for large objects
---------------------------------------
+Framing
+-------
+
+Traditionally, when unpickling an object from a stream (by calling
+``load()`` rather than ``loads()``), many small ``read()``
+calls can be issued on the file-like object, with a potentially huge
+performance impact.
+
+Protocol 4, by contrast, features binary framing. The general structure
+of a pickle is thus the following::
+
+   +------+------+
+   | 0x80 | 0x03 |     protocol header (2 bytes)
+   +------+------+-----------+
+   | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
+   +------+------------------+
+   | .... |            first frame contents (N bytes)
+   +------+------+-----------+
+   | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
+   +------+------------------+
+   | .... |            second frame contents (N bytes)
+   +------+
+     etc.
+
+To keep the implementation simple, it is forbidden for a pickle opcode
+to overlap frame boundaries. The pickler takes care not to produce such
+pickles, and the unpickler refuses them.
+
+How the pickler decides frame sizes is an implementation detail.
+A simple heuristic committing the current frame as soon as it reaches
+64 KiB seems sufficient.
+
+Binary encoding for all opcodes
+-------------------------------
+
+The GLOBAL opcode, which is still used in protocol 3, uses the
+so-called "text" mode of the pickle protocol, which involves looking
+for newlines in the pickle stream. It also complicates the implementation
+of binary framing.
+
+Protocol 4 forbids use of the GLOBAL opcode and replaces it with
+GLOBAL_STACK, a new opcode which takes its operand from the stack.
+
+Serializing more "lookupable" objects
+-------------------------------------
+
+By default, pickle is only able to serialize module-global functions and
+classes. Supporting other kinds of objects, such as unbound methods [4]_,
+is a common request. Actually, third-party support for some of them, such
+as bound methods, is implemented in the multiprocessing module [5]_.
+
+The ``__qualname__`` attribute from :pep:`3155` makes it possible to
+lookup many more objects by name. Making the GLOBAL_STACK opcode accept
+dot-separated names, or adding a special GETATTR opcode, would allow the
+standard pickle implementation to support all those kinds of objects.
+
+64-bit opcodes for large objects
+--------------------------------
 
 Current protocol versions export object sizes for various built-in
 types (str, bytes) as 32-bit ints. This forbids serialization of
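Editorial sketch, not part of the commit: the framing layout added in the hunk above can be illustrated with a few lines of Python. The helper names ``write_frames`` and ``read_frames`` are hypothetical and are not part of the ``pickle`` module. Each input chunk stands in for one already-encoded opcode, so committing a frame only between chunks keeps opcodes from straddling frame boundaries. The sketch writes 0x04 as the protocol number, on the assumption that a protocol 4 pickle announces itself as such (the diagram in the diff shows 0x03), and it assumes a stream whose ``read(n)`` returns fewer than ``n`` bytes only at end of file.

```python
import io
import struct

PROTO_HEADER = bytes([0x80, 0x04])  # assumption: PROTO opcode followed by protocol number 4
FRAME_SIZE_TARGET = 64 * 1024       # the 64 KiB heuristic mentioned in the text


def write_frames(chunks, target=FRAME_SIZE_TARGET):
    """Pack opcode byte strings into size-prefixed frames (hypothetical helper).

    Every chunk is treated as one indivisible opcode, so a frame is only
    committed between chunks and no opcode overlaps a frame boundary.
    """
    out = io.BytesIO()
    out.write(PROTO_HEADER)
    frame = bytearray()
    for chunk in chunks:
        frame += chunk
        if len(frame) >= target:
            # Frame size: 8 bytes, little-endian, then the frame contents.
            out.write(struct.pack("<Q", len(frame)))
            out.write(frame)
            frame = bytearray()
    if frame:
        out.write(struct.pack("<Q", len(frame)))
        out.write(frame)
    return out.getvalue()


def read_frames(stream):
    """Yield frame contents using two read() calls per frame (hypothetical helper)."""
    if stream.read(2) != PROTO_HEADER:
        raise ValueError("unexpected protocol header")
    while True:
        size_bytes = stream.read(8)
        if not size_bytes:
            return  # clean end of stream
        if len(size_bytes) != 8:
            raise ValueError("truncated frame size")
        (size,) = struct.unpack("<Q", size_bytes)
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated frame contents")
        yield payload


# Usage: a small target makes the frame boundaries easy to see.
data = write_frames([b"a" * 10, b"b" * 10, b"c" * 5], target=16)
frames = list(read_frames(io.BytesIO(data)))
```

With framing, the reader issues a bounded number of ``read()`` calls per frame instead of one or more per opcode, which is the performance concern the Framing section raises.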
@@ -71,33 +127,6 @@ arguments can not be pickled (or, rather, unpickled) [3]_. Both a new
 special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
 are needed.
 
-Serializing more "lookupable" objects
--------------------------------------
-
-For some kinds of objects, it only makes sense to serialize them by name
-(for example classes and functions). By default, pickle is only able to
-serialize module-global functions and classes by name. Supporting other
-kinds of objects, such as unbound methods [4]_, is a common request.
-Actually, third-party support for some of them, such as bound methods,
-is implemented in the multiprocessing module [5]_.
-
-:pep:`3155` now makes it possible to lookup many more objects by name.
-Generalizing the GLOBAL opcode to accept dot-separated names, or adding
-a special GETATTR opcode, would allow the standard pickle implementation
-to support, in an efficient way, all those kinds of objects.
-
-Binary encoding for all opcodes
--------------------------------
-
-The GLOBAL opcode, which is still used in protocol 3, uses the
-so-called "text" mode of the pickle protocol, which involves looking
-for newlines in the pickle stream. Looking for newlines is difficult
-to optimize on a non-seekable stream, and therefore a new version of
-GLOBAL (BINGLOBAL?) could use a binary encoding instead.
-
-It seems that all other opcodes emitted when using protocol 3 already
-use binary encoding.
-
 Better string encoding
 ----------------------
 