Add a FRAME opcode for framing, and document Alexandre's new MEMOIZE opcode

Antoine Pitrou 2013-11-21 00:58:38 +01:00
parent 5a6cb6b60f
commit 3bafcf54c5
1 changed files with 23 additions and 3 deletions


@ -58,15 +58,19 @@ Protocol 4, by contrast, features binary framing. The general structure
of a pickle is thus the following::
    +------+------+
    | 0x80 | 0x04 |     protocol header (2 bytes)
    +------+------+
    |  OP  |            FRAME opcode (1 byte)
    +------+------+-----------+
    | MM MM MM MM MM MM MM MM | frame size (8 bytes, little-endian)
    +------+------------------+
    | .... |            first frame contents (M bytes)
    +------+
    |  OP  |            FRAME opcode (1 byte)
    +------+------+-----------+
    | NN NN NN NN NN NN NN NN | frame size (8 bytes, little-endian)
    +------+------------------+
    | .... |            second frame contents (N bytes)
    +------+
      etc.
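The layout above can be walked with a few lines of Python. The following is a minimal sketch, not the reference implementation: ``build_framed`` and ``split_frames`` are hypothetical helper names, and the opcode value ``0x95`` is the one CPython's ``pickle`` module uses for FRAME (the text above does not specify it).

```python
import pickle
import struct

PROTO_HEADER = b"\x80\x04"  # protocol header for protocol 4 (2 bytes)
FRAME_OPCODE = b"\x95"      # assumption: FRAME value used by CPython's pickle


def build_framed(frames):
    """Concatenate raw frame contents into the framed layout (hypothetical helper)."""
    out = bytearray(PROTO_HEADER)
    for contents in frames:
        out += FRAME_OPCODE
        out += struct.pack("<Q", len(contents))  # 8-byte little-endian size
        out += contents
    return bytes(out)


def split_frames(data):
    """Return the contents of each frame in a fully framed protocol 4 pickle."""
    if data[:2] != PROTO_HEADER:
        raise ValueError("not a protocol 4 pickle")
    pos = 2
    frames = []
    while pos < len(data):
        if data[pos:pos + 1] != FRAME_OPCODE:
            raise ValueError("expected FRAME opcode at offset %d" % pos)
        if pos + 9 > len(data):
            raise ValueError("truncated frame header")
        (size,) = struct.unpack_from("<Q", data, pos + 1)
        start = pos + 9
        end = start + size
        if end > len(data):
            raise ValueError("truncated frame contents")
        frames.append(data[start:end])
        pos = end
    return frames


# Round trip on hand-built frames.
hand_built = split_frames(build_framed([b"abc", b""]))

# A small object pickled by CPython fits in a single frame.
real_frames = split_frames(pickle.dumps(list(range(100)), protocol=4))
```

The sketch assumes every byte after the protocol header belongs to a frame. An unpickler that tolerates unframed data would need a fallback path instead of raising.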
@ -142,6 +146,16 @@ Short str objects currently have their length coded as a 4-byte
integer, which is wasteful. A specific opcode with a 1-byte length
would make many pickles smaller.
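To make the saving concrete, here is a small sketch comparing the two encodings. ``encode_str`` is a hypothetical helper, and the opcode bytes are the values CPython's ``pickle`` module uses for BINUNICODE and SHORT_BINUNICODE, which the text above does not state.

```python
import struct

BINUNICODE = b"X"           # assumption: existing opcode, 4-byte length prefix
SHORT_BINUNICODE = b"\x8c"  # assumption: new opcode, 1-byte length prefix


def encode_str(s, allow_short=True):
    """Encode a str with the shortest applicable opcode (hypothetical helper)."""
    payload = s.encode("utf-8")
    if allow_short and len(payload) < 256:
        return SHORT_BINUNICODE + struct.pack("<B", len(payload)) + payload
    return BINUNICODE + struct.pack("<I", len(payload)) + payload


long_form = encode_str("spam", allow_short=False)   # opcode + 4 + 4 bytes
short_form = encode_str("spam")                     # opcode + 1 + 4 bytes
saved = len(long_form) - len(short_form)
```

Each short string saves three bytes, which adds up in pickles dominated by attribute names and dictionary keys.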
Smaller memoization
-------------------
The PUT opcodes all require an explicit index to select the entry of the
memo dictionary in which the top-of-stack object is memoized. In practice,
however, those indices are allocated in sequential order. A new opcode,
MEMOIZE, will instead store the top-of-stack object at the index equal to
the current size of the memo dictionary. This allows for shorter pickles,
since PUT opcodes are emitted for all non-atomic datatypes.
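The equivalence between sequential PUT indices and MEMOIZE can be sketched as follows. This is an illustration under stated assumptions, not the pickler's actual code: ``apply_put``, ``apply_memoize`` and ``put_size`` are hypothetical names, and the byte counts assume the binary PUT variants of earlier protocols (a 1-byte index below 256, a 4-byte index otherwise) against a 1-byte MEMOIZE.

```python
def apply_put(memo, top, index):
    """PUT: store the top-of-stack object under an explicit index."""
    memo[index] = top


def apply_memoize(memo, top):
    """MEMOIZE: store the top-of-stack object at index == current memo size."""
    memo[len(memo)] = top


def put_size(index):
    """Bytes needed by a binary PUT: opcode plus 1-byte or 4-byte index (assumption)."""
    return 2 if index < 256 else 5


MEMOIZE_SIZE = 1  # opcode only, no argument

objects = ["a", ["b"], {"c": 1}]

memo_put = {}
for i, obj in enumerate(objects):
    apply_put(memo_put, obj, i)       # indices allocated sequentially

memo_memoize = {}
for obj in objects:
    apply_memoize(memo_memoize, obj)  # index is implied by the memo size

# Overhead for 1000 memoized objects under each scheme.
put_total = sum(put_size(i) for i in range(1000))
memoize_total = 1000 * MEMOIZE_SIZE
```

Both loops build the same memo dictionary. The difference is only in how many bytes the pickle spends saying so.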
Summary of new opcodes
======================
@ -149,6 +163,9 @@ Summary of new opcodes
These reflect the state of the proposed implementation (thanks mostly
to Alexandre Vassalotti's work):
* ``FRAME``: introduce a new frame (followed by the 8-byte frame size
and the frame contents).
* ``SHORT_BINUNICODE``: push a utf8-encoded str object with a one-byte
size prefix (therefore less than 256 bytes long).
@ -178,6 +195,9 @@ to Alexandre Vassalotti's work):
``qualname``, and push the result of looking up the dotted ``qualname``
in the module named ``module_name``.
* ``MEMOIZE``: store the top-of-stack object in the memo dictionary with
an index equal to the current size of the memo dictionary.
Alternative ideas
=================