diff --git a/pep-3154.txt b/pep-3154.txt
index 4c29c0a38..a13f8ec14 100644
--- a/pep-3154.txt
+++ b/pep-3154.txt
@@ -58,15 +58,19 @@ Protocol 4, by contrast, features binary framing. The general structure
 of a pickle is thus the following::
 
    +------+------+
-   | 0x80 | 0x04 |             protocol header (2 bytes)
+   | 0x80 | 0x04 |     protocol header (2 bytes)
+   +------+------+
+   |  OP  |            FRAME opcode (1 byte)
    +------+------+-----------+
    | MM MM MM MM MM MM MM MM | frame size (8 bytes, little-endian)
    +------+------------------+
-   | .... |                    first frame contents (M bytes)
+   | .... |            first frame contents (M bytes)
+   +------+
+   |  OP  |            FRAME opcode (1 byte)
    +------+------+-----------+
    | NN NN NN NN NN NN NN NN | frame size (8 bytes, little-endian)
    +------+------------------+
-   | .... |                    second frame contents (N bytes)
+   | .... |            second frame contents (N bytes)
    +------+
      etc.
 
@@ -142,6 +146,16 @@ Short str objects currently have their length coded as a 4-bytes integer,
 which is wasteful. A specific opcode with a 1-byte length would make
 many pickles smaller.
 
+Smaller memoization
+-------------------
+
+The PUT opcodes all require an explicit index to select in which entry
+of the memo dictionary the top-of-stack is memoized. However, in practice
+those numbers are allocated in sequential order. A new opcode, MEMOIZE,
+will instead store the top-of-stack in at the index equal to the current
+size of the memo dictionary. This allows for shorter pickles, since PUT
+opcodes are emitted for all non-atomic datatypes.
+
 
 Summary of new opcodes
 ======================
@@ -149,6 +163,9 @@ Summary of new opcodes
 These reflect the state of the proposed implementation (thanks mostly
 to Alexandre Vassalotti's work):
 
+* ``FRAME``: introduce a new frame (followed by the 8-byte frame size
+  and the frame contents).
+
 * ``SHORT_BINUNICODE``: push a utf8-encoded str object with a one-byte
   size prefix (therefore less than 256 bytes long).
 
@@ -178,6 +195,9 @@ to Alexandre Vassalotti's work):
   ``qualname``, and push the result of looking up the dotted ``qualname``
   in the module named ``module_name``.
 
+* ``MEMOIZE``: store the top-of-stack object in the memo dictionary with
+  an index equal to the current size of the memo dictionary.
+
 
 Alternative ideas
 =================
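
The layout and opcodes described in this patch can be checked against a real pickle. The following is a minimal sketch, not part of the patch, assuming a CPython whose standard `pickle` module implements protocol 4 as described here. The helper name `describe_protocol4` and the sample list are hypothetical. It also assumes the object is large enough to be framed, since CPython appears to skip the FRAME header for very small pickles.

```python
import pickle
import pickletools
import struct


def describe_protocol4(obj):
    """Return (data, frame_size, opcode_names) for a protocol 4 pickle of obj.

    Hypothetical helper for illustration; assumes the pickle holds one frame.
    """
    data = pickle.dumps(obj, protocol=4)
    # Protocol header: PROTO opcode (0x80) followed by the version byte (0x04).
    if data[:2] != b"\x80\x04":
        raise ValueError("not a protocol 4 pickle")
    # FRAME opcode (1 byte) comes right after the 2-byte header.
    if data[2:3] != pickle.FRAME:
        raise ValueError("pickle is not framed (object may be too small)")
    # Frame size: 8 bytes, little-endian, unsigned.
    (frame_size,) = struct.unpack("<Q", data[3:11])
    # Walk the opcode stream to see which opcodes were emitted.
    names = [op.name for op, arg, pos in pickletools.genops(data)]
    return data, frame_size, names


data, frame_size, names = describe_protocol4(["spam", "eggs"])
# For a single-frame pickle, the frame covers everything after the
# 2-byte header, the 1-byte FRAME opcode and the 8-byte size field.
print(frame_size, len(data) - 11)
print(names)
```

In this sketch the opcode list should contain `MEMOIZE` entries in place of the indexed PUT opcodes, and `SHORT_BINUNICODE` for the two short strings.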