Introduce extension codes.

Guido van Rossum 2003-02-04 19:12:25 +00:00
parent eb93e1fcdc
commit 8787ee145b
1 changed files with 76 additions and 0 deletions


@@ -488,6 +488,82 @@ The __newobj__ unpickling function
Python 2.3.
The extension registry
Protocol 2 supports a new mechanism to reduce the size of pickles.
When class instances (classic or new-style) are pickled, the full
name of the class (module name including package name, and class
name) is included in the pickle. Especially for applications that
generate many small pickles, this is a lot of overhead that has to
be repeated in each pickle. For large pickles, when using
protocol 1, repeated references to the same class name are
compressed using the "memo" feature; but each class name must be
spelled in full at least once per pickle, and this causes a lot of
overhead for small pickles.
The extension registry allows one to represent the most frequently
used names by small integers, which are pickled very efficiently:
an extension code in the range 1-255 requires only two bytes
including the opcode, and one in the range 256-65535 requires only
three bytes including the opcode.
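The size difference can be checked with a small sketch. It assumes a
modern Python 3, where the registry is exposed through
copyreg.add_extension() and copyreg.remove_extension(). The classes and
the codes 240 and 300 are chosen only for illustration; 300 falls in a
range this document reserves for future assignment and is used here
solely to show the three-byte form.

```python
import copyreg
import pickle
from collections import Counter
from fractions import Fraction

# A protocol 2 pickle starts with a 2-byte PROTO header and ends with a
# 1-byte STOP opcode; subtracting them leaves the reference itself.
PROTO_AND_STOP = 3


def reference_size(obj):
    """Bytes used to reference a global object in a protocol 2 pickle."""
    return len(pickle.dumps(obj, protocol=2)) - PROTO_AND_STOP


# Without a registered code, the module and class name are spelled out.
full_name_size = reference_size(Fraction)

# Illustrative codes only (assumptions, not standardized assignments).
copyreg.add_extension("fractions", "Fraction", 240)
copyreg.add_extension("collections", "Counter", 300)
try:
    one_byte_code_size = reference_size(Fraction)   # code in 1-255
    two_byte_code_size = reference_size(Counter)    # code in 256-65535
    round_trip = pickle.loads(pickle.dumps(Fraction(1, 3), protocol=2))
finally:
    # Leave the process-wide registry as we found it.
    copyreg.remove_extension("fractions", "Fraction", 240)
    copyreg.remove_extension("collections", "Counter", 300)

print(full_name_size, one_byte_code_size, two_byte_code_size)
```

The registry is process-wide, which is why the sketch removes its
entries again once the measurements are taken.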
One of the design goals of the pickle protocol is to make pickles
"context-free": as long as you have installed the modules
containing the classes referenced by a pickle, you can unpickle
it, without needing to import any of those classes ahead of time.
Unbridled use of extension codes could jeopardize this desirable
property of pickles. Therefore, the main use of extension codes
is reserved for a set of codes to be standardized by some
standard-setting body. This being Python, the standard-setting
body is the PSF. From time to time, the PSF will decide on a
table mapping extension codes to class names (or occasionally
names of other global objects; functions are also eligible). This
table will be incorporated in the next Python release(s).
However, for some applications, like Zope, context-free pickles
are not a requirement, and waiting for the PSF to standardize
some codes may not be practical. Two solutions are offered for
such applications.
First of all, a few ranges of extension codes are reserved for
private use. Any application can register codes in these ranges.
Two applications exchanging pickles using codes in these ranges
need to have some out-of-band mechanism to agree on the mapping
between extension codes and names.
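As a rough illustration of such an out-of-band agreement, the sketch
below assumes both applications ship the same table and install it
before exchanging pickles. SHARED_CODES, install() and uninstall() are
hypothetical names; only the copyreg and pickle calls are real library
functions, and the 240-255 check mirrors the private-use range proposed
in this document.

```python
import copyreg
import pickle
from fractions import Fraction

# Hypothetical table that both applications agree on out of band.
SHARED_CODES = {240: ("fractions", "Fraction")}


def install(table):
    """Register every entry, refusing codes outside the private range."""
    for code, (module, name) in table.items():
        if not 240 <= code <= 255:
            raise ValueError(f"{code} is outside the private-use range")
        copyreg.add_extension(module, name, code)


def uninstall(table):
    """Undo install() so the process-wide registry is left unchanged."""
    for code, (module, name) in table.items():
        copyreg.remove_extension(module, name, code)


# Sender: has the table installed while pickling.
install(SHARED_CODES)
payload = pickle.dumps(Fraction(2, 5), protocol=2)
uninstall(SHARED_CODES)

# Receiver that never installed the table cannot interpret code 240.
try:
    pickle.loads(payload)
    failed_without_table = False
except (ValueError, pickle.UnpicklingError):
    failed_without_table = True

# Receiver that installed the same table recovers the object.
install(SHARED_CODES)
try:
    received = pickle.loads(payload)
finally:
    uninstall(SHARED_CODES)

print(failed_without_table, received)
```

The payload carries no class name, so the table is the only thing that
gives code 240 a meaning on the receiving side.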
Second, some large Python projects (e.g. Zope or Twisted) can be
assigned a range of extension codes outside the "private use"
range that they can assign as they see fit.
The extension registry is defined as a mapping between extension
codes and names. When an extension code is unpickled, it ends up
producing an object, but this object is obtained by interpreting the
name as a module name followed by a class (or function) name. The
mapping from names to objects is cached. It is quite possible
that certain names cannot be imported; that should not be a
problem as long as no pickle containing a reference to such names
has to be unpickled. (The same issue already exists for direct
references to such names in pickles that use protocols 0 or 1.)
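The lookup described above can be modelled with a toy registry. This is
a sketch under stated assumptions, not the pickle module's actual
implementation: ExtensionRegistry and its methods are hypothetical
names, and the module "no_such_module_for_demo" is deliberately
nonexistent.

```python
import importlib


class ExtensionRegistry:
    """Toy model: code -> (module, name), with lazily cached objects."""

    def __init__(self):
        self._names = {}   # code -> (module name, attribute name)
        self._cache = {}   # code -> resolved object

    def register(self, code, module, name):
        # Registration only records the name; nothing is imported yet.
        self._names[code] = (module, name)

    def resolve(self, code):
        # Import on first use, then serve later lookups from the cache.
        if code not in self._cache:
            module, name = self._names[code]
            self._cache[code] = getattr(importlib.import_module(module), name)
        return self._cache[code]


registry = ExtensionRegistry()
registry.register(240, "fractions", "Fraction")
# An unimportable name is harmless as long as nothing resolves it.
registry.register(241, "no_such_module_for_demo", "Missing")

fraction_cls = registry.resolve(240)

try:
    registry.resolve(241)
    unresolvable_failed = False
except ImportError:
    unresolvable_failed = True
```

Registering code 241 succeeds even though its module does not exist;
the failure only surfaces when that code is actually resolved.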
Here is the proposed initial assignment of extension code ranges:
First  Last  Count  Purpose
    0     0      1  Reserved -- will never be used
    1   127    127  Reserved for Python standard library
  128   191     64  Reserved for Zope 3
  192   239     48  Reserved for 3rd parties
  240   255     16  Reserved for private use (will never be assigned)
  256   Max    Max  Reserved for future assignment
'Max' stands for 2147483647, or 2**31-1. This is a hard
limitation of the protocol as currently defined.
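The proposed ranges can be written down as a small lookup table. The
names MAX_CODE, RANGES and purpose() are hypothetical and exist only in
this sketch; the boundaries are copied from the table above.

```python
MAX_CODE = 2**31 - 1  # 2147483647, the 'Max' of the table

# (first, last, purpose) rows, copied from the proposed assignment.
RANGES = [
    (0, 0, "Reserved -- will never be used"),
    (1, 127, "Reserved for Python standard library"),
    (128, 191, "Reserved for Zope 3"),
    (192, 239, "Reserved for 3rd parties"),
    (240, 255, "Reserved for private use (will never be assigned)"),
    (256, MAX_CODE, "Reserved for future assignment"),
]


def purpose(code):
    """Return the purpose of the range that contains an extension code."""
    for first, last, label in RANGES:
        if first <= code <= last:
            return label
    raise ValueError(f"extension code {code} is outside 0..{MAX_CODE}")


print(purpose(200))
```

The ranges are contiguous, so every integer from 0 through Max falls
into exactly one row.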
No specific extension codes have been assigned yet.
TBD
The rest of this PEP is still under construction!