Document that people are expected to put a tuple in co_extra (#90)

This commit is contained in:
Brett Cannon 2016-09-03 12:25:18 -07:00 committed by GitHub
parent e5d996d002
commit 02ec458dd8
1 changed files with 55 additions and 39 deletions

@ -91,19 +91,39 @@ The ``co_extra`` will be ``NULL`` by default and will not be used by
CPython itself. Third-party code is free to use the field as desired.
Values stored in the field are expected to not be required in order
for the code object to function, allowing the loss of the data of the
field to be acceptable (this keeps the code object as immutable from
a functionality point-of-view; this is slightly contentious and so is
listed as an open issue in `Is co_extra needed?`_). The field will be
freed like all other fields on ``PyCodeObject`` during deallocation
using ``Py_XDECREF()``.
field to be acceptable. The field will be freed like all other fields
on ``PyCodeObject`` during deallocation using ``Py_XDECREF()``.
It is not recommended that multiple users attempt to use the
``co_extra`` simultaneously. While a dictionary could theoretically be
set to the field and various users could use a key specific to the
project, there is still the issue of key collisions as well as
performance degradation from using a dictionary lookup on every frame
evaluation. Users are expected to do a type check to make sure that
the field has not been previously set by someone else.
Code using the field is expected to always store a tuple in the field.
This allows for multiple users of the field to not trample over each
other while being as performant as possible. Typical usage of the
field is expected to roughly follow this pseudo-code::
  if co_extra is None:
      data = DataClass()
      co_extra = (data,)
  else:
      assert isinstance(co_extra, tuple)
      for x in co_extra:
          if isinstance(x, DataClass):
              data = x
              break
      else:
          data = DataClass()
          co_extra += (data,)
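The pseudo-code can be turned into a runnable sketch. This is only an illustration: Python-level code objects expose no ``co_extra`` attribute, so ``FakeCode`` is a hypothetical stand-in, and ``get_or_attach`` is a helper name invented here rather than part of any proposed API.

```python
class FakeCode:
    """Hypothetical stand-in for a code object with a co_extra field."""

    def __init__(self):
        self.co_extra = None  # mirrors the NULL default described above


class DataClass:
    """Placeholder for one project's per-code-object data."""


def get_or_attach(code, data_type):
    """Return this user's data stored in code.co_extra, creating it if absent."""
    extra = code.co_extra
    if extra is None:
        # First user of the field: store a one-element tuple.
        data = data_type()
        code.co_extra = (data,)
        return data
    assert isinstance(extra, tuple)
    for item in extra:
        if isinstance(item, data_type):
            return item  # our data was attached earlier
    # Another user got there first: append without disturbing their entry.
    data = data_type()
    code.co_extra = extra + (data,)
    return data
```

Calling ``get_or_attach(code, DataClass)`` twice on the same object should return the same instance, and a second user with a different type ends up as a second element of the tuple.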
Using a list was considered but was found to be less performant, and
with a key use-case being JIT usage, the performance consideration
was deemed important enough to favor a tuple over a list. A tuple also
makes more sense semantically as the objects stored in the tuple will
be heterogeneous.
A dict was also considered, but once again performance was more
important. While a dict will have constant overhead in looking up
data, the overhead for the common case of a single object being stored
in the data structure leads to a tuple having better performance
characteristics (i.e. iterating a tuple of length 1 is faster than
the overhead of hashing and looking up an object in a dict).
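The two lookup shapes being compared can be sketched as follows. This is a rough harness under stated assumptions, not the measurement behind the text: the names ``find_in_tuple``, ``find_in_dict`` and ``compare`` are invented for illustration, and the trade-off described above concerns C-level access during frame evaluation, so timings taken from pure Python may differ or even point the other way.

```python
import timeit


class DataClass:
    """Placeholder for one project's data object (hypothetical)."""


def find_in_tuple(extra, data_type):
    """Scan the tuple for the first entry of the requested type."""
    for item in extra:
        if isinstance(item, data_type):
            return item
    return None


def find_in_dict(extra, data_type):
    """Look the entry up by key, which requires hashing the key."""
    return extra.get(data_type)


def compare(number=100_000):
    """Time both lookups for the common case of a single stored object."""
    data = DataClass()
    as_tuple = (data,)
    as_dict = {DataClass: data}
    tuple_time = timeit.timeit(
        lambda: find_in_tuple(as_tuple, DataClass), number=number)
    dict_time = timeit.timeit(
        lambda: find_in_dict(as_dict, DataClass), number=number)
    return tuple_time, dict_time
```

``compare()`` returns the two elapsed times in seconds; the numbers depend on the interpreter and machine, so none are quoted here.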
Expanding ``PyInterpreterState``
@ -283,33 +303,6 @@ does require that the field not accidentally be cleared, else a crash
may occur.
Is co_extra needed?
-------------------
While discussing this PEP at PyCon US 2016, some core developers
expressed their worry of the ``co_extra`` field making code objects
mutable. The thinking seemed to be that having a field that was
mutated after the creation of the code object made the object seem
mutable, even though no other aspect of code objects changed.
The view of this PEP is that the ``co_extra`` field doesn't change the
fact that code objects are immutable. The field is specified in this
PEP as to not contain information required to make the code object
usable, making it more of a caching field. It could be viewed as
similar to the UTF-8 cache that string objects have internally;
strings are still considered immutable even though they have a field
that is conditionally set.
The field is also not strictly necessary. While the field greatly
simplifies attaching extra information to code objects, other options
such as keeping a mapping of code object memory addresses to what
would have been kept in ``co_extra`` or perhaps using a weak reference
of the data on the code object and then iterating through the weak
references until the attached data is found is possible. But obviously
all of these solutions are not as simple or performant as adding the
``co_extra`` field.
Rejected Ideas
==============
@ -328,6 +321,29 @@ loss in functionality or in performance while minimizing the API
changes required, the proposal was changed to its current form.
Is co_extra needed?
-------------------
While discussing this PEP at PyCon US 2016, some core developers
expressed their worry of the ``co_extra`` field making code objects
mutable. The thinking seemed to be that having a field that was
mutated after the creation of the code object made the object seem
mutable, even though no other aspect of code objects changed.
The view of this PEP is that the ``co_extra`` field doesn't change the
fact that code objects are immutable. The field is specified in this
PEP to not contain information required to make the code object
usable, making it more of a caching field. It could be viewed as
similar to the UTF-8 cache that string objects have internally;
strings are still considered immutable even though they have a field
that is conditionally set.
Performance measurements were also made where the field was not
available for JIT workloads. The loss of the field was deemed too
costly to performance when using an unordered map from C++ or Python's
dict to associate a code object with JIT-specific data objects.
References
==========