Document that people are expected to put a tuple in co_extra (#90)

2016-09-03 12:25:18 -07:00 · 2016-09-03 12:25:18 -07:00 · 02ec458dd8
parent e5d996d002
commit 02ec458dd8
1 changed file with 55 additions and 39 deletions
--- a/pep-0523.txt
+++ b/pep-0523.txt
@@ -91,19 +91,39 @@ The ``co_extra`` will be ``NULL`` by default and will not be used by
 CPython itself. Third-party code is free to use the field as desired.
 Values stored in the field are expected to not be required in order
 for the code object to function, allowing the loss of the data of the
-field to be acceptable (this keeps the code object as immutable from
-a functionality point-of-view; this is slightly contentious and so is
-listed as an open issue in `Is co_extra needed?`_). The field will be
-freed like all other fields on ``PyCodeObject`` during deallocation
-using ``Py_XDECREF()``.
+field to be acceptable. The field will be freed like all other fields
+on ``PyCodeObject`` during deallocation using ``Py_XDECREF()``.

-It is not recommended that multiple users attempt to use the
-``co_extra`` simultaneously. While a dictionary could theoretically be
-set to the field and various users could use a key specific to the
-project, there is still the issue of key collisions as well as
-performance degradation from using a dictionary lookup on every frame
-evaluation. Users are expected to do a type check to make sure that
-the field has not been previously set by someone else.
+Code using the field is expected to always store a tuple in the field.
+This allows multiple users of the field to avoid trampling over each
+other while being as performant as possible. Typical usage of the
+field is expected to roughly follow this pseudo-code::
+
+  if co_extra is None:
+      data = DataClass()
+      co_extra = (data,)
+  else:
+      assert isinstance(co_extra, tuple)
+      for x in co_extra:
+          if isinstance(x, DataClass):
+              data = x
+              break
+      else:
+          data = DataClass()
+          co_extra += (data,)
+
+Using a list was considered but was found to be less performant.
+With a key use-case being JIT usage, the performance consideration
+was deemed important enough to use a tuple instead of a list. A tuple
+also makes more sense semantically as the objects stored in the tuple
+will be heterogeneous.
+
+A dict was also considered, but once again performance was more
+important. While a dict will have constant overhead in looking up
+data, the overhead for the common case of a single object being stored
+in the data structure leads to a tuple having better performance
+characteristics (i.e. iterating a tuple of length 1 is faster than
+the overhead of hashing and looking up an object in a dict).


 Expanding ``PyInterpreterState``
@@ -283,33 +303,6 @@ does require that the field not accidentally be cleared, else a crash
 may occur.


-Is co_extra needed?
-------------------
-
-While discussing this PEP at PyCon US 2016, some core developers
-expressed their worry of the ``co_extra`` field making code objects
-mutable. The thinking seemed to be that having a field that was
-mutated after the creation of the code object made the object seem
-mutable, even though no other aspect of code objects changed.
-
-The view of this PEP is that the `co_extra` field doesn't change the
-fact that code objects are immutable. The field is specified in this
-PEP as to not contain information required to make the code object
-usable, making it more of a caching field. It could be viewed as
-similar to the UTF-8 cache that string objects have internally;
-strings are still considered immutable even though they have a field
-that is conditionally set.
-
-The field is also not strictly necessary. While the field greatly
-simplifies attaching extra information to code objects, other options
-such as keeping a mapping of code object memory addresses to what
-would have been kept in ``co_extra`` or perhaps using a weak reference
-of the data on the code object and then iterating through the weak
-references until the attached data is found is possible. But obviously
-all of these solutions are not as simple or performant as adding the
-``co_extra`` field.
-
-
 Rejected Ideas
 ==============

@@ -328,6 +321,29 @@ loss in functionality or in performance while minimizing the API
 changes required, the proposal was changed to its current form.


+Is co_extra needed?
+-------------------
+
+While discussing this PEP at PyCon US 2016, some core developers
+expressed their worry of the ``co_extra`` field making code objects
+mutable. The thinking seemed to be that having a field that was
+mutated after the creation of the code object made the object seem
+mutable, even though no other aspect of code objects changed.
+
+The view of this PEP is that the ``co_extra`` field doesn't change the
+fact that code objects are immutable. The field is specified in this
+PEP to not contain information required to make the code object
+usable, making it more of a caching field. It could be viewed as
+similar to the UTF-8 cache that string objects have internally;
+strings are still considered immutable even though they have a field
+that is conditionally set.
+
+Performance measurements were also made where the field was not
+available for JIT workloads. The loss of the field was deemed too
+costly to performance when using an unordered map from C++ or Python's
+dict to associate a code object with JIT-specific data objects.
+
+
 References
 ==========