diff --git a/pep-0550-hamt_vs_dict.png b/pep-0550-hamt_vs_dict.png index e77ff5dfd..d7917f7e8 100644 Binary files a/pep-0550-hamt_vs_dict.png and b/pep-0550-hamt_vs_dict.png differ diff --git a/pep-0550.rst b/pep-0550.rst index 75f64a779..eaab823f0 100644 --- a/pep-0550.rst +++ b/pep-0550.rst @@ -8,7 +8,7 @@ Type: Standards Track Content-Type: text/x-rst Created: 11-Aug-2017 Python-Version: 3.7 -Post-History: 11-Aug-2017, 15-Aug-2017 +Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017 Abstract @@ -151,13 +151,13 @@ to be variants of a logical thread. In this specification, we will use the following terminology: -* **Local Context**, or LC, is a key/value mapping that stores the +* **Logical Context**, or LC, is a key/value mapping that stores the context of a logical thread. * **Execution Context**, or EC, is an OS-thread-specific dynamic - stack of Local Contexts. + stack of Logical Contexts. -* **Context Item**, or CI, is an object used to set and get values +* **Context Key**, or CK, is an object used to set and get values from the Execution Context. Please note that throughout the specification we use simple @@ -166,28 +166,28 @@ algorithms and data structures that we will use to implement the PEP are discussed in the `Implementation Strategy`_ section. -Context Item Object -------------------- +Context Key Object +------------------ -The ``sys.new_context_item(description)`` function creates a -new ``ContextItem`` object. The ``description`` parameter is a -``str``, explaining the nature of the context key for introspection -and debugging purposes. +The ``sys.new_context_key(name)`` function creates a new ``ContextKey`` +object. The ``name`` parameter is a ``str`` needed to render a +representation of ``ContextKey`` object for introspection and +debugging purposes. 
-``ContextItem`` objects have the following methods and attributes: +``ContextKey`` objects have the following methods and attributes: -* ``.description``: read-only description; +* ``.name``: read-only name; -* ``.set(o)`` method: set the value to ``o`` for the context item +* ``.set(o)`` method: set the value to ``o`` for the context key in the execution context. -* ``.get()`` method: return the current EC value for the context item. - Context items are initialized with ``None`` when created, so - this method call never fails. +* ``.get()`` method: return the current EC value for the context key. + Context keys return ``None`` when the key is missing, so the method + never fails. -The below is an example of how context items can be used:: +The below is an example of how context keys can be used:: - my_context = sys.new_context_item(description='mylib.context') + my_context = sys.new_context_key('my_context') my_context.set('spam') # Later, to access the value of my_context: @@ -198,29 +198,29 @@ Thread State and Multi-threaded code ------------------------------------ Execution Context is implemented on top of Thread-local Storage. -For every thread there is a separate stack of Local Contexts -- -mappings of ``ContextItem`` objects to their values in the LC. +For every thread there is a separate stack of Logical Contexts -- +mappings of ``ContextKey`` objects to their values in the LC. New threads always start with an empty EC. For CPython:: PyThreadState: execution_context: ExecutionContext([ - LocalContext({ci1: val1, ci2: val2, ...}), + LogicalContext({ci1: val1, ci2: val2, ...}), ... 
]) -The ``ContextItem.get()`` and ``.set()`` methods are defined as +The ``ContextKey.get()`` and ``.set()`` methods are defined as follows (in pseudo-code):: - class ContextItem: + class ContextKey: def get(self): tstate = PyThreadState_Get() - for local_context in reversed(tstate.execution_context): - if self in local_context: - return local_context[self] + for logical_context in reversed(tstate.execution_context): + if self in logical_context: + return logical_context[self] return None @@ -228,7 +228,7 @@ follows (in pseudo-code):: tstate = PyThreadState_Get() if not tstate.execution_context: - tstate.execution_context = [LocalContext()] + tstate.execution_context = [LogicalContext()] tstate.execution_context[-1][self] = value @@ -238,7 +238,7 @@ be used as an alternative to ``threading.local()``:: def print_foo(): print(ci.get() or 'nothing') - ci = sys.new_context_item(description='test') + ci = sys.new_context_key('ci') ci.set('foo') # Will print "foo": @@ -266,13 +266,13 @@ over it. A few examples when this is needed: For these purposes we add a set of new APIs (they will be used in later sections of this specification): -* ``sys.new_local_context()``: create an empty ``LocalContext`` +* ``sys.new_logical_context()``: create an empty ``LogicalContext`` object. * ``sys.new_execution_context()``: create an empty ``ExecutionContext`` object. -* Both ``LocalContext`` and ``ExecutionContext`` objects are opaque +* Both ``LogicalContext`` and ``ExecutionContext`` objects are opaque to Python code, and there are no APIs to modify them. * ``sys.get_execution_context()`` function. 
The function returns a @@ -296,7 +296,7 @@ later sections of this specification): old_ec = tstate.execution_context tstate.execution_context = ExecutionContext( - ec.local_contexts + [LocalContext()] + ec.logical_contexts + [LogicalContext()] ) try: @@ -304,12 +304,12 @@ later sections of this specification): finally: tstate.execution_context = old_ec - Any changes to Local Context by ``func`` will be ignored. + Any changes to Logical Context by ``func`` will be ignored. This allows to reuse one ``ExecutionContext`` object for multiple invocations of different functions, without them being able to affect each other's environment:: - ci = sys.new_context_item('example') + ci = sys.new_context_key('ci') ci.set('spam') def func(): @@ -325,24 +325,24 @@ later sections of this specification): # spam # spam -* ``sys.run_with_local_context(lc: LocalContext, func, *args, +* ``sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution - context using the specified local context. + context using the specified logical context. - Any changes that ``func`` does to the local context will be + Any changes that ``func`` does to the logical context will be persisted in ``lc``. This behaviour is different from the ``run_with_execution_context()`` function, which always creates - a new throw-away local context. + a new throw-away logical context. 
In pseudo-code:: - def run_with_local_context(lc, func, *args, **kwargs): + def run_with_logical_context(lc, func, *args, **kwargs): tstate = PyThreadState_Get() old_ec = tstate.execution_context tstate.execution_context = ExecutionContext( - old_ec.local_contexts + [lc] + old_ec.logical_contexts + [lc] ) try: @@ -352,7 +352,7 @@ later sections of this specification): Using the previous example:: - ci = sys.new_context_item('example') + ci = sys.new_context_key('ci') ci.set('spam') def func(): @@ -360,10 +360,10 @@ later sections of this specification): ci.set('ham') ec = sys.get_execution_context() - lc = sys.new_local_context() + lc = sys.new_logical_context() - sys.run_with_local_context(lc, func) - sys.run_with_local_context(lc, func) + sys.run_with_logical_context(lc, func) + sys.run_with_logical_context(lc, func) # Will print: # spam @@ -385,198 +385,84 @@ context for scheduled functions:: return super().submit(fn) -EC Semantics for Coroutines ---------------------------- - -Python :pep:`492` coroutines are used to implement cooperative -multitasking. For a Python end-user they are similar to threads, -especially when it comes to sharing resources or modifying -the global state. - -An event loop is needed to schedule coroutines. Coroutines that -are explicitly scheduled by the user are usually called Tasks. -When a coroutine is scheduled, it can schedule other coroutines using -an ``await`` expression. In async/await world, awaiting a coroutine -is equivalent to a regular function call in synchronous code. Thus, -Tasks are similar to threads. - -By drawing a parallel between regular multithreaded code and -async/await, it becomes apparent that any modification of the -execution context within one Task should be visible to all coroutines -scheduled within it. Any execution context modifications, however, -must not be visible to other Tasks executing within the same OS -thread. 
- - -Coroutine Object Modifications -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -To achieve this, a small set of modifications to the coroutine object -is needed: - -* New ``cr_local_context`` attribute. This attribute is readable - and writable for Python code. - -* When a coroutine object is instantiated, its ``cr_local_context`` - is initialized with an empty Local Context. - -* Coroutine's ``.send()`` and ``.throw()`` methods are modified as - follows (in pseudo-C):: - - if coro.cr_local_context is not None: - tstate = PyThreadState_Get() - - tstate.execution_context.push(coro.cr_local_context) - - try: - # Perform the actual `Coroutine.send()` or - # `Coroutine.throw()` call. - return coro.send(...) - finally: - coro.cr_local_context = tstate.execution_context.pop() - else: - # Perform the actual `Coroutine.send()` or - # `Coroutine.throw()` call. - return coro.send(...) - -* When Python interpreter sees an ``await`` instruction, it inspects - the ``cr_local_context`` attribute of the coroutine that is about - to be awaited. For ``await coro``: - - * If ``coro.cr_local_context`` is an empty ``LocalContext`` object - that ``coro`` was created with, the interpreter will set - ``coro.cr_local_context`` to ``None``. - - * If ``coro.cr_local_context`` was modified by Python code, the - interpreter will leave it as is. - - This makes any changes to execution context made by nested coroutine - calls within a Task to be visible throughout the Task:: - - ci = sys.new_context_item('example') - - async def nested(): - ci.set('nested') - - async def main(): - ci.set('main') - print('before:', ci.get()) - await nested() - print('after:', ci.get()) - - # Will print: - # before: main - # after: nested - - Essentially, coroutines work with Execution Context items similarly - to threads, and ``await`` expression acts like a function call. 
- - This mechanism also works for ``yield from`` in generators decorated - with ``@types.coroutine`` or ``@asyncio.coroutine``, which are - called "generator-based coroutines" according to :pep:`492`, - and should be fully compatible with native async/await coroutines. - - -Tasks -^^^^^ - -In asynchronous frameworks like asyncio, coroutines are run by -an event loop, and need to be explicitly scheduled (in asyncio -coroutines are run by ``asyncio.Task``.) - -With the currently defined semantics, the interpreter makes -coroutines linked by an ``await`` expression share the same -Local Context. - -The interpreter, however, is not aware of the Task concept, and -cannot help with ensuring that new Tasks started in coroutines, -use the correct EC:: - - current_request = sys.new_context_item(description='request') - - async def child(): - print('current request:', repr(current_request.get())) - - async def handle_request(request): - current_request.set(request) - event_loop.create_task(child) - - run(top_coro()) - - # Will print: - # current_request: None - -To enable correct Execution Context propagation into Tasks, the -asynchronous framework needs to assist the interpreter: - -* When ``create_task`` is called, it should capture the current - execution context with ``sys.get_execution_context()`` and save it - on the Task object. - -* When the Task object runs its coroutine object, it should execute - ``.send()`` and ``.throw()`` methods within the captured - execution context, using the ``sys.run_with_execution_context()`` - function. - -With help from the asynchronous framework, the above snippet will -run correctly, and the ``child()`` coroutine will be able to access -the current request object through the ``current_request`` -Context Item. 
- - -Event Loop Callbacks -^^^^^^^^^^^^^^^^^^^^ - -Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` -should capture the current execution context with -``sys.get_execution_context()`` and execute callbacks -within it with ``sys.run_with_execution_context()``. - -This way the following code will work:: - - current_request = sys.new_context_item(description='request') - - def log(): - request = current_request.get() - print(request) - - async def request_handler(request): - current_request.set(request) - get_event_loop.call_soon(log) - - Generators ---------- -Generators in Python, while similar to Coroutines, are used in a -fundamentally different way. They are producers of data, and -they use ``yield`` expression to suspend/resume their execution. +Generators in Python are producers of data, and ``yield`` expressions +are used to suspend/resume their execution. When generators suspend +execution, their local state will "leak" to the outside code if they +store it in a TLS or in a global variable:: -A crucial difference between ``await coro`` and ``yield value`` is -that the former expression guarantees that the ``coro`` will be -executed fully, while the latter is producing ``value`` and -suspending the generator until it gets iterated again. + local = threading.local() -Generators, similarly to coroutines, have a ``gi_local_context`` -attribute, which is set to an empty Local Context when created. + def gen(): + old_x = local.x + local.x = 'spam' + try: + yield + ... + yield + finally: + local.x = old_x -Contrary to coroutines though, ``yield from o`` expression in -generators (that are not generator-based coroutines) is semantically -equivalent to ``for v in o: yield v``, therefore the interpreter does -not attempt to control their ``gi_local_context``. +The above code will not work as many Python users expect it to work. +A simple ``next(gen())`` will set ``local.x`` to "spam" and it will +never be reset back to its original value. 
+ +One of the goals of this proposal is to provide a mechanism to isolate +local state in generators. + + +Generator Object Modifications +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To achieve this, we make a small set of modifications to the +generator object: + +* New ``__logical_context__`` attribute. This attribute is readable + and writable for Python code. + +* When a generator object is instantiated its ``__logical_context__`` + is initialized with an empty ``LogicalContext``. + +* Generator's ``.send()`` and ``.throw()`` methods are modified as + follows (in pseudo-C):: + + if gen.__logical_context__ is not NULL: + tstate = PyThreadState_Get() + + tstate.execution_context.push(gen.__logical_context__) + + try: + # Perform the actual `Generator.send()` or + # `Generator.throw()` call. + return gen.send(...) + finally: + gen.__logical_context__ = tstate.execution_context.pop() + else: + # Perform the actual `Generator.send()` or + # `Generator.throw()` call. + return gen.send(...) + + If a generator has a non-NULL ``__logical_context__``, it will + be pushed to the EC and, therefore, generators will use it + to accumulate their local state. + + If a generator has no ``__logical_context__``, generators will + use whatever LC they are being run in. EC Semantics for Generators ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Every generator object has its own Local Context that stores +Every generator object has its own Logical Context that stores only its own local modifications of the context. When a generator -is being iterated, its local context will be put in the EC stack +is being iterated, its logical context will be put in the EC stack of the current thread.
This means that the generator will be able -to see access items from the surrounding context:: +to access keys from the surrounding context:: - local = sys.new_context_item("local") - global = sys.new_context_item("global") + local = sys.new_context_key("local") + global = sys.new_context_key("global") def generator(): local.set('inside gen:') @@ -601,7 +487,7 @@ to see access items from the surrounding context:: Any changes to the EC in nested generators are invisible to the outer generator:: - local = sys.new_context_item("local") + local = sys.new_context_key("local") def inner_gen(): local.set('spam') @@ -621,13 +507,13 @@ generator:: Running generators without LC ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Similarly to coroutines, generators with ``gi_local_context`` -set to ``None`` simply use the outer Local Context. +If ``__logical_context__`` is set to ``None`` for a generator, +it will simply use the outer Logical Context. The ``@contextlib.contextmanager`` decorator uses this mechanism to allow its generator to affect the EC:: - item = sys.new_context_item('test') + item = sys.new_context_key('item') @contextmanager def context(x): @@ -659,35 +545,210 @@ imposed on generators with a regular Python iterator class:: class Gen: def __init__(self): - self.local_context = sys.new_local_context() + self.logical_context = sys.new_logical_context() def __iter__(self): return self def __next__(self): - return sys.run_with_local_context( - self.local_context, self._next_impl) + return sys.run_with_logical_context( + self.logical_context, self._next_impl) def _next_impl(self): # Actual __next__ implementation. ... +yield from in generator-based coroutines +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Prior to :pep:`492`, ``yield from`` was used as one of the mechanisms +to implement coroutines in Python. 
:pep:`492` is built on top +of ``yield from`` machinery, and it is even possible to make a +generator compatible with async/await code by decorating it with +``@types.coroutine`` (or ``@asyncio.coroutine``). + +Generators decorated with these decorators follow the Execution +Context semantics described in the +`EC Semantics for Coroutines`_ section below. + + +yield from in generators +^^^^^^^^^^^^^^^^^^^^^^^^ + +Another ``yield from`` use is to compose generators. Essentially, +``yield from gen()`` is a better version of +``for v in gen(): yield v`` (read more about many subtle details +in :pep:`380`.) + +A crucial difference between ``await coro`` and ``yield value`` is +that the former expression guarantees that the ``coro`` will be +executed fully, while the latter is producing ``value`` and +suspending the generator until it gets iterated again. + +Therefore, this proposal does not special case ``yield from`` +expression for regular generators:: + + item = sys.new_context_key('item') + + def nested(): + assert item.get() == 'outer' + item.set('inner') + yield + + def outer(): + item.set('outer') + yield from nested() + assert item.get() == 'outer' + + +EC Semantics for Coroutines +--------------------------- + +Python :pep:`492` coroutines are used to implement cooperative +multitasking. For a Python end-user they are similar to threads, +especially when it comes to sharing resources or modifying +the global state. + +An event loop is needed to schedule coroutines. Coroutines that +are explicitly scheduled by the user are usually called Tasks. +When a coroutine is scheduled, it can schedule other coroutines using +an ``await`` expression. In async/await world, awaiting a coroutine +is equivalent to a regular function call in synchronous code. Thus, +Tasks are similar to threads.
+ +By drawing a parallel between regular multithreaded code and +async/await, it becomes apparent that any modification of the +execution context within one Task should be visible to all coroutines +scheduled within it. Any execution context modifications, however, +must not be visible to other Tasks executing within the same OS +thread. + +Similar to generators, coroutines have the new ``__logical_context__`` +attribute and same implementations of ``.send()`` and ``.throw()`` +methods. The key difference is that coroutines start with +``__logical_context__`` set to ``NULL`` (generators start with +an empty ``LogicalContext``.) + +This means that it is expected that the asynchronous library and +its Task abstraction will control how exactly coroutines interact +with Execution Context. + + +Tasks +^^^^^ + +In asynchronous frameworks like asyncio, coroutines are run by +an event loop, and need to be explicitly scheduled (in asyncio +coroutines are run by ``asyncio.Task``.) + +To enable correct Execution Context propagation into Tasks, the +asynchronous framework needs to assist the interpreter: + +* When ``create_task`` is called, it should capture the current + execution context with ``sys.get_execution_context()`` and save it + on the Task object. + +* The ``__logical_context__`` of the wrapped coroutine should be + initialized to a new empty logical context. + +* When the Task object runs its coroutine object, it should execute + ``.send()`` and ``.throw()`` methods within the captured + execution context, using the ``sys.run_with_execution_context()`` + function. + +For ``asyncio.Task``:: + + class Task: + def __init__(self, coro): + ... + self.exec_context = sys.get_execution_context() + coro.__logical_context__ = sys.new_logical_context() + + def _step(self, val): + ... + sys.run_with_execution_context( + self.exec_context, + self.coro.send, val) + ... 
+ +This makes any changes to execution context made by nested coroutine +calls within a Task to be visible throughout the Task:: + + ci = sys.new_context_key('ci') + + async def nested(): + ci.set('nested') + + async def main(): + ci.set('main') + print('before:', ci.get()) + await nested() + print('after:', ci.get()) + + asyncio.get_event_loop().run_until_complete(main()) + + # Will print: + # before: main + # after: nested + +New Tasks, started within another Task, will run in the correct +execution context too:: + + current_request = sys.new_context_key('current_request') + + async def child(): + print('current request:', repr(current_request.get())) + + async def handle_request(request): + current_request.set(request) + event_loop.create_task(child()) + + run(top_coro()) + + # Will print: + # current_request: None + +The above snippet will run correctly, and the ``child()`` +coroutine will be able to access the current request object +through the ``current_request`` Context Key. + +Any of the above examples would work if one of the coroutines +was a generator decorated with ``@asyncio.coroutine``. + + +Event Loop Callbacks +^^^^^^^^^^^^^^^^^^^^ + +Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` +should capture the current execution context with +``sys.get_execution_context()`` and execute callbacks +within it with ``sys.run_with_execution_context()``. + +This way the following code will work:: + + current_request = sys.new_context_key('current_request') + + def log(): + request = current_request.get() + print(request) + + async def request_handler(request): + current_request.set(request) + get_event_loop().call_soon(log) + + Asynchronous Generators ----------------------- Asynchronous Generators (AG) interact with the Execution Context similarly to regular generators.
-They have an ``ag_local_context`` attribute, which, similarly to +They have a ``__logical_context__`` attribute, which, similarly to regular generators, can be set to ``None`` to make them use the outer -Local Context. This is used by the new +Logical Context. This is used by the new ``contextlib.asynccontextmanager`` decorator. -The EC support of ``await`` expression is implemented using the same -approach as in coroutines, see the `Coroutine Object Modifications`_ -section. - Greenlets --------- @@ -718,14 +779,14 @@ implementation details, but at the same time provide enough control over EC and LC to re-implement all of Python built-in objects in pure Python. -1. ``sys.new_context_item(description='...')``: create a - ``ContextItem`` object used to access/set values in EC. +1. ``sys.new_context_key(name: str='...')``: create a + ``ContextKey`` object used to access/set values in EC. -2. ``ContextItem``: +2. ``ContextKey``: - * ``.description``: read-only attribute. - * ``.get()``: return the current value for the item. - * ``.set(o)``: set the current value in the EC for the item. + * ``.name``: read-only attribute. + * ``.get()``: return the current value for the key. + * ``.set(o)``: set the current value in the EC for the key. 3. ``sys.get_execution_context()``: return the current ``ExecutionContext``. @@ -733,31 +794,32 @@ in pure Python. 4. ``sys.new_execution_context()``: create a new empty ``ExecutionContext``. -5. ``sys.new_local_context()``: create a new empty ``LocalContext``. +5. ``sys.new_logical_context()``: create a new empty + ``LogicalContext``. 6. ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)``. -7. ``sys.run_with_local_context(lc:LocalContext, +7. ``sys.run_with_logical_context(lc:LogicalContext, func, *args, **kwargs)``. C API ----- -1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a - ``PyContextItem`` object. +1. ``PyContextKey * PyContext_NewKey(char *desc)``: create a + ``PyContextKey`` object.
-2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the - current value for the context item. +2. ``PyObject * PyContext_GetKey(PyContextKey *)``: get the + current value for the context key. -3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set - the current value for the context item. +3. ``int PyContext_SetKey(PyContextKey *, PyObject *)``: set + the current value for the context key. -4. ``PyLocalContext * PyLocalContext_New()``: create a new empty - ``PyLocalContext``. +4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty + ``PyLogicalContext``. -5. ``PyLocalContext * PyExecutionContext_New()``: create a new empty +5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty ``PyExecutionContext``. 6. ``PyExecutionContext * PyExecutionContext_Get()``: get the @@ -766,41 +828,41 @@ C API 7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the passed EC object as the current for the active thread state. -8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *, - PyLocalContext *)``: allows to implement - ``sys.run_with_local_context`` Python API. +8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *, + PyLogicalContext *)``: allows to implement + ``sys.run_with_logical_context`` Python API. Implementation Strategy ======================= -LocalContext is a Weak Key Mapping ----------------------------------- +LogicalContext is a Weak Key Mapping +------------------------------------ -Using a weak key mapping for ``LocalContext`` implementation +Using a weak key mapping for ``LogicalContext`` implementation enables the following properties with regards to garbage collection: -* ``ContextItem`` objects are strongly-referenced only from the +* ``ContextKey`` objects are strongly-referenced only from the application code, not from any of the Execution Context machinery or values they point to.
This means that there are no reference cycles that could extend their lifespan longer than necessary, or prevent their garbage collection. * Values put in the Execution Context are guaranteed to be kept - alive while there is a ``ContextItem`` key referencing them in + alive while there is a ``ContextKey`` key referencing them in the thread. -* If a ``ContextItem`` is garbage collected, all of its values will +* If a ``ContextKey`` is garbage collected, all of its values will be removed from all contexts, allowing them to be GCed if needed. * If a thread has ended its execution, its thread state will be cleaned up along with its ``ExecutionContext``, cleaning - up all values bound to all Context Items in the thread. + up all values bound to all Context Keys in the thread. -ContextItem.get() Cache ------------------------ +ContextKey.get() Cache +---------------------- We can add three new fields to ``PyThreadState`` and ``PyInterpreterState`` structs: @@ -810,28 +872,24 @@ We can add three new fields to ``PyThreadState`` and ``PyInterpreterState`` and increment it when a new thread state is created.) -* ``uint64_t PyInterpreterState->context_item_deallocs``: every time - a ``ContextItem`` is GCed, all Execution Contexts in all threads - will lose track of it. ``context_item_deallocs`` will simply - count all ``ContextItem`` deallocations. - -* ``uint64_t PyThreadState->execution_context_ver``: every time - a new item is set, or an existing item is updated, or the stack - of execution contexts is changed in the thread, we increment this - counter. +* ``uint64_t ContextKey->version``: every time the key is updated + in any logical context or thread, this key will be incremented. The above two fields allow implementing a fast cache path in -``ContextItem.get()``, in pseudo-code:: +``ContextKey.get()``, in pseudo-code:: + + class ContextKey: + + def set(self, value): + ... 
# implementation + self.version += 1 - class ContextItem: def get(self): tstate = PyThreadState_Get() if (self.last_tstate_id == tstate.unique_id and - self.last_ver == tstate.execution_context_ver - self.last_deallocs == - tstate.iterp.context_item_deallocs): + self.last_version == self.version): return self.last_value value = None @@ -842,14 +900,13 @@ The above two fields allow implementing a fast cache path in self.last_value = value # borrowed ref self.last_tstate_id = tstate.unique_id - self.last_ver = tstate.execution_context_ver - self.last_deallocs = tstate.interp.context_item_deallocs + self.last_version = self.version return value Note that ``last_value`` is a borrowed reference. The assumption -is that if all counters tests are OK, the object will be alive. -This allows the CI values to be properly GCed. +is that if current thread and key version tests are OK, the object +will be alive. This allows the CK values to be properly GCed. This is similar to the trick that decimal C implementation uses for caching the current decimal context, and will have the same @@ -857,8 +914,8 @@ performance characteristics, but available to all Execution Context users. -Approach #1: Use a dict for LocalContext ----------------------------------------- +Approach #1: Use a dict for LogicalContext +------------------------------------------ The straightforward way of implementing the proposed EC mechanisms is to create a ``WeakKeyDict`` on top of Python @@ -870,21 +927,21 @@ pre-allocation optimizations). This approach will have the following runtime complexity: -* O(M) for ``ContextItem.get()``, where ``M`` is the number of - Local Contexts in the stack. +* O(M) for ``ContextKey.get()``, where ``M`` is the number of + Logical Contexts in the stack. - It is important to note that ``ContextItem.get()`` will implement + It is important to note that ``ContextKey.get()`` will implement a cache making the operation O(1) for packages like ``decimal`` and ``numpy``. 
-* O(1) for ``ContextItem.set()``. +* O(1) for ``ContextKey.set()``. * O(N) for ``sys.get_execution_context()``, where ``N`` is the - total number of items in the current **execution** context. + total number of keys/values in the current **execution** context. -Approach #2: Use HAMT for LocalContext --------------------------------------- +Approach #2: Use HAMT for LogicalContext +---------------------------------------- Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to implement high performance immutable collections [5]_, [6]_. @@ -896,28 +953,28 @@ which is essentially O(1) for relatively small mappings `Appendix: HAMT Performance`_ section.) In this approach we use the same design of the ``ExecutionContext`` -as in Approach #1, but we will use HAMT backed weak key Local Context +as in Approach #1, but we will use HAMT backed weak key Logical Context implementation. With that we will have the following runtime complexity: -* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``, - where ``M`` is the number of Local Contexts in the stack, - and ``N`` is the number of items in the EC. The operation will - essentially be O(M), because execution contexts are normally not - expected to have more than a few dozen of items. +* O(M * log\ :sub:`32`\ N) for ``ContextKey.get()``, + where ``M`` is the number of Logical Contexts in the stack, + and ``N`` is the number of keys/values in the EC. The operation + will essentially be O(M), because execution contexts are normally + not expected to have more than a few dozen of keys/values. - (``ContextItem.get()`` will have the same caching mechanism as in + (``ContextKey.get()`` will have the same caching mechanism as in Approach #1.) -* O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the - number of items in the current **local** context. This will +* O(log\ :sub:`32`\ N) for ``ContextKey.set()`` where ``N`` is the + number of keys/values in the current **logical** context. 
This will essentially be an O(1) operation most of the time. * O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where - ``N`` is the total number of items in the current **execution** - context. + ``N`` is the total number of keys/values in the current + **execution** context. -Essentially, using HAMT for Local Contexts instead of Python dicts, +Essentially, using HAMT for Logical Contexts instead of Python dicts, allows to bring down the complexity of ``sys.get_execution_context()`` from O(N) to O(log\ :sub:`32`\ N) because of the more efficient merge algorithm. @@ -927,21 +984,26 @@ Approach #3: Use HAMT and Immutable Linked List ----------------------------------------------- We can make an alternative ``ExecutionContext`` design by using -a linked list. Each ``LocalContext`` in the ``ExecutionContext`` +a linked list. Each ``LogicalContext`` in the ``ExecutionContext`` object will be wrapped in a linked-list node. -``LocalContext`` objects will use an HAMT backed weak key +``LogicalContext`` objects will use an HAMT backed weak key implementation described in the Approach #2. -Every modification to the current ``LocalContext`` will produce a +Every modification to the current ``LogicalContext`` will produce a new version of it, which will be wrapped in a **new linked list node**. Essentially this means, that ``ExecutionContext`` is an -immutable forest of ``LocalContext`` objects, and can be safely +immutable forest of ``LogicalContext`` objects, and can be safely copied by reference in ``sys.get_execution_context()`` (eliminating the expensive "merge" operation.) -With this approach, ``sys.get_execution_context()`` will be an -**O(1) operation**. +With this approach, ``sys.get_execution_context()`` will be a +constant time **O(1) operation**. + +In case we decide to apply additional optimizations such as +flattening ECs with too many Logical Contexts, HAMT-backed +immutable mapping will have an O(log\ :sub:`32`\ N) merge +complexity.
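The linked-list design described in Approach #3 can be illustrated with a minimal sketch. It is not the PEP's implementation: the names ``Node``, ``ec_set``, ``ec_get`` and ``ec_snapshot`` are hypothetical, a copied ``dict`` behind a read-only proxy stands in for the HAMT, and weak keys are ignored. The point it shows is that every modification builds a new node, so taking a snapshot is just keeping a reference.

```python
from types import MappingProxyType


class Node:
    """Hypothetical immutable linked-list node wrapping one Logical Context."""

    __slots__ = ("lc", "prev")

    def __init__(self, lc, prev=None):
        # A read-only view over a private copy; a real implementation
        # would use a HAMT here instead of copying a dict.
        self.lc = MappingProxyType(dict(lc))
        self.prev = prev


def ec_set(head, key, value):
    """Return a NEW head node; the old head is left untouched."""
    new_lc = dict(head.lc)
    new_lc[key] = value
    return Node(new_lc, head.prev)


def ec_get(head, key):
    """Walk the stack from the top; return None when the key is missing."""
    node = head
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None


def ec_snapshot(head):
    """O(1) 'copy': nodes never change, so a reference is a safe snapshot."""
    return head
```

Because ``ec_set`` never mutates an existing node, a snapshot taken before a modification keeps seeing the old values, which is the property that removes the need for a merge step in this sketch.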
Summary @@ -950,7 +1012,7 @@ Summary We believe that approach #3 enables an efficient and complete Execution Context implementation, with excellent runtime performance. -`ContextItem.get() Cache`_ enables fast retrieval of context items +`ContextKey.get() Cache`_ enables fast retrieval of context keys for performance critical libraries like decimal and numpy. Fast ``sys.get_execution_context()`` enables efficient management @@ -984,12 +1046,15 @@ The following code:: class Context: + def __init__(self): + self.key = new_context_key('key') + def __enter__(self): - self.old_x = get_execution_context_item('x') - set_execution_context_item('x', 'something') + self.old_x = self.key.get() + self.key.set('something') def __exit__(self, *err): - set_execution_context_item('x', self.old_x) + self.key.set(self.old_x) would become this:: @@ -1016,7 +1081,7 @@ that uses context managers, and will notably complicate the interpreter implementation. :pep:`521` also does not provide any mechanism to propagate state -in a local context, like storing a request object in an HTTP request +in a logical context, like storing a request object in an HTTP request handler to have better logging. Nor does it solve the leaking state problem for greenlet/gevent. @@ -1032,6 +1097,20 @@ trampoline, making it impossible to intercept their ``yield`` points outside of the Python interpreter. +Should we update sys.displayhook and other APIs to use EC? +---------------------------------------------------------- + +APIs like redirecting stdout by overwriting ``sys.stdout``, or +specifying new exception display hooks by overwriting the +``sys.displayhook`` function are affecting the whole Python process +**by design**. Their users assume that the effect of changing +them will be visible across OS threads. Therefore we cannot +just make these APIs to use the new Execution Context. 
+ +That said we think it is possible to design new APIs that will +be context aware, but that is outside of the scope of this PEP. + + Backwards Compatibility ======================= @@ -1041,10 +1120,14 @@ This proposal preserves 100% backwards compatibility. Appendix: HAMT Performance ========================== -First, while investigating possibilities of how to implement an -immutable mapping in CPython, we were able to improve the efficiency -of ``dict.copy()`` up to 5 times: [4]_. All benchmarks in this -section were run against the optimized dict. +While investigating possibilities of how to implement an immutable +mapping in CPython, we were able to improve the efficiency +of ``dict.copy()`` up to 5 times: [4]_. One caveat is that the +improved ``dict.copy()`` does not resize the dict, which is a +necessary thing to do when items get deleted from the dict. +Which means that we can make ``dict.copy()`` faster for only dicts +that don't need to be resized, and the ones that do, will use +a slower version. To assess if HAMT can be used for Execution Context, we implemented it in CPython [7]_. @@ -1055,9 +1138,17 @@ it in CPython [7]_. Figure 1. Benchmark code can be found here: [9]_. -Figure 1 shows that HAMT indeed displays O(1) performance for all -benchmarked dictionary sizes. For dictionaries with less than 100 -items, HAMT is a bit slower than Python dict/shallow copy. +The chart illustrates the following: + +* HAMT displays near O(1) performance for all benchmarked + dictionary sizes. + +* If we can use the optimized ``dict.copy()`` implementation ([4]_), + the performance of immutable mapping implemented with Python + ``dict`` is good up until 100 items. + +* A dict with an unoptimized ``dict.copy()`` becomes very slow + around 100 items. .. figure:: pep-0550-lookup_hamt.png :align: center @@ -1072,18 +1163,14 @@ considering how well Python dicts are optimized. Note, that according to [8]_, HAMT design can be further improved. 
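To give a rough feel for why O(log\ :sub:`32`\ N) behaves almost like O(1) at these sizes, the sketch below counts how many levels an idealized 32-way trie needs to hold ``n`` keys. The helper name ``trie_levels`` is hypothetical, and the model assumes perfectly distributed hashes; a real HAMT can be deeper when hash prefixes collide.

```python
def trie_levels(n, branching=32):
    """Levels needed by an idealized trie with the given branching factor.

    Assumes perfectly spread hashes, which is an idealization.
    """
    levels = 1
    capacity = branching
    # Each extra level multiplies the number of reachable slots by 32.
    while capacity < n:
        capacity *= branching
        levels += 1
    return levels


for size in (32, 100, 1_000_000):
    print(size, trie_levels(size))
```

Under this model a mapping with 100 keys needs 2 levels and one with a million keys needs 4, so the lookup depth grows very slowly with the number of keys.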
-There is a limitation of Python ``dict`` design which makes HAMT -the preferred choice for immutable mapping implementation: -dicts need to be resized periodically, and resize is expensive. -The ``dict.copy()`` optimization we were able to do (see [4]_) will -only work for dicts that had no deleted items. Dicts that had -deleted items need to be resized during ``copy()``, which makes it -much slower. +The bottom line is that it is possible to imagine a scenario when +an application has more than 100 items in the Execution Context, in +which case the dict-backed implementation of an immutable mapping +becomes a subpar choice. -Because adding and deleting items from LocalContext is a very common -operation, we would not be able to always use the optimized -``dict.copy()`` for LocalContext, frequently resorting to use the -slower version of it. +HAMT on the other hand guarantees that its ``set()``, ``get()``, +and ``merge()`` operations will execute in O(log\ :sub:`32`\ N) time, +which means it is a more future proof solution. Acknowledgments @@ -1092,7 +1179,7 @@ Acknowledgments I thank Elvis Pranskevichus and Victor Petrovykh for countless discussions around the topic and PEP proof reading and edits. -Thanks to Nathaniel Smith for proposing the ``ContextItem`` design +Thanks to Nathaniel Smith for proposing the ``ContextKey`` design [17]_ [18]_, for pushing the PEP towards a more complete design, and coming up with the idea of having a stack of contexts in the thread state. @@ -1107,12 +1194,36 @@ Version History 1. Posted on 11-Aug-2017, view it here: [20]_. +2. Posted on 15-Aug-2017, view it here: [21]_. + The fundamental limitation that caused a complete redesign of the - PEP was that it was not possible to implement an iterator that - would interact with the EC in the same way as generators + first version was that it was not possible to implement an iterator + that would interact with the EC in the same way as generators (see [19]_.) -2.
Posted on 15-Aug-2017: the current version. + Version 2 was a complete rewrite, introducing new terminology + (Local Context, Execution Context, Context Item) and new APIs. + +3. Posted on 18-Aug-2017: the current version. + + Updates: + + * Local Context was renamed to Logical Context. The term "local" + was ambiguous and conflicted with local name scopes. + + * Context Item was renamed to Context Key, see the thread with Nick + Coghlan, Stefan Krah, and Yury Selivanov [22]_ for details. + + * Context Item get cache design was adjusted, per Nathaniel Smith's + idea in [24]_. + + * Coroutines are created without a Logical Context; ceval loop + no longer needs to special case the ``await`` expression + (proposed by Nick Coghlan in [23]_.) + + * `Appendix: HAMT Performance`_ section was updated with more + details about the proposed ``dict.copy()`` optimization and + its limitations. References @@ -1158,6 +1269,14 @@ References .. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst +.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst + +.. [22] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html + +.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html + +.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html + Copyright =========