Check in the memory model PEP. (#849)
This old draft PEP came up in an in-person conversation. It turns out to have almost vanished from the internet; the only source I could find is on Launchpad: https://code.launchpad.net/~jyasskin/python/memory-model-pep So I'm assigning it a number, marking it Withdrawn, and checking it in so as not to lose the historical record.
PEP: 583
Title: A Concurrency Memory Model for Python
Version: $Revision: 56116 $
Last-Modified: $Date: 2007-06-28 12:53:41 -0700 (Thu, 28 Jun 2007) $
Author: Jeffrey Yasskin <jyasskin@google.com>
Status: Withdrawn
Type: Informational
Content-Type: text/x-rst
Created: 22-Mar-2008
Post-History:


Abstract
========

This PEP describes how Python programs may behave in the presence of
concurrent reads and writes to shared variables from multiple threads.
We use a *happens before* relation to define when variable accesses
are ordered or concurrent.  Nearly all programs should simply use locks
to guard their shared variables, and this PEP highlights some of the
strange things that can happen when they don't.  Even so, programmers
often assume that it's OK to do "simple" things without locking, and
it's somewhat unpythonic to let the language surprise them.
Unfortunately, avoiding surprise often conflicts with making Python run
quickly, so this PEP tries to find a good tradeoff between the two.


Rationale
=========

So far, we have four major Python implementations -- CPython, Jython_,
IronPython_, and PyPy_ -- as well as lots of minor ones.  Some of
these already run on platforms that do aggressive optimizations.  In
general, these optimizations are invisible within a single thread of
execution, but they can be visible to other threads executing
concurrently.  CPython currently uses a `GIL`_ to ensure that other
threads see the results they expect, but this limits it to a single
processor.  Jython and IronPython run on Java's or .NET's threading
system respectively, which allows them to take advantage of more cores
but can also show surprising values to other threads.

.. _Jython: http://www.jython.org/

.. _IronPython: http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython

.. _PyPy: http://codespeak.net/pypy/dist/pypy/doc/home.html

.. _GIL: http://en.wikipedia.org/wiki/Global_Interpreter_Lock

So that threaded Python programs continue to be portable between
implementations, implementers and library authors need to agree on
some ground rules.


A couple definitions
====================

Variable
    A name that refers to an object.  Variables are generally
    introduced by assigning to them, and may be destroyed by passing
    them to ``del``.  Variables are fundamentally mutable, while
    objects may not be.  There are several varieties of variables:
    module variables (often called "globals" when accessed from within
    the module), class variables, instance variables (also known as
    fields), and local variables.  All of these can be shared between
    threads (the local variables if they're saved into a closure).
    The object in which the variables are scoped notionally has a
    ``dict`` whose keys are the variables' names.

Object
    A collection of instance variables (a.k.a. fields) and methods.
    At least, that'll do for this PEP.

Program Order
    The order in which actions (reads and writes) happen within a
    thread, which is very similar to the order in which they appear in
    the text.

Conflicting actions
    Two actions on the same variable, at least one of which is a write.

Data race
    A situation in which two conflicting actions happen at the same
    time.  "The same time" is defined by the memory model.


Two simple memory models
========================

Before talking about the details of data races and the surprising
behaviors they produce, I'll present two simple memory models.  The
first is probably too strong for Python, and the second is probably
too weak.


Sequential Consistency
----------------------

In a sequentially-consistent concurrent execution, actions appear to
happen in a global total order with each read of a particular variable
seeing the value written by the last write that affected that
variable.  The total order for actions must be consistent with the
program order.  A program has a data race on a given input when one of
its sequentially consistent executions puts two conflicting actions
next to each other.

This is the easiest memory model for humans to understand, although it
doesn't eliminate all confusion, since operations can be split in odd
places.


Happens-before consistency
--------------------------

The program contains a collection of *synchronization actions*, which
in Python currently include lock acquires and releases and thread
starts and joins.  Synchronization actions happen in a global total
order that is consistent with the program order (they don't *have* to
happen in a total order, but it simplifies the description of the
model).  A lock release *synchronizes with* all later acquires of the
same lock.  Similarly, given ``t = threading.Thread(target=worker)``:

* A call to ``t.start()`` synchronizes with the first statement in
  ``worker()``.

* The return from ``worker()`` synchronizes with the return from
  ``t.join()``.

* If the return from ``t.start()`` happens before (see below) a call
  to ``t.isAlive()`` that returns ``False``, the return from
  ``worker()`` synchronizes with that call.

We call the source of the synchronizes-with edge a *release* operation
on the relevant variable, and we call the target an *acquire* operation.

The *happens before* order is the transitive closure of the program
order with the synchronizes-with edges.  That is, action *A* happens
before action *B* if:

* A falls before B in the program order (which means they run in the
  same thread),

* A synchronizes with B, or

* you can get to B by following happens-before edges from A.

An execution of a program is happens-before consistent if each read
*R* sees the value of a write *W* to the same variable such that:

* *R* does not happen before *W*, and

* there is no other write *V* that overwrote *W* before *R* got a
  chance to see it.  (That is, it can't be the case that *W* happens
  before *V* happens before *R*.)

You have a data race if two conflicting actions aren't related by
happens-before.
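
The start and join edges above can be exercised directly with the
``threading`` module.  A minimal sketch in modern-Python spelling (not
part of the original draft):

```python
import threading

data = []

def worker():
    # t.start() synchronizes-with the first statement here, so the
    # append performed before start() is guaranteed to be visible.
    assert data == [1]
    data.append(2)

data.append(1)  # happens before t.start() (program order)
t = threading.Thread(target=worker)
t.start()
t.join()  # worker()'s return synchronizes-with the return from join()
assert data == [1, 2]  # so the write in worker() is visible here
```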


An example
''''''''''

Let's use the rules from the happens-before model to prove that the
following program prints "[7]"::

    class Queue:
        def __init__(self):
            self.l = []
            self.cond = threading.Condition()

        def get(self):
            with self.cond:
                while not self.l:
                    self.cond.wait()
                ret = self.l[0]
                self.l = self.l[1:]
                return ret

        def put(self, x):
            with self.cond:
                self.l.append(x)
                self.cond.notify()

    myqueue = Queue()

    def worker1():
        x = [7]
        myqueue.put(x)

    def worker2():
        y = myqueue.get()
        print y

    thread1 = threading.Thread(target=worker1)
    thread2 = threading.Thread(target=worker2)
    thread2.start()
    thread1.start()

1. Because ``myqueue`` is initialized in the main thread before
   ``thread1`` or ``thread2`` is started, that initialization happens
   before ``worker1`` and ``worker2`` begin running, so there's no way
   for either to raise a NameError, and both ``myqueue.l`` and
   ``myqueue.cond`` are set to their final objects.

2. The initialization of ``x`` in ``worker1`` happens before it calls
   ``myqueue.put()``, which happens before it calls
   ``myqueue.l.append(x)``, which happens before the call to
   ``myqueue.cond.release()``, all because they run in the same
   thread.

3. In ``worker2``, ``myqueue.cond`` will be released and re-acquired
   until ``myqueue.l`` contains a value (``x``).  The call to
   ``myqueue.cond.release()`` in ``worker1`` happens before that last
   call to ``myqueue.cond.acquire()`` in ``worker2``.

4. That last call to ``myqueue.cond.acquire()`` happens before
   ``myqueue.get()`` reads ``myqueue.l``, which happens before
   ``myqueue.get()`` returns, which happens before ``print y``, again
   all because they run in the same thread.

5. Because happens-before is transitive, the list initially stored in
   ``x`` in thread1 is initialized before it is printed in thread2.

Usually, we wouldn't need to look all the way into a thread-safe
queue's implementation in order to prove that uses were safe.  Its
interface would specify that puts happen before gets, and we'd reason
directly from that.
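
For instance, the standard library's ``Queue`` class (``queue.Queue``
in modern Python) documents that it is thread-safe, which gives
exactly this interface-level guarantee.  A sketch in modern spelling,
not part of the original draft:

```python
import queue
import threading

myqueue = queue.Queue()
results = []

def worker1():
    x = [7]
    myqueue.put(x)  # the put() happens before the matching get()

def worker2():
    # get() blocks until a value is available, then returns a list
    # whose initialization is guaranteed to be visible to this thread
    results.append(myqueue.get())

thread1 = threading.Thread(target=worker1)
thread2 = threading.Thread(target=worker2)
thread2.start()
thread1.start()
thread1.join()
thread2.join()
# results holds the fully initialized list, [7]
```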


.. _hazards:

Surprising behaviors with races
===============================

Lots of strange things can happen when code has data races.  It's easy
to avoid all of these problems by just protecting shared variables
with locks.  This is not a complete list of race hazards; it's just a
collection that seems relevant to Python.

In all of these examples, variables starting with ``r`` are local
variables, and other variables are shared between threads.


Zombie values
-------------

This example comes from the `Java memory model`_:

    Initially ``p is q`` and ``p.x == 0``.

    ========== ========
    Thread 1   Thread 2
    ========== ========
    r1 = p     r6 = p
    r2 = r1.x  r6.x = 3
    r3 = q
    r4 = r3.x
    r5 = r1.x
    ========== ========

    This can produce ``r2 == r5 == 0`` but ``r4 == 3``, proving that
    ``p.x`` went from 0 to 3 and back to 0.

A good compiler would like to optimize out the redundant load of
``p.x`` in initializing ``r5`` by just re-using the value already
loaded into ``r2``.  We get the strange result if thread 1 sees memory
in this order:

    ========== ======== ============================================
    Evaluation Computes Why
    ========== ======== ============================================
    r1 = p
    r2 = r1.x  r2 == 0
    r3 = q     r3 is p
    p.x = 3             Side-effect of thread 2
    r4 = r3.x  r4 == 3
    r5 = r2    r5 == 0  Optimized from r5 = r1.x because r2 == r1.x.
    ========== ======== ============================================


Inconsistent Orderings
----------------------

From `N2177: Sequential Consistency for Atomics`_; this is also known
as Independent Reads of Independent Writes (IRIW).

    Initially, ``a == b == 0``.

    ======== ======== ======== ========
    Thread 1 Thread 2 Thread 3 Thread 4
    ======== ======== ======== ========
    r1 = a   r3 = b   a = 1    b = 1
    r2 = b   r4 = a
    ======== ======== ======== ========

    We may get ``r1 == r3 == 1`` and ``r2 == r4 == 0``, proving both
    that ``a`` was written before ``b`` (thread 1's data), and that
    ``b`` was written before ``a`` (thread 2's data).  See `Special
    Relativity
    <http://en.wikipedia.org/wiki/Relativity_of_simultaneity>`__ for a
    real-world example.

This can happen if thread 1 and thread 3 are running on processors
that are close to each other, but far away from the processors that
threads 2 and 4 are running on, and the writes are not being
transmitted all the way across the machine before becoming visible to
nearby threads.

Neither acquire/release semantics nor explicit memory barriers can
help with this.  Making the orders consistent without locking requires
detailed knowledge of the architecture's memory model, but Java
requires it for volatiles, so we could use documentation aimed at its
implementers.

.. _`N2177: Sequential Consistency for Atomics`:
   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2177.html


A happens-before race that's not a sequentially-consistent race
---------------------------------------------------------------

From the POPL paper about the Java memory model [#JMM-popl]_.

    Initially, ``x == y == 0``.

    ============ ============
    Thread 1     Thread 2
    ============ ============
    r1 = x       r2 = y
    if r1 != 0:  if r2 != 0:
      y = 42       x = 42
    ============ ============

    Can ``r1 == r2 == 42``???

In a sequentially-consistent execution, there's no way to get an
adjacent read and write to the same variable, so the program should be
considered correctly synchronized (albeit fragile), and should only
produce ``r1 == r2 == 0``.  However, the following execution is
happens-before consistent:

    ============ ===== ======
    Statement    Value Thread
    ============ ===== ======
    r1 = x       42    1
    if r1 != 0:  true  1
      y = 42           1
    r2 = y       42    2
    if r2 != 0:  true  2
      x = 42           2
    ============ ===== ======

WTF, you are asking yourself.  Because there were no inter-thread
happens-before edges in the original program, the read of x in thread
1 can see any of the writes from thread 2, even if they only happened
because the read saw them.  There *are* data races in the
happens-before model.

We don't want to allow this, so the happens-before model isn't enough
for Python.  One rule we could add to happens-before that would
prevent this execution is:

    If there are no data races in any sequentially-consistent
    execution of a program, the program should have sequentially
    consistent semantics.

Java gets this rule as a theorem, but Python may not want all of the
machinery you need to prove it.


Self-justifying values
----------------------

Also from the POPL paper about the Java memory model [#JMM-popl]_.

    Initially, ``x == y == 0``.

    ============ ============
    Thread 1     Thread 2
    ============ ============
    r1 = x       r2 = y
    y = r1       x = r2
    ============ ============

    Can ``x == y == 42``???

In a sequentially consistent execution, no.  In a happens-before
consistent execution, yes: the read of x in thread 1 is allowed to see
the value written in thread 2 because there are no happens-before
relations between the threads.  This could happen if the compiler or
processor transforms the code into:

    ============ ============
    Thread 1     Thread 2
    ============ ============
    y = 42       r2 = y
    r1 = x       x = r2
    if r1 != 42:
      y = r1
    ============ ============

It can produce a security hole if the speculated value is a secret
object, or points to the memory that an object used to occupy.  Java
cares a lot about such security holes, but Python may not.


.. _uninitialized values:

Uninitialized values (direct)
-----------------------------

From several classic double-checked locking examples.

    Initially, ``d == None``.

    ================== ====================
    Thread 1           Thread 2
    ================== ====================
    while not d: pass  d = [3, 4]
    assert d[1] == 4
    ================== ====================

    This could raise an IndexError, fail the assertion, or, without
    some care in the implementation, cause a crash or other undefined
    behavior.

Thread 2 may actually be implemented as::

    r1 = list()
    r1.append(3)
    r1.append(4)
    d = r1

Because the assignment to d and the item assignments are independent,
the compiler and processor may optimize that to::

    r1 = list()
    d = r1
    r1.append(3)
    r1.append(4)

which is obviously incorrect and explains the IndexError.  If we then
look deeper into the implementation of ``r1.append(3)``, we may find
that it and ``d[1]`` cannot run concurrently without causing their own
race conditions.  In CPython (without the GIL), those race conditions
would produce undefined behavior.

There's also a subtle issue on the reading side that can cause the
value of d[1] to be out of date.  Somewhere in the implementation of
``list``, it stores its contents as an array in memory.  This array
may happen to be in thread 1's cache.  If thread 1's processor reloads
``d`` from main memory without reloading the memory that ought to
contain the values 3 and 4, it could see stale values instead.  As far
as I know, this can only actually happen on Alphas and maybe Itaniums,
and we probably have to prevent it anyway to avoid crashes.


Uninitialized values (flag)
---------------------------

From several more double-checked locking examples.

    Initially, ``d == dict()`` and ``initialized == False``.

    =========================== ====================
    Thread 1                    Thread 2
    =========================== ====================
    while not initialized: pass d['a'] = 3
    r1 = d['a']                 initialized = True
    r2 = r1 == 3
    assert r2
    =========================== ====================

    This could raise a KeyError, fail the assertion, or, without some
    care in the implementation, cause a crash or other undefined
    behavior.

Because ``d`` and ``initialized`` are independent (except in the
programmer's mind), the compiler and processor can rearrange these
almost arbitrarily, except that thread 1's assertion has to stay after
the loop.
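
The portable fix is to make the flag a real synchronization object, so
that setting it is a release and waiting on it is an acquire.  A
sketch using ``threading.Event`` (modern spelling, not part of the
original draft):

```python
import threading

d = dict()
initialized = threading.Event()
results = []

def writer():
    d['a'] = 3
    initialized.set()  # release: publishes the write to d

def reader():
    initialized.wait()  # acquire: synchronizes-with set()
    results.append(d['a'])  # guaranteed to see 3; no KeyError

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=writer)
t1.start()
t2.start()
t1.join()
t2.join()
assert results == [3]
```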


Inconsistent guarantees from relying on data dependencies
---------------------------------------------------------

This is a problem with Java ``final`` variables and the proposed
`data-dependency ordering`_ in C++0x.

First execute::

    g = []
    def Init():
        g.extend([1,2,3])
        return [1,2,3]
    h = None

Then in two threads:

    =================== ==========
    Thread 1            Thread 2
    =================== ==========
    while not h: pass   r1 = Init()
    assert h == [1,2,3] freeze(r1)
    assert h == g       h = r1
    =================== ==========

    If ``h`` has semantics similar to a Java ``final`` variable
    (except for being write-once), then even though the first
    assertion is guaranteed to succeed, the second could fail.

Data-dependent guarantees like those ``final`` provides only work if
the access is through the final variable.  It's not even safe to
access the same object through a different route.  Unfortunately,
because of how processors work, final's guarantees are only cheap when
they're weak.

.. _data-dependency ordering:
   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html


The rules for Python
====================

The first rule is that Python interpreters can't crash due to race
conditions in user code.  For CPython, this means that race conditions
can't make it down into C.  For Jython, it means that
NullPointerExceptions can't escape the interpreter.

Presumably we also want a model at least as strong as happens-before
consistency because it lets us write a simple description of how
concurrent queues and thread launching and joining work.

Other rules are more debatable, so I'll present each one with pros and
cons.


Data-race-free programs are sequentially consistent
---------------------------------------------------

We'd like programmers to be able to reason about their programs as if
they were sequentially consistent.  Since it's hard to tell whether
you've written a happens-before race, we only want to require
programmers to prevent sequential races.  The Java model does this
through a complicated definition of causality, but if we don't want to
include that, we can just assert this property directly.


No security holes from out-of-thin-air reads
--------------------------------------------

If the program produces a self-justifying value, it could expose
access to an object that the user would rather the program not see.
Again, Java's model handles this with the causality definition.  We
might be able to prevent these security problems by banning
speculative writes to shared variables, but I don't have a proof of
that, and Python may not need those security guarantees anyway.


Restrict reorderings instead of defining happens-before
--------------------------------------------------------

The .NET [#CLR-msdn]_ and x86 [#x86-model]_ memory models are based on
defining which reorderings compilers may allow.  I think that it's
easier to program to a happens-before model than to reason about all
of the possible reorderings of a program, and it's easier to insert
enough happens-before edges to make a program correct than to insert
enough memory fences to do the same thing.  So, although we could
layer some reordering restrictions on top of the happens-before base,
I don't think Python's memory model should be entirely reordering
restrictions.


Atomic, unordered assignments
-----------------------------

Assignments of primitive types are already atomic.  If you assign
``3<<72 + 5`` to a variable, no thread can see only part of the value.
Jeremy Manson suggested that we extend this to all objects.  This
allows compilers to reorder operations to optimize them, without
allowing some of the more confusing `uninitialized values`_.  The
basic idea here is that when you assign a shared variable, readers
can't see any changes made to the new value before the assignment, or
to the old value after the assignment.  So, if we have a program like:

    Initially, ``(d.a, d.b) == (1, 2)``, and ``(e.c, e.d) == (3, 4)``.
    We also have ``class Obj(object): pass``.

    ========================= =========================
    Thread 1                  Thread 2
    ========================= =========================
    r1 = Obj()                r3 = d
    r1.a = 3                  r4, r5 = r3.a, r3.b
    r1.b = 4                  r6 = e
    d = r1                    r7, r8 = r6.c, r6.d
    r2 = Obj()
    r2.c = 6
    r2.d = 7
    e = r2
    ========================= =========================

    ``(r4, r5)`` can be ``(1, 2)`` or ``(3, 4)`` but nothing else, and
    ``(r7, r8)`` can be either ``(3, 4)`` or ``(6, 7)`` but nothing
    else.  Unlike if writes were releases and reads were acquires,
    it's legal for thread 2 to see ``(e.c, e.d) == (6, 7) and (d.a,
    d.b) == (1, 2)`` (out of order).

This allows the compiler a lot of flexibility to optimize without
allowing users to see some strange values.  However, because it relies
on data dependencies, it introduces some surprises of its own.  For
example, the compiler could freely optimize the above example to:

    ========================= =========================
    Thread 1                  Thread 2
    ========================= =========================
    r1 = Obj()                r3 = d
    r2 = Obj()                r6 = e
    r1.a = 3                  r4, r7 = r3.a, r6.c
    r2.c = 6                  r5, r8 = r3.b, r6.d
    r2.d = 7
    e = r2
    r1.b = 4
    d = r1
    ========================= =========================

as long as it didn't let the initialization of ``e`` move above any of
the initializations of members of ``r2``, and similarly for ``d`` and
``r1``.

This also helps to ground happens-before consistency.  To see the
problem, imagine that the user unsafely publishes a reference to an
object as soon as she gets it.  The model needs to constrain what
values can be read through that reference.  Java says that every field
is initialized to 0 before anyone sees the object for the first time,
but Python would have trouble defining "every field".  If instead we
say that assignments to shared variables have to see a value at least
as up to date as when the assignment happened, then we don't run into
any trouble with early publication.
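
The construct-fully-then-publish idiom that this rule is designed to
support can be sketched as follows (a modern-Python illustration, not
part of the original draft; in today's CPython the GIL already
provides the needed visibility, so the assertion is reliable there):

```python
import threading

class Obj(object):
    pass

d = None  # shared variable; the reader polls it

def publisher():
    global d
    r1 = Obj()
    r1.a = 3
    r1.b = 4
    d = r1  # assign only after every field is set

def consumer(out):
    # Under the atomic, unordered assignment rule, a reader sees either
    # None or a fully initialized object -- never a half-built one.
    while d is None:
        pass
    out.append((d.a, d.b))

out = []
t = threading.Thread(target=consumer, args=(out,))
t.start()
publisher()
t.join()
assert out == [(3, 4)]
```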


Two tiers of guarantees
-----------------------

Most other languages with any guarantees for unlocked variables
distinguish between ordinary variables and volatile/atomic variables.
They provide many more guarantees for the volatile ones.  Python can't
easily do this because we don't declare variables.  This may or may
not matter, since Python locks aren't significantly more expensive
than ordinary Python code.  If we want to get those tiers back, we
could:

1. Introduce a set of atomic types similar to Java's [#Java-atomics]_
   or C++'s [#Cpp-atomics]_.  Unfortunately, we couldn't assign to
   them with ``=``.

2. Without requiring variable declarations, we could also specify that
   *all* of the fields on a given object are atomic.

3. Extend the ``__slots__`` mechanism [#slots]_ with a parallel
   ``__volatiles__`` list, and maybe a ``__finals__`` list.


Sequential Consistency
----------------------

We could just adopt sequential consistency for Python.  This avoids
all of the hazards_ mentioned above, but it prohibits lots of
optimizations too.  As far as I know, this is the current model of
CPython, but if CPython learned to optimize out some variable reads,
it would lose this property.

If we adopt this, Jython's ``dict`` implementation may no longer be
able to use ConcurrentHashMap because that only promises to create
appropriate happens-before edges, not to be sequentially consistent
(although maybe the fact that Java volatiles are totally ordered
carries over).  Both Jython and IronPython would probably need to use
`AtomicReferenceArray
<http://java.sun.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicReferenceArray.html>`__
or the equivalent for any ``__slots__`` arrays.


Adapt the x86 model
-------------------

The x86 model is:

1. Loads are not reordered with other loads.
2. Stores are not reordered with other stores.
3. Stores are not reordered with older loads.
4. Loads may be reordered with older stores to different locations but
   not with older stores to the same location.
5. In a multiprocessor system, memory ordering obeys causality (memory
   ordering respects transitive visibility).
6. In a multiprocessor system, stores to the same location have a
   total order.
7. In a multiprocessor system, locked instructions have a total order.
8. Loads and stores are not reordered with locked instructions.

In acquire/release terminology, this appears to say that every store
is a release and every load is an acquire.  This is slightly weaker
than sequential consistency, in that it allows `inconsistent
orderings`_, but it disallows `zombie values`_ and the compiler
optimizations that produce them.  We would probably want to weaken the
model somehow to explicitly allow compilers to eliminate redundant
variable reads.  The x86 model may also be expensive to implement on
other platforms, although because x86 is so common, that may not
matter much.


Upgrading or downgrading to an alternate model
----------------------------------------------

We can adopt an initial memory model without totally restricting
future implementations.  If we start with a weak model and want to get
stronger later, we would only have to change the implementations, not
programs.  Individual implementations could also guarantee a stronger
memory model than the language demands, although that could hurt
interoperability.  On the other hand, if we start with a strong model
and want to weaken it later, we can add a ``from __future__ import
weak_memory`` statement to declare that some modules are safe.


Implementation Details
======================

The required model is weaker than any particular implementation.  This
section tries to document the actual guarantees each implementation
provides, and should be updated as the implementations change.


CPython
-------

Uses the GIL to guarantee that other threads don't see funny
reorderings, and does few enough optimizations that I believe it's
actually sequentially consistent at the bytecode level.  Threads can
switch between any two bytecodes (instead of only between statements),
so two threads that concurrently execute::

    i = i + 1

with ``i`` initially ``0`` could easily end up with ``i==1`` instead
of the expected ``i==2``.  If they execute::

    i += 1

instead, CPython 2.6 will always give the right answer, but it's easy
to imagine another implementation in which this statement won't be
atomic.
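
The lost-update hazard, and its lock-based fix, can be demonstrated
directly.  A sketch in modern spelling, not part of the original
draft:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # makes the read-modify-write of counter atomic
            counter += 1

# Without the lock, concurrent increments could interleave between the
# read and the write of counter, silently losing updates.
threads = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 40000  # with the lock, no increments are lost
```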


PyPy
----

Also uses a GIL, but probably does enough optimization to violate
sequential consistency.  I know very little about this implementation.


Jython
------

Provides true concurrency under the `Java memory model`_ and stores
all object fields (except for those in ``__slots__``?) in a
`ConcurrentHashMap
<http://java.sun.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html>`__,
which provides fairly strong ordering guarantees.  Local variables in
a function may have fewer guarantees, which would become visible if
they were captured into a closure that was then passed to another
thread.


IronPython
----------

Provides true concurrency under the CLR memory model, which probably
protects it from `uninitialized values`_.  IronPython uses a locked
map to store object fields, providing at least as many guarantees as
Jython.


References
==========

.. _Java Memory Model: http://java.sun.com/docs/books/jls/third_edition/html/memory.html

.. _sequentially consistent: http://en.wikipedia.org/wiki/Sequential_consistency

.. [#JMM-popl] The Java Memory Model, by Jeremy Manson, Bill Pugh, and
   Sarita Adve
   (http://www.cs.umd.edu/users/jmanson/java/journal.pdf).  This paper
   is an excellent introduction to memory models in general and has
   lots of examples of compiler/processor optimizations and the
   strange program behaviors they can produce.

.. [#Cpp0x-memory-model] N2480: A Less Formal Explanation of the
   Proposed C++ Concurrency Memory Model, Hans Boehm
   (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2480.html)

.. [#CLR-msdn] Memory Models: Understand the Impact of Low-Lock
   Techniques in Multithreaded Apps, Vance Morrison
   (http://msdn2.microsoft.com/en-us/magazine/cc163715.aspx)

.. [#x86-model] Intel(R) 64 Architecture Memory Ordering White Paper
   (http://www.intel.com/products/processor/manuals/318147.pdf)

.. [#Java-atomics] Package java.util.concurrent.atomic
   (http://java.sun.com/javase/6/docs/api/java/util/concurrent/atomic/package-summary.html)

.. [#Cpp-atomics] C++ Atomic Types and Operations, Hans Boehm and
   Lawrence Crowl
   (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html)

.. [#slots] __slots__ (http://docs.python.org/ref/slots.html)

.. [#] Alternatives to SC, a thread on the cpp-threads mailing list,
   which includes lots of good examples.
   (http://www.decadentplace.org.uk/pipermail/cpp-threads/2007-January/001287.html)

.. [#safethread] python-safethread, a patch by Adam Olsen for CPython
   that removes the GIL and statically guarantees that all objects
   shared between threads are consistently locked.
   (http://code.google.com/p/python-safethread/)


Acknowledgements
================

Thanks to Jeremy Manson and Alex Martelli for detailed discussions on
what this PEP should look like.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: