926. Sequentially consistent fences, relaxed operations and modification order

Section: 32.5.4 [atomics.order] Status: NAD Editorial Submitter: Anthony Williams Opened: 2008-10-19 Last modified: 2016-01-28

Priority: Not Prioritized

View other active issues in [atomics.order].

View all other issues in [atomics.order].

View all issues with NAD Editorial status.

Discussion:

Addresses UK 313

There was an interesting issue raised over on comp.programming.threads today regarding the following example

// Thread 1:
x.store(1, memory_order_relaxed);           // SX
atomic_thread_fence(memory_order_seq_cst);  // F1
y.store(1, memory_order_relaxed);           // SY1
atomic_thread_fence(memory_order_seq_cst);  // F2
r1 = y.load(memory_order_relaxed);          // RY

// Thread 2:
y.store(0, memory_order_relaxed);          // SY2
atomic_thread_fence(memory_order_seq_cst); // F3
r2 = x.load(memory_order_relaxed);         // RX

is the outcome r1 == 0 and r2 == 0 possible?

I think the intent is that this is not possible, but I am not sure the wording guarantees that. Here is my analysis:

Since all the fences are SC, there must be a total order between them. F1 must be before F2 in that order since they are in the same thread. Therefore F3 is either before F1, between F1 and F2 or after F2.

If F3 is after F2, then we can apply 32.5.4 [atomics.order]p5 from N2798:

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.

In this case, A is SX, B is RX, the fence X is F2 and the fence Y is F3, so RX must see 1.

If F3 is before F2, this doesn't apply, but F3 can therefore be before or after F1.

If F3 is after F1, the same logic applies, but this time the fence X is F1. Therefore again, RX must see 1.

Finally we have the case that F3 is before F1 in the SC ordering. There are now no guarantees about RX, and RX can see r2==0.

We can apply 32.5.4 [atomics.order]p5 again. This time, A is SY2, B is RY, X is F3 and Y is F1. Thus RY must observe the effects of SY2 or a later modification of y in its modification order.

Since SY1 is sequenced before RY, RY must observe the effects of SY1 or a later modification of y in its modification order.

In order to ensure that RY sees (r1==1), we must see that SY1 is later in the modification order of y than SY2.

We're now skating on thin ice. Conceptually, SY2 happens-before F3, F3 is SC-ordered before F1, F1 happens-before SY1, so SY1 is later in the modification order M of y, and RY must see the result of SY1 (r1==1). However, I don't think the words are clear on that.

[ Post Summit Hans adds: ]

In my (Hans') view, our definition of fences will always be weaker than what particular hardware will guarantee. Memory_order_seq_cst fences inherently don't guarantee sequential consistency anyway, for good reasons (e.g. because they can't enforce a total order on stores). Hence I don't think the issue demonstrates a gross failure to achieve what we intended to achieve. The example in question is a bit esoteric. Hence, in my view, living with the status quo certainly wouldn't be a disaster either.

In any case, we should probably add text along the lines of the following between p5 and p6 in 32.5.4 [atomics.order]:

[Note: Memory_order_seq_cst only ensures sequential consistency for a data-race-free program that uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences only ensure a total order for the fences themselves. They cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications.]

Also see thread beginning at c++std-lib-23271.

[ Herve's correction: ]

Minor point, and sorry for the knee jerk reaction: I admit to having no knowledge of Memory_order_seq_cst, but my former boss (John Lakos) has ingrained an automatic introspection on the use of "only". I think you meant:

[Note: Memory_order_seq_cst ensures sequential consistency only for . . . . In particular, memory_order_seq_cst fences ensure a total order only for . . .

Unless, of course, Memory_order_seq_cst really do nothing but ensure sequential consistency for a data-race-free program that uses exclusively memory_order_seq_cst operations.

[ 2009-10 Santa Cruz: ]

NAD Editorial. Solved by N2992.

Proposed resolution:

Add a new paragraph after 32.5.4 [atomics.order]p5 that says

For atomic operations A and B on an atomic object M, where A and B modify M, if there are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B occurs later than A in the modifiction order of M.