Issue 2265: 29.3p9 appears to rule out some acceptable executions

This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of Open status.


Section: 32.5.4 [atomics.order]
Status: Open
Submitter: Brian Demsky
Opened: 2013-06-17
Last modified: 2016-01-28

Priority: 4

Discussion:

I believe that the following variation on IRIW should admit executions in which c1 = d1 = 5 and c2 = d2 = 0. If this is allowed, then what sequence of program evaluations for 32.5.4 [atomics.order] p9 justifies the store to z? It seems that 32.5.4 [atomics.order] p9 should not allow this execution: one of the stores to x or y has to appear earlier in the sequence than the other, each fetch_add uses the value read by the preceding load in its thread (and thus must appear later in the sequence than that load), and 32.5.4 [atomics.order] p9 states that each load must read from the last prior assignment in the sequence. Taken together, these constraints are cyclic: to read 0, each fetch_add must precede the plain store to the variable it updates, yet each load must follow some assignment of 5 to the variable it reads.

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int x;
atomic_int y;
atomic_int z;
int c1, c2, d1, d2;

// Thread functions return int so they match thrd_start_t without a cast.
static int a(void* obj)
{
  atomic_store_explicit(&x, 5, memory_order_relaxed);
  return 0;
}

static int b(void* obj)
{
  atomic_store_explicit(&y, 5, memory_order_relaxed);
  return 0;
}

static int c(void* obj)
{
  c1 = atomic_load_explicit(&x, memory_order_relaxed);
  // this could also be an atomic load from an address that depends on c1:
  c2 = atomic_fetch_add_explicit(&y, c1, memory_order_relaxed);
  return 0;
}

static int d(void* obj)
{
  d1 = atomic_load_explicit(&y, memory_order_relaxed);
  d2 = atomic_fetch_add_explicit(&x, d1, memory_order_relaxed);
  return 0;
}

int main(void)
{
  thrd_t t1, t2, t3, t4;

  atomic_init(&x, 0);
  atomic_init(&y, 0);

  printf("Main thread: creating 4 threads\n");
  thrd_create(&t1, a, NULL);
  thrd_create(&t2, b, NULL);
  thrd_create(&t3, c, NULL);
  thrd_create(&t4, d, NULL);

  // C11 thrd_join takes a second argument for the thread's result.
  thrd_join(t1, NULL);
  thrd_join(t2, NULL);
  thrd_join(t3, NULL);
  thrd_join(t4, NULL);
  printf("c1=%d c2=%d\n", c1, c2);
  printf("d1=%d d2=%d\n", d1, d2);

  // Can this store write 1000 (i.e., c1=d1=5, c2=d2=0)?
  atomic_store(&z, (c1+d1)*100+c2+d2);

  printf("Main thread is finished\n");

  return 0;
}

It seems that the easiest fix is to allow a load in 32.5.4 [atomics.order] p9 to read from any prior store in the evaluation order.

That said, I would personally advocate the following: it seems to me that C/C++ atomics are in a somewhat different situation from Java, because:

People are expected to use relaxed C++ atomics in potentially racy situations, so it isn't clear that semantics as complicated as the JMM's causality would be sane.
People who use C/C++ atomics are likely to be experts and to use them in a very controlled fashion. I would be really surprised if compilers found any real wins by optimizing the use of atomics.

Why not do something like:

There is a satisfaction DAG over all program evaluations. Each evaluation observes the values of variables as computed by some prior assignment in the DAG.

There is an edge x->y between two evaluations x and y if:

the evaluation y observes a value computed by the evaluation x or
the evaluation y is an atomic store, the evaluation x is an atomic load, and there is a conditional branch c that may depend (through an intra-thread dependence) on x, with x -sb-> c and c -sb-> y (sb = sequenced before).

This seems to allow the reorderings of relaxed atomics that processors perform without extra fence instructions, to permit most compiler reorderings, and to eliminate satisfaction cycles.

[2015-02 Cologne]

Handed over to SG1.

[2015-05 Lenexa, SG1 response]

This was partially addressed (weasel-worded) in C++14 (see N3786). The remainder is an open research problem. N3710 outlines a "solution" that does not have consensus behind it because it costs performance. We have no better solution at the moment.

Proposed resolution: