Another wonderful example of a CAS being useful is a pthread_once variable. What it's supposed to do is indicate whether some initialization has happened, but logically it can be in three states: either the initialization has not yet happened, or it is currently happening, or it has happened. By interface, the initialization function only runs once. So how about we define pthread_once_t to be an integer type, where 0 means not initialized, 1 means initialization is running, and 2 means initialization has concluded?
What you can do to implement pthread_once() is, as a first step, try to CAS the given variable from 0 to 1. Even with multiple threads attempting it, only one of them will ever succeed. That thread then runs the initialization function and afterwards sets the variable to 2. All other threads loop, waiting for the variable to turn from 1 to 2. Linux futexes help a lot here, since you can just make them all sleep on the once variable for as long as it is 1, instead of burning CPU. Add in some logic to handle thread cancellation and you pretty much have a production-ready implementation of pthread_once():
Code:
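#include <limits.h>     /* INT_MAX */
#include <stdatomic.h>

/* futex_wait() and futex_wake() are thin wrappers around the Linux futex
 * syscall (not shown). pthread_cleanup_push/pop come from the libc's own
 * <pthread.h>, which is also where this typedef would live. */

typedef atomic_int pthread_once_t;
#define PTHREAD_ONCE_INIT 0

/* Runs if init() gets cancelled: go back to "not initialized" and wake the
 * sleepers, so that one of them can retry instead of waiting forever. */
static void cleanup(void *o) {
    atomic_store_explicit((pthread_once_t*)o, 0, memory_order_release);
    futex_wake(o, INT_MAX);
}

int pthread_once(pthread_once_t *o, void (*init)(void)) {
    for (;;) {
        int e = 0;
        if (atomic_compare_exchange_weak_explicit(o, &e, 1, memory_order_acq_rel, memory_order_acquire)) {
            pthread_cleanup_push(cleanup, o);
            init();
            pthread_cleanup_pop(0);
            atomic_store_explicit(o, 2, memory_order_release);
            futex_wake(o, INT_MAX); /* wake as many threads as are waiting on the variable. */
            return 0;
        } else if (e == 1) {
            futex_wait(o, 1, 0); /* wait indefinitely if *o still has value 1 */
        } else if (e == 2) { /* the CAS above is weak, so can fail spuriously, so could fail with e == 0 */
            return 0;
        }
    }
}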
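The code calls futex_wait() and futex_wake() without defining them. Here is a minimal sketch of what such wrappers could look like on Linux. The signatures are my guess from the call sites (in particular, I am assuming the third argument of futex_wait() is an optional timeout, with 0/NULL meaning "wait forever"), and passing an atomic_int to the kernel assumes it has the same size and representation as a plain 32-bit int, which holds on the usual Linux targets but is not promised by the C standard.

```c
#define _GNU_SOURCE
#include <assert.h>         /* static_assert */
#include <limits.h>
#include <linux/futex.h>    /* FUTEX_WAIT, FUTEX_WAKE */
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>    /* SYS_futex */
#include <time.h>           /* struct timespec */
#include <unistd.h>         /* syscall() */

/* Assumption: the kernel treats the futex word as a plain 32-bit integer. */
static_assert(sizeof(atomic_int) == sizeof(uint32_t), "futex words are 32 bits");

/* Hypothetical wrapper: sleep as long as *addr still holds `expected`.
 * Returns immediately with -1 if the value has already changed.
 * A NULL timeout means "no timeout". */
static int futex_wait(atomic_int *addr, int expected, const struct timespec *timeout)
{
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Hypothetical wrapper: wake up to `count` threads sleeping on addr.
 * Returns the number of threads actually woken. */
static int futex_wake(atomic_int *addr, int count)
{
    return (int)syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

The important property is that the kernel does the "is it still 1?" check and the going-to-sleep atomically, so a wake-up that happens between the failed CAS and the futex_wait() call cannot get lost. A real libc would likely also use the private futex variants when the variable is not shared between processes.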
Boy, I hope I have those memory orders correct, because they confuse the hell out of me. But the release store of 2 ought to pair with the acquire load of the failing CAS, so each thread that sees the variable at value 2 ought to also see the effects of the initialization function.
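To make the release/acquire pairing a bit more concrete, here is a stripped-down sketch of just that part, with no CAS and no futex. The names (payload, flag, run_handoff) are made up for the illustration. The writer plays the role of the initializing thread, the reader plays the role of a thread that finds the variable at 2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;       /* plain, non-atomic data written by the "init" */
static atomic_int flag;   /* 0 = not ready, 2 = ready, like the once variable */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;                                          /* the "initialization" */
    atomic_store_explicit(&flag, 2, memory_order_release); /* publish it */
    return NULL;
}

static void *reader(void *arg)
{
    /* The acquire load that reads 2 synchronizes with the release store,
     * so the write to payload is guaranteed to be visible afterwards. */
    while (atomic_load_explicit(&flag, memory_order_acquire) != 2)
        ; /* busy-wait; good enough for a demo */
    *(int *)arg = payload;
    return NULL;
}

/* Hypothetical helper: runs one writer and one reader, returns what the reader saw. */
static int run_handoff(void)
{
    pthread_t w, r;
    int seen = 0;
    payload = 0;
    atomic_store(&flag, 0);
    int rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With these orders, run_handoff() has to return 42. If both operations were memory_order_relaxed instead, the reader could legally see the flag at 2 and still read a stale payload, which for pthread_once() would mean using data that init() has not finished setting up.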
Anyway, point is, only one thread can change the variable from 0 to 1. Many can see it being 0, or being 1, but only one can actually effect the change successfully.
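If you want to convince yourself of the "only one winner" part, a small experiment along these lines should do. This is a sketch with made-up names (contender, count_cas_winners); it uses the strong CAS so that a spurious failure cannot leave the race without any winner.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64

static atomic_int state;     /* 0 = not initialized, 1 = initialization running */
static atomic_int winners;   /* how many threads managed the 0 -> 1 transition */

static void *contender(void *arg)
{
    (void)arg;
    int expected = 0;
    /* Strong CAS: it never fails spuriously, so one attempt is enough. */
    if (atomic_compare_exchange_strong(&state, &expected, 1))
        atomic_fetch_add(&winners, 1);
    return NULL;
}

/* Hypothetical helper: lets n threads race for the CAS and reports how many won. */
static int count_cas_winners(int n)
{
    pthread_t tid[MAX_THREADS];
    assert(n > 0 && n <= MAX_THREADS);
    atomic_store(&state, 0);
    atomic_store(&winners, 0);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&tid[i], NULL, contender, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&winners);
}
```

No matter how many threads you throw at it, count_cas_winners() comes back with 1: the first successful CAS changes the value to 1, and every later CAS compares against 0 and fails.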