Hi,
jbemmel wrote:
The issue is that the cmpxchg will loop until some other thread releases the lock by calling spin_unlock. In algorithms that do not use spinlocks ( which is what I meant by "lock-less", not a common term in computer theory ) one would typically use cmpxchg in the following way:
Code:
retry:
var h = list->head;
if (h) {
var n = h->next;
if (!cmpxchg( &list->head, h, n )) goto retry;
}
return h;
Here it may happen that some other thread changes list->head after it is read, causing the cmpxchg to fail ( disregarding ABA issues ). However, the thread can retry and succeed on the 2nd iteration, independent of any other thread's execution
No. Imagine a typical 80x86 "4-core with hyper-threading" system. The first problem is that (e.g.) CPU #0 can read list->head, do some work, then fail to commit the work (the cmpxchg) because CPU #1 modified list->head; then CPU #0 retries and wastes more time doing work and fails to commit it because CPU #2 modified list->head; then wastes more time and fails to commit it because another CPU modified list->head again; and so on. For "worst case" there is no limit on how many times this can happen, and therefore no guarantee that CPU #0 will ever make progress.
The other problem is that, for hyper-threading, a core's resources are shared and work done by one logical CPU effects the performance of the other logical CPU in that core; regardless of whether the work done is successfully committed or discarded/retried. Basically, lock-less algorithms mean that other CPUs get less work done for no reason. Of course this problem isn't limited to the core's shared resources, but effects all shared resources that CPUs compete for (e.g. bus bandwidth, RAM bandwidth, cache slots, etc), which means that it can also effect systems that don't use hyper-threading.
The solution to the first problem is "fairness"; which is typically done using something "loosely derived from"
Lamport's bakery algorithm where some form of counter/ticket is used to limit the worst case. For example, with our typical 80x86 "4-core with hyper-threading" system, with a ticket lock you can guarantee that CPU #0 will make progress after waiting for a maximum of 7 other CPUs to have their turn, and therefore guarantee that CPU #0 will make progress eventually.
The only solution to the second problem is not wasting time doing work unless you know it can be committed.
Now; your spinlock example code and your lock-free example code both suffer from both of these problems. For the second problem they should both be using the "pause" instruction to minimise "core resources wastage", where the spinlock would minimise wastage more. For the first problem, it'd be relatively easy to upgrade the spinlock and make it a "ticket lock"; but for the lock-free version you're basically screwed - to improve it you need to use a lock.
Cheers,
Brendan