Hmm... so they have a queue of CPU identifiers attached to the spinlock... That looks more and more like a semaphore to me. Honestly, I don't know whether signalling the spinlock release through a per-CPU variable really helps.
Since you now have a 'read-only' waiting loop, that may indeed reduce the number of LOCK and WRITE cycles on the system bus... My guess is that the real performance gain will depend on how often actual contention occurs, weighed against the overhead of acquiring/releasing the resource when no one else wants it.
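The 'read-only waiting loop' effect is easiest to see in the classic test-and-test-and-set lock, which is not what Win2K does but isolates the same trick: waiters spin on a plain load that stays in their own cache, and only issue the LOCKed exchange once the lock looks free. Below is a minimal C11 sketch; `ttas_lock` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-test-and-set lock (illustration only). */
typedef struct {
    atomic_int held;               /* 0 = free, 1 = held */
} ttas_lock;

static void ttas_init(ttas_lock *l) { atomic_init(&l->held, 0); }

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        /* Read-only wait: plain loads, no LOCKed write while the lock is owned. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
        /* Lock looks free: only now try the atomic swap (xchg on x86). */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;
    }
}

static bool ttas_try_acquire(ttas_lock *l) {
    return atomic_load_explicit(&l->held, memory_order_relaxed) == 0 &&
           atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

static void ttas_release(ttas_lock *l) {
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Note that when the lock is released, every waiter sees it at once and they all race for the exchange; that thundering herd is exactly what a per-CPU signal tries to avoid.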
Their claims lack clear performance tests (or references to such tests), so they carry little scientific weight. And claiming that "we now scale better to multiple processors with Win2K" while the NT4.0 license prevents you from running it on more than 2 processors of an SMP system sounds like an April Fools' joke to me.
Code:
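mov   ebx, [processor_id]        ; this CPU's index (assumed to be stored in a variable)
mov   eax, 1                     ; 1 = lock held
cli                              ; no interrupts while we wait for / own the lock
xchg  [spinlock], eax            ; atomic swap (xchg with memory is implicitly LOCKed)
cmp   eax, 1
jne   got_it                     ; old value was 0: the lock was free, it's ours now
;; here, we have to enqueue. But how will we be sure that
;; we enqueue *safely*? We need a spinlock to protect the queue length or something...
try_again:
pause                            ; spin-wait hint; the loop only reads memory
cmp   dword [processors_signal+ebx*4], 1   ; one dword flag per CPU
jne   try_again                  ; wait until the releasing CPU signals us
mov   dword [processors_signal+ebx*4], 0   ; consume the signal
got_it: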
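For the "how do we enqueue safely?" question, the textbook answer is the MCS queue lock (Mellor-Crummey and Scott): the queue needs no extra lock, because each waiter appends its own node with a single atomic exchange on a tail pointer, then spins read-only on a flag in that node (the per-CPU signal). The releaser clears only its successor's flag. Whether Win2K's queued spinlocks follow this exact scheme is an assumption; the sketch below is plain C11 with hypothetical names (`mcs_lock`, `mcs_node`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting CPU/thread: plays the role of the per-CPU signal. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;            /* true while this waiter must keep spinning */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;      /* last node in the queue, NULL when free */
} mcs_lock;

static void mcs_init(mcs_lock *l) { atomic_init(&l->tail, NULL); }

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Safe enqueue: one atomic swap on the tail, no lock around the queue. */
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                    /* queue was empty: the lock is ours */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    /* Read-only spin on our own flag until the predecessor hands over. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    assert(atomic_load_explicit(&l->tail, memory_order_relaxed) != NULL);
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No known successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A waiter swapped the tail but has not linked itself in yet. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Signal exactly one CPU: the next requestor, in FIFO order. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

In the uncontended case this costs one exchange to acquire and one compare-exchange to release, versus a single xchg and a plain store for the simple spinlock; that is the overhead side of the trade-off mentioned above.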
If your 'critical region' is long enough to have a significant probability of concurrent access, then using hardware-assisted IPIs may pay off more. You then have a 'real' semaphore and can direct a message to the next requestor. Waiting CPUs would sit in some kind of 'halt' state that only an IPI could wake them from.
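A user-space analogue of that "sleep, then get woken directly" handoff can be sketched with POSIX threads, with the loud caveat that this is not local APIC code: threads stand in for CPUs, `pthread_cond_wait` stands in for the halted state, and `pthread_cond_signal` on one waiter's private condition variable stands in for the targeted IPI. The short mutex guarding the queue plays the role of the "spinlock protecting the queue". `handoff_lock` and `waiter` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One per sleeping requestor: its private wake-up target ("IPI" destination). */
typedef struct waiter {
    struct waiter *next;
    pthread_cond_t wake;
    bool granted;                  /* set by the releaser when it hands over */
} waiter;

typedef struct {
    pthread_mutex_t guard;         /* short lock protecting the queue itself */
    bool held;
    waiter *head, *tail;           /* FIFO of sleeping requestors */
} handoff_lock;

static void handoff_init(handoff_lock *l) {
    pthread_mutex_init(&l->guard, NULL);
    l->held = false;
    l->head = l->tail = NULL;
}

static void handoff_acquire(handoff_lock *l) {
    pthread_mutex_lock(&l->guard);
    if (!l->held) {                /* uncontended: take it, no queueing */
        l->held = true;
        pthread_mutex_unlock(&l->guard);
        return;
    }
    waiter me = { .next = NULL, .granted = false };
    pthread_cond_init(&me.wake, NULL);
    if (l->tail) l->tail->next = &me; else l->head = &me;
    l->tail = &me;
    while (!me.granted)            /* "halt" until explicitly handed the lock */
        pthread_cond_wait(&me.wake, &l->guard);
    pthread_mutex_unlock(&l->guard);
    pthread_cond_destroy(&me.wake);
}

static void handoff_release(handoff_lock *l) {
    pthread_mutex_lock(&l->guard);
    assert(l->held);
    waiter *next = l->head;
    if (next == NULL) {
        l->held = false;           /* nobody waiting: simply free the lock */
    } else {
        l->head = next->next;      /* dequeue the next requestor */
        if (l->head == NULL) l->tail = NULL;
        next->granted = true;      /* ownership passes directly; held stays true */
        pthread_cond_signal(&next->wake);   /* wake exactly that one waiter */
    }
    pthread_mutex_unlock(&l->guard);
}
```

Because ownership is handed over instead of released and re-contended, a woken waiter never has to race for the lock; the price is a sleep/wake round trip, which only pays off when the critical region is long.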
(Yes, I'm working on the local APIC and MSRs atm.)