Proper SMP reentrant mutex implementation

max · **Posted:** Thu Mar 07, 2019 1:15 pm

So I've implemented a reentrant mutex implementation and wanted to hear some opinions whether if I have covered all the edge-cases.

It's multiprocessor safe and modification is protected from preemption (hence the interrupt disabling meanwhile).
taskingGetLocal() returns a processor-local structure - as long as locksHeld variable is > 0, no tasks are scheduled to avoid kernel deadlocking.

Code:

#define G_MUTEX_INITIALIZED 0xFEED

struct g_mutex
{
   volatile int initialized = 0;
   volatile int lock = 0;
   int depth = 0;
   uint32_t owner = -1;
};

Code:

void mutexAcquire(g_mutex* mutex, bool smp)
{
   if(mutex->initialized != G_MUTEX_INITIALIZED)
      mutexErrorUninitialized(mutex);

   while(!mutexTryAcquire(mutex, smp)) {
      asm("pause");
   }
}

bool mutexTryAcquire(g_mutex* mutex, bool smp)
{
   if(mutex->initialized != G_MUTEX_INITIALIZED)
      mutexErrorUninitialized(mutex);

   // Disable scheduling
   bool enableInt = interruptsAreEnabled();
   if(enableInt) interruptsDisable();

   // Lock editing
   while(!__sync_bool_compare_and_swap(&mutex->lock, 0, 1))
      asm("pause");

   // Update mutex
   bool success = false;
   if(mutex->depth == 0)
   {
      mutex->owner = processorGetCurrentId();
      mutex->depth = 1;
      if(smp) __sync_fetch_and_add(&taskingGetLocal()->locksHeld, 1);
      success = true;

   } else if(mutex->owner == processorGetCurrentId())
   {
      mutex->depth++;
      success = true;
   }

   // Allow editing again
   mutex->lock = 0;

   // Enable interrupts
   if(enableInt) interruptsEnable();

   return success;
}

void mutexRelease(g_mutex* mutex, bool smp)
{
   if(mutex->initialized != G_MUTEX_INITIALIZED)
      mutexErrorUninitialized(mutex);

   // Disable interrupts
   bool enableInt = interruptsAreEnabled();
   if(enableInt) interruptsDisable();

   // Lock editing
   while(!__sync_bool_compare_and_swap(&mutex->lock, 0, 1))
      asm("pause");

   // Update mutex
   if(mutex->depth > 0 && --mutex->depth == 0)
   {
      mutex->depth = 0;
      mutex->owner = -1;

      if(smp) __sync_fetch_and_sub(&taskingGetLocal()->locksHeld, 1);
   }

   // Allow editing again
   mutex->lock = 0;

   // Enable interrupts
   if(enableInt) interruptsEnable();
}

volatile int mutexInitializerLock = 0;

void mutexInitialize(g_mutex* mutex)
{
   while(!__sync_bool_compare_and_swap(&mutexInitializerLock, 0, 1))
      asm("pause");

   if(mutex->initialized != G_MUTEX_INITIALIZED) {
      mutex->initialized = G_MUTEX_INITIALIZED;
      mutex->lock = 0;
      mutex->depth = 0;
      mutex->owner = -1;
   }

   mutexInitializerLock = 0;
}

What are your best practices for mutual exclusion?

max · **Posted:** Thu Mar 07, 2019 1:37 pm

Ah, I've just seen that newer versions of GCC have new builtins for this purpose.

Anyone used them yet? I assume the equivalent to __sync_bool_compare_and_swap(&mutex, 0, 1) would be this? Bit hard to tell from the documentation:

Code:

int lockUnlocked = 0;
int lockLocked = 1;
__atomic_compare_exchange(&mutex, &lockUnlocked, &lockLocked, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

max wrote:

So I've implemented a reentrant mutex implementation and wanted to hear some opinions whether if I have covered all the edge-cases.

It's multiprocessor safe and modification is protected from preemption (hence the interrupt disabling meanwhile).
taskingGetLocal() returns a processor-local structure - as long as locksHeld variable is > 0, no tasks are scheduled to avoid kernel deadlocking.

Well, that won't help with the stated problem. A deadlock can happen as soon as two processes can hold two locks each, and you have SMP support, so that is sort of a given. So the only thing you managed to do is increase latency.

Code:

void mutexAcquire(g_mutex* mutex, bool smp)
{
   if(mutex->initialized != G_MUTEX_INITIALIZED)
      mutexErrorUninitialized(mutex);

   while(!mutexTryAcquire(mutex, smp)) {
      asm("pause");
   }
}

Well, that's going to burn CPU. As long as one thread is holding the lock, all the other ones are left, uselessly spinning in vain, hoping for a free lock each time, but never finding it.

The rest looks solid enough. So it is a recursive mutex implementation. I have never actually seen a use for recursive mutexes that wasn't a design error. But to each their own, I suppose.

max wrote:

What are your best practices for mutual exclusion?

For short-time locks, I have spinlocks. Similar to your version, that you buried in the mutex implementation. Spinlocks disable interrupts while held. You are never allowed to hold more than one spinlock, and you are not allowed to go to userspace or to schedule another task while holding a spinlock. This is typically accomplished by returning the lock in the function that takes it, but there is an exception: Since signal handlers are declared on a process level, I have to guard against setting them from multiple threads, which I do with a spinlock. However, if a CPU exception makes my kernel send a signal to the current thread, I have to prevent the thread from just ignoring the signal. So I have a function force_signal(), which resets the signal handler for the signal that is about to be forced on the thread, if the signal was blocked or ignored. But the function that actually handles signals is called from the code that returns to userspace. I can't return the lock in force_signal(), since then another thread might change signal handlers while the current thread transitions to handle_sigpending(). So in this one instance, force_signal() is allowed to take a spinlock and not return it, because handle_sigpending() will return it instead.

For long-time locks, I have mutexes. Non-recursive, mind. These mutexes use futexes internally. Which means that only system calls and kernel tasks can use it, but not interrupts. But that's OK: Interrupts aren't supposed to do much, anyway. Mostly just to schedule in a kernel task.

Futexes are basically pointers to int, with two major operations to them: Wait and wake. When waiting on a futex, the kernel ensures the futex still has a certain value (given as argument), and if so, puts the calling process to sleep, in such a way, that if another process changes the value and calls wake(), the current process is not put to sleep.

Wake then obviously wakes a process up.

For the locks themselves, I stole the design from musl. That is, both the lock and the waiter count are kept in a single int. If that int is zero, the lock is free and there are no waiters. If it is a positive number x, the lock is free, but there are x waiters. If it is a negative number INT_MIN + x, the lock is held and there are x - 1 waiters.

Code:

void lll_lock(volatile int* lock, int priv) {
  int l;
  /* 1st: Try locking an uncontended lock */
  l = a_cas(lock, 0, INT_MIN + 1);
  if (!l) return;

  /* 2nd: Spinlock */
  for (int try = 10; try; try--) {
    if (l < 0)  /* lock is held: Hope the other thread gave it back when we didn't look. */
      l -= INT_MIN + 1;
    int t = a_cas(lock, l, l + INT_MIN + 1);
    if (t == l)
      return;
    l = t;
  }

  /* 3rd: Bite the bullet and just wait already. */
  l = a_fetch_add(lock, 1) + 1;
  for (;;) {
    if (l < 0) {
      futex_wait(lock, l, 0, priv);
      l = (l - INT_MIN) - 1;
    }
    int t = a_cas(lock, l, l + INT_MIN);
    if (t == l)
      return;
    l = t;
  }
}

lll_unlock(volatile int *lock, int priv) {
  if (a_fetch_add(lock, -(INT_MIN + 1)) != INT_MIN + 1)
    futex_wake(lock, 1, priv);
}

The futex functions themselves then first translate the address into a physical address (possibly triggering a page load in the process, and locking the page into memory, so it can't be swapped out). This way, multiple processes can talk about the same futex in shared memory. Then locate the correct queue (there's a global futex queue and a queue in each process, and the priv flag tells the functions which to use), take a spinlock on it, ensure the value is still correct, then put an object in the queue, before releasing the spinlock and going to the scheduler.

The wake function takes the spinlock and proceeds to wake the thread in there. If the wake happens to get between the waiter releasing the spinlock and going to the scheduler, then the waiter looses a single round through the scheduler. Not the end of the world. And with a little modification, I can make this interruptible, so a signal can be delivered while waiting for a lock.

And yes, the new atomics are just the old sync functions. But I just keep my own assembly atomics around. That way, I at least know the memory order semantics.

cyr1x · **Posted:** Thu Mar 07, 2019 2:41 pm

Adding to what nullplan said, if you were really searching for reentrant spinlocks (not mutices),
I would do something along those line (using C11 atomics):

Code:

_Atomic unsigned lock = (unsigned)-1;
unsigned depth = 0;

void lock(void)
{
    unsigned this = cpu()
    if (atomic_load_explicit(&lock, memory_order_acquire) != this)
    {
        unsigned expected = -1;
        while(!atomic_compare_exchange_strong_explicit(&lock, &expected, this, memory_order_acquire, memory_order_relaxed)) {
            pause();
            expected = -1;
        }
    }
    depth++
}

void unlock()
{
    if (--depth == 0) atomic_store_explicit(&lock, -1, memory_order_release);
}

OSDev.org

Proper SMP reentrant mutex implementation

Who is online