While I was thinking about the question how to create a page frame allocator which is thread-safe, lock-free and most of all simple and easy to implement, I stumbled across an idea which I'd like to share. Note that this allocator is certainly not the fastest solution, but performance was not the goal of this idea. It's more a neat trick.
The basic concept of this allocator is a bitmap with one bit for each physical page, where the bit is clear if the page is free, and set if the page is allocated. Allocating a page is done by searching for a clear bit and setting it, while freeing a page is done by clearing its bit. Of course it must be made sure that two concurrent threads (such as running on different CPUs at the same time) will not allocate the same page simultaneously, so one can, for example, have a global spin lock for the whole allocator. While one thread is looking for a free page to allocate, all other ones have to wait. Of course this is terrible, leads to lock contention and threads will spend lots of time waiting for the lock.
The idea I had is to have one lock per page, and to simply use its bit in the bitmap as lock. So basically the allocator becomes:
Code:
page alloc(void)
{
for each page
{
if(!test_and_set_bit(page)) // atomically test and set bit, and if it was clear, the page was free and is now allocated
return page;
}
error("No free pages.");
}
void free(page)
{
clear_bit(page);
}
So while this would work in principle and resolves the lock contention problem, because there is no waiting / spinning at a lock anymore, it would be terribly slow to test every single bit like that. Normally one would rather want to check a whole block of pages at once, say 64 pages by comparing whether there are any free bits in a block of type int64_t, or whether it is equal to -1 (all bits are set). So the allocator would look like this:
Code:
page alloc(void)
{
for each block
{
pattern = atomic_exchange(&block, -1); // get all pages in this block and allocate any free ones
if(pattern != -1) // free pages found?
{
page = find_clear_bit(pattern); // find free page among those we just took
set_bit(pattern, page); // set the bit in the pattern - no need to be atomic, since no other thread accesses this (local) variable
atomic_and(&block, pattern); // atomically free those pages which we allocated, but did not use
return page;
}
error("No free pages.");
}
void free(page)
{
clear_bit(page);
}
Of course one can think of more efficient approaches, but this was the most simple one I could think of. Certainly I'm not the first one who got this idea, and certainly there are people who have implemented this and use it (unless it has any flaws I don't see right now). Any comments and ideas are welcome!