allocating frames using buddy allocation

Sourcer · **Joined:** Fri Jun 17, 2016 11:29 pm **Posts:** 58

Hey,

I'm implementing a physical memory allocator using the buddy approach.

I'm not sure if my method is the most efficient / correct, i really doubt i understood buddy allocation.
so i'll be glad if you guys review my approach:

In general, i have an array of pointers whose size is MAX_ORDER + 1, where MAX_ORDER is the highest order
one can allocate(happens to be 10, i.e., 2^10 pages)

And this is what i do when initializing available memory regions(reported by GRUB):

Code:

foreach (base_address, size) in available regions:
        size = page_align(size) // page == 4096
        order = log2(size / page_size)
        vmm_address = map_to_vmm(base_address, size)
        store_buddy_information(vmm_address)
        add_to_list(order, base_address)
        unmap_from_vmm(vmm_address, size)

Basically i use the regions themselves to store a struct that keeps information about their own region.

allocating will be like:

Code:

alloc(order):
      if not_empty(buddy_lists[order]):
            address = pop_element_from_list(buddy_lists[order]) // This will update the new list head and return the old HEAD's address
            return address

      order++
      // Order's list is empty, we need to divide..
      while order < MAX_ORDER - 1:
             if not_empty(buddy_lists[order]):
                     address = pop_element_from_list(buddy_lists[order])
                     vmm_address = map_to_vmm(address, 2^order * PAGE_SIZE)
                     first_buddy = vmm_address
                     second_buddy = vmm_address + 2^order * PAGE_SIZE / 2  
                     add_to_list(order - 1, physical_address(second_buddy))
                     return physical_address(first_buddy)

Not sure if this is clear enough.. this is highly pseudo.

davidv1992 · **Joined:** Thu Jul 05, 2007 8:58 am **Posts:** 223

I've implemented buddy allocation in my personal project, and from what you write down most of it seems reasonable. I do have a few minor things you might want to think about a bit more.

First of all, Grub might (and most likely will) report memory areas whose size is not a proper power of 2. This means that your initialization routine in it's current form would either need to round in the log function (which means not using all the memory), or would need some modifications to handle the decreasing block sizes.

Second, you seem to be planning to use the pages themselves to hold entries for the linked lists of free regions. While this will save you memory, the trade off is that you will be doing quite a bit of manipulation of the page tables, which may or may not be worth it. Personally, I decided that the 8-16 bytes per page it takes extra to store the linked lists separately was worth the speedup, but if you plan on running on systems with very tight memory, you might feel differently

Finally, the power of a buddy allocator comes from also being able to merge a region with it's buddy when that is free, and it is freed. Your current pseudo-code does not contain code for freeing pages, and I can't really see how you are planning your data structures to make that efficient.

If you want some reference material on buddy allocation, The Art of Computer Programming Volume 1 by Donald E Knuth contains a really good explanation of it in section 2.5. You should be able to find this book in any library that has a decent computer science collection.

LtG · **Joined:** Thu Aug 13, 2015 4:57 pm **Posts:** 384

Out of curiosity, is there some benefit using buddy allocation for physical memory? Or are you going to use large pages as well?

Haven't done any benchmarks but is there any performance benefit compared to a linked list (linked pages each containing ~1k entries, not individually linked) or something similar?

davidv1992 · **Joined:** Thu Jul 05, 2007 8:58 am **Posts:** 223

From a performance standpoint, just using something along the lines of a stack/list of free pages is about as fast as you are going to get things. The advantage of a buddy allocator is that it allows you to allocate larger, continuous segments of physical memory. This is needed for things like DMA buffers in device drivers.

LtG · **Joined:** Thu Aug 13, 2015 4:57 pm **Posts:** 384

True, but is DMA really relevant? How many of them want large contiguous blocks?

Some might not support ISA at all, but even if they do that's all low address which can be treated special.

I'm not saying that there aren't any uses, I'm asking are there any real world uses?

If the system only uses 4KiB pages, then you don't need for demand-allocation either.. If scatter/gather, again not much need.. I've thought about buddy but not sure if there's real benefits. I'd like to hear what others think..

LtG · **Joined:** Thu Aug 13, 2015 4:57 pm **Posts:** 384

davidv1992 wrote:

First of all, Grub might (and most likely will) report memory areas whose size is not a proper power of 2. This means that your initialization routine in it's current form would either need to round in the log function (which means not using all the memory), or would need some modifications to handle the decreasing block sizes.

Didn't have time to read thru the algo properly, but what do you mean by "proper power of 2" and rounding? I mean, all pages are powers of 2 and if you only ever allocate at minimum 4KiB pages, then..?

As for initialization, I would probably handle it so that the buddy starts from containing 0 free 4KiB pages and populate it by de-allocating all the 4KiB pages I got from free memory map (BIOS or Multiboot), since buddy will have to be able to handle the case of 0 free pages anyway..

Korona · **Joined:** Thu May 17, 2007 1:27 pm **Posts:** 999

LtG wrote:

True, but is DMA really relevant? How many of them want large contiguous blocks?

Some might not support ISA at all, but even if they do that's all low address which can be treated special.

I do not know a single device that does not need contiguous DMA blocks. Sure, modern devices will support scatter/gather lists but usually those lists (and other control structures) need to be stored in contiguous memory. And then there are devices that do not support scatter/gather lists but are still commonly used. For example UHCI falls into this category. The ability to allocate large pages in definitely required to drive many kinds of hardware, not to mention huge page support.

dozniak · **Posted:** Fri May 05, 2017 12:17 am

LtG wrote:

Didn't have time to read thru the algo properly, but what do you mean by "proper power of 2" and rounding? I mean, all pages are powers of 2 and if you only ever allocate at minimum 4KiB pages, then..?

Grub will return memory regions that are not whole multiples of page size.

davidv1992 · **Joined:** Thu Jul 05, 2007 8:58 am **Posts:** 223

LtG wrote:

davidv1992 wrote:

First of all, Grub might (and most likely will) report memory areas whose size is not a proper power of 2. This means that your initialization routine in it's current form would either need to round in the log function (which means not using all the memory), or would need some modifications to handle the decreasing block sizes.

Didn't have time to read thru the algo properly, but what do you mean by "proper power of 2" and rounding? I mean, all pages are powers of 2 and if you only ever allocate at minimum 4KiB pages, then..?

As for initialization, I would probably handle it so that the buddy starts from containing 0 free 4KiB pages and populate it by de-allocating all the 4KiB pages I got from free memory map (BIOS or Multiboot), since buddy will have to be able to handle the case of 0 free pages anyway..

The way the initialization code is currently organized, each region reported by grub is converted into precisely one region for the buddy allocator. However, grub may return regions with a size that is not of the form 2^k * page_size, meaning that you will either need to ignore some excess at this point (wasteful), or split the region reported by grub into several smaller ones.

dozniak wrote:

LtG wrote:

Didn't have time to read thru the algo properly, but what do you mean by "proper power of 2" and rounding? I mean, all pages are powers of 2 and if you only ever allocate at minimum 4KiB pages, then..?

Grub will return memory regions that are not whole multiples of page size.

I'm not sure if grub would sanitize regions to page boundaries if the hardware reports them, but I've never seen them occur. Either way, if this would happen, a paging OS pretty much has no other option but to ignore the excess. And as windows is using paging (and most hardware seems to be produced with the idea that if windows works with it, then it is okay), I doubt many hardware/bios implementers would choose to have non-page-aligned memory regions

MollenOS · **Joined:** Wed Oct 26, 2011 12:00 pm **Posts:** 202

davidv1992 wrote:

dozniak wrote:

LtG wrote:

Didn't have time to read thru the algo properly, but what do you mean by "proper power of 2" and rounding? I mean, all pages are powers of 2 and if you only ever allocate at minimum 4KiB pages, then..?

Grub will return memory regions that are not whole multiples of page size.

I'm not sure if grub would sanitize regions to page boundaries if the hardware reports them, but I've never seen them occur. Either way, if this would happen, a paging OS pretty much has no other option but to ignore the excess. And as windows is using paging (and most hardware seems to be produced with the idea that if windows works with it, then it is okay), I doubt many hardware/bios implementers would choose to have non-page-aligned memory regions

This is wrong. I have two laptops that report memory regions not on page boundaries, and this is primarily seen with device-memory regions. It reports twice regions that end and start either on 0x800 or 0x400. So you should absolutely support this on real hardware when you begin to develop drivers.

Sourcer · **Joined:** Fri Jun 17, 2016 11:29 pm **Posts:** 58

davidv1992 wrote:

I've implemented buddy allocation in my personal project, and from what you write down most of it seems reasonable. I do have a few minor things you might want to think about a bit more.

First of all, Grub might (and most likely will) report memory areas whose size is not a proper power of 2. This means that your initialization routine in it's current form would either need to round in the log function (which means not using all the memory), or would need some modifications to handle the decreasing block sizes.

Second, you seem to be planning to use the pages themselves to hold entries for the linked lists of free regions. While this will save you memory, the trade off is that you will be doing quite a bit of manipulation of the page tables, which may or may not be worth it. Personally, I decided that the 8-16 bytes per page it takes extra to store the linked lists separately was worth the speedup, but if you plan on running on systems with very tight memory, you might feel differently

Finally, the power of a buddy allocator comes from also being able to merge a region with it's buddy when that is free, and it is freed. Your current pseudo-code does not contain code for freeing pages, and I can't really see how you are planning your data structures to make that efficient.

If you want some reference material on buddy allocation, The Art of Computer Programming Volume 1 by Donald E Knuth contains a really good explanation of it in section 2.5. You should be able to find this book in any library that has a decent computer science collection.

Thank you for your reply.
You're right about GRUB not reporting regions whose size is a not power of 2, my bad.
Secondly, by '8-16 bytes' per page, do you mean you calculate how much available memory you have
and 'statically'(once) allocate a chunk of memory to hold that list? i thought about it too, but i'm not sure
what is the advantage of this approach. Is mapping the physical page to virtual memory is that expensive?(sure i'll need to flush the TLB, but is it..?)

About freeing, i thought about it just a bit, and i thought about keeping a 'state' of a buddy within its struct.
state could contain:
1. Split, right side.
2. Split, left side.
3. Not split(i.e. buddies are together)

This way when the user frees address (X), i could check its buddie's state(by a simple calculation), and restore buddies accordingly.

And someone also asked why use a buddy allocating: I tried to merge both of the 'stack allocation' approach and the buddy allocation. I use the free pages to store information about free pages(just like a stack), but i like to think about it like i have N stacks of N orders(2^N)

davidv1992 · **Joined:** Thu Jul 05, 2007 8:58 am **Posts:** 223

Sourcer wrote:

Thank you for your reply.
You're right about GRUB not reporting regions whose size is a not power of 2, my bad.
Secondly, by '8-16 bytes' per page, do you mean you calculate how much available memory you have
and 'statically'(once) allocate a chunk of memory to hold that list? i thought about it too, but i'm not sure
what is the advantage of this approach. Is mapping the physical page to virtual memory is that expensive?(sure i'll need to flush the TLB, but is it..?)

At bootup, currently It takes stock of what regions GRUB reports, and for each of those it allocates enough pages to store all the bookkeeping needed for the pages in that region. The reason I went with this approach are the following:
1. It is a lot simpler to code. Not having to change page maps and the like removed a big chunk of potential bugs, mainly by removing the code needed to manage those mappings
2. It provides better cache locality. Because of the (admittedly somewhat simplistic) way I've implemented other parts of the kernel, it is relatively common for my page allocator to be called several times in quick succession to allocate a number of pages that don't necessarily need to be consecutive. By having the managing datastructures in one location, I have a better chance of hitting the cache for those on the later invocations in such a cycle.
3. This also ties into the third reason: TLB flushes are expensive. Even in the best-case scenario, when there is only a need to flush one entry, this will cause several extra memory accesses, and the fact that you are then immediately accessing the newly mapped page for a read ensures that one of those is almost guaranteed to be a direct read from memory, which incurs an extremely large performance penalty on modern x86 processors.

Of these, reason number 1 was the primary choice (I only have finite time, and I decided the easier development was worth the increase in memory use), though the other advantages are definitely nice to have.

Again, the other choice, using the pages themselves to keep the linked list structure is also valid, it will definitely save you some memory. It's a trade off.

Sourcer wrote:

About freeing, i thought about it just a bit, and i thought about keeping a 'state' of a buddy within its struct.
state could contain:
1. Split, right side.
2. Split, left side.
3. Not split(i.e. buddies are together)

This way when the user frees address (X), i could check its buddie's state(by a simple calculation), and restore buddies accordingly.

And someone also asked why use a buddy allocating: I tried to merge both of the 'stack allocation' approach and the buddy allocation. I use the free pages to store information about free pages(just like a stack), but i like to think about it like i have N stacks of N orders(2^N)

I have a few small points. If you keep proper track of starting points of the various regions handed to you by GRUB, you wont need to differentiate between cases 1 and 2. Instead, you can simply use the address itself to determine whether this block is the left or right block of its pair. This means that you can get away with a single bit per block, simply stating whether it is used or free. Again, [Knuth] has an excellent explanation of the details.

Sourcer · **Joined:** Fri Jun 17, 2016 11:29 pm **Posts:** 58

davidv1992 wrote:

Sourcer wrote:

Thank you for your reply.
You're right about GRUB not reporting regions whose size is a not power of 2, my bad.
Secondly, by '8-16 bytes' per page, do you mean you calculate how much available memory you have
and 'statically'(once) allocate a chunk of memory to hold that list? i thought about it too, but i'm not sure
what is the advantage of this approach. Is mapping the physical page to virtual memory is that expensive?(sure i'll need to flush the TLB, but is it..?)

At bootup, currently It takes stock of what regions GRUB reports, and for each of those it allocates enough pages to store all the bookkeeping needed for the pages in that region. The reason I went with this approach are the following:
1. It is a lot simpler to code. Not having to change page maps and the like removed a big chunk of potential bugs, mainly by removing the code needed to manage those mappings
2. It provides better cache locality. Because of the (admittedly somewhat simplistic) way I've implemented other parts of the kernel, it is relatively common for my page allocator to be called several times in quick succession to allocate a number of pages that don't necessarily need to be consecutive. By having the managing datastructures in one location, I have a better chance of hitting the cache for those on the later invocations in such a cycle.
3. This also ties into the third reason: TLB flushes are expensive. Even in the best-case scenario, when there is only a need to flush one entry, this will cause several extra memory accesses, and the fact that you are then immediately accessing the newly mapped page for a read ensures that one of those is almost guaranteed to be a direct read from memory, which incurs an extremely large performance penalty on modern x86 processors.

Of these, reason number 1 was the primary choice (I only have finite time, and I decided the easier development was worth the increase in memory use), though the other advantages are definitely nice to have.

Again, the other choice, using the pages themselves to keep the linked list structure is also valid, it will definitely save you some memory. It's a trade off.

Sourcer wrote:

About freeing, i thought about it just a bit, and i thought about keeping a 'state' of a buddy within its struct.
state could contain:
1. Split, right side.
2. Split, left side.
3. Not split(i.e. buddies are together)

This way when the user frees address (X), i could check its buddie's state(by a simple calculation), and restore buddies accordingly.

And someone also asked why use a buddy allocating: I tried to merge both of the 'stack allocation' approach and the buddy allocation. I use the free pages to store information about free pages(just like a stack), but i like to think about it like i have N stacks of N orders(2^N)

I have a few small points. If you keep proper track of starting points of the various regions handed to you by GRUB, you wont need to differentiate between cases 1 and 2. Instead, you can simply use the address itself to determine whether this block is the left or right block of its pair. This means that you can get away with a single bit per block, simply stating whether it is used or free. Again, [Knuth] has an excellent explanation of the details.

How do you calculate 'enough pages'? and do you bitmap or a list for keeping information about the pages?
Also, even in this approach you need to map the 'enough pages' to virtual memory(only once, though{)

LtG · **Joined:** Thu Aug 13, 2015 4:57 pm **Posts:** 384

Sourcer wrote:

How do you calculate 'enough pages'? and do you bitmap or a list for keeping information about the pages?
Also, even in this approach you need to map the 'enough pages' to virtual memory(only once, though{)

You have the memory map (from BIOS, GRUB, etc), once you've "normalized" it (removed overlaps, truncated to pages as you probably don't want to use partial pages, etc) you know how many pages of usable memory there is, and given that you've designed your own structs you can then multiply those two to know how many bytes you need and thru division you get to know how many pages you need. Or was it something more complex that you were asking?

Note, once you know how many pages you're going to need you need to decide how you're going to handle those pages themselves. So suppose you need 10 pages for the alloc data, are those pages going to be represented in the alloc data as well, if so you need to make sure that they are initially marked as allocated so you don't hand them out to some process. Or you can not include them in the alloc data at all and kind of "forget" that they even exist, this allows you to do an extremely micro optimization (which may be entirely pointless) as now the number of bytes needed for the alloc data is slightly lower.

Sourcer · **Joined:** Fri Jun 17, 2016 11:29 pm **Posts:** 58

LtG wrote:

Sourcer wrote:

How do you calculate 'enough pages'? and do you bitmap or a list for keeping information about the pages?
Also, even in this approach you need to map the 'enough pages' to virtual memory(only once, though{)

You have the memory map (from BIOS, GRUB, etc), once you've "normalized" it (removed overlaps, truncated to pages as you probably don't want to use partial pages, etc) you know how many pages of usable memory there is, and given that you've designed your own structs you can then multiply those two to know how many bytes you need and thru division you get to know how many pages you need. Or was it something more complex that you were asking?

Note, once you know how many pages you're going to need you need to decide how you're going to handle those pages themselves. So suppose you need 10 pages for the alloc data, are those pages going to be represented in the alloc data as well, if so you need to make sure that they are initially marked as allocated so you don't hand them out to some process. Or you can not include them in the alloc data at all and kind of "forget" that they even exist, this allows you to do an extremely micro optimization (which may be entirely pointless) as now the number of bytes needed for the alloc data is slightly lower.

This is obvious, what i meant is suppose you have a region sized X, X / PAGE_SIZE = 2^N. So basically a buddy allocator can divide this region 2^N times. For each time it divides it, you'll need another struct.(as they're now buddies, not the same region).

OSDev.org

allocating frames using buddy allocation

Who is online