(Answered)Heap/Pool Dynamic Memory Allocation Implementation

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

Hello everybody!

I have a conceptual dilemma on how to implement this thing. I don't know where to start, what to look for. I already searched quite a few online pages, but they don't talk about the implementation itself.
How to implement it? To use a double linked list? I know for sure that I do not and will not want port an existing allocator, just don't try to convince me. Do I need to look at other OS specific implementations and try to kind of
"clone" them or should I come up with my own idea? The problem is that I don't even know how it technically works and what it does underneath.
I have quite a few tabs open with different source files to look at, but the problem is that they mostly rely on James Molloy's implementation. Is that implementation even worth taking a look at?
I know this is one really big question, I really need help. I can't continue working on my OS until I have something as important as this working perfectly.
Also I've read about best fist, first fit algorithms. I for sure don't want to waste any memory to fragmentation.
So I would have to find the hole that satisfies the request, take requested + overhead and then split that hole leaving the extra for future allocation.
My primary goal right now is the be able to understand every little detail on how it works and how to implement it. Thanks for understanding!

Ch4ozz · **Joined:** Mon Jul 18, 2016 2:46 pm **Posts:** 170

Quote:

How to implement it? To use a double linked list?

If you wanna waste hours of searching the memory then do that.
The standard data structure for virtual memory is a balanced tree, usually AVL trees because they are the fastest within this category followed by red-black trees.
Their timings are safe O(log(n)) and can be better for insert and delete [O(1)]
Just as comparison, a doubly linked list has O(n), this is a pretty big gap especially for big n.

Looking at other code might not help to understand the basic concept, you better try to read about bigger OS' such as unix or windows.
In windows this tree is called the VAD (Virtual Address Descriptor): https://dfrws.org/sites/default/files/s ... memory.pdf

So, how does a virtual memory manager work?
Its a simple list noting down any memory allocation for each process.
If you understood virtual memory itself you should also know how a manager would work.
Finding a good place for your memory in this tree can be improved later, you should focus on understanding the important part for now.

zaval · **Posted:** Wed Jun 21, 2017 6:51 am

According to this paper, Splay trees are faster than AVL-trees or red black trees. This is because, in this scenario, of the memory management, it's very probable that a recently reserved/commited block will be accessed next. Splay trees put such blocks near the root, and thus give considerably faster times than others. even though they are not as balanced as AVL or RB.
I am going to use them for my VAD trees.

But there is also a PFN database, ... and that's abysmal, when your are faced with the different page size case.

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

I see problems, lots of problems. This is even worse, never heard about some of those binary list things, how am I supposed to use them and implement them? Then if linked lists are so bad why do people use them massively? Are they really that slow that it is noticeable? What about ordered list/array that James used in his tutorial series, is that any good?

Ch4ozz · **Joined:** Mon Jul 18, 2016 2:46 pm **Posts:** 170

Quote:

I see problems, lots of problems. This is even worse, never heard about some of those binary list things, how am I supposed to use them and implement them? Then if linked lists are so bad why do people use them massively? Are they really that slow that it is noticeable? What about ordered list/array that James used in his tutorial series, is that any good?

They are used when they fit, all data structures exist for a certain reason, and linked lists are not meant for something like this.
They for example have a fast insertion and deletion time O(1) while traversing them is O(n).
That is perfectly fine when you have to iterate the whole thing anyways, but when searching for free space this is not suitable because it can be done faster.
I dont know how his ordered list works, but I dont think its the right approach either.
You can easily copy an implementation of those data structures, but please, read up on how they work (atleast a bit) and remember their best average and worst case speeds for your needed scenarios.

Side note: OSDev is meant (imo) for people who are coding for a long time at system level already, so they usually already know alot of the used concepts and can implement them without even thinking about it.

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

Ch4ozz wrote:

Quote:

I see problems, lots of problems. This is even worse, never heard about some of those binary list things, how am I supposed to use them and implement them? Then if linked lists are so bad why do people use them massively? Are they really that slow that it is noticeable? What about ordered list/array that James used in his tutorial series, is that any good?

They are used when they fit, all data structures exist for a certain reason, and linked lists are not meant for something like this.
They for example have a fast insertion and deletion time O(1) while traversing them is O(n).
That is perfectly fine when you have to iterate the whole thing anyways, but when searching for free space this is not suitable because it can be done faster.
I dont know how his ordered list works, but I dont think its the right approach either.
You can easily copy an implementation of those data structures, but please, read up on how they work (atleast a bit) and remember their best average and worst case speeds for your needed scenarios.

Side note: OSDev is meant (imo) for people who are coding for a long time at system level already, so they usually already know alot of the used concepts and can implement them without even thinking about it.

Okay, I will forget about linked lists. Then what to choose from suggested? What is the easiest tree to implement? What is that specific tree supposed to hold, branches? What are branches supposed to hold then? I always try to learn and understand as much of something before I try to actually implement it. I've never seen an actual implementation of this? So is there an OS I can take a look at to see how to implement that? Do you know for a good page/paper/document that explains how binary trees I need work in C++?
Also does anybody think this ordered list implementation is worth taking a look at?

simeonz · **Joined:** Fri Aug 19, 2016 10:28 pm **Posts:** 360

Ch4ozz wrote:

Okay, I will forget about linked lists.

You will need them. They are the most commonplace data structure after basic arrays.

Ch4ozz wrote:

What is the easiest tree to implement? What is that specific tree supposed to hold, branches?

Trees are implementations for ADT (abstract data structure) called ordered dictionary (a.k.a. ordered set). Books on algorithms and data structures (or web tutorials) discuss those. Trees are not specific to OS development and solve a very broad generic problem.

Ch4ozz wrote:

Also does anybody think this ordered list implementation is worth taking a look at?

This is akin to the Windows kernel pool implementation. But appears to rely on magic number to detect freed memory, etc. The basic idea is the same though.

Which brings me to another question - do you recognize the difference between page allocation and sub-page allocation. If not, you need to first understand how memory management in OSes works in general. How processes are provided with pages, etc.

The page allocator on Windows is implemented using a bitmap. On Linux, a buddy-allocator system is used. For sub-page allocator, Windows uses coalescing of adjacent blocks like the one in your link above. Linux uses something called a "slab allocator", that is also called memory pools in broader context. I suggest researching those a bit.

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

simeonz wrote:

Ch4ozz wrote:

Okay, I will forget about linked lists.

You will need them. They are the most commonplace data structure after basic arrays.

Ch4ozz wrote:

What is the easiest tree to implement? What is that specific tree supposed to hold, branches?

Trees are implementations for ADT (abstract data structure) called ordered dictionary (a.k.a. ordered set). Books on algorithms and data structures (or web tutorials) discuss those. Trees are not specific to OS development and solve a very broad generic problem.

Ch4ozz wrote:

Also does anybody think this ordered list implementation is worth taking a look at?

This is akin to the Windows kernel pool implementation. But appears to rely on magic number to detect freed memory, etc. The basic idea is the same though.

Which brings me to another question - do you recognize the difference between page allocation and sub-page allocation. If not, you need to first understand how memory management in OSes works in general. How processes are provided with pages, etc.

The page allocator on Windows is implemented using a bitmap. On Linux, a buddy-allocator system is used. For sub-page allocator, Windows uses coalescing of adjacent blocks like the one in your link above. Linux uses something called a "slab allocator", that is also called memory pools in broader context. I suggest researching those a bit.

I didn't literary mean that I would forget them forever.

I was talking about not using them for this specific case.
I know they are not specific to OS development, I was wondering how to relate them to OS development.
So you think it is okay? If it is okay I might think of basing my implementation on it.
I don't quite use those expressions so I don't know what are you talking about. There are 3 layers of memory management: physical memory management (already implemented), virtual memory management (already implemented) and dynamic memory management (currently topic). Yes I am talking about memory pool, the heap that is the point of this topic.

Is it worth spending endless days and nights trying to understand something you've never heard of? Is it really going to pay off? What do you guys think? I do have plenty of free time but still is it worth spending 10 days on 1 task rather than 10 days on 10 tasks?

dchapiesky · **Joined:** Sun Dec 25, 2016 1:54 am **Posts:** 204

Here is an extremely useful allocator which you point at a region of memory and it manages that region of memory. Super simple. Super easy. No "porting". No mess. BSD licenced

https://github.com/mattconte/tlsf

Here is a nice paper on the subject but they released GPL'd code... http://www.gii.upv.es/tlsf/files/ecrts04_tlsf.pdf

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

dchapiesky wrote:

Here is an extremely useful allocator which you point at a region of memory and it manages that region of memory. Super simple. Super easy. No "porting". No mess. BSD licenced

https://github.com/mattconte/tlsf

Here is a nice paper on the subject but they released GPL'd code... http://www.gii.upv.es/tlsf/files/ecrts04_tlsf.pdf

That one looks quite nice but, it is copyrighted, which doesn't suit me. I like to write my own code, else where is the challenge? I also see that this is not an ordinary algorithm and that some thought has been put into it.
So I decided to seriously consider an AVL tree as a way to go.

Here are some AVL questions:
1.Does it get really badly fragmented?
2.Do nodes have to be of a specific value. Like do they have to go like 2 4 8 16 32 64 and each one of those has its twin?
3.A picture of the tree with nodes being heap related?
4.I still don't see how a tree can be contiguous. It can only be contiguous if there are 4194304 nodes -> 1 for each byte from 0x000001 to 0x400000. (4 GB of physical RAM example)??
5.How can I return an address when it is not even inserted? Do I need to insert thousands and thousands of nodes all at once so they can be accessed later? Is each node equal to 1 byte + overhead?

So far I know that I need a tree/structure that has 2 branches each one of those branches has to be equally "weighted" so the tree/struct is balanced. Also I need a way to balance the tree, left left, right right, left right and right left switching algorithms.

Edit: I decided to drop everything from above and use an ordered list. AVL trees are just way above my level. This is just a temporary solution until I figure out how AVL trees work more deeply. There are just too many question that I don't know how to answer / can't find an answer. Thanks for trying to help, I might post some updates as I go along.

LiamT · **Joined:** Sun Jul 12, 2015 2:21 am **Posts:** 9

Hiya

Octacone wrote:

Hello everybody!

I have a conceptual dilemma on how to implement this thing. I don't know where to start, what to look for. I already searched quite a few online pages, but they don't talk about the implementation itself.
How to implement it? To use a double linked list? I know for sure that I do not and will not want port an existing allocator, just don't try to convince me. Do I need to look at other OS specific implementations and try to kind of
"clone" them or should I come up with my own idea? The problem is that I don't even know how it technically works and what it does underneath.
I have quite a few tabs open with different source files to look at, but the problem is that they mostly rely on James Molloy's implementation. Is that implementation even worth taking a look at?
I know this is one really big question, I really need help. I can't continue working on my OS until I have something as important as this working perfectly.
Also I've read about best fist, first fit algorithms. I for sure don't want to waste any memory to fragmentation.
So I would have to find the hole that satisfies the request, take requested + overhead and then split that hole leaving the extra for future allocation.
My primary goal right now is the be able to understand every little detail on how it works and how to implement it. Thanks for understanding!

What exactly are you trying to implement at the moment? There are a lot of levels to memory management in a kernel, including, but not necessarily limited to, physical memory allocation, address space allocation/vmm, and then specific heap allocators. There are also a lot of different methods for each of these, with their own common and uncommon methods of implementation. It seems to me like you're trying to implement both a virtual memory manager, and a malloc/free style discrete allocator (for your kernel space?).

Algorithms like the ones used in tlsf, and other discrete heap allocators are rather complex, and usually involve some kind of mix between free lists, binning, trees, and doubly linked lists so as to either minimize time, memory waste, or both. You can work for as long as you want at writing an allocator, and still never have a perfect (or necessarily even good) implementation of one. I suggest that if you want to learn how to build a discrete heap allocator, you do so in user space, rather than trying to do it along side the rest of a kernel.

AVL trees, RB trees, and siblings of them, on the other hand, are a typical way of managing address spaces. (As has been mentioned before, linux uses RB trees for it's virtual memory manager, and if i'm not mistaken, also for the physical memory manager. These binary trees represent contiguous memory by existing in a sorted state, where each nth node from the left represents the next region of memory. That way, if you traverse the (sorted) tree in left-to-right order, you will cover the memory region you're managing completely and sequentially. The specific advantage of balanced binary trees is, of course, their worst case O(log n) for search, addition, and removal. In practice, this is usually pretty efficient in a VMM or PMM situation. That is what I personally use in my PMM and VMM, along with a set of size-binned doubly linked list for free zones. If you'd like to see that implementation, it's here and here, but it is rather complex. I suggest that if you want to learn about AVL trees, again, do it in user space first. That's how I did it.

As for fragmentation, It's important to understand the problem before you try and find a solution. Basically, the problem with fragmentation is that a program might allocate two sequential chunks of memory and free the first, but not the second. That means that your first free block is stuck at the size of the first allocation until the program frees the second block. This will happen no matter what allocation algorithm you use. What makes an allocator good at circumventing fragmentation is the ability to recognize and combine two adjacent free blocks of memory. balanced/sorted binary trees are nice for this because you can very quickly find the next sequential chunk of memory on the free list, and if it's sequential to another free chunk, combine it and rebalance the tree. This means that fragmentation is kept to a minimum, because you don't allow two adjacent free blocks to be disjointed.

I recommend figuring out which part of your kernel you're trying to implement, and then trying to solve the problem in user space first. It's a lot easier to debug an allocator in userland than in a custom kernel, especially if you haven't set up proper kernel debugging yet.

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

LiamT wrote:

Hiya

Octacone wrote:

Hello everybody!

I have a conceptual dilemma on how to implement this thing. I don't know where to start, what to look for. I already searched quite a few online pages, but they don't talk about the implementation itself.
How to implement it? To use a double linked list? I know for sure that I do not and will not want port an existing allocator, just don't try to convince me. Do I need to look at other OS specific implementations and try to kind of
"clone" them or should I come up with my own idea? The problem is that I don't even know how it technically works and what it does underneath.
I have quite a few tabs open with different source files to look at, but the problem is that they mostly rely on James Molloy's implementation. Is that implementation even worth taking a look at?
I know this is one really big question, I really need help. I can't continue working on my OS until I have something as important as this working perfectly.
Also I've read about best fist, first fit algorithms. I for sure don't want to waste any memory to fragmentation.
So I would have to find the hole that satisfies the request, take requested + overhead and then split that hole leaving the extra for future allocation.
My primary goal right now is the be able to understand every little detail on how it works and how to implement it. Thanks for understanding!

What exactly are you trying to implement at the moment? There are a lot of levels to memory management in a kernel, including, but not necessarily limited to, physical memory allocation, address space allocation/vmm, and then specific heap allocators. There are also a lot of different methods for each of these, with their own common and uncommon methods of implementation. It seems to me like you're trying to implement both a virtual memory manager, and a malloc/free style discrete allocator (for your kernel space?).

Algorithms like the ones used in tlsf, and other discrete heap allocators are rather complex, and usually involve some kind of mix between free lists, binning, trees, and doubly linked lists so as to either minimize time, memory waste, or both. You can work for as long as you want at writing an allocator, and still never have a perfect (or necessarily even good) implementation of one. I suggest that if you want to learn how to build a discrete heap allocator, you do so in user space, rather than trying to do it along side the rest of a kernel.

AVL trees, RB trees, and siblings of them, on the other hand, are a typical way of managing address spaces. (As has been mentioned before, linux uses RB trees for it's virtual memory manager, and if i'm not mistaken, also for the physical memory manager. These binary trees represent contiguous memory by existing in a sorted state, where each nth node from the left represents the next region of memory. That way, if you traverse the (sorted) tree in left-to-right order, you will cover the memory region you're managing completely and sequentially. The specific advantage of balanced binary trees is, of course, their worst case O(log n) for search, addition, and removal. In practice, this is usually pretty efficient in a VMM or PMM situation. That is what I personally use in my PMM and VMM, along with a set of size-binned doubly linked list for free zones. If you'd like to see that implementation, it's here and here, but it is rather complex. I suggest that if you want to learn about AVL trees, again, do it in user space first. That's how I did it.

As for fragmentation, It's important to understand the problem before you try and find a solution. Basically, the problem with fragmentation is that a program might allocate two sequential chunks of memory and free the first, but not the second. That means that your first free block is stuck at the size of the first allocation until the program frees the second block. This will happen no matter what allocation algorithm you use. What makes an allocator good at circumventing fragmentation is the ability to recognize and combine two adjacent free blocks of memory. balanced/sorted binary trees are nice for this because you can very quickly find the next sequential chunk of memory on the free list, and if it's sequential to another free chunk, combine it and rebalance the tree. This means that fragmentation is kept to a minimum, because you don't allow two adjacent free blocks to be disjointed.

I recommend figuring out which part of your kernel you're trying to implement, and then trying to solve the problem in user space first. It's a lot easier to debug an allocator in userland than in a custom kernel, especially if you haven't set up proper kernel debugging yet.

I already have both PMM and VMM implemented and working. That is not what I am talking about. Specifically I am talking about Malloc/Free/Realloc/Calloc functions and their implementation. I am not talking about user space. I am trying to implement a dynamic memory/heap/pool/whatever allocator for my kernel. I am not referring to user space. There is a long way until I can even think of user space. What I need is malloc so my kernel can say hey memory manager give me 27 bytes of memory so I can use it for this specific struct. The thing I am writing is meant for the entire OS global heap/pool/whatever you call it allocator that both kernel and user space can use. The point of this topic turned into how to store memory data sort of. Well, it doesn't matter anymore. I decided to go for an easier solution until I "grow up" (abstraction-ally) so I can do it by myself the "best" way possible. Thanks for spending time trying to explain it a bit more.

zaval · **Posted:** Thu Jun 22, 2017 6:05 am

I would not say, learning about balanced trees is that hard. you shouldn't give up. there is such a book, by Rivest and co, that huge and famous 1200 pages Bible, on algorithms, there RB trees described in a very easy to understand way. You even get to know how to implement tree operations in the iterative manner, more efficient than recursive.
but be aware that of that trinity, AVL trees are the most strict in balancing, thus they are kind of the hardest to learn and implement. And again, Stanford guys say, Splay trees are faster on VM management.

Basically I am going to use all of them, and thus need to implement. I already have RB trees in the UEFI protocol dictionary (but it's easy since it doesn't involve freaking deletion - the hardest thing on balanced trees). As I said I will be using splay trees for VADs. And, AVLs they have their applications where they are the best. That paper examples some TCP/IP stuff where they shine. So, you will need to learn them all.)

LiamT · **Joined:** Sun Jul 12, 2015 2:21 am **Posts:** 9

Octacone wrote:

LiamT wrote:

...

I recommend figuring out which part of your kernel you're trying to implement, and then trying to solve the problem in user space first. It's a lot easier to debug an allocator in userland than in a custom kernel, especially if you haven't set up proper kernel debugging yet.

I already have both PMM and VMM implemented and working. That is not what I am talking about. Specifically I am talking about Malloc/Free/Realloc/Calloc functions and their implementation. I am not talking about user space. I am trying to implement a dynamic memory/heap/pool/whatever allocator for my kernel. I am not referring to user space. There is a long way until I can even think of user space. What I need is malloc so my kernel can say hey memory manager give me 27 bytes of memory so I can use it for this specific struct. The thing I am writing is meant for the entire OS global heap/pool/whatever you call it allocator that both kernel and user space can use. The point of this topic turned into how to store memory data sort of. Well, it doesn't matter anymore. I decided to go for an easier solution until I "grow up" (abstraction-ally) so I can do it by myself the "best" way possible. Thanks for spending time trying to explain it a bit more.

Sure. What I mean by trying it first in user space though is to do it in a regular program on an already fully functioning OS, like wherever you write your kernel code. You might try writing a simple application, but replacing your default c library allocator with the one you write. The use case is almost exactly the same from an API point of view, and you will get a much better environment to debug your work than a homebrew kernel.

In any case, the wiki has a pretty good page on hooking in an existing discrete allocator. http://wiki.osdev.org/Memory_Allocation. I chose to do this in my kernel using LibAlloc.
(In my case, the hook functions are a tad more complex because I switch the lock that the allocator uses at a well-defined time in kernel setup.) This file is where I define the allocator hooks into the kernel.

Octacone · **Joined:** Fri Aug 07, 2015 6:13 am **Posts:** 1134

zaval wrote:

I would not say, learning about balanced trees is that hard. you shouldn't give up. there is such a book, by Rivest and co, that huge and famous 1200 pages Bible, on algorithms, there RB trees described in a very easy to understand way. You even get to know how to implement tree operations in the iterative manner, more efficient than recursive.
but be aware that of that trinity, AVL trees are the most strict in balancing, thus they are kind of the hardest to learn and implement. And again, Stanford guys say, Splay trees are faster on VM management.

Basically I am going to use all of them, and thus need to implement. I already have RB trees in the UEFI protocol dictionary (but it's easy since it doesn't involve freaking deletion - the hardest thing on balanced trees). As I said I will be using splay trees for VADs. And, AVLs they have their applications where they are the best. That paper examples some TCP/IP stuff where they shine. So, you will need to learn them all.)

I will definitely not give up. I really want to learn about different algorithms because they are very applicable and usable. What is the full title of that book? Is it worth buying it? Right now I am not capable enough to understand such a deep concept, but in the future as my knowledge grows and expands I think I will be able to understand such abstract concepts.

OSDev.org

(Answered)Heap/Pool Dynamic Memory Allocation Implementation

Who is online