turdus wrote:
davidv1992 wrote:
The fact that it is in the kernel guarantees that almost anything not used on every interrupt will probably not reside in the cache on first access anyway, so the cache argument is moot.
Allocating a large number of frames would. As you pop new values from the stack, you're increasing the stack pointer constantly, so it's likely that you will cross a page border sooner or later.
Once again, in the situation of high load (which is where it would matter most) these pages probably won't be in the TLB anyway (and they definitely will not be in the processor's cache), so it will not matter.
Quote:
Quote:
The amount of memory you save because the amount of data on the stack is smaller is moot because that memory isn't used for anything else anyway.
I do not understand your point. "It isn't used for anything else"? Speak for yourself, I would rather use it for storage cache. And 1 page *is* smaller than 2^32*4 bytes.
When that memory is used for storage cache it isn't free, and thus shouldn't be in the structure that manages free memory anyway.
Quote:
Quote:
Furthermore you make the argument that a maximum of 512 fragments on physical memory is enough.
Stated empirically. It's rare that I see more than 3 fragments; I've been testing it for a while.
As several others have already stated, the important case is that of a system under high load and/or running for long periods of time. Even the laptop I'm writing this on regularly stays on for hours running several tasks, most of which are system tasks. Fragmentation will happen, and the potential amount of fragmentation is WAY bigger than what you are allowing for, even on systems with relatively little memory.
Quote:
Quote:
...in which case your system fails horribly.
Where did you get that from?!? First: you can use more space for the fifo; second: if this occurs, it calls the defragmenter and/or swapper. If you can't write a swapper of your own, your implementation will fail horribly, I agree on that.
The moment you need to run a defragmenter you will be taking up quite a bit of the memory bandwidth of the system, which will seriously hurt performance. Of course you could implement some sort of queuing of free pages that cannot be returned to the main pool and have background tasks handle them, but that queue itself might grow quite big, and would certainly add a lot of complexity to the memory system (i.e. a greater chance for bugs). More space for the FIFO won't solve the problem unless one reserves a lot of memory for it.
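The deferred-free queue floated above could look something like this. This is only a minimal sketch; the ring-buffer layout, the names, and the fixed size are my assumptions, not anything from the posts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a deferred-free queue: frames that cannot be returned to the
 * main fragment pool right away are parked in a ring buffer and merged
 * later by a background task. Names and the fixed size are illustrative. */

#define DEFER_SLOTS 256

static uint32_t defer_ring[DEFER_SLOTS];
static size_t defer_head = 0, defer_tail = 0;

/* Park a frame for later merging; returns 0 if the queue is full, which
 * is exactly the growth problem discussed above. */
int defer_free(uint32_t frame)
{
    size_t next = (defer_tail + 1) % DEFER_SLOTS;
    if (next == defer_head)
        return 0; /* queue full: caller must fall back to something else */
    defer_ring[defer_tail] = frame;
    defer_tail = next;
    return 1;
}

/* Background task: pop one parked frame, or return 0 if none is pending. */
int defer_drain_one(uint32_t *frame)
{
    if (defer_head == defer_tail)
        return 0;
    *frame = defer_ring[defer_head];
    defer_head = (defer_head + 1) % DEFER_SLOTS;
    return 1;
}
```

Note the failure mode is visible in the API: once `defer_free` starts returning 0, the queue has grown to its limit, which is the complexity/size trade-off being argued about.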
Quote:
Quote:
This all combined with the fact that it is a far more complex system than the other two I would strongly recommend against using it.
Don't like it, don't understand it, so don't use it.
If any of the points I made point to ignorance on my part I apologize, but I do believe that I have a reasonable understanding of your system, enough to state those things I see as flaws.
turdus wrote:
XenOS wrote:
I guess the point why the stack-based allocator consumes almost no memory is the following: The stack contains only the free pages. So if many pages are free (say, nearly 4 GB), you need a large stack to hold all of them (in this case 4 MB). However, spending lots of memory is not an issue when almost all of your physical memory is free.
Once you allocate memory, you pop more and more entries off the top of the stack. Thus, the memory which holds the top of the stack is not needed anymore and can be deallocated. The only information the stack needs to keep is the location of free pages, and when the number of free pages drops below 1024, the whole stack fits into a single 4 kB page and consumes almost no physical memory.
When you free pages again and the stack needs to grow, simply map freed pages to the top of the stack when necessary.
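The shrink-as-you-allocate behaviour described above can be sketched as follows; the array standing in for the stack's backing pages and the 32-bit frame numbers are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch of a stack-based frame allocator: the stack holds only
 * the frame numbers of free frames. Bounds checks are omitted for brevity. */

#define MAX_FRAMES 1024

static uint32_t free_stack[MAX_FRAMES]; /* top of stack = next free frame */
static size_t stack_top = 0;            /* number of free frames on the stack */

/* Push a freed frame; in a real kernel, freed frames can themselves be
 * mapped in to grow the stack's backing storage when it needs to expand. */
void frame_free(uint32_t frame)
{
    free_stack[stack_top++] = frame;
}

/* Pop a free frame, or return UINT32_MAX when out of memory. As entries
 * are popped, whole pages at the top of the stack become unused and could
 * be returned to the system, which is why the structure shrinks as memory
 * fills up. */
uint32_t frame_alloc(void)
{
    if (stack_top == 0)
        return UINT32_MAX;
    return free_stack[--stack_top];
}
```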
Correct, but you compare the best case (free pages drop below 1024) with the worst case (fragments exceed 512). And you can handle the fifo's buffer dynamically as well. The point is that you have to allocate/deallocate memory for the stack (4k-4M), while the fifo can work on 1 or 2 preallocated pages as well. Let's see a worst case where the same amount of memory is required:
stack: all memory free, 4M
fifo: contains more than half a million fragments, 4M
And the difference is even bigger when it comes to 64 bit. The stack would consume 256G for a full RAM (2^48 bytes, 256 TB), while the fifo can handle all of it in 4M easily. Scalability matters! Do not forget that the free-frame stack (as well as the fifo) has to be filled in at boot time, when almost every frame is free!
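For what it's worth, the worst-case stack size can be worked out directly. This is a rough calculation; the 4 KiB page size and the entry widths are assumptions on my part, and with 8-byte entries (which a 2^48-byte physical space needs, since its frame numbers exceed 32 bits) the figure actually comes to 512 GiB rather than 256 GiB. Either way, the scalability point stands.

```c
#include <stdint.h>

/* Rough worst-case sizing for a free-frame stack, assuming 4 KiB pages:
 * one stack entry per free frame, so the stack is largest when all of
 * physical memory is free. Entry width is a parameter because 32-bit
 * frame numbers fit in 4 bytes while 2^36 frames need 8-byte entries. */
uint64_t stack_bytes(uint64_t phys_bytes, uint64_t entry_bytes)
{
    uint64_t frames = phys_bytes / 4096; /* number of 4 KiB frames */
    return frames * entry_bytes;         /* one entry per free frame */
}
```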
This is one valid point you do have. At boot time the stack costs some amount of time to fill, and your system is no doubt faster there. However, as this is a one-time penalty at boot, and the process of pushing pages onto the stack isn't that time-consuming, I doubt it matters much, unless you get to really big amounts of memory, in which case it could be mitigated without much effort by serving the initial allocation of each frame from a different structure than the stack.
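The mitigation suggested in the last sentence (serving initial allocations from something other than the stack) could be sketched with a simple watermark over never-yet-used frames; the names and the single-region layout are illustrative assumptions, not anyone's actual design.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: avoid the O(n) boot-time fill by never pushing untouched frames
 * onto the stack at all. A watermark hands out never-yet-used frames, and
 * the stack only ever holds frames that have been freed. Bounds checks on
 * the stack are omitted for brevity. */

static uint64_t watermark;   /* first never-allocated frame number */
static uint64_t frame_limit; /* one past the last usable frame */

static uint64_t free_stack[64]; /* freed frames only; kept tiny here */
static unsigned stack_top = 0;

void pmm_init(uint64_t first_frame, uint64_t last_frame)
{
    watermark = first_frame;
    frame_limit = last_frame + 1;
    /* note: no loop pushing every free frame at boot */
}

uint64_t pmm_alloc(void)
{
    if (stack_top)                 /* reuse freed frames first */
        return free_stack[--stack_top];
    if (watermark < frame_limit)   /* otherwise advance the watermark */
        return watermark++;
    return UINT64_MAX;             /* out of memory */
}

void pmm_free(uint64_t frame)
{
    free_stack[stack_top++] = frame;
}
```

Once the watermark reaches the end of the region, the allocator degenerates into the plain stack scheme, so the boot-time saving costs nothing in steady state.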