Is it possible to decouple paging and kernel heap?

wlnirvana · **Joined:** Mon Feb 12, 2018 10:09 pm **Posts:** 8

zaval wrote:

I am not Brendan, of course, bu I'll say too. For process page tables and directories, you could use the kernel pool. You can have 1 process or 1000, so this dynamic allocation entity is good for it. I haven't done any process things yet, but I am going to do it this way.
Reserving might be needed for system page tables, those that describe the kernel part of an address space. It begins at the OS loader time, ultimately it's OS loader, that creates the page directory for the kernel space. and then switches to that space and jumps into kernel.

Hi Zaval, thank you very much for the response. All comments and ideas are very welcome. And I personally tend to agree with you that paging data structures for user-space processes (not for the kernel itself!) should be allocated/deallocated on demand, which means a dynamic allocation entity (or heap). The only thing that I am a bit concerned about is that this heap and the paging mechanism would be inter-dependent, which does not seem a very elegant solution.

wlnirvana · **Joined:** Mon Feb 12, 2018 10:09 pm **Posts:** 8

Brendan wrote:

That's mostly correct; except that you're forgetting that there's a "page_directory_mapping_area_base", so:

Sorry I was assuming page_directory_mapping_area_start (which I believe is the same with page_directory_mapping_area_base) is 0, thus my previous calculation.

Thank you for pointing out the mistake as well as the confirmation, but this is still kind of surprising, because

With this permanent mapping approach, to avoid kernel heap usage in the virtual memory manager (or maybe just in the paging map logic of the virtual memory manager?), the kernel has to reserve space. Reservation is of course acceptable, after all 65536 processes only take up 32MB from page_directory_mapping_area_base to page_directory_mapping_area_base + 0x0FFFFFFF. Harder is that a hard limit of maximum number of processes must be set, otherwise the kernel would not know where to end the reservation. Well, after speaking this out, it does not sound hard any more. But previously I thought kernels generally would be happy to spawn infinitely many processes as long as the resource permits.
Although paging directories are permanently mapped, what about the second level paging tables?. Each directory may contain 1024 tables if fully allocated, which is 1024 physical pages. Should these second level paging tables be managed dynamically with a kernel heap? If so, the virtual memory manager would fall back to depend on the kernel heap.

~ · **Joined:** Tue Mar 06, 2007 11:17 am **Posts:** 1225

You could design the paging mechanism as a subsystem of the kernel, an optional layer of the memory management. You could always manage memory chunks that are page aligned and page sized but you could choose to turn paging on or off.

Paging is similar to formatting a disk partition that grows gradually, so you could see it as a very crude file system for memory (yet advanced for memory management) that doesn't necessarily use directories and file names to track memory chunks, although you are free to design it as you can.

zaval · **Posted:** Wed Mar 07, 2018 12:20 pm

wlnirvana wrote:

Hi Zaval, thank you very much for the response. All comments and ideas are very welcome. And I personally tend to agree with you that paging data structures for user-space processes (not for the kernel itself!) should be allocated/deallocated on demand, which means a dynamic allocation entity (or heap). The only thing that I am a bit concerned about is that this heap and the paging mechanism would be inter-dependent, which does not seem a very elegant solution.

But it's not interdependent. For kernel pools it's just data, pool allocation function don't rely on this data. when your first user process is ready to start, kernel pools are fully functional.
Since you want to use paging (on disk paging), and since you want to have a complex page dynamics, thus having soft page faults, the number of page tables is a function of number of processes, and not physical (RAM) memory. That means, you are faced with unknown memory amounts you'd need for page tables. Pools are designed for dynamic allocations.
Every kernel page table (those, describing kernel part of the space) might go into a reserved zone for this purpose. This way you achieve decoupling kernel pool management from paging.
Of course, there is another tricky thing that might occur, - different processes have different views of the kernel areas. Process A can access address X, process B can't. I don't get it yet if this is needed, but it for sure will bring dynamic space requirements for the kernel page tables too. If this is worth of it, it is resolvable and needs additional efforts (may be postponed for later phases).

Brendan · **Posted:** Wed Mar 07, 2018 6:07 pm

Hi,

wlnirvana wrote:

Brendan wrote:

That's mostly correct; except that you're forgetting that there's a "page_directory_mapping_area_base", so:

Sorry I was assuming page_directory_mapping_area_start (which I believe is the same with page_directory_mapping_area_base) is 0, thus my previous calculation.

Thank you for pointing out the mistake as well as the confirmation, but this is still kind of surprising, because

With this permanent mapping approach, to avoid kernel heap usage in the virtual memory manager (or maybe just in the paging map logic of the virtual memory manager?), the kernel has to reserve space. Reservation is of course acceptable, after all 65536 processes only take up 32MB from page_directory_mapping_area_base to page_directory_mapping_area_base + 0x0FFFFFFF. Harder is that a hard limit of maximum number of processes must be set, otherwise the kernel would not know where to end the reservation. Well, after speaking this out, it does not sound hard any more. But previously I thought kernels generally would be happy to spawn infinitely many processes as long as the resource permits.

I consider that a feature - a hard limit provides a defence against resource leaks (e.g. "fork bomb" causing kernel to consume all of kernel space for new processes and causing other parts of the kernel to fail because they can't create anything new).

wlnirvana wrote:

Although paging directories are permanently mapped, what about the second level paging tables?. Each directory may contain 1024 tables if fully allocated, which is 1024 physical pages. Should these second level paging tables be managed dynamically with a kernel heap? If so, the virtual memory manager would fall back to depend on the kernel heap.

The second level paging tables shouldn't be mapped into kernel space - it costs too much space, and most of the time you only modify the "currently being used" virtual address space (where the second level paging tables can be accessed via. the "recursive mapping trick" I mentioned earlier). For kernel-space, because the page tables are shared by all virtual address spaces all of the page table entries are also shared by all virtual address space, so you can modify the second level paging tables for kernel space in any virtual address space and it will automatically change in all virtual address spaces.

Let's imagine an extremly simple kernel; where if you want to allocate a virtual page the virtual memory manager asks the physical memory manager to allocate a physical page and then maps it at the right virtual address; and where if you want to free a virtual page the virtual memory manager obtains the physical address of the page from the page table and then tells the physical memory manager to free it.

Now let's imagine a more complex kernel (where almost all physical pages are allocated from the page fault handler). For "allocate on write"...

Assume that you have a single physical page that's full of zeros. You can map that same physical page as "read only" at many different virtual addresses (and in many different virtual address spaces) and when software reads from any of those virtual addresses it will look like RAM containing zeros. When software tries to write to it you get a page fault (because it was read only), and the kernel's page fault handler asks the physical memory manager to allocate one physical page of memory, and fills it with zeros and map it at the same virtual address as "read write", then returning to the instruction that caused the page fault (so that the instruction will be re-tried). This creates the illusion that the virtual page was always "read-write RAM", even though most of it might be the same single physical page mapped everywhere (to avoid wasting RAM).

Now assume that you have a single page table where every page table entry is identical and every page table entry contains "physical address of that single page full of zeros I mentioned before, mapped as read only". You can use that same page table in many different page directory entries (and in many different virtual address spaces) to fill huge amounts of space with "looks like RAM full of zeros" extremely cheaply. When software writes to any of those virtual addresses you get a page fault (because it was read only), and the kernel's page fault handler asks the physical memory manager to allocate a physical page for a new page table and asks the physical memory manager to allocate another physical page for a new page; then maps the new page into the new page table as "read write" and fill the rest of the page table's entries with "physical address of that single page full of zeros I mentioned before, mapped as read only", and maps the new page table (into the page directory) before returning to the instruction that caused the page fault (so that the instruction will be re-tried). This creates the illusion that huge amounts of space were always "read-write RAM" even though most of it might be the a single physical page used as a page table everywhere (to avoid wasting RAM for other page tables) and a single page full of zeros everywhere (to avoid wasting RAM for pages).

Now... the word "virtual" means "big pile of lies", and virtual memory is "big pile of lies memory", and a virtual memory manager's job is to manage a big pile of lies.

When software allocates an area of virtual memory, you lie and don't allocate any physical RAM at all. Instead you configure the area to use "same page full of zeros mapped as read-only" trickery to avoid wasting any physical RAM (and because it's faster to lie than it is to allocate actual physical RAM). Of course if software writes to any of those pages the virtual memory manager (page fault handler) allocates a physical page (using the physical memory manager) to hide the fact that it lied about allocating memory originally.

Now let's look at "heap". You call "malloc()" and it checks for some free virtual memory and finds none, so it uses something (e.g. "sbrk()" or "mmap()" or whatever) to ask the kernel's virtual memory manager to allocate another area of virtual memory (and the kernel's virtual memory manager lies and pretends to do this while actually doing almost nothing); and after "malloc()" thinks it has memory it allocates some and returns the address of it to the caller. At this point; neither the "malloc()" nor its caller knows that the virtual memory manager is lying and that no physical memory was actually allocated. After that the caller writes something into the area that it thinks is memory that it allocated; and that's when the virtual memory manager (page fault handler) allocates a physical page (using the physical memory manager) to hide the fact that it lied about allocating memory originally.

Note that swap space is just another lie (saying that something is in memory when it's actually on disk), and memory mapped files are a different lie (saying that a file was "instantly" loaded into memory when it wasn't), and "fork()" is also a lie (saying that you cloned all the memory when you didn't).

Of course the virtual file system is also virtual - a big lie (saying that there's a single huge file system) with lies about writing data to disk (when it's only buffered in RAM) and lies about reading data from disk (when it actually came from caches in RAM) with a few more lies just for fun. This means that for some things (memory mapped files) you end up with a liar conspiring with a liar to lie about lies!

Cheers,

Brendan

wlnirvana · **Joined:** Mon Feb 12, 2018 10:09 pm **Posts:** 8

I think I've now got a rough idea how to completely separate paging with the kernel heap following Brendan's approach. I may have to go and implement my ideas, then come back to the tutorials and this thread. But let me summarize my thoughts here for future reference as well as to see if I fully understand the technique.

Paging data structures for the kernel (including page tables at all levels) can be permanently mapped, thus don't need dynamic memory management.
Paging data structures for userspace processes can be split into two parts: first-level page tables (a.k.a page directories) and everything else.
- First-level page tables are permanently mapped as well. In this sense, the virtual space for them is reserved even before the processes are spawned. But if these page tables are not yet used, the physical space for them does not need to be reserved. Instead these virtual pages/addresses can be all mapped to a same zero-filled physical page. This is "allocate-on-write".
- Everything else can be dynamically managed in the user space.

OSDev.org

Is it possible to decouple paging and kernel heap?

Who is online