dimischiavone wrote:
I like your suggestion, but as I said, I'm not interested in writing my own bootloader right now (I'm sure I'll do it later), so I guess I'll go with the link trickery.
No, no, not a whole boot loader, just a boot loader shim. Multiboot allows you to specify multiple files to load, so you can use that to separate the main kernel from the glue code between the boot loader and the main kernel.
I have something like this in my kernel: I have a 64-bit kernel, and Multiboot requires that I start in 32-bit mode, so I have to work around that. Also, the design dates from when UEFI was not firmly established yet, and I wanted to be able to boot the same kernel on old and new systems; that is easier if you can just slot in a different shim to abstract away the differences.

The idea is to make one executable file, linked against some physical address, that does nothing other than initialize page tables suitable for the kernel and then jump to the main kernel. The main kernel goes in a completely different file, linked at its virtual base. The main kernel is specified to GRUB as a module, so the shim can find it in the module list, and no linking together has to occur. Indeed, in my case the main kernel is never told exactly where in physical memory it has been loaded; that is all part of the list of reserved memory areas handed to the main kernel.
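To make that concrete, here is a rough sketch of the shim side, assuming Multiboot 1 and that the main kernel is the first module. The struct layouts follow the Multiboot 1 spec; setup_boot_page_tables, jump_to_kernel, and KERNEL_ENTRY_VADDR are made-up placeholders for your own code:

```c
/* Sketch of the shim side, assuming Multiboot 1. The shim is linked
 * at a physical address, walks the module list GRUB hands it, sets up
 * paging, and jumps to the main kernel. Struct layouts follow the
 * Multiboot 1 spec; everything else here is a placeholder. */
#include <stdint.h>

struct multiboot_info {
    uint32_t flags;
    uint32_t mem_lower, mem_upper;
    uint32_t boot_device;
    uint32_t cmdline;
    uint32_t mods_count;
    uint32_t mods_addr;
    /* ...more fields we don't need here... */
};

struct multiboot_module {
    uint32_t mod_start;   /* physical start of the module */
    uint32_t mod_end;     /* physical end */
    uint32_t string;      /* module command line */
    uint32_t reserved;
};

#define KERNEL_ENTRY_VADDR 0xC0100000u   /* made up: kernel's virtual entry */

extern void setup_boot_page_tables(uint32_t kernel_phys);
extern void jump_to_kernel(uint32_t entry, struct multiboot_info *mbi);

/* Called from the assembly entry stub with the pointer GRUB left in EBX. */
void shim_main(struct multiboot_info *mbi)
{
    if (!(mbi->flags & (1u << 3)) || mbi->mods_count == 0)
        return;  /* flags bit 3 = module fields valid; nothing to boot */

    struct multiboot_module *mods =
        (struct multiboot_module *)(uintptr_t)mbi->mods_addr;

    /* Assume the main kernel is the first module. */
    uint32_t kernel_phys = mods[0].mod_start;

    /* Map the kernel at its virtual base, enable paging, jump. */
    setup_boot_page_tables(kernel_phys);
    jump_to_kernel(KERNEL_ENTRY_VADDR, mbi);
}
```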
dimischiavone wrote:
Another question, if I might: my original plan was to set up the GDT and paging in the C entry point after GRUB loads me. This meant that I was still dealing with physical addresses, and I was going to set up physical memory management before enabling paging (using a bitmap for low memory and a stack for the other zones). Now, if I start with paging enabled, I should manage physical memory through the virtual memory manager, right? My PMM should give me physical addresses and I just need to map them, right?
You will always have to deal with physical addresses; the only question is how to access them. There are pretty much two schools of thought on this: either map all memory all the time, or use recursive paging and map only what memory is necessary. There are pros and cons to both.

Mapping all memory of course requires a lot of virtual memory, which you might not have in 32-bit mode. In 64-bit mode the issue is moot, and for 32-bit mode the argument can be made that you are unlikely to come across a system with a lot of memory that is stuck in 32-bit mode. Even so, access to most of 1GB of RAM should suffice for a start.

Recursive page mappings have the advantages of requiring less virtual memory and not imposing a memory limit even in 32-bit mode, but the disadvantages of not being portable (some architectures have different page table formats at different levels), taking up a lot of virtual memory in 32-bit PAE mode[1], and being horrible performance-wise on multi-CPU systems.[2] Also, with recursive paging you can only change page mappings in the currently active address space, although for the kernel half that is usually shared across all processes anyway. That is why a large linear mapping is just conceptually simpler.
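For illustration, a minimal sketch of the "map all memory" approach, assuming a 64-bit kernel that keeps a linear mapping of all physical RAM at a fixed virtual offset. PHYS_MAP_BASE and pmm_alloc_frame are made-up names; pick values that fit your own address-space layout:

```c
/* Linear physical-memory mapping: all of RAM is visible at
 * PHYS_MAP_BASE + phys, so conversion is plain arithmetic. */
#include <stdint.h>
#include <string.h>

#define PHYS_MAP_BASE 0xFFFF800000000000ull  /* hypothetical offset */

static inline void *phys_to_virt(uint64_t phys)
{
    return (void *)(uintptr_t)(PHYS_MAP_BASE + phys);
}

static inline uint64_t virt_to_phys(const void *virt)
{
    return (uint64_t)(uintptr_t)virt - PHYS_MAP_BASE;
}

/* With this in place, the PMM hands out raw physical frame addresses
 * and the rest of the kernel converts on the fly. */
extern uint64_t pmm_alloc_frame(void);  /* hypothetical allocator */

static void *alloc_zeroed_frame(void)
{
    uint64_t frame = pmm_alloc_frame();   /* physical address */
    void *mapped = phys_to_virt(frame);   /* already accessible */
    memset(mapped, 0, 4096);
    return mapped;
}
```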
[1] The scheme uses one slot of the highest-level page table you have. In non-PAE 32-bit mode, that is 4MB, or 1/1024 of the available virtual memory. In 64-bit mode with 4-level paging, it would be 512GB, or 1/512 of the available virtual memory. But in 32-bit PAE mode, it is 1GB, or 1/4 of the available virtual memory, because in that mode the highest level of the paging structure is indexed with only 2 bits (it has just four entries). Whether this is a waste or not depends entirely on what else you would do with the memory if it were available, but 1/4 is hefty, especially when you consider that usually half of the virtual memory goes to user space anyway, leaving you only 1GB for everything other than the paging window.
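As an aside, the slot arithmetic is easiest to see in non-PAE 32-bit mode. A sketch, assuming the page directory is mapped into itself at its last slot (all names hypothetical):

```c
/* Recursive mapping in non-PAE 32-bit mode, page directory mapped
 * into itself at PDE index 1023. The window 0xFFC00000-0xFFFFFFFF
 * (4MB, i.e. 1/1024 of the 4GB address space) exposes every page
 * table. */
#include <stdint.h>

#define RECURSIVE_SLOT 1023u

/* The page table covering virtual address v lives at: */
static inline uint32_t *page_table_of(uint32_t v)
{
    return (uint32_t *)((RECURSIVE_SLOT << 22) | ((v >> 22) << 12));
}

/* The PTE for v is then page_table_of(v)[(v >> 12) & 0x3FF], and the
 * page directory itself shows up at the very top, 0xFFFFF000: */
static inline uint32_t *page_directory(void)
{
    return (uint32_t *)((RECURSIVE_SLOT << 22) | (RECURSIVE_SLOT << 12));
}
```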
[2] The idea with recursive paging is that whenever you need to access a raw physical address, you map it into whatever spot is reserved in your virtual memory for that purpose, then perform the access there. Even in the most optimized version of this, you need at least one TLB invalidation each time you remap that page, and you remap it constantly. If you support multiple CPUs and they all use the same paging structures on the kernel side (a common design choice), then the TLB invalidation becomes a TLB shootdown, where you send an IPI to the other cores to make them invalidate the address after changing it. That does not scale to higher core counts, and those are getting ridiculous: the more cores you have, the longer you wait for the shootdown to complete.
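For the curious, that access path looks roughly like this, assuming x86 non-PAE paging and a single reserved kernel virtual page. QUICKMAP_VADDR and quickmap_pte are made-up names:

```c
/* One reserved kernel virtual page gets retargeted at whatever
 * physical frame needs touching. */
#include <stdint.h>

#define QUICKMAP_VADDR 0xFFBFF000u       /* reserved slot, made up */
extern volatile uint32_t *quickmap_pte;  /* PTE backing that slot */

static inline void invlpg(uint32_t vaddr)
{
    __asm__ volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
}

/* Point the slot at a physical frame and return a usable pointer.
 * On one CPU, the invlpg is all it takes. With multiple CPUs sharing
 * the kernel page tables, this is exactly where the TLB shootdown IPI
 * has to go, and that is the scalability problem. */
static void *quickmap(uint32_t phys_frame)
{
    *quickmap_pte = (phys_frame & ~0xFFFu) | 0x3;  /* present | writable */
    invlpg(QUICKMAP_VADDR);
    return (void *)QUICKMAP_VADDR;
}
```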