dimischiavone wrote:
I like your suggestion, but as I said, I'm not interested in writing my own bootloader right now (I'm sure I'll do it later), so I guess I'll go with the link trickery.
No, no, not a whole boot loader, just a boot loader shim. Multiboot allows you to specify multiple files to load, so you can use that to separate the main kernel from the glue code between the boot loader and the main kernel.
I have something like this in my kernel: I have a 64-bit kernel, and Multiboot requires that I start in 32-bit mode, so I have to work around that. Also, the design dates from when UEFI was not firmly established yet, and I wanted to be able to boot the same kernel on old and new systems; that is easier if you can just slot in a different shim to abstract away the differences.

The idea is to make one executable file, linked against some physical address, that does nothing other than initialize page tables suitable for the kernel and then jump to the main kernel. The main kernel goes in a completely different file, linked at its virtual base. The main kernel is specified to GRUB as a module, so the shim can find it in the module list, and no linking together has to occur. Indeed, in my case the main kernel is never told exactly where in physical memory it has been loaded; that is all part of the list of reserved memory areas handed to the main kernel.
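To make that concrete, here is a rough sketch of the shim side, assuming Multiboot 1 and that the main kernel is the first module. The struct layouts follow the Multiboot 1 spec; setup_boot_page_tables, jump_to_kernel, and KERNEL_ENTRY_VADDR are made-up placeholders for your own code:

```c
/* Sketch of the shim side, assuming Multiboot 1. The shim is linked
 * at a physical address, walks the module list GRUB hands it, sets up
 * paging, and jumps to the main kernel. Struct layouts follow the
 * Multiboot 1 spec; everything else here is a placeholder. */
#include <stdint.h>

struct multiboot_info {
    uint32_t flags;
    uint32_t mem_lower, mem_upper;
    uint32_t boot_device;
    uint32_t cmdline;
    uint32_t mods_count;
    uint32_t mods_addr;
    /* ...more fields we don't need here... */
};

struct multiboot_module {
    uint32_t mod_start;   /* physical start of the module */
    uint32_t mod_end;     /* physical end */
    uint32_t string;      /* module command line */
    uint32_t reserved;
};

#define KERNEL_ENTRY_VADDR 0xC0100000u   /* made up: kernel's virtual entry */

extern void setup_boot_page_tables(uint32_t kernel_phys);
extern void jump_to_kernel(uint32_t entry, struct multiboot_info *mbi);

/* Called from the assembly entry stub with the pointer GRUB left in EBX. */
void shim_main(struct multiboot_info *mbi)
{
    if (!(mbi->flags & (1u << 3)) || mbi->mods_count == 0)
        return;  /* flags bit 3 = module fields valid; nothing to boot */

    struct multiboot_module *mods =
        (struct multiboot_module *)(uintptr_t)mbi->mods_addr;

    /* Assume the main kernel is the first module. */
    uint32_t kernel_phys = mods[0].mod_start;

    /* Map the kernel at its virtual base, enable paging, jump. */
    setup_boot_page_tables(kernel_phys);
    jump_to_kernel(KERNEL_ENTRY_VADDR, mbi);
}
```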
dimischiavone wrote:
Another question, if I might: my original plan was to set up the GDT and paging in the C entry point after GRUB loads me. This meant that I was still dealing with physical addresses, and I was going to set up physical memory management before enabling paging (using a bitmap for low memory and a stack for the other zones). Now, if I start with paging enabled, I should manage physical memory through the virtual memory manager, right? My PMM should give me physical addresses and I just need to map them, right?
You will always have to deal with physical addresses; the only question is how to access them. There are pretty much two schools of thought on this: either map all memory all the time, or use recursive paging and map only what memory is necessary. There are pros and cons to both.

Mapping all memory of course requires a lot of virtual memory, which you might not have in 32-bit mode. In 64-bit mode the issue is moot, and for 32-bit mode the argument can be made that you are unlikely to come across a system with a lot of memory that is stuck in 32-bit mode. Even so, access to most of 1GB of RAM should suffice for a start.

Recursive page mappings have the advantages of requiring less virtual memory and not imposing a memory limit even in 32-bit mode, but the disadvantages of not being portable (some architectures have different page table formats at different levels), taking up a lot of virtual memory in 32-bit PAE mode[1], and being horrible performance-wise on multi-CPU systems.[2] Also, with recursive paging you can only change page mappings in the currently active address space, although for the kernel half that is usually shared across all processes anyway. That is why a large linear mapping is just conceptually simpler.
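For illustration, a minimal sketch of the "map all memory" approach, assuming a 64-bit kernel that keeps a linear mapping of all physical RAM at a fixed virtual offset. PHYS_MAP_BASE and pmm_alloc_frame are made-up names; pick values that fit your own address-space layout:

```c
/* Linear physical-memory mapping: all of RAM is visible at
 * PHYS_MAP_BASE + phys, so conversion is plain arithmetic. */
#include <stdint.h>
#include <string.h>

#define PHYS_MAP_BASE 0xFFFF800000000000ull  /* hypothetical offset */

static inline void *phys_to_virt(uint64_t phys)
{
    return (void *)(uintptr_t)(PHYS_MAP_BASE + phys);
}

static inline uint64_t virt_to_phys(const void *virt)
{
    return (uint64_t)(uintptr_t)virt - PHYS_MAP_BASE;
}

/* With this in place, the PMM hands out raw physical frame addresses
 * and the rest of the kernel converts on the fly. */
extern uint64_t pmm_alloc_frame(void);  /* hypothetical allocator */

static void *alloc_zeroed_frame(void)
{
    uint64_t frame = pmm_alloc_frame();   /* physical address */
    void *mapped = phys_to_virt(frame);   /* already accessible */
    memset(mapped, 0, 4096);
    return mapped;
}
```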
[1] The scheme uses one slot of the highest-level page table you have. In non-PAE 32-bit mode, that is 4MB, or 1/1024 of the available virtual memory. In 64-bit mode with 4-level paging, it would be 512GB, or 1/512 of the available virtual memory. But in 32-bit PAE mode, it is 1GB, or 1/4 of the available virtual memory, because in that mode the highest level of the paging structure is indexed with only 2 bits (it has just four entries). Whether this is a waste or not depends entirely on what else you would do with the memory if it were available, but 1/4 is hefty, especially when you consider that usually half of the virtual memory goes to user space anyway, leaving you only 1GB for everything other than the paging window.
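As an aside, the slot arithmetic is easiest to see in non-PAE 32-bit mode. A sketch, assuming the page directory is mapped into itself at its last slot (all names hypothetical):

```c
/* Recursive mapping in non-PAE 32-bit mode, page directory mapped
 * into itself at PDE index 1023. The window 0xFFC00000-0xFFFFFFFF
 * (4MB, i.e. 1/1024 of the 4GB address space) exposes every page
 * table. */
#include <stdint.h>

#define RECURSIVE_SLOT 1023u

/* The page table covering virtual address v lives at: */
static inline uint32_t *page_table_of(uint32_t v)
{
    return (uint32_t *)((RECURSIVE_SLOT << 22) | ((v >> 22) << 12));
}

/* The PTE for v is then page_table_of(v)[(v >> 12) & 0x3FF], and the
 * page directory itself shows up at the very top, 0xFFFFF000: */
static inline uint32_t *page_directory(void)
{
    return (uint32_t *)((RECURSIVE_SLOT << 22) | (RECURSIVE_SLOT << 12));
}
```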
[2] The idea with recursive paging is that whenever you need to access a raw physical address, you map it into whatever spot is reserved in your virtual memory for that purpose, then perform the access there. Even in the most optimized version of this, you need at least one TLB invalidation each time you remap that page, and you remap it constantly. If you support multiple CPUs and they all use the same paging structures on the kernel side (a common design choice), then the TLB invalidation becomes a TLB shootdown, where you send an IPI to the other cores to make them invalidate the address after changing it. That does not scale to higher core counts, and those are getting ridiculous: the more cores you have, the longer you wait for the shootdown to complete.
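For the curious, that access path looks roughly like this, assuming x86 non-PAE paging and a single reserved kernel virtual page. QUICKMAP_VADDR and quickmap_pte are made-up names:

```c
/* One reserved kernel virtual page gets retargeted at whatever
 * physical frame needs touching. */
#include <stdint.h>

#define QUICKMAP_VADDR 0xFFBFF000u       /* reserved slot, made up */
extern volatile uint32_t *quickmap_pte;  /* PTE backing that slot */

static inline void invlpg(uint32_t vaddr)
{
    __asm__ volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
}

/* Point the slot at a physical frame and return a usable pointer.
 * On one CPU, the invlpg is all it takes. With multiple CPUs sharing
 * the kernel page tables, this is exactly where the TLB shootdown IPI
 * has to go, and that is the scalability problem. */
static void *quickmap(uint32_t phys_frame)
{
    *quickmap_pte = (phys_frame & ~0xFFFu) | 0x3;  /* present | writable */
    invlpg(QUICKMAP_VADDR);
    return (void *)QUICKMAP_VADDR;
}
```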