OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




[ 8 posts ]
 Post subject: Kernel mapping
PostPosted: Sun Jun 24, 2018 3:07 pm 
Member

Joined: Wed May 02, 2018 1:26 pm
Posts: 55
To describe the title more accurately:

My kernel linker script uses the VMA 0xffffffff80200000. I'm using the separate-loader approach from the OSDev tutorials, loading a 64-bit kernel by jumping to it from a separate 32-bit loader.
I recently found a document on kernel.org which lists the entries of the virtual memory map built from the PML4 page table (https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt).

There, I tried to adhere to the documented mapping, in particular both the physical memory mapping:
Code:
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory

with 1 GiB of RAM -> 0xffff880000000000 to 0xffff880040000000, using 2 MiB pages, and the kernel text mapping:
Code:
ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0

with 512 MiB of RAM, just as stated in the file, using 4 KiB pages.

However, I am currently mapping the kernel ELF file directly, mapping only the size of the kernel itself (in my case, 0xf030 bytes), from 0xffffffff80200000 to 0xffffffff8020f030.
The problem seems clear: the first mapping of 512 MiB completely surrounds the second one, i.e. the virtual address ranges overlap:

Code:
void bootstrap_map_kernel(elf64_ehdr_t *elf64_ehdr)
{
    elf64_phdr_t *elf64_phdr = get_elf64_phdr(elf64_ehdr);

    /* Walk all program headers and map each segment with 4 KiB pages. */
    for (uint32_t i = 0; i < elf64_ehdr->e_phnum; i++) {
        elf64_phdr_t *elf64_phdr_segment = &elf64_phdr[i];

        /* Virtual address the segment was linked against. */
        uint64_t vaddr = elf64_phdr_segment->p_vaddr;
        /* Physical address: location of the ELF image plus the segment's file offset. */
        uint64_t paddr = void_ptrtu32(elf64_ehdr) + elf64_phdr_segment->p_offset;
        int64_t memsz = elf64_phdr_segment->p_memsz & LONG_MASK;

        mmap_4kb(vaddr, paddr, X86_64_PTE_FLAG_PRESENT_RW, memsz);
    }
}


elf64_ehdr is the header from the kernel elf.

While the first mapping comes from that virtual memory layout, the second mapping is what makes the OSDev tutorial work.

How do I deal with this issue? Note that I want my virtual memory layout to adhere to the one listed in the file. While the second mapping (mapping the kernel ELF directly) is my current approach and works perfectly fine, I am trying to find reasons to change it to the layout in the link. By changing to the other approach (the 512 MiB kernel text mapping), I would need to rewrite my code so that the kernel entry point is still reachable.


 Post subject: Re: Kernel mapping
PostPosted: Sun Jun 24, 2018 8:30 pm 
Member

Joined: Fri Aug 19, 2016 10:28 pm
Posts: 360
CRoemheld wrote:
I recently found a file from kernel.org which lists the different virtual memory map entries using the PML4 page table (https://www.kernel.org/doc/Documentatio ... _64/mm.txt).
This is the mapping of the Linux kernel.
CRoemheld wrote:
In there I tried to adhere to the standard mapping, especially both the phys memory mapping
I wouldn't call it standard. It is particular to Linux, not Unix, not Windows, etc. It may have changed through the years. And it is particular to the amd64 architecture, whereas Linux supports a bunch of others.

There are two reasons for using that scheme, to my understanding. One, it makes the conversion between virtual addresses in that range and their corresponding physical addresses easy. Two, you can use the "kernel" memory model of the GNU toolchain for linking and code generation. This means using 32-bit negative absolute addresses to directly access the last 2GB of the address space; that is why the kernel text mapping above is positioned exactly in the last 2GB. The same trick is used for kernel modules, which follow in the memory layout. Note that neither of those is relevant if you generate PIC (position-independent code), which has very little penalty on x64. And to use -mcmodel=kernel, you need to process relocations for absolute memory addresses from the ELF images when you load them.
CRoemheld wrote:
While the first mapping is from the standard of virtual memory layout, the second mapping is needed to make the OSDev tutorial work.

How do I deal with this issue? It should be noted that I want to make the virtual memory layout adhere to the one listed in the file. I thought about using either one of these approaches:
While the second mapping (the mapping of the kernel elf) is my current approach and it works perfectly fine, I am trying to find reasons to change it to the layout in the link. By changing to the other approach (kernel text mapping, 512MiB), I would need to rewrite my code so that the kernel entry point is still reachable.
Could you clarify? Why would using the Linux approach render the kernel entry point unreachable? Also, using your approach wouldn't render any functionality that Linux provides unfeasible in my opinion. All that Linux manages to do with this scheme (to my knowledge) is to fit inside the last 2GB and to translate virtual to physical addresses in that range using only address arithmetic.


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 7:47 am 
Member

Joined: Wed May 02, 2018 1:26 pm
Posts: 55
Thanks for your answer, simeonz!
simeonz wrote:
CRoemheld wrote:
While the first mapping is from the standard of virtual memory layout, the second mapping is needed to make the OSDev tutorial work.

How do I deal with this issue? It should be noted that I want to make the virtual memory layout adhere to the one listed in the file. I thought about using either one of these approaches:
While the second mapping (the mapping of the kernel elf) is my current approach and it works perfectly fine, I am trying to find reasons to change it to the layout in the link. By changing to the other approach (kernel text mapping, 512MiB), I would need to rewrite my code so that the kernel entry point is still reachable.
Could you clarify? Why would using the Linux approach render the kernel entry point unreachable? Also, using your approach wouldn't render any functionality that Linux provides unfeasible in my opinion. All that Linux manages to do with this scheme (to my knowledge) is to fit inside the last 2GB and to translate virtual to physical addresses in that range using only address arithmetic.


Sorry, I'm a bit confused myself. As I said, I wanted to adhere to the Linux mapping for the kernel. Since Linux maps the kernel at 0xffffffff80000000, starting from physical address 0, the addresses of the kernel symbols (the entry point and all its functions) should be easily computable by adding the physical address of each symbol to the kernel virtual start address (0xffffffff80000000):

0xffffffff80000000 + <symbol physical address>

Is the kernel mapping start address (0xffffffff80000000) just a kind of lower limit, so the kernel may be mapped anywhere in the range 0xffffffff80000000 - 0xffffffff9fffffff, or should the kernel definitely be mapped at 0xffffffff80000000, mapping exactly 512 MiB of physical memory?

Since Linux maps it at 0xffffffff80000000, starting from physical address 0x0, my kernel would be accessible at 0xffffffff80390000, since 0x390000 is the physical address of my kernel.

Then I tried changing the kernel VMA to 0xffffffff80000000. If I do that, the linker places the kernel entry point at 0xffffffff80000a70, but the mapping puts it at 0xffffffff80390a70, so the kernel won't start because the jump goes to the wrong address.
Before you tell me to add the kernel's offset to the address: that won't work either, since the entry point may then be reachable, but all the other symbols are still at different addresses:

I jump via the %rax register to 0xffffffff80390a70.
When a function inside the kernel is called, the call targets simply drop the 0x390000 offset, resulting in addresses like 0xffffffff80002631. But as I said before, when the kernel is mapped at 0xffffffff80000000 from physical address 0x0 upwards, the functions actually reside from 0xffffffff80390000 onwards, so the next call inside the kernel fails. So I'm trying to figure out how to set the kernel VMA in the linker script so that the functions inside the kernel are reachable as well.

When I keep the current VMA of 0xffffffff80200000, the entry point is at 0xffffffff80200a70. BUT: in my current setup I only map the kernel starting from physical address 0x390000 upwards, which means an address like 0xffffffff80202631 from before is now reachable, since 0x2631 is the symbol's offset from the kernel VMA. If I do that, it works fine, but it is not the kernel virtual address used in Linux.

As you pointed out, you don't think it would be a problem if I keep my current implementation of the kernel VMA and map it at 0xffffffff80200000 rather than 0xffffffff80000000 (because it is still in the range 0xffffffff80000000 - 0xffffffff9fffffff?).

In this case, I won't need the 512MiB kernel text mapping used in the linux kernel.

Again, TL;DR:

1. Is using a virtual address like 0xffffffff80200000 rather than 0xffffffff80000000 for the kernel still considered similar/equal to the Linux mapping?
2. Do I need to map 512 MiB?
3. Should I definitely map the kernel twice: first at 0xffffffff80000000 from physical address 0x0 upwards, and then directly at 0xffffffff80200000, but from the kernel's actual physical address (0x390000) upwards?
4. If not, which virtual address should I use, and why? Maybe a suggestion based on your own projects/experiences?


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 11:46 am 
Member

Joined: Fri Aug 19, 2016 10:28 pm
Posts: 360
CRoemheld wrote:
Since linux maps it at 0xffffffff80000000, starting from physical address 0x0, my kernel would be accessible at 0xffffffff80390000, since 0x390000 is the physical address of my kernel.

Then I tried to change the kernel VMA to 0xffffffff80000000.
You simply need to set the executable base address to 0xffffffff80390000 in your linker script.

Note that the memory map given in that text document is very much an overview only. A lot of details are missing which alter the picture. First, the physical address of the Linux kernel is statically configurable, which is of somewhat lesser importance, because it is already reflected in the kernel linker script. However, the image can also be configured as relocatable, which allows setting the physical address dynamically through a boot parameter. In that case, Linux compensates for the change in the virtual mapping: the virtual address of the kernel looks unchanged, but the mapping is no longer 0-based. Lastly, the physical address (but not the virtual address you are referring to above) can be randomized at each boot. Other memory regions have their base addresses chosen randomly as well, assuming the kernel is configured to use kernel address space layout randomization (KASLR).


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 12:54 pm 
Member

Joined: Wed May 02, 2018 1:26 pm
Posts: 55
simeonz wrote:
CRoemheld wrote:
Since linux maps it at 0xffffffff80000000, starting from physical address 0x0, my kernel would be accessible at 0xffffffff80390000, since 0x390000 is the physical address of my kernel.

Then I tried to change the kernel VMA to 0xffffffff80000000.
You simply need to set the executable base address to 0xffffffff80390000 in your linker script.


Unfortunately this does not solve anything.
I just tried entering different VMAs for the kernel, and I noticed the following:

Using VMA 0xffffffff80390000, kernel was located at 0x320000 in physical memory:
Code:
Program Header:
    LOAD off    0x0000000000190000 vaddr 0xffffffff80390000 paddr 0x0000000000000000 align 2**21
         filesz 0x000000000000f030 memsz 0x000000000000f030 flags rwx

Using VMA 0xffffffff80320000, kernel was located at 0x2b0000 in physical memory:
Code:
Program Header:
    LOAD off    0x0000000000120000 vaddr 0xffffffff80320000 paddr 0x0000000000000000 align 2**21
         filesz 0x000000000000f030 memsz 0x000000000000f030 flags rwx

Using VMA 0xffffffff80210000, kernel was located at 0x1a0000 in physical memory:
Code:
Program Header:
    LOAD off    0x0000000000010000 vaddr 0xffffffff80210000 paddr 0x0000000000000000 align 2**21
         filesz 0x000000000000f030 memsz 0x000000000000f030 flags rwx

Using current VMA 0xffffffff80200000, kernel was located at 0x390000 in physical memory:
Code:
Program Header:
    LOAD off    0x0000000000200000 vaddr 0xffffffff80200000 paddr 0x0000000000000000 align 2**21
         filesz 0x000000000000f030 memsz 0x000000000000f030 flags rwx


So getting closer to 0xffffffff80200000 reduces the offset in the program header, but when I use 0xffffffff80200000 itself, the offset suddenly goes up to 0x200000.

So there is some kind of dependency between the VMA used in the linker script and where the kernel ends up being loaded.

The kernel is compiled using the following flags:
Code:
CCFLAGS    := -ffreestanding -mcmodel=large
CCFLAGS    += -mno-red-zone -mno-mmx -mno-sse -mno-sse2


And linked using the code used in the tutorial.


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 2:32 pm 
Member

Joined: Fri Aug 19, 2016 10:28 pm
Posts: 360
First, you will notice that changing the base address causes the .text section to be pre-padded with a sizeable gap after the elf header. This is due to the 2MB alignment constraint of the loadable segment. It is actually not a bad thing if you intend to map the executable using large pages (which will improve the TLB efficiency), but you could disable it with the "-nmagic" ld option if you wanted to avoid having a gap for now. (Edit: Especially considering that this is not actually the kernel in the context of the tutorial.)

The more important issue is that, assuming you have been following the tutorial strictly, you are loading the kernel as a separate grub module. As such, its load address cannot be controlled. It will be loaded somewhere in free memory after grub makes its other allocations, but its location will not be stable between grub versions, or in relation to other factors. So, assuming that you do indeed split the loading process as the tutorial describes, if you insist on having the kernel at any particular location, you are only left with the possibility of manually moving it there.

Anyway, I am not sure that all the suffering is worth it, considering that Linux itself follows the memory map layout that you linked to rather flexibly. For example, if you request module alignment by raising the respective flag in the multiboot header, the kernel module will be loaded page-aligned. Then it can be mapped to the virtual address it was linked against, no matter where exactly it was loaded.


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 3:24 pm 
Member

Joined: Wed May 02, 2018 1:26 pm
Posts: 55
simeonz wrote:
First, you will notice that changing the base address causes the .text section to be pre-padded with a sizeable gap after the elf header. This is due to the 2MB alignment constraint of the loadable segment. It is actually not a bad thing if you intend to map the executable using large pages (which will improve the TLB efficiency), but you could disable it with the "-nmagic" ld option if you wanted to avoid having a gap for now. (Edit: Especially considering that this is not actually the kernel in the context of the tutorial.)

I will look into that topic and the -nmagic flag.

simeonz wrote:
The more important issue is that, assuming you have been following the tutorial strictly, you are loading the kernel as a separate grub module. As such, its load address cannot be controlled. It will be loaded somewhere in free memory after grub makes other allocations, but its location will not be stable between grub versions or in relation to other factors. So, assuming that you do indeed split the loading process as the tutorial describes, if you insist having the kernel at any particular location, you are only left with the possibility of manually moving it there.

Anyway. I am not sure that all the suffering is worth it, considering that linux itself is following the memory map layout that you linked to rather flexibly. For example, assuming that you request module alignment by raising the respective flag in the multiboot header, the kernel will be loaded page-aligned. Then it can be mapped to the virtual address it was linked against, no matter where exactly it was loaded.

That's what I thought, so I don't really have any influence over the location the kernel is loaded to. Okay, but considering my current approach, mapping the kernel at the (as of now, only working) address 0xffffffff80200000 would still be fine, since it is closest to the Linux mapping in terms of address range (0xffffffff80200000 lies between 0xffffffff80000000 and 0xffffffff9fffffff, the Linux kernel text mapping). The 512 MiB used there are probably due to the size of the Linux kernel, so my as-of-now very small kernel of less than 100 KiB won't need 512 MiB to be mapped.


 Post subject: Re: Kernel mapping
PostPosted: Mon Jun 25, 2018 3:54 pm 
Member

Joined: Fri Aug 19, 2016 10:28 pm
Posts: 360
CRoemheld wrote:
Okay, but considering my current approach, mapping the kernel at the (as of now only working) address 0xffffffff80200000 would still be fine, since it is the closest to the linux mapping regarding address range (0xffffffff80200000 is between 0xffffffff80000000 and 0xffffffff9fffffff from the linux kernel text mapping). The 512 MiB used there are probably because of the size of the linux kernel, so my as of now very small kernel with less than 100KiB won't need 512MiB to be mapped.
Yes. In fact, Linux does not map the entire 512MB either. Doing so would only make wild pointers harder to debug.

As I said, Linux has multiple configuration options. Both the physical and the virtual address of the image can vary. If you want to match the default configuration, then you have to match the base image address set in the linker script here, a macro constant defined here as the sum of the virtual region base and the physical load address. The load address (unless overridden dynamically) is set by the static configuration option, whose default value is 0x1000000. The virtual region for the image starts at the address defined here, 0xffffffff80000000. So, in summary, if you want to match the default kernel configuration, you should set your kernel base VMA to 0xffffffff81000000.

