x86-64 Paging - Just to be sure

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

eryjus wrote:

If I am reading and understanding your concern correctly, you are having trouble understanding recursive mapping. Check out this wiki section: http://wiki.osdev.org/Page_Tables#Recursive_mapping. You can also search the forum (I know I had a post myself on the topic from several years ago). The key here is that you only need a single recursive map at the top level (where the PML4 has an entry that points back to itself) and the rest of the math works out. You do not have to map each table.

Taking the last entry in the PML4 is the easiest in my opinion to understand. Sit down with a piece of paper and prove to yourself that you can reach all of your table structures when you set PML4[511] = PML4. Use that as a fact as you traverse your tables. It's elegant, but that also means it can take some time to get your head around what is really happening.

Okay, so PML4T[511] -> points to PML4T[0] as a fake PDPT -> points to PDPT[0] as a fake PD -> points to PD[0] as a fake PT -> points to PT[0] as a fake beginning of a page, right? Wouldn't that also mean that I can't access PML4T to PD? Because their entries point to another table?

Wait, maybe I'm getting it: if I use the highest index for all tables, the walk will loop through the PML4T 4 times and land at PML4T[0] as page[0]... Right?
And because each page has to be 4K aligned, I can set the offset to 4K and get PML4T[1] as page[1], right?

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

I updated my code. I map 512 * 1GB pages for the kernel so virtual addresses are the same as physical addresses below 512GB. I'll then make similar constructs for processes, with the difference that those will get 32 PDPT pages which point to a PD with ~70 entries where the first one only points to a PT with ~400 entries (all together should need ~2MiB RAM). Or I might directly do a PD with ~400 entries. If I reach 400 entries, the first PD entry (or PDPT entry, depending on what I'll do) will be converted into a pointer to a 2MiB page and the second one will then point to the PT again. If I reach 60 PD entries, I'll drop the 4K tables and only map 2MiB pages, and I'll also extend the number of PD entries to ~400. If I reach 400 PD entries, same principle as above. But I am still unsure if I should use 4K pages as the standard. Maybe I'll add a parameter to loaded programs that indicates the predicted amount of RAM required and then select the page size accordingly. Maybe it would be better if I draw that and post it.

My Code (it works okay? XD):

Code:

#include <mm/vmm.hpp>
#include <common.hpp>

#define PL2E  512 //512 --> 512 * 2MB Pages per PL3 Entry --> 512 * 32 * 2MB Pages per Process
#define PL3E  32  //32  --> 32 GB max per process
#define PL3KE 512 //512 --> 512 GB address space for the kernel
#define PL4E  512 //512 --> PID of the running process --> 512 processes possible without adding another PML4T

#define SIZE2M 0x000200000
#define SIZE1G 0x040000000

#define PG_PRESENT      0x1
#define PG_WRITABLE      0x2
#define PG_USER         0x4
#define PG_BIG         0x80
#define PG_NO_EXEC      0x8000000000000000
#define PG_ADDR_MASK   0xFFFFFFFFFF000

#define Proc_Kernel_VMA   0x000003E0000000 //The kernels base addres in each Process

#define PL4P   0x7E00000 //Position of the global PML4T

#define ALIGN   512 //Entries per table. Each table has to be 4K aligned and each entry is a uint64_t (8 bytes) -> 8 bytes * 512 = 4096

namespace Paging
{
   uint64_t *PL4 = (uint64_t *) PL4P;
   uint64_t *PL3 = (uint64_t *) (PL4P + 8 * ALIGN); //The next table starts 4K (one full table) after the PML4T
   void init(void)
   {
      //Recursive entry. The index already counts 8-byte entries, so it must not be scaled by ALIGN.
      PL4[PL4E - 1] = (((uint64_t) PL4) | PG_WRITABLE | PG_PRESENT);
      PL4[0] = (((uint64_t) PL3) | PG_WRITABLE | PG_PRESENT);
      for (uint64_t i = 0; i < PL3KE; i++)
      {
         //Identity-map the i-th 1GB page (PDPT entries are consecutive, 8 bytes apart)
         PL3[i] = (((uint64_t) i * SIZE1G) | PG_WRITABLE | PG_PRESENT | PG_BIG);
      }
      SetCR3((uint64_t) PL4);
      //asm volatile("mov %0, %%cr3" : : "r" (PL4));
   }
}

eryjus · **Posted:** Sun May 21, 2017 10:23 am

MathiLpHD wrote:

PML4T[511] -> points to PML4T[0]

Almost. PML4T[511] points to the address of the table PML4T, where you can then index that table as you would any other table.

Let's decompose one to illustrate. I want to allocate a new frame for the page at canonical address 0x0000 5566 7788 9000. By the way, I am checking my work against table 4-2 in the Intel Software Developer's Manual as I write this -- it's a good reference for you to have on hand. Also, note the slight difference between traversing this address to access its page versus traversing these tables to MANAGE its page.

* We are managing a page and assigning a frame to a page, so we will select PML4T[511]. This is not dependent on the address we are managing. This is binary 0b1 1111 1111.
* Next bits 47:39 (the PML4 entry number for access) are used to select which PDPT entry we will select for MANAGEMENT. The result is 0b0 1010 1010.
* And then bits 38:30 (the PDPT entry number for access) are used to select which PD entry we will select for MANAGEMENT. The result is 0b1 1001 1001.
* Next, bits 29:21 (the PD entry number for access) are used to select which PT entry we will select for MANAGEMENT. The result is 0b1 1011 1100. Also, note we now have a page.
* Finally we use bits 20:12 (the PT entry number for access) as an offset into the resulting PAGE we now have for MANAGEMENT. The result is 0b0 1000 1001. You will use this index to MANAGE your Page Table Entry for address 0x0000 5566 7788 9000.

The resulting address you used to get to that page table entry was:

Code:

0b1111 1111 1111 1111 1111 1111 1010 1010 1011 0011 0011 1011 1100 0000 0000 0000
---  or ---
0xffff ffaa b33b c000

And then take entry 0x089. You should be able to tie the bit pattern back above, remembering the most significant 16 bits are for canonical address bit extension.
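The decomposition above can be checked mechanically. The following is only a minimal sketch, assuming 4-level paging with 4 KiB pages and the recursive entry in PML4T[511]; the names `split`, `compose` and `pt_page_for` are made up for illustration and do not come from any real kernel.

```cpp
#include <cstdint>

// Hypothetical helper type: the four 9-bit table indices of a virtual address.
struct PageIndices {
    unsigned pml4, pdpt, pd, pt;
};

// Split a virtual address into its table indices (bits 47:39, 38:30, 29:21, 20:12).
constexpr PageIndices split(std::uint64_t vaddr) {
    return PageIndices{
        static_cast<unsigned>((vaddr >> 39) & 0x1FF),
        static_cast<unsigned>((vaddr >> 30) & 0x1FF),
        static_cast<unsigned>((vaddr >> 21) & 0x1FF),
        static_cast<unsigned>((vaddr >> 12) & 0x1FF),
    };
}

// Build a canonical address from four indices plus a 12-bit byte offset.
constexpr std::uint64_t compose(unsigned a, unsigned b, unsigned c, unsigned d,
                                unsigned offset = 0) {
    std::uint64_t v = (static_cast<std::uint64_t>(a) << 39) |
                      (static_cast<std::uint64_t>(b) << 30) |
                      (static_cast<std::uint64_t>(c) << 21) |
                      (static_cast<std::uint64_t>(d) << 12) | offset;
    // Copy bit 47 into bits 63:48 so the address is canonical.
    return (v & (1ULL << 47)) ? (v | 0xFFFF000000000000ULL) : v;
}

constexpr unsigned kRecursiveSlot = 511;  // assumption: PML4T[511] = PML4T

// Virtual address at which the page table that maps vaddr appears as a page.
constexpr std::uint64_t pt_page_for(std::uint64_t vaddr) {
    const PageIndices i = split(vaddr);
    return compose(kRecursiveSlot, i.pml4, i.pdpt, i.pd);
}
```

For 0x0000 5566 7788 9000 this gives the indices 0x0AA, 0x199, 0x1BC and 0x089, and `pt_page_for` yields 0xffff ffaa b33b c000. Entry 0x089 then sits 0x089 * 8 bytes into that page.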

OK, so what if you wanted to manage the PD for that same address? You would use PML4T[511] and PDPT[511] and then use bits 0b0 1010 1010 and 0b1 1001 1001 to complete the traversal to a Page, and then use bits 0b1 1011 1100 as the offset into that page. Note that you are discarding bits, which conceptually makes sense because you are managing a higher-level table. The higher you want to go in your structures, the more you use PML4T[511] and the more bits you ignore. I will leave it to you as an exercise to determine this address and offset like I did above, and consider how to manage the PDP and PML4 tables using the same method.

Also, I encourage you to stick with this topic until you understand it properly. It will help down the road.

EDIT: Your post hit as I was writing mine. I'm going to reiterate what others have said. Start with 4K pages -- it makes life easier believe it or not. Then if you want to mix page sizes down the road you can once you absolutely understand the concept.

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

0b 0101 0101 0 - PML4T -> 0x0AA -> 170
0b 1100 1100 1 - PDPT -> 0x199 -> 409
0b 1101 1110 0 - PD -> 0x1BC -> 444
0b 0100 0100 1 - PT -> 0x089 -> 137

Virtual Address of PD: 511:511:170:409:444 -> 0xFFFF FFFF D559 9DE0 -> 0b1111 1111 1111 1111 1111 1111 1111 1111 1101 0101 0101 1001 1001 1101 1110 0000
Correct?

eryjus · **Posted:** Sun May 21, 2017 4:21 pm

Nice.

Now, pretend your PML4 entry does not exist (p == 0), what address would you use to manage that entry? Oh by the way, if your PML4 entry has p == 0, then you do not have a PDP Table, and once you get that established, you will need to update the proper entry in the PDPT... what is its address? You've already done the PD, and I took care of the PT for you.

Once you understand all of this, write some simple functions to take an address and return the address for each level of the 4 tables. Test the heck out of them with a simple user-space program to write the output based on an address you provide before you copy them into your kernel. Test lots of addresses. Check the output against your hand work and against the Paging wiki already mentioned (there are address ranges you should expect to see in your output).

These functions will become part of the foundation of your Virtual Memory Manager for 4K pages. Make sure you have defined what types of things you are going to put in what address ranges. It will make your life easier. I have been careful to distinguish between a page/address (VMM-related) versus a frame (PMM-related), it helps to keep this distinction in mind because different processes with 2 different Paging trees can have different data stored at the exact same address (backed by different frames of course).
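As a starting point for such functions, here is a minimal sketch that uses shifts and masks instead of rebuilding the address index by index. It assumes the recursive entry lives in PML4T[511] and 4 KiB pages only; the function names are invented for this example. Each function returns the virtual address of the entry (not the table) that describes `v` at that level.

```cpp
#include <cstdint>

// Each level drops 9 more address bits and gains one more "511" prefix.
// The masks keep the result 8-byte aligned, since every entry is 8 bytes.

// Page Table entry for v: lives in 0xFFFFFF8000000000 and up.
constexpr std::uint64_t pte_addr(std::uint64_t v) {
    return 0xFFFFFF8000000000ULL | ((v >> 9) & 0x0000007FFFFFFFF8ULL);
}

// Page Directory entry for v: lives in 0xFFFFFFFFC0000000 and up.
constexpr std::uint64_t pde_addr(std::uint64_t v) {
    return 0xFFFFFFFFC0000000ULL | ((v >> 18) & 0x000000003FFFFFF8ULL);
}

// PDPT entry for v: lives in 0xFFFFFFFFFFE00000 and up.
constexpr std::uint64_t pdpte_addr(std::uint64_t v) {
    return 0xFFFFFFFFFFE00000ULL | ((v >> 27) & 0x00000000001FFFF8ULL);
}

// PML4 entry for v: lives in the last page, 0xFFFFFFFFFFFFF000 and up.
constexpr std::uint64_t pml4e_addr(std::uint64_t v) {
    return 0xFFFFFFFFFFFFF000ULL | ((v >> 36) & 0x0000000000000FF8ULL);
}
```

In a user-space test program these can be printed for many inputs and compared with hand-worked results. For 0x0000 5566 7788 9000 the PD entry comes out as 0xFFFF FFFF D559 9DE0, which matches the address worked out by hand earlier in the thread.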

From here it gets a bit more complicated for larger page sizes, but the concepts are still the same. If you really need to go down this path with mixed page sizes (and no one is recommending this right now), make it easy on yourself and have 2 separate frame allocators (1 for small 4K frames and 1 for large frames -- in essence 1 allocator for each frame size) and make each allocator manage frames in separate physical frame ranges that do not cross into each other's space.
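To make the "one allocator per frame size, each with its own physical range" idea concrete, here is a deliberately simple sketch. It is a bump allocator that never frees, and the class name, the sentinel and the physical ranges are all invented for illustration; a real PMM would take its ranges from the memory map.

```cpp
#include <cstdint>

// Sentinel for "no frame left". Plain 0 is avoided because frame 0 is a valid frame.
constexpr std::uint64_t kNoFrame = ~0ULL;

// Hypothetical bump allocator that hands out fixed-size frames from one range.
class FrameAllocator {
public:
    constexpr FrameAllocator(std::uint64_t base, std::uint64_t limit,
                             std::uint64_t frame_size)
        : base_(base), limit_(limit), next_(base), frame_size_(frame_size) {}

    // Returns the physical address of the next free frame, or kNoFrame.
    constexpr std::uint64_t alloc() {
        if (next_ + frame_size_ > limit_) return kNoFrame;
        const std::uint64_t frame = next_;
        next_ += frame_size_;
        return frame;
    }

    // True if addr lies inside the range this allocator manages.
    constexpr bool owns(std::uint64_t addr) const {
        return addr >= base_ && addr < limit_;
    }

private:
    std::uint64_t base_, limit_, next_, frame_size_;
};

// Two allocators over ranges that do not overlap (example ranges only).
constexpr bool ranges_stay_disjoint() {
    FrameAllocator small_frames(0x00100000, 0x00200000, 0x1000);    // 4 KiB frames
    FrameAllocator large_frames(0x00200000, 0x00A00000, 0x200000);  // 2 MiB frames
    const std::uint64_t a = small_frames.alloc();
    const std::uint64_t b = large_frames.alloc();
    return small_frames.owns(a) && !large_frames.owns(a) &&
           large_frames.owns(b) && !small_frames.owns(b);
}
```

Because the ranges never cross, a 4 KiB allocation can never punch a hole into a region that a 2 MiB frame needs to be contiguous and aligned.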

Finally, since you are only 2 weeks into your new hobby, I will offer this bit of additional advice: Hobby OS development is far more a research project than it is a coding project.

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

I assume you mean the case PML4T[0] = 0? Then the address to manage PML4T[0] is 511:511:511:511:0 (in dotted decimal #Networking :mrgreen:) and for PML4T[1] it would be 511:511:511:511:1 (you would have to fill the leading 2 bytes with 1s in both cases for canonical addresses and the last 3 bits with 0s). I'll do the other things as homework :mrgreen:

Well thanks a lot to all of you! =D>

eryjus · **Posted:** Sun May 21, 2017 6:52 pm

No problem.

Just to clear something up, when I say, "p == 0," I am referring to the present bit in the table entry.

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Well is PML4T[0] == 0 not right then? That means that the pointer that PML4T entry zero points at is zero, doesn't it? The only other option would be if CR3 is 0, but then there wouldn't be a PML4T at all and I would get a Triple Fault :mrgreen:

Another question: How do I handle interrupts in a program? Because my OS is currently mapped to its physical address. In a program, it will be mapped to the highest PDPT entry (=511). Now if an interrupt happens, the CPU will try to jump to the handler address from the IDT, and that address is interpreted in the virtual address space of the program. Do I have to map the OS to the same address in "kernel mode" and "user mode"?

eryjus · **Posted:** Mon May 22, 2017 9:01 am

MathiLpHD wrote:

Well is PML4T[0] == 0 not right then? That means that the pointer that PML4T entry zero points at is zero, doesn't it?

If PML4T[0] points to Frame 0, that is fine. My call-out is to distinguish between pointing to frame 0 and not being set. That is why you would check the present bit.
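A small sketch of that distinction, reusing the `PG_PRESENT` and `PG_ADDR_MASK` values from the code posted earlier in the thread (the two helper functions are hypothetical names):

```cpp
#include <cstdint>

constexpr std::uint64_t PG_PRESENT   = 0x1;
constexpr std::uint64_t PG_ADDR_MASK = 0x000FFFFFFFFFF000ULL;  // bits 51:12

// An entry is usable only if its present bit is set, whatever frame it names.
constexpr bool is_present(std::uint64_t entry) {
    return (entry & PG_PRESENT) != 0;
}

// Physical frame address stored in the entry (only meaningful if present).
constexpr std::uint64_t frame_of(std::uint64_t entry) {
    return entry & PG_ADDR_MASK;
}
```

An entry with the value 0x3 is present and points to frame 0, while an entry with the value 0x1000 names a frame but is not present, so comparing the whole entry or its address field against zero answers the wrong question.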

MathiLpHD wrote:

In a program, it will be mapped to the highest PDPT entry (=511).

That's going to depend on which PML4 Entry you use -- as long as you are not using PML4T[511], PDPT[511] then you should be fine.

MathiLpHD wrote:

Do I have to map the OS to the same address in "kernel mode" and "user mode"?

This becomes a design decision. You will want to make sure you consider performance.

On one hand, you can map all your kernel space for each process. Then, when you call the kernel, you will have all your structures and code available already within the Paging tree. If you keep all of your kernel space mapped to the same PML4 Entries (for illustration, say 500 - 510), then you can reuse those frames for each additional PML4 table. Keeping this split exclusively at the highest level makes the entire "copy the kernel paging tables for my new user process" step as simple as pointing those entries of the user PML4T at the already existing frames of the kernel's lower-level tables, and the rest of the tree follows.
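A minimal sketch of that sharing step, using the illustrative range 500 - 510 from the paragraph above. The tables are treated as plain arrays here so the logic can be tried in user space; the function and constant names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kEntries     = 512;
constexpr std::size_t kKernelFirst = 500;  // illustrative range only
constexpr std::size_t kKernelLast  = 510;

// Copy the kernel's top-level entries into a new process PML4. Only the
// entries are copied; the PDPTs, PDs and PTs below them are shared as-is.
constexpr void share_kernel_entries(const std::uint64_t* kernel_pml4,
                                    std::uint64_t* user_pml4) {
    for (std::size_t i = kKernelFirst; i <= kKernelLast; ++i) {
        user_pml4[i] = kernel_pml4[i];
    }
    // Slot 511 is left alone on purpose: with recursive mapping it has to
    // point at the frame of the new PML4 itself, not at the kernel's PML4.
}

// User-space check with fake entry values (frame address | present | writable).
constexpr bool demo_share() {
    std::uint64_t kernel[kEntries] = {};
    std::uint64_t user[kEntries]   = {};
    kernel[0]   = 0x1000 | 0x3;  // outside the shared range
    kernel[500] = 0x5000 | 0x3;
    kernel[510] = 0x9000 | 0x3;
    share_kernel_entries(kernel, user);
    return user[500] == kernel[500] && user[510] == kernel[510] &&
           user[0] == 0 && user[511] == 0;
}
```

Since every process ends up pointing at the same lower-level kernel tables, a later change inside those tables is visible in all address spaces without touching each PML4 again.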

On the other hand, you can define a "landing zone" for your system calls that is available for all your processes and keep the kernel out of the paging tree. Then when your OS call lands, it will replace the cr3 with a kernel version (which must also include this landing zone) and then restore the cr3 on the way back out. Just keep in mind that writing to the cr3 register invalidates the Translation Lookaside Buffer (TLB) and doing that when it is not necessary drags performance down. This will happen enough times naturally during a task swap. This is probably a bad idea, but if you want to do this make sure you know what is happening and that the consequence is acceptable to the goals for your OS.

So, do you have your memory space mapped out yet for your OS? If not, this would be a really good time to do that (and play with the map with your Paging Tables test program to see what patterns you can find -- and adjust your memory map to suit your needs). It will become part of a road map for all the rest of your OS development.

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Well wouldn't I also have to include the IDT and GDT? Or is the CPU able to change the CR3 when an interrupt happens?

So this also means that I have to map my kernel in its own paging structure to the same address as I do in userspace paging structures? So I can't map my kernel in the KPS (=kernel paging structure) to 0x0 and in userspace programs to 0x7F 8000 0000 (= 510th PDPT entry), right?

And you can see the current progress on my OS at https://github.com/MathiLpHD/MOS under system/kernel/src/
I have to say that I copied some code (which is not even in use currently except for multiboot.h/.c) from https://github.com/grahamedgecombe/arc because I needed some orientation and because most things I found on the internet were for x86, and x86-64 changes a LOT of things... especially if you thought you could develop your own OS without reading the manuals.

EDIT: And if you are wondering what I am currently working on: I am trying to somehow include dlmalloc into my system.

LtG · **Joined:** Thu Aug 13, 2015 4:57 pm **Posts:** 384

MathiLpHD wrote:

Well wouldn't I also have to include the IDT and GDT? Or is the CPU able to change the CR3 when an interrupt happens?

In practice yes, you will have to map IDT into every address space. Task gates _could_ be used in PM, and will cause CR3 to be changed, but since nobody uses hardware task switching it's really not worth it.

GDT is similar in practice, you will want it in every address space. I think it might be possible to circumvent that, but there's really no point. The IDT and GDT can be the same (and usually are) for every process, which means you can map the same underlying physical memory to each process, so there's really no extra overhead. You can also put them together with your kernel, which means you don't need extra paging resources for them either, and as your kernel is mapped to every process so are the GDT and IDT.

eryjus wrote:

Then when your OS call lands, it will replace the cr3 with a kernel version (which must also include this landing zone) and then restore the cr3 on the way back out. Just keep in mind that writing to the cr3 register invalidates the Translation Lookaside Buffer (TLB) and doing that when it is not necessary drags performance down.

Just wanted to point out that Intel does support PCIDs (ASIDs) so the TLB doesn't have to be flushed. Also I would imagine that TLB refills normally go through the regular caches, so you wouldn't have to go all the way to RAM unless the app has gone through a lot of data (MiBs).

Sorry to go a bit OT, but does anyone know why AMD doesn't support PCIDs? At least I can't remember seeing anything about it in their manuals, though I've not read them as much.
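For reference, a rough sketch of how a CR3 value is put together when PCIDs are in use on Intel. It assumes CR4.PCIDE has already been enabled, in which case CR3 bits 11:0 carry the PCID and setting bit 63 of the value written asks the CPU not to invalidate that PCID's cached translations. The function name is invented, and actually loading the value into CR3 is left out because that only works in ring 0.

```cpp
#include <cstdint>

constexpr std::uint64_t kCr3NoFlush  = 1ULL << 63;  // only valid with CR4.PCIDE = 1
constexpr std::uint64_t kCr3PcidMask = 0xFFF;       // bits 11:0
constexpr std::uint64_t kCr3AddrMask = 0x000FFFFFFFFFF000ULL;

// Compose the value to write to CR3 for a given PML4 frame and PCID.
constexpr std::uint64_t make_cr3(std::uint64_t pml4_phys, std::uint16_t pcid,
                                 bool keep_tlb) {
    return (pml4_phys & kCr3AddrMask) | (pcid & kCr3PcidMask) |
           (keep_tlb ? kCr3NoFlush : 0);
}
```

With PCIDs disabled, bit 63 must stay clear and the low bits keep their usual meaning, so this helper would not apply as written.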
