Page table manipulation

songziming
Member
Posts: 69
Joined: Fri Jun 28, 2013 1:48 am

Page table manipulation

Post by songziming »

I'm writing a (64-bit mode) page table manipulation module. The interface is very simple:

Code:

uint64_t mmu_create_table(void);
void mmu_free_table(uint64_t cr3);
uint64_t mmu_translate(uint64_t cr3, uint64_t va, uint64_t *attrs);
void mmu_map(uint64_t cr3, uint64_t va, uint64_t va_end, uint64_t pa, uint64_t attrs);
void mmu_unmap(uint64_t cr3, uint64_t va, uint64_t va_end);
But I find the implementation more difficult than I expected. There are many details and caveats.
  • mmu_map can overwrite an existing mapping.
    I use 2M and 1G pages when possible, in order to use less memory. When remapping a sub-region of a 2M large page, the rest of that page has to be remapped using 4K pages.
    If 512 contiguous 4K pages also map to contiguous physical pages and have the same attributes, they can be replaced with a single 2M entry.
That makes the code very complex: I have to write map/unmap functions for each level of the page table.
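
To make the merge condition above concrete, here is a minimal sketch of the check, not code from the project. The function name `pt_can_merge_to_2m` and the macro names are made up for illustration. It assumes the standard x86-64 4K PTE layout (frame address in bits 12..51, Accessed in bit 5, Dirty in bit 6) and ignores the PAT bit, which sits at a different position in 4K and 2M entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_ACCESSED  (1ULL << 5)
#define PTE_DIRTY     (1ULL << 6)
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 12..51: frame address */
#define SIZE_4K       0x1000ULL
#define SIZE_2M       0x200000ULL

/* Hypothetical helper: true if the 512 entries of one 4K page table are all
 * present, share the same attribute bits (Accessed/Dirty ignored, since the
 * CPU sets them per page), and map one physically contiguous, 2M-aligned
 * range. Only then could the table be replaced by a single 2M entry. */
bool pt_can_merge_to_2m(const uint64_t pt[512])
{
    const uint64_t ignore = PTE_ACCESSED | PTE_DIRTY;
    uint64_t base  = pt[0] & PTE_ADDR_MASK;
    uint64_t attrs = pt[0] & ~PTE_ADDR_MASK & ~ignore;

    if (!(pt[0] & PTE_PRESENT))
        return false;
    if (base & (SIZE_2M - 1))
        return false; /* physical base must be 2M-aligned */

    for (unsigned i = 1; i < 512; i++) {
        if ((pt[i] & ~PTE_ADDR_MASK & ~ignore) != attrs)
            return false; /* attributes differ (this also catches non-present) */
        if ((pt[i] & PTE_ADDR_MASK) != base + i * SIZE_4K)
            return false; /* not physically contiguous */
    }
    return true;
}
```

Running this after every map or unmap call is exactly the extra work the question is worried about; the check itself is short, but it has to be repeated at the 1G level as well.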

I want to support demand paging, growable stack/heap, dynamic linking, etc. So page table manipulation is required.

Is that the right way to do paging? Am I making it too bloated?

Thanks
Reinventing the Wheel, code: https://github.com/songziming/wheel
nullplan
Member
Posts: 1760
Joined: Wed Aug 30, 2017 8:24 am

Re: Page table manipulation

Post by nullplan »

I'd go with something like the Linux interface:

Code:

enum pagesize {PS_4K, PS_2M, PS_1G};
uint64_t mmu_lookup(uint64_t cr3, uint64_t va, enum pagesize *ps);
int mmu_smash_page(uint64_t *pte, uint64_t attrs);
Don't bother with joining contiguous pages up again. That's just complexity you don't need in your life. The only thing you save by doing so is one page, and this is hardly worth the effort. What you do is, when mapping pages, use the largest page size possible. When modifying the attributes of a region, you look up the responsible page table entry. If you only want to change part of the region and the entry is for a huge page, you smash the PTE: allocate a new page table, fill it with entries for contiguous smaller pages that cover the same range, then replace the huge-page entry with a pointer to the new table. This way, at worst you need to smash twice (when going from a 1G page to a 4K page) and allocate 2 pages.
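
As a rough illustration of the two steps described above (pick the largest page size when mapping, smash a huge page when only part of it changes), here is a sketch under stated assumptions. It reuses the `enum pagesize` from the interface above; `pick_page_size` and `smash_2m_page` are hypothetical names, and the signature differs from `mmu_smash_page` because the caller supplies the new table and its physical address so the example does not need an allocator. It assumes the PAT bit is unused and leaves TLB invalidation to the caller.

```c
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITE     (1ULL << 1)
#define PTE_USER      (1ULL << 2)
#define PDE_PS        (1ULL << 7) /* set in a PDE that maps a 2M page */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL
#define SIZE_4K       0x1000ULL
#define SIZE_2M       0x200000ULL
#define SIZE_1G       0x40000000ULL

enum pagesize {PS_4K, PS_2M, PS_1G};

/* Largest page usable at this point of a mapping: va and pa must both be
 * aligned to the page size and at least one full page must remain. */
enum pagesize pick_page_size(uint64_t va, uint64_t pa, uint64_t len)
{
    if (((va | pa) & (SIZE_1G - 1)) == 0 && len >= SIZE_1G)
        return PS_1G;
    if (((va | pa) & (SIZE_2M - 1)) == 0 && len >= SIZE_2M)
        return PS_2M;
    return PS_4K;
}

/* Split one 2M mapping into 512 4K mappings with the same attributes.
 * new_pt is the (zeroed) table to fill, new_pt_phys its physical address.
 * Returns 0 on success, -1 if *pde is not a present 2M mapping. */
int smash_2m_page(uint64_t *pde, uint64_t *new_pt, uint64_t new_pt_phys)
{
    uint64_t old = *pde;
    if (!(old & PTE_PRESENT) || !(old & PDE_PS))
        return -1;

    uint64_t base  = old & PTE_ADDR_MASK & ~(SIZE_2M - 1);
    uint64_t attrs = old & ~PTE_ADDR_MASK & ~PDE_PS; /* low flags and NX */

    for (unsigned i = 0; i < 512; i++)
        new_pt[i] = (base + i * SIZE_4K) | attrs;

    /* The intermediate entry stays permissive; the leaf PTEs carry the
     * real permissions. */
    *pde = new_pt_phys | PTE_PRESENT | PTE_WRITE | (old & PTE_USER);
    return 0;
}
```

After the smash, the attribute change can be applied to just the affected 4K entries. A 1G page would be split the same way one level up, which gives the "smash twice, allocate 2 pages" worst case.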

Another thing you should think about is page table sharing. Having no sharing means a lot of waste. The easiest would be to have the kernel-space side shared and the user-space side all private for each process. That does mean that you need all functions twice, but since all processes share the kernel-side mappings, it should be worth it.
Carpe diem!
songziming
Member
Posts: 69
Joined: Fri Jun 28, 2013 1:48 am

Re: Page table manipulation

Post by songziming »

Thanks, that makes sense.

I have a `struct page` for each physical page, in which `ent_num` counts the number of present entries. When `ent_num` becomes zero, I can free the page for that table.

As for page table sharing, I'd just mark kernel page entries as global, so reloading cr3 won't flush kernel mappings.
nullplan
Member
Posts: 1760
Joined: Wed Aug 30, 2017 8:24 am

Re: Page table manipulation

Post by nullplan »

songziming wrote: I have a `struct page` for each physical page, in which `ent_num` counts the number of present entries. When `ent_num` becomes zero, I can free the page for that table.
That is not the same as unifying pages into a huge page where it becomes possible. When you do that, you have to invalidate up to 512 TLB entries, whereas if you smash a page, you only have to invalidate one. For the most part, I would start out using the largest page size possible for the given region, but not unify back. Only free the entire user-side hierarchy after the process exits.
songziming wrote: As for page table sharing, I'd just mark kernel page entries as global, so reloading cr3 won't flush kernel mappings.
But each process still has its own copies of the kernel-side page tables. This wastes space and causes more communication overhead. Whenever you allocate a new kernel-side page, you now have to write it into the kernel master page table, and then every kernel thread that accesses it will take a page fault and have to copy the mapping. Whereas if you made all 256 kernel-side entries of the PML4 point to the same physical memory, this wouldn't be necessary. And I don't even know how you would unmap kernel memory. You'd need to remove the mapping from every process.
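
One way to read the suggestion above, sketched under assumptions: the kernel fills all 256 upper-half slots of its master PML4 with PDPTs once at boot, so those 256 entries never change afterwards, and every new address space simply copies them. `pml4_init_from_kernel` is a hypothetical name, and the tables are passed as plain arrays so the sketch can be run outside a kernel.

```c
#include <stdint.h>
#include <string.h>

#define PML4_ENTRIES 512
#define KERNEL_FIRST 256 /* entries 256..511 cover the higher half */

/* Initialise a fresh PML4: empty user half, kernel half copied from the
 * master table. Because the copied entries point at the same PDPTs, any
 * later change below the PML4 level is visible in every address space
 * without further copying. */
void pml4_init_from_kernel(uint64_t *new_pml4, const uint64_t *kernel_pml4)
{
    memset(new_pml4, 0, KERNEL_FIRST * sizeof(uint64_t));
    memcpy(new_pml4 + KERNEL_FIRST, kernel_pml4 + KERNEL_FIRST,
           (PML4_ENTRIES - KERNEL_FIRST) * sizeof(uint64_t));
}
```

Pre-populating 256 PDPTs costs 256 x 4K = 1 MiB up front, which is the price for never having to touch another process's PML4 again.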
songziming
Member
Posts: 69
Joined: Fri Jun 28, 2013 1:48 am

Re: Page table manipulation

Post by songziming »

`struct page` is used to reclaim empty page tables, not to unify pages into huge pages.

If a page table has no present PTE, the corresponding PDE can also be invalidated, and the space for that page table can be reclaimed. The exception is kernel space tables.
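
A minimal sketch of that bookkeeping, with assumptions spelled out: `struct page` here contains only the `ent_num` counter, the helper names `pt_set` and `pt_release_if_empty` are made up, and TLB flushing and actually freeing the frame are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)

/* Reduced stand-in for the per-frame descriptor; only the counter of
 * present entries in the table stored in that frame is modelled. */
struct page {
    uint16_t ent_num;
};

/* Write one entry and keep ent_num in step with present/not-present
 * transitions. */
void pt_set(uint64_t *pt, struct page *meta, unsigned idx, uint64_t entry)
{
    bool was = (pt[idx] & PTE_PRESENT) != 0;
    bool now = (entry & PTE_PRESENT) != 0;

    pt[idx] = entry;
    if (now && !was)
        meta->ent_num++;
    if (was && !now)
        meta->ent_num--;
}

/* If the table is empty and is not a shared kernel table, clear the PDE
 * that points to it. Returns true when the caller may free the frame. */
bool pt_release_if_empty(uint64_t *pde, const struct page *meta,
                         bool is_kernel_table)
{
    if (is_kernel_table || meta->ent_num != 0)
        return false;
    *pde = 0;
    return true;
}
```

The same pair of helpers would apply one level up: clearing the PDE decrements the counter of the page directory, which may in turn become empty.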

The kernel PML4 references the kernel PDPs, and each user PML4 references the same kernel PDPs (higher half only, entries 256 to 511).

If a kernel PDP, PD, or PT is updated, all user page tables see the update automatically.

If the kernel PML4 is updated, user PML4s remain unchanged, so a page fault might happen in the kernel range. When that page fault happens, the OS updates the PML4 of the current process.
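
The lazy update on page fault could look roughly like the following. This is a sketch, not the code from the repository: `sync_kernel_pml4e` is a hypothetical name, the PML4s are passed as plain arrays, and a real handler would get the faulting address from CR2 and fall through to its normal fault handling when this returns false.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define KERNEL_FIRST 256 /* entries 256..511 cover the higher half */

/* Bits 39..47 of a virtual address select the PML4 entry. */
unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> 39) & 0x1FF);
}

/* Called from the page fault handler. If the fault hit a kernel-range
 * slot that the kernel master PML4 has but the current PML4 lacks, copy
 * the entry and report the fault as resolved. */
bool sync_kernel_pml4e(uint64_t *cur_pml4, const uint64_t *kernel_pml4,
                       uint64_t fault_va)
{
    unsigned idx = pml4_index(fault_va);

    if (idx < KERNEL_FIRST)
        return false; /* user-range fault: not our business */
    if (!(kernel_pml4[idx] & PTE_PRESENT))
        return false; /* the kernel has nothing mapped there either */
    if (cur_pml4[idx] == kernel_pml4[idx])
        return false; /* already in sync: a genuine fault */

    cur_pml4[idx] = kernel_pml4[idx];
    return true;
}
```

This only covers additions to the kernel PML4. Removing a kernel PML4 entry would still require visiting every process's PML4.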

A kernel PDP is not deleted when its count of present entries drops to zero, since it is referenced by many PML4s.