OSDev.org

The Place to Start for Operating System Developers
All times are UTC - 6 hours




 Post subject: Memory Mapping issues
PostPosted: Tue Jun 08, 2021 11:07 pm 

Joined: Sat Feb 20, 2021 3:11 pm
Posts: 93
I have been stuck on the memory manager for a long time and have fixed almost all of the bugs, but now I am stuck on one that I just can't fix. I have a few tests in the VMM that check whether different parts of it work. The third is commented out and the first two pass, but only when just one of them is enabled at a time, which is strange because they are not connected in any way.

My page table mapping algorithm: I have two page-sized areas preallocated (by C) as global arrays in my kernel. The first area holds a level 1 page table that has an entry for the second area, so by changing the address in that entry of the first area (and setting flags) I map the second area to a chosen physical address. These areas are inserted into the page tables before my kernel loads them.

My problem: The first test exercises the page table mapping function by mapping a random free page instead of a real page table, while the second tests the function that maps data into the virtual address space. If the first test isn't commented out, the second one either double faults (because of an unhandled page fault) or just hangs, because some address in it isn't aligned, which I treat as an error status that returns to an infinite loop. So kmmap() is probably receiving random data from the mapped page table, including leftover integers that it interprets as addresses, and it then fails to continue walking the page tables. That is most likely caused by an incorrect mapping from mapPageTable(). The problem gets even more strange when I understand that the second test fails not because I map something in the first one, but only when I map and then write/read from the mapped page (in the first test). The interrupt dump reports a page fault, but it is not much help, as I could have guessed by myself where the fault occurs and which address the instruction tries to access.

Also, I would appreciate any ideas for improving the code design, as my algorithm for mapping page tables after I have loaded my own page tables into CR3 is pretty clunky.


Last edited by rpio on Sun Jan 23, 2022 4:49 am, edited 1 time in total.

 
 Post subject: Re: Memory Mapping issues
PostPosted: Wed Jun 09, 2021 12:58 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5100
ngx wrote:
The problem gets even more strange when I understand that the second test fails not because I map something in the first one, but only when I map and then write/read from the mapped page(in the first test).

It sounds like you're not flushing stale TLB entries. You need a MOV to CR3 or an INVLPG to remove the stale TLB entry after you change the page tables.

Section 4.10.4 of Volume 3A of the Intel SDM explains it in great detail.
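
For illustration, the remap-then-invalidate sequence for a single temporary mapping window might look like the minimal sketch below. The names map_window, tlb_flush_page, KERNEL_BUILD and the PTE_* constants are hypothetical and not taken from the original poster's kernel. The sketch assumes 64-bit x86 paging with 4 KiB pages and GCC/Clang inline assembly; the non-kernel branch is only a counting stand-in so the logic can be exercised on a host machine.

```c
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* physical address bits 12..51 */

#if defined(KERNEL_BUILD) && defined(__x86_64__)
/* Privileged instruction: drops any TLB entry for the page containing va. */
static inline void tlb_flush_page(const volatile void *va)
{
    __asm__ volatile("invlpg (%0)" : : "r"(va) : "memory");
}
#else
/* Hosted stand-in (assumption): records the call instead of executing INVLPG. */
static uintptr_t last_flushed_va;
static unsigned flush_count;
static inline void tlb_flush_page(const volatile void *va)
{
    last_flushed_va = (uintptr_t)va;
    flush_count++;
}
#endif

/*
 * Point the window's level-1 entry at phys, then invalidate the old
 * translation so the next access through window_va walks the new entry.
 * Returns window_va on success, or a null pointer if phys is not a valid
 * 4 KiB-aligned physical address.
 */
static volatile void *map_window(volatile uint64_t *window_pte,
                                 volatile void *window_va,
                                 uint64_t phys)
{
    if (phys & ~PTE_ADDR_MASK)
        return 0;
    *window_pte = phys | PTE_PRESENT | PTE_WRITABLE;
    tlb_flush_page(window_va); /* without this, a stale entry may still be used */
    return window_va;
}
```

Reloading CR3 after the write would also discard the stale entry, but for a single window page one INVLPG is the narrower tool.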


 
 Post subject: Re: Memory Mapping issues
PostPosted: Thu Jun 10, 2021 12:12 am 

Joined: Sat Feb 20, 2021 3:11 pm
Posts: 93
Octocontrabass wrote:
ngx wrote:
The problem gets even more strange when I understand that the second test fails not because I map something in the first one, but only when I map and then write/read from the mapped page(in the first test).

It sounds like you're not flushing stale TLB entries. You need a MOV to CR3 or an INVLPG to remove the stale TLB entry after you change the page tables.

Section 4.10.4 of Volume 3A of the Intel SDM explains it in great detail.


A big thank you, I would never have guessed that the problem was stale entries not being flushed from the TLB. Now everything works.

So should I run INVLPG every time I edit the page tables? Which is better: reloading CR3, using INVLPG, or doing both?


 
 Post subject: Re: Memory Mapping issues
PostPosted: Thu Jun 10, 2021 9:55 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
You should INVLPG every time you reduce access or change the address. That is, if Present changes from 1 to 0, RW changes from 1 to 0, User/Supervisor changes from 1 to 0 (user to supervisor-only), or the address changes at all. Otherwise INVLPG is not necessary. Be aware that a stale TLB entry may cause a spurious page fault, or at least Linux has code for detecting and handling those. That is, you may get a page fault even though the access was permitted. In that case, just return. According to the manual it shouldn't happen, but I cannot imagine the Linux guys writing that special case just for fun.
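
The rule above can be written down as a small predicate. This is only a sketch: the function name and the PTE_* constants are hypothetical, and it assumes the 64-bit x86 entry layout, where the U/S bit set to 1 means the page is user-accessible. It also assumes that a not-present entry is never cached in the TLB, so changing one needs no invalidation.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_USER      (1ULL << 2)           /* 1 = user-accessible */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL /* physical address bits 12..51 */

/* Returns true if replacing old_pte with new_pte calls for an INVLPG. */
static bool pte_change_needs_invlpg(uint64_t old_pte, uint64_t new_pte)
{
    /* Assumption: nothing was cached for an entry that was not present. */
    if (!(old_pte & PTE_PRESENT))
        return false;
    /* Present 1 -> 0: the mapping is being removed. */
    if (!(new_pte & PTE_PRESENT))
        return true;
    /* RW 1 -> 0: write access is being taken away. */
    if ((old_pte & PTE_WRITABLE) && !(new_pte & PTE_WRITABLE))
        return true;
    /* U/S 1 -> 0: user access is being taken away. */
    if ((old_pte & PTE_USER) && !(new_pte & PTE_USER))
        return true;
    /* Any change of the physical address. */
    return (old_pte & PTE_ADDR_MASK) != (new_pte & PTE_ADDR_MASK);
}
```

Granting more access (for example RW going from 0 to 1) returns false here, which is the case where a stale entry can at worst produce a spurious fault rather than a wrong access.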

Which to prefer depends. INVLPG is more targeted, MOV to CR3 is more of a shotgun approach. Normally you want to use INVLPG when just changing single mappings, but MOV to CR3 can be the better option if you don't know precisely how much you are changing. For example, my kernel is booting with PML4[0] == PML4[256], and PML4[256] maps all physical memory. So I need to remove all TLBs that came from PML4[0], but I really don't feel like using INVLPG on every 2MB page in physical memory. And at that time, the Global bit is not in use. So MOV to CR3 is just the easiest way to flush all possible TLBs from the identity mapping.
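
As a rough sketch of that trade-off, a range flush can fall back to a full flush once the range gets large. Everything named here is hypothetical: struct tlb_ops, tlb_flush_range, and especially FULL_FLUSH_THRESHOLD, which is an arbitrary value that would need tuning. In a kernel, flush_page would wrap INVLPG and flush_all would rewrite CR3 (which does not drop entries marked Global); the counting functions below are hosted stand-ins for trying the logic outside a kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE            4096u
#define FULL_FLUSH_THRESHOLD 32u /* assumption: arbitrary cut-off, tune it */

/* Indirection so the privileged operations can be swapped for test doubles. */
struct tlb_ops {
    void (*flush_page)(uintptr_t va); /* kernel: INVLPG on va */
    void (*flush_all)(void);          /* kernel: reload CR3 */
};

/* Invalidate npages pages starting at start, choosing the cheaper approach. */
static void tlb_flush_range(const struct tlb_ops *ops, uintptr_t start,
                            size_t npages)
{
    if (npages > FULL_FLUSH_THRESHOLD) {
        ops->flush_all(); /* the "shotgun": one operation for everything */
        return;
    }
    for (size_t i = 0; i < npages; i++)
        ops->flush_page(start + (uintptr_t)i * PAGE_SIZE);
}

/* Hosted stand-ins that only count how often each operation is requested. */
static unsigned pages_flushed;
static unsigned full_flushes;
static void count_flush_page(uintptr_t va) { (void)va; pages_flushed++; }
static void count_flush_all(void) { full_flushes++; }
static const struct tlb_ops counting_ops = { count_flush_page, count_flush_all };
```

With these stand-ins, tlb_flush_range(&counting_ops, 0x200000, 4) requests four single-page flushes, while a 512-page range requests one full flush instead.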

_________________
Carpe diem!


 
 Post subject: Re: Memory Mapping issues
PostPosted: Thu Jun 10, 2021 11:21 am 

Joined: Sat Feb 20, 2021 3:11 pm
Posts: 93
nullplan wrote:
You should INVLPG every time you reduce access or change the address. That is, if Present changes from 1 to 0, RW changes from 1 to 0, User/Supervisor changes from 1 to 0 (user to supervisor-only), or the address changes at all. Otherwise INVLPG is not necessary. Be aware that a stale TLB entry may cause a spurious page fault, or at least Linux has code for detecting and handling those. That is, you may get a page fault even though the access was permitted. In that case, just return. According to the manual it shouldn't happen, but I cannot imagine the Linux guys writing that special case just for fun.

Which to prefer depends. INVLPG is more targeted, MOV to CR3 is more of a shotgun approach. Normally you want to use INVLPG when just changing single mappings, but MOV to CR3 can be the better option if you don't know precisely how much you are changing. For example, my kernel is booting with PML4[0] == PML4[256], and PML4[256] maps all physical memory. So I need to remove all TLBs that came from PML4[0], but I really don't feel like using INVLPG on every 2MB page in physical memory. And at that time, the Global bit is not in use. So MOV to CR3 is just the easiest way to flush all possible TLBs from the identity mapping.


I understood, thanks


 
 Post subject: Re: Memory Mapping issues
PostPosted: Fri Jun 11, 2021 3:02 pm 

Joined: Thu May 17, 2007 1:27 pm
Posts: 999
nullplan wrote:
Be aware that a stale TLB entry may cause a spurious page fault, or at least Linux has code for detecting and handling those. That is, you may get a page fault even though the access was permitted. In that case, just return. According to the manual it shouldn't happen, but I cannot imagine the Linux guys writing that special case just for fun.

On an SMP system, that can always happen if another CPU concurrently mapped a page (assuming that you let threads share page tables). Maybe the Linux code is meant to cover this case?

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 
 Post subject: Re: Memory Mapping issues
PostPosted: Fri Jun 11, 2021 5:38 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5100
The Intel SDM volume 3A section 4.10.4.3 explains how stale TLB entries can cause spurious page faults. The Linux code I found for handling spurious page faults has a comment directly referencing that section of the manual.

I recall reading somewhere that some obscure x86 CPUs can have spurious page faults caused by stale TLB entries that aren't possible on Intel or AMD CPUs, but I have no idea if it's true. There are probably errata that can cause spurious page faults too.
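
To make the "just return" idea from earlier in the thread concrete, here is a minimal sketch of a spurious-fault check. It is not the Linux code: the function name and constants are hypothetical, and it assumes the caller has already walked the current page tables and combined the permissions of every level into one effective entry. It ignores SMEP/SMAP, protection keys and CR0.WP.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* 1 = protection violation, 0 = page not present */
#define PF_WRITE   (1u << 1) /* 1 = the access was a write */
#define PF_USER    (1u << 2) /* 1 = the access came from user mode */
#define PF_RSVD    (1u << 3) /* 1 = a reserved bit was set in a paging entry */
#define PF_INSTR   (1u << 4) /* 1 = the access was an instruction fetch */

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_NX       (1ULL << 63)

/*
 * Returns true if the page tables as they are now would allow the access
 * that faulted, meaning the fault came from stale or concurrently updated
 * state and the handler can simply return and let the access retry.
 */
static bool fault_is_spurious(uint32_t error_code, uint64_t effective_pte)
{
    if (error_code & PF_RSVD)
        return false; /* corrupted paging structures are never spurious */
    if (!(effective_pte & PTE_PRESENT))
        return false;
    if ((error_code & PF_WRITE) && !(effective_pte & PTE_WRITABLE))
        return false;
    if ((error_code & PF_USER) && !(effective_pte & PTE_USER))
        return false;
    if ((error_code & PF_INSTR) && (effective_pte & PTE_NX))
        return false;
    return true;
}
```

A handler that gets true from this check could also issue an INVLPG for the faulting address before returning, so the retry does not hit the same stale entry.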

