switching to new pagemap causes 0xe and more exceptions

MichaelPetch
Member
Member
Posts: 791
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: switching to new pagemap causes 0xe and more exceptions

Post by MichaelPetch »

I do get a page fault with e=000a and my CR2=ffff800100400000. Expanding on my previous comment: e=000a also tells us that the page fault occurred because the page isn't present. This suggests there is something really wrong in an entry. In the QEMU monitor I ran `info tlb`. The output is long, but I have reduced it to a few entries, of which two are relevant:

Code: Select all

...
ffff8000ffc00000: 00000000ffc00000 --P-----W
ffff8000ffe00000: 00000000ffe00000 --P-----W
ffff800100400000: 0003800000033000 X-------W    <------- Messed up
ffff800100401000: 0003800000036000 X-------W    <------- Messed up
ffffffff80000000: 0000000002125000 ----A---W
ffffffff80001000: 0000000002126000 ----A---W
ffffffff80002000: 0000000002127000 ----A---W
...
You can see the physical addresses are wrong, the pages aren't marked present, and the NX bit is set. Something has created those bogus entries. I would run `info tlb`, look for the virtual address giving you a problem on your build (it might be different from mine), and find it in the TLB output. I don't have time this evening to see why, but I would use the QEMU monitor to look at your page structures in physical memory to see at what level in the hierarchy the problem happened; that might give a clue as to where in the code to look.
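
As a rough aid for reading the `e=` value, here is a minimal sketch in C that splits an x86-64 page-fault error code into its low five architectural bits. The struct and function names are invented for illustration and are not part of the kernel being debugged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low five bits of the x86-64 page-fault error code. */
struct pf_error {
    bool present;      /* bit 0: 0 = non-present page, 1 = protection violation */
    bool write;        /* bit 1: the access was a write */
    bool user;         /* bit 2: the access came from user mode */
    bool reserved_bit; /* bit 3: a reserved bit was set in a paging entry */
    bool instr_fetch;  /* bit 4: the access was an instruction fetch */
};

struct pf_error decode_pf_error(uint64_t code) {
    struct pf_error e;
    e.present      = (code >> 0) & 1;
    e.write        = (code >> 1) & 1;
    e.user         = (code >> 2) & 1;
    e.reserved_bit = (code >> 3) & 1;
    e.instr_fetch  = (code >> 4) & 1;
    return e;
}

/* Worked example for the error code reported above (e=000a = binary 01010). */
void check_e000a(void) {
    struct pf_error e = decode_pf_error(0x000a);
    assert(!e.present && e.write && !e.user && e.reserved_bit && !e.instr_fetch);
}
```

For `e=000a` the write bit and the reserved-bit flag are set while the present bit is clear, which points at a malformed paging entry rather than an ordinary permission problem.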
RayanMargham
Member
Member
Posts: 43
Joined: Tue Jul 05, 2022 12:37 pm

Re: switching to new pagemap causes 0xe and more exceptions

Post by RayanMargham »

else if (page_table[idx] & PAGE2MB) {
  uint64_t *guy = (uint64_t *)((uint64_t)pmm_alloc());
  uint64_t old_phys = page_table[idx] & 0x000ffffffffff000;
  uint64_t old_flags = page_table[idx] & ~0x000ffffffffff000;
  for (int j = 0; j < 512; j++) {
    guy[j] = (old_phys + j * 4096) | (old_flags & ~PAGE2MB);
  }

Is the inversion of PAGE2MB the problem?
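
Whether `~PAGE2MB` is safe depends on how `PAGE2MB` is defined, which the fragment above does not show. The sketch below uses two hypothetical definitions (both names are invented here) to illustrate the one case where the inversion would misbehave: a 32-bit unsigned constant, whose complement is zero-extended and so also clears bits 63:32, including NX. A 64-bit constant, or a plain `int` constant such as `0x80` (its complement sign-extends), clears only bit 7.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* same mask as in the fragment */
#define PAGE2MB_U64   (1ULL << 7)           /* hypothetical 64-bit definition */
#define PAGE2MB_U32   (1u << 7)             /* hypothetical 32-bit unsigned definition */

/* Flags of an entry with the PS bit cleared, using the 64-bit constant. */
uint64_t flags_without_ps_u64(uint64_t entry) {
    uint64_t flags = entry & ~PTE_ADDR_MASK;
    return flags & ~PAGE2MB_U64;
}

/* Same, but ~PAGE2MB_U32 is 0xffffff7f and zero-extends to 64 bits,
   so every flag above bit 31 is lost as well. */
uint64_t flags_without_ps_u32(uint64_t entry) {
    uint64_t flags = entry & ~PTE_ADDR_MASK;
    return flags & ~PAGE2MB_U32;
}

/* Example entry: NX | phys 0x200000 | PS | RW | P. */
void check_ps_mask(void) {
    uint64_t entry = 0x8000000000200083ULL;
    assert(flags_without_ps_u64(entry) == 0x8000000000000003ULL);
    assert(flags_without_ps_u32(entry) == 0x0000000000000003ULL);
}
```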

Post by RayanMargham »

anyone know???

Post by RayanMargham »

I still don't understand. I've been looking with GDB to find out why, but I don't get the issue.

:c

Post by MichaelPetch »

I haven't had time to revisit it yet. I'll be able to look in the next couple of days if no one has found it before then.

Post by RayanMargham »

okii :c

Post by RayanMargham »

Just an update

Still nothing found; I'm still trying to figure this out.

Post by MichaelPetch »

So previously I pointed out I was getting a page fault at CR2=0xffff800100400000 and that we had these entries in `info tlb`:

Code: Select all

ffff800100400000: 0003800000033000 X-------W    <------- Messed up
ffff800100401000: 0003800000036000 X-------W    <------- Messed up
We want to find the page table (or page directory) for virtual address 0xffff800100400000. This address breaks down to:

Code: Select all

PML4 IDX = 256 (0x100)
PML3 IDX =   4 (0x004)
PML2 IDX =   2 (0x002)
PML1 IDX =   0 (0x000)
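
The same breakdown can be computed mechanically. This is a minimal sketch assuming 4-level paging with 48-bit virtual addresses; `pt_index` is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Index into the paging structure at `level` (4 = PML4, 3 = PML3,
   2 = PML2, 1 = PML1) for a 48-bit virtual address. Each level
   consumes 9 bits, starting above the 12-bit page offset. */
unsigned pt_index(uint64_t virt, int level) {
    assert(level >= 1 && level <= 4);
    return (unsigned)((virt >> (12 + 9 * (level - 1))) & 0x1ff);
}
```

Calling `pt_index(0xffff800100400000, level)` for levels 4 down to 1 yields 256, 4, 2 and 0, matching the listing above.
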
When I follow each page level using the QEMU monitor, starting at physical address 0x8000 (PML4), I end up at a PML1 (page table) that starts with these entries (the first 2 entries are pml1[0] and pml1[1]):

Code: Select all

(qemu) xp/512g 0x35000
0000000000035000: 0xffff800000033003 0xffff800000036003
0000000000035010: 0x0000000000000000 0x0000000000000000
0000000000035020: 0x0000000000000000 0x0000000000000000
0000000000035030: 0x0000000000000000 0x0000000000000000
... the rest are 0x0000000000000000
Now look closely at those entries. Those are supposed to be physical addresses (of 4KiB page frames) but the upper 17 bits are all set! The lower bits do appear to be physical addresses though. I haven't looked at the code, but maybe you can review your code to find a place where the upper bits are being set (or not being cleared) when updating/setting the physical addresses of page table entries. I am basically providing this to you as a courtesy so that you can attempt to continue the bug hunt.
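
One way to catch entries like these early is to check the address field against the physical address width. In a present entry, address bits from MAXPHYADDR up to bit 51 are reserved and must be zero. The sketch below is illustrative only: the function name is invented, and the width of 40 used in the example is an assumption (the real value is reported by CPUID leaf 0x80000008).

```c
#include <assert.h>
#include <stdint.h>

#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 51:12 of an entry */

/* Returns the address-field bits at or above `maxphyaddr`.
   A non-zero result means reserved address bits are set. */
uint64_t pte_reserved_addr_bits(uint64_t entry, unsigned maxphyaddr) {
    assert(maxphyaddr >= 32 && maxphyaddr <= 52);
    uint64_t valid = (1ULL << maxphyaddr) - 1;
    return (entry & PTE_ADDR_MASK) & ~valid;
}
```

With an assumed width of 40 bits, `pte_reserved_addr_bits(0xffff800000033003, 40)` returns `0x000f800000000000`, while a clean entry such as `0x33003` returns 0.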

Post by RayanMargham »

Could it be my allocation code?

Code: Select all

uint64_t *find_pte_and_allocate(uint64_t *pt, uint64_t virt) {
  uint64_t shift = 48;
  for (int i = 0; i < 4; i++) {
    shift -= 9;
    uint64_t idx = (virt >> shift) & 0x1ff;
    uint64_t *page_table =
        (uint64_t *)((uint64_t)pt + hhdm_request.response->offset);
    if (i == 3) {
      return page_table + idx;
    }
    if (!(page_table[idx] & PRESENT)) {
      uint64_t *guy =
          (uint64_t *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
      page_table[idx] = (uint64_t)guy | PRESENT | RWALLOWED;
      pt = guy;
    } else if (page_table[idx] & PAGE2MB) {
      uint64_t *guy = (uint64_t *)((uint64_t)pmm_alloc());
      uint64_t old_phys = page_table[idx] & 0x000ffffffffff000;
      uint64_t old_flags = page_table[idx] & ~0x000ffffffffff000;
      for (int j = 0; j < 512; j++) {
        guy[j] = (old_phys + j * 4096) | (old_flags & ~PAGE2MB);
      }
      pt = (uint64_t *)((uint64_t)guy - hhdm_request.response->offset);
    } else {
      pt = (uint64_t *)(page_table[idx] & 0x000ffffffffff000);
    }
  }
  return 0;
}
uint64_t *find_pte_and_allocate2mb(uint64_t *pt, uint64_t virt) {
  uint64_t shift = 48;
  for (int i = 0; i < 4; i++) {
    shift -= 9;
    uint64_t idx = (virt >> shift) & 0x1ff;
    uint64_t *page_table =
        (uint64_t *)((uint64_t)pt + hhdm_request.response->offset);
    if (i == 2) {
      return page_table + idx;
    }
    if (!(page_table[idx] & PRESENT)) {
      uint64_t *guy =
          (uint64_t *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
      page_table[idx] = (uint64_t)guy | PRESENT | RWALLOWED;
      pt = guy;
    } else {
      pt = (uint64_t *)(page_table[idx] & 0x000ffffffffff000);
    }
  }
  return 0;
}
Or maybe my bitmask? I'm unsure.

Post by MichaelPetch »

Another possibility, though, is that you are using a virtual address returned by pmm_alloc and treating it as a physical address somewhere.
Your pmm_init stores the virtual address in the free pages to create a linked list (not the physical address). Possibly you are eventually reading those virtual addresses back and treating them as physical addresses. I don't have time to go through your code at the moment. Trying to give you ideas so you can find the bug.

Post by MichaelPetch »

In `kvmm_region_alloc` you have:

Code: Select all

      for (uint64_t i = 0; i != amount_to_allocateinpages; i++) {
        void *page = pmm_alloc();
        map(ker_map.pml4, (uint64_t)page, new->base + (i * 4096), flags);
      }        
I believe that it should be:

Code: Select all

      for (uint64_t i = 0; i != amount_to_allocateinpages; i++) {
        void *page = (void *)((uint64_t)pmm_alloc() - hhdm_request.response->offset);
        map(ker_map.pml4, (uint64_t)page, new->base + (i * 4096), flags);
      }
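
To illustrate why the subtraction matters, here is a small sketch. It assumes an HHDM offset of `0xffff800000000000` and the usual bit positions for `PRESENT` and `RWALLOWED` (bits 0 and 1); neither value is shown in the thread, and the helper names are invented. OR-ing flags into an unconverted HHDM pointer reproduces exactly the kind of entry seen earlier, `0xffff800000033003`.

```c
#include <assert.h>
#include <stdint.h>

#define PRESENT       (1ULL << 0)           /* assumed bit position */
#define RWALLOWED     (1ULL << 1)           /* assumed bit position */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Convert a higher-half direct map pointer to the physical address it aliases. */
uint64_t hhdm_virt_to_phys(uint64_t virt, uint64_t hhdm_offset) {
    assert(virt >= hhdm_offset);
    return virt - hhdm_offset;
}

/* Build an entry from a physical frame address and flag bits.
   The assert trips if an unconverted HHDM pointer or an unaligned
   address is passed in. */
uint64_t make_pte(uint64_t phys, uint64_t flags) {
    assert((phys & ~PTE_ADDR_MASK) == 0);
    return phys | flags;
}
```

With these helpers, `make_pte(hhdm_virt_to_phys(0xffff800000033000, 0xffff800000000000), PRESENT | RWALLOWED)` gives `0x33003`, whereas passing the raw pointer would fail the assertion instead of silently producing a bogus entry.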

Post by RayanMargham »

That solved that issue, but as usual there are new bugs:

Code: Select all

Page Fault! CR2 0xffff800100c001d8
RIP is 0xffffffff80010666
NYAUX Panic! Reason: Page Fault:c

Post by MichaelPetch »

I noticed a potential cause of this bug last night but was hoping you might be able to track it down. You likely had a page fault with error code e=0002, which is a write to a non-present page. In the QEMU monitor, `info tlb` and `info mem` confirm this memory (0xffff800100c001d8) is in a page that is not present. You have many bugs in your vmm, and one of the big ones I noticed is in `find_pte`. In particular, this function returns after descending only to the PML2 (page directory). You need to modify that function to handle 2MiB pages and also descend to the PML1 (page table) level when not using 2MiB pages.
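
A minimal sketch of what such a walk could look like follows. It is not the project's `find_pte`: the function name, the `hhdm_offset` parameter, and the bit positions of `PRESENT` and `PAGE2MB` are assumptions made for illustration, and 1GiB pages at the PML3 level are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRESENT       (1ULL << 0)           /* assumed bit position */
#define PAGE2MB       (1ULL << 7)           /* assumed: PS bit in a PML2 entry */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Walk the tables without allocating. Returns a pointer to the entry
   that maps `virt`: the PML2 entry for a 2MiB page, otherwise the PML1
   entry. Returns NULL if an intermediate level is not present.
   *is_2mb reports which kind of entry was returned. */
uint64_t *find_pte_sketch(uint64_t pml4_phys, uint64_t virt,
                          uint64_t hhdm_offset, int *is_2mb) {
    assert(is_2mb != NULL);
    uint64_t table_phys = pml4_phys;
    *is_2mb = 0;
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t *table = (uint64_t *)(uintptr_t)(table_phys + hhdm_offset);
        uint64_t *entry = &table[(virt >> shift) & 0x1ff];
        if (shift == 12)
            return entry;                 /* PML1: 4KiB leaf entry */
        if (!(*entry & PRESENT))
            return NULL;                  /* nothing mapped below this level */
        if (shift == 21 && (*entry & PAGE2MB)) {
            *is_2mb = 1;
            return entry;                 /* PML2: 2MiB leaf entry */
        }
        table_phys = *entry & PTE_ADDR_MASK;
    }
    return NULL;                          /* not reached */
}
```

The key difference from a walk that always stops at the PML2 is the `shift == 21` check: the loop returns early only when the PS bit is set, and otherwise continues one more level down to the page table.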

You're going to need to start learning to debug these issues yourself. OSDev is hard, and proper debugging is a skill set that needs to be learned. Are you using GDB to debug, and do you use the QEMU monitor at all?

Post by RayanMargham »

It is not find_pte, I'm afraid.

Post by RayanMargham »

Also, yes, I'm using GDB and the QEMU monitor.

I'm not great at using GDB, though.