Page fault when searching for ACPI tables

Ethin · **Posted:** Mon Jun 22, 2020 6:11 pm

I'm trying to find the ACPI tables. My kernel maps each area and unmaps it if it can't find the RSDP. It gets only so far before it page faults in QEMU. My kernel maps the memory range F59C0h-F59E4h (size 36). (I've tried to round it up so that it maps the next power of two, but it throws a #gp. If I have it map 4096 pages all the time, the same thing happens; my kernel claims error code 7FFFFFh (according to the AMD manuals, this is a selector index).)
So, what could be causing this? Why is the system claiming a #pf when F59CFh is within the range F59C0h-F59E4h? And is there a better way of handling #gps (other than just reporting it)? I don't have program execution at the moment implemented.

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5137

Ethin wrote:

My kernel maps the memory range F59C0h-F59E4h (size 36).

You can only map whole pages. Does this mean you've mapped one page that covers everything from 0xF5000 to 0xF5FFF?
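To make the page-granularity point concrete, here is a minimal sketch of the arithmetic, assuming 4 KiB pages. The names `PAGE_SIZE`, `page_align_down`, and `pages_spanned` are made up for illustration and are not from Ethin's kernel.

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */

/* Round an address down to the start of the page that contains it. */
uint64_t page_align_down(uint64_t addr) {
    return addr & ~(uint64_t)(PAGE_SIZE - 1);
}

/* Number of whole pages needed to cover [start, start + size).
 * Assumes size > 0. */
uint64_t pages_spanned(uint64_t start, uint64_t size) {
    uint64_t first = page_align_down(start);
    uint64_t last = page_align_down(start + size - 1);
    return (last - first) / PAGE_SIZE + 1;
}
```

For the range in question, `page_align_down(0xF59C0)` gives 0xF5000 and `pages_spanned(0xF59C0, 36)` gives 1, so a mapping call would need to be handed 0xF5000 and one page rather than the raw address and byte count.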

Ethin wrote:

(I've tried to round it up so that it maps the next power of two, but it throws a #gp. If I have it map 4096 pages all the time, the same thing happens; my kernel claims error code 7FFFFFh (according to the AMD manuals, this is a selector index).)

That doesn't make sense as a #GP error code, because it claims the fault is due to an external interrupt using an IDT index higher than 255.
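For reference, a rough sketch of how a selector-format error code breaks down into its fields. The struct and function names are hypothetical; the bit layout (EXT in bit 0, IDT in bit 1, TI in bit 2, index in bits 3 to 15) is the one described in the AMD and Intel manuals.

```c
#include <stdbool.h>
#include <stdint.h>

struct selector_error {
    bool ext;          /* bit 0: caused by an external event */
    bool idt;          /* bit 1: index refers to an IDT entry */
    bool ti;           /* bit 2: if idt is clear, 0 = GDT and 1 = LDT */
    uint16_t index;    /* bits 3-15: descriptor or gate index */
    uint16_t reserved; /* bits 16-31: should be zero */
};

/* Split a selector-format error code into its fields. */
struct selector_error decode_selector_error(uint32_t code) {
    struct selector_error e;
    e.ext = (code & 1u) != 0;
    e.idt = (code & 2u) != 0;
    e.ti = (code & 4u) != 0;
    e.index = (uint16_t)((code >> 3) & 0x1FFFu);
    e.reserved = (uint16_t)(code >> 16);
    return e;
}
```

Feeding it 0x7FFFFF gives EXT = 1, IDT = 1, an index of 0x1FFF, and nonzero reserved bits, which is why the value looks like garbage rather than a real error code.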

Ethin wrote:

So, what could be causing this? Why is the system claiming a #pf when F59CFh is within the range F59C0h-F59E4h?

If you're testing your OS on real hardware or using hardware virtualization, page fault reporting may be delayed when there's a valid entry in the TLB that hasn't been flushed yet. If it immediately reports the fault when you turn off hardware virtualization, that's why.

It could also be IRQ6 if you haven't set up interrupts properly.
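The IRQ6 remark follows from the legacy PIC vector mapping. A small sketch, assuming the two 8259 PICs and a hypothetical helper `pic_vector`; the BIOS default bases are 0x08 for the master and 0x70 for the slave.

```c
#include <stdint.h>

/* Interrupt vector delivered for a legacy IRQ (0-15), given the vector
 * bases programmed into the master and slave 8259 PICs. */
uint8_t pic_vector(uint8_t irq, uint8_t master_base, uint8_t slave_base) {
    if (irq < 8)
        return (uint8_t)(master_base + irq);
    return (uint8_t)(slave_base + (irq - 8));
}
```

With the BIOS defaults, `pic_vector(6, 0x08, 0x70)` is 14, the same vector as #PF. After remapping the master PIC to 0x20 (a common choice, not a requirement), IRQ6 arrives on vector 0x26 instead and can no longer be mistaken for a page fault.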

Ethin wrote:

And is there a better way of handling #gps (other than just reporting it)?

If your kernel is not intentionally causing #GP, there is nothing you can do besides report it and halt. (I can't think of any reason to intentionally cause #GP off the top of my head, but swapping memory to disk is an example of intentionally causing #PF.)

Ethin · **Posted:** Tue Jun 23, 2020 11:48 am

Octocontrabass wrote:

Ethin wrote:

My kernel maps the memory range F59C0h-F59E4h (size 36).

You can only map whole pages. Does this mean you've mapped one page that covers everything from 0xF5000 to 0xF5FFF?

I'm not exactly sure. My kernel maps 4K pages up to this address. When it maps the range F59C0-F59E4, that is the explicit range it maps; it does not map 4K ranges like it does on every other search.

Octocontrabass wrote:

Ethin wrote:

(I've tried to round it up so that it maps the next power of two, but it throws a #gp. If I have it map 4096 pages all the time, the same thing happens; my kernel claims error code 7FFFFFh (according to the AMD manuals, this is a selector index).)

That doesn't make sense as a #GP error code, because it claims the fault is due to an external interrupt using an IDT index higher than 255.

That's the first error code I get. I believe the IDT is configured properly.

Octocontrabass wrote:

Ethin wrote:

So, what could be causing this? Why is the system claiming a #pf when F59CFh is within the range F59C0h-F59E4h?

If you're testing your OS on real hardware or using hardware virtualization, page fault reporting may be delayed when there's a valid entry in the TLB that hasn't been flushed yet. If it immediately reports the fault when you turn off hardware virtualization, that's why.

It could also be IRQ6 if you haven't set up interrupts properly.

I'm running this purely in QEMU, no hardware virt.

Octocontrabass wrote:

Ethin wrote:

And is there a better way of handling #gps (other than just reporting it)?

If your kernel is not intentionally causing #GP, there is nothing you can do besides report it and halt. (I can't think of any reason to intentionally cause #GP off the top of my head, but swapping memory to disk is an example of intentionally causing #PF.)

Aha, thanks.

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5137

Ethin wrote:

I'm not exactly sure.

Try "info tlb" or "info mem" in the QEMU monitor.

Ethin wrote:

That's the first error code I get. I believe the IDT is configured properly.

It might be time to double-check that your #GP handler reports errors correctly. For example, load a nonsense selector into a segment register and you should receive #GP with that same selector as the error code. (Note that the lowest two bits will be zero in the error code.)
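As a sketch of what to compare against in that test: the expected error code is just the selector with its RPL bits masked off. The helper name is hypothetical, and this assumes the selector is non-null and points past the GDT limit so that the load really does raise #GP.

```c
#include <stdint.h>

/* Error code expected from #GP when loading a bad selector:
 * the selector itself with the low two (RPL) bits cleared. */
uint16_t expected_gp_error_code(uint16_t selector) {
    return (uint16_t)(selector & 0xFFFCu);
}
```

For example, loading 0x1237 should be reported as error code 0x1234. If the handler prints anything else, the handler (or the stack layout it assumes) is the first thing to suspect.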

Ethin · **Posted:** Tue Jun 23, 2020 12:30 pm

Octocontrabass wrote:

Ethin wrote:

I'm not exactly sure.

Try "info tlb" or "info mem" in the QEMU monitor.

Ethin wrote:

That's the first error code I get. I believe the IDT is configured properly.

It might be time to double-check that your #GP handler reports errors correctly. For example, load a nonsense selector into a segment register and you should receive #GP with that same selector as the error code. (Note that the lowest two bits will be zero in the error code.)

I just ensured that the address was page-aligned (I had to rebuild my kernel cleanly). I also removed a "push rax" asm instruction in my #gp handler; I had that in because LLVM sometimes had a bug involving register clobbering (not precisely sure what the bug was off the top of my head) on interrupt handlers. After rebuilding my kernel cleanly, neither the page fault nor the #gp happens. (Strange how that can happen...)

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5137

Ethin wrote:

I also removed a "push rax" asm instruction in my #gp handler; I had that in because LLVM sometimes had a bug involving register clobbering (not precisely sure what the bug was off the top of my head) on interrupt handlers.

Could it have been stack alignment? The System V ABI for x64 requires a 16-byte-aligned stack prior to function calls, and pushing one eight-byte register is enough to fix or break it.

For interrupt handlers that intend to return, you have to preserve several registers (including RAX) and either compile your kernel with the red zone disabled or switch to another stack using the IST.
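Here is a rough sketch of the alignment arithmetic for an entry stub. It assumes 64-bit long mode, where the manuals describe the CPU aligning RSP to 16 bytes before pushing SS, RSP, RFLAGS, CS, and RIP (plus an error code for some exceptions). The function name and parameters are made up for illustration.

```c
#include <stdbool.h>

/* Bytes of padding an entry stub must subtract from RSP so that RSP is
 * 16-byte aligned at its next `call`.
 * Assumes the CPU aligned RSP to 16 bytes before pushing the frame and
 * that every saved register is an 8-byte push. */
unsigned stub_padding(bool has_error_code, unsigned saved_regs) {
    unsigned pushed = 5 * 8;            /* SS, RSP, RFLAGS, CS, RIP */
    pushed += has_error_code ? 8u : 0u; /* e.g. #GP and #PF push one */
    pushed += saved_regs * 8;           /* registers saved by the stub */
    return pushed % 16;                 /* always 0 or 8 */
}
```

Under those assumptions, a #GP stub that saves no registers needs no padding, while one that saves a single register needs 8 more bytes, which is how one extra push can fix or break the alignment.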

Ethin · **Posted:** Tue Jun 23, 2020 1:52 pm

Most likely, though I'm not sure. Removing that asm instruction makes it return error code 0 when it does #gp. But it doesn't seem to be doing that anymore... now I'm stuck at the part where I can't find the RSDP. Is there something I have to do to tell QEMU to provide me ACPI tables? According to the QEMU wiki, using -M 1.6 makes it do this, though I'm not sure how up to date the wiki is. I used -M q35 (because I'm going to proceed to PCIe after this is done) but I still can't find the RSDP.

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5137

Ethin wrote:

Most likely, though I'm not sure.

Sounds like you need to add some code to realign the stack. If you're not using the IST to switch to a new stack, you don't know if the stack is aligned correctly when an interrupt occurs.

Ethin wrote:

Is there something I have to do to tell QEMU to provide me ACPI tables?

Not that I know of. If you're using GRUB, it will tell you where to find the ACPI tables. On UEFI systems, searching for the ACPI tables won't work.

Ethin · **Posted:** Tue Jun 23, 2020 4:59 pm

Octocontrabass wrote:

Ethin wrote:

Most likely, though I'm not sure.

Sounds like you need to add some code to realign the stack. If you're not using the IST to switch to a new stack, you don't know if the stack is aligned correctly when an interrupt occurs.

Ethin wrote:

Is there something I have to do to tell QEMU to provide me ACPI tables?

Not that I know of. If you're using GRUB, it will tell you where to find the ACPI tables. On UEFI systems, searching for the ACPI tables won't work.

I'm on BIOS and not using GRUB; I'm using a custom bootloader (though hopefully I can switch to one that's more well-known around here). The bootloader doesn't provide me much data other than the memory map, and so I have to find the ACPI tables myself.
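For anyone following along, a minimal sketch of the manual search on BIOS systems. Per the ACPI specification, the RSDP sits on a 16-byte boundary, starts with the signature "RSD PTR " (with the trailing space), and its first 20 bytes sum to zero modulo 256. The regions to scan are the first 1 KiB of the EBDA and 0xE0000-0xFFFFF. The function names are hypothetical, and `region` is assumed to point at memory that is already mapped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum of all bytes modulo 256; a valid ACPI structure sums to zero. */
uint8_t acpi_checksum(const uint8_t *p, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Scan a mapped region on 16-byte boundaries for a valid RSDP.
 * Returns the byte offset of the first match, or -1 if none is found. */
long find_rsdp(const uint8_t *region, size_t len) {
    for (size_t off = 0; off + 20 <= len; off += 16) {
        if (memcmp(region + off, "RSD PTR ", 8) == 0 &&
            acpi_checksum(region + off, 20) == 0)
            return (long)off;
    }
    return -1;
}
```

This only validates the 20-byte ACPI 1.0 part of the structure. For revision 2 and later the RSDP is 36 bytes long and carries a second checksum over the whole structure, which would need checking as well.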
