bzt wrote:
Btw, it does not fail when you enable paging. It fails when you use lgdt, which is the first instruction that uses ds and a memory operation.
Enabling paging makes it fail when it tries to fetch the LGDT instruction. The LGDT is never executed. I verified this by putting an infinite loop right before the LGDT instruction. Sure enough, I get a page fault when trying to fetch the JMP instruction.
I am getting pretty convinced that my page tables have a problem. What's confusing me is that the same code (same page tables) runs fine with the EFI version. Manual inspection also seems to show that everything is fine. Clearly I am missing something... In the BIOS case I compile for ia32. In the EFI case I compile for x86_64. That's an obvious difference. So I've been trying to find anything in that paging code that would behave differently (type casts, for example), but I don't see anything. I do a few explicit casts from 64 bits to 32 bits for physical addresses returned by the memory allocator, but this should be fine as all physical addresses are within the first 4 GB of memory (I have verified this as well).
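To illustrate the kind of cast I mean, here is a minimal sketch of a checked narrowing helper. It is not the actual rainbow-os code: the name narrow_physaddr and its signature are made up for this example, and it only assumes that physical addresses come back from the allocator as 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the real code base): narrow a 64-bit
// physical address to 32 bits and report whether the value survived intact.
// Returns false if any of the upper 32 bits were set, i.e. the address
// lies at or above 4 GB and the cast would silently truncate it.
inline bool narrow_physaddr(uint64_t address, uint32_t& out)
{
    out = static_cast<uint32_t>(address);
    return static_cast<uint64_t>(out) == address;
}
```

Routing every 64-to-32-bit cast through something like this (and asserting on the result) would turn a silent truncation into an immediate failure, which is easier to spot than a page fault later on.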
bzt wrote:
For one, I don't see where you set up cr3 (which is required for long mode as you know).
Just before calling the assembly trampoline code, I call vmm_enable() here (which only sets CR3 in this case, it does not turn on paging):
https://github.com/kiznit/rainbow-os/bl ... t.cpp#L150
Which ends up setting CR3 here:
https://github.com/kiznit/rainbow-os/bl ... 64.hpp#L89
I know this part works because I printed the value on the screen and it matches what I see in the exception info from QEMU.
bzt wrote:
You should start qemu with monitor, and try "info mem" and "info tlb" to list paging tables to see if they're correct. Also you could dump (with "x") the tables and compare them with the dump made in bochs. I think they should be the same bit by bit (unless you allocate the memory for the page tables dynamically). If the tables differ, then you should investigate why. Comparing the control registers' values could be useful too.
Thanks, I'll take a look tonight as I am at work right now. I don't have much experience debugging with QEMU. I am much more used to Bochs, where things work for some reason. The tables cannot match bit by bit, as the two emulators use different BIOSes and the memory for the page tables is dynamically allocated (ending up in different locations under each emulator). But I could try to use static tables instead to see if it helps track down the issue. It's a good idea to try to have both emulators be in the same state.
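Since the tables live at different addresses under each emulator, comparing them means walking them by hand from CR3. As a rough sketch of the arithmetic involved (assuming standard 4-level long mode paging with 4 KB pages; the PageIndices and decompose names are made up for this example), a virtual address splits into table indexes like this:

```cpp
#include <cassert>
#include <cstdint>

// Indexes into each paging level for one virtual address
// (4-level long mode paging, 4 KB pages).
struct PageIndices
{
    unsigned pml4;    // bits 47..39
    unsigned pdpt;    // bits 38..30
    unsigned pd;      // bits 29..21
    unsigned pt;      // bits 20..12
    unsigned offset;  // bits 11..0
};

// Hypothetical helper: split a virtual address into its table indexes.
inline PageIndices decompose(uint64_t va)
{
    PageIndices r;
    r.pml4   = static_cast<unsigned>((va >> 39) & 0x1FF);
    r.pdpt   = static_cast<unsigned>((va >> 30) & 0x1FF);
    r.pd     = static_cast<unsigned>((va >> 21) & 0x1FF);
    r.pt     = static_cast<unsigned>((va >> 12) & 0x1FF);
    r.offset = static_cast<unsigned>(va & 0xFFF);
    return r;
}
```

Each entry is 8 bytes, so the entry to dump at a given level sits at the table's physical base plus index * 8, starting from the base address held in CR3.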
bzt wrote:
As a general rule, if your code works in one emulator but not the other, you should always look for any difference, then try to figure out what's causing that difference. Once you've found that, you'll know how to fix it.
I run the exact same code on both emulators. So I am looking for differences between Bochs and QEMU, but this kind of information is hard to come by. Hence my post.
bzt wrote:
We should see if CR3=00000000bffcf000 is correct or not (that's around 3G, it's likely that the vm has less RAM).
My page allocator uses the memory info provided by GRUB. I initialize QEMU with 8 GB, which is correctly reported by GRUB.
bzt wrote:
Just a sidenote, the elf loader relies on ELF Section Headers. They may not exist, it would be better to use Program Headers.
This doesn't sound very likely... Both readelf and objdump confirm that my kernel does have section headers. The code also crashes after I load the kernel and before I jump to it (i.e. the crash is within the bootloader and has nothing to do with the kernel loading).
The exact same image runs fine under Bochs. The ia32 version also runs fine under both Bochs and QEMU.
Thanks for your help!