OSDev.org
https://forum.osdev.org/

Very lost on botched interrupts causing triple fault
https://forum.osdev.org/viewtopic.php?f=1&t=37341
Page 1 of 3

Author:  austanss [ Mon Oct 12, 2020 5:08 pm ]
Post subject:  Very lost on botched interrupts causing triple fault

This is the repo I will be referring to in this post:
https://github.com/novavita/Novix
(branch: work/rizet+pontaoski/keyboard-input)

I am contributing to an open source OS project, with some of my friends helping out. (not solo here, I didn't write everything)

When I arrived, they had numerous boot bugs, and I spent the last week resolving them. Fixed the multiboot header, and fixed the triple faults.

Or so I thought.

My "fix" for the triple faults involved adding a `cli` instruction before the idle loop, which runs after kernel_main returns. When the time came to write the keyboard driver, this was pointed out to me, and once I removed it, the triple faults came back. I have done a lot of investigation into this.

Please refer to the repo, I don't want too much text on the screen.

In *.../i686/Boot.S*, you can see a `cli` instruction and an `sti` instruction. Upon removal of these lines, the triple faults stop, but the interrupts do not work. If `sti` is present at all, the OS triple faults. This seems like nonsensical behavior, so I concluded that the interrupts are botched. I didn't write the interrupt code. The person who did has no idea what it does or what the problem may be (I suspect copypasta). I could try to make sense of this, but I've spent hours and gotten nowhere, so I turned to this forum for help.

Attachment:
Screenshot_20201012_191400.png [ 23.75 KiB ]

Author:  Octocontrabass [ Mon Oct 12, 2020 5:19 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.

(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)

Author:  austanss [ Mon Oct 12, 2020 5:22 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Octocontrabass wrote:
Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.

(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)


Attachment:
Screenshot_20201012_192216.png [ 19.81 KiB ]

Author:  Octocontrabass [ Mon Oct 12, 2020 5:43 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?

Author:  austanss [ Mon Oct 12, 2020 5:45 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Octocontrabass wrote:
Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?

Paste: https://pastebin.com/utfPSLVr

Like I said, I have no idea how to read this.

Author:  Octocontrabass [ Mon Oct 12, 2020 6:02 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Code:
     0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2

Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.

Code:
check_exception old: 0xffffffff new 0xd
     1: v=0d e=0042

The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.
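
The reading of e=0042 can be checked with a little bit arithmetic. This is a sketch of the selector-style error code layout used by #GP and similar exceptions; the struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Fields of an x86 selector-style error code (pushed for #GP, #TS, #NP, #SS).
struct SelectorErrorCode
{
        bool external;       // bit 0 (EXT): event came from outside the program
        bool idt;            // bit 1 (IDT): index refers to a gate in the IDT
        bool ldt;            // bit 2 (TI): LDT instead of GDT, only meaningful when idt is false
        std::uint16_t index; // bits 3-15: descriptor or gate index
};

// Hypothetical helper name; splits the raw error code into its fields.
constexpr SelectorErrorCode decodeSelectorErrorCode(std::uint32_t code)
{
        return SelectorErrorCode{
                (code & 0x1) != 0,
                (code & 0x2) != 0,
                (code & 0x4) != 0,
                static_cast<std::uint16_t>((code >> 3) & 0x1FFF),
        };
}

// e=0042 from the log: the IDT bit is set and the index is 8, i.e. IDT entry 8.
static_assert(decodeSelectorErrorCode(0x0042).idt, "error code refers to the IDT");
static_assert(decodeSelectorErrorCode(0x0042).index == 8, "gate index is 8");
```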

Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.

Should external hardware be raising interrupt 8?

Author:  austanss [ Mon Oct 12, 2020 6:09 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Octocontrabass wrote:
Code:
     0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2

Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.

Code:
check_exception old: 0xffffffff new 0xd
     1: v=0d e=0042

The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.

Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.

Should external hardware be raising interrupt 8?


From reading the IRQ table, IRQ 8 is the CMOS clock... as I'm virtualizing in QEMU via the qemu command rather than a fully configured machine... I don't see why it would fire. Also, our interrupt handler doesn't do anything with IRQ 8...

from GDT.cpp
Code:
struct Registers
{
        uint32_t ds;
        uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
        uint32_t interruptNumber, errorCode;
        uint32_t eip, cs, eflags, useresp, ss;
};


void ISRHandlerImpl(Registers& registers)
{
        if (registers.interruptNumber == 1 || registers.interruptNumber == 33) {
                auto keycode = inb(0x60);

                if (keycode < 0) {
                        return;
                }

                Terminal::instance->write(keycode);
                Terminal::instance->write("\n");
        }
}

void ISRHandler(Registers registers)
{
        outb(0xA0, 0x20);
        ISRHandlerImpl(registers);
        outb(0x20, 0x20);
}


Author:  Octocontrabass [ Mon Oct 12, 2020 6:23 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

rizxt wrote:
From reading the IRQ table, IRQ 8 is the CMOS clock...

IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?

You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.
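
To illustrate the mapping, here is a small sketch. It assumes the legacy 8259 PIC pair and the vector bases a PC BIOS typically leaves behind (0x08 for the master, 0x70 for the slave); the bases actually in effect depend on the firmware, and the names below are made up for the example.

```cpp
#include <cstdint>

// Assumed BIOS-default vector bases for the two 8259 PICs.
constexpr std::uint8_t kBiosMasterBase = 0x08; // IRQ 0-7  -> vectors 0x08-0x0F
constexpr std::uint8_t kBiosSlaveBase  = 0x70; // IRQ 8-15 -> vectors 0x70-0x77

// Vector delivered to the CPU for an IRQ line (0-15) given the two bases.
constexpr std::uint8_t irqToVector(std::uint8_t irq, std::uint8_t masterBase, std::uint8_t slaveBase)
{
        return irq < 8 ? static_cast<std::uint8_t>(masterBase + irq)
                       : static_cast<std::uint8_t>(slaveBase + (irq - 8));
}

// The CPU reserves vectors 0-31 for exceptions.
constexpr bool collidesWithException(std::uint8_t vector)
{
        return vector < 32;
}

// With the assumed defaults, the timer (IRQ 0) arrives on vector 8, which is also #DF.
static_assert(irqToVector(0, kBiosMasterBase, kBiosSlaveBase) == 8, "timer on vector 8");
static_assert(collidesWithException(irqToVector(0, kBiosMasterBase, kBiosSlaveBase)), "overlaps #DF");
// With bases 0x20/0x28, the keyboard (IRQ 1) arrives on vector 33, clear of the exceptions.
static_assert(irqToVector(1, 0x20, 0x28) == 33, "keyboard on vector 33");
static_assert(!collidesWithException(irqToVector(1, 0x20, 0x28)), "no overlap");
```

Vector 33 is one of the numbers the posted ISRHandlerImpl checks for, which would match a master base of 0x20.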

Author:  austanss [ Mon Oct 12, 2020 6:27 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Octocontrabass wrote:
rizxt wrote:
From reading the IRQ table, IRQ 8 is the CMOS clock...

IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?

You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.

We have not yet configured the interrupt controller.

Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling? If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...

Author:  Octocontrabass [ Mon Oct 12, 2020 6:44 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

rizxt wrote:
I'm not quite sure if the interrupt controller was configured.

You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.

rizxt wrote:
Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling?

Probably not. You have hardware raising interrupt 8, and you won't be able to tell if interrupt 8 was caused by an IRQ or by a #DF exception. It'll be more helpful once you get the interrupt controller configured to not overlap with exceptions.

rizxt wrote:
If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...

The best resources are the Intel and AMD manuals. However, there's a pretty good overview on the wiki.

Author:  austanss [ Mon Oct 12, 2020 6:48 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Octocontrabass wrote:
rizxt wrote:
I'm not quite sure if the interrupt controller was configured.

You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.


I edited my post to clarify: I haven't configured the interrupt controller.

Author:  Octocontrabass [ Mon Oct 12, 2020 6:52 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

Then you need to configure the interrupt controller.

Author:  austanss [ Mon Oct 12, 2020 6:56 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

End of story. Thank you! This explains why disabling interrupts with `cli` prevented the triple fault. Thank you for explaining it so clearly to me!

Author:  MichaelPetch [ Mon Oct 12, 2020 7:24 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

I don't know anything about anything but the fact there appears to be a Linux user space address is probably the result of your link options. I looked in your code and I see nothing that does such a mapping which tells me 0x080492a2 is likely bogus and still a result of bad linking. It really is as if your linker script isn't being picked up.

I suspect your triple fault isn't just a matter of not configuring the PIC to remap the interrupts but also because your interrupt routine address is bogus. I happened to amend the build options in Meson (I got it working after I went and pulled out the latest Meson builds; Meson 0.45 is not enough, but your docs don't specify a minimum version). Wonder what would happen if you add `-fno-pic` in your main meson.build along with `-static`, and in your arch i686 directory in the meson.build you get rid of the weird `-z` option. The `-z` option is causing the `-T` option, which tells the linker what linker script to use, to be ignored. When I made these changes it didn't triple fault, but the output of `-d int` with QEMU does show the timer coming in on Interrupt 8 and the keyboard on Interrupt 9, since you aren't remapping the master PIC yet. The end result is that the keyboard and timer interrupts are running the wrong service routine, but it doesn't fault with the changes I have made.

Author:  austanss [ Mon Oct 12, 2020 7:38 pm ]
Post subject:  Re: Very lost on botched interrupts causing triple fault

MichaelPetch wrote:
Wonder what would happen if you change `-static` to `-fno-pic` in your main meson.build, and in your arch i686 directory in the meson.build you get rid of the weird `-z` option.


Easy question. Replacing "-static" with "-fno-pic" breaks the multiboot header, and getting rid of "-z" would have no effect, as clang already ignores "-z" and "-T" anyway. :mrgreen:

All times are UTC - 6 hours