[SOLVED] Double Fault occurs randomly on boot after recompiling

torii
Posts: 7
Joined: Sun Feb 02, 2025 5:59 pm
GitHub: https://github.com/Toriiiiiiiiii

Post by torii »

Intermittently, after recompiling my kernel (typically after changes that affect the IDT, such as installing a new IRQ handler), a double fault is triggered on boot, right after the ISR handlers are installed.

I have verified that the initial interrupt triggered is a General Protection Fault; however, another interrupt is triggered before the exception handler executes.

This has been an issue for a while now, but I have only just been able to capture the event in a debugger.

On all other boots, this issue does not occur and everything is set up and working.
Relevant Code:

Setting up IDT & ISR Handlers

Code:

void idt_set_descriptor(uint8_t vector, void* isr, uint8_t flags) {
    idt_entry_t* descriptor = &idt[vector];

    descriptor->isr_low        = (uint32_t)isr & 0xFFFF;
    descriptor->kernel_cs      = 0x08; // this value can be whatever offset your kernel code selector is in your GDT
    descriptor->attributes     = flags;
    descriptor->isr_high       = (uint32_t)isr >> 16;
    descriptor->reserved       = 0;
}

void idt_init() {
    __asm__ volatile("cli");
    idtr.base = (uintptr_t)&idt[0];
    idtr.limit = (uint16_t)(sizeof(idt_entry_t) * IDT_MAX_DESCRIPTORS - 1); // cast the whole expression, not just sizeof

    memset((char*)idt, 0, sizeof(idt_entry_t) * 256);

    __asm__ volatile ("lidt %0" : : "m"(idtr)); // load the new IDT
    __asm__ volatile ("sti"); // set the interrupt flag
}

void isrs_install() {
    __asm__ volatile("cli");
    for (uint8_t vector = 0; vector < 32; vector++) {
        idt_set_descriptor(vector, isr_stub_table[vector], 0x8E);
    }
    __asm__ volatile("sti");
}

Kernel init sequence

Code:

void _kmain(void) {
    vga_init();
    vga_puts("SOLKERN V0.1\n");

    vga_puts("Setting up Interrupt Descriptor Table.....\n");
    idt_init();
    vga_puts("OK!\n");

    vga_puts("Initializing Interrupt Service Routines.....\n");
    isrs_install();
    vga_puts("OK!\n");
    // Fails here
    
    while(1) {}
}

Attachments: double fault qemu.png, double fault gdb.png
Last edited by torii on Wed Feb 05, 2025 7:16 am, edited 1 time in total.
Writing bad code since 2019
sebihepp
Member
Posts: 197
Joined: Tue Aug 26, 2008 11:24 am
GitHub: https://github.com/sebihepp

Re: Double Fault occurs randomly on boot after recompiling

Post by sebihepp »

Did you remap the PIC (Programmable Interrupt Controller)?
torii

Re: Double Fault occurs randomly on boot after recompiling

Post by torii »

Yeah. As I said, it works perfectly fine almost every time, except occasionally on the first boot after recompiling. As far as I'm aware, I have set up everything (GDT, IDT, PIC, etc.) correctly.
MichaelPetch
Member
Posts: 807
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: Double Fault occurs randomly on boot after recompiling

Post by MichaelPetch »

In `idt_init` you enable interrupts with STI. One problem I see is that `idt_init` doesn't actually set any interrupt/exception handlers (the IDT is filled with zeroes). If an interrupt occurs after STI is executed but before you set the interrupt handlers, you'll likely get a fault because there isn't a valid IDT entry to handle it. Also, if a timer interrupt occurs before the PICs are remapped, IRQ0 will appear to come in as a Double Fault exception. I wonder if that is why EIP is 0x00000008 in your screenshot (which doesn't seem like a valid EIP): IRQ0 likely came in and mistakenly triggered the Double Fault exception handler, and since no error code is pushed for an external IRQ, some things aren't in the expected place in your stack frame. Possibly the 0x00000008 you print for EIP is actually CS, the value you print as the error code is actually EIP, and the value you print as CS is actually the flags (each offset by 4 bytes because no error code was pushed).
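To make the 4-byte shift concrete, here is a minimal host-side sketch (not kernel code). The struct names, `misread_frame`, and any sample values you feed it are hypothetical and only model the words the CPU pushes for a same-privilege interrupt; a real stub usually pushes more registers on top, but the relative shift is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout a shared handler expects when the CPU pushed an error code
 * (as it does for #DF and #GP). Lowest stack address first. */
typedef struct {
    uint32_t error_code;
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
} frame_with_error_t;

/* What the CPU actually pushes for an external IRQ: no error code. */
typedef struct {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
} frame_without_error_t;

/* Reinterpret an IRQ frame as if an error code were present.
 * word_above stands in for whatever sits on the stack above the frame. */
frame_with_error_t misread_frame(const frame_without_error_t *real,
                                 uint32_t word_above)
{
    uint32_t words[4];
    frame_with_error_t seen;

    assert(real != NULL);
    memcpy(words, real, sizeof *real); /* copies eip, cs, eflags */
    words[3] = word_above;
    memcpy(&seen, words, sizeof seen); /* every field shifts by one word */
    return seen;
}
```

With a kernel code selector of 0x08, the field the handler prints as EIP holds 0x08, which would match the odd value in the screenshot.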

I don't see where you are remapping the PICs or installing the IRQ handlers in the code you are showing in your question. Your GitHub repository seems to have a `pic_install` that remaps the PICs, enables all the interrupts, and sets a default IRQ handler for each of the 16 external interrupts. I think you've left out some important parts of your code in this question.
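As a rough illustration of why remapping matters, the sketch below computes which vector each IRQ lands on. It is host-side arithmetic only; `irq_to_vector`, `collides_with_exception`, and the enum names are made up for this example. The 0x08/0x70 bases are the usual BIOS defaults, and 0x20/0x28 is a common remap choice, not necessarily what `pic_install` uses.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PIC_BIOS_MASTER_BASE  = 0x08, /* usual BIOS default for IRQ0-7 */
    PIC_BIOS_SLAVE_BASE   = 0x70, /* usual BIOS default for IRQ8-15 */
    PIC_REMAP_MASTER_BASE = 0x20, /* common choice: first vector after the exceptions */
    PIC_REMAP_SLAVE_BASE  = 0x28,
    VECTOR_DOUBLE_FAULT   = 8
};

/* Vector the CPU receives for a given IRQ line, given the PIC base offsets. */
uint8_t irq_to_vector(uint8_t irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16);
    return (irq < 8) ? (uint8_t)(master_base + irq)
                     : (uint8_t)(slave_base + (irq - 8));
}

/* Vectors 0-31 are reserved for CPU exceptions. */
int collides_with_exception(uint8_t vector)
{
    return vector < 32;
}
```

With the default bases, IRQ0 (the timer) maps to vector 8, the Double Fault vector; with the remapped bases it maps to vector 32, clear of the exception range.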

Anyway, don't issue an STI until you have remapped the PICs and finished installing proper IRQ handlers into your IDT. Enabling interrupts early with STI leaves a window in which an interrupt (like the timer) can arrive before the IRQ handlers are properly installed. It is possible that the timing in QEMU is different just after recompiling your code: whatever isn't cached in memory yet could slow down QEMU's emulation the next time it is run.
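The ordering rule can be written down as a small host-side model. This is only a sketch: `boot_state_t` and the functions below are hypothetical, and in a real kernel each flag corresponds to a step such as `lidt`, the `idt_set_descriptor` loops, the PIC remap, and the final `sti`.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool idt_loaded;         /* lidt executed */
    bool exceptions_ready;   /* vectors 0-31 point at real stubs */
    bool pic_remapped;       /* IRQs moved off the exception vectors */
    bool irq_handlers_ready; /* remapped IRQ vectors point at real stubs */
    bool interrupts_enabled; /* IF set by sti */
} boot_state_t;

/* True once every prerequisite for sti is in place. */
bool interrupts_safe(const boot_state_t *s)
{
    return s->idt_loaded && s->exceptions_ready &&
           s->pic_remapped && s->irq_handlers_ready;
}

/* True while an interrupt could arrive with nothing valid to handle it. */
bool has_unsafe_window(const boot_state_t *s)
{
    return s->interrupts_enabled && !interrupts_safe(s);
}

/* Models the state when the idt_init from the question returns:
 * IDT loaded but zero-filled, sti already executed. */
boot_state_t state_after_original_idt_init(void)
{
    boot_state_t s = { .idt_loaded = true, .interrupts_enabled = true };
    return s;
}

/* Models the suggested order: all setup first, a single sti last. */
boot_state_t state_after_fixed_init(void)
{
    boot_state_t s = {0};
    s.idt_loaded = true;
    s.exceptions_ready = true;
    s.pic_remapped = true;
    s.irq_handlers_ready = true;
    assert(interrupts_safe(&s)); /* gate the final step */
    s.interrupts_enabled = true;
    return s;
}
```

In this model the original `idt_init` leaves an unsafe window open, while the reordered sequence never does.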
torii

Re: Double Fault occurs randomly on boot after recompiling

Post by torii »

MichaelPetch wrote: Tue Feb 04, 2025 5:05 pm In `idt_init` you enable interrupts with STI.
This seems to be a likely culprit. I didn't consider this when moving my ISR install code to a separate function.
MichaelPetch wrote: Tue Feb 04, 2025 5:05 pm I don't see where you are remapping the PICs or installing the IRQ handlers in the code you are showing in your question.
Turns out I was a bit of an idiot and removed the call to the function while trying to track down the crash. This caused the same error, but on every boot instead of only occasionally. That makes me suspect it is almost definitely a misinterpreted IRQ being triggered before the PIC has been remapped.

Thanks for the help. I will test this and update you when I get home after college :D
torii

Re: Double Fault occurs randomly on boot after recompiling

Post by torii »

torii wrote: Wed Feb 05, 2025 4:39 am I will test this and update you when I get home after college :D
Holding off on the `sti` instructions until I was done setting up interrupts seems to have worked.
Thanks for the help :D