It took me 6 days to debug and fix this, but it was really worth the effort. It's interesting that no OSDev resources seem to have discussed this important topic before. From further inspection, there seems to be a high rate of hobby x86-64 kernels that get their leaf functions' stack data silently overwritten when an interrupt fires at just the right place.
Now to the story: somehow the periodic PIT interrupts, which fire every 1 millisecond, badly corrupted my kernel state. At first, I thought the handler code might have corrupted the kernel stack, but reducing it to nothing more than acknowledging the local APIC:
Code:
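push %rax                                        /* save the only register we clobber */
movq $(VIRTUAL(APIC_PHBASE) + APIC_EOI), %rax    /* virtual address of the local APIC EOI register */
movl $0,(%rax)                                   /* writing 0 signals end-of-interrupt */
pop %rax                                         /* restore the saved register */
iretq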
led to the same buggy behaviour.
It was weird. Once I enabled interrupts and programmed the PIT to fire at a high rate, things went insane: random failed assert()s and page-fault exceptions were triggered all over the place. I minimized the handler code even further by ditching the IOAPIC and using the PIC in Automatic EOI mode. This led to the architecturally minimal x86 IRQ handler:
Code:
iretq
but nothing really changed: the same ugly symptoms prevailed.
After days and days of disassembly and hex dumps, I found that GCC generated this assembly for memcpy() at -O0:
Code:
ffffffff80109c88: 55 push %rbp
ffffffff80109c89: 48 89 e5 mov %rsp,%rbp
[snip]
/* Bochs magic breakpoint */
ffffffff80109caa: 66 87 db xchg %bx,%bx
/* Our manual software interrupt */
ffffffff80109cad: cd f0 int $0xf0
/* Failing code, especially the last line */
ffffffff80109caf: 48 8b 45 f8 mov -0x8(%rbp),%rax
ffffffff80109cb3: 0f b6 10 movzbl (%rax),%edx
ffffffff80109cb6: 48 8b 45 f0 mov -0x10(%rbp),%rax
ffffffff80109cba: 88 10 mov %dl,(%rax)
Noticed something? Look again, especially at how the x86 ops accessed the stack. Yes, GCC kept parts of the leaf function's local state below the stack pointer. When the interrupt was soft-triggered, the CPU rightfully pushed its interrupt frame (%ss, %rsp, %rflags, %cs, %rip: the stack segment, stack pointer, flags, code segment, and instruction pointer), which meant that parts of the kernel state got corrupted through the CPU's implicit stack usage!
The generated code is clearly not interrupt-safe. Scanning the AMD64 ABI document for any paragraphs that mentioned the stack, I found the reason: it's the red zone. The red zone is a 128-byte area just below the stack pointer that the x86-64 ABI declares off-limits to signal and interrupt handlers, so leaf functions may use it without adjusting %rsp. Non-leaf functions may use it too, but before calling any other function they need to 'reserve' the parts they used by moving the stack pointer further down. The catch is that the CPU knows nothing about this convention: when an interrupt arrives in kernel mode without a stack switch, the frame is pushed right at %rsp, straight into the zone.
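To make the mechanism concrete, here is a toy model in plain C. It is only a sketch of the idea, not kernel code: the names toy_cpu, toy_interrupt and leaf_locals_survive are made up for illustration, the stack is just an array of 8-byte slots, the pushed values are placeholders, and the model ignores details such as the 16-byte stack alignment the CPU performs in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SLOTS 64

/* Toy model of a downward-growing stack made of 8-byte slots. */
struct toy_cpu {
    uint64_t stack[STACK_SLOTS];
    size_t rsp;                 /* index of the slot %rsp points at */
};

/* Models an interrupt taken on the current stack: the CPU pushes
 * %ss, %rsp, %rflags, %cs, %rip, the handler runs, iretq pops them. */
static void toy_interrupt(struct toy_cpu *cpu)
{
    /* Placeholder values; only the number of pushes matters here. */
    uint64_t frame[5] = { 0x10, cpu->rsp, 0x202, 0x08, 0xffffffff80109cafULL };

    assert(cpu->rsp >= 5);      /* room for the five pushes */
    for (int i = 0; i < 5; i++)
        cpu->stack[--cpu->rsp] = frame[i];

    cpu->rsp += 5;              /* iretq: frame popped, data stays in memory */
}

/* Models a leaf function with two locals at -0x8(%rbp) and -0x10(%rbp).
 * Returns 1 if both locals still hold their values after an interrupt. */
static int leaf_locals_survive(int use_red_zone)
{
    struct toy_cpu cpu;
    memset(&cpu, 0, sizeof cpu);
    cpu.rsp = STACK_SLOTS / 2;

    size_t rbp = cpu.rsp;       /* mov %rsp,%rbp */
    if (!use_red_zone)
        cpu.rsp -= 2;           /* sub $16,%rsp: reserve the locals */

    cpu.stack[rbp - 1] = 0x1111;    /* -0x8(%rbp)  */
    cpu.stack[rbp - 2] = 0x2222;    /* -0x10(%rbp) */

    toy_interrupt(&cpu);        /* fires in the middle of the function */

    return cpu.stack[rbp - 1] == 0x1111 && cpu.stack[rbp - 2] == 0x2222;
}
```

In this model leaf_locals_survive(1) returns 0, because the five pushes land exactly on the slots holding the locals, while leaf_locals_survive(0) returns 1, because the locals sit above %rsp once the function has reserved them.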
All that was needed to fix the bug, like a magic pill, was instructing GCC not to use this interrupt-unsafe zone:
Code:
-mno-red-zone
Note that the bug was much easier to trigger at -O0 due to -O0's heavy stack usage. At -O2 and -O3, the bug was triggered much less frequently.
Everything became sane afterwards: the heavy test cases now work well while the PIT is firing rapidly, at all optimization levels. I would really like to thank Brendan for advising me to investigate the issue further using the Bochs single-stepping debugger when I was stuck.
