Hi,
mariuszp wrote:
I mask the interrupt, send EOI, and then unmask it once the driver plays with the appropriate register.
I suspected...
The "mask then EOI early, then unmask at the end" scheme is an ancient hack that some kernels use to bypass the IRQ priority scheme in PIC chips, because that scheme is hard-wired and relatively silly (e.g. high performance devices like NICs end up with very low priority IRQs, while low performance devices like the PS/2 keyboard end up with a very high priority IRQ).
For APICs the kernel controls the IRQ priority scheme directly (kernel decides what the IRQ's priority should be and selects the "interrupt vector" to reflect that), so there's very little reason to want "mask then EOI early, then unmask at the end".
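To make the "vector reflects priority" point concrete: on x86 local APICs the upper 4 bits of the interrupt vector form its priority class, so choosing a vector is choosing a priority. The following is only a minimal sketch; the helper names (vector_priority_class, make_vector) are made up for illustration and are not from any particular kernel.

```c
#include <stdint.h>

/* On x86 local APICs the upper 4 bits of a vector are its priority class;
 * a higher class takes precedence over a lower one. */
static inline uint8_t vector_priority_class(uint8_t vector)
{
    return (uint8_t)(vector >> 4);
}

/* Hypothetical helper: build a vector from a desired priority class and a
 * slot (0..15) inside that class. Classes 0 and 1 overlap the CPU exception
 * vectors 0..31, so they are rejected. Returns 0 on invalid input. */
static inline uint8_t make_vector(unsigned prio_class, unsigned slot)
{
    if (prio_class < 2 || prio_class > 15 || slot > 15)
        return 0;
    return (uint8_t)((prio_class << 4) | slot);
}
```

With something like this the kernel could give a NIC make_vector(12, 0) and the PS/2 keyboard make_vector(3, 0), which is the opposite of what the PIC's fixed scheme would do.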
Also note that for micro-kernels with drivers in user-space, the IRQ priority scheme is mostly irrelevant. The IRQ priority scheme determines the order IRQs are received by the kernel, which determines the order the kernel sends "IRQ occurred" notifications to threads in user-space (which usually causes the thread/s to be unblocked); but it's the scheduler, not the IRQ priority scheme, that decides which user-space thread will run when you switch from the kernel back to user-space. For this reason, doing extra work (diddling with masking/unmasking IRQs) isn't going to make any difference or be worthwhile.
mariuszp wrote:
I derive the PCI IRQ routing via ACPICA, I don't just look at the MADT.
How can you figure out how many IO APICs there are (and the physical address of each one) without looking at the MADT?
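For reference, here is roughly what "looking at the MADT" for IO APICs involves. This is a hedged sketch, not ACPICA code: it assumes you already have the MADT mapped as a byte buffer, and the struct and function names (ioapic_info, madt_find_ioapics) are invented for the example. The layout it relies on is the standard one: a 36-byte table header, a 4-byte local APIC address, 4 bytes of flags, then variable-length entries where type 1 is an I/O APIC entry (ID at offset 2, physical address at offset 4, global system interrupt base at offset 8).

```c
#include <stddef.h>
#include <stdint.h>

#define MADT_ENTRIES_OFFSET 44  /* 36-byte header + local APIC address + flags */
#define MADT_TYPE_IOAPIC    1   /* entry type for an I/O APIC */

/* Hypothetical container for what the kernel needs per IO APIC. */
struct ioapic_info {
    uint8_t  id;
    uint32_t phys_addr;
    uint32_t gsi_base;
};

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the MADT entries and collect the I/O APIC ones.
 * Returns the number of IO APICs found (which may exceed max_out;
 * only the first max_out are stored). */
size_t madt_find_ioapics(const uint8_t *madt, size_t madt_len,
                         struct ioapic_info *out, size_t max_out)
{
    size_t found = 0;
    size_t off = MADT_ENTRIES_OFFSET;

    while (off + 2 <= madt_len) {
        uint8_t type = madt[off];
        uint8_t len  = madt[off + 1];

        /* Stop on a malformed entry rather than looping forever. */
        if (len < 2 || off + len > madt_len)
            break;

        if (type == MADT_TYPE_IOAPIC && len >= 12) {
            if (found < max_out) {
                out[found].id        = madt[off + 2];
                out[found].phys_addr = read_le32(madt + off + 4);
                out[found].gsi_base  = read_le32(madt + off + 8);
            }
            found++;
        }
        off += len;
    }
    return found;
}
```

The GSI base matters for routing: the PCI routing information gives you a global system interrupt number, and you need each IO APIC's base to work out which IO APIC and which input pin that number refers to.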
mariuszp wrote:
What's happening is that the system crashes at one point (after initializing the keyboard, for some reason; but that's possibly a coincidence), and when interrupted with an NMI, I *always* catch it inside the irqMask() or irqUnmask() function, on the instruction that follows a read from (*iowin). The interrupts are disabled at this point (as confirmed by the pushed RFLAGS value); however, it is definitely not *stuck* there, because each time it is interrupted, the stack trace below irqMask()/irqUnmask() is completely different (it's running in different threads).
EDIT: Furthermore, I only seem to see 2 stacks: the keyboard driver's interrupt handling thread, and the Intel PRO/1000 driver's interrupt handling thread.
This is the same "IRQ flood" problem that Korona (correctly) diagnosed and described days ago.
Essentially: a device tells the IO APIC that it wants an IRQ, the IO APIC tells the CPU, and the IRQ handler doesn't service the device but does send EOI (with or without pointless masking/unmasking in there somewhere). After this the device is still telling the IO APIC it wants an IRQ (because the driver didn't service the device properly), so the IO APIC tells the CPU again, and so on. This results in constant IRQs that never stop. You check the instruction at EIP (at any point in time) and it has to be something that's executed during IRQ handling, because that's all the CPU is (repeatedly) executing.
There are only 3 likely causes of "IRQ flood":
- You have no driver for the device causing the IRQ (and failed to "logically disconnect" the device from the PCI bus and/or failed to leave that IRQ masked)
- You do have a driver for the device causing the IRQ and it fails to service the device properly (e.g. maybe the device is trying to tell you something like "my buffer is full" and your device driver is not doing anything to empty the buffer).
- You messed up the IRQ routing (e.g. wrong device driver notified when IRQ occurs)
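If it helps to see the flood mechanism in isolation, here is a toy model of a level-triggered IRQ line. It is only a simulation under simplifying assumptions (one device, no real hardware access), and every name in it (sim_device, handler_only_eoi, and so on) is invented for the example. The only thing that stops re-delivery is the handler clearing the device's interrupt condition; EOI alone does nothing to the line.

```c
#include <stdbool.h>

/* Toy model of a device driving a level-triggered IRQ line. */
struct sim_device {
    bool irq_asserted;  /* true while the device still wants attention */
};

/* A correct handler: servicing the device (e.g. emptying its buffer)
 * makes the device de-assert its IRQ line. */
static void handler_services_device(struct sim_device *dev)
{
    dev->irq_asserted = false;
}

/* A broken handler: it only sends EOI, so the device's condition
 * is never cleared and the line stays asserted. */
static void handler_only_eoi(struct sim_device *dev)
{
    (void)dev;
}

/* Model the IO APIC: while the line is asserted, deliver the IRQ again
 * after each EOI. max_deliveries caps the loop so the simulation ends.
 * Returns how many times the IRQ was delivered. */
static unsigned run_irq_loop(struct sim_device *dev,
                             void (*handler)(struct sim_device *),
                             unsigned max_deliveries)
{
    unsigned delivered = 0;

    while (dev->irq_asserted && delivered < max_deliveries) {
        handler(dev);   /* handler runs, then EOI is sent */
        delivered++;
    }
    return delivered;
}
```

With handler_services_device the loop delivers the IRQ once and stops. With handler_only_eoi it runs until the cap, which on real hardware means "forever", and all three causes in the list above reduce to that second case from the device's point of view.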
Cheers,
Brendan