APIC Interrupts - Logical Destination Mode
- sentientnz
- Posts: 13
- Joined: Thu Nov 24, 2011 8:54 am
APIC Interrupts - Logical Destination Mode
I'm a little confused as to what is going on with my interrupt code.
I have configured the LAPICs as follows:
Destination Format Register (DFR) = 0xF0000000 (flat mode)
Logical Destination Register (LDR) = 1 / 2 / 4 / 8 (4 cpus - 1 bit set per cpu)
Task Priority Register (TPR) = 0x00
The IOAPIC IOREDTBL (each entry) is configured as follows:
Destination Mode = Logical
Delivery Mode = Lowest Priority
In QEMU, I only seem to get interrupts on CPU0. If CPU0 is still processing an interrupt, the next interrupt waits until CPU0 is free and is then delivered to CPU0 again.
If I set TPR = 0x20 on CPU0, then interrupts are delivered to CPU1 instead (IRQs are mapped to vectors 0x20-0x2F).
On my real hardware (Intel Core i5), I get interrupts on all cores - when IRQ2 (the timer) fires, I get an interrupt on each core.
Is there something else I should be configuring, or have I just misunderstood the concept?
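(For reference, the setup described above corresponds roughly to the following; a minimal C sketch that assumes the default xAPIC and IO APIC MMIO bases and ignores trigger mode, polarity and masking.)

Code:
#include <stdint.h>

/* Assumed MMIO bases - real code should take these from the ACPI/MP tables. */
#define LAPIC_BASE   0xFEE00000u
#define IOAPIC_BASE  0xFEC00000u

#define LAPIC_TPR    0x080u   /* Task Priority Register       */
#define LAPIC_LDR    0x0D0u   /* Logical Destination Register */
#define LAPIC_DFR    0x0E0u   /* Destination Format Register  */

static inline void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(LAPIC_BASE + reg) = val;
}

static inline void ioapic_write(uint8_t index, uint32_t val)
{
    *(volatile uint32_t *)(IOAPIC_BASE + 0x00) = index;  /* IOREGSEL */
    *(volatile uint32_t *)(IOAPIC_BASE + 0x10) = val;    /* IOWIN    */
}

/* Run on each CPU (cpu_index = 0..3): flat model, one LDR bit per CPU, TPR = 0. */
void lapic_setup_logical(unsigned cpu_index)
{
    lapic_write(LAPIC_DFR, 0xF0000000u);              /* flat model        */
    lapic_write(LAPIC_LDR, (1u << cpu_index) << 24);  /* logical APIC ID   */
    lapic_write(LAPIC_TPR, 0x00);                     /* accept everything */
}

/* Route one IO APIC input: logical destination, lowest priority delivery. */
void ioapic_route_lowest_priority(unsigned gsi, uint8_t vector, uint8_t dest_mask)
{
    uint32_t lo = vector
                | (1u << 8)     /* delivery mode = lowest priority */
                | (1u << 11);   /* destination mode = logical      */

    ioapic_write(0x10 + 2 * gsi + 1, (uint32_t)dest_mask << 24);  /* high dword */
    ioapic_write(0x10 + 2 * gsi,     lo);                         /* low dword  */
}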
Re: APIC Interrupts - Logical Destination Mode
Hi,
The correct/expected behaviour for "send to lowest priority" is that the IO APIC (and/or chipset) determines which CPU is currently operating at the lowest priority, then forwards the interrupt to that CPU's local APIC. That CPU's local APIC then delivers the interrupt to the CPU when it can (which may not be immediately).

sentientnz wrote:The IOAPIC IOREDTBL (each entry) is configured as follows: Destination Mode = Logical; Delivery Mode = Lowest Priority.

High dword = 0x0F000000 and low dword = 0x0000M9VV, where M is the trigger mode and polarity bits, and VV is the vector?

sentientnz wrote:In QEMU, I only seem to get interrupts on CPU0 - if CPU0 is still processing an interrupt, it will wait until it is free and then issue the next interrupt to CPU0. If I set TPR = 0x20 on CPU0, then interrupts are issued to CPU1 (IRQs are mapped to vectors 0x20-0x2F).

I wouldn't be surprised if Qemu emulates this "partially correctly" (maybe they screwed it up in a way that doesn't break existing OSs and nobody has noticed). If all CPUs have "TPR = 0x00", then when CPU0 is handling an interrupt CPU0's priority should be slightly higher, and any other "lowest priority delivery" interrupts should be sent to lower priority CPUs.

sentientnz wrote:On my real hardware (Intel Core i5), I get interrupts on all cores - when IRQ2 fires (Timer), I get an interrupt on each core.

That shouldn't happen if it's configured correctly for lowest priority delivery. Are you sure the first IRQ2 doesn't go to CPU0, the second to CPU1, the third to CPU2, and so on?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- sentientnz
- Posts: 13
- Joined: Thu Nov 24, 2011 8:54 am
Re: APIC Interrupts - Logical Destination Mode
Ever have one of those days where you just want to slap yourself.
You were right on the money, Brendan.
The CPUs are receiving subsequent IRQ2s (not the same one). Seems the IOAPIC is implementing some kind of round robin to load-balance.
QEMU is still weird, but at least I know it's working on real hardware.
Thanks heaps.
- IanSeyler
- Member
- Posts: 329
- Joined: Mon Jul 28, 2008 9:46 am
- Location: Ontario, Canada
- GitHub: https://github.com/ReturnInfinity
- Contact:
Re: APIC Interrupts - Logical Destination Mode
I'm a bit stumped by this as well. I would like my RTC interrupt to be handled by any available CPU core in the system.
I have set the APIC as follows:
Task Priority Register = 0
Logical Destination Register = 0xFF000000
Destination Format Register = 0xF0000000 (Flat Mode)
And the IO APIC as follows:
Interrupt 0x28 = 0xFF00000000000928 (All cores, Logical Mode, Lowest priority)
BSP gets all of the interrupts.
If I increment the TPR to 1, the BSP still gets the interrupts, even though the other cores have a TPR of 0. Is this not how it works?
If I set Interrupt 0x28 on the IO APIC to 0x0100000000000028 then all go to APIC ID 1 and the interrupt is handled properly.
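(For reference, that 64-bit redirection entry decomposes into the following fields - a worked breakdown, not code from the post above.)

Code:
#include <stdint.h>

/* IO APIC redirection entry 0xFF00000000000928:
 *
 *   bits  0-7   vector           = 0x28
 *   bits  8-10  delivery mode    = 001  (lowest priority)
 *   bit   11    destination mode = 1    (logical)
 *   bit   12    delivery status  (read only)
 *   bit   13    polarity         = 0    (active high)
 *   bit   14    remote IRR       (read only)
 *   bit   15    trigger mode     = 0    (edge)
 *   bit   16    mask             = 0    (not masked)
 *   bits 56-63  destination      = 0xFF (logical destination mask)
 */
static const uint64_t rtc_entry =
      (uint64_t)0x28          /* vector 0x28                    */
    | (uint64_t)1 << 8        /* delivery mode: lowest priority */
    | (uint64_t)1 << 11       /* destination mode: logical      */
    | (uint64_t)0xFF << 56;   /* destination: all logical IDs   */
    /* == 0xFF00000000000928 */

/* For comparison, 0x0100000000000028 is: vector 0x28, fixed delivery (000),
 * physical destination mode, destination = physical APIC ID 1. */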
Thanks,
-Ian
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
- IanSeyler
- Member
- Posts: 329
- Joined: Mon Jul 28, 2008 9:46 am
- Location: Ontario, Canada
- GitHub: https://github.com/ReturnInfinity
- Contact:
Re: APIC Interrupts - Logical Destination Mode
Is anyone able to comment on this?
My OS is a mono-tasking system, so I know whether a CPU core is idle or busy at any point in time. I would like to let interrupts be handled by idle cores if possible. Also, is the destination really limited to 8 cores? How would the IO-APIC handle a system with 16 cores?
Thanks,
-Ian
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Re: APIC Interrupts - Logical Destination Mode
Hi,
I'd test "TPR = 0x10" (in case the lower 4 bits of TPR are being ignored), and I'd also try 0x0100000000000928 in the IO APIC (in case it's getting confused with "0xFF is broadcast to all"). These are both unlikely things you could try, and neither of them should make any difference.
The only other thing I can think of is bugs - not setting what you think you're setting, not enabling the local APIC in the AP CPU's spurious interrupt register, etc. The BSP does seem right; but I'd also try 0x0200000000000028, 0x0400000000000028, etc in the IO APIC, just to see if the AP CPUs will accept any interrupt at all.
This means that you could (for an example) use bit 0 for the first logical CPU in each core, use bit 1 for the last logical CPU in each physical chip and use bit 2 for the first logical CPU in each NUMA domain. In that case "0x0100000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the first logical CPU in each core; "0x0200000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the last logical CPU in each chip; and "0x0400000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the first logical CPU in each NUMA domain.
Another alternative might be to have 8 groups of IRQs instead (e.g. one bit for each PCI host controller, for up to 8 PCI host controllers). In this case, if a CPU's logical destination register is set to "0x05" then it would receive IRQs from the first and third PCI host controller (but not from the second or fourth PCI host controller).
You could even combine these ideas - bit 0 for the first logical CPU in each core, bit 1 for the last logical CPU in each physical chip, bit 2 for the first logical CPU in each NUMA domain, bit 3 for some special purpose, then bits 4 to 7 for as one bit for each PCI host controller for up to 4 PCI host controllers. In this case, if a CPU's logical destination register is set to 0x95 then it'd be in the group of CPUs that are the first logical CPU in a core and in the group of CPUs that are the first logical CPU in a NUMA domain, and it would receive IRQs from the first and fourth PCI host controllers (but not the second or third PCI host controllers).
Basically, for those 8 bits you can decide what you want to use each of them for.
You can even do this dynamically. For example, during boot you might notice that none of the CPUs support hyper-threading (and therefore all CPUs are the first CPU in their core), and decide not to waste a bit for "first CPU in each core" (and have an extra bit to use for something else in that case).
One idea I had was to reprogram the logical destination register during task switches, and use 4 of the bits for "process ID MOD 4". That way, for "TLB shootdown IPIs" that invalidate TLB entries that correspond to a specific process, I could use "logical destination, fixed delivery" and avoid interrupting (on average) 75% of the CPUs that aren't running that process.
For x2APIC everything changes. The logical destination is 32-bits, but it's in "cluster mode" and it's hardwired (you can't change it). In this case you end up with 16 bits for "cluster ID" (used as a NUMA node ID), and 16 bits for up to 16 logical CPUs per cluster. An IRQ sent with logical destination to "0x00010005" would be received by the first and third CPU in the second NUMA domain.
Cheers,
Brendan
ReturnInfinity wrote:Is anyone able to comment on this?

It looks correct to me, and I can't think of anything that would explain the unexpected behaviour.

I'd test "TPR = 0x10" (in case the lower 4 bits of TPR are being ignored), and I'd also try 0x0100000000000928 in the IO APIC (in case it's getting confused with "0xFF is broadcast to all"). Both are unlikely causes and neither should make any difference, but they're easy things to try.

The only other thing I can think of is bugs - not setting what you think you're setting, not enabling the local APIC in the AP CPU's spurious interrupt register, etc. The BSP does seem right; but I'd also try 0x0200000000000028, 0x0400000000000028, etc. in the IO APIC, just to see if the AP CPUs will accept any interrupt at all.

ReturnInfinity wrote:Also is the destination really limited to 8 cores? How would the IO-APIC handle a system with 16 cores?

For xAPIC, the logical destination isn't limited to 8 cores; it's limited to 8 groups of one or more CPUs. The local APIC does something like "if( (IRQ_destination & logical_dest_register) != 0) { I can receive this IRQ }".

This means that you could (for example) use bit 0 for the first logical CPU in each core, use bit 1 for the last logical CPU in each physical chip, and use bit 2 for the first logical CPU in each NUMA domain. In that case "0x0100000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the first logical CPU in each core; "0x0200000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the last logical CPU in each chip; and "0x0400000000000928" would send the IRQ to the lowest priority CPU in the group of CPUs that are the first logical CPU in each NUMA domain.

Another alternative might be to have 8 groups of IRQs instead (e.g. one bit for each PCI host controller, for up to 8 PCI host controllers). In this case, if a CPU's logical destination register is set to "0x05" then it would receive IRQs from the first and third PCI host controller (but not from the second or fourth PCI host controller).

You could even combine these ideas - bit 0 for the first logical CPU in each core, bit 1 for the last logical CPU in each physical chip, bit 2 for the first logical CPU in each NUMA domain, bit 3 for some special purpose, then bits 4 to 7 as one bit per PCI host controller (for up to 4 PCI host controllers). In this case, if a CPU's logical destination register is set to 0x95 then it'd be in the group of CPUs that are the first logical CPU in a core and in the group of CPUs that are the first logical CPU in a NUMA domain, and it would receive IRQs from the first and fourth PCI host controllers (but not the second or third PCI host controllers).

Basically, for those 8 bits you can decide what you want to use each of them for.

You can even do this dynamically. For example, during boot you might notice that none of the CPUs support hyper-threading (and therefore all CPUs are the first CPU in their core), and decide not to waste a bit for "first CPU in each core" (and have an extra bit to use for something else in that case).

One idea I had was to reprogram the logical destination register during task switches, and use 4 of the bits for "process ID MOD 4". That way, for "TLB shootdown IPIs" that invalidate TLB entries that correspond to a specific process, I could use "logical destination, fixed delivery" and avoid interrupting (on average) 75% of the CPUs that aren't running that process.
For x2APIC everything changes. The logical destination is 32-bits, but it's in "cluster mode" and it's hardwired (you can't change it). In this case you end up with 16 bits for "cluster ID" (used as a NUMA node ID), and 16 bits for up to 16 logical CPUs per cluster. An IRQ sent with logical destination to "0x00010005" would be received by the first and third CPU in the second NUMA domain.
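(To make the xAPIC matching rule above concrete, here's a rough C sketch of the flat-mode acceptance check and a topology-based logical ID assignment along the lines described; the group layout, topology struct and helper names are purely illustrative.)

Code:
#include <stdint.h>
#include <stdbool.h>

/* Example group layout chosen by the OS - the hardware doesn't care. */
#define GRP_FIRST_THREAD_IN_CORE  (1u << 0)
#define GRP_LAST_CPU_IN_CHIP      (1u << 1)
#define GRP_FIRST_CPU_IN_NODE     (1u << 2)

#define LAPIC_LDR  0x0D0u
extern void lapic_write(uint32_t reg, uint32_t val);  /* MMIO write helper, as in the earlier sketch */

/* How a flat-mode xAPIC decides whether it may accept a logically addressed IRQ. */
static bool lapic_accepts(uint8_t irq_destination, uint8_t my_logical_id)
{
    return (irq_destination & my_logical_id) != 0;
}

/* Hypothetical per-CPU topology flags gathered during boot. */
struct cpu_topology {
    bool first_thread_in_core;
    bool last_cpu_in_chip;
    bool first_cpu_in_node;
};

/* Build this CPU's 8-bit logical ID from the groups it belongs to and
 * write it into LDR bits 24-31 (DFR already set to the flat model). */
void assign_logical_id(const struct cpu_topology *t)
{
    uint8_t id = 0;

    if (t->first_thread_in_core) id |= GRP_FIRST_THREAD_IN_CORE;
    if (t->last_cpu_in_chip)     id |= GRP_LAST_CPU_IN_CHIP;
    if (t->first_cpu_in_node)    id |= GRP_FIRST_CPU_IN_NODE;

    lapic_write(LAPIC_LDR, (uint32_t)id << 24);
}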
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- IanSeyler
- Member
- Posts: 329
- Joined: Mon Jul 28, 2008 9:46 am
- Location: Ontario, Canada
- GitHub: https://github.com/ReturnInfinity
- Contact:
Re: APIC Interrupts - Logical Destination Mode
Brendan, as usual your post is ridiculously informative.
Groups are ideal, actually. So what I have done is set all CPU cores in the system to logical APIC ID 0x01.
If I configure the IO-APIC to deliver IRQ 0x08 to a specific physical APIC ID, it works no problem (0x0000000000000028 for APIC ID 0, 0x0100000000000028 for APIC ID 1, etc). Setting the IO-APIC entry to 0x0100000000000928 (logical APIC ID 1, lowest priority, logical mode) still only triggers on the BSP. I would think that as soon as I increase the TPR on the BSP to 0x10, the interrupts should be handled by one of the APs instead. Plus, if I set the BSP's TPR to 0x20 I get no interrupts at all.
Could I have missed something else from my initialization of the APIC and IO-APIC?
Thanks,
-Ian
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Re: APIC Interrupts - Logical Destination Mode
Hi,
ReturnInfinity wrote:Could I have missed something else from my initialization of the APIC and IO-APIC?

There's 3 categories of possible problems.
The first category is things your code might not be doing right. If the AP CPUs will receive "fixed delivery" IRQs from the IO APIC, then it can't be most of these possible problems (e.g. "CLI", the "enable/disable" flag in the local APIC's spurious IRQ register, etc). That only leaves things like the AP CPU's TPR, and maybe the flags in the AP CPU's "Interrupt Request Register (IRR)" and "In Service Register (ISR)" bitfields. I'd be tempted to display the values in all of the local APIC's registers (for the AP CPU/s) and check each of them; although I'd start by checking the (read only) "Processor Priority Register" at "APIC_BASE + 0x000000A0".
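(A quick way to do that check - a rough sketch, assuming the default xAPIC MMIO base and a hypothetical kprintf().)

Code:
#include <stdint.h>

#define LAPIC_BASE 0xFEE00000u                      /* assumed default xAPIC MMIO base */

extern void kprintf(const char *fmt, ...);          /* hypothetical kernel printf */

static inline uint32_t lapic_read(uint32_t reg)
{
    return *(volatile uint32_t *)(LAPIC_BASE + reg);
}

/* Run this on the AP that refuses to accept lowest-priority IRQs. */
void lapic_dump_priority_state(void)
{
    kprintf("TPR=%08x PPR=%08x LDR=%08x DFR=%08x SVR=%08x\n",
            lapic_read(0x080),    /* Task Priority Register       */
            lapic_read(0x0A0),    /* Processor Priority Register  */
            lapic_read(0x0D0),    /* Logical Destination Register */
            lapic_read(0x0E0),    /* Destination Format Register  */
            lapic_read(0x0F0));   /* Spurious Interrupt Vector    */

    /* ISR and IRR are each 256-bit bitfields spread across eight
     * 32-bit registers, 16 bytes apart. */
    for (int i = 0; i < 8; i++)
        kprintf("ISR[%d]=%08x IRR[%d]=%08x\n",
                i, lapic_read(0x100 + i * 0x10),
                i, lapic_read(0x200 + i * 0x10));
}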
The second category is false test results - maybe it actually does work, but the code you're using to determine if it worked or not is dodgy.
The last category is dodgy hardware. This is the least likely, but it is possible. You'd want to try the OS on different computers to see if it works on most but not on some. If it does work on most computers, then you'd want to determine which chipset/s and which CPU/s have problems and read through the corresponding errata. I know that there's at least one problem with the local APIC in ancient Pentium CPUs (where you have to make sure each write to a local APIC register is preceded by a read from a local APIC register).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: APIC Interrupts - Logical Destination Mode
Some possible input to this problem. The reason why two different Intel processors (a Core Duo and a 4-core) have not worked in RDOS is that I changed from "Fixed delivery" to "Lowest priority delivery" for ISA interrupts (among them, the PS/2 keyboard IRQ). When the lowest priority bit is set in the IO-APIC on an Intel system, no keyboard IRQs happen, while on AMD it seems to work. I have no idea why this is so.
Edit: Not only that. When lowest priority delivery is turned off on AMD, the problems with missing interrupts from the network controller disappear.
I think I will drop lowest priority delivery, and let all interrupts go to BSP.
Re: APIC Interrupts - Logical Destination Mode
Hi,
rdos wrote:Some possible input to this problem. The reason why two different Intel processors (a Core Duo and a 4-core) have not worked in RDOS is that I changed from "Fixed delivery" to "Lowest priority delivery" for ISA interrupts (among them, the PS/2 keyboard IRQ). When the lowest priority bit is set in the IO-APIC on an Intel system, no keyboard IRQs happen, while on AMD it seems to work.
Edit: Not only that. When lowest priority delivery is turned off on AMD, the problems with missing interrupts from the network controller disappear.

It'd be nice to find the actual cause/s of these problems.

For the Intel machines, is it only the keyboard IRQ that causes problems (where the rest of the ISA IRQs work fine with "Lowest priority delivery")? Is the keyboard a real PS/2 device, or is it a USB device (are you relying on the "typically very dodgy" firmware/SMM code that emulates a PS/2 device)? Which chipset/s do these machines use, and have you checked the errata for those chipsets?

For the AMD machine, are you saying "Lowest priority delivery" works for all IRQs except the ethernet controller? If so, I find it very unlikely that the IO APIC behaves differently for the ethernet controller's IRQs alone. Maybe the IO APIC has problems with all IRQs that happen very often. Maybe there are race conditions or re-entrancy issues in your IRQ handling that cause problems when an interrupt handler for a device is started while the same interrupt handler for the same device (for the previous IRQ) is still being processed (on the same CPU or on a different CPU). Maybe it's a problem that would affect any device that has multiple IRQs, and the ethernet card happens to be the only device that has multiple IRQs.

rdos wrote:I think I will drop lowest priority delivery, and let all interrupts go to BSP.

There's 2 problems there. The first problem is dropping lowest priority delivery - without lowest priority delivery you'd be interrupting important tasks instead of interrupting unimportant tasks (bad for the performance of important tasks); and it'd also make a mess of any sleep states (e.g. repeatedly taking the CPU out of C1/C2/C3 state and killing any power management, while also increasing IRQ latency due to the time needed to take the CPU out of the C1/C2/C3 state, even when other CPUs aren't in sleep states).

The second problem is pounding the daylights out of the BSP due to not having any IRQ load balancing (which will cause increased IRQ latency). Worst case here is the BSP alone can't keep up under heavy load and performance suffers badly. At a minimum (if you must use fixed delivery) you'd want to spread the IRQs around, so that different devices send their IRQs to different CPUs.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: APIC Interrupts - Logical Destination Mode
Could be "dodgy" SMM code. OTOH, if two Intel machines (one is a portable core-duo, a Vaio, and the other is a stationary Acer with 4 real cores) doesn't work and four AMD machines have no such issues, chances are it is Intel related. Also, it is not the case that it works sometimes, rather it never works (no keyboard activity at all).Brendan wrote:It'd be nice to find the actual cause/s of these problems.
For the Intel machines, is it only the keyboard IRQ that causes problems (where the rest of the ISA IRQs work fine with "Lowest priority delivery")? Is the keyboard a real PS/2 device, or is it a USB device (are you relying on the "typically very dodgy" firmware/SMM code that emulates a PS/2 device)? Which chipset/s do these machines use, and have you checked the errata for those chipsets?
Regarding the ethernet card in the older 2-core AMD, it uses the IO-APIC, while the one that worked yesterday was MSI-based. I'm not 100% that it worked because of the new MSI stubs or because I turned off "lowest priority delivery". I'll have to test this on monday.Brendan wrote:For the AMD machine, are you saying "Lowest priority delivery" works for all IRQs except the ethernet controller? If so, I find it very unlikely that the IO APIC behaves differently for the ethernet controller's IRQs alone. Maybe the IO APIC has problems with all IRQs that happen very often. Maybe there's race conditions or re-entrancy issues in your IRQ handling that cause problems when an interrupt handler for a device is started when the same interrupt handler for the same device (for the previous IRQ) is still being processed (on the same CPU or on a different CPU). Maybe it's a problem that would effect any device that has multiple IRQs, and the ethernet card happens to be the only device that has multiple IRQs.
I read-up on Linux and PCI interrupt sharing, and it seems like my logic might be the problem. Apparantly, what Linux does is that it first checks & clears device-status, EOIs, and then checks device-status again. They also have a quick check if a device is responsible for the IRQ or not. That is a whole lot complex than what I do, especially the part checking device-status after EOI which needs a new, PCI-sharing IRQ stub. When I put the EOI at the start of the IRQ, it works even worse. I suppose I need to use a similar logic as Linux does. Additionally, Linux has a spinlock in IRQs to handle multi-CPU. I definitely don't want that though. There are also discussions about needing to read PCI-space before checking status. This is a real mess.
I could also test it with SATA, but I still have other problems with the SATA driver on AMD as well so that is not a good option right now. I have no Intel machine with IDE (the IDE driver is 100% stable with only IRQs).
After discovering that "lowest priority" removal does not fix the issue with IO-APIC PCI sharing, I've changed my mind.Brendan wrote:There's 2 problems there. The first problem is dropping lowest priority delivery - without lowest priority delivery you'd be interrupting important tasks instead of interrupting unimportant tasks (bad for the performance of important tasks); and it'd also make a mess of any sleep states (e.g. repeatedly taking the CPU out of C1/C2/C3 state and killing any power management while also increasing IRQ latency due to the time needed to take the CPU out of the C1/C2/C3 state, even when other CPUs aren't in sleep states).

But I don't think I will have issues with sleep states, since the BSP will never go beyond C1. I always stop the highest cores first when load is low, and never stop BSP.
I don't want that. I'll keep code for both versions and see what works in the end. Right now I need a new PCI IRQ stub that can handle shared IO-APIC interrupts.Brendan wrote:The second problem is pounding the daylights out of the BSP due to not having any IRQ load balancing (which will cause increased IRQ latency). Worst case here is the BSP CPU alone can't keep up under heavy load and performance suffers badly. At a minimum (if you must use fixed delivery) you'd want to spread them around, so that the different devices send their IRQs to different CPUs.
As for keyboard IRQ, and other ISA interrupts, I think they can be delivered to BSP with no problems. That would solve the Intel PS/2 issue. Chances are other ISA devices with edge-triggered interrupts have similar problems, so it is better to play it safe and deliver to BSP only.
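(The sharing pattern described above - check and clear device status, EOI, then check device status again - looks roughly like this; a sketch with hypothetical handler types and helpers, not RDOS or Linux code.)

Code:
#include <stdbool.h>

/* Hypothetical per-device hooks: each driver sharing the line provides a
 * handler that checks and clears its device's interrupt status, services
 * the device if needed, and returns true if the device was asserting. */
struct shared_irq_handler {
    bool (*handle)(void *dev);
    void *dev;
    struct shared_irq_handler *next;
};

extern void lapic_eoi(void);   /* write to the local APIC EOI register */

/* Rough shape of a handler for a shared, level-triggered PCI interrupt. */
void shared_pci_irq(struct shared_irq_handler *list)
{
    /* Pass 1: every driver checks (and clears) its device's status and
     * services the device if it was a source of the interrupt. */
    for (struct shared_irq_handler *h = list; h; h = h->next)
        h->handle(h->dev);

    /* Acknowledge the interrupt at the local APIC. */
    lapic_eoi();

    /* Pass 2: re-check device status after the EOI, so a device that
     * asserted again while pass 1 was running is not left unserviced. */
    for (struct shared_irq_handler *h = list; h; h = h->next)
        h->handle(h->dev);
}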
Re: APIC Interrupts - Logical Destination Mode
Hi,
The "PS/2 emulation for USB keyboard/mouse" feature is mostly intended for crappy OS's from the early 90's (e.g. DOS, Win95). I wouldn't be surprised that it fails to handle IO APICs (in fact I'd expect it to fail). I would be surprised if you've got an AMD machine where the firmware's "PS/2 emulation" SMM code actually does work for IO APIC, and even more surprised if it works for the (more complicated) "Lowest priority delivery" case.rdos wrote:Could be "dodgy" SMM code. OTOH, if two Intel machines (one is a portable core-duo, a Vaio, and the other is a stationary Acer with 4 real cores) doesn't work and four AMD machines have no such issues, chances are it is Intel related. Also, it is not the case that it works sometimes, rather it never works (no keyboard activity at all).Brendan wrote:It'd be nice to find the actual cause/s of these problems.
For the Intel machines, is it only the keyboard IRQ that causes problems (where the rest of the ISA IRQs work fine with "Lowest priority delivery")? Is the keyboard a real PS/2 device, or is it a USB device (are you relying on the "typically very dodgy" firmware/SMM code that emulates a PS/2 device)? Which chipset/s do these machines use, and have you checked the errata for those chipsets?
The correct solution would be to support USB devices properly, the same as any OS from a decade ago would. For example, find USB controllers, disable any legacy emulation, then check ACPI tables to see if a real PS/2 controller is actually present, and only then check if there's a real PS/2 keyboard or mouse (rather than an emulated PS/2 keyboard or mouse).
Ok - sounds like there's too many things complicating the issue to say what might be going wrong.rdos wrote:Regarding the ethernet card in the older 2-core AMD, it uses the IO-APIC, while the one that worked yesterday was MSI-based. I'm not 100% that it worked because of the new MSI stubs or because I turned off "lowest priority delivery". I'll have to test this on monday.Brendan wrote:For the AMD machine, are you saying "Lowest priority delivery" works for all IRQs except the ethernet controller? If so, I find it very unlikely that the IO APIC behaves differently for the ethernet controller's IRQs alone. Maybe the IO APIC has problems with all IRQs that happen very often. Maybe there's race conditions or re-entrancy issues in your IRQ handling that cause problems when an interrupt handler for a device is started when the same interrupt handler for the same device (for the previous IRQ) is still being processed (on the same CPU or on a different CPU). Maybe it's a problem that would effect any device that has multiple IRQs, and the ethernet card happens to be the only device that has multiple IRQs.
I read-up on Linux and PCI interrupt sharing, and it seems like my logic might be the problem. Apparantly, what Linux does is that it first checks & clears device-status, EOIs, and then checks device-status again. They also have a quick check if a device is responsible for the IRQ or not. That is a whole lot complex than what I do, especially the part checking device-status after EOI which needs a new, PCI-sharing IRQ stub. When I put the EOI at the start of the IRQ, it works even worse. I suppose I need to use a similar logic as Linux does. Additionally, Linux has a spinlock in IRQs to handle multi-CPU. I definitely don't want that though. There are also discussions about needing to read PCI-space before checking status. This is a real mess.
I could also test it with SATA, but I still have other problems with the SATA driver on AMD as well so that is not a good option right now. I have no Intel machine with IDE (the IDE driver is 100% stable with only IRQs).
In that case the problem shifts from "sleep states" to "thermal and acoustic management". We've discussed temperature and acoustics before though.rdos wrote:After discovering that "lowest priority" removal does not fix the issue with IO-APIC PCI sharing, I've changed my mind.Brendan wrote:There's 2 problems there. The first problem is dropping lowest priority delivery - without lowest priority delivery you'd be interrupting important tasks instead of interrupting unimportant tasks (bad for the performance of important tasks); and it'd also make a mess of any sleep states (e.g. repeatedly taking the CPU out of C1/C2/C3 state and killing any power management while also increasing IRQ latency due to the time needed to take the CPU out of the C1/C2/C3 state, even when other CPUs aren't in sleep states).![]()
But I don't think I will have issues with sleep states, since the BSP will never go beyond C1. I always stop the highest cores first when load is low, and never stop BSP.
Chances are that continuing to use the legacy "PS/2 emulation for USB keyboard/mouse" feature will continue to cause problems regardless of how you setup the IO APIC. It should work (for a limited definition of "work") if you use the PIC chips for the PS/2 controller; but using PIC chips for some IRQs and IO APICs for other IRQs is seriously ugly, and using PIC chips for everything is worse (especially for modern multi-core machines or machines with MSI).rdos wrote:I don't want that. I'll keep code for both versions and see what works in the end. Right now I need a new PCI IRQ stub that can handle shared IO-APIC interrupts.Brendan wrote:The second problem is pounding the daylights out of the BSP due to not having any IRQ load balancing (which will cause increased IRQ latency). Worst case here is the BSP CPU alone can't keep up under heavy load and performance suffers badly. At a minimum (if you must use fixed delivery) you'd want to spread them around, so that the different devices send their IRQs to different CPUs.
As for keyboard IRQ, and other ISA interrupts, I think they can be delivered to BSP with no problems. That would solve the Intel PS/2 issue. Chances are other ISA devices with edge-triggered interrupts have similar problems, so it is better to play it safe and deliver to BSP only.
Cheers,
Brendan
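(The ACPI check mentioned above - seeing whether a real 8042/PS/2 controller exists before touching it - uses the FADT's "IA-PC Boot Architecture Flags"; a minimal sketch, with the partial struct layout to be verified against the ACPI spec and the table's revision/length before trusting the field.)

Code:
#include <stdint.h>
#include <stdbool.h>

/* Partial FADT layout; only the field needed here. The IA-PC Boot
 * Architecture Flags word was added in ACPI 2.0, so older tables may
 * be too short to contain it - check the table length first. */
struct acpi_fadt {
    uint8_t  skipped[109];       /* header and earlier fields, not used here */
    uint16_t iapc_boot_arch;     /* IA-PC Boot Architecture Flags            */
} __attribute__((packed));

#define IAPC_BOOT_ARCH_8042  (1u << 1)   /* bit 1: a real 8042 (PS/2) controller exists */

/* Returns true if the firmware claims a real PS/2 controller is present. */
bool ps2_controller_present(const struct acpi_fadt *fadt)
{
    return (fadt->iapc_boot_arch & IAPC_BOOT_ARCH_8042) != 0;
}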
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: APIC Interrupts - Logical Destination Mode
Brendan wrote:The "PS/2 emulation for USB keyboard/mouse" feature is mostly intended for crappy OS's from the early 90's (e.g. DOS, Win95). I wouldn't be surprised if it fails to handle IO APICs (in fact I'd expect it to fail).

I agree, except for one thing. The PS/2 interface is uncomplicated, always resides on IRQ 1, and thus works on most machines that have the physical connector. The same cannot be said about USB. USB only works on a minority of my current machines, and it didn't work on the Intel machine, which is why I looked for a PS/2 port, and found it. If you have no working keyboard, it is a little hard to debug more complex issues like non-working USB, especially when the network controller doesn't work either.

Brendan wrote:The correct solution would be to support USB devices properly, the same as any OS from a decade ago would. For example, find USB controllers, disable any legacy emulation, then check ACPI tables to see if a real PS/2 controller is actually present, and only then check if there's a real PS/2 keyboard or mouse (rather than an emulated PS/2 keyboard or mouse).

USB is starting to look like a real mess as well now. We have the UHCI and OHCI (1.0 and 1.1) interfaces, which are well-defined and work on most machines (at least if polled regularly rather than used with IRQs). Now we have EHCI, which sometimes works with companion controllers (this still works for keyboard-class devices), and EHCI without companion controllers, which doesn't work unless you tweak the EHCI. On top of this mess there is USB 3 (XHCI?), which seems to make everything incompatible. I'm sure it would be OK once you have working USB 2 and 3 device drivers, but until then it is just a mess.

Doing a simple IRQ 1 interface is comparatively a snap. Not only that, but we are expected to implement the HID specification, which is almost as complex as ACPI or a network stack. Just to get some keystrokes. It's ridiculous.

Brendan wrote:Chances are that continuing to use the legacy "PS/2 emulation for USB keyboard/mouse" feature will continue to cause problems regardless of how you set up the IO APIC. It should work (for a limited definition of "work") if you use the PIC chips for the PS/2 controller.

I'd expect it to work if you set it up in the IO-APIC as an ISA device (with ISA edge triggering), and only deliver to the BSP. That is what the PIC would do anyway, so the system shouldn't notice the difference.
Re: APIC Interrupts - Logical Destination Mode
Update on this issue:
I solved this issue by assigning all ISA interrupts to the BSP, and letting PCI interrupts be delivered with lowest priority.
It seems like this works on almost all my machines. It works on all my AMD machines, and it works on Intel Atom, but it doesn't work on my Intel Core Duo. On the Intel Core Duo I've added an RTL8169 compatible network card. I know the card works with RDOS because I used it in another machine with AMD Athlon. I also know it works in Windows on the Intel Core Duo. I know the PCI card is asserting the interrupt (PCI status). I've also found out that the IO-APIC has dispatched the interrupt (both Remote IRR and Delivery Status in the IO-APIC are set). However, it doesn't seem like the interrupt is being serviced by any core. Delivery is set to logical destination, lowest priority. The destination field is FFh. The IRQ number is 0xA1.
Why is this a problem only on Core Duo and not on AMD or on dual core Intel Atom?
Edit: At the time the interrupt is dispatched, one of the cores is halted, while one is executing. No IRR or ISR is set in the local APIC of the executing core. The core that is halted has TPR = 0xFF, and it first executes cli and then hlt.
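(For reference, the Remote IRR and Delivery Status bits mentioned above can be read back from the redirection entry like this; a rough sketch assuming the default IO APIC MMIO base and a hypothetical kprintf().)

Code:
#include <stdint.h>

#define IOAPIC_BASE 0xFEC00000u                     /* assumed default MMIO base */

extern void kprintf(const char *fmt, ...);          /* hypothetical kernel printf */

static inline uint32_t ioapic_read(uint8_t index)
{
    *(volatile uint32_t *)(IOAPIC_BASE + 0x00) = index;   /* IOREGSEL */
    return *(volatile uint32_t *)(IOAPIC_BASE + 0x10);    /* IOWIN    */
}

/* Dump the diagnostic bits of one redirection table entry. */
void ioapic_dump_entry(unsigned gsi)
{
    uint32_t lo = ioapic_read(0x10 + 2 * gsi);
    uint32_t hi = ioapic_read(0x10 + 2 * gsi + 1);

    kprintf("GSI %u: hi=%08x lo=%08x vector=%02x delivery_status=%u remote_irr=%u mask=%u\n",
            gsi, hi, lo,
            lo & 0xFF,         /* vector                        */
            (lo >> 12) & 1,    /* 1 = delivery pending          */
            (lo >> 14) & 1,    /* 1 = accepted, waiting for EOI */
            (lo >> 16) & 1);   /* 1 = masked                    */
}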
Re: APIC Interrupts - Logical Destination Mode
Hi,
rdos wrote:Why is this a problem only on Core Duo and not on AMD or on dual core Intel Atom?
Edit: At the time the interrupt is dispatched, one of the cores is halted, while one is executing. No IRR or ISR is set in the local APIC of the executing core. The core that is halted has TPR = 0xFF, and it first executes cli and then hlt.

CLI then HLT puts the CPU into a special "shut down the CPU and wait for the INIT-SIPI-SIPI startup sequence" state. If an interrupt is received, it's blocked (CLI); and when you send the INIT-SIPI-SIPI startup sequence to start the CPU again, the INIT causes the local APIC to be reset (so any pending interrupts are lost).
You should be making sure that no interrupts can be sent to the CPU and doing a few other things too. This includes:
- Clearing the CPU's Logical Destination Register so it won't receive any IRQs sent with logical delivery (while making sure there's a least one CPU that will still be able to receive them)
- Reconfiguring any IRQs that use fixed delivery so they are sent to other CPU/s
- Making sure no other CPU will send any type of IPI to the CPU you're disabling
- Avoiding race conditions for all of the above (e.g. in case there's any pending interrupts that were sent before they were disabled, have a small delay with interrupts enabled to give the CPU a chance to handle any pending IRQs)
- Disable paging to flush TLB contents (some CPUs have bugs where stale TLB contents may be used after reset/init)
- Disable caches (e.g. in CR0)
- Do a WBINVD to flush anything left in caches (disabled CPUs may not respond to snoop traffic, and not disabling/flushing cache contents can cause corruption)
- Make sure A20 is enabled (you probably don't need to do this, or already did it)
- Consider switching back to real mode - it shouldn't be necessary (but can't hurt and might potentially avoid problems caused by CPU errata)
Bringing a CPU back online would be the reverse of this, with a few extra steps thrown in (e.g. reconfigure MTRRs before enabling caches in case the OS changed them while the CPU was offline, LGDT, LIDT, etc).
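(A rough sketch of the tail end of that offline sequence, in C with inline assembly; the helper functions are hypothetical, and the "disable paging" and "switch back to real mode" steps from the list above are omitted.)

Code:
#include <stdint.h>

/* Hypothetical helpers assumed to exist elsewhere in the kernel. */
extern void lapic_write(uint32_t reg, uint32_t val);
extern void reroute_fixed_irqs_away_from(unsigned cpu);
extern void mark_cpu_offline_for_ipis(unsigned cpu);
extern void delay_ms_with_interrupts_enabled(unsigned ms);

/* Run on the CPU that is being taken offline. */
void cpu_offline(unsigned cpu)
{
    lapic_write(0x0D0, 0);                 /* clear LDR: stop receiving logical IRQs   */
    reroute_fixed_irqs_away_from(cpu);     /* fixed-delivery IRQs go to other CPUs     */
    mark_cpu_offline_for_ipis(cpu);        /* make sure no CPU will target us with IPIs */
    delay_ms_with_interrupts_enabled(1);   /* drain anything already in flight          */

    asm volatile(
        "cli\n\t"
        "mov %%cr0, %%rax\n\t"
        "or  $0x60000000, %%rax\n\t"       /* set CD and NW: disable caches          */
        "mov %%rax, %%cr0\n\t"
        "wbinvd\n\t"                       /* flush anything left in the caches      */
        "hlt"                              /* wait for an INIT-SIPI-SIPI to restart  */
        ::: "rax", "memory");
}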
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.