OSDev.org

The Place to Start for Operating System Developers
All times are UTC - 6 hours




 Post subject: x2apic losing interrupt after setting ISR around SMI
PostPosted: Fri Jul 22, 2022 2:20 am 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
Title was updated to reflect new understanding of the actual issue, see posts below for details.



On my laptop there are Fn key combos that can be used to adjust the screen backlight.

These keys work even when running my kernel, which knows nothing about the backlight and doesn't handle any ACPI interrupts. My guess is that an SMI fires when the key combos are pressed and the firmware adjusts the backlight in SMM.

However, the EOI my timer handler sends near the end of its execution sometimes gets lost around this process. Namely, if I keep pressing the key combos to adjust brightness up or down I can get it into this state.

Why do I believe that the EOI is lost?

1. The action of sending EOI is hardcoded in the timer handler, no way around it.

2. While the core seems frozen, after sending it an NMI from a different core, I can see that it is executing the idle loop (and the IF bit is set), but the bits for the timer interrupt vector are high in both the ISR and the IRR so no more timer interrupts can go through to that core.

3. What's more, if the NMI handler sends an EOI after seeing that the timer interrupt vector's bit is set in the ISR, the frozen core will recover and go back to normal operation.
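The check in points 2 and 3 can be sketched roughly as below. This is only a sketch under assumptions, not the kernel's actual code: it assumes x2APIC mode, where the ISR and IRR are read as eight 32-bit MSRs starting at 0x810 and 0x820 and EOI is MSR 0x80B. All helper names, and the `rdmsr`/`wrmsr` wrappers mentioned in the closing comment, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x2APIC MSR addresses (Intel SDM): EOI, and the first ISR/IRR words. */
#define X2APIC_MSR_EOI  0x80Bu
#define X2APIC_MSR_ISR0 0x810u
#define X2APIC_MSR_IRR0 0x820u

/* Each ISR/IRR MSR covers 32 vectors; bits for vectors 0-15 are reserved. */
static uint32_t x2apic_isr_msr(uint8_t vector)
{
    assert(vector >= 16);
    return X2APIC_MSR_ISR0 + vector / 32u;
}

static uint32_t x2apic_irr_msr(uint8_t vector)
{
    assert(vector >= 16);
    return X2APIC_MSR_IRR0 + vector / 32u;
}

/* Test the bit for `vector` in the value read from its ISR or IRR MSR. */
static bool vector_bit_set(uint64_t msr_value, uint8_t vector)
{
    return (msr_value >> (vector % 32u)) & 1u;
}

/* The frozen state: the vector is in service and pending again at once. */
static bool vector_looks_stuck(uint64_t isr_value, uint64_t irr_value,
                               uint8_t vector)
{
    return vector_bit_set(isr_value, vector) &&
           vector_bit_set(irr_value, vector);
}

/*
 * Possible use from the NMI handler (rdmsr/wrmsr are assumed wrappers):
 *
 *   if (vector_looks_stuck(rdmsr(x2apic_isr_msr(v)),
 *                          rdmsr(x2apic_irr_msr(v)), v))
 *       wrmsr(X2APIC_MSR_EOI, 0);
 */
```

Note that an EOI clears the highest-priority bit set in the ISR, which is only the timer vector if nothing of higher priority is in service at that moment.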

Not familiar with SMI/SMM and also didn't find anything about SMI/SMM eating EOIs in the manual. So I'm wondering what would be some pointers to look further into this?


Last edited by xeyes on Tue Jul 26, 2022 11:54 pm, edited 2 times in total.

 
 Post subject: Re: EOI lost (around SMI?)
PostPosted: Fri Jul 22, 2022 9:24 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1388
EOI to the PIC is a single command, right? It is unlikely that SMM would interfere with that.

When an SMI happens, the current state of execution is serialized (with the notable exception of the NMI gate), and when SMM is done executing, it uses the RSM instruction to return to the OS already running. It is of course possible that the SMM handler corrupts the serialized execution state. However, in that case most OSes would have a problem. But Windows would not, on account of it taking control of ACPI on startup.

It is possible that the SMM handler isn't tested very well. In that case you may get out of the problem by writing an ACPI driver. Have fun doing that!

Another possibility is that the PIC in your system doesn't work very well with multiple cores. In that case you may get out of it by writing an APIC driver. That is way easier than a full-fledged ACPI driver, you only need to read the static tables, no AML.

A third possibility is that you have a race condition around interrupt delivery and halting. No way to know without reading your source.

_________________
Carpe diem!


 
 Post subject: Re: EOI lost (around SMI?)
PostPosted: Sat Jul 23, 2022 1:13 pm 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
nullplan wrote:
EOI to the PIC is a single command, right? It is unlikely that SMM would interfere with that.


I'm not saying that SMM is interfering, but if I don't adjust the backlight brightness this doesn't happen at all, so SMM/SMI seems related.

nullplan wrote:
Another possibility is that the PIC in your system doesn't work very well with multiple cores. In that case you may get out of it by writing an APIC driver. That is way easier than a full-fledged ACPI driver, you only need to read the static tables, no AML.


Did more experiments as below, x2apic is a smoking gun, but the whole thing is still a mystery.

a. switching to xapic instead of x2 makes the issue no longer happen, even if I try very hard at adjusting the backlight.

b. DMAR table is not opting out of x2.

c. setting up the interrupt remappers makes things worse, as in much easier to get into frozen state using backlight adjust key combos.

d. using high priority vector in the 0xF* range for timer interrupt makes things worse, as in all cores freeze together instead of just core 0.

e. 32b Linux does not use x2 with or without ACPI off on this machine, it doesn't have this problem.

f. 64b Linux uses x2 when it sees the DMAR table, it also sets up the interrupt remappers, and of course(?) it also does not have this problem either. One difference I saw from dmesg is that Linux uses clustered mode and I'm using physical dest mode.


Seems that x2 is not happy with my setup and sometimes gets confused by the events around SMM and my timer handler.


nullplan wrote:
A third possibility is that you have a race condition around interrupt delivery and halting. No way to know without reading your source.


Handler and idle loop code are both of the garden variety.

timer handler C code:

Code:
void handler(...)
{
    // bunch of house keeping stuff, doesn't IRET or call anything that won't return.

    send_eoi(); // writes 0 to the EOI register at offset B0

    // scheduler may not return
    if (time_to_schedule)
        choose_next_task(...);

    // NOTE: I thought the scheduler might be too slow?
    // but moving send_eoi() to points closer to various irets didn't help either.
}


Its ASM helper pushes registers, calls the C function, pops the registers on return and then irets.
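A note on the "offset B0" in the comment above: 0xB0 is the EOI offset in the xAPIC MMIO page, while in x2APIC mode the same register is reached through an MSR. The sketch below shows the usual mapping between the two (MSR = 0x800 + offset / 16). The function name is made up for illustration, and it is an assumption that `send_eoi()` ends up at MSR 0x80B when x2APIC is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u

/*
 * Map an xAPIC MMIO register offset (16-byte aligned, e.g. 0xB0 for EOI)
 * to the x2APIC MSR address of the same register. The 64-bit ICR lives
 * entirely at the MSR for offset 0x300; offset 0x310 has no MSR of its own.
 */
static uint32_t x2apic_msr_from_xapic_offset(uint32_t mmio_offset)
{
    assert((mmio_offset & 0xFu) == 0 && mmio_offset < 0x400u);
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```

With this mapping, EOI (0xB0) becomes MSR 0x80B and the first ISR word (0x100) becomes MSR 0x810.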


idle loop C code:

Code:
do
{
    asm volatile("hlt");
}while(not_done);


What sort of race are you envisioning though?


 
 Post subject: Re: x2apic losts EOI (around SMI?)
PostPosted: Sat Jul 23, 2022 1:54 pm 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1388
xeyes wrote:
What sort of race are you envisioning though?
The classic problem is to have an interrupt between the last time you check that no interrupt occurred and actually halting. In that case, the interrupt will not wake up the CPU, as the halt hasn't started yet, and the CPU will appear to be frozen. However, your architecture does not appear to suffer from that problem, it seems pretty solid as far as I can tell.
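For readers who have not met this race, here is a rough sketch of the usual fix for idle loops that test a flag before halting: do the check with interrupts disabled, then enable interrupts and halt back to back. The hook structure and all names are hypothetical; on x86 the three hooks would be `cli`, `sti`, and the pair `sti; hlt`. The stubs at the bottom only exist so the logic can be tried outside a kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU hooks. On x86: `cli`, `sti`, and `sti; hlt`. */
struct cpu_ops {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    void (*irq_enable_and_halt)(void);
};

/*
 * One pass of a race-free idle loop. The check runs with interrupts
 * disabled; `sti; hlt` then lets interrupts in only as the CPU halts,
 * so an interrupt arriving after the check still wakes the CPU.
 */
static void idle_once(const struct cpu_ops *cpu,
                      const volatile bool *work_pending)
{
    assert(cpu && work_pending);
    cpu->irq_disable();
    if (*work_pending) {
        cpu->irq_enable();      /* something to do: skip the halt */
        return;
    }
    cpu->irq_enable_and_halt();
}

/* Host-side stand-ins that only count halts. */
static int halt_count;
static void stub_noop(void) { }
static void stub_halt(void) { halt_count++; }
static const struct cpu_ops stub_cpu = { stub_noop, stub_noop, stub_halt };
```

The loop posted earlier halts unconditionally and re-checks afterwards, so it has no check-then-halt window to begin with.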

_________________
Carpe diem!


 
 Post subject: Re: EOI lost (around SMI?)
PostPosted: Sat Jul 23, 2022 4:25 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 4344
xeyes wrote:
f. 64b Linux uses x2 when it sees the DMAR table, it also sets up the interrupt remappers, and of course(?) it also does not have this problem either. One difference I saw from dmesg is that Linux uses clustered mode and I'm using physical dest mode.

Does the FADT say you must use clustered mode?


 
 Post subject: Re: x2apic losts EOI (around SMI?)
PostPosted: Sat Jul 23, 2022 9:44 pm 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
nullplan wrote:
xeyes wrote:
What sort of race are you envisioning though?
The classic problem is to have an interrupt between the last time you check that no interrupt occurred and actually halting. In that case, the interrupt will not wake up the CPU, as the halt hasn't started yet, and the CPU will appear to be frozen. However, your architecture does not appear to suffer from that problem, it seems pretty solid as far as I can tell.


Ah, racing with the wake up event is a classic way to get stuck. It probably can't happen here as the idle loop can't be blocked.

This does make me wonder whether the APIC lost the EOI or the interrupt itself. Per the manual, the APIC sets the ISR bit before it dispatches the interrupt to the core, so the two steps are not exactly atomic; maybe the APIC forgets to actually dispatch it after an SMM session?

However, it is probably part of the core, so it sounds super unlikely for it to have such an obvious bug.

Octocontrabass wrote:
xeyes wrote:
f. 64b Linux uses x2 when it sees the DMAR table, it also sets up the interrupt remappers, and of course(?) it also does not have this problem either. One difference I saw from dmesg is that Linux uses clustered mode and I'm using physical dest mode.

Does the FADT say you must use clustered mode?


Bit 18 of the features flag? It's not set, and I think it means that when using logical mode you shouldn't use the flat one; it doesn't preclude the use of physical mode?
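As a reference for the two FADT flags involved, a small decoding sketch follows. The bit positions are taken from the ACPI specification's fixed feature flags (bit 18 FORCE_APIC_CLUSTER_MODEL, bit 19 FORCE_APIC_PHYSICAL_DESTINATION_MODE); the enum and function names are made up for illustration.

```c
#include <stdint.h>

/* FADT fixed feature flags relevant to APIC destination modes. */
#define FADT_FORCE_APIC_CLUSTER_MODEL             (1u << 18)
#define FADT_FORCE_APIC_PHYSICAL_DESTINATION_MODE (1u << 19)

enum apic_dest_constraint {
    APIC_DEST_ANY,                 /* firmware imposes no restriction */
    APIC_DEST_CLUSTER_IF_LOGICAL,  /* logical mode must use cluster model */
    APIC_DEST_PHYSICAL_REQUIRED,   /* physical destination mode required */
};

/* Decode the destination-mode constraint from the FADT Flags field. */
static enum apic_dest_constraint fadt_apic_constraint(uint32_t fadt_flags)
{
    if (fadt_flags & FADT_FORCE_APIC_PHYSICAL_DESTINATION_MODE)
        return APIC_DEST_PHYSICAL_REQUIRED;
    if (fadt_flags & FADT_FORCE_APIC_CLUSTER_MODEL)
        return APIC_DEST_CLUSTER_IF_LOGICAL;
    return APIC_DEST_ANY;
}
```

With neither bit set, as reported here, the result is `APIC_DEST_ANY`, which is consistent with physical mode being allowed.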


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Wed Jul 27, 2022 12:00 am 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
Tried a few more things.

1. What's lost is likely the interrupt itself, not EOI.

Found out by switching to the TSC deadline mode of the timer. In this mode, the NMI handler needs to not only send an EOI but also re-arm the timer for the timer handler to recover. Thus it is likely that the ISR bit is set but the timer handler wasn't invoked; otherwise the timer handler would have re-armed the timer already.
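To make the reasoning concrete: in TSC-deadline mode the timer fires once and stays silent until a new deadline is written, so a handler that ran would leave a fresh deadline behind. A minimal sketch of the values involved, assuming the x2APIC LVT timer MSR (0x832) and IA32_TSC_DEADLINE (0x6E0); the helper names and the `rdtsc`/`wrmsr` wrappers in the comment are hypothetical.

```c
#include <stdint.h>

#define IA32_TSC_DEADLINE_MSR  0x6E0u
#define X2APIC_MSR_LVT_TIMER   0x832u
#define LVT_TIMER_TSC_DEADLINE (2u << 17)  /* timer mode, bits 18:17 = 10b */

/* LVT timer value selecting TSC-deadline mode, unmasked, for `vector`. */
static uint32_t lvt_timer_tsc_deadline(uint8_t vector)
{
    return LVT_TIMER_TSC_DEADLINE | vector;
}

/* Absolute TSC value at which the next timer interrupt should fire. */
static uint64_t next_deadline(uint64_t now_tsc, uint64_t interval_ticks)
{
    return now_tsc + interval_ticks;
}

/*
 * In the timer handler (rdtsc/wrmsr are assumed wrappers), every run has
 * to re-arm, because the deadline is one-shot:
 *
 *   wrmsr(IA32_TSC_DEADLINE_MSR, next_deadline(rdtsc(), interval));
 */
```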


2. The threshold for the high priority vector is 0x76 but it doesn't make sense.

Vector 0x75 or below causes core 0 to lose interrupts, vector 0x76 or above causes all cores to freeze when the keys are pressed. My kernel doesn't use anything near these numbers, so I can't think of any particular reason for these 2 numbers to be special.


3. The issue doesn't seem related to the timer or clustered mode (or not) either.

Using HPET to interrupt the timer vector causes the exact same problem.
Switching to clustered logical mode didn't help either, with or without interrupt remapping.


Then I decided to give ACPICA a try. It more or less works, printing things like "Transition to ACPI mode successful" and can shutdown the computer.

But it made the Fn key combos ineffective (can't change brightness anymore), maybe the SMI isn't happening anymore, or maybe bios no longer does real work even if it gets SMI.


Maybe there is just some incompatibility between x2apic, the display driver in FW (the key combo to switch to an external monitor is also effective without ACPICA, and can also result in a core 0 freeze), and how I'm setting something up :? :?


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Wed Jul 27, 2022 8:59 am 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1388
xeyes wrote:
But it made the Fn key combos ineffective (can't change brightness anymore), maybe the SMI isn't happening anymore, or maybe bios no longer does real work even if it gets SMI.
Well, obviously. You switched to ACPI mode, so you told the firmware that you now want to handle the button presses. This likely made the firmware change those interrupts from SMI to NMI or normal event. You now have to check ACPI for the event block that tells you how to tell if one of these buttons was pressed, and check it for the backlight device to tell how to set the brightness. And you need to connect the two things yourself (i.e. handle the "brightness up" button press event in a way that leads to an increased backlight brightness). In Linux, this goes all the way to userspace. The ACPI event generates a message to a certain netlink group, which something like "acpid" will catch and handle in a user-defined way, typically with a shell script that reads out the brightness setting and increases it by some amount.

_________________
Carpe diem!


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Wed Jul 27, 2022 10:50 am 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 4344
xeyes wrote:
2. The threshold for the high priority vector is 0x76 but it doesn't make sense.

Vector 0x75 or below causes core 0 to lose interrupts, vector 0x76 or above causes all cores to freeze when the keys are pressed. My kernel doesn't use anything near these numbers, so I can't think of any particular reason for these 2 numbers to be special.

Those are the default vectors for ISA IRQ13 and IRQ14. Did you relocate the legacy PICs to different vectors, or just mask them?

ACPI has a _PIC method you might need to use.
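The arithmetic behind those two numbers, as a small sketch: the conventional PC BIOS setup puts IRQ0-7 of the master PIC at vectors 0x08-0x0F and IRQ8-15 of the slave PIC at 0x70-0x77. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Default (BIOS) vector for a legacy ISA IRQ before any PIC remapping. */
static uint8_t legacy_pic_default_vector(uint8_t irq)
{
    assert(irq < 16);
    return irq < 8 ? (uint8_t)(0x08 + irq) : (uint8_t)(0x70 + (irq - 8));
}

/* Local APIC priority class of a vector: its upper four bits. */
static uint8_t apic_priority_class(uint8_t vector)
{
    return vector >> 4;
}
```

IRQ13 and IRQ14 land on 0x75 and 0x76. Both vectors fall in the same APIC priority class (7), so the local APIC's priority logic by itself would not treat them differently.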


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Thu Jul 28, 2022 9:36 pm 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
Octocontrabass wrote:
xeyes wrote:
2. The threshold for the high priority vector is 0x76 but it doesn't make sense.

Vector 0x75 or below causes core 0 to lose interrupts, vector 0x76 or above causes all cores to freeze when the keys are pressed. My kernel doesn't use anything near these numbers, so I can't think of any particular reason for these 2 numbers to be special.

Those are the default vectors for ISA IRQ13 and IRQ14. Did you relocate the legacy PICs to different vectors, or just mask them?

ACPI has a _PIC method you might need to use.


They are remapped away. The 2 vectors themselves are not special; others below or above them cause the same issue as well.

What seems special is vector 75.8, or the gap between the two, which causes the machine to behave very differently once crossed :(


_PIC didn't seem to change how the interrupts work either: interrupts work with or without it, and the interrupt that is lost is still lost. I gather that it only sets 1 flag in the AML space? Looking at what ACPICA is doing, it didn't seem to write any port/register/address or talk to the EC, AFAIK.


Also noticed that after the detour to ACPI and _BCM, the issue itself is still there, when I call _BCM, I can also cause x2apic to lose interrupt, just like how BIOS did it. _BCM seems to only issue 2 IO port writes (always to the same port, using the same value, regardless of level setting).

So I'm now confused about not only how _BCM caused the interrupt to be lost but also how it works. Maybe the writes are doorbells that wake up FW to look at some temporary values stored in AML space, before the FW goes on to talk to the backlight/GPU the same way as in a SMI?


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Thu Jul 28, 2022 11:00 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 4344
xeyes wrote:
_PIC didn't seem to change how the interrupts work either: interrupts work with or without it, and the interrupt that is lost is still lost. I gather that it only sets 1 flag in the AML space?

In theory, the firmware running in SMM could read that flag. Looks like it isn't doing that here, or at least not in a way that would fix the problem.

xeyes wrote:
_BCM seems to only issue 2 IO port writes (always to the same port, using the same value, regardless of level setting).

Is it an Intel chipset? Is it port 0xB2? It sounds like writing that port triggers SMI.
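One way to check this on a given machine is to log the port writes an AML method performs and look for port 0xB2. The sketch below assumes such a log is available, for example from the OS layer that carries out port I/O on behalf of the AML interpreter; the structure and function names are hypothetical. On Intel chipsets 0xB2 is the APM_CNT register, and whether a write actually raises an SMI depends on the chipset's APMC SMI enable bit.

```c
#include <stddef.h>
#include <stdint.h>

/* APM_CNT on Intel chipsets; a write can raise an SMI if enabled. */
#define APM_CNT_PORT 0xB2u

/* One logged port write performed while evaluating an AML method. */
struct io_write {
    uint16_t port;
    uint8_t  value;
};

/* Count the logged writes that target the APM_CNT port. */
static size_t count_apm_cnt_writes(const struct io_write *log, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (log[i].port == APM_CNT_PORT)
            hits++;
    return hits;
}
```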


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Sun Jul 31, 2022 2:26 am 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
Octocontrabass wrote:
xeyes wrote:
_PIC didn't seem to change how the interrupts work either: interrupts work with or without it, and the interrupt that is lost is still lost. I gather that it only sets 1 flag in the AML space?

In theory, the firmware running in SMM could read that flag. Looks like it isn't doing that here, or at least not in a way that would fix the problem.

xeyes wrote:
_BCM seems to only issue 2 IO port writes (always to the same port, using the same value, regardless of level setting).

Is it an Intel chipset? Is it port 0xB2? It sounds like writing that port triggers SMI.


Wow that's a very accurate guess! It's a 7 series (ivy bridge) chipset and writes F5 to B2 during _BCM. Does this point to anything though?

I added experimental support for long mode (proudly supporting 4GB linear and 4GB physical address space) as it seems odd that 32bit Linux doesn't enable x2apic on this machine. But again it didn't seem to change anything: both the backlight-adjusting SMI and _BCM still have a high chance of sending core 0 into the "ISR bit set but interrupt handler didn't run" state.

:( Running out of ideas here, maybe I should just use another core as a watchdog to nudge core 0 as needed.
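A rough sketch of what such a watchdog could look like, assuming core 0 bumps a heartbeat counter in its timer handler and another core samples it periodically. The structure, names, and threshold are all hypothetical, and the nudge itself (for example the NMI plus EOI described earlier) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* State the watchdog core keeps about the core it is watching. */
struct watchdog {
    uint64_t last_seen;       /* heartbeat value at the previous check */
    unsigned stalled_checks;  /* consecutive checks without progress */
};

/*
 * Call periodically from the watchdog core with the watched core's
 * current heartbeat. Returns true when the heartbeat has not moved for
 * `threshold` consecutive checks and the core should be nudged.
 */
static bool watchdog_check(struct watchdog *wd, uint64_t heartbeat,
                           unsigned threshold)
{
    if (heartbeat != wd->last_seen) {
        wd->last_seen = heartbeat;
        wd->stalled_checks = 0;
        return false;
    }
    if (++wd->stalled_checks < threshold)
        return false;
    wd->stalled_checks = 0;   /* start counting again after a nudge */
    return true;
}
```

This works around the symptom only; it does not explain why the interrupt is lost.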


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Sun Jul 31, 2022 1:01 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 4344
xeyes wrote:
Wow that's a very accurate guess! It's a 7 series (ivy bridge) chipset and writes F5 to B2 during _BCM. Does this point to anything though?

It confirms that SMI is responsible for the lost interrupt.

xeyes wrote:
:( Running out of ideas here, maybe I should just use another core as a watchdog to nudge core 0 as needed.

There must be something you're doing that's different from what Linux does; otherwise Linux would have the same issue. Which timer does Linux use? How is it configured?


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Mon Aug 01, 2022 8:05 pm 

Joined: Mon Dec 07, 2020 8:09 am
Posts: 212
Octocontrabass wrote:
xeyes wrote:
Wow that's a very accurate guess! It's a 7 series (ivy bridge) chipset and writes F5 to B2 during _BCM. Does this point to anything though?

It confirms that SMI is responsible for the lost interrupt.

xeyes wrote:
:( Running out of ideas here, maybe I should just use another core as a watchdog to nudge core 0 as needed.

There must be something you're doing that's different from what Linux does; otherwise Linux would have the same issue. Which timer does Linux use? How is it configured?


:lol: I'm sure that there must be many things that are set up differently. Don't know a good way to tell on real hardware, but I've seen Linux using the tsc deadline mode of the apic timer in virtual machines.

In this case though, the issue is not specific to a timer or timers but to interrupts in general. I tried HPET previously and its interrupts can also get lost. I even coerced HDA into sending periodic interrupts to trigger the timer interrupt handler, and again its interrupts face the same issue. So there is nothing special about timer interrupts; they just happen frequently enough and were the easiest to be affected/noticed.


 
 Post subject: Re: x2apic losing interrupt after setting ISR around SMI
PostPosted: Tue Aug 02, 2022 12:50 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 4344
If the problem is indeed the APIC configuration, you can use something like msr-tools in Linux to compare the x2APIC registers against your OS.

I can't imagine what else it could be if it's not the APIC.
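A sketch of such a comparison, assuming the register values have already been collected on both sides (on Linux, for example, by running `rdmsr -p <cpu> <msr>` from msr-tools for each address). The MSR addresses are the x2APIC ones from the Intel SDM; the selection of registers, the table, and the function name are just one possible choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct apic_reg {
    const char *name;
    uint32_t    msr;
};

/* A selection of x2APIC configuration registers worth comparing. */
static const struct apic_reg x2apic_regs[] = {
    { "TPR",          0x808 }, { "LDR",       0x80D }, { "SVR",       0x80F },
    { "LVT_TIMER",    0x832 }, { "LVT_LINT0", 0x835 }, { "LVT_LINT1", 0x836 },
    { "LVT_ERROR",    0x837 }, { "TIMER_DIVIDE", 0x83E },
};
#define X2APIC_REG_COUNT (sizeof x2apic_regs / sizeof x2apic_regs[0])

/*
 * Print every register whose value differs between two snapshots, each
 * ordered like x2apic_regs, and return how many differed.
 */
static size_t report_differences(const uint64_t *mine,
                                 const uint64_t *reference)
{
    size_t differing = 0;
    for (size_t i = 0; i < X2APIC_REG_COUNT; i++) {
        if (mine[i] != reference[i]) {
            printf("%-12s (MSR 0x%03X): mine=0x%llx reference=0x%llx\n",
                   x2apic_regs[i].name, (unsigned)x2apic_regs[i].msr,
                   (unsigned long long)mine[i],
                   (unsigned long long)reference[i]);
            differing++;
        }
    }
    return differing;
}
```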

