OSDev.org

The Place to Start for Operating System Developers
All times are UTC - 6 hours




 Post subject: Simultaneous IPIs to same target
PostPosted: Fri Jul 15, 2011 2:45 pm 

Joined: Fri Oct 03, 2008 4:13 am
Posts: 142
Location: Ogre, Latvia, EU
Hello, OSDevers!

I've run into a question: what should happen when (in my case) 3 CPUs/cores send an IPI to the same LAPIC and interrupt vector simultaneously (or almost simultaneously)? To my understanding, the target CPU should receive the interrupt (and call the corresponding interrupt handler) 3 times in a row. My kernel, however, disagrees, and I receive only one interrupt. I tried it on QEMU and on a real quad-core computer; the results are the same. When I add a slight delay (around 1 ms, different for each core), the problem disappears.

Basically it's the AP wake-up process: the BSP sets up the trampoline environment, sends the usual INIT-SIPI-SIPI (in my case INIT1-INIT2-INIT3-delay-SIPI1-SIPI2-SIPI3, no second SIPI), and then the APs rush to Long Mode. Once they are there and have initialized enough, they send an IPI back to the BSP, saying "I'm alive". The BSP waits for those IPIs and, when all 3 of them have been received, proceeds to clean up the trampoline and move on to the scheduler.

AFAIK, the other cores wake up and are set up correctly: they are able to print their LAPIC IDs, have unique stacks, and can use their own LAPIC timers (to add the delay I mentioned before). It looks like a race condition, so I tried to protect the whole IPI sending routine with spinlocks. No effect.

Of course I could come up with a different way to detect when my APs are up: spin on a (spinlock-protected) variable, use a delay on the BSP, or keep those debugging delays on the APs. But my current approach raises a few questions: is my assumption of 3 interrupts in a row wrong, and are those IPIs somehow aggregated into one? Or is there a bug somewhere in my code?

I have read Intel's manuals about the APIC a couple of times and still have no clue. Please advise.

_________________
If something looks overcomplicated, most likely it is.


 Post subject: Re: Simultaneous IPIs to same target
PostPosted: Fri Jul 15, 2011 5:42 pm 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

Velko wrote:
I've run into a question: what should happen when (in my case) 3 CPUs/cores send an IPI to the same LAPIC and interrupt vector simultaneously (or almost simultaneously)? To my understanding, the target CPU should receive the interrupt (and call the corresponding interrupt handler) 3 times in a row.


You could get anywhere from 1 to 3 interrupts, depending on exact timing.

For one IPI, think of it as 4 steps:
  • Local APIC receives the IPI and sets the corresponding flag in its "Interrupt Request Register" (IRR), then
  • Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
  • Local APIC handles the interrupt, by:
    • Sending the interrupt to the CPU core
    • Clearing the flag in its "Interrupt Request Register"
    • Setting the flag in its "In Service Register" (ISR)
  • CPU sends EOI to local APIC, which clears the flag in its "In Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)

Now consider what happens if more interrupts (for the same vector) are received before the flag in the "Interrupt Request Register" (IRR) is cleared:
  • Local APIC receives the first IPI and sets the corresponding flag in its "Interrupt Request Register", then
  • Local APIC receives the second IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
  • Local APIC receives the third IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
  • Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
  • Local APIC handles the interrupt, by:
    • Sending the interrupt to the CPU core
    • Clearing the flag in its "Interrupt Request Register"
    • Setting the flag in its "In Service Register"
  • CPU sends EOI to local APIC, which clears the flag in its "In Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)

In this case you only get one interrupt.

Now consider what happens if more interrupts (for the same vector) are received before the flag in the "In Service Register" (ISR) is cleared:
  • Local APIC receives the first IPI and sets the corresponding flag in its "Interrupt Request Register" (IRR), then
  • Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
  • Local APIC handles the interrupt, by:
    • Sending the interrupt to the CPU core
    • Clearing the flag in its "Interrupt Request Register"
    • Setting the flag in its "In Service Register"
  • Local APIC receives the second IPI and sets the corresponding flag in its "Interrupt Request Register", then
  • Local APIC receives the third IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
  • CPU sends EOI to local APIC, which clears the flag in its "In Service Register"
  • Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
  • Local APIC handles the interrupt, by:
    • Sending the interrupt to the CPU core
    • Clearing the flag in its "Interrupt Request Register"
    • Setting the flag in its "In Service Register"
  • CPU sends EOI to local APIC, which clears the flag in its "In Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)

In this case you get 2 interrupts.
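
The cases above can be replayed with a toy model that keeps one pending bit and one in-service bit for a single vector, as described. The struct and function names below are made up for illustration, and the model ignores priorities and other vectors, so treat it as a sketch of the bookkeeping rather than real APIC code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one vector in a local APIC (hypothetical names). */
struct vector_state {
    bool irr;            /* pending: an IPI for this vector has been received */
    bool isr;            /* in service: the CPU has not sent EOI yet */
    unsigned delivered;  /* how many times the handler was entered */
};

/* An IPI arrives. If the pending bit is already set, the request is merged. */
void lapic_receive_ipi(struct vector_state *v) {
    assert(v != NULL);
    v->irr = true;
}

/* Hand a pending interrupt to the core, unless one is still in service. */
void lapic_dispatch(struct vector_state *v) {
    assert(v != NULL);
    if (v->irr && !v->isr) {
        v->irr = false;
        v->isr = true;
        v->delivered++;
    }
}

/* The handler sends EOI; the APIC then checks for pending work again. */
void lapic_eoi(struct vector_state *v) {
    assert(v != NULL);
    v->isr = false;
    lapic_dispatch(v);
}

/* All three IPIs arrive before the first one is dispatched. */
unsigned all_before_dispatch(void) {
    struct vector_state v = {0};
    lapic_receive_ipi(&v);
    lapic_receive_ipi(&v);
    lapic_receive_ipi(&v);
    lapic_dispatch(&v);
    lapic_eoi(&v);
    return v.delivered;
}

/* The second and third IPIs arrive while the first is in service. */
unsigned two_during_service(void) {
    struct vector_state v = {0};
    lapic_receive_ipi(&v);
    lapic_dispatch(&v);
    lapic_receive_ipi(&v);
    lapic_receive_ipi(&v);
    lapic_eoi(&v);   /* dispatches the merged second/third request */
    lapic_eoi(&v);
    return v.delivered;
}

/* Each IPI arrives only after the previous one got its EOI. */
unsigned spaced_out(void) {
    struct vector_state v = {0};
    for (int i = 0; i < 3; i++) {
        lapic_receive_ipi(&v);
        lapic_dispatch(&v);
        lapic_eoi(&v);
    }
    return v.delivered;
}
```

The three scenario functions return 1, 2 and 3 respectively, which matches the "anywhere from 1 to 3" answer and explains why adding a per-core delay made the problem disappear.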


Velko wrote:
Basically it's the AP wake-up process: the BSP sets up the trampoline environment, sends the usual INIT-SIPI-SIPI (in my case INIT1-INIT2-INIT3-delay-SIPI1-SIPI2-SIPI3, no second SIPI), and then the APs rush to Long Mode.


The extra INIT IPIs and the extra SIPI IPIs are a waste of time and probably do more harm than good. Also, on a lot of CPUs the second SIPI isn't needed, and they begin executing instructions after the first SIPI. This can lead to problems. For example, if the AP increments a "number of CPUs started" counter, it could increment this counter after receiving the first SIPI, increment it again after receiving the second SIPI (and again after the third SIPI), and you end up thinking there are more CPUs than there really are.

Velko wrote:
Once they are there and have initialized enough, they send an IPI back to the BSP, saying "I'm alive". The BSP waits for those IPIs and, when all 3 of them have been received, proceeds to clean up the trampoline and move on to the scheduler.


You're saying that the timing is so exact that the BSP only receives one IPI. The only way that is possible is if you're broadcasting the "INIT-SIPI-SIPI" sequence to all CPUs at the same time. DO NOT broadcast the "INIT-SIPI-SIPI" sequence to all CPUs at the same time. It starts all CPUs, including CPUs that the user disabled (typical for CPUs with hyper-threading where the user disabled hyper-threading in the BIOS) and faulty CPUs that failed testing, and is therefore wrong and dodgy (unless you're writing firmware and not an OS). You must only attempt to start CPUs that the firmware listed in the ACPI "APIC" table (the MADT) or the "MultiProcessor Specification" table (and not any others that might be present but aren't listed); and the only way to do that is to send the "INIT-SIPI-SIPI" sequence to each CPU separately.
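
As a rough illustration of "only the CPUs the firmware listed", here is a sketch that walks an in-memory copy of the ACPI "APIC" table and collects the APIC IDs of processors marked enabled. It assumes the table has already been located and validated, and it only looks at the classic 8-byte "Processor Local APIC" entries (type 0), not x2APIC entries. The function name and constants are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MADT_HEADER_SIZE     44u  /* 36-byte ACPI header + LAPIC address + flags */
#define MADT_TYPE_LOCAL_APIC 0u   /* "Processor Local APIC" entry */
#define LAPIC_FLAG_ENABLED   1u   /* bit 0 of the entry's flags field */

/* Collect the APIC IDs of enabled processors from a MADT image.
   Returns the number of IDs written to ids[] (at most max_ids). */
size_t madt_enabled_apic_ids(const uint8_t *madt, size_t length,
                             uint8_t *ids, size_t max_ids) {
    assert(madt != NULL && ids != NULL);
    size_t count = 0;
    size_t off = MADT_HEADER_SIZE;

    while (off + 2 <= length) {
        uint8_t type = madt[off];
        uint8_t len  = madt[off + 1];
        if (len < 2 || off + len > length)
            break;                      /* malformed entry: stop walking */

        if (type == MADT_TYPE_LOCAL_APIC && len >= 8) {
            uint8_t apic_id   = madt[off + 3];
            uint8_t flags_low = madt[off + 4];  /* low byte of the 32-bit flags */
            if ((flags_low & LAPIC_FLAG_ENABLED) && count < max_ids)
                ids[count++] = apic_id;
        }
        off += len;                     /* every entry carries its own length */
    }
    return count;
}
```

The returned list includes the BSP's own APIC ID, so the caller would skip that entry before sending any INIT or SIPI.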

The correct way to do it is something like:
Code:
    for(each AP mentioned by BIOS) {
        AP_status = NOT_STARTED;
        send_INIT_to_AP();
        wait(10ms);

        send_SIPI_to_AP();                  /* first SIPI */
        timeout_remaining = 5ms;            /* counts down by itself (e.g. a timer) */
        while( timeout_remaining > 0) {
            if(AP_status == STARTED) goto started;
        }

        send_SIPI_to_AP();                  /* second SIPI, only if the first got no answer */
        timeout_remaining = 10ms;
        while( timeout_remaining > 0) {
            if(AP_status == STARTED) goto started;
        }

        printf("AP CPU failed to start\n");
        continue;                           /* give up on this AP, try the next one */

started:
        AP_status = ACKNOWLEDGED;           /* release the AP from its wait loop */
    }


The AP CPUs would do something like:
Code:
AP_init () {
    AP_status = STARTED;                    /* tell the BSP this CPU is running */
    while(AP_status != ACKNOWLEDGED) { /* Do nothing */ }

    /* Start CPU initialisation here */
}


In this pseudo-code, "AP_status" would be a volatile variable that must be used atomically, and the "timeout_remaining" thing would be something that is decreased over time (e.g. maybe the local APIC timer's "current count" register or something).
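
One way to get the "used atomically" part in plain C is C11 atomics instead of a bare volatile variable. The sketch below is only an illustration of the handshake: the function names are invented, and the poll counter is a stand-in for a real timeout source such as the local APIC timer mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum ap_state { NOT_STARTED, STARTED, ACKNOWLEDGED };

/* Shared between the BSP and the one AP currently being started. */
_Atomic int AP_status = NOT_STARTED;

/* AP side: announce that this CPU is executing the trampoline code. */
void ap_announce_started(void) {
    atomic_store(&AP_status, STARTED);
}

/* AP side: the AP spins until this returns true. */
bool ap_is_acknowledged(void) {
    return atomic_load(&AP_status) == ACKNOWLEDGED;
}

/* BSP side: poll up to max_polls times (hypothetical stand-in for a timer).
   If the AP reported STARTED, switch to ACKNOWLEDGED and return true. */
bool bsp_wait_and_acknowledge(unsigned max_polls) {
    assert(max_polls > 0);
    while (max_polls-- > 0) {
        int expected = STARTED;
        if (atomic_compare_exchange_strong(&AP_status, &expected, ACKNOWLEDGED))
            return true;
    }
    return false;   /* timed out: the caller may send the second SIPI */
}
```

The compare-and-exchange makes the STARTED to ACKNOWLEDGED transition a single step, so a late or repeated start report cannot be mistaken for a fresh one.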

Finally; if each CPU takes 11 ms to start and you've got 127 CPUs to start, then it'd take 1397 ms to start all of them. That's a significant increase in boot times. This can be improved a lot by doing it in parallel. For example, the first CPU could start the second CPU; then the first and second CPUs could start the third and fourth CPUs; then all 4 started CPUs could start 4 more CPUs; then 8 CPUs start 8 more CPUs, etc. For parallel startup, if each CPU takes 11 ms to start and you've got 127 CPUs to start, then it'd take 77 ms to start all of them. Of course once you start looking at lots of CPUs you'd also need to consider supporting x2APIC.
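
The two figures can be checked with a few lines of C. This is just the arithmetic, under the assumption that every already-running CPU starts exactly one new CPU per 11 ms round; the function names are made up for the example.

```c
#include <assert.h>

/* Starting the APs one after another. */
unsigned sequential_ms(unsigned n_aps, unsigned per_cpu_ms) {
    return n_aps * per_cpu_ms;
}

/* Every running CPU (initially just the BSP) starts one more CPU per round. */
unsigned parallel_ms(unsigned n_aps, unsigned per_cpu_ms) {
    unsigned running = 1;
    unsigned remaining = n_aps;
    unsigned rounds = 0;
    while (remaining > 0) {
        unsigned started = running < remaining ? running : remaining;
        remaining -= started;
        running += started;
        rounds++;
    }
    return rounds * per_cpu_ms;
}

/* The numbers from the post: 127 APs at 11 ms each. */
void check_forum_numbers(void) {
    assert(sequential_ms(127, 11) == 1397);
    assert(parallel_ms(127, 11) == 77);   /* 1, 2, 4, ... 64 started: 7 rounds */
}
```

With 127 APs the running set doubles 1, 2, 4, 8, 16, 32, 64, 128, which is 7 rounds of 11 ms.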


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject: Re: Simultaneous IPIs to same target
PostPosted: Sat Jul 16, 2011 6:19 am 

Joined: Fri Oct 03, 2008 4:13 am
Posts: 142
Location: Ogre, Latvia, EU
Thanks for the detailed explanation, Brendan!

Quote:
# Local APIC receives the first IPI and sets the corresponding flag in its "Interrupt Request Register", then
# Local APIC receives the second IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
# Local APIC receives the third IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then

Well, that settles it. I thought that the Local APIC would not accept the second and third IPIs until it was done with the first one. I understand now.

Quote:
The extra INIT IPIs and the extra SIPI IPIs are a waste of time and probably do more harm than good.
...
DO NOT broadcast the "INIT-SIPI-SIPI" sequence to all CPUs at the same time

I guess I did not make myself clear about what the "INIT1-INIT2-INIT3-delay-SIPI1-SIPI2-SIPI3, no second SIPI" sequence means. Turns out it is not that "usual" after all :D I am not broadcasting INIT-SIPI-SIPI. I am, however, firing them at each AP in rapid succession.

Pseudocode:
Code:
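/* one targeted INIT per AP listed by the firmware, no broadcast */
foreach(AP in BIOS) {
        send_INIT($AP);
}
wait(10ms);     /* a single shared delay instead of one per AP */
/* one targeted SIPI per AP */
foreach(AP in BIOS) {
        send_SIPI($AP);
}
wait_For_Woke_IPIs_Or_TimeOut();
/* no second SIPI */
cleanupTrampoline();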

The APs then wake up (almost) simultaneously, run into some spinlocks on their way (which probably synchronizes them even more), and finally send the Woke_IPI back to the BSP.

That was my idea for improving startup times: why wait, if you can fire some more IPIs in the meantime :). It seems to work fine, except for that Woke_IPI thing. But now that I know what causes it, it's not that hard to work around. Also, I should probably implement an array of AP_status or something to see whether a second SIPI is needed.
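
A possible shape for that array, as a sketch: each AP sets its own flag before sending the Woke_IPI, and the BSP counts flags rather than interrupts, so merged IPIs no longer matter. The names, the fixed MAX_APS limit and the 0-based AP index are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_APS 8   /* arbitrary limit for this sketch */

enum { AP_NOT_STARTED = 0, AP_STARTED = 1 };

/* One flag per AP, indexed by a hypothetical AP number 0..MAX_APS-1. */
atomic_int ap_status[MAX_APS];

/* AP side: set the flag first, then send the Woke_IPI. */
void ap_mark_started(size_t ap) {
    assert(ap < MAX_APS);
    atomic_store(&ap_status[ap], AP_STARTED);
}

/* BSP side: called from the Woke_IPI handler or after a timeout. */
size_t count_started(size_t n_aps) {
    assert(n_aps <= MAX_APS);
    size_t started = 0;
    for (size_t i = 0; i < n_aps; i++)
        if (atomic_load(&ap_status[i]) == AP_STARTED)
            started++;
    return started;
}

/* BSP side: list the APs that did not answer and may need a second SIPI.
   Writes their indices to out[] and returns how many there are. */
size_t collect_unstarted(size_t n_aps, size_t *out) {
    assert(n_aps <= MAX_APS && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < n_aps; i++)
        if (atomic_load(&ap_status[i]) != AP_STARTED)
            out[n++] = i;
    return n;
}
```

With this layout a single Woke_IPI is enough to make the BSP re-scan the array, however many APs sent one at the same moment.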

But if you think my AP waking sequence is not such a good idea, I'll revert to starting them one by one.

Thanks again,
Velko

_________________
If something looks overcomplicated, most likely it is.


 Post subject: Re: Simultaneous IPIs to same target
PostPosted: Mon Jul 18, 2011 11:18 am 

Joined: Mon Jul 28, 2008 9:46 am
Posts: 310
Location: Ontario, Canada
Velko wrote:
Pseudocode:
Code:
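/* one targeted INIT per AP listed by the firmware, no broadcast */
foreach(AP in BIOS) {
        send_INIT($AP);
}
wait(10ms);     /* a single shared delay instead of one per AP */
/* one targeted SIPI per AP */
foreach(AP in BIOS) {
        send_SIPI($AP);
}
wait_For_Woke_IPIs_Or_TimeOut();
/* no second SIPI */
cleanupTrampoline();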



That works quite well and cuts the bootup time by a good amount (especially for a system with 16 cores). Pure64 would send out the INIT IPI to the first core, wait 10 ms, send the SIPI, wait 2 ms, and then repeat for each of the other APs. That time adds up with multiple cores, and you could see the pause during bootup. I have adopted the method that you detailed above with no issues. Thanks for posting this.

-Ian

_________________
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly

