Hi,
Velko wrote:
I've run into a question: What should happen when (in my case) 3 CPUs/cores send an IPI to the same LAPIC and interrupt vector simultaneously (or almost simultaneously)? To my understanding, the target CPU should receive the interrupt (and call the corresponding interrupt handler) 3 times in a row.
You could get anywhere from one to 3 interrupts, depending on exact timing.
For one IPI, think of it as 4 steps:
- Local APIC receives the IPI and sets the corresponding flag in its "Interrupt Request Register", then
- Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
- Local APIC handles the interrupt, by:
    - Sending the interrupt to the CPU core
    - Clearing the flag in its "Interrupt Request Register"
    - Setting the flag in its "In-Service Register"
- CPU sends EOI to local APIC, which clears the flag in its "In-Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)
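To make the 4 steps concrete, here's a rough C model of the local APIC's 256-bit IRR and ISR bitmaps. It's a simplified sketch, not how real hardware works internally: the names (lapic_model, receive, dispatch, eoi) are made up for illustration, it ignores the Task Priority Register, and it assumes the usual rule that a pending vector is only delivered if its priority class (vector >> 4) is higher than the class of the highest in-service vector.

```c
#include <stdint.h>

/* Simplified model of one local APIC: one bit per vector (0-255) in the
   IRR and ISR, stored as 8 x 32-bit words like the real registers. */
typedef struct {
    uint32_t irr[8];
    uint32_t isr[8];
} lapic_model;

static void set_flag(uint32_t *reg, int v)   { reg[v / 32] |= 1u << (v % 32); }
static void clear_flag(uint32_t *reg, int v) { reg[v / 32] &= ~(1u << (v % 32)); }

/* Highest set vector in a 256-bit register, or -1 if none is set. */
static int highest_flag(const uint32_t *reg) {
    for (int v = 255; v >= 0; v--)
        if (reg[v / 32] & (1u << (v % 32)))
            return v;
    return -1;
}

/* Step 1: an IPI arrives. Setting a flag that is already set changes nothing. */
static void receive(lapic_model *a, int v) { set_flag(a->irr, v); }

/* Steps 2-3: deliver the highest pending vector, but only if its priority
   class (v >> 4) is above the class of the highest in-service vector
   (TPR is ignored). Returns the delivered vector, or -1 if nothing is sent. */
static int dispatch(lapic_model *a) {
    int req = highest_flag(a->irr);
    int svc = highest_flag(a->isr);
    if (req < 0 || (svc >= 0 && (req >> 4) <= (svc >> 4)))
        return -1;
    clear_flag(a->irr, req);
    set_flag(a->isr, req);
    return req;
}

/* Step 4: EOI clears the highest in-service flag. */
static void eoi(lapic_model *a) {
    int svc = highest_flag(a->isr);
    if (svc >= 0)
        clear_flag(a->isr, svc);
}
```

Receiving vector 0x40 twice and then calling dispatch() delivers it once; a further 0x40 that arrives while it's in service stays in the IRR until eoi() is called.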
Now consider what happens if more interrupts (for the same vector) are received before the flag in the "Interrupt Request Register" is cleared:
- Local APIC receives the first IPI and sets the corresponding flag in its "Interrupt Request Register", then
- Local APIC receives the second IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
- Local APIC receives the third IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
- Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
- Local APIC handles the interrupt, by:
    - Sending the interrupt to the CPU core
    - Clearing the flag in its "Interrupt Request Register"
    - Setting the flag in its "In-Service Register"
- CPU sends EOI to local APIC, which clears the flag in its "In-Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)
In this case you only get one interrupt.
Now consider what happens if more interrupts (for the same vector) are received before the flag in the "In-Service Register" is cleared. While a vector is in service, the local APIC won't deliver another interrupt of the same or lower priority class, so a new request for that vector has to wait in the "Interrupt Request Register" until the EOI:
- Local APIC receives the first IPI and sets the corresponding flag in its "Interrupt Request Register", then
- Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
- Local APIC handles the interrupt, by:
    - Sending the interrupt to the CPU core
    - Clearing the flag in its "Interrupt Request Register"
    - Setting the flag in its "In-Service Register"
- Local APIC receives the second IPI and sets the corresponding flag in its "Interrupt Request Register", then
- Local APIC receives the third IPI and sets the corresponding flag in its "Interrupt Request Register" (but it's already set), then
- CPU sends EOI to local APIC, which clears the flag in its "In-Service Register", then
- Local APIC searches for the highest priority set flag in its "Interrupt Request Register", then
- Local APIC handles the interrupt, by:
    - Sending the interrupt to the CPU core
    - Clearing the flag in its "Interrupt Request Register"
    - Setting the flag in its "In-Service Register"
- CPU sends EOI to local APIC, which clears the flag in its "In-Service Register" (and causes it to check for the highest priority set flag in its "Interrupt Request Register" again)
In this case you get 2 interrupts.
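The different timings can be replayed with an even smaller model that tracks just one vector's "Interrupt Request Register" and "In-Service Register" flags, which shows why you can get anywhere from 1 to 3 interrupts. This is a hypothetical sketch: count_interrupts and its event letters ('I' = IPI arrives, 'D' = local APIC tries to dispatch, 'E' = handler sends EOI) are invented for the example.

```c
#include <stdbool.h>

/* Replays a sequence of events for ONE vector and counts how many times the
   interrupt reaches the CPU core.
   'I' = an IPI arrives, 'D' = the local APIC tries to dispatch,
   'E' = the handler sends EOI. */
static int count_interrupts(const char *events) {
    bool irr = false, isr = false;   /* the vector's two flags */
    int delivered = 0;
    for (const char *p = events; *p != '\0'; p++) {
        switch (*p) {
        case 'I':                    /* setting an already-set flag is a no-op */
            irr = true;
            break;
        case 'D':                    /* can't deliver while the same vector is in service */
            if (irr && !isr) {
                irr = false;
                isr = true;
                delivered++;
            }
            break;
        case 'E':
            isr = false;
            break;
        default:                     /* ignore anything else */
            break;
        }
    }
    return delivered;
}
```

With this model, "IIIDE" (all 3 IPIs before dispatch) gives 1, "IDIIEDE" (2 IPIs while the first is in service) gives 2, and "IDEIDEIDE" (each IPI after the previous EOI) gives 3.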
Velko wrote:
Basically it's an AP waking process: the BSP sets up the trampoline environment, sends the usual INIT-SIPI-SIPI sequence (in my case INIT1-INIT2-INIT3-delay-SIPI1-SIPI2-SIPI3, with no second SIPI), and then the APs rush to Long Mode.
The extra INIT and SIPI IPIs are a waste of time and probably do more harm than good. Also, for a lot of CPUs the second SIPI isn't needed and they begin executing instructions after the first SIPI. This can lead to problems. For example, if the AP increments a "number of CPUs started" counter, then it could increment this counter after receiving the first SIPI, increment it again after receiving the second SIPI (and again after the third SIPI), and you end up thinking there are more CPUs than there really are.
Velko wrote:
Once they are there and have initialized things enough, they send an IPI back to the BSP, saying "I'm alive". The BSP waits for those IPIs and, when all 3 of them have been received, proceeds to clean up the trampoline and move on to the scheduler.
You're saying that the timing is so exact that the BSP only receives one IPI. The only way that is possible is if you're broadcasting the "INIT-SIPI-SIPI" sequence to all CPUs at the same time.
DO NOT broadcast the "INIT-SIPI-SIPI" sequence to all CPUs at the same time. It causes all CPUs to be started, including CPUs that the user disabled (typical for CPUs with hyper-threading where the user disabled hyper-threading in the BIOS) and faulty CPUs that failed testing, and is therefore wrong and dodgy (unless you're writing firmware and not an OS). You must only attempt to start CPUs that the firmware listed in the ACPI "APIC" table or the "MultiProcessor Specification" table (and not any others that might be present but aren't listed), and the only way to do that is to send the "INIT-SIPI-SIPI" sequence to each CPU separately.
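For reference, here's a rough sketch of walking the ACPI "APIC" table (MADT) to get the list of CPUs the firmware says are usable. It assumes the table has already been found and mapped, only handles Processor Local APIC entries (type 0; x2APIC entries are type 9 and aren't handled here), and assumes a little-endian host (fine on x86). The function name madt_enabled_apic_ids is made up. You'd also skip the BSP's own APIC ID when sending INIT-SIPI-SIPI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walks the entries of an ACPI MADT ("APIC" table) and copies the APIC ID of
   every Processor Local APIC entry (type 0) whose "Enabled" flag (bit 0) is
   set. table_len should be the Length field from the table's header.
   Returns the number of IDs written to ids[]. */
static size_t madt_enabled_apic_ids(const uint8_t *madt, size_t table_len,
                                    uint8_t *ids, size_t max_ids) {
    size_t count = 0;
    size_t off = 44;  /* 36-byte SDT header + local APIC address (4) + flags (4) */
    while (off + 2 <= table_len) {
        uint8_t type = madt[off];
        uint8_t len = madt[off + 1];
        if (len < 2 || off + len > table_len)
            break;  /* malformed entry: stop rather than run off the end */
        if (type == 0 && len >= 8) {
            /* Layout: type, length, ACPI processor UID, APIC ID, 32-bit flags */
            uint32_t flags;
            memcpy(&flags, madt + off + 4, sizeof flags);  /* little-endian host assumed */
            if ((flags & 1u) && count < max_ids)
                ids[count++] = madt[off + 3];
        }
        off += len;
    }
    return count;
}
```

Entries with the "Enabled" bit clear (for example, CPUs disabled in the BIOS) are simply skipped, so they never get an INIT or SIPI.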
The correct way to do it is something like:
Code:
for(each AP mentioned by BIOS) {
    AP_status = NOT_STARTED;
    send_INIT_to_AP();
    wait(10ms);
    send_SIPI_to_AP();
    timeout_remaining = 5ms;
    while( timeout_remaining > 0) {    /* timeout_remaining decreases over time */
        if(AP_status == STARTED) goto started;
    }
    /* The first SIPI didn't start the AP, so try a second one */
    send_SIPI_to_AP();
    timeout_remaining = 10ms;
    while( timeout_remaining > 0) {
        if(AP_status == STARTED) goto started;
    }
    printf("AP CPU failed to start\n");
    continue;
started:
    AP_status = ACKNOWLEDGED;
}
The AP CPUs would do something like:
Code:
AP_init () {
    AP_status = STARTED;
    while(AP_status != ACKNOWLEDGED) { /* Do nothing */ }
    /* Start CPU initialisation here */
}
In this pseudo-code, "AP_status" would be a volatile variable that must be used atomically, and the "timeout_remaining" thing would be something that is decreased over time (e.g. maybe the local APIC timer's "current count" register or something).
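Here's a rough, user-space simulation of the BSP side of that handshake using C11 atomics, mainly to show the control flow: the second SIPI is only sent if the first one didn't start the AP. Everything hardware-related is faked: send_INIT_to_AP() and send_SIPI_to_AP() are hypothetical stubs that pretend the AP starts after a configurable number of SIPIs (sim_sipis_needed, also made up), the timeouts are counted in loop iterations instead of real time, and the 10 ms INIT delay is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { NOT_STARTED, STARTED, ACKNOWLEDGED };

/* Shared between the BSP and the AP being started. */
static _Atomic int AP_status = NOT_STARTED;

/* Hypothetical simulation state: the fake AP starts running once it has
   received sim_sipis_needed SIPIs (0 means it never starts). */
static int sim_sipis_needed;
static int sim_sipis_sent;

static void send_INIT_to_AP(void) { sim_sipis_sent = 0; }

static void send_SIPI_to_AP(void) {
    sim_sipis_sent++;
    /* On real hardware the AP's trampoline code would do this store. */
    if (sim_sipis_needed > 0 && sim_sipis_sent == sim_sipis_needed)
        atomic_store(&AP_status, STARTED);
}

/* Polls AP_status; each loop iteration stands in for one timer tick. */
static bool wait_for_started(int ticks) {
    for (; ticks > 0; ticks--)
        if (atomic_load(&AP_status) == STARTED)
            return true;
    return false;
}

/* BSP side of the handshake for one AP. Returns false if the AP never started. */
static bool start_AP(void) {
    atomic_store(&AP_status, NOT_STARTED);
    send_INIT_to_AP();
    /* wait(10ms) goes here on real hardware */
    send_SIPI_to_AP();
    if (!wait_for_started(5)) {       /* "5ms" timeout */
        send_SIPI_to_AP();            /* only sent if the first SIPI did nothing */
        if (!wait_for_started(10))    /* "10ms" timeout */
            return false;
    }
    /* Lets the AP leave its "while(AP_status != ACKNOWLEDGED)" loop. */
    atomic_store(&AP_status, ACKNOWLEDGED);
    return true;
}
```

An AP that starts on the first SIPI only ever gets one SIPI, which avoids the "counter incremented twice" problem described earlier; an AP that never responds gets two SIPIs and is then reported as failed.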
Finally, if each CPU takes 11 ms to start and you've got 127 CPUs to start, then it'd take 1397 ms to start all of them. That's a significant increase in boot time. This can be improved a lot by doing it in parallel. For example, the first CPU could start the second CPU; then the first and second CPUs could start the third and fourth CPUs; then all 4 started CPUs could start 4 more CPUs; then 8 CPUs start 8 more CPUs, etc. For parallel startup, if each CPU takes 11 ms to start and you've got 127 CPUs to start, then it'd take 77 ms to start all of them (the number of running CPUs goes 1, 2, 4, 8, 16, 32, 64, 128, which is 7 rounds of 11 ms). Of course once you start looking at lots of CPUs you'd also need to consider supporting x2APIC (xAPIC IDs are only 8 bits wide).
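The boot time arithmetic above, as a small C sketch (the function names are made up): serial startup costs one start time per AP, while in the doubling scheme every running CPU starts one more CPU per round, so you need enough rounds for the number of running CPUs, starting from the BSP alone, to reach the total.

```c
/* Serial startup: the BSP starts one AP at a time. */
static long serial_ms(long aps, long per_cpu_ms) {
    return aps * per_cpu_ms;
}

/* Parallel startup: every running CPU starts one more CPU per round,
   so the number of running CPUs doubles each round. */
static long parallel_ms(long aps, long per_cpu_ms) {
    long running = 1;   /* only the BSP is running at first */
    long rounds = 0;
    while (running < aps + 1) {   /* aps + 1 = APs plus the BSP */
        running *= 2;
        rounds++;
    }
    return rounds * per_cpu_ms;
}
```

For the numbers in the post, serial_ms(127, 11) returns 1397 and parallel_ms(127, 11) returns 77. One more AP (128) would need an 8th round.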
Cheers,
Brendan