nullplan wrote:
Yeah, I remember now that bzt talked about BOOTBOOT having multicore support before. Essentially, BOOTBOOT will boot all available CPUs so you don't have to. The MADT requires no AML, so it is indeed reasonable for a bootloader to parse it.
Certainly, but the AP will end up as an additional core available to the kernel's scheduler, and you cannot do that setup in the bootloader alone. If you boot cores in the bootloader rather than in the kernel, you need to park the APs at some stage and release them later so they can receive the configuration that the kernel and scheduler require. That's why I find it more practical to let the kernel boot the AP cores, so they get their final setup in a single step.
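To illustrate the quoted point that the MADT needs no AML: the table is a flat list of typed, length-prefixed records, so finding the CPUs is a plain byte walk. The following is a minimal sketch, assuming the table is already mapped and checksummed. The function name `madt_enabled_apic_ids` is made up for illustration, and only the 8-byte Processor Local APIC record (type 0) is handled, not the x2APIC record.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets per the ACPI MADT layout: a 36-byte SDT header, then the 32-bit
 * local APIC address and 32-bit flags, then variable-length records. */
enum {
    MADT_LENGTH_OFFSET  = 4,   /* 32-bit total table length in the SDT header */
    MADT_RECORDS_OFFSET = 44,  /* first interrupt controller record */
    MADT_TYPE_LAPIC     = 0,   /* Processor Local APIC record, 8 bytes long */
    LAPIC_FLAG_ENABLED  = 1    /* bit 0 of the record's 32-bit flags */
};

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: store the APIC IDs of enabled CPUs in `ids`
 * (at most `max` of them) and return how many were stored. */
size_t madt_enabled_apic_ids(const uint8_t *madt, uint8_t *ids, size_t max)
{
    size_t total = read_le32(madt + MADT_LENGTH_OFFSET);
    size_t pos = MADT_RECORDS_OFFSET;
    size_t found = 0;

    while (pos + 2 <= total) {
        uint8_t type = madt[pos];
        uint8_t len  = madt[pos + 1];

        if (len < 2 || pos + len > total)
            break;  /* malformed record: stop rather than loop forever */

        if (type == MADT_TYPE_LAPIC && len >= 8) {
            uint8_t  apic_id = madt[pos + 3];           /* byte 2 is the ACPI processor ID */
            uint32_t flags   = read_le32(madt + pos + 4);
            if ((flags & LAPIC_FLAG_ENABLED) && found < max)
                ids[found++] = apic_id;
        }
        pos += len;  /* unknown record types are skipped by their length */
    }
    return found;
}
```

Whether this runs in the bootloader or in the kernel, the result is only a list of APIC IDs. The per-core setup discussed above still has to happen afterwards.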
nullplan wrote:
Because speed. Let's name the actors in this play the booter and the bootee. While the booter is waiting for signs of life from the bootee, it isn't doing anything productive. In most systems I have seen that use this mechanism, only the BSP will be a booter, and the APs will be blocked in the scheduler until something pushes tasks into the queue. That means, they aren't doing anything productive. As you can see from bzt's most recent reply, architecturally there is no reason why there can't be 65535 CPUs. If the BSP is starting all APs sequentially, and the APs don't do anything, they are going to be here a while.
Actually, 65535 is beyond my absolute task limit (32767), so the function creating the AP starter tasks should also be in its own task. Then it can block while waiting for a slot in the task queue to open up. I might have to revisit that design decision (having an absolute task limit) some time. Having more CPUs than tasks is probably not terribly sensible, so I might have to limit those.
I agree with that. I should probably move the AP initialization to a thread instead. That way it won't affect startup time, and the APs can still be started in sequence.
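To put rough numbers on the booter/bootee argument, here is a small sketch comparing the two extremes. It counts waits rather than time, since the per-AP delay depends on the hardware. The doubling model (every running CPU boots one more CPU per round) is an idealized assumption of mine, not a description of anyone's actual scheduler, and both function names are hypothetical.

```c
#include <stdint.h>

/* Waits the BSP sits through when it is the only booter: one per AP. */
uint32_t sequential_waits(uint32_t cpus)
{
    return cpus > 0 ? cpus - 1 : 0;
}

/* Rounds needed when every running CPU boots one more CPU per round,
 * so the number of running CPUs doubles each round (idealized model). */
unsigned doubling_rounds(uint32_t cpus)
{
    unsigned rounds = 0;
    uint64_t running = 1;  /* the BSP; 64-bit so the doubling cannot wrap */

    while (running < cpus) {
        running *= 2;
        rounds++;
    }
    return rounds;
}
```

For 65535 CPUs this gives 65534 back-to-back waits for a lone BSP, against 16 rounds in the doubling model. Moving the sequential loop into its own thread does not shorten it, but it keeps those waits off the startup path.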
nullplan wrote:
I fail to see the point. Unless your goal is to keep running in 16-bit real mode, the GDT required as part of the trampoline will be different from the final GDT loaded in AP startup code, if for no other reason, by its address. The IDT makes no sense unless you are operating in the correct mode, but by that point, you will no longer need stuff to be in low memory to reference it. CR3, OK, that one is needed. CR4 less so, that can be done in the AP landing pad. It would also break encapsulation of the trampoline, since it is not the trampoline's place to tell you whether you should set, say, the WP bit. You should, but the trampoline is the wrong place to do it in. In 64-bit mode, you should also enable SYSCALL, yet the trampoline won't do that for you, either.
One way or another, you must set the correct bits in the control registers on each AP core, and initialization time is a good place to do it. Since I map my kernel high, the AP core must load CR3 and enable paging before it can jump to kernel code and initialize itself from there. There is a "boot-time" GDT, but the "real" GDT & IDT can just as well be loaded before you get into real kernel code, if only because the AP must know the base & size of the device driver it will start executing code in. That may not be an issue in a flat kernel, where you can use the same flat selector in both the bootstrap & kernel, but it is an issue for me. As for long mode, you can use the same GDT for protected mode & long mode, but the IDT will be different, although you can still load the 64-bit IDT in the bootloader as long as you avoid enabling interrupts.
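The point about sharing one GDT between protected mode and long mode can be sketched as follows. Code and data descriptors use the same 8-byte format in both modes; a 64-bit code segment differs only in having the L flag set and D/B clear. The flat four-entry layout and the names `gdt_descriptor` and `build_shared_gdt` are assumptions for illustration, not the segmented layout my kernel uses.

```c
#include <stdint.h>

/* Flag nibble (descriptor bits 52..55): G = 8, D/B = 4, L = 2, AVL = 1. */
enum { GDT_FLAG_G = 0x8, GDT_FLAG_DB = 0x4, GDT_FLAG_L = 0x2 };

/* Access bytes for present, ring-0 segments. */
enum { GDT_ACCESS_CODE = 0x9A, GDT_ACCESS_DATA = 0x92 };

/* Pack base, 20-bit limit, access byte and flag nibble into the 8-byte
 * segment descriptor format used by protected mode and long mode alike. */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                        uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base bits 0..23   */
    d |= (uint64_t)access << 40;                 /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* G, D/B, L, AVL    */
    d |= (uint64_t)(base >> 24) << 56;           /* base bits 24..31  */
    return d;
}

/* Hypothetical flat layout: one table serving both modes. */
void build_shared_gdt(uint64_t gdt[4])
{
    gdt[0] = 0;                                                     /* null */
    gdt[1] = gdt_descriptor(0, 0xFFFFF, GDT_ACCESS_CODE,
                            GDT_FLAG_G | GDT_FLAG_DB);              /* 32-bit code */
    gdt[2] = gdt_descriptor(0, 0xFFFFF, GDT_ACCESS_DATA,
                            GDT_FLAG_G | GDT_FLAG_DB);              /* data */
    gdt[3] = gdt_descriptor(0, 0xFFFFF, GDT_ACCESS_CODE,
                            GDT_FLAG_G | GDT_FLAG_L);               /* 64-bit code */
}
```

The IDT has no such overlap, because gate descriptors grow from 8 to 16 bytes in long mode, which is why it has to be a separate table.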
This is how the long mode monitor is initialized (by copying different code to 0x1400). Since the assembler doesn't support 64-bit code, the long mode part of the code is hand-encoded with db directives:
Code:
; this code is loaded at 01400. It should contain no near jumps!
rt_start:
xor ax,ax
mov ds,ax
mov fs,ax
mov gs,ax
;
mov ax,18h
mov ss,ax
mov esp,OFFSET rt_end - rt_start + 1400h + 10h
;
mov ax,20h
mov es,ax
;
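mov eax,12345678h
xchg eax,es:ap_cr4 ; fetch the CR4 value, leave 12345678h in its place
or al,20h ; CR4.PAE (bit 5), required for long mode
mov cr4,eax
;
mov ecx,IA32_EFER
rdmsr
or eax,101h ; EFER.SCE (bit 0) and EFER.LME (bit 8)
wrmsr
;
mov eax,es:ap_cr3
mov cr3,eax
;
mov eax,cr0
or eax,80010008h ; CR0.PG (bit 31), CR0.WP (bit 16), CR0.TS (bit 3)
mov cr0,eax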
;
mov edx,es:ap_stack_offset
;
xor ax,ax
mov es,ax
;
db 0EAh
dw OFFSET rt_init64 - rt_start + 1400h
dw 28h
rt_init64:
db 48h ; mov rbx,0FFFFFF8000201000h
db 0BBh
dd 201000h
dd 0FFFFFF80h
;
db 48h ; mov rsp,rbx
db 8Bh
db 0E3h
;
db 48h ; mov rbx,0FFFFFF8000000000h
db 0BBh
dd 0
dd 0FFFFFF80h
;
db 89h ; mov [rbx+10],edx
db 53h
db 0Ah
;
db 48h ; mov rax,[rbx+2]
db 8Bh
db 43h
db 2
;
db 48h ; add rbx,rax
db 3
db 0D8h
;
db 53h ; push rbx
;
db 0C3h ; ret
rt_end:
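Since hand-encoded bytes are easy to get wrong, here is a small sketch for cross-checking the db lines against their comments. It splits a ModRM byte into its three fields and rebuilds a 64-bit immediate from the two dd halves. The helper names are hypothetical, and register numbers are the plain 0..7 encoding without REX.R/REX.B extension, which is all the code above uses.

```c
#include <stdint.h>

/* Register numbers as encoded in ModRM (no REX.R/REX.B extension). */
enum { REG_AX = 0, REG_CX, REG_DX, REG_BX, REG_SP, REG_BP, REG_SI, REG_DI };

struct modrm {
    uint8_t mod;  /* 3 = rm is a register, 1 = [rm + disp8] */
    uint8_t reg;  /* register operand */
    uint8_t rm;   /* register or base register, depending on mod */
};

/* Split a ModRM byte into its mod (bits 7..6), reg (5..3), rm (2..0) fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (uint8_t)(byte >> 6);
    m.reg = (uint8_t)((byte >> 3) & 7);
    m.rm  = (uint8_t)(byte & 7);
    return m;
}

/* A 64-bit immediate written as two dd directives: low dword first. */
uint64_t imm64_from_dwords(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

For example, `decode_modrm(0xE3)` yields mod 3, reg SP, rm BX, which matches `mov rsp,rbx` for opcode 8Bh (destination in reg), and `decode_modrm(0x53)` yields mod 1, reg DX, rm BX, which matches `mov [rbx+10],edx` for opcode 89h with the disp8 of 0Ah.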