I've already corrected that. Next, I wrote the bytes 'EA 00 01 00 90', which encode 'jmp 9000h:0100h', to address 0x90000, which is the APs' start-up address. The address 9000h:0100h contains the code that initializes the APs. But the CPU resets: the APs receive the SIPI but fail to execute the initialization code. Is something wrong? Or should I load a separate module for the APs to execute?
Thanks for any help.
In real mode, 9000h:0100h is not the (32-bit physical) address 0x00090000; it's 0x00090100 (segment * 16 + offset).
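The real-mode segment:offset arithmetic can be written out as a small C helper. This is only a sketch; the name `real_mode_linear` is made up for illustration, and it ignores A20 wrap-around, which does not matter for addresses like 9000h:0100h.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
 * Hypothetical helper; A20 wrap-around is deliberately ignored. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So `real_mode_linear(0x9000, 0x0100)` gives 0x00090100, while the SIPI with vector 0x90 starts the AP at 0x9000:0x0000, i.e. 0x00090000.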
Typically you put some "trampoline code and data" at an address that can be encoded in the "vector" field of the SIPI (a 4 KiB-aligned address below 1 MiB, with vector = address >> 12), and design that code to run with "CS = address >> 4". For example, you might use "vector = 0x90", copy your "trampoline code and data" to 0x00090000, and design that "trampoline code and data" to run with "CS = 0x9000" (the AP starts at CS:IP = 0x9000:0x0000). You don't need any JMP.
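The relationship between the trampoline address, the SIPI vector and the AP's starting CS:IP can be sketched in C as follows. The function and type names are hypothetical; the arithmetic (vector is a page number, AP starts at CS = vector << 8, IP = 0) is what the SIPI mechanism defines.

```c
#include <assert.h>
#include <stdint.h>

/* Where an AP begins executing after a SIPI with a given vector. */
typedef struct {
    uint16_t cs;      /* real-mode code segment */
    uint16_t ip;      /* always 0 after a SIPI */
    uint32_t linear;  /* cs * 16 + ip */
} sipi_start;

/* Hypothetical helper: SIPI vector for a trampoline placed at trampoline_addr. */
static uint8_t sipi_vector_for(uint32_t trampoline_addr)
{
    assert((trampoline_addr & 0xFFFu) == 0);  /* must be 4 KiB aligned */
    assert(trampoline_addr < 0x100000u);      /* must be below 1 MiB */
    return (uint8_t)(trampoline_addr >> 12);
}

/* Hypothetical helper: CS:IP the AP starts at for a given vector. */
static sipi_start sipi_start_for(uint8_t vector)
{
    sipi_start s;
    s.cs = (uint16_t)(vector << 8);
    s.ip = 0;
    s.linear = (uint32_t)vector << 12;
    return s;
}
```

With the example values, `sipi_vector_for(0x90000)` is 0x90 and `sipi_start_for(0x90)` is CS = 0x9000, IP = 0, so the first byte of the trampoline is the first instruction executed.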
That "trampoline code and data" might then do something like this (NASM syntax, for plain 32-bit paging, untested):
bits 16 ;The AP starts in real mode
mov byte [cs:0x0800],1 ;Set flag to tell BSP that the CPU did start
.wait:
pause ;Spin-wait hint to the CPU
cmp byte [cs:0x0800],2 ;Has BSP acknowledged that the CPU has started?
jne .wait ; no, wait until it does
mov eax,0x80000001 ;eax = value for CR0
mov bx,0x0010 ;bx = value for data segments
xor cx,cx ;cx = zero
mov edx,[cs:0x0804] ;edx = value to load into CR3
mov esp,[cs:0x0808] ;Set ESP to whatever address BSP allocated for this CPU's stack
o32 lgdt [cs:0x0120] ;Load GDT (o32 so all 32 bits of the GDT base are used, not just 24)
mov cr3,edx ;Load page directory
mov cr0,eax ;Enable protected mode and paging
mov ss,bx ;Set SS to "big flat data"
mov ds,bx ;Set DS to "big flat data"
mov es,bx ;Set ES to "big flat data"
mov fs,cx ;Set FS to "NULL"
mov gs,cx ;Set GS to "NULL"
o32 jmp far [cs:0x0810] ;Jump to kernel's entry point (6-byte pointer: 32-bit offset, then selector)
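The trampoline above reads everything it needs from fixed offsets relative to CS. A sketch of how the BSP might fill those fields is below. The offsets come from the assembly; the helper names, and the assumption that the flat code selector is 0x08 (alongside the 0x10 data selector the code loads into BX), are hypothetical. The far pointer is stored as a 32-bit offset followed by a 16-bit selector, the m16:32 form a 32-bit far jump reads. Bytes are written one at a time so the layout is little-endian regardless of the host.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets used by the trampoline (relative to the trampoline page / CS base). */
enum {
    TRAMP_GDTR  = 0x0120,  /* 2-byte limit, then 4-byte linear base */
    TRAMP_FLAG  = 0x0800,  /* 1-byte synchronisation flag */
    TRAMP_CR3   = 0x0804,  /* 4-byte page directory address */
    TRAMP_ESP   = 0x0808,  /* 4-byte initial stack pointer */
    TRAMP_ENTRY = 0x0810,  /* 4-byte entry offset, then 2-byte code selector */
    TRAMP_SIZE  = 0x1000
};

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)v); put16(p + 2, (uint16_t)(v >> 16)); }

/* Hypothetical helper: fill the data part of one trampoline page.
 * The trampoline code is assumed to be copied to offset 0 already. */
static void tramp_fill(uint8_t *page, uint16_t gdt_limit, uint32_t gdt_base,
                       uint32_t cr3, uint32_t stack_top,
                       uint32_t entry, uint16_t code_sel)
{
    put16(page + TRAMP_GDTR, gdt_limit);
    put32(page + TRAMP_GDTR + 2, gdt_base);
    put32(page + TRAMP_CR3, cr3);
    put32(page + TRAMP_ESP, stack_top);
    put32(page + TRAMP_ENTRY, entry);
    put16(page + TRAMP_ENTRY + 4, code_sel);
    page[TRAMP_FLAG] = 0;  /* clear the synchronisation flag before sending the SIPI */
}
```

The GDT base must be a linear address; if the GDT itself lives inside the trampoline page, that is the trampoline's physical address plus the GDT's offset.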
The BSP would prepare the data in the trampoline (including clearing the flag used for synchronisation, allocating a stack for the CPU to use, etc.), then send the INIT-SIPI-SIPI sequence to the AP CPU while monitoring the synchronisation flag (so it knows if and when the AP CPU starts).
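The INIT-SIPI-SIPI sequence itself can be sketched as a list of xAPIC Interrupt Command Register writes with the usual delays (INIT, wait 10 ms, SIPI, wait 200 us, SIPI, wait 200 us). This sketch assumes an xAPIC (not x2APIC) and a single destination APIC ID; the `ipi_step` type and function name are hypothetical, and it deliberately only builds the steps so they can be checked without hardware. A real kernel would write each step with its own LAPIC MMIO accessor, wait for the ICR delivery-status bit (bit 12) to clear, and then poll the trampoline flag.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC Interrupt Command Register, as offsets from the local APIC base. */
#define LAPIC_ICR_LOW     0x300u
#define LAPIC_ICR_HIGH    0x310u

#define ICR_MODE_INIT     0x00000500u  /* delivery mode 101b */
#define ICR_MODE_STARTUP  0x00000600u  /* delivery mode 110b */
#define ICR_LEVEL_ASSERT  0x00004000u  /* bit 14 */

typedef struct {
    uint32_t reg;       /* LAPIC register offset to write */
    uint32_t value;     /* value to write */
    uint32_t delay_us;  /* how long to wait after the write */
} ipi_step;

/* Hypothetical helper: build the INIT-SIPI-SIPI writes for one AP.
 * Writing ICR_HIGH sets the destination APIC ID (bits 24-31);
 * writing ICR_LOW actually sends the IPI. Returns the step count (6). */
static int init_sipi_sipi(uint8_t apic_id, uint8_t vector, ipi_step out[6])
{
    const uint32_t dest = (uint32_t)apic_id << 24;
    const uint32_t sipi = ICR_MODE_STARTUP | ICR_LEVEL_ASSERT | vector;
    int n = 0;

    out[n++] = (ipi_step){ LAPIC_ICR_HIGH, dest, 0 };
    out[n++] = (ipi_step){ LAPIC_ICR_LOW, ICR_MODE_INIT | ICR_LEVEL_ASSERT, 10000 }; /* 10 ms */
    for (int i = 0; i < 2; i++) {
        out[n++] = (ipi_step){ LAPIC_ICR_HIGH, dest, 0 };
        out[n++] = (ipi_step){ LAPIC_ICR_LOW, sipi, 200 };                          /* 200 us */
    }
    return n;
}
```

After the last step, the BSP would poll the flag (with a timeout) until the AP sets it to 1, then write 2 so the AP continues past its `.wait` loop.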
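Note that the value for CR3 may point to a special page directory that is only used by trampolines, in which the trampoline's code and data are identity mapped. This is needed because the instruction after "mov cr0,eax" is fetched with paging already enabled; the kernel's entry point, the AP's stack and the GDT must be mapped in it as well.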
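Identity mapping the trampoline page with plain 32-bit paging (no PSE, since the trampoline doesn't set CR4.PSE) means one page directory entry pointing at a page table, and one page table entry mapping the page to itself. A minimal sketch, with hypothetical names, assuming the caller provides a page directory and a page table covering the 4 MiB region that contains the trampoline:

```c
#include <assert.h>
#include <stdint.h>

#define PG_PRESENT 0x001u
#define PG_WRITE   0x002u

/* Hypothetical helper: map the 4 KiB page at addr to itself.
 * pd and pt are the kernel's pointers to the tables; pt_phys is the
 * physical address of pt, which is what goes into the PDE. */
static void identity_map_page(uint32_t *pd, uint32_t *pt, uint32_t pt_phys, uint32_t addr)
{
    assert((pt_phys & 0xFFFu) == 0);          /* page tables are 4 KiB aligned */
    uint32_t pdi = addr >> 22;                /* which 4 MiB region */
    uint32_t pti = (addr >> 12) & 0x3FFu;     /* which 4 KiB page inside it */
    pd[pdi] = pt_phys | PG_PRESENT | PG_WRITE;
    pt[pti] = (addr & ~0xFFFu) | PG_PRESENT | PG_WRITE;
}
```

For a trampoline at 0x00090000 this sets PDE 0 and PTE 0x90. The kernel's own mappings (entry point, stacks, GDT) would typically be added by copying the kernel's PDEs into the same page directory.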
After the AP CPU enters the kernel, code in the kernel might set the synchronisation flag to 3 to tell the BSP that the AP CPU is no longer using the trampoline. That lets the BSP recycle the trampoline for the next CPU, or free any temporary stuff (e.g. the page directory it created to identity map the trampoline, and the page used for the trampoline itself).
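The four flag values form a small handshake. Here is a sketch of the BSP's side of it as one polling step; the enum and function names are hypothetical, and the values 0 to 3 are the ones used above (0 cleared by the BSP, 1 set by the AP's trampoline, 2 set by the BSP, 3 set by the AP's kernel code).

```c
#include <assert.h>
#include <stdint.h>

enum tramp_flag {
    TRAMP_CLEARED  = 0,  /* BSP: set before sending INIT-SIPI-SIPI */
    TRAMP_STARTED  = 1,  /* AP trampoline: "I'm running" */
    TRAMP_ACKED    = 2,  /* BSP: "go ahead" */
    TRAMP_RELEASED = 3   /* AP kernel code: "trampoline no longer used" */
};

typedef enum { BSP_WAIT, BSP_ACKNOWLEDGED, BSP_CAN_RECYCLE } bsp_action;

/* Hypothetical BSP polling step: acknowledge a started AP, or report
 * that the trampoline page can be reused or freed. */
static bsp_action bsp_poll(volatile uint8_t *flag)
{
    switch (*flag) {
    case TRAMP_STARTED:
        *flag = TRAMP_ACKED;
        return BSP_ACKNOWLEDGED;
    case TRAMP_RELEASED:
        return BSP_CAN_RECYCLE;
    default:
        return BSP_WAIT;
    }
}
```

The AP must only write 3 after it has switched to its own stack and stopped reading anything from the trampoline page.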
Also note that keeping the data in the trampoline (and using CS to access it) allows you to have 2 or more independent "trampoline pages" and start 2 or more CPUs at the same time (without them getting the wrong stack, interfering with each other's synchronisation flag, etc.).
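With several trampoline pages, the BSP needs some bookkeeping to hand out a free page (and its SIPI vector) per AP and reclaim it once that AP reports flag = 3. A sketch of such a pool, where the type and function names are hypothetical and each page is assumed to already hold its own copy of the trampoline code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRAMP_FLAG_OFFSET 0x0800u

typedef struct {
    uint32_t          phys;    /* 4 KiB aligned, below 1 MiB */
    volatile uint8_t *page;    /* kernel's pointer to the same page */
    bool              in_use;
} tramp_slot;

/* Hypothetical: claim a free trampoline page; returns its index or -1. */
static int tramp_acquire(tramp_slot *slots, int n)
{
    for (int i = 0; i < n; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = true;
            slots[i].page[TRAMP_FLAG_OFFSET] = 0;  /* clear the flag for the new AP */
            return i;
        }
    }
    return -1;
}

/* SIPI vector for a slot: its physical page number. */
static uint8_t tramp_vector(const tramp_slot *s)
{
    return (uint8_t)(s->phys >> 12);
}

/* Hypothetical: free every slot whose AP has set the flag to 3. */
static int tramp_reap(tramp_slot *slots, int n)
{
    int freed = 0;
    for (int i = 0; i < n; i++) {
        if (slots[i].in_use && slots[i].page[TRAMP_FLAG_OFFSET] == 3) {
            slots[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

Because each AP reads its stack, CR3 and flag through its own CS base, two slots at 0x90000 and 0x91000 (vectors 0x90 and 0x91) can be used for two APs at the same time.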