Usually you want a stack switch. Here is a simple model: every task gets its own kernel stack. On a task switch, the new task's kernel stack top is written into ESP0 of the current CPU's TSS. No task is ever suspended in user mode; they all enter kernel mode sooner or later, by force if need be.
The first-level interrupt handlers only save the volatile registers (EAX, ECX, and EDX), as well as the segment registers, before switching the segments over to kernel mode. ESP is saved by the CPU, and EBP, EBX, EDI, and ESI are callee-saved registers according to the ABI.
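To make the ESP0 part concrete, here is a minimal sketch in plain C. The names tss32_head, tss_demo_task and tss_set_kernel_stack are made up for illustration and are not from my kernel; only the first three fields of the 32-bit TSS are declared, which is enough to show where ESP0 lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First fields of the 32-bit TSS; the remaining fields are omitted here. */
struct tss32_head {
    uint32_t prev_link; /* only the low 16 bits are used */
    uint32_t esp0;      /* stack pointer loaded on entry to ring 0 */
    uint32_t ss0;       /* stack segment loaded on entry to ring 0 */
};

/* Hypothetical task descriptor: only the field this sketch needs. */
struct tss_demo_task {
    uint32_t kstack_top;
};

/* Point the CPU's ring-0 entry stack at the next task's kernel stack. */
static void tss_set_kernel_stack(struct tss32_head *tss,
                                 const struct tss_demo_task *next)
{
    assert(tss != NULL && next != NULL);
    tss->esp0 = next->kstack_top;
}
```

With this layout the offset of esp0 is 4, which is the kind of value a constant like TSS_ESP0 would carry in the assembly version.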
Before returning to userspace, all interrupt handlers (if they are returning to userspace) and the system call handler check for a task flag that says the task should be scheduled out. If set, they all call a function to do that. The timer interrupt then only needs to set that flag. The function to switch tasks then only needs to save the nonvolatile registers (EBX, EBP, EDI, and ESI), switch stack to the next task's kernel stack, restore registers and return.
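The flag handshake described above can be sketched in a few lines of ordinary C. This is a user-space model, not kernel code: TI_TIMEOUT and task_flags are the names used in this post, but the bit value, flag_demo_task, timer_tick, schedule_out and before_return_to_user are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TI_TIMEOUT 0x1u /* assumed bit value; pick whatever your kernel uses */

struct flag_demo_task {
    unsigned task_flags;
    unsigned times_scheduled_out; /* bookkeeping for this demo only */
};

/* Timer interrupt: only marks the task; no switching happens here. */
static void timer_tick(struct flag_demo_task *cur)
{
    cur->task_flags |= TI_TIMEOUT;
}

/* Stand-in for the real task switch: clear the flag and count the call. */
static void schedule_out(struct flag_demo_task *cur)
{
    cur->task_flags &= ~TI_TIMEOUT;
    cur->times_scheduled_out++;
}

/* Common tail of every interrupt or syscall path that returns to user mode. */
static void before_return_to_user(struct flag_demo_task *cur)
{
    assert(cur != NULL);
    if (cur->task_flags & TI_TIMEOUT)
        schedule_out(cur);
}
```

The point of the design is that the timer handler stays tiny, and the switch only ever happens at one well-defined place, with a known stack layout.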
Code:
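struct task {
    [...]
    unsigned task_flags;
    void *kstack_top;   /* highest address of the kernel stack; goes into TSS.ESP0 */
    void *kstack_bot;   /* saved ESP while the task is suspended */
    [...]
};
extern void raw_switch_task(struct task *current, struct task *next);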
Code:
raw_switch_task:
    pushl %ebp
    movl %esp, %ebp
    subl $12, %esp
    /* save the remaining callee-saved registers */
    movl %ebx, -4(%ebp)
    movl %esi, -8(%ebp)
    movl %edi, -12(%ebp)
    movl 8(%ebp), %eax          /* current */
    movl 12(%ebp), %ecx         /* next */
    /* the actual stack switch */
    movl %esp, KSTACK_BOT(%eax)
    movl KSTACK_BOT(%ecx), %esp
    movl KSTACK_TOP(%ecx), %edx
    /* now somehow get %edx into current_tss.esp0. I can do that like this: */
    movl %edx, %gs:CPU_TSS+TSS_ESP0
    pushl %ecx
    call sched_set_current_task
    addl $4, %esp
    /* restore the next task's callee-saved registers */
    movl (%esp), %edi
    movl 4(%esp), %esi
    movl 8(%esp), %ebx
    movl 12(%esp), %ebp
    addl $16, %esp
    ret
Code:
void switch_task(struct task *current, struct task *next) {
    current->task_flags &= ~TI_TIMEOUT;
    raw_switch_task(current, next);
    /* when we get here, another task switched to "current". */
}
It's a bit of a head-bender, but it means the volatile registers are saved only once, close to the top of the kernel stack. Below them there might be several call frames and maybe even an interrupt frame, and at the very bottom sit the nonvolatile registers. That's how all suspended tasks look. You might notice that this means you can create a new task by filling the new kernel stack with convenient values for "restoration".
Code:
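int new_kernel_task(void (*main)(void*), void *arg) {
    struct task *new = ...
    [...]
    uint32_t *ks = new->kstack_top;
    ks -= 7;
    new->kstack_bot = ks;
    /* ks[0..2] are the EDI, ESI and EBX slots; their contents don't matter */
    ks[3] = 0; /* mark stack frame as initial with 0 EBP */
    ks[4] = (uint32_t)main;             /* return address for raw_switch_task */
    ks[5] = (uint32_t)kernel_task_exit; /* return address for main */
    ks[6] = (uint32_t)arg;              /* argument for main */
}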
This means the "ret" in raw_switch_task jumps to the given entry point, with ESP pointing at a return address that leads to the kernel task cleanup function, and the argument right above it on the stack. To the entry point it looks like an ordinary call.
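If you want to convince yourself that the seven slots line up, you can replay the tail of raw_switch_task on an ordinary array. The following is only a user-space simulation under my own assumptions: build_initial_frame and simulate_restore are hypothetical helpers, addresses are plain numbers instead of real function pointers, and the three unused register slots are zeroed just to keep the demo deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Register state after the tail of raw_switch_task has run (simulation only). */
struct restored {
    uint32_t edi, esi, ebx, ebp;
    uint32_t eip;        /* target of the final "ret" */
    const uint32_t *esp; /* stack pointer on arrival at eip */
};

/* Fill the 7 top words of a fresh kernel stack, like new_kernel_task does.
 * Returns the value that would be stored in kstack_bot. */
static uint32_t *build_initial_frame(uint32_t *kstack_top, uint32_t entry,
                                     uint32_t exit_fn, uint32_t arg)
{
    assert(kstack_top != 0);
    uint32_t *ks = kstack_top - 7;
    ks[0] = ks[1] = ks[2] = 0; /* EDI, ESI, EBX: zeroed here for determinism */
    ks[3] = 0;                 /* EBP = 0 marks the outermost frame */
    ks[4] = entry;             /* consumed by "ret" */
    ks[5] = exit_fn;           /* return address seen by the entry function */
    ks[6] = arg;               /* first stack argument of the entry function */
    return ks;
}

/* Mimic the four loads, "addl $16, %esp" and "ret". */
static struct restored simulate_restore(const uint32_t *esp)
{
    struct restored r;
    r.edi = esp[0];
    r.esi = esp[1];
    r.ebx = esp[2];
    r.ebp = esp[3];
    esp += 4;       /* addl $16, %esp */
    r.eip = *esp++; /* ret pops the return address */
    r.esp = esp;
    return r;
}
```

After simulate_restore, eip holds the entry point, esp[0] holds the exit function and esp[1] holds the argument, which is exactly what a function expects to find on entry.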
Or a new user task:
Code:
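int new_user_task(uint32_t entry, uint32_t start_sp)
{
    struct task *new = ...
    [...]
    uint32_t *ks = new->kstack_top;
    ks -= 10;
    new->kstack_bot = ks;
    /* ks[0..3] are the EDI, ESI, EBX and EBP slots; their contents don't matter */
    ks[4] = (uint32_t)user_start_iret; /* return address for raw_switch_task */
    /* the rest is the frame consumed by iret */
    ks[5] = entry;              /* EIP */
    ks[6] = USER_CS;            /* CS */
    ks[7] = USER_INIT_EFLAGS;   /* EFLAGS */
    ks[8] = start_sp;           /* ESP */
    ks[9] = USER_DS;            /* SS */
}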
Code:
user_start_iret:
    movw $USER_DS, %ax
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
    iret
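The ten slots of the user task's initial stack can also be written down as structs, which lets the compiler check the indices. This is just a sketch: iret_user_frame, initial_user_stack and KS_INDEX are names I made up for illustration. The order of the iret frame (EIP, CS, EFLAGS, ESP, SS from the lowest address up) is what the CPU pops when iret returns to a less privileged ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What iret pops when returning to a less privileged ring, lowest address first. */
struct iret_user_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Image of a fresh user task's kernel stack: four callee-saved slots,
 * the return address for "ret", then the iret frame. */
struct initial_user_stack {
    uint32_t edi, esi, ebx, ebp;
    uint32_t ret_addr; /* user_start_iret in the code above */
    struct iret_user_frame iret;
};

/* Word index of a field within the initial stack image. */
#define KS_INDEX(field) \
    (offsetof(struct initial_user_stack, field) / sizeof(uint32_t))

/* Ten 32-bit words in total, matching "ks -= 10". */
static_assert(sizeof(struct initial_user_stack) == 10 * sizeof(uint32_t),
              "initial user stack image must be 10 words");
```

KS_INDEX(ret_addr) comes out as 4 and KS_INDEX(iret.ss) as 9, matching the ks[4] through ks[9] assignments in new_user_task.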