0xBADC0DE wrote:
I am not planning on using SYSENTER/SYSEXIT etc, so I'll just stick with software interrupts. So when an interrupt is generated, the CPU switches to kernel mode automatically?
Depends. x86 is extremely flexible there -- much to its detriment, because all that flexibility has to be configured, and has to be read and processed by the processor. In this case, when you execute "int 128", first the CPU will check the DPL of the IDT entry, to see if you were even allowed to call that int. Therefore, for the syscall interrupt, you must set the DPL to 3.
Then the CPU will load CS with the segment selector out of the IDT entry. If that selector refers to a ring 0 code segment, then that's what it switches to, loading the kernel SS:ESP from the TSS on the way.
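To make the DPL part concrete, here is a minimal sketch of how the gate for the syscall vector could be encoded. The names (struct idt_gate, make_gate, the GATE_* macros) and the KERNEL_CS value of 0x08 are my assumptions, not anything from your code; only the 8-byte layout itself comes from the architecture.

```c
#include <stdint.h>

/* 32-bit protected-mode IDT gate descriptor (8 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector the CPU loads into CS */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* P (bit 7), DPL (bits 5..6), gate type (bits 0..3) */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} __attribute__((packed));

#define GATE_PRESENT 0x80u
#define GATE_DPL(n)  (((n) & 3u) << 5)
#define GATE_INT32   0x0Eu   /* 32-bit interrupt gate */
#define KERNEL_CS    0x08u   /* assumption: ring 0 code selector in your GDT */

/* Hypothetical helper: build a gate for the handler at address `handler`. */
static struct idt_gate make_gate(uint32_t handler, uint16_t sel, unsigned dpl)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = sel;
    g.zero        = 0;
    g.type_attr   = (uint8_t)(GATE_PRESENT | GATE_DPL(dpl) | GATE_INT32);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

With something like this, vector 128 would get make_gate(handler_address, KERNEL_CS, 3), while the exception and IRQ vectors would keep DPL 0 so userspace cannot raise them with a plain "int".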
0xBADC0DE wrote:
And regarding "vfs_open", yea, I do have it as returning a FILE structure. In there I store the file's descriptor, but I didn't realize that "vfs_open" should not return a FILE structure, but just the file descriptor instead.
Depends. "vfs_open" is an implementation detail. It can return whatever you like, as long as you keep in mind that it cannot return a userspace type. "sys_open", on the other hand, is public ABI, and should return something userspace can use to read and write the file, typically an integer file descriptor. Not a FILE*, however, because that's a type name reserved for libc.
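One possible shape for the descriptor side, as a rough sketch: the kernel keeps a per-process table of open-file objects, and sys_open hands back only the index. Everything here (struct file, struct fd_table, fd_install, fd_lookup, MAX_FDS) is made up for illustration and is not taken from your VFS.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FDS 16   /* assumption: small fixed-size table per process */

/* Hypothetical kernel-side open-file object; never exposed to userspace. */
struct file {
    int inode_no;
};

struct fd_table {
    struct file *slot[MAX_FDS];
};

/* Put f into the lowest free slot; return the descriptor, or -EMFILE if full. */
static int fd_install(struct fd_table *t, struct file *f)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (t->slot[fd] == NULL) {
            t->slot[fd] = f;
            return fd;
        }
    }
    return -EMFILE;
}

/* Translate a descriptor coming from userspace back into the kernel object. */
static struct file *fd_lookup(const struct fd_table *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;   /* out of range: caller would return -EBADF */
    return t->slot[fd];
}
```

sys_open would then call whatever vfs_open returns through fd_install(), and sys_read/sys_write would start with fd_lookup(). libc builds its FILE on top of the integer.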
0xBADC0DE wrote:
Will the initial part of "sys_open" be written in assembly, as in "fopen" would call "sys_open", which is an assembly function which sets up the stack, puts the appropriate system call number in the eax register, and then calls the actual C function that opens the file (through the use of a system call table, with eax as the index)? And then for the final part, would I just use my "switch_to_user_mode" function to get back to user mode?
Usual way to do this is to have fopen() call a userspace helper that calls the syscall. Something like:
Code:
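#include <stdarg.h>
#include <fcntl.h>   /* O_CREAT */

int open(const char *fn, int flags, ...) {
    va_list ap;
    int mode = 0;
    long r;

    /* The mode argument is only passed when O_CREAT is set. */
    if (flags & O_CREAT) {
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }
    /* volatile: the syscall has side effects, so the compiler must not drop or merge it. */
    asm volatile("int $128" : "=a"(r) : "a"(__NR_open), "b"(fn), "c"(flags), "d"(mode) : "memory");
    return __syscall_ret(r);
}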
And then __syscall_ret() extracts errno info if need be.
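A minimal sketch of what that helper might look like, assuming the kernel reports errors as a negative errno value in the range -4095..-1 (the convention Linux and musl use; your kernel is free to pick another). I call it syscall_ret here only to stay out of the reserved-identifier namespace in a standalone example.

```c
#include <errno.h>

/* Assumed convention: results in [-4095, -1] mean "-errno",
 * anything else is a valid result (including large pointer-like values). */
static long syscall_ret(long r)
{
    if ((unsigned long)r > -4096UL) {
        errno = (int)-r;   /* e.g. -ENOENT becomes errno = ENOENT */
        return -1;
    }
    return r;
}
```

The unsigned compare is what keeps high addresses from being mistaken for errors, which matters once you have a call that returns pointers.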
On the other side, you have the int 128 handler in the kernel, which does something like:
Code:
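int128:
    /* Reserve 4 bytes for the sys_main argument plus 40 for the saved registers.
       Together with the CPU-pushed iret frame above, this forms struct regs. */
    addl $-44, %esp
    movl %gs, 40(%esp)
    movl %fs, 36(%esp)
    movl %es, 32(%esp)
    movl %ds, 28(%esp)
    movl %esi, 24(%esp)
    movl %edi, 20(%esp)
    movl %edx, 16(%esp)
    movl %ecx, 12(%esp)
    movl %ebx, 8(%esp)
    movl %eax, 4(%esp)
    /* Switch the data segments over to the kernel's. */
    movl $KERNEL_DS, %eax
    movl %eax, %ds
    movl %eax, %es
    movl %eax, %fs
    movl %eax, %gs
    /* Pass a pointer to the saved registers as the only argument. */
    leal 4(%esp), %eax
    movl %eax, (%esp)
    call sys_main
    /* Restore the user's segments, using %eax as scratch. */
    movl 28(%esp), %eax
    movl %eax, %ds
    movl 32(%esp), %eax
    movl %eax, %es
    movl 36(%esp), %eax
    movl %eax, %fs
    movl 40(%esp), %eax
    movl %eax, %gs
    /* The %eax slot now holds the return value sys_main stored in r->ax. */
    movl 4(%esp), %eax
    movl 8(%esp), %ebx
    movl 12(%esp), %ecx
    movl 16(%esp), %edx
    movl 20(%esp), %edi
    movl 24(%esp), %esi
    addl $44, %esp
    iretl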
If you want to support signals later, add more code after the call instruction above, testing if the current process has signals pending, and calling a handler function for that, too.
Anyway, the kernel then needs the main syscall function:
Code:
struct regs {
    long ax, bx, cx, dx, di, si, ds, es, fs, gs, ip, cs, flags, sp, ss;
};

static long sys_ni_syscall(void) { return -ENOSYS; }

/* Unprototyped function pointers, so handlers may take differing argument lists. */
static long (*const syscall_table[NUM_SYSCALLS])() = {
    /* Using a GCC extension here */
    [0 ... NUM_SYSCALLS - 1] = sys_ni_syscall,
    [__NR_open] = sys_open,
    [__NR_foo] = sys_foo,
    [__NR_bar] = sys_bar,
    /* etc. pp. */
};

void sys_main(struct regs *r)
{
    /* Unsigned compare, so that a negative syscall number is rejected as well. */
    if ((unsigned long)r->ax < NUM_SYSCALLS)
        r->ax = syscall_table[r->ax](r->bx, r->cx, r->dx, r->di, r->si);
    else
        r->ax = -ENOSYS;
}
Again, sys_main() is bound to become more complicated, once you add code to track the time spent in system or user space, and add a facility to track which system calls a process calls.
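As a rough idea of where the per-process tracking could hook in, here is a standalone sketch that counts calls per syscall number before dispatching. struct proc, syscall_fn, dispatch() and sys_add_demo() are all hypothetical, and unlike the table above I gave every handler the same five-argument prototype to keep the example strictly portable.

```c
#include <errno.h>

#define NUM_SYSCALLS 4   /* assumption: tiny table for the demo */

/* Hypothetical per-process bookkeeping. */
struct proc {
    unsigned long syscall_count[NUM_SYSCALLS];
};

typedef long (*syscall_fn)(long, long, long, long, long);

static long sys_ni_syscall(long a, long b, long c, long d, long e)
{
    (void)a; (void)b; (void)c; (void)d; (void)e;
    return -ENOSYS;
}

/* Made-up syscall, only here so the table has one real entry. */
static long sys_add_demo(long a, long b, long c, long d, long e)
{
    (void)c; (void)d; (void)e;
    return a + b;
}

static const syscall_fn table[NUM_SYSCALLS] = {
    sys_ni_syscall, sys_add_demo, sys_ni_syscall, sys_ni_syscall,
};

static long dispatch(struct proc *p, long nr,
                     long a, long b, long c, long d, long e)
{
    if ((unsigned long)nr >= NUM_SYSCALLS)
        return -ENOSYS;          /* also catches negative numbers */
    p->syscall_count[nr]++;      /* the tracking hook */
    return table[nr](a, b, c, d, e);
}
```

Time accounting would go in the same spot: take a timestamp before the table call and another after it, and add the difference to the process's system time.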
0xBADC0DE wrote:
I am getting a DPL of 0 for all the segment registers even though I have made the switch to user mode
Evidently not. Did you load the correct segments? Your enter_usermode() function has a few problems, though: The usermode stack will be the same as the kernel mode one (bad idea, usually), and the code will continue in the same place. Usually you'd allocate a separate stack for the user to use and screw up to their liking, and take as code pointer something user-specified as argument. Also, you first CLI, then PUSHF, meaning the IF will be disabled in the flags you pushed. So the iret will also not set it, meaning you enter user mode with interrupts disabled. Really bad idea. Usually, you'd
Code:
pushfl
orl $(1<<9), (%esp)    /* set IF (bit 9) in the pushed EFLAGS image */
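Putting those points together (separate user stack, caller-supplied entry point, IF set), the frame that iret consumes for a ring change could be described like this. It is only a sketch: the selector values assume user code at GDT index 3 and user data at index 4, and struct iret_frame and make_user_frame are names I made up.

```c
#include <stdint.h>

#define EFLAGS_IF (1u << 9)
#define USER_CS   (0x18u | 3u)   /* assumption: GDT index 3, RPL 3 */
#define USER_DS   (0x20u | 3u)   /* assumption: GDT index 4, RPL 3 */

/* Words popped by iret when returning to a lower privilege level,
 * lowest address first. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Hypothetical helper: describe the first entry into user mode. */
static struct iret_frame make_user_frame(uint32_t entry,
                                         uint32_t user_stack_top,
                                         uint32_t cur_eflags)
{
    struct iret_frame f;
    f.eip    = entry;                   /* user-specified code pointer */
    f.cs     = USER_CS;
    f.eflags = cur_eflags | EFLAGS_IF;  /* enter user mode with interrupts on */
    f.esp    = user_stack_top;          /* separate stack for the user */
    f.ss     = USER_DS;
    return f;
}
```

The assembly stub then pushes the five values in reverse order (ss first, eip last), loads DS/ES/FS/GS with USER_DS and executes iret. If the low two bits of the pushed CS are not 3, you stay in ring 0, which would match the DPL 0 you are seeing.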
0xBADC0DE wrote:
And as for kernel ESP, what should I set that to? In my start.asm file, I have a symbol called "stack_space", which I think is just the start of the stack area for the kernel, but I'm not sure about that.
stack_space is in fact the stack top that you need. That's what you can use there. But that means you abandon your stack once you enter user mode, which is normal, you just have to be aware of it.
Later on, allocate a new kernel stack for each new thread. You switch them out in the TSS when you switch processes. And your boot stack can be the stack for the init process, which never ends, anyway, right?
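For the TSS part, a minimal sketch of the per-switch update, assuming software task switching with a single TSS. The field offsets follow the 32-bit TSS layout; struct thread, kstack_top, tss_set_kernel_stack and the KERNEL_DS value of 0x10 are assumptions on my side.

```c
#include <stddef.h>
#include <stdint.h>

#define KERNEL_DS 0x10u   /* assumption: ring 0 data selector in your GDT */

/* 32-bit TSS, 104 bytes. Only esp0/ss0 matter when switching tasks in software. */
struct tss32 {
    uint32_t prev_task;
    uint32_t esp0;         /* stack pointer loaded on entry to ring 0 */
    uint32_t ss0;          /* stack segment loaded on entry to ring 0 */
    uint32_t unused[22];   /* esp1 through ldt, left zeroed here */
    uint16_t trap;
    uint16_t iomap_base;
} __attribute__((packed));

/* Hypothetical thread descriptor holding the top of its kernel stack. */
struct thread {
    uint32_t kstack_top;
};

/* Call on every context switch, before returning to the next thread. */
static void tss_set_kernel_stack(struct tss32 *t, const struct thread *next)
{
    t->ss0  = KERNEL_DS;
    t->esp0 = next->kstack_top;
}
```

For the very first switch to user mode, kstack_top would simply be the address of stack_space.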