x86_64-ISR-Registers and syscall arguments

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Octocontrabass wrote:

DevNoteHQ wrote:

But what about the other topic?
Can I do something like this:
Userland:

Code:

namespace System
{
   void* Syscall(uint64_t iCall, void* OtherArguments)
   {
      asm volatile("syscall");
      //return?
   }
}

You can have a function like this, but if you want it to actually work, you need to tell GCC how to pass parameters and accept return values. For example:

Code:

void *Syscall( uint64_t iCall, void *OtherArguments)
{
    void *retval;
    asm volatile( "syscall" : "=a"(retval) : "D"(iCall), "S"(OtherArguments) : "memory" );
    return retval;
}

This assumes your syscall handler returns a value in RAX, expects iCall in RDI, expects OtherArguments in RSI, accesses memory belonging to your program (for example, by dereferencing the OtherArguments pointer), and doesn't modify the values in any registers besides RAX. You can change the constraints and clobbers to fit your kernel's syscall handler.

Why isn't just calling "Syscall(iCall, OtherArguments);" enough? Shouldn't GCC store "iCall" in RDI and "OtherArguments" in RSI anyway? Or is this again to prevent GCC from optimizing my code away?

MichaelPetch · **Joined:** Fri Aug 26, 2016 1:41 pm **Posts:** 692

DevNoteHQ wrote:

But what about the other topic?
Can I do something like this:
Userland:

Code:

namespace System
{
   void* Syscall(uint64_t iCall, void* OtherArguments)
   {
      asm volatile("syscall");
      //return?
   }
}

[snip]
Why isn't just calling "Syscall(iCall, OtherArguments);" enough? Shouldn't GCC store "iCall" in RDI and "OtherArguments" in RSI anyway? Or is this again to prevent GCC from optimizing my code away?

Although the ABI specifies which registers are used to pass parameters to functions, that doesn't guarantee the compiler's generated code hasn't clobbered them before asm volatile("syscall"); executes, even though it may appear as the first line of the function. And yes, with optimizations turned on, if the function is inlined (so the compiler avoids the overhead of the function call), it is likely that _RSI_ and _RDI_ will not hold the parameters at all. You need to use an assembly template with input constraints to pass values in registers to the assembly code, and output constraints for everything it returns. Inline assembly in GCC is hard to get right with all its nuances. If you are not an expert in GCC's inline assembly, I'd suggest writing such code in a separate assembly module. It may be less efficient, but it would likely be less error-prone.

MichaelPetch · **Joined:** Fri Aug 26, 2016 1:41 pm **Posts:** 692

Octocontrabass wrote:

You can have a function like this, but if you want it to actually work, you need to tell GCC how to pass parameters and accept return values. For example:

Code:

void *Syscall( uint64_t iCall, void *OtherArguments)
{
    void *retval;
    asm volatile( "syscall" : "=a"(retval) : "D"(iCall), "S"(OtherArguments) : "memory" );
    return retval;
}

At the very least you are going to want to add R11 and RCX to the clobbers since the SYSCALL instruction itself will overwrite those registers.

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Korona wrote:

MSR_GS_BASE is the GS base address. swapgs swaps MSR_GS_BASE and MSR_KERNEL_GS_BASE. Both registers need to be set up manually.

There is no instruction to access the kernel stack base from the TSS. swapgs is meant to solve exactly that problem. Use swapgs to load a known GS base and then load the kernel RSP from the GS segment.

All this applies only to syscall. Interrupts automatically load RSP from the TSS. It's probably a good idea to swapgs anyway in order to be able to rely on a consistent GS segment for per-CPU data in your kernel.

The RSP0 entry is not automatically set in the TSS, right? So I'll have to save it manually each time I want to leave privilege level 0, right?

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

MichaelPetch wrote:

Octocontrabass wrote:

You can have a function like this, but if you want it to actually work, you need to tell GCC how to pass parameters and accept return values. For example:

Code:

void *Syscall( uint64_t iCall, void *OtherArguments)
{
    void *retval;
    asm volatile( "syscall" : "=a"(retval) : "D"(iCall), "S"(OtherArguments) : "memory" );
    return retval;
}

At the very least you are going to want to add R11 and RCX to the clobbers since the SYSCALL instruction itself will overwrite those registers.

Is that right then?

Code:

void *Syscall( uint64_t iCall, void *OtherArguments)
{
    void *retval;
    asm volatile( "syscall" : "=a"(retval) : "D"(iCall), "S"(OtherArguments) : "memory", "%rcx", "%r11" );
    return retval;
}

MichaelPetch · **Joined:** Fri Aug 26, 2016 1:41 pm **Posts:** 692

DevNoteHQ wrote:

Is that right then?

Code:

void *Syscall( uint64_t iCall, void *OtherArguments)
{
    void *retval;
    asm volatile( "syscall" : "=a"(retval) : "D"(iCall), "S"(OtherArguments) : "memory", "%rcx", "%r11" );
    return retval;
}

That looks okay, but it also assumes that your SYSCALL code in the kernel doesn't clobber any registers besides RAX. If your SYSCALL code preserves all the registers it uses, then yes, that code looks okay.
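If the handler does not preserve everything, the wrapper has to say so. Here is a minimal sketch, assuming (this is not something stated about the kernel in this thread) that the handler behaves like an ordinary System V function and may overwrite all caller-saved integer registers. The name SyscallClobbering is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The register constraints below only make sense on x86_64. */
static_assert(sizeof(void *) == 8, "x86_64 only");

/* Hypothetical variant of the wrapper above for a handler that does NOT
   preserve the caller-saved registers (an assumption for illustration). */
static inline void *SyscallClobbering(uint64_t iCall, void *OtherArguments)
{
    void *retval;
    /* A clobber may not name a register that is also an operand, so RDI and
       RSI are declared read-write ("+") to tell GCC the handler may change
       them. RCX and R11 are overwritten by the SYSCALL instruction itself. */
    __asm__ volatile("syscall"
                     : "=a"(retval), "+D"(iCall), "+S"(OtherArguments)
                     :
                     : "rcx", "r11", "rdx", "r8", "r9", "r10", "memory");
    return retval;
}
```

Whether this or the shorter clobber list is right depends entirely on what the kernel's entry code saves and restores, so the two have to be kept in sync.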

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

MichaelPetch wrote:

That looks okay, but it also assumes that your SYSCALL code in the kernel doesn't clobber any registers besides RAX. If your SYSCALL code preserves all the registers it uses, then yes, that code looks okay.

Okay, thanks for the help!

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

DevNoteHQ wrote:

Korona wrote:

MSR_GS_BASE is the GS base address. swapgs swaps MSR_GS_BASE and MSR_KERNEL_GS_BASE. Both registers need to be set up manually.

There is no instruction to access the kernel stack base from the TSS. swapgs is meant to solve exactly that problem. Use swapgs to load a known GS base and then load the kernel RSP from the GS segment.

All this applies only to syscall. Interrupts automatically load RSP from the TSS. It's probably a good idea to swapgs anyway in order to be able to rely on a consistent GS segment for per-CPU data in your kernel.

The RSP0 entry is not automatically set in the TSS, right? So I'll have to save it manually each time I want to leave privilege level 0, right?

Or would it be better to create a separate stack for interrupts/syscalls? Because the interrupt/syscall should remove everything from the stack anyway before iret/sysret...

Brendan · **Posted:** Wed Oct 25, 2017 12:20 am

Hi,

DevNoteHQ wrote:

The RSP0 entry is not automatically set in the TSS, right? So I'll have to save it manually each time I want to leave privilege level 0, right?

Or would it be better to create a separate stack for interrupts/syscalls? Because the interrupt/syscall should remove everything from the stack anyway before iret/sysret...

It depends on how you want to do kernel stacks.

The most common way is to have a kernel stack for each (user) thread. In this case you set the RSP0 field during task switches - e.g. if CPU #5 switches to thread number #6, you put the "address of kernel stack" value for thread number #6 into the RSP0 field of the TSS that CPU #5 is using. During task switches you'd also change RSP to the thread's "current kernel stack top" (which is different to the "address of kernel stack"). Later (possibly long after the task switch was done), if/when you return from the kernel back to user-space, you switch from the thread's kernel stack to the thread's user stack; and when an IRQ occurs, the CPU switches from the thread's user stack to the thread's kernel stack. Unfortunately, for SYSCALL this switching between the thread's user stack and the thread's kernel stack is something you have to take care of yourself (SYSCALL and SYSRET are the only cases where the CPU doesn't take care of stack switching during a privilege level change).

The other alternative (which is less common and doesn't make much sense for monolithic kernels) is "kernel stack per CPU". In this case you set the RSP0 field in the CPU's TSS once during boot and never have to change it again; but task switching ends up being very different (the thread's state is saved when switching from user to kernel for any reason, and loaded when switching from kernel back to user for any reason) and there are usually a few other differences (e.g. something to manage what the kernel does when, because the kernel isn't pre-emptable and can't just switch to other tasks running other kernel code whenever it feels like).
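For the "kernel stack per thread" case, the RSP0 update during a task switch could look roughly like the sketch below. The TSS layout is the architectural 64-bit one; struct thread, its field names and switch_kernel_stack() are hypothetical and only illustrate the two different stack values mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS as laid out by the architecture (104 bytes, no padding). */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp0;          /* stack the CPU loads on a CPL3 -> CPL0 interrupt */
    uint64_t rsp1;
    uint64_t rsp2;
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;
} __attribute__((packed));

static_assert(offsetof(struct tss64, rsp0) == 4, "RSP0 is at byte offset 4");
static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");

/* Hypothetical per-thread bookkeeping (names are illustrative). */
struct thread {
    uint64_t kernel_stack_base; /* "address of kernel stack": goes into RSP0 */
    uint64_t saved_kernel_rsp;  /* "current kernel stack top": where this
                                   thread's kernel code last left off */
};

/* Called by the scheduler on the CPU that owns `tss`. Stores the next
   thread's kernel stack in RSP0 and returns the RSP value the low-level
   switch code should load. */
static inline uint64_t switch_kernel_stack(struct tss64 *tss,
                                           const struct thread *next)
{
    tss->rsp0 = next->kernel_stack_base;
    return next->saved_kernel_rsp;
}
```

In the "kernel stack per CPU" case the same structure applies, except that rsp0 would be written once at boot and never touched again.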

Cheers,

Brendan

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Brendan wrote:

Hi,

It depends on how you want to do kernel stacks.

[snip]

Thanks!
Can you tell me what to do when the stack is full? Do I just use stack traces to detect if the stack is full and create a new one somewhere?
And are heap and stack for a process/thread usually created by the kernel? Because the kernel reserves some space for the stack itself. And is the heap for the kernel initialized the same way as the stack for the kernel (="resb HEAP_SIZE")? Or do you usually initialize the heap on top of the stack (=on top of the kernel.elf) when the kernel is already running?

And what use could swapgs have? The only usage of the GS register that I saw was to save CPU-specific information (TSS address, current thread, ...) in the kernel-mode register. What could I put into the user-mode register? Or is the user-mode GS usually empty to prevent processes from reading CPU-specific information?

Brendan · **Posted:** Fri Oct 27, 2017 12:36 am

Hi,

DevNoteHQ wrote:

Brendan wrote:

It depends on how you want to do kernel stacks.

Thanks!
Can you tell me what to do when the stack is full?

Most people assume that if a kernel stack becomes full then the kernel has bugs (e.g. got stuck in an "infinite recursion" loop), so they do some kind of "kernel panic" and halt the machine. In this case you'd have to (try to) make sure kernel stacks are large enough to begin with.

Unfortunately "how large is large enough?" can be hard to determine; especially if you support nested IRQs, and especially if it's a monolithic kernel where third-party device drivers can install their own IRQ handlers. Note that some OSs have special/extra "interrupt handler stacks" so that the worst case that normal kernel stacks have to handle is much smaller (which can save a lot of RAM if you're using a kernel stack for each user thread, and also makes it easier to figure out how large a kernel stack needs to be).
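One way to get a rough answer to "how large is large enough?" is to measure it. The sketch below is a common technique rather than anything specific to this thread: paint each new kernel stack with a sentinel byte and later count how much of the paint is still intact. The names stack_paint, stack_high_water and the sentinel value are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u    /* arbitrary sentinel byte (assumption) */

/* Fill a freshly allocated kernel stack with the sentinel before first use. */
static inline void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    for (size_t i = 0; i < size; i++)
        stack[i] = STACK_FILL;
}

/* The stack grows down from stack + size, so untouched sentinel bytes sit at
   the low end. Returns the largest number of bytes that appear to have been
   used so far. */
static inline size_t stack_high_water(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The result is only an estimate (a value equal to the sentinel written at the deepest point is not counted), and a result equal to the full size suggests the stack may already have overflowed.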

DevNoteHQ wrote:

Do I just use stack traces to detect if the stack is full and create a new one somewhere?

"Dynamically growing kernel stacks" ends up being excessively complicated and/or error prone.

DevNoteHQ wrote:

And are heap and stack for a process/thread usually created by the kernel? Because the kernel reserves some space for the stack itself. And is the heap for the kernel initialized the same way as the stack for the kernel (="resb HEAP_SIZE")? Or do you usually initialize the heap on top of the stack (=on top of the kernel.elf) when the kernel is already running?

For processes, my kernel gives the process a virtual address space and lets the process (or the run-time for whatever language that process was written in) do whatever it feels like with the virtual address space it was given. My kernel only provides a way for the process to set/change the "virtual area type" for areas of the process' virtual address space (e.g. if a process tells my kernel it wants the area from 0x00000000 to 0x12345000 in its virtual address space to be changed to the "not used" virtual area type, then my kernel does what it's told).

For kernels, I'd start by drawing a map of how you feel like using kernel space - maybe one area for kernel code, maybe one area for the "recursive page table mapping" trick, maybe one area set aside for memory mapped devices, etc. If the kernel has one or more heaps, then you'd add space for those too. How you tell the compiler about this memory map is up to you - I'm lazy so I typically just use the preprocessor (e.g. "#define MESSAGE_QUEUE_AREA_ADDRESS 0xD800000").
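As a sketch of the preprocessor approach, a map with a few compile-time sanity checks might look like this. Every area name, address and size below is made up for illustration and is not taken from any real kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel-space layout (all values are illustrative). */
#define KERNEL_CODE_AREA_ADDRESS  0xFFFFFFFF80000000ull
#define KERNEL_CODE_AREA_SIZE     0x0000000000400000ull   /*  4 MiB */
#define KERNEL_HEAP_AREA_ADDRESS  0xFFFFFFFF80400000ull
#define KERNEL_HEAP_AREA_SIZE     0x0000000001000000ull   /* 16 MiB */
#define MMIO_AREA_ADDRESS         0xFFFFFFFF81400000ull
#define MMIO_AREA_SIZE            0x0000000000800000ull   /*  8 MiB */

/* First address past an area, built from its _ADDRESS and _SIZE macros. */
#define AREA_END(name) (name##_ADDRESS + name##_SIZE)

/* Catch overlapping or misaligned areas at compile time. */
static_assert(AREA_END(KERNEL_CODE_AREA) <= KERNEL_HEAP_AREA_ADDRESS,
              "kernel code overlaps heap");
static_assert(AREA_END(KERNEL_HEAP_AREA) <= MMIO_AREA_ADDRESS,
              "kernel heap overlaps MMIO area");
static_assert((KERNEL_HEAP_AREA_ADDRESS & 0xFFF) == 0,
              "heap must start on a 4 KiB page boundary");
```

The checks cost nothing at run time and fail the build if someone later resizes one area into its neighbour.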

DevNoteHQ wrote:

And what use could swapgs have? The only usage of the GS register that I saw was to save CPU-specific information (TSS address, current thread, ...) in the kernel-mode register. What could I put into the user-mode register? Or is the user-mode GS usually empty to prevent processes from reading CPU-specific information?

Usually a kernel has various pieces of code that want to find CPU specific information (e.g. which task is the current CPU running, how long until the current CPU needs to do a task switch, how much load is the current CPU under, is the current CPU in a special state, etc). To find this information quickly on 80x86 a lot of kernels use a segment register, where the segment's base points to that CPU's information. Unfortunately, nothing prevents user-space code from changing the segment register that the kernel uses so the kernel has to guard against this possibility. SWAPGS is what AMD provided for this purpose - so that the kernel can use SWAPGS to make sure GS is set correctly (e.g. just after CPU switches from CPL=3 to CPL=0), then use GS to find CPU specific information.

User code shouldn't have any reason to touch (or modify) GS, and shouldn't be able to access the kernel's data (including the kernel's CPU specific data that GS is used for). For thread-local storage, threads can use a different segment register (e.g. FS).
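To make the per-CPU idea concrete, here is a minimal sketch of a block that the kernel's GS base could point at, with offsets an assembly SYSCALL entry stub could use. The structure, its field names and per_cpu_init() are hypothetical, and the sketch assumes 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU block; one instance per CPU (layout is illustrative). */
struct per_cpu {
    struct per_cpu *self;       /* lets C code turn gs:0 into a normal pointer */
    uint64_t kernel_rsp;        /* kernel stack to load on SYSCALL entry */
    uint64_t user_rsp_scratch;  /* where the entry stub parks the user RSP */
    uint32_t cpu_id;
};

/* Offsets for the assembly entry stub, which might begin (AT&T syntax):
       swapgs
       movq %rsp, %gs:PERCPU_USER_RSP
       movq %gs:PERCPU_KERNEL_RSP, %rsp                                   */
#define PERCPU_SELF        0
#define PERCPU_KERNEL_RSP  8
#define PERCPU_USER_RSP    16

static_assert(offsetof(struct per_cpu, self) == PERCPU_SELF, "offset drift");
static_assert(offsetof(struct per_cpu, kernel_rsp) == PERCPU_KERNEL_RSP, "offset drift");
static_assert(offsetof(struct per_cpu, user_rsp_scratch) == PERCPU_USER_RSP, "offset drift");

/* Fill in one CPU's block; boot code would then write its address to the
   kernel GS base MSR so that swapgs makes it visible. */
static inline void per_cpu_init(struct per_cpu *pc, uint32_t id, uint64_t kstack)
{
    pc->self = pc;
    pc->kernel_rsp = kstack;
    pc->user_rsp_scratch = 0;
    pc->cpu_id = id;
}
```

The static assertions keep the C structure and the hand-written offsets used by the assembly stub from silently drifting apart.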

Cheers,

Brendan

DevNoteHQ · **Joined:** Mon May 15, 2017 11:04 am **Posts:** 50

Thanks a lot!
