OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: The cost of a system call
PostPosted: Fri May 04, 2012 7:42 pm 

Joined: Thu Mar 25, 2010 11:26 pm
Posts: 1801
Location: Melbourne, Australia
I've been doing some testing this morning and thought someone might be interested in the results. My machine is a Core2 Quad at 2.83 GHz. I have a 'null' system call that I used to do the tests in long mode. It does the following.

i) begins in the C library and passes 3 parameters to the kernel
ii) enters the kernel via syscall or int $0x20
iii) kernel saves all 16 GP regs
iv) uses swapgs to retrieve kernel stack, core id etc.
v) switches to kernel stack
vi) moves into kernel C code through my kernel system call mechanism and increments a counter
vii) reverses the above and returns to user mode.

The scheduler is not called.

Average times are as follows

Code:
int $0x20/iretq  - 587 ns
syscall/sysret   - 449 ns

int $0x20/sysret - 506 ns
syscall/iretq    - 530 ns

What this means is that per call -
Code:
syscall instead of int $0x20 saves 57 ns
sysret instead of iretq saves      81 ns
total saving                      138 ns
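
The savings follow directly from the four averages. A minimal C sketch of the arithmetic, which also converts a time to approximate clock cycles at the stated 2.83 GHz (the names are illustrative and are not from the original test code):

```c
/* Measured averages from the table above, in nanoseconds. */
enum {
    INT_IRETQ      = 587,
    SYSCALL_SYSRET = 449,
    INT_SYSRET     = 506,
    SYSCALL_IRETQ  = 530
};

/* Saving from entering with syscall instead of int $0x20 (return via iretq in both cases). */
int entry_saving_ns(void) { return INT_IRETQ - SYSCALL_IRETQ; }

/* Saving from returning with sysret instead of iretq (entry via int $0x20 in both cases). */
int exit_saving_ns(void) { return INT_IRETQ - INT_SYSRET; }

/* Approximate cycle count for a duration, assuming a constant clock given in GHz. */
double ns_to_cycles(double ns, double ghz) { return ns * ghz; }
```

The two savings add up to the full difference between the int $0x20/iretq and syscall/sysret rows, and 449 ns at 2.83 GHz is roughly 1270 cycles.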


The other thing that this shows is that you can mix and match between the 2 mechanisms. You can use sysret to return from hardware interrupts to ring3. I also tested using syscall from ring 1 and that works fine as long as you use iretq to return.

_________________
If a trainstation is where trains stop, what is a workstation ?


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 4:38 am 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
Here are my results, tested in QEMU.
The OS has launched only a single process (kthreads run at idle priority so they will not be switched in, but the PIC timer still fires, which should have very little effect).
The kernel and application are compiled with -O2.
Note that my syscall handler preserves only the registers required by the AMD64 ABI, not all 16 like yours.

Code:
PID[1]: Hello from user application
Call    Start : 4378D42C
          End : 4381FCF7
      Elapsed : 928CB (600267 cycles)
      Average : 3C (60 cycles)
Syscall Start : 4381FDF4
          End : 43CFDD63
      Elapsed : 4DDF6F (5103471 cycles)
      Average : 1FE (510 cycles)


Related code:
Code:
    // test call
    __asm volatile ("rdtsc; rdtsc\n" : "=a"(cstart_lo), "=d"(cstart_hi));
    for ( int i=0; i<10000; i++ ) {
        call_null();
    }
    __asm volatile ("rdtsc\n" : "=a"(cend_lo), "=d"(cend_hi));

    // test syscall
    __asm volatile ("rdtsc; rdtsc\n" : "=a"(start_lo), "=d"(start_hi));
    for ( int i=0; i<10000; i++ ) {
        syscall_null();
    }
    __asm volatile ("rdtsc\n" : "=a"(end_lo), "=d"(end_hi));
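
The Elapsed and Average figures in the output above can be reproduced from the raw counter values. A small C sketch, assuming the printed Start and End values are the low halves of the counter and that the loop runs 10000 times as in the code above (the helper names are made up for illustration):

```c
#include <stdint.h>

/* Combine the EDX:EAX halves returned by rdtsc into one 64-bit count. */
uint64_t tsc_join(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | lo;
}

/* Average cycles per iteration, using integer division like the printed output. */
uint64_t tsc_average(uint64_t start, uint64_t end, uint64_t iterations) {
    return (end - start) / iterations;
}
```

For example, tsc_average(0x4378D42C, 0x4381FCF7, 10000) gives the 60 cycles reported for the plain call.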


userland interface:
Code:
call_null:
    ret
syscall_null:
    xor     eax, eax
    syscall
    ret


syscall in kernel:
Code:
; Max 5 parameters: rdi rsi rdx r9 r8
_syscall_stub:
    cmp     rsp, APPADDR_PROCESS_STACK
    jae     .fault

    push    rcx                         ; ring3 rip
    push    r11                         ; rflags

    mov     r11, qword syscall_table
    mov     rcx, r9                     ; 4th parameter
   
    cmp     rax, 12                     ; compare all 64 bits: rax indexes the table below
    jbe     .1
    mov     edi, eax
    xor     eax, eax
.1:
   
    call    qword [r11+rax*8]

    pop     r11
    pop     rcx
    db 0x48 ; REX.w
    sysret


The null call is a C function containing just return 0;
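
The dispatch logic of the stub can be modelled in C. This is only a sketch: the handler names, the five-argument signature and the last_null_arg variable are invented for illustration, and the empty-slot check is an extra precaution that the assembly stub does not have.

```c
#include <stdint.h>

#define SYSCALL_MAX 12  /* highest valid table index, as in the stub's bounds check */

typedef long (*syscall_fn)(long a0, long a1, long a2, long a3, long a4);

long last_null_arg;  /* records the first argument seen by entry 0 (illustration only) */

/* Entry 0: the null call. It just returns 0. */
long sys_null(long a0, long a1, long a2, long a3, long a4) {
    (void)a1; (void)a2; (void)a3; (void)a4;
    last_null_arg = a0;
    return 0;
}

/* Entry 1: a hypothetical call that returns its first argument. */
long sys_echo(long a0, long a1, long a2, long a3, long a4) {
    (void)a1; (void)a2; (void)a3; (void)a4;
    return a0;
}

/* Remaining slots are left empty (null) in this sketch. */
syscall_fn syscall_table[SYSCALL_MAX + 1] = { sys_null, sys_echo };

/* Out-of-range numbers are redirected to entry 0, with the offending
   number passed as the first argument, mirroring the stub. */
long syscall_dispatch(uint64_t nr, long a0, long a1, long a2, long a3, long a4) {
    if (nr > SYSCALL_MAX || syscall_table[nr] == 0) {
        a0 = (long)nr;
        nr = 0;
    }
    return syscall_table[nr](a0, a1, a2, a3, a4);
}
```

Taking the number as a full 64-bit value matters here: the index register is 64 bits wide, so a check on only the low 32 bits would let a large value through.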


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 5:29 am 

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 2812
gerryg400 wrote:
i) begins in the C library and passes 3 parameters to the kernel
ii) enters the kernel via syscall or int $0x20
iii) kernel saves all 16 GP regs
iv) uses swapgs to retrieve kernel stack, core id etc.
v) switches to kernel stack
vi) moves into kernel C code through my kernel system call mechanism and increments a counter
vii) reverses the above and returns to user mode.

The scheduler is not called.

Average times are as follows

Code:
int $0x20/iretq  - 587 ns
syscall/sysret   - 449 ns

int $0x20/sysret - 506 ns
syscall/iretq    - 530 ns



This is a lot more than what I measured on this processor in 32-bit mode with call gates in RDOS.

Results from the other thread:

near: 51.6 million calls per second
gate: 13.4 million calls per second
sysenter: 10.5 million calls per second

The call gate version takes about 75ns. I haven't measured the sysenter/sysexit version yet (edit: just below 100ns, and thus slower). And this is the full overhead as there is nothing else involved in calling kernel functions (other than loading the appropriate registers for the call in case the function has parameters). At the kernel side, no registers are saved unless they are used.
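
The per-call times quoted here follow from the throughput figures. A one-line C sketch of the conversion (the function name is illustrative):

```c
/* Convert a throughput in millions of calls per second to nanoseconds per call. */
double ns_per_call(double million_calls_per_sec) {
    return 1000.0 / million_calls_per_sec;
}
```

With the numbers above, 13.4 million calls per second works out to about 74.6 ns per call, 10.5 million to about 95.2 ns, and 51.6 million to about 19.4 ns.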


Last edited by rdos on Sat May 05, 2012 6:22 am, edited 3 times in total.

 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 5:35 am 

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 2812
bluemoon wrote:
Here are my results, tested in QEMU.


It's not reliable to use QEMU for these kinds of performance tests. You must do them on real hardware.

Additionally, you should not use idealised code, but you should rather compile it and validate syscalls like you would do in a production release of your OS/application.


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 5:40 am 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
I only started porting my OS to 64-bit last week and it finally worked yesterday, so for now all I can do is run it in QEMU. I'll certainly try it on real hardware someday.
The numbers are for reference only; don't take them too seriously.


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 11:06 am 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
Oops, that was a mistake; the timing code should be:
Code:
__asm volatile ("xor eax, eax; cpuid; rdtsc" : "=a"(cstart_lo), "=d"(cstart_hi) :: "ebx", "ecx");

The idea is to make sure rdtsc is not executed out of order.

The results then become:
Code:
  PID[1]: Hello from user application
Call    Start : 45409DEE
          End : 48B61737
      Elapsed : 3757949 (58030409 cycles)
      Average : 3A (58 cycles)
Syscall Start : 48B618EA
          End : 61EB4055
      Elapsed : 1935276B (422913899 cycles)
      Average : 1A6 (422 cycles)
  PID[1]: Hello again! counter=0
  PID[1]: Hello again! counter=1
  PID[1]: Hello again! counter=2
  PID[1]: Hello again! counter=3
  KMAIN : Clean zombie process: FFFFFFFF:8012A500


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 11:43 am 

Joined: Tue Feb 08, 2011 1:58 pm
Posts: 496
gerryg400 wrote:
iii) kernel saves all 16 GP regs

Good testing, but I think your results were influenced by this. One of the advantages of using syscall is that there is no need to save all the registers. You only have to save rcx and r11, which gives a considerable performance boost.

Here's how I do it. KMEM_userspace points to the current TCB, which happens to be the TSS as well. This is the prologue:
Code:
      cli
if INTSYSCALL
      clsavectx
      sub         qword [MEM_userspace+24h], KERNELSTACKSIZE
else
      mov         qword [MEM_userspace+tcb.userrip], rcx
      pushf
      pop         qword [MEM_userspace+tcb.userflags]
      //restore previous r11 from local variables stack (pushed on caller side)
      mov         r11, qword [r15]
end if
      //bound check
      cmp         qword [MEM_userspace+24h], MEM_userspace+tcb.acl_end
      jb         @f
      sti
@@:

And this is the epilogue:
Code:
if INTSYSCALL
      clloadctx
      iretq
else
      mov         r11, qword [MEM_userspace+tcb.userflags]
      mov         rcx, qword [MEM_userspace+tcb.userrip]
      //force interrupt enable
      bts         r11, 9
      sysretq
end if

Maybe yield is interesting too:
Code:
if INTSYSCALL
      //no clloadctx, we want registers changed
      add         rsp, 16*8
      add         qword [MEM_userspace+24h], KERNELSTACKSIZE
      clsavectx
      //switch page tables and refresh cr3
      call         sys.arch.thread.thswitch
      clloadctx
else
      int         SCHEDTMR_INT
end if

Hope it was useful.


 
 Post subject: Re: The cost of a system call
PostPosted: Sat May 05, 2012 2:46 pm 

Joined: Thu Mar 25, 2010 11:26 pm
Posts: 1801
Location: Melbourne, Australia
I understand the comments, but the purpose of the test was to compare "syscall/sysret" to "int/iretq". Most documentation tells us that the former pair is "4 times quicker" (or something similar) than the latter. I've always felt that this is a useless way of comparing the instructions unless you know how much the rest of the system call costs.
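
To put a number on "the rest of the system call", here is a rough C sketch of the estimate. It takes the loosely quoted "4 times quicker" ratio at face value, which is purely an assumption, and combines it with the measured totals (587 ns and 449 ns); the function names are illustrative.

```c
/* Estimated cost of the faster instruction pair, given the two measured
   totals and an assumed speed ratio between the slow and fast pairs. */
double fast_pair_ns(double slow_total, double fast_total, double ratio) {
    return (slow_total - fast_total) / (ratio - 1.0);
}

/* Estimated cost of everything else in the call (work common to both paths). */
double common_work_ns(double slow_total, double fast_total, double ratio) {
    return fast_total - fast_pair_ns(slow_total, fast_total, ratio);
}
```

Under that assumed ratio, syscall/sysret itself would account for about 46 ns and the remaining work for about 403 ns, so the call as a whole is only about 1.3 times quicker (587/449), not 4 times.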

As turdus points out, there is another saving with syscall/sysret (i.e. not really needing to save the GP regs on a syscall). But surely this is true for the int/iretq situation as well?

 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 9:16 am 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

gerryg400 wrote:
I understand the comments, but the purpose of the test was to compare "syscall/sysret" to "int/iretq". Most documentation tells us that the former pair is "4 times quicker" (or something similar) than the latter. I've always felt that this is a useless way of comparing the instructions unless you know how much the rest of the system call costs.


I've always thought that, because syscall/sysret doesn't do some things that are likely to be necessary (like switching ESP to a kernel stack), it isn't directly comparable to software interrupts or call gates (or SYSENTER) because a kernel typically needs to add more instructions to a syscall handler that wouldn't have been necessary for software interrupts or call gates.

For an extreme example; because ESP/RSP isn't switched and the CPU doesn't push anything on the stack while at CPL=3, user space code could do "mov rsp, SOMEWHERE_IN_KERNEL_SPACE" and then "SYSCALL" and trick the kernel into trashing itself or modifying kernel data. To guard against that, the kernel has to save RSP somewhere and load RSP with a "known good" value before anything is pushed on the stack (either by the SYSCALL handler itself or by the CPU if an NMI or machine check exception occurs).

Note: To be honest, I'm not even sure if it's possible to use SYSCALL in a "guaranteed 100.0000% safe" way (as you can't prevent NMI or machine check before the SYSCALL handler switches to a safe stack, and task switching and IST fails for nesting).

For worst case, you'd need to deal with malicious user space code that does something like this:

Code:
    mov eax,0
    mov ds,eax
    mov es,eax
    mov fs,eax
    mov gs,eax
    mov esp,SOMEWHERE_IN_KERNEL_SPACE
    syscall


gerryg400 wrote:
As turdus points out, there is another saving with syscall/sysret (i.e. not really needing to save the GP regs on a syscall). But surely this is true for the int/iretq situation as well?


Yes.

For a fair comparison that isn't overly affected by OS design, you'd want to compare:
  • software interrupt with nothing more than IRET
  • call gate with nothing more than RETF
  • SYSENTER with nothing more than SYSEXIT
  • the minimum safe SYSCALL handler
The result won't be entirely OS neutral as different OSs will take different approaches for the minimum safe SYSCALL handler (e.g. the "mov esp, *something*" part).

I'd also suggest that the caller's code size also be taken into account. SYSCALL and "INT n" both cost 2 bytes. For SYSENTER, the caller needs to store "return EIP" and "return ESP" somewhere (likely EDX and ECX), so even though SYSENTER is only 2 bytes itself it's probably going to cost 6 or more bytes. For 32-bit code, call gates are going to cost a minimum of 6 bytes (using a 16-bit operand-size override prefix to avoid the full 32-bit offset that's ignored anyway).
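
To make the byte counts concrete, here is a sketch of the raw encodings as C byte arrays. The vector 0x20, the zero offset and the selector 0x0008 are placeholders, and the three-instruction SYSENTER sequence is just one plausible caller, an assumption rather than anything prescribed in this thread. The short far call uses the 0x66 operand-size prefix, which is what selects the 16-bit offset form in 32-bit code.

```c
#include <stdint.h>

/* syscall */
const uint8_t enc_syscall[] = { 0x0F, 0x05 };

/* int 0x20 (the vector is only an example) */
const uint8_t enc_int[] = { 0xCD, 0x20 };

/* sysenter on its own */
const uint8_t enc_sysenter[] = { 0x0F, 0x34 };

/* One possible SYSENTER caller: mov ecx, esp / mov edx, imm32 / sysenter.
   The imm32 (return EIP) is left as zero bytes here. */
const uint8_t enc_sysenter_seq[] = {
    0x89, 0xE1,                    /* mov ecx, esp   */
    0xBA, 0x00, 0x00, 0x00, 0x00,  /* mov edx, imm32 */
    0x0F, 0x34                     /* sysenter       */
};

/* Far call through a call gate from 32-bit code, shortened with the 0x66
   prefix: opcode 0x9A, 16-bit offset (ignored for a gate), 16-bit selector. */
const uint8_t enc_callgate[] = { 0x66, 0x9A, 0x00, 0x00, 0x08, 0x00 };
```

With these encodings, sizeof reports 2 bytes for SYSCALL and INT n, 6 bytes for the call gate, and 9 bytes for this particular SYSENTER sequence.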

I'd expect that SYSENTER would end up being the winner for performance (for frequently executed pieces of code), and SYSCALL and software interrupts would tie for code size (for infrequently used code).


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 10:00 am 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
Brendan wrote:
For worst case, you'd need to deal with malicious user space code that does something like this:

Code:
    mov eax,0
    mov ds,eax
    mov es,eax
    mov fs,eax
    mov gs,eax
    mov esp,SOMEWHERE_IN_KERNEL_SPACE
    syscall


If I understand correctly, in long mode (which the syscall instruction requires) ds and es are practically ignored. I do the above in my code and it affects nothing.
I still need the cmp rsp, APPADDR_PROCESS_STACK check, where APPADDR_PROCESS_STACK bounds the application's legal address range, plus a check that there is enough room, and then either fail the syscall or abort the process. The syscall handler can reuse the application's user stack just fine, as long as it makes sure not to leave sensitive data there; when it has to handle such data it can still switch stacks at that point.


 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 11:25 am 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

bluemoon wrote:
Brendan wrote:
For worst case, you'd need to deal with malicious user space code that does something like this:

Code:
    mov eax,0
    mov ds,eax
    mov es,eax
    mov fs,eax
    mov gs,eax
    mov esp,SOMEWHERE_IN_KERNEL_SPACE
    syscall


If I understand correctly, in long mode (which the syscall instruction requires) ds and es are practically ignored. I do the above in my code and it affects nothing.


The example was for a 32-bit protected mode kernel (otherwise it would've been "mov rsp,SOMEWHERE_IN_KERNEL_SPACE" ;-) ).

bluemoon wrote:
I still need the cmp rsp, APPADDR_PROCESS_STACK check, where APPADDR_PROCESS_STACK bounds the application's legal address range, plus a check that there is enough room, and then either fail the syscall or abort the process. The syscall handler can reuse the application's user stack just fine, as long as it makes sure not to leave sensitive data there; when it has to handle such data it can still switch stacks at that point.


If the syscall handler reuses the application's user stack, be very careful with your page fault handler. If the syscall handler's RSP (inherited from user space) ends up pointing to a "not present" page (either because that's where the caller left it, or because the kernel pushed enough on the stack to cross from a present page into a not present page), then the CPU won't try to switch to a different stack when trying to start the page fault exception handler (no privilege level transition) and will generate a double fault. To avoid that you'd probably need to use IST for the page fault handler (and ensure that page faults never nest), or use IST for the double fault handler.
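
For reference, the IST index lives in the low three bits of the fifth byte of a 64-bit IDT gate. A minimal C sketch of building such a gate follows; the function name is made up, and the selector and IST index in the usage note are assumed values that depend on the kernel's own GDT and TSS setup.

```c
#include <stdint.h>

/* Layout of a 64-bit IDT gate descriptor (16 bytes, naturally aligned). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0..2: IST index, 0 means no IST stack switch */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;     /* must be zero */
};

/* Build a present, DPL 0 interrupt gate that switches to the given IST stack. */
struct idt_gate64 make_interrupt_gate(uint64_t handler, uint16_t selector,
                                      uint8_t ist_index) {
    struct idt_gate64 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = (uint8_t)(ist_index & 0x7);
    g.type_attr   = 0x8E;  /* P=1, DPL=0, type 0xE (64-bit interrupt gate) */
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

A gate for vector 14 (#PF) or vector 8 (#DF) would then be built with something like make_interrupt_gate(handler_address, 0x08, 1), provided IST slot 1 in the TSS points at a valid stack.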

Also, "cmp rsp, APPADDR_PROCESS_STACK" isn't enough. Consider:

Code:
    mov rsp,0x00000008
    syscall
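
A check that rejects this case needs a lower bound and a minimum amount of room as well as the upper bound. A sketch in C, where the parameter names and the idea of passing the bounds explicitly are assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if rsp lies within [stack_low, stack_high] and at least `needed`
   bytes are available below it before reaching stack_low. */
bool user_stack_ok(uint64_t rsp, uint64_t stack_low, uint64_t stack_high,
                   uint64_t needed) {
    if (rsp < stack_low || rsp > stack_high)
        return false;                    /* outside the process stack region */
    return (rsp - stack_low) >= needed;  /* enough room for the handler's pushes */
}
```

With a stack region of 0x70000000 to 0x70010000, an rsp of 0x00000008 fails the lower-bound test even though it passes a check against the upper bound alone.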


Cheers,

Brendan

 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 11:31 am 

Joined: Wed Dec 01, 2010 3:41 am
Posts: 1761
Location: Hong Kong
Brendan wrote:
The example was for a 32-bit protected mode kernel (otherwise it would've been "mov rsp,SOMEWHERE_IN_KERNEL_SPACE" ;-) ).

According to the Intel manual, syscall in 32-bit or compatibility mode triggers #UD.

Brendan wrote:
Also, "cmp rsp, APPADDR_PROCESS_STACK" isn't enough. Consider:
Code:
    mov rsp,0x00000008
    syscall


That's why I said "is the application legal address range, and have enough room".

edit: I did an experiment with this:
Code:
syscall_null:
    xor     eax, eax
    mov     ds, ax
    mov     es, ax
    mov     fs, ax
    mov     gs, ax
    mov     rbx, rsp
    mov     rsp,0x00000008
    syscall
    mov     rsp, rbx
    ret


This is caught by a #PF within the syscall handler, which gives me a chance to terminate the abnormal process.
Code:
  INT0E : #PF Page Fault Exception. RIP:FFFFFFFF:80104AE9 CODE:2 ADDR:00000000:00000000
        : PML4[0] PDPT[0] PD[0] PT[0]
    #PF : Access to unallocated memory. CODE: 2
        : ADDR: 00000000:00000000 PTE[0]: 00000000:00000000


By the way, you are correct on the #PF issue which I overlooked.


 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 11:59 am 

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 2812
Brendan wrote:
Note: To be honest, I'm not even sure if it's possible to use SYSCALL in a "guaranteed 100.0000% safe" way (as you can't prevent NMI or machine check before the SYSCALL handler switches to a safe stack, and task switching and IST fails for nesting).


I don't remember the parameters for SYSCALL, but at least for SYSENTER it is possible to make 100% certain that an application can neither modify kernel data nor cause a malfunction through an invalid kernel stack.

I do it like this:

1. Kernel ESP MSR is loaded with the current thread stack offset from TSS (by taking base + size of SS0) whenever a new thread is scheduled. This takes care of the nesting issue as ESP is not loaded manually in kernel.

2. When using the stack reference from the application, address it with the ds or es register, and let ds and es for applications only cover application address space. This will make the sysenter entry-point code protection fault if a stack reference to kernel space is provided. In long mode, this doesn't work (limits are not used), and so the pointer needs to be checked with software.


 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 2:01 pm 

Joined: Tue Oct 17, 2006 9:29 pm
Posts: 2426
Location: Canada
bluemoon wrote:
According to the Intel manual, syscall in 32-bit or compatibility mode triggers #UD.

SYSCALL/SYSRET are from AMD, which does support them in 32-bit mode. Intel only supports them in 64-bit mode.

_________________
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.


 
 Post subject: Re: The cost of a system call
PostPosted: Sun May 06, 2012 2:25 pm 

Joined: Tue Apr 15, 2008 6:37 pm
Posts: 191
Location: Gotham, Batmanistan
Generally if you were to use SYSCALL in long mode you simply swapgs and load in a known good pointer.

Code:
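user_enter_syscall64:
     swapgs                         ; switch to the kernel GS base (per-CPU data)
     mov rax, [gs:KSTACK_OFFSET]    ; known-good kernel stack pointer
     mov [gs:USTACK_OFFSET], rsp    ; remember the user stack pointer
     mov rsp, rax                   ; now running on the kernel stack
     ...
     mov rsp, [gs:USTACK_OFFSET]    ; restore the user stack pointer
     swapgs                         ; switch back to the user GS base
     o64 sysret                     ; REX.W form, returns to 64-bit user code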


This is making the assumption you could at least clobber RAX initially as you'll probably return some value in it later. You could also do similar things for protected mode.

Code:
user_enter_syscall32:
   mov ax, PROC_SPECIFIC_DATA_SEG
   mov gs, ax
   mov eax, [gs:KSTACK_OFFSET]
   mov [ss:eax+4], esp
   mov esp, eax
   ...
   pop gs
   pop esp
   sysret


Here the user space GS value is assumed to be determinable from some other structure (thread info for example), which should work out since it's usually used for thread specific data anyways. To Brendan's point about NMI's it'd be a mess if you aren't using a task gate/IST for them. AFAIK Linux deals with NMI nesting by using task gates or ISTs and doing some extensive checking in software to determine if an NMI nested within the NMI handler code itself.
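
For the long mode sequence above, KSTACK_OFFSET and USTACK_OFFSET could be derived from a per-CPU structure that the kernel GS base points at. The structure and field names below are hypothetical, and the function is only a C model of what the first few instructions do, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU block addressed through GS after swapgs. */
struct percpu {
    uint64_t kstack_top;   /* known-good kernel stack pointer for the current thread */
    uint64_t ustack_save;  /* slot that holds the user RSP during the system call */
    uint32_t cpu_id;       /* core id */
};

#define KSTACK_OFFSET offsetof(struct percpu, kstack_top)
#define USTACK_OFFSET offsetof(struct percpu, ustack_save)

/* Model of the entry sequence: stash the user RSP and hand back the kernel
   stack pointer. The user-supplied value is stored but never used as a stack
   by the kernel. */
uint64_t enter_kernel_stack(struct percpu *cpu, uint64_t user_rsp) {
    cpu->ustack_save = user_rsp;
    return cpu->kstack_top;
}
```

The point of the design is that the kernel stack pointer comes only from kernel-owned memory, so whatever value user space leaves in RSP cannot redirect the kernel's pushes.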

_________________
Reserved for OEM use.

