Syscall types

frank · **Posted:** Wed Aug 08, 2007 11:53 am

I had never thought about the alternate calling conventions until I read the recent post on programming for Mac OS X. Now that I have seen another syscall type (passing arguments on the stack instead of in registers), I am curious about which one other people think is better. This is in no way about the implementation; I know how to do that myself. It is purely about the conventions: BSD style vs. Linux/Windows style. For those who don't know about BSD style, there is a good summary here.

jnc100 · **Posted:** Wed Aug 08, 2007 2:50 pm

Surely that relies on int not changing the stack? Either you run all user code at PL0 or you have your syscalls running at PL3? I suppose syscall/sysenter wouldn't have a problem.

Regards,
John.

frank · **Posted:** Wed Aug 08, 2007 3:06 pm

jnc100 wrote:

Surely that relies on int not changing the stack? Either you run all user code at PL0 or you have your syscalls running at PL3? I suppose syscall/sysenter wouldn't have a problem.

Regards,
John.

Well, I never thought about that part. Of course, the location of the user stack gets pushed on the kernel stack, so it shouldn't be too hard to get to the values.
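A minimal sketch of that idea in C, under some loud assumptions: user memory is simulated by a small array so the code runs as an ordinary program, addresses are byte offsets into that array, and the arguments are assumed to start exactly at the saved user ESP (a real BSD-style ABI may place an extra dword there first). The names `iret_frame`, `user_mem` and `fetch_user_arg` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What the CPU pushes on the kernel stack when an interrupt arrives
   from CPL=3 (lowest address first). */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t user_esp;  /* user stack pointer at the time of the INT */
    uint32_t user_ss;
};

/* Hypothetical stand-in for user memory, so the sketch runs in user space. */
#define USER_MEM_WORDS 64
static uint32_t user_mem[USER_MEM_WORDS];

/* Fetch the n-th stack argument (0-based) through the saved user ESP.
   Returns 0 on success, -1 if the address is unaligned or out of range. */
static int fetch_user_arg(const struct iret_frame *frame, unsigned n,
                          uint32_t *out)
{
    uint32_t addr = frame->user_esp + 4u * n;

    if (addr % 4u != 0 || addr / 4u >= USER_MEM_WORDS)
        return -1;
    *out = user_mem[addr / 4u];
    return 0;
}

/* Usage: two arguments sitting at byte offsets 16 and 20 of "user memory". */
static void example_fetch(void)
{
    struct iret_frame frame = { .user_esp = 16 };
    uint32_t value = 0;

    user_mem[4] = 7;
    user_mem[5] = 35;
    assert(fetch_user_arg(&frame, 0, &value) == 0 && value == 7);
    assert(fetch_user_arg(&frame, 1, &value) == 0 && value == 35);
}
```

The range check matters in a real kernel too: the saved ESP comes from user code, so it has to be validated before it is dereferenced.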

Brendan · **Posted:** Wed Aug 08, 2007 10:57 pm

Hi,

frank wrote:

I had never thought about the alternate calling conventions until I read the recent post on programming for Mac OS X. Now that I have seen another syscall type (passing on the stack instead of registers) I am curious about which one other people think is better.

Which calling convention is better depends on which language you're using, and what your design goals are.

For me, I know my kernel functions never need a variable number of arguments or a large number of arguments, and design my code so that people that want speed can have speed. I pass input and output parameters in registers, and guarantee that kernel functions never trash any registers (unless they're used for output parameters).

Registers are also used in order - I use the following registers:

Code:

Input Parameters:
   rax/eax = function number
   rbx/ebx = first parameter
   rsi/esi = second parameter
   rdi/edi = third parameter
   r9/ebp = fourth parameter

Output Parameters:
   rax/eax = status/error code
   rbx/ebx = first return parameter
   rsi/esi = second return parameter
   rdi/edi = third return parameter
   r9/ebp = fourth return parameter

There are a few things here worth mentioning. First, ECX and EDX aren't used because of the way SYSEXIT (and SYSRET) work: SYSEXIT takes the return EIP from EDX and the return ESP from ECX, and SYSCALL/SYSRET use ECX for the return address. Secondly, all kernel API functions return a standard error code (returned eax = 0 if the function was successful), so that if the function fails you can find out why without any messing about.

All undefined functions return an "undefined function" error code, which makes it easy to write software that uses newer API functions that also works on older kernels. This is an important difference to standard C calling conventions, where there is no standard way of returning errors. For example (for C), you can't just return a NULL pointer and hope that if/when the function is defined in the future it will return NULL to indicate an error (and even if it does, the caller wouldn't know if the function isn't supported, or if the function is supported but failed).

I also try to make my input and output parameters match for similar functions. For example, if there's a function to get the time of day (hour, minute, second) and another function to set the time of day, then the output parameters for the first function can be used without change as input parameters for the second function. This just reduces the amount of juggling the callers need to do.

For 32-bit code I usually have a software interrupt, a call gate, SYSENTER and SYSCALL. All of these just check if EAX is sane and then do "call [API_table + eax*4]" and return (IRET, RETF, SYSEXIT or SYSRET), so that I only need one table of kernel functions (and one set of kernel functions) that work the same regardless of how CPL=3 code called the kernel. This is even more convenient for me, as my recent design uses "kernel modules" - it means one kernel module can call public kernel functions in another kernel module by using the same interface (e.g. by doing a "call near [API_table + function_number * 4]"), which reduces the need for the "kernel-only" call table.
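As a rough illustration only, here is how that single dispatch step might look if written in C rather than assembly. The register struct, the two kernel functions and the table contents are hypothetical; the status values 0 (OK) and 1 (function undefined) follow the codes mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STATUS_OK = 0, STATUS_UNDEFINED = 1 };

/* The registers used by the convention above (32-bit names). */
struct regs { uint32_t eax, ebx, esi, edi, ebp; };

typedef void (*kernel_fn)(struct regs *r);

/* Two hypothetical kernel functions: status in EAX, result in EBX. */
static void fn_get_answer(struct regs *r) { r->ebx = 42; r->eax = STATUS_OK; }
static void fn_add(struct regs *r) { r->ebx += r->esi; r->eax = STATUS_OK; }

/* One table shared by every entry path. Slot 1 is deliberately empty. */
static const kernel_fn api_table[] = { fn_get_answer, NULL, fn_add };
#define API_TABLE_SIZE (sizeof api_table / sizeof api_table[0])

/* Common body of the INT, call gate, SYSENTER and SYSCALL handlers:
   check that EAX is sane, then call through the table. */
static void kernel_dispatch(struct regs *r)
{
    if (r->eax >= API_TABLE_SIZE || api_table[r->eax] == NULL) {
        r->eax = STATUS_UNDEFINED;
        return;
    }
    api_table[r->eax](r);
}

/* Usage: function 2 adds ESI to EBX; function 1 does not exist. */
static void example_dispatch(void)
{
    struct regs r = { .eax = 2, .ebx = 40, .esi = 2 };

    kernel_dispatch(&r);
    assert(r.eax == STATUS_OK && r.ebx == 42);

    r.eax = 1;
    kernel_dispatch(&r);
    assert(r.eax == STATUS_UNDEFINED);
}
```

Because an out-of-range or empty slot produces the same "undefined function" status, a caller built for a newer kernel can probe for a function and fall back cleanly on an older one.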

Lastly, I plan to have a "batch" kernel function where the caller creates a table of parameters for many kernel functions and then calls the kernel's special "batch" function once. For each entry in the table the kernel loads the set of parameters from the table into registers, executes the kernel function indicated by the entry, and then stores the return parameters back in the table. The idea here is to improve performance by reducing the number of times you need to switch to CPL=0 and back (e.g. switch to CPL=0, execute N kernel functions then return to CPL=3, then check all the return parameters; instead of switching to CPL=0 and back N times). This is simple for me to implement and can work for all kernel functions (but would be a nightmare if the kernel allowed functions with a variable number of parameters, for example).
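A sketch of how such a batch function could work, written in C for readability. Everything here is an assumption made for illustration: the entry layout (`call_regs`), the two kernel functions, and the choice to return a count of successful entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ST_OK = 0, ST_UNDEFINED = 1 };

/* One table entry: register values on entry, overwritten with the
   register values on return. Every entry has the same fixed shape. */
struct call_regs { uint32_t eax, ebx, esi, edi, ebp; };

typedef void (*batch_fn)(struct call_regs *r);

/* Hypothetical kernel functions. */
static void fn_double(struct call_regs *r) { r->ebx *= 2u; r->eax = ST_OK; }
static void fn_sum(struct call_regs *r) { r->ebx += r->esi; r->eax = ST_OK; }

static const batch_fn batch_table[] = { fn_double, fn_sum };
#define BATCH_TABLE_SIZE (sizeof batch_table / sizeof batch_table[0])

/* Ordinary single-call dispatch, reused unchanged by the batch loop. */
static void batch_dispatch(struct call_regs *r)
{
    if (r->eax >= BATCH_TABLE_SIZE) {
        r->eax = ST_UNDEFINED;
        return;
    }
    batch_table[r->eax](r);
}

/* One switch to CPL=0 services every entry. Returns how many entries
   completed with ST_OK; each entry also keeps its own status in eax. */
static size_t kernel_batch(struct call_regs *entries, size_t count)
{
    size_t ok = 0;

    for (size_t i = 0; i < count; i++) {
        batch_dispatch(&entries[i]);
        if (entries[i].eax == ST_OK)
            ok++;
    }
    return ok;
}

/* Usage: three calls in one batch; the last names an undefined function. */
static void example_batch(void)
{
    struct call_regs t[3] = {
        { .eax = 0, .ebx = 21 },
        { .eax = 1, .ebx = 40, .esi = 2 },
        { .eax = 7 },
    };

    assert(kernel_batch(t, 3) == 2);
    assert(t[0].ebx == 42 && t[1].ebx == 42);
    assert(t[2].eax == ST_UNDEFINED);
}
```

The fixed entry size is what keeps this simple: with a variable number of parameters per call, the kernel could not step through the table without decoding each entry first.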

In general this method is fast, but only because the kernel functions themselves are written in assembly. If the kernel functions were written in a high level language then passing parameters in registers would be slower because I'd need some sort of ugly assembly stub that pushes the registers on the stack so that the high level language code can access them (in this case it'd make more sense to pass parameters on the stack to begin with).

Cheers,

Brendan

bewing · **Posted:** Thu Aug 23, 2007 7:39 am

Well, Brendan, I like your style. :wink:

I haven't taken my design quite to your extreme, yet -- and I'm not quite sure I'll go that far. Considering that ecx, edi, esi, eax, edx (and to some extent ebp) have "special uses" in intel CPUs -- I don't really want to just assign "input #4" to one of them, forever. Since there is usually something like an optimal register ordering for any particular kernel function, I'll be content with the other method, of just creating a per-function translation between a stack-oriented calling structure, and each assembly-based kernel function.

If motorola had won the chip wars over intel, with something like a mc68k register design -- where all the data and address registers are separately interchangeable -- then I'd absolutely be designing my interface just like yours.

As far as returns go, however -- I think there is an important addition to look at. The one huge advantage that assembly routines have over traditional high-level languages is that you can return many values (up to 7 values plus 2 testable flag bits on 32bit intel!) as the "result" of a function. Every high level language that I've seen limits the # of explicit returns to one.

Which leads to my point: I think it is possible, and reasonable, to ALWAYS return a 2nd, implicit value from all defined and undefined kernel calls -- when doing a stack-based return to a high level language. The first return would be the expected return value. The second return is always a pre-defined system error code -- which can include "_UNDEFINED_FUNCTION." It seems to me that this extends some needed assembly functionality a little further into the high level language realm.

This is a little annoying to implement in assembler, of course. You would probably have to pre-define a particular register as always holding the system error code. This sounds like an extension to what you are doing with eax?

Obviously, supporting this requires tweaking the calling/return structure of every compiler that gets loaded on your OS, and those "ugly" assembly stubs -- but I think it's worth it, for the extra benefit of permanently separating error codes from return values, and allowing the use of both in all situations.

Brendan · **Posted:** Thu Aug 23, 2007 10:59 am

Hi,

bewing wrote:

I haven't taken my design quite to your extreme, yet -- and I'm not quite sure I'll go that far. Considering that ecx, edi, esi, eax, edx (and to some extent ebp) have "special uses" in intel CPUs -- I don't really want to just assign "input #4" to one of them, forever.

For special uses, even though ECX and EDX are useful (I/O port access and LOOP/JECXZ) they're trashed by SYSENTER/SYSEXIT and need to be reserved. For string instructions (EAX, ESI and EDI), most string instructions are slower than similar code implemented as smaller instructions (except for string copy, like REP MOVSD) and the cost of shifting things between registers (if/when necessary) is relatively small compared to the CPU's setup costs. That only really leaves EBP (used for stack frames in high level languages), but it's not too hard to use EBP from inline assembly within high level language code without problems and most kernel functions don't need more than 4 input or output parameters (and therefore don't need EBP).

bewing wrote:

Since there is usually something like an optimal register ordering for any particular kernel function, I'll be content with the other method, of just creating a per-function translation between a stack-oriented calling structure, and each assembly-based kernel function.

That would work too, but the most efficient place for this per-function translation between stack and registers is within the libraries used by the high level languages. This is partly because the libraries usually need to do more than just the low-level kernel function call, and partly because the compiler can optimise register usage so things are in the right registers before the function call anyway.

bewing wrote:

As far as returns go, however -- I think there is an important addition to look at. The one huge advantage that assembly routines have over traditional high-level languages is that you can return many values (up to 7 values plus 2 testable flag bits on 32bit intel!) as the "result" of a function. Every high level language that I've seen limits the # of explicit returns to one.

IMHO this is what makes most high level languages inherently broken. Consider C, where programmers use "pointers to output parameters" as input parameters, which can make it hard to tell the difference between input parameters and output parameters (and can also increase overhead). For an example, compare "status = foo(&a, &b, &c);" to something like "(status, a, b) = foo(&c);". For the normal C code it's hard to tell that "a" and "b" are actually output parameters.
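To make the comparison concrete, here is a small sketch of both styles in plain C. C has no tuple syntax, so the second form is approximated by returning a struct; `foo_ptr`, `foo_struct` and the arithmetic inside them are invented purely for illustration.

```c
#include <assert.h>

/* Style 1: outputs passed as pointers. At the call site, foo_ptr(&a, &b, &c)
   gives no hint that a and b are outputs while c is an input. */
static int foo_ptr(int *a, int *b, const int *c)
{
    *a = *c + 1;
    *b = *c * 2;
    return 0;               /* status */
}

/* Style 2: everything that comes back is part of the return value,
   the closest C gets to "(status, a, b) = foo(&c);". */
struct foo_result { int status; int a; int b; };

static struct foo_result foo_struct(const int *c)
{
    struct foo_result r = { 0, *c + 1, *c * 2 };
    return r;
}

/* Usage: both styles compute the same values from the same input. */
static void example_returns(void)
{
    int a = 0, b = 0, c = 10;

    assert(foo_ptr(&a, &b, &c) == 0 && a == 11 && b == 20);

    struct foo_result r = foo_struct(&c);
    assert(r.status == 0 && r.a == 11 && r.b == 20);
}
```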

bewing wrote:

Which leads to my point: I think it is possible, and reasonable, to ALWAYS return a 2nd, implicit value from all defined and undefined kernel calls -- when doing a stack-based return to a high level language. The first return would be the expected return value. The second return is always a pre-defined system error code -- which can include "_UNDEFINED_FUNCTION." It seems to me that this extends some needed assembly functionality a little further into the high level language realm.

This is a little annoying to implement in assembler, of course. You would probably have to pre-define a particular register as always holding the system error code. This sounds like an extension to what you are doing with eax?

That is exactly what I do - EAX always returns a standardised status code (0 = OK, 1 = function undefined, 2 = bad input parameters, 3 = permission denied, etc).

bewing wrote:

Obviously, supporting this requires tweaking the calling/return structure of every compiler that gets loaded on your OS, and those "ugly" assembly stubs -- but I think it's worth it, for the extra benefit of permanently separating error codes from return values, and allowing the use of both in all situations.

That's another reason why the most efficient place for the per-function translation between stack and registers is within the libraries used by the high level languages. For example (for C) the kernel can return 2 parameters (the status code and something else) and the library can store the status output parameter into "errno" and only return one parameter to the caller. For example:

Code:

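_open:
    push ebx                ; save the registers this stub overwrites
    push esi
    mov ebx,[esp+12]        ; first argument (past saved ESI, saved EBX, return address)
    mov esi,[esp+16]        ; second argument
    mov eax,0x00001234      ; kernel function number
    int 0x80
    test eax,eax            ; EAX = status code, 0 means success
    je .worked

    mov [_errno],eax        ; failure: store the status in errno, return -1
    mov eax,-1
    pop esi
    pop ebx
    ret

.worked:
    mov eax,ebx             ; success: return the kernel's first return parameter
    pop esi
    pop ebx
    ret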

Of course you'd want to define this as a macro in a header file instead, so that the optimiser can do its thing...
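A header-style version might look roughly like the sketch below. It is only an illustration: the trap into the kernel is replaced by a made-up `fake_kernel_call` so the code runs as a normal program (a real header would use inline assembly there), and `my_open`, the handle value and the use of kernel status codes directly as `errno` values are all assumptions. The function number 0x1234 is the one from the stub above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FN_OPEN 0x00001234u

struct sys_regs { uint32_t eax, ebx, esi; };

/* Hypothetical stand-in for "int 0x80". Status comes back in eax
   (0 = OK, 1 = function undefined, 2 = bad input parameters) and the
   first return parameter in ebx. */
static void fake_kernel_call(struct sys_regs *r)
{
    if (r->eax != FN_OPEN) { r->eax = 1; return; }
    if (r->ebx == 0)       { r->eax = 2; return; }
    r->ebx = 3;             /* pretend file handle */
    r->eax = 0;
}

/* The library wrapper: kernel status goes into errno, and the C caller
   sees a single return value, as it expects. */
static inline int my_open(uint32_t name, uint32_t flags)
{
    struct sys_regs r = { FN_OPEN, name, flags };

    fake_kernel_call(&r);
    if (r.eax != 0) {
        errno = (int)r.eax;
        return -1;
    }
    return (int)r.ebx;
}

/* Usage: one successful call, one rejected for a bad first parameter. */
static void example_open(void)
{
    errno = 0;
    assert(my_open(1, 0) == 3 && errno == 0);
    assert(my_open(0, 0) == -1 && errno == 2);
}
```

Because the wrapper is `static inline`, the compiler is free to fold it into the caller and arrange for the arguments to be in the right places beforehand.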

The same thinking applies to other functions - for example, the kernel could return many parameters and the library could put most of them into a structure or something. It's also "high level language neutral" as it's not designed for any specific high level language calling convention (and doesn't penalise languages that use other calling conventions) - the code needed in a C library isn't much different to the code needed in a PASCAL library (even though both libraries might do completely different things with input and output parameters).

The other nice thing is that an assembly programmer can skip all the "fluff" and directly use multiple return parameters without messing about with the stack, but it doesn't really increase overhead for high level languages (assuming that the low level kernel API calls are done as inline assembly macros that are optimised by the compiler).

Cheers,

Brendan

JamesM · **Posted:** Thu Aug 23, 2007 11:05 am

Quote:

For an example, compare "status = foo(&a, &b, &c);" to something like "(status, a, b) = foo(&c);".

That is actually (whether deliberate or not) very nearly a syntactically valid perl function call: add sigils and "($status, $a, $b) = foo(\$c);" works, because a perl sub called in list context can return multiple values. Now all we need is a viable way to use perl in kernel dev

JamesM

Alboin · **Posted:** Thu Aug 23, 2007 11:51 am

JamesM wrote:

Now all we need is a viable way to use perl in kernel dev

Or use Alef.

bewing · **Posted:** Fri Aug 24, 2007 11:13 am

Brendan wrote:

the most efficient place for the per-function translation between stack and registers is within the libraries used by the high level languages.

I agree completely. Your example shows almost precisely how I intend to implement my kernel asm/HLL API interface in each language's library.

Brendan wrote:

For special uses, even though ECX and EDX are useful (I/O port access and LOOP/JECXZ) they're trashed by SYSENTER/SYSEXIT and need to be reserved.

That's why I don't use SYSENTER/SYSEXIT. :wink:

I just batch all my userland system requests in a queue. Sort of like your batch looping kernel call.

Brendan wrote:

For string instructions (EAX, ESI and EDI), most string instructions are slower than similar code implemented as smaller instructions

I suppose. But opcode clock cycle info is not really available anymore, and those string instructions should be more "optimizable" than a series of smaller instructions, theoretically. So I think I'll stick with the string ops anyway.

And the irritating things about EBP are that it preferentially uses the SS segment -- so conceivably you always have to use DS overrides every time you use EBP as a pointer. And the ModRM/SIB bytes for EBP are screwy, so you often get more bytes in an EBP opcode than for any other register.

Brendan · **Posted:** Fri Aug 24, 2007 9:58 pm

Hi,

bewing wrote:

Brendan wrote:

For special uses, even though ECX and EDX are useful (I/O port access and LOOP/JECXZ) they're trashed by SYSENTER/SYSEXIT and need to be reserved.

That's why I don't use SYSENTER/SYSEXIT. :wink:

I just batch all my userland system requests in a queue. Sort of like your batch looping kernel call.

Sometimes you can batch several system calls together, but often you can't - optimising both cases is worth considering....

bewing wrote:

Brendan wrote:

For string instructions (EAX, ESI and EDI), most string instructions are slower than similar code implemented as smaller instructions

I suppose. But opcode clock cycle info is not really available anymore, and those string instructions should be more "optimizable" than a series of smaller instructions, theoretically. So I think I'll stick with the string ops anyway.

For modern CPUs, complex instructions (hardware task switching, most string instructions, enter/leave, loop, etc) use microcode, and are broken down into simpler instructions. It's this conversion that makes them slow. CPU designers could hard-wire the instructions into the core itself so they don't need to use microcode for them, but I'd assume that CPU manufacturers have decided it's not worth the hassle and they're better off making more frequently used instructions faster instead.

For string comparisons (e.g. repe cmpsb, repe scasb), Intel are introducing SSE instructions to do them faster.

For I/O instructions (e.g. rep outsb) a micro-kernel never uses them (and most device drivers don't use them anyway).

I'd expect string instructions (except memory copying, perhaps) to end up like hardware task switching - considered a legacy thing that no-one uses and no CPU manufacturer cares about.

bewing wrote:

And the irritating things about EBP are that it preferentially uses the SS segment -- so conceivably you always have to use DS overrides every time you use EBP as a pointer. And the ModRM/SIB bytes for EBP are screwy, so you often get more bytes in an EBP opcode than for any other register.

I use a flat memory model (where SS = DS = ES) and can ignore segment overrides. The possibility of a few extra bytes of code is mostly negligible - it's less overhead than messing about putting parameters on the stack or in memory.

It's also very rare for a kernel function to need more than 4 input or output parameters. The only functions I can remember that ever needed more than 4 parameters are one function to convert an integer time value into broken down time (year, month, day, hour, minute, second) and another function that does the reverse.

Cheers,

Brendan

Crazed123 · **Joined:** Thu Oct 21, 2004 11:00 pm **Posts:** 248

Alboin wrote:

JamesM wrote:

Now all we need is a viable way to use perl in kernel dev

Or use Alef.

I dunno. Alef-Null or C?

Alboin · **Posted:** Mon Aug 27, 2007 3:48 pm

Crazed123 wrote:

Alboin wrote:

JamesM wrote:

Now all we need is a viable way to use perl in kernel dev

Or use Alef.

I dunno. Alef-Null or C?

Hmm? No, the Alef Programming Language. It has tuples built in.

Crazed123 · **Joined:** Thu Oct 21, 2004 11:00 pm **Posts:** 248

I know that, but I wanted to make the set-theory joke anyway.

JamesM · **Posted:** Tue Aug 28, 2007 3:47 am

I don't get it. Obviously my set/group/ring/field/vector space theory isn't up to scratch

Candy · **Posted:** Tue Aug 28, 2007 10:33 am

As far as I know, Aleph-Null is a number while C is a set (set of complex numbers).

OSDev.org
