OSDev.org https://forum.osdev.org/ |
Calling C functions from MASM (x64) https://forum.osdev.org/viewtopic.php?f=1&t=33501 |
Page 1 of 1 |
Author: | PhantomR [ Wed Feb 13, 2019 6:22 am ] |
Post subject: | Calling C functions from MASM (x64) |
I'm really confused as to how calling C functions from MASM works (Visual Studio) on x64. I have a function I've written in C called 'printf'. If I do the following in MASM, my RSP is changed:

Code:
    mov rcx, format_str
    mov rdx, argument
    call printf

I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers). I assumed this meant I had to do this:

Code:
    mov rcx, format_str
    mov rdx, argument
    sub rsp, 32
    call printf

However, this also modifies my RSP. So I thought maybe I also have to clean up the stack after the call, and this worked:

Code:
    mov rcx, format_str
    mov rdx, argument
    sub rsp, 32
    call printf
    add rsp, 32

My question would be: is this last code snippet the RIGHT way of calling a C function from assembly in x64?

EDIT: Also, if this is indeed the right way, how would it go for functions with more than 4 parameters? Would I have to allocate the shadow space before pushing the params or after? |
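On the EDIT question, the arithmetic can be sketched in C. This is a minimal model, assuming the Microsoft x64 convention: the first four arguments travel in RCX, RDX, R8 and R9, their four home slots occupy [rsp] through [rsp+31] at the moment of the call, and the fifth and later arguments sit directly above them. The helper names win64_arg_offset and win64_outgoing_bytes are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset from RSP, measured just before the CALL
 * instruction, of the slot that belongs to argument `index` (0-based).
 * Slots 0..3 are the shadow (home) space; slots 4 and up hold the
 * arguments that are really passed in memory. */
size_t win64_arg_offset(size_t index)
{
    return index * 8;   /* every slot is one qword */
}

/* Hypothetical helper: bytes the caller reserves for one call.
 * Assumes RSP is 16-byte aligned before the reservation is made. */
size_t win64_outgoing_bytes(size_t arg_count)
{
    size_t slots = arg_count < 4 ? 4 : arg_count;    /* shadow space is always reserved */
    size_t bytes = (slots * 8 + 15) & ~(size_t)15;   /* round up to a multiple of 16 */
    assert(bytes >= 32 && bytes % 16 == 0);
    return bytes;
}
```

Under these assumptions a six-argument call would reserve 48 bytes, store the fifth argument at [rsp+32] and the sixth at [rsp+40], and release the 48 bytes after the call. In other words the space is reserved first and the extra arguments are written above the shadow space; pushing them after the sub would put them where the home slots are expected.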
Author: | fpissarra [ Thu Feb 14, 2019 7:12 am ] |
Post subject: | Re: Calling C functions from MASM (x64) |
PhantomR wrote: I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers).

I believe you are referring to SEH (Structured Exception Handling). This type of alignment is usually done at the beginning of your code (main()), not at each libc/API function call. Example of a simple program compiled with mingw64's gcc:

Code:
    // test.c
    //
    // Compile with
    //   x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
    #include <stdio.h>

    void main( void )
    {
      int i;

      for (i = 0; i < 10; i++)
        printf( "%d\n", i );
    }

And we'll get something like this:

Code:
    main:
      push rsi              ; with x86-64 ms-abi we need
      push rbx              ; to preserve these regs!
      sub  rsp, 40          ; why 40?!
      lea  rsi, .LC0[rip]   ; printf's fmt
      xor  ebx, ebx         ; i=0
      call __main
    .L2:
      mov  edx, ebx
      mov  rcx, rsi
      add  ebx, 1
      call printf           ; Note: No stack alignment!
      cmp  ebx, 10
      jne  .L2
      add  rsp, 40
      pop  rbx
      pop  rsi
      ret |
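One way to account for the 40 is plain alignment arithmetic. The sketch below assumes the Microsoft x64 rules that RSP must be 16-byte aligned at every call instruction and that a function which calls others reserves 32 bytes of shadow space for its callees. The name win64_frame_adjust is hypothetical, not a compiler or SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the N in `sub rsp, N` for a Win64 function that
 * calls other functions.
 *   pushes      - number of 8-byte registers pushed in the prologue
 *   local_bytes - bytes of local variables the function needs
 * Assumes RSP was 16-byte aligned at the CALL that entered the function,
 * so on entry it is off by the 8-byte return address. */
size_t win64_frame_adjust(size_t pushes, size_t local_bytes)
{
    size_t used    = 8 + pushes * 8;            /* return address + pushed registers */
    size_t need    = 32 + local_bytes;          /* shadow space for callees + locals */
    size_t total   = used + need;
    size_t padding = (16 - total % 16) % 16;    /* bytes needed to reach a 16-byte boundary */
    assert((total + padding) % 16 == 0);
    return need + padding;
}
```

For the main above, two pushes and no locals give 40: 32 bytes of shadow space for printf plus 8 bytes of padding. Because that reservation is made once in the prologue, no further adjustment is needed around each call printf inside the loop.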
Author: | alexfru [ Thu Feb 14, 2019 8:21 am ] |
Post subject: | Re: Calling C functions from MASM (x64) |
fpissarra wrote: PhantomR wrote: I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers). I believe you are referring to SEH (Structured Exception Handling).

It may have a tangential relation to SEH, but the primary purpose is different. It's cheaper to do things with registers than with memory (shorter instructions, less cache traffic, etc.). So, when you have plenty of registers, it's reasonable to dedicate some of them to parameter passing. Hence you have 4 or 6 (depending on the particular x86-64 ABI) regs used for this.

All is fine until you need to call something like "int printf(const char* fmt, ...)". So, fmt goes into one register, 3 to 5 more optional parameters are in other registers and the rest are on the stack. Now, think how you'd implement the va_start() and va_arg() macros to get all those optional values. Those macros must definitely be able to extract the values from the stack when there are many of them. The values contained in the registers also need to be extractable. And the simplest approach is just to spill those 3-5 regs into this shadow/home space and use the exact same code to access things on the stack. |
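The spilling idea can be modelled in portable C. This is a toy sketch, not how any real compiler implements stdarg: it treats the four home slots and the stack-passed arguments as one contiguous array of qwords, which is the layout the Win64 convention is assumed to produce. Every model_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_REG_ARGS = 4, MODEL_MAX_ARGS = 16 };

/* The callee's view of memory, lowest address first:
 * slots[0..3] are the home slots, slots[4..] are the stack arguments. */
typedef struct {
    uint64_t slots[MODEL_MAX_ARGS];
} model_frame;

/* What a variadic callee's prologue does: copy the register arguments
 * into their home slots so that all arguments form a single array. */
void model_spill(model_frame *f, const uint64_t regs[MODEL_REG_ARGS])
{
    for (size_t i = 0; i < MODEL_REG_ARGS; i++)
        f->slots[i] = regs[i];
}

/* In this model a va_list is just a pointer into that array. */
typedef const uint64_t *model_va_list;

/* va_start: point at the first slot after the named arguments. */
model_va_list model_va_start(const model_frame *f, size_t named_args)
{
    assert(named_args <= MODEL_MAX_ARGS);
    return &f->slots[named_args];
}

/* va_arg: read one qword and advance. The same code serves register
 * and stack arguments because they are now adjacent in memory. */
uint64_t model_va_arg(model_va_list *ap)
{
    return *(*ap)++;
}
```

The point of the model is that model_va_arg contains no branch: once the registers have been spilled, walking from the third optional argument (a home slot) to the fourth (a real stack slot) is an ordinary pointer increment.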
Author: | fpissarra [ Thu Feb 14, 2019 11:33 am ] |
Post subject: | Re: Calling C functions from MASM (x64) |
Quote: It may have tangential relation to SEH, but the primary purpose is different.

You are probably right! But I also know that GCC and MSVC create VERY strange code!!! Take a look at this example using variadics:

Code:
    // test.c
    // Compile with:
    //   x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
    //
    #include <stdio.h>
    #include <stdarg.h>

    __attribute__ ( ( noinline ) )
    int f ( int x ) { return 2 * x; }

    __attribute__ ( ( noinline ) )
    int g ( int x, ... )
    {
      int y;
      va_list ap;

      va_start ( ap, x );
      y = va_arg ( ap, int );
      va_end ( ap );

      return x * y;
    }

    void dosomething ( int x, int y )
    {
      printf ( "%d\n", f ( x ) );
      printf ( "%d\n", g ( x, y ) );
    }

Which generates strange code like this:

Code:
    ; ECX = x
    f:  ; OK! very straightforward!
      lea eax, [rcx+rcx]
      ret

    ; ECX = x, EDX = y (NOT on stack, as you will see later!)
    g:
      ; Notice: at this point the structure of the stack should be:
      ;        <caller stkframe>
      ; RSP->  retaddr
      sub rsp, 24         ; reserve 3 QWORDS on stack (WHY?)
      ; The stack now is (in qwords):
      ;        <caller stkframe>
      ;        retaddr
      ;        ?
      ;        ?
      ; RSP->  ?
      mov eax, edx        ; EAX=y
      mov [rsp+40], rdx   ; saves y on stack,
                          ; as if the arguments were stacked
                          ; before the call (they aren't!).
      ; OBS: The hypothetical stacked x is [rsp+32] and
      ;      never pushed!
      lea rdx, [rsp+40]   ; RDX now POINTS TO hypothetical stacked y.
      imul eax, ecx       ; EAX=y*x
      ; RSP+48 and RSP+56 point to hypothetical 3rd and 4th arguments
      ; (There are none!).
      mov [rsp+48], r8    ; saves after the last argument?
      mov [rsp+56], r9    ; again? even further?
      mov [rsp+8], rdx    ; saves ptr to y on stack in a local reserved space (WHY?).
                          ; why were 24 bytes allocated if not used?
      ; Notice that these four qwords, after retaddr, overwrite the caller
      ; stack frame...
      ;        R9
      ;        R8
      ;        y
      ;        ?
      ;        retaddr
      ;        ?
      ;        RDX
      ; RSP->  ?
      add rsp, 24         ; reclaim reserved space.
      ret

    dosomething:
      push rsi
      push rbx
      sub rsp, 40         ; WHY?
      mov ebx, ecx
      mov esi, edx
      call f
      lea rcx, .LC0[rip]
      mov edx, eax
      call printf         ; See? A simple msabi calling convention call,
                          ; no stack used for the arguments.
      mov edx, esi
      mov ecx, ebx
      call g
      lea rcx, .LC0[rip]
      mov edx, eax
      add rsp, 40         ; WHY?
      pop rbx
      pop rsi
      jmp printf

Notice that g() isn't using the stack (x and y are taken directly from ECX and EDX), writes y, R8 and R9 as if they were pushed on the stack before the call, and stores the POINTER to y in a local var (reserved on the stack) only to discard it right after. It NEVER writes x and never uses R8 and R9...

The dosomething() function is less strange, but still reserves 40 bytes (5 QWORDS)... WHY? We're not using local vars or SEH here! This is OK if you consider the 4 qwords saved by g(). So, dosomething() is reserving this local space so g() doesn't overwrite dosomething()'s stack frame. But why 40? Why not 32? For me this is very, very strange code. |
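The offsets in the listing for g() can be checked with a little arithmetic. A minimal sketch, assuming the Microsoft x64 convention in which the caller always reserves four 8-byte home slots directly above the return address; the helper name win64_home_slot_offset is hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset from RSP, inside a Win64 callee that has
 * executed `sub rsp, frame_bytes` and pushed nothing, of the home slot
 * for register argument reg_index (0 = RCX, 1 = RDX, 2 = R8, 3 = R9). */
size_t win64_home_slot_offset(size_t frame_bytes, size_t reg_index)
{
    assert(reg_index < 4);      /* only four arguments have home slots */
    return frame_bytes          /* the callee's own reserved space */
         + 8                    /* return address pushed by CALL */
         + reg_index * 8;       /* home slots are consecutive qwords */
}
```

With frame_bytes = 24 this gives 32, 40, 48 and 56 for RCX, RDX, R8 and R9, matching the stores to [rsp+40], [rsp+48] and [rsp+56]. Under that assumption g() is writing into space that dosomething() set aside for it rather than into dosomething()'s own data, and the slot at [rsp+8] would be the va_list local ap. The same alignment rule would also explain the 40: after the return address and two pushes, 32 bytes of shadow space leave RSP 8 bytes short of a 16-byte boundary, so 8 bytes of padding are added.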
Author: | nullplan [ Thu Feb 14, 2019 2:23 pm ] |
Post subject: | Re: Calling C functions from MASM (x64) |
I think what's happening here is that va_list isn't actually all that magic. It is some sort of structure that contains all the registers containing parameters, and a pointer to the remaining stack arguments. So, since you declared a va_list as a local variable, the compiler reserves space for it and initializes it. In the Win64 ABI, there are 4 argument registers, and in your code, at most 3 of those can be filled, so they get spilled. Somehow the optimizer doesn't see that those stores are dead. I presume that's because va_start() is a little bit too magical for the optimizer, or so. So the va_list is initialized entirely, even though only a single member of it is ever used. But the instruction scheduler somehow manages to push the multiplication instruction further up. Because why not?

Mind you, va_lists are usually meant for a variable number of arguments. Here's a somewhat more complete example (compiled on Linux. Sorry, no Windows compiler available):

Code:
    #include <stddef.h>
    #include <stdarg.h>

    size_t sum(size_t n, ...)
    {
        va_list ap;
        size_t r = 0;
        va_start(ap, n);
        while (n--)
            r += va_arg(ap, size_t);
        va_end(ap);
        return r;
    }

Compiled with -Os:

Code:
        .globl  sum
        .type   sum, @function
    sum:
    .LFB0:
        .cfi_startproc
        leaq    8(%rsp), %rax
        movq    %rsi, -40(%rsp)
        movq    %r9, -8(%rsp)
        movl    $8, -72(%rsp)
        movq    %rax, -64(%rsp)
        leaq    -48(%rsp), %rax
        movq    %rdx, -32(%rsp)
        leaq    8(%rsp), %rdx
        movq    %rcx, -24(%rsp)
        movl    $8, %ecx
        movq    %r8, -16(%rsp)
        movq    %rax, %r8
        movq    %rax, -56(%rsp)
        xorl    %eax, %eax
    .L2:
        decq    %rdi
        cmpq    $-1, %rdi
        je      .L7
        leaq    8(%rdx), %rsi
        cmpl    $47, %ecx
        ja      .L4
        movl    %ecx, %r9d
        movq    %rdx, %rsi
        addl    $8, %ecx
        leaq    (%r8,%r9), %rdx
    .L4:
        addq    (%rdx), %rax
        movq    %rsi, %rdx
        jmp     .L2
    .L7:
        ret
        .cfi_endproc
    .LFE0:
        .size   sum, .-sum

So you see, it spills all the argument registers in order. But only God knows why it does so in a random order. |
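The loop in that listing follows the va_list layout of the System V x86-64 ABI, which really is a small structure. Below is a simplified model of its integer path only: the field names follow the ABI, but the pointers are typed as const uint64_t * instead of void * to keep the sketch short, and the model_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified System V x86-64 va_list. */
typedef struct {
    unsigned gp_offset;                 /* byte offset of the next unread GP register slot */
    unsigned fp_offset;                 /* same for the XMM slots; unused in this sketch */
    const uint64_t *overflow_arg_area;  /* next argument that was passed on the stack */
    const uint64_t *reg_save_area;      /* where the prologue spilled rdi, rsi, rdx, rcx, r8, r9 */
} model_sysv_va_list;

/* Fetch the next integer-class argument, as va_arg(ap, size_t) would. */
uint64_t model_sysv_next(model_sysv_va_list *ap)
{
    assert(ap->gp_offset % 8 == 0);
    if (ap->gp_offset < 48) {                    /* 6 GP registers * 8 bytes each */
        uint64_t v = ap->reg_save_area[ap->gp_offset / 8];
        ap->gp_offset += 8;
        return v;
    }
    return *ap->overflow_arg_area++;             /* registers exhausted: walk the stack */
}
```

The gp_offset < 48 test corresponds to the cmpl $47, %ecx / ja .L4 pair in the listing, and the initial value 8 stored at -72(%rsp) skips the slot of the named argument n, which arrives in %rdi. Unlike the Win64 home space, the register save area here is not adjacent to the stack arguments, which is why two pointers and a branch are needed.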
All times are UTC - 6 hours |