OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Calling C functions from MASM (x64)
PostPosted: Wed Feb 13, 2019 6:22 am 

Joined: Wed Oct 31, 2018 6:15 am
Posts: 18
I'm really confused about how calling C functions from MASM (Visual Studio) works on x64...

I have a function I've written in C called 'printf'. If I do the following in MASM, my RSP is changed:
Code:
lea rcx, format_str   ; 1st arg: address of the format string (assumes format_str is a db string label)
mov rdx, argument     ; 2nd arg
call printf


I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers). Well then, I assumed this meant I had to do this:
Code:
lea rcx, format_str   ; 1st arg: address of the format string
mov rdx, argument     ; 2nd arg
sub rsp, 32           ; reserve 32 bytes of shadow space
call printf


However, this also modifies my RSP. So I thought maybe I also have to clean up the stack after the call, and this worked:
Code:
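lea rcx, format_str   ; 1st arg: address of the format string
mov rdx, argument     ; 2nd arg
sub rsp, 32           ; reserve 32 bytes of shadow space
call printf
add rsp, 32           ; release the shadow space after the call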


My question would be: is this last code snippet the RIGHT way of calling a C function from assembly on x64?
EDIT: Also, if this is indeed the right way, how would it work for functions with more than 4 parameters? Would I have to allocate the shadow space before pushing the params or after?
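A note on the two questions above, as a sketch rather than a definitive answer: under the Microsoft x64 convention the 32-byte shadow space is reserved even when the callee takes fewer than four parameters, arguments 5 and up are stored on the stack directly above the shadow space (so the shadow space is the part closest to RSP), and RSP must be 16-byte aligned at the `call`. The C helpers below only do that arithmetic. Their names are hypothetical, and they assume every argument fits in one 8-byte slot and that RSP is 8 mod 16 on entry to the calling procedure (its own return address having just been pushed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from RSP before a call under the
 * Microsoft x64 convention, assuming every argument fits in one 8-byte slot.
 * pushes = number of 8-byte pushes done since entry, when RSP was 8 mod 16. */
size_t win64_call_frame_bytes(int nargs, int pushes)
{
    assert(nargs >= 0 && pushes >= 0);

    /* 32 bytes of shadow (home) space for RCX, RDX, R8, R9 are always
     * reserved, even when the callee takes fewer than 4 arguments. */
    size_t need = 32;
    if (nargs > 4)
        need += 8 * (size_t)(nargs - 4);   /* arguments 5..n sit above it */

    /* RSP mod 16 right now: 8 on entry, each push flips it between 8 and 0. */
    size_t misalign = (8 + 8 * (size_t)pushes) % 16;

    /* RSP - need must be a multiple of 16. Both values are multiples of 8,
     * so this holds exactly when misalign + need is a multiple of 16. */
    if ((misalign + need) % 16 != 0)
        need += 8;                         /* padding for alignment */
    return need;
}

/* Hypothetical helper: offset from RSP (at the call instruction) where
 * argument k (1-based, k >= 5) is stored, e.g. mov qword ptr [rsp+32], ... */
size_t win64_stack_arg_offset(int k)
{
    assert(k >= 5);
    return 32 + 8 * (size_t)(k - 5);
}
```

For example, if the snippet sits at the top of a procedure that has pushed nothing, `win64_call_frame_bytes(2, 0)` gives 40 rather than 32; the extra 8 bytes only restore alignment. With six arguments it gives 56, and arguments 5 and 6 would be written to `[rsp+32]` and `[rsp+40]` after the single `sub`, so the shadow space and the stack arguments are reserved together. If you `push` the stack arguments instead, push them last-to-first and then `sub rsp, 32`, so the shadow space ends up below them (still respecting the alignment rule).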


 
 Post subject: Re: Calling C functions from MASM (x64)
PostPosted: Thu Feb 14, 2019 7:12 am 

Joined: Mon Nov 26, 2018 9:14 am
Posts: 8
PhantomR wrote:
I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers).


I believe you are referring to SEH (Structured Exception Handling). This type of alignment is usually done at the beginning of your code (main()), not at each libc/API function call.

Example of a simple program compiled with mingw64's gcc:
Code:
// test.c
//
// Compile with
//  x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
#include <stdio.h>

void main( void )
{
  int i;

  for (i = 0; i < 10; i++)
    printf( "%d\n", i );
}

And we get something like this:
Code:
main:
   push   rsi     ; with the x86-64 MS ABI we need
   push   rbx     ; to preserve these regs!

   sub   rsp, 40          ; why 40?!

   lea   rsi, .LC0[rip]  ; printf's fmt
   xor   ebx, ebx        ; i=0
   call   __main
.L2:
   mov   edx, ebx
   mov   rcx, rsi
   add   ebx, 1
   call   printf         ; Note: no RSP adjustment around the call!
   cmp   ebx, 10
   jne   .L2

   add   rsp, 40

   pop   rbx
   pop   rsi

   ret
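The "why 40?!" has a plain arithmetic answer, assuming the usual Microsoft x64 rules: 32 bytes of shadow space for the callee plus 8 bytes of padding. RSP is 8 mod 16 on entry to main (the return address was just pushed), and the two pushes leave it 8 mod 16 again, so `sub rsp, 32` alone would leave every `call printf` misaligned. The toy model below replays the prologue on a made-up entry address; `rsp_at_printf_call` is a hypothetical helper, not anything GCC produces. Since the loop never moves RSP afterwards, that single `sub` serves all ten calls, which is why there is no per-call adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of RSP through main's prologue above (not real execution).
 * Assumes RSP % 16 == 8 on entry, because the caller's `call` pushed an
 * 8-byte return address onto a 16-byte-aligned stack. */
uint64_t rsp_at_printf_call(uint64_t rsp_on_entry, uint64_t frame)
{
    assert(rsp_on_entry % 16 == 8);
    uint64_t rsp = rsp_on_entry;
    rsp -= 8;      /* push rsi */
    rsp -= 8;      /* push rbx */
    rsp -= frame;  /* sub rsp, frame */
    return rsp;    /* value of RSP at each `call printf` */
}
```

With `frame = 40` the result is a multiple of 16; with `frame = 32` it is 8 mod 16.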


 
 Post subject: Re: Calling C functions from MASM (x64)
PostPosted: Thu Feb 14, 2019 8:21 am 
Member

Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
fpissarra wrote:
PhantomR wrote:
I read on the Microsoft website that the caller should allocate space for 4 qwords (shadow space for the 4 param registers).


I believe you are refeering to SEH (Structured Exception Handling).


It may have tangential relation to SEH, but the primary purpose is different.

It's cheaper to do things with registers than with memory (shorter instructions, less cache traffic, etc).
So, when you have plenty of registers, it's reasonable to dedicate some of them to parameter passing.
Hence 4 (Microsoft x64) or 6 (System V) registers are used for this, depending on the particular x86-64 ABI.

All is fine until you need to call something like "int printf(const char* fmt, ...)".
So, fmt goes into one register, 3 to 5 more optional parameters are in other registers and the rest is on the stack.

Now, think how you'd implement the va_start() and va_arg() macros to get all those optional values.
Those macros must definitely be able to extract the values from the stack when there are many of them.
The values contained in the registers also need to be extractable.
And the simplest is just to spill those 3-5 regs into this shadow/home space and use the exact same code to access things on the stack.
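The spill-then-walk idea can be sketched in C. This is a toy model, assuming the Win64 layout in which the four home slots sit directly below the stack-passed arguments; `toy_va_list`, `toy_va_start` and `toy_va_arg_u64` are hypothetical names, not the CRT's macros, and every argument is treated as one 8-byte slot. Once the register arguments have been homed, reading the next variadic argument is just "copy 8 bytes, advance 8 bytes", whether the value arrived in a register or on the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a Win64-style variadic frame: slots[0..3] stand for the
 * home (shadow) area where the callee spilled RCX, RDX, R8, R9, and
 * slots[4..] for the caller's stack arguments directly above it.
 * Because the two regions are contiguous, one pointer walk reads both. */
typedef struct {
    const unsigned char *next;   /* hypothetical stand-in for va_list */
} toy_va_list;

/* Like va_start: point just past the named parameter (slot 0, e.g. fmt). */
void toy_va_start(toy_va_list *ap, const uint64_t *slots)
{
    assert(ap != NULL && slots != NULL);
    ap->next = (const unsigned char *)(slots + 1);
}

/* Like va_arg(ap, uint64_t): the same code whether the value came from a
 * spilled register (slots 1..3) or was passed on the stack (slots 4..). */
uint64_t toy_va_arg_u64(toy_va_list *ap)
{
    uint64_t v;
    memcpy(&v, ap->next, sizeof v);
    ap->next += 8;
    return v;
}
```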


 
 Post subject: Re: Calling C functions from MASM (x64)
PostPosted: Thu Feb 14, 2019 11:33 am 

Joined: Mon Nov 26, 2018 9:14 am
Posts: 8
Quote:
It may have tangential relation to SEH, but the primary purpose is different.

You are probably right! But I also know that GCC and MSVC create VERY strange code!!! Take a look at this example using variadics:
Code:
// test.c
// Compile with:
//   x86_64-w64-mingw32-gcc -O2 -S -masm=intel test.c
//
#include <stdio.h>
#include <stdarg.h>

__attribute__ ( ( noinline ) ) int f ( int x ) { return 2 * x; }

__attribute__ ( ( noinline ) ) int g ( int x, ... )
{
  int y;
  va_list ap;

  va_start ( ap, x );
  y = va_arg ( ap, int );
  va_end ( ap );

  return x * y;
}

void dosomething ( int x, int y )
{
  printf ( "%d\n", f ( x ) );
  printf ( "%d\n", g ( x, y ) );
}

Which generates strange code like this:
Code:
; ECX = x
f:
   ; OK! very straightforward!
   lea   eax, [rcx+rcx]
   ret

; ECX = x, EDX = y (NOT on stack, as you will see later!)
g:
   ; Notice: at this point the structure of stack should be:
   ;           <caller stkframe>
   ; RSP-> retaddr

   sub   rsp, 24   ; reserve 3 QWORDS on stack (WHY?)

   ; The stack now is (in qwords):
   ;           <caller stkframe>
   ;            retaddr
   ;            ?
   ;            ?
   ; RSP->  ?

   mov   eax, edx  ; EAX=y

   mov   [rsp+40], rdx  ; saves y on the stack,
                        ; as if the arguments had been pushed
                        ; before the call (they weren't!).
                        ; Note: the hypothetical stacked x is at [rsp+32]
                        ; and is never written!

   lea   rdx, [rsp+40]  ; RDX now POINTS TO the hypothetical stacked y.

   imul  eax, ecx       ; EAX=y*x

   ; RSP+48 and RSP+56 point to the hypothetical 3rd and 4th arguments
   ; (there are none!).
   mov   [rsp+48], r8   ; saves past the last argument?
   mov   [rsp+56], r9   ; again? even further?

   mov   [rsp+8], rdx   ; saves the ptr to y in the locally reserved space (WHY?).
                        ; And why were 24 bytes allocated if they aren't used?

   ; Notice that these four qwords, above retaddr, overwrite the caller's
   ; stack frame...

   ;            R9
   ;            R8
   ;            y
   ;            ?
   ;            retaddr
   ;            ?
   ;            RDX
   ; RSP-> ?

   add   rsp, 24  ; reclaim reserved space.
   ret

dosomething:
   push   rsi
   push   rbx

   sub   rsp, 40  ; WHY?

   mov   ebx, ecx

   mov   esi, edx
   call   f

   lea   rcx, .LC0[rip]
   mov   edx, eax
   call   printf

   ; See? A plain MS ABI call, no stack used for the arguments.
   mov   edx, esi
   mov   ecx, ebx
   call   g

   lea   rcx, .LC0[rip]

   mov   edx, eax

   add   rsp, 40   ; WHY?

   pop   rbx
   pop   rsi
   jmp   printf

Notice that g() isn't using the stack for its arguments (x and y are taken directly from ECX and EDX), yet it writes y, R8 and R9 as if they had been pushed on the stack before the call, and stores the POINTER to y in a local var (reserved on the stack) only to discard it right after. It NEVER writes x and never reads R8 and R9 back...

The dosomething() function is less strange, but it still reserves 40 bytes (5 QWORDS)... WHY? We're not using local vars or SEH here! This would make sense if you consider the 4 qwords saved by g(): dosomething() reserves this local space so that g() doesn't overwrite dosomething()'s stack frame. But why 40? Why not 32?

For me this is a very, very strange code.
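Much of this reads more naturally if the four qwords above `retaddr` are taken to be the shadow (home) space that dosomething() reserved for g(), which is what the Microsoft x64 convention says they are. The sketch below (a hypothetical helper, assuming the callee does no pushes, as in g()) computes where each home slot sits relative to RSP after the callee's own `sub rsp, N`. With `sub rsp, 24` the slots come out at exactly the [rsp+32], [rsp+40], [rsp+48] and [rsp+56] addresses in the listing, so g() is writing into the part of the caller's frame that was set aside for it. The alignment rule explains dosomething()'s 40: after its two pushes RSP is again 8 mod 16, so 32 bytes of shadow space alone would leave `call g` misaligned, and 8 bytes of padding make 40.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset from RSP, after the callee's own
 * `sub rsp, locals` (and no pushes), of the home slot for register
 * parameter i (0 = RCX, 1 = RDX, 2 = R8, 3 = R9) under the Microsoft x64
 * convention. Above the callee's locals lies the return address, then the
 * 4 home slots that the *caller* reserved as shadow space. */
size_t win64_home_slot_offset(size_t locals, int i)
{
    assert(i >= 0 && i < 4);
    return locals + 8 /* return address */ + 8 * (size_t)i;
}
```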


 
 Post subject: Re: Calling C functions from MASM (x64)
PostPosted: Thu Feb 14, 2019 2:23 pm 
Member

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
I think what's happening here is that va_list isn't actually all that magical. It is some sort of structure that contains all the registers holding parameters, plus a pointer to the remaining stack arguments. So, since you declared a va_list as a local variable, the compiler reserves space for it and initializes it. In the Win64 ABI there are 4 argument registers, and in your code at most 3 of those can be filled, so they get spilled.

Somehow the optimizer doesn't see that those stores are dead. I presume that's because va_start() is a little bit too magical for the optimizer, or so. So the va_list is initialized entirely, even though only a single member of it is ever used. But the instruction scheduler somehow manages to push the multiplication instruction further up. Because why not?

Mind you, va_lists are usually meant for a variable number of arguments. Here's a slightly more complete example (compiled on Linux; sorry, no Windows compiler available):

Code:
#include <stddef.h>
#include <stdarg.h>

size_t sum(size_t n, ...)
{
    va_list ap;
    size_t r = 0;
    va_start(ap, n);
    while (n--)
        r += va_arg(ap, size_t);
    va_end(ap);
    return r;
}

Compiled with -Os:
Code:
   .globl   sum
   .type   sum, @function
sum:
.LFB0:
   .cfi_startproc
   leaq   8(%rsp), %rax
   movq   %rsi, -40(%rsp)
   movq   %r9, -8(%rsp)
   movl   $8, -72(%rsp)
   movq   %rax, -64(%rsp)
   leaq   -48(%rsp), %rax
   movq   %rdx, -32(%rsp)
   leaq   8(%rsp), %rdx
   movq   %rcx, -24(%rsp)
   movl   $8, %ecx
   movq   %r8, -16(%rsp)
   movq   %rax, %r8
   movq   %rax, -56(%rsp)
   xorl   %eax, %eax
.L2:
   decq   %rdi
   cmpq   $-1, %rdi
   je   .L7
   leaq   8(%rdx), %rsi
   cmpl   $47, %ecx
   ja   .L4
   movl   %ecx, %r9d
   movq   %rdx, %rsi
   addl   $8, %ecx
   leaq   (%r8,%r9), %rdx
.L4:
   addq   (%rdx), %rax
   movq   %rsi, %rdx
   jmp   .L2
.L7:
   ret
   .cfi_endproc
.LFE0:
   .size   sum, .-sum


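So you see, it spills the remaining argument registers (RSI, RDX, RCX, R8, R9; RDI holds the named n) into consecutive slots of the register save area. But only God knows why it emits those stores in such a random order.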
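For the Linux (System V) side, the listing lines up with the va_list record that the System V x86-64 psABI defines: gp_offset, fp_offset, overflow_arg_area and reg_save_area. Read that way, `movl $8, -72(%rsp)` sets gp_offset to 8 (the RDI slot is skipped because it holds the named n), -64(%rsp) receives overflow_arg_area (RSP+8, the first stack argument) and -56(%rsp) receives reg_save_area (RSP-48). The model below sketches only the integer path of va_arg; `sysv_va_list_model` and `sysv_va_arg_u64` are hypothetical names, not the compiler's built-in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the System V x86-64 va_list record (field names as in the
 * psABI); the real type is compiler-provided, this is only a sketch. */
typedef struct {
    unsigned int gp_offset;    /* next GP slot in reg_save_area, 0..48 */
    unsigned int fp_offset;    /* next XMM slot (unused in this sketch) */
    void *overflow_arg_area;   /* next stack-passed argument */
    void *reg_save_area;       /* spilled RDI, RSI, RDX, RCX, R8, R9 */
} sysv_va_list_model;

/* Integer-class va_arg, mirroring the `cmpl $47, %ecx; ja .L4` test in
 * the listing: use the save area while gp_offset <= 40, then the stack. */
uint64_t sysv_va_arg_u64(sysv_va_list_model *ap)
{
    uint64_t v;
    assert(ap != NULL);
    if (ap->gp_offset <= 40) {
        memcpy(&v, (unsigned char *)ap->reg_save_area + ap->gp_offset, sizeof v);
        ap->gp_offset += 8;
    } else {
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (unsigned char *)ap->overflow_arg_area + 8;
    }
    return v;
}
```

In `sum()`, the five register-passed values come from the save area, and everything after them comes from the caller's stack.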

_________________
Carpe diem!


 