OSDev.org
https://forum.osdev.org/

Proper way to write inline assembly
https://forum.osdev.org/viewtopic.php?f=13&t=36822
Page 1 of 1

Author:  sunnysideup [ Mon Jun 01, 2020 7:58 am ]
Post subject:  Proper way to write inline assembly

Check out this code segment:

Code:
void put_zmm_value()
{
  struct zmm_value buffer;

  asm volatile(
      "vmovdqa64   %%zmm0, (%[buffer]) \n": :[buffer]"r"(buffer):"%zmm0");

  for(int i=0;i<8;i++)
           printf("%lx ",buffer.word[i]);
}


I'm wondering whether my inline assembly code is correct and will work at all optimization levels.

Author:  Octocontrabass [ Mon Jun 01, 2020 8:30 am ]
Post subject:  Re: Proper way to write inline assembly

Probably not, since it doesn't appear to do anything useful. What is it supposed to do?

Author:  sunnysideup [ Mon Jun 01, 2020 8:40 am ]
Post subject:  Re: Proper way to write inline assembly

Correction:
Code:
void put_zmm_value()
{
  struct zmm_value buffer;

  asm volatile(
      "vmovdqa64   %%zmm0, %[buffer] \n": :[buffer]"m"(buffer):"%zmm0");

  for(int i=0;i<8;i++)
           printf("%lx ",buffer.word[i]);
}



The previous snippet doesn't even compile.
The purpose of this code is to print out the value of the zmm0 register. Nothing else!

Note: buffer is cache-line aligned, so there are no alignment issues there and I won't get a segfault.

Author:  Octocontrabass [ Mon Jun 01, 2020 9:04 am ]
Post subject:  Re: Proper way to write inline assembly

sunnysideup wrote:
The purpose of this code is to print out the value of the zmm0 register.

Why? According to the ABI, there is nothing useful in zmm0 at this point in time.

Author:  sunnysideup [ Mon Jun 01, 2020 9:31 am ]
Post subject:  Re: Proper way to write inline assembly

I want to get familiar with zmm0, as it is important for implementing a fast memcpy and so on. I also manually 'fill' zmm0 using:

Code:
struct zmm_value
{
  uint64_t word[8];
} __attribute__((packed)) __attribute__ ((aligned(64)));

void set_zmm_value(struct zmm_value* val_address)
{
  asm volatile
    ("vmovntdqa (%[val_address]),%%zmm0\n":: [val_address]"r"(val_address):"%zmm0");
}

Author:  Octocontrabass [ Mon Jun 01, 2020 10:32 am ]
Post subject:  Re: Proper way to write inline assembly

That won't work. The compiler is free to do whatever it wants with zmm0 outside your asm block.

If you want to move data through zmm registers, you must either load and store within the same asm block, or you must tell the compiler to do the load/store on your behalf by passing around __m512/__m512d/__m512i values.
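As a rough illustration of the first option (load and store within the same asm block), here is a minimal sketch. To keep it runnable on any x86-64 machine it uses xmm0 and movdqa as a stand-in for zmm0 and vmovdqa64. The names xmm_value and copy_via_xmm0 are made up for this example. Note that the destination is passed as an "=m" output operand, so the compiler knows it is written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte analogue of struct zmm_value (assumption for this sketch). */
struct xmm_value {
    uint64_t word[2];
} __attribute__((aligned(16)));

/* Copy *src to *dst through xmm0. The load and the store sit in the same
 * asm statement, so the compiler cannot touch xmm0 in between them. */
static void copy_via_xmm0(struct xmm_value *dst, const struct xmm_value *src)
{
#if defined(__x86_64__)
    __asm__ volatile(
        "movdqa %[src], %%xmm0\n\t"   /* aligned 16-byte load into xmm0 */
        "movdqa %%xmm0, %[dst]\n\t"   /* aligned 16-byte store from xmm0 */
        : [dst] "=m"(*dst)            /* output: memory written by the asm */
        : [src] "m"(*src)             /* input: memory read by the asm */
        : "xmm0");                    /* xmm0 is overwritten */
#else
    /* Fallback so the sketch still builds on other architectures. */
    memcpy(dst, src, sizeof *dst);
#endif
}
```

For zmm0 the same shape should apply on a CPU and toolchain with AVX-512 support: a 64-byte-aligned struct, vmovdqa64 in place of movdqa, and "zmm0" in the clobber list.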

Author:  sunnysideup [ Mon Jun 01, 2020 10:57 am ]
Post subject:  Re: Proper way to write inline assembly

Makes sense

Author:  sunnysideup [ Sun Jun 28, 2020 6:21 am ]
Post subject:  Re: Proper way to write inline assembly

Alright, moving on to something new here: I've often seen this:
Code:
  asm volatile("" ::: "memory"); 

in C code that is compiled using gcc. What is its significance and what is the concept here? Is this extended asm? I can't wrap my head around inline assembly in gcc. Any good resources?

Also, what's the difference between __asm and just asm?

Author:  Korona [ Sun Jun 28, 2020 9:41 am ]
Post subject:  Re: Proper way to write inline assembly

That's a memory barrier for the compiler.¹ Yes, it is extended asm. The memory clobber forces all loads/stores to globally visible variables that occur before/after the barrier in program order to happen before/after the barrier.

__asm can be used in contexts where asm is not available (e.g., because somebody chose to #define asm) but other than that, there is no difference.

¹ But not for the CPU! On some architectures (e.g., ARM), that makes a vast difference.
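To make the footnote concrete, here is a small sketch contrasting the two kinds of barrier. compiler_barrier, cpu_barrier, publish, payload and ready are names invented for this example. __atomic_thread_fence is the GCC/Clang builtin, which also emits a hardware fence where the architecture needs one.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, only constrains the compiler. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Compiler barrier plus CPU fence (GCC/Clang builtin). */
static inline void cpu_barrier(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static uint32_t payload;
static uint32_t ready;

/* Single-threaded demo of the ordering intent: payload is written before ready.
 * With compiler_barrier() alone, a weakly ordered CPU could still make the two
 * stores visible to another core in the opposite order. Real cross-thread code
 * would normally use C11 atomics instead of plain variables. */
static void publish(uint32_t value)
{
    payload = value;
    cpu_barrier();
    ready = 1;
}
```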

Author:  sunnysideup [ Sun Jun 28, 2020 11:35 am ]
Post subject:  Re: Proper way to write inline assembly

Korona wrote:
That's a memory barrier for the compiler.¹ Yes, it is extended asm. The memory clobber forces all loads/stores to globally visible variables that occur before/after the barrier in program order to happen before/after the barrier.


Alright, I understand why it's used for now: as a way to ensure that the compiler does no compile-time reordering across this 'barrier'.

However, why does it work this way? Or as a mathematician would say - can you derive it from first principles? :lol:

I've also read this piece of code:
Code:
static void force_read(uint8_t *p) {
    asm volatile("" : : "r"(*p) : "memory");
}


It's supposed to force a read from memory location p. But how does it work, i.e. why does gcc make it work that way?

Author:  nullplan [ Sun Jun 28, 2020 2:06 pm ]
Post subject:  Re: Proper way to write inline assembly

sunnysideup wrote:
However, why does it work this way?
It's an assembler statement with a memory clobber. The fact that it's empty is incidental to this. The memory clobber tells GCC that this statement will change "memory", but not which memory and in what way it will be changed. Therefore, GCC cannot assume anything about the state of memory, and must write all changes to memory before the statement, and read all things that are in memory again after the statement.
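A minimal sketch of what that means in practice, assuming a plain (non-volatile) global that something else may change. shared_flag and wait_for_flag are hypothetical names for this example.

```c
/* Plain global; assume an interrupt handler or similar may set it. */
static int shared_flag;

/* Spin up to max_spins times waiting for shared_flag to become nonzero.
 * Returns the number of spins before the flag was seen, or -1 on timeout. */
static int wait_for_flag(int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (shared_flag)
            return i;
        /* Memory clobber: GCC may not assume shared_flag is unchanged,
         * so it has to load it from memory again on the next iteration. */
        __asm__ volatile("" ::: "memory");
    }
    return -1;
}
```

Without the clobber, the compiler would be entitled to load shared_flag once and reuse that value for every iteration.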

sunnysideup wrote:
I've also read this piece of code:
Code:
static void force_read(uint8_t *p) {
    asm volatile("" : : "r"(*p) : "memory");
}


It's supposed to force a read from memory location p. But how does it work, i.e. why does gcc make it work that way?

This time it is a memory clobber and an input constraint. So in addition to the above, this statement requires that the value of "*p" be put into a register beforehand. The statement is empty and doesn't do anything with the value, but GCC doesn't know that, and is therefore forced to emit a read of this memory location. And since memory is clobbered, repeated reads of this location each have to be performed, since the value might have changed in the meantime.
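One place such a helper might be used is touching one byte per page of a buffer so that every page gets read. A sketch, where touch_pages and its page_size parameter are assumptions for illustration only:

```c
#include <stddef.h>
#include <stdint.h>

/* Same idea as the function quoted above: the "r" input makes GCC load *p. */
static void force_read(const uint8_t *p)
{
    __asm__ volatile("" : : "r"(*p) : "memory");
}

/* Hypothetical helper: read one byte at the start of each page in
 * [buf, buf + len). Returns how many reads were issued. */
static size_t touch_pages(const uint8_t *buf, size_t len, size_t page_size)
{
    size_t touched = 0;
    for (size_t off = 0; off < len; off += page_size) {
        force_read(buf + off);
        touched++;
    }
    return touched;
}
```

The values read are discarded; only the fact that the loads are emitted matters here.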

Author:  Octocontrabass [ Sun Jun 28, 2020 2:47 pm ]
Post subject:  Re: Proper way to write inline assembly

sunnysideup wrote:
However, why does it work this way? Or as a mathematician would say - can you derive it from first principles? :lol:

Because the GCC developers say so. :lol:
Quote:
Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

Here's the part of the manual that explains it.

nullplan wrote:
And since memory is clobbered, repeated reads of this location each have to be performed, since the value might have changed in the meantime.

But this holds true only if you use functions like this one with memory barriers to access that location. If you also access it without a memory barrier and the function gets inlined, the read may be combined with prior accesses. There is also no guarantee that the read will occur after all prior statements if the function is inlined.
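For comparison, here is a sketch of one common alternative that was not proposed in this thread: reading through a volatile-qualified pointer. The compiler has to perform every volatile access, so such reads are not merged with each other even after inlining. Note that volatile alone does not order the surrounding non-volatile accesses. read_once and sum_three_reads are hypothetical names.

```c
#include <stdint.h>

/* Each call performs a real load from *p, because the access is volatile. */
static inline uint8_t read_once(const uint8_t *p)
{
    return *(const volatile uint8_t *)p;
}

/* Three separate loads are emitted; the compiler may not fold them into one. */
static unsigned sum_three_reads(const uint8_t *p)
{
    return (unsigned)read_once(p) + read_once(p) + read_once(p);
}
```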

All times are UTC - 6 hours