OSDev.org
https://forum.osdev.org/

What's the best assembly sequences for vm64z indexing?
https://forum.osdev.org/viewtopic.php?f=13&t=36668

Author:  blackoil [ Fri Apr 10, 2020 9:11 am ]
Post subject:  What's the best assembly sequences for vm64z indexing?

E.g. with per-element offsets such as:

e0 = 0;
e1 = 8;
e2 = 16;
...

Author:  nullplan [ Fri Apr 10, 2020 9:45 am ]
Post subject:  Re: What's the best assembly sequences for vm64z indexing?

You are going to have to write a little bit more than that. What is it you wish to do? If I have to google it, I can't answer your question.

Author:  blackoil [ Fri Apr 10, 2020 10:03 am ]
Post subject:  Re: What's the best assembly sequences for vm64z indexing?

I want to form a vm64z memory operand, i.e. [ gpr_base + zmm0 + displacement ].

Loading the index vector from memory first, as in mov zmm0, [ vm64z_index_from_memory ], is a bit slow.

Author:  bzt [ Fri Apr 10, 2020 11:29 am ]
Post subject:  Re: What's the best assembly sequences for vm64z indexing?

blackoil wrote:
I want to form a vm64z memory operand, i.e. [ gpr_base + zmm0 + displacement ].

Loading the index vector from memory first, as in mov zmm0, [ vm64z_index_from_memory ], is a bit slow.
I lost you there. If zmm0 is supposed to be a floating point / SIMD register, then you can't use it as the index of an ordinary memory operand, and you can't load it with "mov". You have to use dedicated instructions like "movaps" (or "vmovaps" for a zmm register) with those registers. Otherwise, you can speed up the read by keeping the data aligned and using prefetch.

Btw, with indexed addressing you can encode a base plus an index scaled by 1, 2, 4 or 8 (a left shift of up to 3 bits) in a single mov instruction (like [rbx + rax*8]), and reading memory with it into a GPR is not slow at all. Read about addressing modes in the Intel manual.
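
As a rough illustration of the scaled-index form mentioned here, the following C sketch models how an operand like [rbx + rax*8] resolves to an address. The helper names effective_address and load_qword are made up for this example and do not come from any library; the sketch only mirrors the address arithmetic and says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models the effective address of the x86 operand
   [base + index*scale + disp]. The hardware only allows scale 1, 2, 4 or 8. */
static uintptr_t effective_address(uintptr_t base, uint64_t index,
                                   unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The displacement is sign-extended before it is added. */
    return base + (uintptr_t)(index * scale) + (uintptr_t)(intptr_t)disp;
}

/* Models "mov rax, [rbx + rax*8]": load the 64-bit element table[i]. */
static uint64_t load_qword(const uint64_t *table, uint64_t i)
{
    uint64_t value;
    uintptr_t ea = effective_address((uintptr_t)table, i, 8, 0);
    memcpy(&value, (const void *)ea, sizeof value);
    return value;
}
```

With a scale of 8, the index counts 64-bit elements rather than bytes, which is why plain array indexing needs no separate shift instruction.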

Cheers,
bzt

Author:  blackoil [ Fri Apr 10, 2020 8:17 pm ]
Post subject:  Re: What's the best assembly sequences for vm64z indexing?

I was using pseudo-code. What I mean is:

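kxnorw k1, k1, k1 ; set all mask bits: AVX-512 gathers require a mask register
vmovdqa64 zmm0, [index64] ; vindex instruction from armv8 can do this without memory read
vgatherqpd zmm1{k1}, [ rbx + zmm0 ] ; zmm0 contains the byte offset of each element of zmm1; k1 is cleared as elements are loaded

align 64 ; vmovdqa64 requires a 64-byte aligned memory operand
index64: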
dq 0
dq 16
dq 32
dq 48
dq 64
dq 80
dq 96
dq 112
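
As a reading aid, the gather above can be modelled in plain C. This is only a scalar sketch of the semantics, assuming a scale of 1 so that the indices are byte offsets; the function names gather_qpd_model and make_stride_indices are invented for this illustration and are not intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills idx[] with 0, stride, 2*stride, ...
   which is the content of the index64 table when stride is 16. */
static void make_stride_indices(int64_t idx[8], int64_t stride)
{
    for (int lane = 0; lane < 8; lane++)
        idx[lane] = (int64_t)lane * stride;
}

/* Scalar model of: vgatherqpd zmm1{k1}, [base + zmm0*1 + disp]
   Each lane reads one double from base + idx[lane] + disp.
   Lanes whose mask bit is clear keep their previous value in dst. */
static void gather_qpd_model(double dst[8], const void *base,
                             const int64_t idx[8], int32_t disp, uint8_t mask)
{
    for (int lane = 0; lane < 8; lane++) {
        if (mask & (1u << lane)) {
            const unsigned char *p =
                (const unsigned char *)base + idx[lane] + disp;
            memcpy(&dst[lane], p, sizeof(double));
        }
    }
}
```

With a stride of 16 bytes and 8-byte elements, the model picks every second double from the source array, which matches the offsets in the table.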

Author:  Octocontrabass [ Mon Apr 13, 2020 10:03 am ]
Post subject:  Re: What's the best assembly sequences for vm64z indexing?

I don't think x86 has any way to do that without an extra memory access to load the indices.

Why is there an 8-byte gap between each of the values you want to load?

All times are UTC - 6 hours