blackoil wrote:
to form [ gpr_base + zmm0 + displacement ], vm64z.
It's a bit slow to use instruction mov zmm0, [ vm64z_index_from_memory ]
I lost you there. If zmm0 is supposed to be a floating point / SIMD register, then you can't use it as an index in a normal memory operand (only the gather/scatter instructions accept a vector index), and you can't use "mov" with it. You have to use special instructions like "movaps" (or "vmovaps" for the zmm registers). Otherwise you can speed up the read by using only aligned values and prefetching.
Btw, with indexed addressing you can encode a base plus an index scaled by 1, 2, 4 or 8 (that is, shifted left by up to 3 bits) in a single mov instruction (like [rbx + rax*8]), and reading memory with it into a GPR is not slow at all. Read about addressing modes in the Intel spec.
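To make the [rbx + rax*8] form concrete, here is a minimal C sketch of what that operand computes. The function names (effective_address, load_qword) are made up for illustration, and the instruction mentioned in the comment is only what an optimizing x86-64 compiler typically emits for such an access, not something guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models the address computed by a scaled-index
 * operand, base + (index << shift) + disp. The shift may only be 0..3,
 * which corresponds to the scale factors 1, 2, 4 and 8. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned shift, int32_t disp)
{
    assert(shift <= 3);
    return base + (index << shift) + (uintptr_t)(intptr_t)disp;
}

/* Hypothetical helper: loads one 64-bit element. With optimization on
 * x86-64 (System V ABI), compilers typically turn this into a single
 * load along the lines of: mov rax, [rdi + rsi*8] */
uint64_t load_qword(const uint64_t *base, size_t index)
{
    return base[index];
}
```

For example, with a table of four 64-bit values, load_qword(table, 2) reads the third element, and effective_address((uintptr_t)table, 2, 3, 0) gives the same address as &table[2].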
Cheers,
bzt