Dear all,
My own OS supports the SSE extensions. I would like to get the maximum possible throughput out of the CPU with SSE instructions, so I tried reordering the instructions according to their latency and reciprocal throughput as listed in Agner Fog's excellent instruction tables:
https://www.agner.org/optimize/instruction_tables.pdf
Here is a minimal example showing what I am trying to achieve and what I have tried.
listing 1:
Code:
; CPU: Intel Core i7, MICROARCHITECTURE: NEHALEM
movaps xmm0, [some_mem_0] ;latency = 2
addps xmm0, xmm1 ;latency = 3
movss [m32], xmm0 ;latency = 3
movaps xmm2, [some_mem_1] ;latency = 2
addps xmm2, xmm3 ;latency = 3
movss [m32], xmm2 ;latency = 3
; total latency = 16
listing 2:
Code:
; CPU: Intel Core i7, MICROARCHITECTURE: NEHALEM
movaps xmm0, [some_mem_2] ;latency = 2
movaps xmm2, [some_mem_3] ;latency = 2
addps xmm0, xmm1 ;latency = 3
addps xmm2, xmm3 ;latency = 3
movss [m32], xmm0 ;latency = 3
movss [m32], xmm2 ;latency = 3
; total latency = 10
The first listing shows the original instruction order. In the second listing, I reordered the instructions to get a lower total latency. For instance, addps xmm0, xmm1 has a latency of 3 and movss [m32], xmm0 also has a latency of 3. Since they form a dependency chain, their combined latency is 6. If I place a completely independent instruction such as addps xmm2, xmm3 between them, it can execute during the cycles in which the CPU would otherwise be waiting for addps xmm0, xmm1 to finish, so the total latency should be lower.
If I run each of these snippets in a loop of 300,000 iterations, I expect the total latency to drop from 4,800,000 cycles (16 × 300,000) to 3,000,000 cycles (10 × 300,000). I benchmarked both versions, but in the end they took exactly the same time to execute.
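A minimal sketch of the kind of timing loop I mean, in 32-bit NASM syntax. It is not my exact benchmark: the use of RDTSC, the label bench_listing2, the choice of esi/edi/ecx, and the assumption that some_mem_2, some_mem_3 (both 16-byte aligned) and m32 are defined elsewhere are only for illustration.

```asm
; Hypothetical timing harness around listing 2 (32-bit NASM syntax).
; Assumes some_mem_2 / some_mem_3 are 16-byte aligned and m32 is a dword,
; all defined elsewhere. RDTSC is not a serializing instruction, so the
; result is only approximate.
bench_listing2:
    push esi
    push edi
    rdtsc                       ; edx:eax = time-stamp counter at start
    mov esi, eax                ; save low 32 bits
    mov edi, edx                ; save high 32 bits
    mov ecx, 300000             ; iteration count
.next:
    movaps xmm0, [some_mem_2]
    movaps xmm2, [some_mem_3]
    addps xmm0, xmm1
    addps xmm2, xmm3
    movss [m32], xmm0
    movss [m32], xmm2
    dec ecx
    jnz .next                   ; loop until ecx reaches zero
    rdtsc                       ; edx:eax = time-stamp counter at end
    sub eax, esi
    sbb edx, edi                ; edx:eax = elapsed TSC ticks
    pop edi
    pop esi
    ret
```

Swapping the loop body for the instructions of listing 1 gives the other measurement.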
Did I miscalculate the latency counts, or are there other factors limiting the maximum throughput of the CPU in this case?
I would really appreciate it if somebody could explain how to push the CPU to its limit, that is, how to get the maximum throughput by lowering the latency counts.
Best regards.
Iman.