Solar wrote:
Smelling an X-Y problem here... what are you wanting an 80-bit variable for?
Are you, by any chance, trying to store an x86 extended precision floating-point value?
I want to bulk-move 80 bytes of binary data with the FPU (through its eight 10-byte registers).
I suspect it would be (and is) a great addition to CPU-intensive applications. Even old CPUs with an FPU could move 2 more bytes per operation (and have eight 10-byte registers to work with) than with 64-bit instructions.
I already tested it and it works: all bits appear to be preserved (otherwise arbitrary data such as ciphertext could not survive the copy), so it can be used at the very least to move raw data or strings.
But you have to disable truncation/rounding, maybe mask some FPU interrupts, and enable full precision.
It needs no special instructions and is even better than a 64-bit data move loop:
Code:
;It can make 0xFFFFFFFF iterations in
;around 1:09.05 minutes on a dual core
;at 3.4GHz (less than 1:12 minutes).
;;
;It could be extended with all GPRs
;to move through the destination pointer
;every 10 bytes (70 bytes for EAX,
;EBX, ECX, EDX, ESI, EDI, EBP and maybe
;ESP with which we address for the
;80-byte FPU stack)
;instead of subtracting a single one, or a
;2-pointer struct for the initial source and
;destination values for each general-purpose
;register.
;
;Param 1 -- Source register value (back at its initial value on exit)
;Param 2 -- Destination register value (left at +70 bytes, on the last TWORD, on exit)
;;
%macro MICRO_LONG__move_80_bytes_with_x87_FPU 2
;Load the whole FPU stack:
;;
add %1,10*7 ;Move to the last TWORD of the 8
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
sub %1,10
fld tword[%1]
;Copy 10 bytes at a time:
;;
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
add %2,10
fstp tword[%2]
%endmacro
;;INIT: Initialize the FPU for this program
;Initialize the FPU to its default state and configuration
;after checking for pending unmasked floating-point exceptions:
;;
finit
;Wait for any pending FPU operations
;or exceptions and clear any
;pending exceptions:
;;
fclex
;Load new x87 Control Word
;into FPU control register:
;;
fldcw [x87_FPU_New_Control_Word]
;;END: Initialize the FPU for this program
align 4,db 0
x87_FPU_New_Control_Word equ $+ImageBase-__data_RVA_Localize__
dw 0001111110111111b
;Control word bit fields (bit 0 is the rightmost binary digit):
;
;Bit 0 - IM - Invalid Operation Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 1 - DM - Denormalized Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 2 - ZM - Zero Divide Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 3 - OM - Overflow Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 4 - UM - Underflow Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 5 - PM - Precision Interrupt Mask (exception):
;  0: Generate INT/IRQ (disable handling at the FPU).
;  1: Do not generate INT/IRQ (selected at initialization, enable handling at the FPU).
;
;Bit 7 - IEM - Interrupt Enable Mask (global for INT bits 0-5):
;  0: Enable interrupts.
;  1: Disable interrupts (selected at initialization).
;
;Bits 8-9 - PC - Precision Control:
;  00b: 24 bits (REAL4).
;  01b: Unused.
;  10b: 53 bits (REAL8).
;  11b: 64 bits (REAL10, selected at initialization).
;
;Bits 10-11 - RC - Rounding Control:
;  00b: Round to nearest, or to even if equidistant (selected at initialization).
;  01b: Round down (towards -Infinity).
;  10b: Round up (towards +Infinity).
;  11b: Truncate (towards 0).
;
;Bit 12 - IC - Infinity Control (more modern CPUs always use -Infinity and +Infinity):
;  0: Use unsigned Infinity (selected at initialization).
;  1: Respect -Infinity and +Infinity.
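One way to check the bit-preservation claim is a round-trip test like the sketch below. It is only an illustration, not the test described above. It assumes 32-bit Linux (the int 0x80 exit call) and NASM ELF output purely so that it is self-contained. The labels pattern and copy and the byte values are made up. The exit status is 0 if the ten bytes compare equal after fld/fstp, and 1 otherwise.

```nasm
; Hypothetical round-trip check (assumes 32-bit Linux, NASM).
; Build: nasm -f elf32 roundtrip.asm && ld -m elf_i386 -o roundtrip roundtrip.o
; Run:   ./roundtrip ; echo $?

section .data
; Arbitrary 10-byte pattern, not chosen to be a meaningful number.
pattern: db 0xDE,0xAD,0xBE,0xEF,0x01,0x23,0x45,0x67,0x89,0xAB

section .bss
copy:    resb 10

section .text
global _start
_start:
        fninit                  ; default FPU state, all exceptions masked
        fld     tword [pattern] ; load 10 bytes into ST(0)
        fstp    tword [copy]    ; store them back and pop

        xor     ebx, ebx        ; exit status defaults to 0
        mov     esi, pattern
        mov     edi, copy
        mov     ecx, 10
        cld
        repe cmpsb              ; compare byte by byte, stop at first mismatch
        setne   bl              ; bl = 1 if any byte differed

        mov     eax, 1          ; sys_exit (32-bit Linux ABI)
        int     0x80
```

Trying several patterns (all zeros, all ones, NaN-like exponents) would give more confidence than a single value.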
Code:
;INIT: Main benchmark
mov widecx,0
mov widesi,source_10_byte_buffer_or_String_10_byte_end
mov widedi,dest_buffer
align wideword_sz
.tttt:
MICRO_LONG__move_80_bytes_with_x87_FPU widesi,widedi
dec widecx
jnz .tttt
;END: Main benchmark
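For comparison with the "64-bit data move loop" mentioned earlier, a baseline could look roughly like the sketch below. It assumes x86-64 long mode, 64-bit Linux (for the exit syscall) and NASM. The macro name MOVE_80_BYTES_WITH_GPR64, the labels src and dst, and the use of RAX as scratch are hypothetical and not taken from the code above. It copies the same 80 bytes as ten 8-byte moves, so timing it in the same loop gives a like-for-like comparison.

```nasm
; Hypothetical baseline (assumes x86-64 Linux, NASM).
; Build: nasm -f elf64 baseline.asm && ld -o baseline baseline.o

; Param 1 -- Source register (unchanged)
; Param 2 -- Destination register (unchanged)
; Clobbers RAX.
%macro MOVE_80_BYTES_WITH_GPR64 2
%assign copy_off 0
%rep 10
        mov     rax, [%1+copy_off]
        mov     [%2+copy_off], rax
%assign copy_off copy_off+8
%endrep
%endmacro

section .data
src:    times 80 db 0x5A

section .bss
dst:    resb 80

section .text
global _start
_start:
        mov     rsi, src
        mov     rdi, dst
        MOVE_80_BYTES_WITH_GPR64 rsi, rdi

        xor     edx, edx        ; exit status defaults to 0
        mov     ecx, 80
        cld
        repe cmpsb              ; verify the copy byte by byte
        setne   dl              ; dl = 1 if any byte differed

        mov     edi, edx        ; exit status in EDI
        mov     eax, 60         ; sys_exit (64-bit Linux ABI)
        syscall
```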