OSDev.org

The Place to Start for Operating System Developers
All times are UTC - 6 hours
 Post subject: Most optimized function to copy/store every 4th byte (SSE2)
Posted: Thu Dec 16, 2021 3:49 pm
Joined: Tue Nov 09, 2021 11:40 am
Posts: 18
Essentially, I'm looking for the fastest way (using nothing newer than SSE2) to split a data stream into four outputs, each holding every fourth byte of the stream, and to do the reverse.

If the data stream were:
ABCDABCDABCDABCD

Then the four outputs would be:
AAAA
BBBB
CCCC
DDDD

Likewise, if I had four data streams:
EEEE
FFFF
GGGG
HHHH

They would be merged into a single output:
EFGHEFGHEFGHEFGH
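
For reference, the two operations described above can be written as plain scalar loops. This is only a minimal sketch in C to pin down the expected behaviour; the names `deinterleave4` and `interleave4` and the parameter `groups` (the number of 4-byte groups) are hypothetical and do not come from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved stream (ABCDABCD...) into four separate outputs.
 * `groups` is the number of 4-byte groups in `src`. */
void deinterleave4(const uint8_t *src, size_t groups,
                   uint8_t *a, uint8_t *b, uint8_t *c, uint8_t *d)
{
    for (size_t i = 0; i < groups; i++) {
        a[i] = src[4 * i + 0];
        b[i] = src[4 * i + 1];
        c[i] = src[4 * i + 2];
        d[i] = src[4 * i + 3];
    }
}

/* Inverse operation: merge four streams into one interleaved stream. */
void interleave4(const uint8_t *a, const uint8_t *b,
                 const uint8_t *c, const uint8_t *d,
                 uint8_t *dst, size_t groups)
{
    for (size_t i = 0; i < groups; i++) {
        dst[4 * i + 0] = a[i];
        dst[4 * i + 1] = b[i];
        dst[4 * i + 2] = c[i];
        dst[4 * i + 3] = d[i];
    }
}
```

A vectorized version should produce exactly the same bytes as these loops, which makes them handy as a test oracle.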


 Post subject: Re: Most optimized function to copy/store every 4th byte (SS
Posted: Thu Dec 16, 2021 8:25 pm
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5069
Why do you want to do this?

If it has anything to do with your other question about image processing, you shouldn't separate the color channels at all - process them in parallel instead.
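
As an illustration of processing the channels in parallel, here is a hedged sketch in C with SSE2 intrinsics that adds a per-channel amount to interleaved 4-byte pixels without ever separating them. The function name `add_per_channel`, the `delta` parameter, and the choice of a saturating add as the operation are assumptions made for the example, not something proposed in the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Add delta[0..3] to channels 0..3 of every 4-byte pixel, saturating at 255.
 * Handles whole 16-byte blocks only; any tail bytes are left untouched.
 * Unaligned loads/stores are used, so `pixels` needs no special alignment. */
void add_per_channel(uint8_t *pixels, size_t bytes, const uint8_t delta[4])
{
    /* Pack the four per-channel deltas into one dword, then broadcast it. */
    uint32_t packed = (uint32_t)delta[0]
                    | ((uint32_t)delta[1] << 8)
                    | ((uint32_t)delta[2] << 16)
                    | ((uint32_t)delta[3] << 24);
    const __m128i d = _mm_set1_epi32((int)packed);

    for (size_t i = 0; i + 16 <= bytes; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i));
        v = _mm_adds_epu8(v, d);  /* paddusb: 16 bytes, all channels at once */
        _mm_storeu_si128((__m128i *)(pixels + i), v);
    }
}
```

Because every byte lane is independent, the interleaved layout costs nothing here: one instruction handles all four channels of four pixels.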


 Post subject: Re: Most optimized function to copy/store every 4th byte (SS
Posted: Fri Dec 17, 2021 12:32 am
Joined: Tue Nov 09, 2021 11:40 am
Posts: 18
Some of the algorithms I've found (like the Gaussian blur one) work on single color channels; I could modify them to work with offsets for each channel, but I can also see this being useful in the future for other algorithms.
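
To make the "offsets for each channel" alternative concrete, here is a rough sketch in C of a single-channel filter that reads and writes interleaved data directly by stepping with a stride of 4. The 1-2-1 kernel is only a stand-in for a real Gaussian blur, and the names `blur_row_channel` and `STRIDE` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { STRIDE = 4 };  /* bytes per pixel in the interleaved layout */

/* Apply a horizontal 1-2-1 blur to one channel of a row of interleaved
 * pixels. `channel` is the byte offset (0..3) within each pixel. Only that
 * channel of `dst` is written; edge pixels reuse their own value for the
 * missing neighbour. `src` and `dst` must not overlap. */
void blur_row_channel(const uint8_t *src, uint8_t *dst,
                      size_t pixels, size_t channel)
{
    for (size_t x = 0; x < pixels; x++) {
        size_t left  = (x == 0) ? x : x - 1;
        size_t right = (x + 1 == pixels) ? x : x + 1;
        unsigned sum = src[left * STRIDE + channel]
                     + 2u * src[x * STRIDE + channel]
                     + src[right * STRIDE + channel];
        dst[x * STRIDE + channel] = (uint8_t)((sum + 2) / 4);  /* rounded */
    }
}
```

Calling it once per channel (0 to 3) processes the whole row without any intermediate planar copies.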


 Post subject: Re: Most optimized function to copy/store every 4th byte (SS
Posted: Fri Dec 17, 2021 7:05 am
Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
The most optimized way to copy is to avoid the copy, so... yes, context matters. 8)

_________________
Every good solution is obvious once you've found it.


 Post subject: Re: Most optimized function to copy/store every 4th byte (SS
Posted: Fri Dec 17, 2021 10:50 am
Joined: Sat Nov 21, 2009 5:11 pm
Posts: 852
This is the best I could come up with.

Extracting:

Code:
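; Initial conditions: ecx = pixel count (a nonzero multiple of 4), esi = source (16-byte aligned),
; edx = channel 1, ebx = channel 2, edi = channel 3, ebp = channel 4
shr ecx, 2              ; four pixels (16 bytes) per iteration
sub ebx, edx            ; make channels 2-4 offsets relative to channel 1
sub edi, edx
sub ebp, edx
sub edx, 4              ; compensate for the add at the top of the loop
mov eax, 0ffh
movd xmm4, eax
pshufd xmm4, xmm4, 0    ; xmm4 = 000000ffh in all four dwords
extractloop:
add edx, 4
movdqa xmm0, [esi]      ; load four interleaved pixels
movdqa xmm1, xmm0
movdqa xmm2, xmm0
movdqa xmm3, xmm0
psrld xmm1, 8           ; bring each channel's byte down to bits 0-7 of its dword
psrld xmm2, 16
psrld xmm3, 24
pand xmm0, xmm4         ; keep only that byte
pand xmm1, xmm4
pand xmm2, xmm4
pand xmm3, xmm4
packssdw xmm0, xmm0     ; dwords -> words (values are 0-255, so nothing saturates)
packssdw xmm1, xmm1
packssdw xmm2, xmm2
packssdw xmm3, xmm3
packuswb xmm0, xmm0     ; words -> bytes; the low dword now holds four channel bytes
packuswb xmm1, xmm1
packuswb xmm2, xmm2
packuswb xmm3, xmm3
add esi, 16
dec ecx                 ; movd does not touch the flags, so jnz still sees this result
movd [edx], xmm0
movd [edx+ebx], xmm1
movd [edx+edi], xmm2
movd [edx+ebp], xmm3
jnz extractloop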
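
The same shift, mask and pack sequence can be sketched with SSE2 intrinsics in C, which is easier to test against a scalar loop. This is an approximation of the extraction loop above rather than a translation of it: the names `extract4_sse2` and `pack_low_bytes` are hypothetical, and unaligned loads are assumed so that the source buffer needs no particular alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* `v` holds four dwords, each in the range 0..255. Narrow them to four
 * bytes and return them packed in one 32-bit value. */
static uint32_t pack_low_bytes(__m128i v)
{
    v = _mm_packs_epi32(v, v);   /* packssdw: dwords -> words */
    v = _mm_packus_epi16(v, v);  /* packuswb: words -> bytes  */
    return (uint32_t)_mm_cvtsi128_si32(v);
}

/* Split `pixels` interleaved 4-byte pixels into four channel buffers.
 * Only whole groups of four pixels are processed. */
void extract4_sse2(const uint8_t *src, size_t pixels,
                   uint8_t *c0, uint8_t *c1, uint8_t *c2, uint8_t *c3)
{
    const __m128i mask = _mm_set1_epi32(0xff);

    for (size_t i = 0; i + 4 <= pixels; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + 4 * i));
        uint32_t p0 = pack_low_bytes(_mm_and_si128(v, mask));
        uint32_t p1 = pack_low_bytes(_mm_and_si128(_mm_srli_epi32(v, 8), mask));
        uint32_t p2 = pack_low_bytes(_mm_and_si128(_mm_srli_epi32(v, 16), mask));
        /* A logical shift by 24 already clears the upper bits. */
        uint32_t p3 = pack_low_bytes(_mm_srli_epi32(v, 24));
        memcpy(c0 + i, &p0, 4);
        memcpy(c1 + i, &p1, 4);
        memcpy(c2 + i, &p2, 4);
        memcpy(c3 + i, &p3, 4);
    }
}
```

The mask before `packssdw` matters: without it, dwords above 32767 would saturate instead of being truncated to their low byte.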


Merging:

Code:
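; Initial conditions: ecx = pixel count (a nonzero multiple of 4), esi = destination (16-byte aligned),
; edx = channel 1, ebx = channel 2, edi = channel 3, ebp = channel 4
shr ecx, 2              ; four pixels (16 bytes) per iteration
sub esi, 16             ; compensate for the add at the top of the loop
sub ebx, edx            ; make channels 2-4 offsets relative to channel 1
sub edi, edx
sub ebp, edx
mergeloop:
add esi, 16
movd xmm0, [edx]        ; four bytes of channel 1
movd xmm2, [edx+ebx]    ; four bytes of channel 2
punpcklbw xmm0, xmm2    ; interleave bytes: 1 2 1 2 1 2 1 2
movd xmm1, [edx+edi]    ; four bytes of channel 3
movd xmm3, [edx+ebp]    ; four bytes of channel 4
punpcklbw xmm1, xmm3    ; interleave bytes: 3 4 3 4 3 4 3 4
punpcklwd xmm0, xmm1    ; interleave words: 1 2 3 4 1 2 3 4 ...
movdqa [esi], xmm0
add edx, 4
dec ecx
jnz mergeloop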
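
A comparable sketch of the merge step with SSE2 intrinsics in C follows. As before, the names `merge4_sse2` and `load4` are hypothetical, and an unaligned store is assumed so that the destination needs no particular alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load four bytes into the low dword of an XMM register (movd). */
static __m128i load4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);
    return _mm_cvtsi32_si128((int)v);
}

/* Merge four channel buffers into `pixels` interleaved 4-byte pixels.
 * Only whole groups of four pixels are processed. */
void merge4_sse2(const uint8_t *c0, const uint8_t *c1,
                 const uint8_t *c2, const uint8_t *c3,
                 uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i + 4 <= pixels; i += 4) {
        /* punpcklbw: byte pairs (c0,c1) and (c2,c3) for four pixels each */
        __m128i lo = _mm_unpacklo_epi8(load4(c0 + i), load4(c1 + i));
        __m128i hi = _mm_unpacklo_epi8(load4(c2 + i), load4(c3 + i));
        /* punpcklwd: alternate the pairs to get c0 c1 c2 c3 per pixel */
        __m128i out = _mm_unpacklo_epi16(lo, hi);
        _mm_storeu_si128((__m128i *)(dst + 4 * i), out);
    }
}
```

Two levels of unpacking are enough because each level doubles the width of the interleaved unit: bytes into byte pairs, then byte pairs into complete pixels.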


 Post subject: Re: Most optimized function to copy/store every 4th byte (SS
Posted: Fri Dec 17, 2021 12:35 pm
Joined: Tue Nov 09, 2021 11:40 am
Posts: 18
Gigasoft wrote:
This is the best I could come up with.


Thank you so much!

