OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: sse2 blit and conversions
PostPosted: Thu Aug 25, 2016 9:46 am 

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 536
I am trying to implement SSE2 blitting and have run into a problem.

If I take a 32-bit RGBA value from each image, and I want to perform alpha blending and save the result in the destination image, the first thing I would have to do, to use SSE2, would be to convert this vector of 8-bit integers into four packed single-precision floating-point values, in the range 0-1.

I know I can scale the values to 0-1 by just multiplying by (1/255, 1/255, 1/255, 1/255). But how can I convert the 8-bit integers into floating-point values efficiently?

Or am I looking at this the wrong way entirely?

Thank you for your help.

_________________
Glidix: An x86_64 POSIX-compliant operating system, aiming to be as optimized as possible, especially in graphics.
https://github.com/madd-games/glidix


 Post subject: Re: sse2 blit and conversions
PostPosted: Thu Aug 25, 2016 10:25 am 

Joined: Mon Apr 09, 2007 12:10 pm
Posts: 771
Location: London, UK
It's not easy using only SSE2. If you can use SSE4.1, then you can use pmovzxbd to do the load and zero-extension, then cvtdq2ps to do the conversion. Otherwise, for SSE2 it's a bit clunky. (Untested) code:
Code:
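load_func:
    pxor xmm1, xmm1         ; xmm1 = 0, no memory constant needed
    movd xmm0, [data]       ; load one 32-bit RGBA pixel
    punpcklbw xmm0, xmm1    ; 4 bytes -> 4 zero-extended words
    punpcklwd xmm0, xmm1    ; 4 words -> 4 zero-extended dwords
    cvtdq2ps xmm0, xmm0     ; 4 dwords -> 4 singles in 0.0-255.0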
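As a rough illustration of the SSE2 sequence above, here is a sketch in C using compiler intrinsics rather than assembly; each call corresponds to the instruction named in its comment. The helper name pixel_to_float is made up for this example, and the final multiply by 1/255 is the scaling step from the original question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (name invented for this sketch): expand one packed
 * 8-bit-per-channel pixel into four floats in the range 0.0-1.0, SSE2 only. */
static __m128 pixel_to_float(uint32_t pixel)
{
    const __m128i zero = _mm_setzero_si128();          /* pxor      */
    __m128i v = _mm_cvtsi32_si128((int)pixel);         /* movd      */
    v = _mm_unpacklo_epi8(v, zero);                    /* punpcklbw */
    v = _mm_unpacklo_epi16(v, zero);                   /* punpcklwd */
    __m128 f = _mm_cvtepi32_ps(v);                     /* cvtdq2ps  */
    return _mm_mul_ps(f, _mm_set1_ps(1.0f / 255.0f));  /* scale to 0.0-1.0 */
}
```

Lane 0 of the result holds the lowest byte of the pixel; which channel that is depends on your framebuffer format.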
Note that you should probably keep the graphics data in floating-point format (in the range 0.0-1.0) for the entire data path and only convert back to integers at the very end.

For example, if you are blitting images, you could do the conversion to floating point when you load them into a device context (or whatever your equivalent is), so it only needs to be done once.
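Once the pixels are held as floats, the blend itself is only a few SSE instructions. The sketch below is one possible version, not something taken from the thread: it assumes straight (non-premultiplied) alpha stored in lane 3 of the source pixel, and the helper name blend_over is hypothetical.

```c
#include <emmintrin.h>  /* also pulls in the SSE float intrinsics */

/* Hypothetical helper: blend one float pixel over another.
 * Assumption: straight (non-premultiplied) alpha lives in lane 3 of src. */
static __m128 blend_over(__m128 src, __m128 dst)
{
    /* Broadcast the source alpha to all four lanes (shufps). */
    __m128 a = _mm_shuffle_ps(src, src, _MM_SHUFFLE(3, 3, 3, 3));
    /* dst + (src - dst) * a is the same as src*a + dst*(1 - a).
     * The alpha lane is blended with the same formula as the colour lanes. */
    return _mm_add_ps(dst, _mm_mul_ps(_mm_sub_ps(src, dst), a));
}
```

Writing the blend as dst + (src - dst) * a saves one multiply compared with computing src*a and dst*(1 - a) separately.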

Regards,
John.

_________________
Tysos | rpi-boot | EFI tools


 Post subject: Re: sse2 blit and conversions
PostPosted: Thu Aug 25, 2016 12:30 pm 

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 536
Is there also an equally straightforward way to convert back to bytes?



 Post subject: Re: sse2 blit and conversions
PostPosted: Thu Aug 25, 2016 12:59 pm 

Joined: Mon Apr 09, 2007 12:10 pm
Posts: 771
Location: London, UK
Again, if you are not constrained to SSE2, it's easy.

If you must use sse2, then something like the following (untested again, essentially the reverse of the above):
Code:
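toint_func:
    pxor xmm1, xmm1                       ; xmm1 = 0
    movups xmm0, [packed_single_value]    ; load all four singles (128 bits)
    ; values must be in 0.0-255.0 here; multiply by 255 first if you keep 0.0-1.0
    cvtps2dq xmm0, xmm0                   ; 4 singles -> 4 dwords (rounded)
    packssdw xmm0, xmm1                   ; dwords -> words (packusdw is SSE4.1, not SSE2)
    packuswb xmm0, xmm1                   ; words -> bytes, clamped to 0-255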
You should then have the packed 32-bit pixel in the low 32 bits of xmm0. How you write that to display memory is up to you. For a single pixel, movd [dest], xmm0 stores exactly the low 32 bits. A movq store writes 64 bits, so it also overwrites the following pixel; that is fine if you are filling the framebuffer from position 0 upwards (repeated movqs), but not otherwise.

If you know the next pixel too, put it into xmm1 for the first pack instruction (and use zero for the second), and you'll have two 32-bit pixels in the low half of xmm0, which you can movq to the screen even quicker. You can optimise further and get four packed 32-bit pixels into xmm0 if you like (packssdw xmm0, xmm1; packssdw xmm2, xmm3; packuswb xmm0, xmm2) and write them with a single 128-bit store. Note that packssdw is the SSE2 instruction for the first step; packusdw needs SSE4.1.
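The conversion back to bytes can be sketched in C intrinsics as follows, again SSE2 only. The helper names are invented for this example, and the multiply by 255 assumes the pixels were kept in the 0.0-1.0 range as suggested earlier in the thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: convert one float pixel (channels in 0.0-1.0)
 * back to a packed 8-bit-per-channel value. */
static uint32_t float_to_pixel(__m128 f)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtps_epi32(_mm_mul_ps(f, _mm_set1_ps(255.0f))); /* cvtps2dq */
    v = _mm_packs_epi32(v, zero);           /* packssdw: dwords -> words */
    v = _mm_packus_epi16(v, zero);          /* packuswb: words -> bytes, clamped */
    return (uint32_t)_mm_cvtsi128_si32(v);  /* movd: low 32 bits only */
}

/* Hypothetical helper: pack four float pixels and write them with one
 * unaligned 16-byte store instead of four 4-byte stores. */
static void store_4_pixels(uint8_t *dest, __m128 p0, __m128 p1,
                           __m128 p2, __m128 p3)
{
    const __m128 scale = _mm_set1_ps(255.0f);
    __m128i lo = _mm_packs_epi32(_mm_cvtps_epi32(_mm_mul_ps(p0, scale)),
                                 _mm_cvtps_epi32(_mm_mul_ps(p1, scale)));
    __m128i hi = _mm_packs_epi32(_mm_cvtps_epi32(_mm_mul_ps(p2, scale)),
                                 _mm_cvtps_epi32(_mm_mul_ps(p3, scale)));
    _mm_storeu_si128((__m128i *)dest, _mm_packus_epi16(lo, hi));
}
```

Because packuswb saturates, channel values that drift slightly outside 0.0-1.0 during blending are clamped to 0 or 255 rather than wrapping around.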

Regards,
John.



 Post subject: Re: sse2 blit and conversions
PostPosted: Thu Aug 25, 2016 1:04 pm 

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 536
Considering SSE4.1 is supported (AFAIK) on most CPUs made in the last several years (Intel since Penryn, around 2008; AMD since Bulldozer, 2011), I think I can use that too.
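For comparison, the SSE4.1 load path mentioned earlier in the thread might look like the sketch below, written with GCC/Clang intrinsics. The helper names are hypothetical. The target attribute is a GCC/Clang extension that lets a single function use SSE4.1 without a global compiler flag, and __builtin_cpu_supports is a hosted-environment convenience; a kernel would more likely test CPUID leaf 1, ECX bit 19 itself.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#include <stdint.h>

/* Hypothetical helper, SSE4.1 path: pmovzxbd replaces the two unpack steps.
 * The target attribute (GCC/Clang extension) enables SSE4.1 for this
 * function only. */
static __attribute__((target("sse4.1"))) __m128 pixel_to_float_sse41(uint32_t pixel)
{
    __m128i v = _mm_cvtepu8_epi32(_mm_cvtsi32_si128((int)pixel)); /* pmovzxbd */
    return _mm_mul_ps(_mm_cvtepi32_ps(v),            /* cvtdq2ps */
                      _mm_set1_ps(1.0f / 255.0f));   /* scale to 0.0-1.0 */
}

/* Hypothetical runtime check using a GCC/Clang builtin (hosted code only). */
static int have_sse41(void)
{
    return __builtin_cpu_supports("sse4.1") != 0;
}
```

Call pixel_to_float_sse41 only after have_sse41 (or your own CPUID check) has returned nonzero, and fall back to the SSE2 unpack sequence otherwise.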


