flat assembler
Message board for the users of flat assembler.

SSE2 add 32 bits parts of register xmm1

Roman



Joined: 21 Apr 2012
Posts: 826
Roman
I have a buffer of 32-bit colors.
I load it with movups xmm1,[buffer].
Now register xmm1 holds 4 colors, 32 bits each.

How can I add all 4 colors in xmm1 together?
Is there a single SSE instruction for this? Right now I use pshufd xmm2,xmm1,1, pshufd xmm3,xmm1,2 and pshufd xmm4,xmm1,3, then add xmm1, xmm2, xmm3 and xmm4 together, but it's very slow.
Post 14 Aug 2013, 18:07
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
Roman,

Did you mean something like a horizontal add? Use phaddd xmm1, xmm0 twice with xmm0 zeroed, and you're all set. It's SSSE3, though; please state the constraints clearly (your reference to pshufd implies SSE2 at least).
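
To make the lane movement concrete, here is a minimal sketch of that double phaddd as a complete fasm program. The Linux x86-64 ELF wrapper, the colors label and the sample values are assumptions added for illustration; only the pxor/phaddd lines correspond to the suggestion above, and phaddd requires SSSE3.

```asm
; Sketch only: assumes Linux x86-64 and an SSSE3-capable CPU.
; "colors" and its contents are made-up sample data.
format ELF64 executable 3
entry start

segment readable executable
start:
        movdqu  xmm1, [colors]          ; d c b a   (lane 3 .. lane 0)
        pxor    xmm0, xmm0              ; 0 0 0 0
        phaddd  xmm1, xmm0              ; 0 0 c+d a+b
        phaddd  xmm1, xmm0              ; 0 0 0 a+b+c+d
        movd    edi, xmm1               ; lane 0 -> exit status (only the low 8 bits survive)
        mov     eax, 60                 ; Linux sys_exit
        syscall

segment readable
colors  dd 1, 2, 3, 4                   ; sample dwords, total = 10
```

If it assembles and runs as intended, the exit status (echo $?) should be 10 for this sample data.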

Should those colors be added component-wise? Or are they colorful shades of grayscale? What about saturation?
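
If the answer turns out to be "component-wise, with saturation", one possible SSE2-only sketch is a shuffle-and-add reduction that uses paddusb (unsigned saturating byte add), so each 8-bit channel clamps at 255 instead of wrapping. The Linux x86-64 ELF wrapper, the colors label and the byte values are assumptions for illustration, not something stated in the thread.

```asm
; Sketch only: assumes Linux x86-64; "colors" holds four made-up 4-byte colors.
format ELF64 executable 3
entry start

segment readable executable
start:
        movdqu  xmm0, [colors]          ; 3 2 1 0   (four packed colors)
        pshufd  xmm1, xmm0, 1110b       ; _ _ 3 2
        paddusb xmm1, xmm0              ; _ _ 3+1 2+0   per byte, clamped at 255
        pshufd  xmm0, xmm1, 1           ; _ _ _ 3+1
        paddusb xmm0, xmm1              ; lane 0 = 3+2+1+0 per byte, clamped at 255
        movd    edi, xmm0               ; packed result -> edi
        mov     eax, 60                 ; Linux sys_exit
        syscall

segment readable
colors  db 200, 10, 1, 0                ; color 0
        db 100, 20, 2, 0                ; color 1
        db   0, 30, 3, 0                ; color 2
        db   0, 40, 4, 0                ; color 3
; channel 0: 200+100 exceeds 255, so it clamps to 255
; channel 1: 10+20+30+40 = 100; channel 2: 1+2+3+4 = 10
```

Because the adds are unsigned and saturating, clamping the partial sums gives the same result as clamping the full four-way sum.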

[EDIT] Oh, I'm sorry, missed SSE2 referred in subject. [/EDIT]
Post 14 Aug 2013, 22:09
asmdev



Joined: 21 Dec 2006
Posts: 18
asmdev
Code:
        movdqa  xmm0, [mem]             ; 3 2 1 0   ([mem] must be 16-byte aligned)
        pshufd  xmm1, xmm0, 1110b       ; _ _ 3 2
        paddb   xmm1, xmm0              ; _ _ 3+1 2+0     (change to "paddd" if adding dwords)
        pshufd  xmm0, xmm1, 1           ; _ _ _ 3+1
        paddb   xmm0, xmm1              ; _ _ _ 3+2+1+0   (change to "paddd" if adding dwords)
    

Code:
        movdqa  xmm0, [mem]             ; 3 2 1 0   (first vector)
        movdqa  xmm2, [mem+16]          ; second vector, reduced the same way
        pshufd  xmm1, xmm0, 1110b       ; _ _ 3 2
        pshufd  xmm3, xmm2, 1110b
        paddb   xmm1, xmm0              ; change to "paddd" if adding dwords
        paddb   xmm3, xmm2
        pshufd  xmm0, xmm1, 1
        pshufd  xmm2, xmm3, 1
        paddb   xmm0, xmm1              ; change to "paddd" if adding dwords
        paddb   xmm2, xmm3              ; lane 0 of xmm0 and of xmm2 now hold the two totals
    
I don't recommend unrolling any further.

If you are adding dwords, consider 4 regular "add" instructions in a row instead, provided that EITHER the memory is already cached OR you would otherwise be using movups rather than movdqa.
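
A minimal sketch of that scalar alternative, for comparison. It uses one mov plus three add rather than literally four add instructions, and the Linux x86-64 ELF wrapper, the colors label and the sample values are assumptions for illustration.

```asm
; Sketch only: assumes Linux x86-64; "colors" is made-up sample data.
format ELF64 executable 3
entry start

segment readable executable
start:
        mov     edi, [colors]           ; a
        add     edi, [colors+4]         ; a+b
        add     edi, [colors+8]         ; a+b+c
        add     edi, [colors+12]        ; a+b+c+d
        mov     eax, 60                 ; Linux sys_exit, status = low 8 bits of edi
        syscall

segment readable
colors  dd 1, 2, 3, 4                   ; sample dwords, total = 10
```

The scalar loads have no alignment requirement, and the result is already in a general-purpose register. Whether this beats the SSE2 version depends on the CPU and the surrounding code, so it is worth measuring.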
Post 14 Aug 2013, 23:17
Roman



Joined: 21 Apr 2012
Posts: 826
Roman
baldr,
Yes, exactly: I want to add the four parts of register XMM1 to each other. It looks like phaddd xmm1, xmm1 is what I need. Thanks.
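
For reference, a sketch of the phaddd xmm1, xmm1 variant. A single phaddd only produces pairwise sums, so the instruction has to be issued twice. After the second one the total sits in every lane. The Linux x86-64 ELF wrapper, the colors label and the sample values are assumptions for illustration, and phaddd requires SSSE3.

```asm
; Sketch only: assumes Linux x86-64 and an SSSE3-capable CPU.
; "colors" and its contents are made-up sample data.
format ELF64 executable 3
entry start

segment readable executable
start:
        movdqu  xmm1, [colors]          ; d c b a   (lane 3 .. lane 0)
        phaddd  xmm1, xmm1              ; c+d a+b c+d a+b
        phaddd  xmm1, xmm1              ; a+b+c+d in all four lanes
        movd    edi, xmm1               ; lane 0 -> exit status (only the low 8 bits survive)
        mov     eax, 60                 ; Linux sys_exit
        syscall

segment readable
colors  dd 1, 2, 3, 4                   ; sample dwords, total = 10
```

Compared with adding against a zeroed register, this form needs no second register and leaves the sum broadcast across the whole vector.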
Post 15 Aug 2013, 13:32
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
