flat assembler
Message board for the users of flat assembler.

SSE question(s)

SeryZone 26 May 2014, 13:26
Hello! I have a question.

1) Question One -------------------------------------------------------->
How can I replace every element of an SSE register (4 x 32-bit floats) that is equal to 1 with 0?

For example:
xmm0 = (0.454, 0.120, 1.000, 1.000)
How do I replace each 1 with 0, so that I get:
xmm0 = (0.454, 0.120, 0.000, 0.000)

ejamesr 26 May 2014, 20:01
I'm not sure how you determine which values to change. But you might consider the very fast pshufb instruction (it requires SSSE3), which can easily set any byte of an xmm register to 0. In the floating-point format, the value 0.0 is represented by all zero bits. So if you already know which positions to clear, you could use pshufb to very quickly clear, say, the low eight bytes (the low two floats) of the xmm register.

For example, the following clears the low eight bytes of xmm0:
Code:
align 16
Mask db -1,-1,-1,-1,-1,-1,-1,-1,8,9,10,11,12,13,14,15
; ... other data/code

; When you want to clear the register using Mask, do it like this:
pshufb  xmm0, dqword [Mask]

The variable Mask holds one control byte for each byte of the xmm register. The low four bits of a control byte give the byte offset in the register from which a byte is copied into that position. If the high bit of the control byte is set, no byte is copied and that destination byte is cleared to 0.

And of course, you would want to put your Mask
- ejamesr
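
The mask rule described in the post above can be made concrete with a small scalar model in C. This is only a sketch for illustration: the names pshufb_model, post_mask and pshufb_model_check are made up here and are not part of any library, and the model assumes 4-byte floats. It mimics what pshufb does to each of the 16 bytes, using the same Mask values as the post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of "pshufb xmm, m128" (illustration only).
   For each destination byte i:
     - if bit 7 of mask[i] is set, the result byte is 0;
     - otherwise the result byte is src[mask[i] & 0x0F]. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    uint8_t tmp[16];  /* temporary, so dst may alias src as in the instruction */
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* The Mask from the post: -1 (0xFF) for bytes 0..7, identity for bytes 8..15. */
static const uint8_t post_mask[16] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    8, 9, 10, 11, 12, 13, 14, 15
};

/* Small self-check: the low two floats become 0.0, the high two are kept. */
static void pshufb_model_check(void)
{
    float in[4] = { 1.0f, 1.0f, 0.120f, 0.454f };  /* in[0] is the lowest lane */
    float out[4];
    uint8_t src[16], dst[16];

    memcpy(src, in, 16);
    pshufb_model(dst, src, post_mask);
    memcpy(out, dst, 16);

    assert(out[0] == 0.0f && out[1] == 0.0f);
    assert(out[2] == 0.120f && out[3] == 0.454f);
}
```

Note that this mask clears fixed positions regardless of their contents, so it only answers the original question when you already know which lanes hold the 1.0 values.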

BAiC 27 May 2014, 04:11
1) Generate 4 floats that are all equal to 1.0 (store them in an xmm register).

2) Use the CMPPS instruction (described in the manuals) to compare your vector with the register from (1). The source/destination register rules will make this code sequence a bit messy: since the first source register is also the destination, you'll need to preload a register. The destination will then hold a vector mask.

3) 'Not' the mask (you might be able to integrate the not into 'pandn' or 'andnps').

4) 'And' the mask with the original vector.

- Stefan

_________________
byte me.
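
The four steps in the post above can be sketched with SSE intrinsics in C. This is a minimal sketch, not the poster's own code: the function names zero_lanes_equal_to_one and zero_lanes_check are made up for illustration, and the code assumes an x86 target with SSE available. The intrinsics map onto the instructions discussed in the thread: _mm_cmpeq_ps is CMPPS with the "equal" predicate (cmpeqps), and _mm_andnot_ps is andnps, which folds the 'not' and the 'and' into one instruction.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: zero every lane of v that compares equal to 1.0f. */
static __m128 zero_lanes_equal_to_one(__m128 v)
{
    /* Step 1: a vector of four 1.0 values. */
    __m128 ones = _mm_set1_ps(1.0f);

    /* Step 2: CMPPS (equal). Each lane becomes all-one bits where
       v == 1.0 and all-zero bits elsewhere. */
    __m128 eq = _mm_cmpeq_ps(v, ones);

    /* Steps 3 and 4: ANDNPS computes (~eq) & v, so matching lanes
       end up as all-zero bits, which is 0.0f. */
    return _mm_andnot_ps(eq, v);
}

/* Small self-check using the values from the original question. */
static void zero_lanes_check(void)
{
    /* _mm_set_ps lists lanes from highest to lowest. */
    __m128 v = _mm_set_ps(0.454f, 0.120f, 1.000f, 1.000f);
    float out[4];

    _mm_storeu_ps(out, zero_lanes_equal_to_one(v));

    assert(out[0] == 0.0f && out[1] == 0.0f);
    assert(out[2] == 0.120f && out[3] == 0.454f);
}
```

The comparison is exact, so a lane holding a value that is merely close to 1.0 is left unchanged. Unlike the fixed pshufb mask, this version finds the lanes to clear at run time.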

Copyright © 1999-2024, Tomasz Grysztar.