flat assembler
Message board for the users of flat assembler.

What is the fastest way to check if XMM contains only zeros?

peter_k



Joined: 27 Dec 2006
Posts: 6
Location: Poland
peter_k 27 Dec 2006, 13:59
I'm coding an engine for a logic game, and I'm using SSE and SSE2 for bitmask operations.
I need a fast way to check whether a 128-bit XMM register contains only zeros. The trivial way to do this is something like the code below, but I need something much more efficient. I'm new to this forum and to SSE :)

Code:
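;in data section
align 16                        ; MOVDQA faults if tmp is not 16-byte aligned
tmp        dd        4 dup(?)

;in code section; checking if xmm0 contains only zeros
movdqa [tmp], xmm0              ; spill all 128 bits to memory
mov    eax, [tmp+0]
mov    ebx, [tmp+4]
or     eax, [tmp+8]
or     ebx, [tmp+12]
or     eax, ebx                 ; ZF=1 only if all four dwords are zero
jz     xmm0_contain_only_zeros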
    
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat 27 Dec 2006, 15:56
Here are a couple of alternative ways to do it
Code:
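movdqa   xmm1,xmm0      ; work on a copy so xmm0 is preserved
packssdw xmm1,xmm1      ; 4 dwords -> 4 words; signed saturation keeps nonzero values nonzero
packsswb xmm1,xmm1      ; 4 words -> 4 bytes in the low dword
movd     eax,xmm1
test     eax,eax
jz       xmm0_contain_only_zeros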

pxor xmm1,xmm1
pcmpeqb xmm1,xmm0
pmovmskb eax,xmm1
inc eax
jz xmm0_contain_only_zeros    
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan 27 Dec 2006, 16:46
Actually, Goplat is basically right, but the second way is wrong: if the XMM register contains only zeros, then after
Code:
pmovmskb eax,xmm1
    

eax = 0FFFFh, so INC EAX gives 10000h rather than zero,
and we should either
Code:
cmp eax, 0FFFFh
jz xmm0_contain_only_zeros
    

or
Code:
inc ax
jz xmm0_contain_only_zeros
    

_________________
Any offers?
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 28 Dec 2006, 18:07
My try:
Code:
        pxor    xmm1,xmm1
        packssdw        xmm0,xmm0
        comisd  xmm0,xmm1;0000000000000000h = +0.0
        je    xmm0_contain_only_zeros
    

Maybe mixing floating-point and integer instructions is not such a good idea?

EDIT: COMISD definitely incurs a reformatting delay (a bypass delay for moving data from integer to floating-point instructions), so this example is the shortest, but maybe not the fastest.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||


Last edited by MCD on 06 Jan 2007, 21:23; edited 1 time in total
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 28 Dec 2006, 18:44
Yep, I was also thinking about COMISD and it should work since it doesn't check the destination for a NaN.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8267
Location: Kraków, Poland
Tomasz Grysztar 29 Dec 2006, 12:40
The PTEST instruction from the SSE4 set seems like a promise to simplify this problem in the future. ;)
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 29 Dec 2006, 19:24
Tomasz Grysztar wrote:
The PTEST instruction from the SSE4 set seems like a promise to simplify this problem in the future. ;)

You must be joking. My CPU only has MMX and SSE1. :(

peter_k



Joined: 27 Dec 2006
Posts: 6
Location: Poland
peter_k 29 Dec 2006, 21:03
Thanks, everybody, for the replies!

MCD's code seems to be the shortest. I'll profile and test it.

My processor is an Intel Pentium M processor 730, so I have SSE2.
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.
