flat assembler
Message board for the users of flat assembler.
  
       
What is the fastest way to check if XMM contains only zeros?
  
  | 
              
| 
                  
                   Goplat 27 Dec 2006, 15:56 
                  Here are a couple of alternative ways to do it     
                  
Code:
packssdw xmm1,xmm0
packsswb xmm1,xmm1
movd eax,xmm1
test eax,eax
jz xmm0_contain_only_zeros

pxor xmm1,xmm1
pcmpeqb xmm1,xmm0
pmovmskb eax,xmm1
inc eax
jz xmm0_contain_only_zeros
 |
              |||
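For anyone who wants to try the pack-based idea from C, here is a minimal sketch using SSE2 intrinsics. It is an illustration under assumptions, not Goplat's exact code: the name `xmm_is_zero_pack` is made up, and the value is packed with itself (the equivalent of `packssdw xmm0,xmm0`). As posted, `packssdw xmm1,xmm0` places the words derived from xmm0 in the upper half of xmm1, so after `packsswb` the low dword read by `movd` appears to come from the old contents of xmm1 rather than from xmm0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Returns 1 if all 128 bits of x are zero, 0 otherwise.
   Hypothetical helper name, used only for this sketch. */
int xmm_is_zero_pack(__m128i x)
{
    /* packssdw: every signed dword saturates to a signed word,
       so a nonzero dword can never collapse to zero. */
    __m128i p = _mm_packs_epi32(x, x);

    /* packsswb: same idea, words to bytes. Bytes 0..3 now hold
       one saturated byte per original dword. */
    p = _mm_packs_epi16(p, p);

    /* movd + test: only the low 32 bits need to be inspected. */
    return _mm_cvtsi128_si32(p) == 0;
}
```

Calling `xmm_is_zero_pack(_mm_setzero_si128())` should give 1, while any input with at least one nonzero dword should give 0.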
                  
  | 
              
| 
                  
                   asmfan 27 Dec 2006, 16:46 
                  Goplat is basically right, but the second way is wrong: if the XMM register contains only zeros, then after

                  Code:
pmovmskb eax,xmm1

eax = 0FFFFh, so we should use either

Code:
cmp eax, 0FFFFh
jz xmm0_contain_only_zeros

or

Code:
inc ax
jz xmm0_contain_only_zeros
_________________
Any offers?  |
              |||
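asmfan's correction can be checked with a short C sketch. It uses SSE2 intrinsics for the `pcmpeqb`/`pmovmskb` pair and plain integer arithmetic to model the three flag tests; the function names are hypothetical and exist only for this illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* pcmpeqb against zero, then pmovmskb: bit i is set when byte i of x is zero.
   The upper 16 bits of the result are always clear. */
uint32_t zero_byte_mask(__m128i x)
{
    __m128i eq = _mm_cmpeq_epi8(x, _mm_setzero_si128());
    return (uint32_t)_mm_movemask_epi8(eq);
}

/* Models `inc eax / jz`: a 32-bit increment turns 0000FFFFh into 00010000h,
   which is not zero, so the jump is not taken for an all-zero register. */
int taken_inc_eax(uint32_t mask) { return (uint32_t)(mask + 1u) == 0; }

/* Models `inc ax / jz`: a 16-bit increment wraps 0FFFFh to 0. */
int taken_inc_ax(uint32_t mask) { return (uint16_t)(mask + 1u) == 0; }

/* Models `cmp eax,0FFFFh / jz`. */
int taken_cmp(uint32_t mask) { return mask == 0xFFFFu; }
```

For an all-zero input the mask is 0FFFFh, so `taken_inc_ax` and `taken_cmp` report a taken jump while `taken_inc_eax` does not.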
                  
  | 
              
| 
                  
                   MCD 28 Dec 2006, 18:07 
                  my try

Code:
pxor xmm1,xmm1
packssdw xmm0,xmm0
comisd xmm0,xmm1 ;0000000000000000h = +0.0
je xmm0_contain_only_zeros

Maybe mixing floating-point and integer numbers is not such a good idea.

EDIT: COMISD definitely generates a reformatting delay, so this example is the shortest, but maybe not the fastest one.
_________________
MCD - the inevitable return of the Mad Computer Doggy

Last edited by MCD on 06 Jan 2007, 21:23; edited 1 time in total  |
              |||
                  
  | 
              
| 
                  
                   MazeGen 28 Dec 2006, 18:44 
                  Yep, I was also thinking about COMISD and it should work since it doesn't check the destination for a NaN. 
                  
                 | 
              |||
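The NaN question is worth a second look before relying on the COMISD version. As far as the Intel manuals describe it, COMISD reports an unordered compare by setting ZF, PF and CF together, so `je` would also be taken when the low qword of xmm0 is a NaN bit pattern. A -0.0 pattern (8000000000000000h) also compares equal to +0.0. The following is a scalar C model of `packssdw xmm0,xmm0` followed by `comisd`/`je`, not the posters' code; the function names are invented for the sketch, and it assumes IEEE 754 doubles and no fast-math compiler options.

```c
#include <stdint.h>
#include <string.h>

/* Signed saturation of one dword to a word, as PACKSSDW does per lane. */
static uint16_t sat16(int32_t v)
{
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    return (uint16_t)v;
}

/* Low 64 bits of xmm0 after `packssdw xmm0,xmm0`,
   given the four original dwords d0 (lowest) to d3 (highest). */
uint64_t packed_low_qword(int32_t d0, int32_t d1, int32_t d2, int32_t d3)
{
    return (uint64_t)sat16(d0)
         | (uint64_t)sat16(d1) << 16
         | (uint64_t)sat16(d2) << 32
         | (uint64_t)sat16(d3) << 48;
}

/* Models whether `je` is taken after `comisd xmm0,xmm1` with xmm1 = +0.0:
   ZF is set when the operands are equal OR unordered (one is a NaN). */
int comisd_je_taken(uint64_t q)
{
    double a;
    memcpy(&a, &q, sizeof a);       /* reinterpret the bits as a double */
    return (a != a) || (a == 0.0);  /* a != a is true only for NaN */
}
```

In this model a top dword of 00010000h saturates to 7FFFh, giving the NaN pattern 7FFF000000000000h, and a top dword of 80000000h with the rest zero gives -0.0. Both make `comisd_je_taken` return 1 even though xmm0 is not zero. If real hardware agrees, an integer test of the packed value (for example `movd` followed by `test`) would avoid both cases.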
                  
  | 
              
| 
                  
                   Tomasz Grysztar 29 Dec 2006, 12:40 
                  The PTEST instruction from the SSE4 set looks like it promises to simplify this problem in the future.
                  
                 | 
              |||
                  
  | 
              
| 
                  
                   MCD 29 Dec 2006, 19:24 
                  Tomasz Grysztar wrote:
                  The PTEST instruction from the SSE4 set looks like it promises to simplify this problem in the future.

                  You must be joking. My CPU only has MMX and SSE1.  |
              |||
                  
  | 
              
| 
                  
                   peter_k 29 Dec 2006, 21:03 
                  Thanks, everybody, for the replies!

                  MCD's code seems to be the shortest. I'll profile and test it. My processor is an Intel Pentium M 730, so I have SSE2.  |
              |||
                  
  | 
              
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.