flat assembler
Message board for the users of flat assembler.

Question about (clever) optimization....

macgub 05 Apr 2024, 18:59
Hi,
I have this sample code:
Code:
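   cmpltps  xmm4,xmm2        ; xmm4 = per-lane mask of (xmm4 < xmm2)
   cmpltps  xmm1,xmm3        ; xmm1 = per-lane mask of (xmm1 < xmm3)
   xorps    xmm1,xmm4        ; lanes where exactly one compare holds
   orps     xmm0,xmm1        ; merge into the mask already in xmm0
   movmskps eax,xmm0         ; sign bits of the four lanes -> eax bits 0..3
   and      eax,111b         ; keep lanes 0..2
   cmp      eax,111b
   je       .chck            ; all three lanes set

   movaps   xmm1,.var
   cmpltps  xmm7,xmm2
   cmpltps  xmm1,xmm3
   xorps    xmm1,xmm7
   movmskps eax,xmm1
   and      eax,111b
   or       eax,eax
   jnz      .chck            ; at least one of lanes 0..2 set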

....
.chck:

    

Any ideas on how to convert it to a single jump? Or (at least) some other way to shrink the code?
Thanks....
bitRAKE 05 Apr 2024, 20:00
Well, the OR EAX,EAX is superfluous - the AND instruction already set the flags.

Guess we can't assume bit-3 and just XOR EAX,111b, on the earlier condition?
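
To see why bit 3 matters: movmskps copies four sign bits, so eax can hold 1111b, and a bare XOR with 111b would then leave 1000b rather than zero. The C sketch below models both tests on the 4-bit mask and counts where they disagree. The function names are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Models: and eax,111b / cmp eax,111b / je
   Taken when lanes 0..2 are all set; lane 3 is masked off first. */
int and_cmp_taken(uint32_t mask)
{
    return (mask & 7u) == 7u;
}

/* Models: xor eax,111b / jz
   Taken only when the whole register becomes zero, so lane 3 must be clear. */
int xor_only_taken(uint32_t mask)
{
    return (mask ^ 7u) == 0u;
}

/* Counts the 4-bit movmskps results on which the two tests disagree. */
int count_disagreements(void)
{
    int n = 0;
    for (uint32_t mask = 0; mask < 16; ++mask)
        if (and_cmp_taken(mask) != xor_only_taken(mask))
            ++n;
    return n;
}
```

The two tests differ only for the mask 1111b, which is the case where lane 3 of xmm0 happens to be set.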

Beyond that we could look at the expressions these operations represent, or probabilities of each branch.

It looks like you want to preserve XMM2/3 and XMM0 is accumulating prior state?

Maybe?:
Code:
   cmpltps xmm4,xmm2
   cmpltps xmm1,xmm3

   cmpltps xmm7,xmm2
   cmpltps xmm3,.var

   xorps   xmm1,xmm4
   xorps   xmm3,xmm7

   orps    xmm0,xmm1

   movmskps edx,xmm3
   and      edx,111b
   jnz      .chck

   movmskps eax,xmm0
   and      eax,111b
   cmp      eax,111b
   je       .chck    
Edit: ... if we work backward:
Code:
        cmp eax, 111_000b
        ja .chck
... assuming we've combined the two three-bit groups so that a single branch is enough: it is taken when the top three bits are all set and at least one of the low three bits is non-zero. How do we combine them efficiently, though?
Code:
shl eax, 32-3           ; low group !0
shrd eax, edx, 3        ; high group =7
cmp eax, 0xE000_0000
ja .chck
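
A flag trick like this can be sanity-checked by modelling it on integers and comparing it with the two-jump reference over every possible mask. The C sketch below does that, assuming eax carries the group that must be non-zero and edx the group that must equal 111b, as the comments suggest. All function names are hypothetical. Under that model the single ja is taken only when both conditions hold at once, whereas the two-jump code in the first post reaches .chck when either one holds. The sketch therefore also includes a hypothetical OR-preserving variant (not from the thread) that relies on the carry out of a 3-bit increment.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the two-jump code from the first post reaches .chck when the
   "all" mask has lanes 0..2 set, or the "any" mask has one of them set. */
int two_jump(uint32_t all_mask, uint32_t any_mask)
{
    return (all_mask & 7u) == 7u || (any_mask & 7u) != 0u;
}

/* The same two conditions, but both required at once. */
int both_hold(uint32_t all_mask, uint32_t any_mask)
{
    return (all_mask & 7u) == 7u && (any_mask & 7u) != 0u;
}

/* Models: shl eax,32-3 / shrd eax,edx,3 / cmp eax,0xE000_0000 / ja
   Assumes eax = "any" mask and edx = "all" mask. */
int shrd_compare(uint32_t eax, uint32_t edx)
{
    eax <<= 29;                      /* shl: lanes 0..2 move to bits 29..31 */
    eax = (eax >> 3) | (edx << 29);  /* shrd: edx lanes 0..2 enter from the top */
    return eax > 0xE0000000u;        /* ja is an unsigned "above" */
}

/* Adapter so every candidate takes (all_mask, any_mask) in the same order. */
int shrd_form(uint32_t all_mask, uint32_t any_mask)
{
    return shrd_compare(any_mask, all_mask);
}

/* Hypothetical single-test variant that keeps the OR: adding 1 to a 3-bit
   value carries into bit 3 only when all three bits were set. */
int single_or_test(uint32_t all_mask, uint32_t any_mask)
{
    uint32_t carry = ((all_mask & 7u) + 1u) & 8u;
    return (carry | (any_mask & 7u)) != 0u;
}

/* Counts the 4-bit mask pairs on which two predicates disagree. */
int count_differences(int (*candidate)(uint32_t, uint32_t),
                      int (*reference)(uint32_t, uint32_t))
{
    int n = 0;
    for (uint32_t a = 0; a < 16; ++a)
        for (uint32_t b = 0; b < 16; ++b)
            if (candidate(a, b) != reference(a, b))
                ++n;
    return n;
}
```

Calling count_differences(shrd_form, both_hold) and count_differences(single_or_test, two_jump) should both give zero, while count_differences(shrd_form, two_jump) should not. In assembly the OR-preserving form would need an extra inc and and before the final or and jnz, so whether it beats two well-predicted jumps is something only a measurement can settle.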

macgub 06 Apr 2024, 05:37
Thanks for the reply.
bitRAKE wrote:

Guess we can't assume bit-3 and just XOR EAX,111b, on the earlier condition?

Yes, you are right.

bitRAKE wrote:

Code:
shl eax, 32-3   ; low group !0
shrd eax, edx, 3        ; high group =7
cmp eax, 0xE000_0000
ja .chck
    


This one looks promising...
What about avoiding the two movmskps instructions and putting all the bit information into eax at once?
macgub 06 Apr 2024, 18:05
Hi,
Following your solution, here is mine:
Code:
   cmpltps xmm4,xmm2
   cmpltps xmm0,xmm3
   cmpltps xmm7,xmm2
   cmpltps xmm1,xmm3

   cmpltps xmm2,.var1
   cmpltps xmm3,.var2
   xorps   xmm0,xmm4
   xorps   xmm1,xmm7
   xorps   xmm2,xmm3
   orps    xmm0,xmm1

   packssdw  xmm2,xmm2
   packssdw  xmm0,xmm0
   packsswb xmm2,xmm2
   packsswb xmm0,xmm0
   punpckldq xmm0,xmm2
   pmovmskb eax,xmm0
   and     eax,01110111b
   cmp     al,11110000b
   jna     .chck    
    

But it's bigger...
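
One thing worth checking in the listing as posted: after and eax,01110111b the value in al cannot exceed 01110111b, so cmp al,11110000b followed by jna is taken for every input. The C sketch below models the last three instructions, assuming pmovmskb delivers the four lanes of xmm0 in bits 0..3 and the four lanes of xmm2 in bits 4..7, which is the layout the pack and punpckldq sequence produces. The names are illustrative. It also shows a guessed alternative, cmp al,01110000b with ja, as the analogue of the shrd compare. The intended condition is not stated in the post, so treat that part as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Byte mask as pmovmskb would deliver it after the packs and punpckldq:
   bits 0..3 = lanes of xmm0, bits 4..7 = lanes of xmm2 (assumed layout). */
uint8_t packed_mask(uint32_t lanes0, uint32_t lanes2)
{
    uint32_t eax = (lanes0 & 0xFu) | ((lanes2 & 0xFu) << 4);
    return (uint8_t)(eax & 0x77u);          /* and eax,01110111b */
}

/* Models: cmp al,11110000b / jna  (jump if not above, unsigned) */
int posted_jna_taken(uint32_t lanes0, uint32_t lanes2)
{
    return packed_mask(lanes0, lanes2) <= 0xF0u;
}

/* Guessed alternative: cmp al,01110000b / ja
   Taken when bits 4..6 are all set and at least one of bits 0..2 is set. */
int guessed_ja_taken(uint32_t lanes0, uint32_t lanes2)
{
    return packed_mask(lanes0, lanes2) > 0x70u;
}

/* Counts the lane combinations on which the posted jna falls through. */
int count_posted_fallthrough(void)
{
    int n = 0;
    for (uint32_t a = 0; a < 16; ++a)
        for (uint32_t b = 0; b < 16; ++b)
            if (!posted_jna_taken(a, b))
                ++n;
    return n;
}
```

If count_posted_fallthrough() returns zero, the branch in the listing never falls through, so the listing as transcribed may differ from the code that was actually timed.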
macgub 07 Apr 2024, 17:11
I use the code samples above as a preselection step for the edge-triangle intersection test in my 3D object viewer app. The version with two jumps runs about twice as fast: roughly 42 vs. 82 seconds by wall clock (computed on the same test object). With the one-jump version I get something like 40 seconds.
To reduce the computing time further, I think I must improve the preselection algorithm; some presorted data should do the job. bitRAKE, thanks again for your effort.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
