flat assembler
Message board for the users of flat assembler.
Question about (clever) optimization....
bitRAKE 05 Apr 2024, 20:00
Well, the OR EAX,EAX is superfluous - the AND instruction already set the flags.
Guess we can't assume bit-3 and just XOR EAX,111b, on the earlier condition? Beyond that we could look at the expressions these operations represent, or the probabilities of each branch. It looks like you want to preserve XMM2/3, and XMM0 is accumulating prior state? Maybe?:
Code:
cmpltps xmm4,xmm2
cmpltps xmm1,xmm3
cmpltps xmm7,xmm2
cmpltps xmm3,.var
xorps xmm1,xmm4
xorps xmm3,xmm7
orps xmm0,xmm1
movmskps edx,xmm3
and edx,111b
jnz .chck
movmskps eax,xmm0
and eax,111b
cmp eax,111b
je .chck
Code:
cmp eax, 111_000b
ja .chk
Code:
shl eax, 32-3 ; low group !0
shrd eax, edx, 3 ; high group =7
cmp eax, 0xE000_0000
ja .chk
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
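For readers who want to convince themselves that the last snippet really folds two tests into one unsigned compare, here is a minimal C sketch. It is only a scalar model, not code from the thread. It assumes, going by the snippet's comments, that the intended condition is "high group (edx) equals 7 and low group (eax) is not 0". The function names `two_tests`, `one_compare` and `count_mismatches` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: the two conditions tested separately
   (assumed from the comments "low group !0" and "high group =7"). */
static int two_tests(uint32_t eax, uint32_t edx)
{
    return (edx & 7u) == 7u && (eax & 7u) != 0u;
}

/* Scalar model of:
     shl  eax, 32-3
     shrd eax, edx, 3
     cmp  eax, 0xE000_0000
     ja   .chk                                                  */
static int one_compare(uint32_t eax, uint32_t edx)
{
    eax <<= 29;                      /* shl: only the low 3 bits of eax survive, at the top */
    eax = (eax >> 3) | (edx << 29);  /* shrd: low 3 bits of edx are shifted in from the top */
    return eax > 0xE0000000u;        /* ja is an unsigned "above" */
}

/* Counts inputs on which the two forms disagree. Values up to 15 are
   tried so that a stray bit 3 is shown to be ignored by both forms. */
static int count_mismatches(void)
{
    int bad = 0;
    for (uint32_t eax = 0; eax < 16; ++eax)
        for (uint32_t edx = 0; edx < 16; ++edx)
            if (two_tests(eax, edx) != one_compare(eax, edx))
                ++bad;
    return bad;
}
```

Calling `count_mismatches()` from a small `main` and checking that it returns 0 is enough to exercise every 4-bit input pair. The shifts also make the separate `and ...,111b` masking unnecessary in this model, since any bits above bit 2 are shifted out.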
macgub 06 Apr 2024, 05:37
Thanks for the reply.
bitRAKE wrote: [...]
Yes, you are right.
bitRAKE wrote: [...]
This one looks promising... What about avoiding the two movmskps instructions and putting the bit information into eax at once?
macgub 06 Apr 2024, 18:05
Hi,
Following your solution, mine is:
Code:
cmpltps xmm4,xmm2
cmpltps xmm0,xmm3
cmpltps xmm7,xmm2
cmpltps xmm1,xmm3
cmpltps xmm2,.var1
cmpltps xmm3,.var2
xorps xmm0,xmm4
xorps xmm1,xmm7
xorps xmm2,xmm3
orps xmm0,xmm1
packssdw xmm2,xmm2
packssdw xmm0,xmm0
packsswb xmm2,xmm2
packsswb xmm0,xmm0
punpckldq xmm0,xmm2
pmovmskb eax,xmm0
and eax,01110111b
cmp al,11110000b
jna .chck
But it's bigger...
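To see where each lane ends up in eax after the pack/unpack sequence above, a scalar C model may help. This is a sketch under assumptions, not a translation of the whole routine: it assumes every dword lane holds a compare mask (0 or -1), it models only the low eight bits of the `pmovmskb` result (the ones kept by `and eax,01110111b`), and the names `sat32to16`, `sat16to8` and `packed_mask` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane signed saturation as done by packssdw (32 -> 16 bits). */
static int16_t sat32to16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Per-lane signed saturation as done by packsswb (16 -> 8 bits). */
static int8_t sat16to8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* a[] models the four dword lanes of xmm0, b[] those of xmm2.
   After both packs, lane i of each register is a single byte;
   punpckldq places the xmm0 bytes in positions 0..3 and the xmm2
   bytes in positions 4..7, and pmovmskb collects the sign bits. */
static unsigned packed_mask(const int32_t a[4], const int32_t b[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i) {
        int8_t lo = sat16to8(sat32to16(a[i]));
        int8_t hi = sat16to8(sat32to16(b[i]));
        if (lo < 0) mask |= 1u << i;        /* bits 0..3: xmm0 lanes */
        if (hi < 0) mask |= 1u << (i + 4);  /* bits 4..7: xmm2 lanes */
    }
    return mask;
}
```

An all-ones lane (-1) stays -1 through both saturating packs, so its sign bit survives into the byte mask. Masking the result with `0x77` then drops lane 3 of each group, matching the `and eax,01110111b` step.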
macgub 07 Apr 2024, 17:11
I use the code samples above as a preselection step for the edge-triangle intersection test in my 3D object viewer app. The version with two jumps works about 2 times faster: ~42 vs ~82 seconds by the wall clock (calculated on the same test object). With the one-jump version I get something like ~40 seconds.
I think that to decrease the computing time further, I must improve the preselection algorithm. Some presorted data will do the job. bitRAKE, thanks again for your effort.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.