flat assembler
Message board for the users of flat assembler.
Index
> Main > ...because there is no algos section - here you are MAX(A,B) |
Author |
|
revolution 04 Oct 2005, 01:45
I've always used this sequence for MAX:
Code: sub eax,ebx ;a=a-b setge dl movzx edx,dl neg edx ;d=0 if a<b, d=-1 if a>=b and eax,edx ;a=0 if a<b, a=a-b if a>=b add eax,ebx ;a=b if a<b, a=a if a>=b |
|||
04 Oct 2005, 01:45 |
|
revolution 04 Oct 2005, 02:02
Hehe, just for comparison in ARM it looks like this:
Code: cmp r0,r1 movlt r0,r1 Code: cmp eax,ebx cmovl eax,ebx |
|||
04 Oct 2005, 02:02 |
|
Madis731 04 Oct 2005, 08:26
Hmm, that looks great and with CMOVxx the answer is trivial, but is SETxx supported instruction in 8086 for example?
|
|||
04 Oct 2005, 08:26 |
|
revolution 04 Oct 2005, 09:48
Quote: but is SETxx supported instruction in 8086 for example? |
|||
04 Oct 2005, 09:48 |
|
revolution 04 Oct 2005, 09:54
For signed and unsigned like above you can shorten the code to this which doesn't use SETcc:
Code: sub eax,ebx ;a=a-b mov edx,eax sar edx,31 not edx ;d=0 if a<b, d=-1 if a>=b and eax,edx ;a=0 if a<b, a=a-b if a>=b add eax,ebx ;a=b if a<b, a=a if a>=b |
|||
04 Oct 2005, 09:54 |
|
Madis731 04 Oct 2005, 10:57
why I couldn't have thought about that :S SAR fills edx with signbits
|
|||
04 Oct 2005, 10:57 |
|
r22 04 Oct 2005, 20:35
The following can be easily adapted to 32bit
(hint find and replace all R's with E's ) Benchmarking with randomly produced numbers will show that it's slightly faster than using Conditional Mov instructions. Code: ;;;;64bit Fastcall Max: ;;RCX = num1, RDX = num2 MOV RAX,RCX SUB RCX,RDX SBB RDX,RDX AND RDX,RCX SUB RAX,RDX RET ;;For inline usage you can make the MIN function smaller. ;;Removing the MOV RAX,RDX and replacing RAX with RDX in last line. Min: ;;RCX = num1, RDX = num2 MOV RAX,RDX ;;;drop this line of code for inline SUB RCX,RDX SBB R8,R8 ;;;use SBB RAX,RAX for inline AND R8,RCX ;;;;replace with AND RAX,RCX for inline ADD RAX,R8 ;;replace with ADD RDX,RAX for inline RET |
|||
04 Oct 2005, 20:35 |
|
revolution 05 Oct 2005, 00:48
r22: Your code works for unsigned numbers, but Madis731 was using signed numbers. SBB x,x propagates the carry whereas SAR x,31 propagates the sign.
I am rather suprised that your code is faster than CMOVcc. What processor are you testing on? |
|||
05 Oct 2005, 00:48 |
|
r22 05 Oct 2005, 04:22
Yea signed numbers ruin everything.
Just have to replace SBB with CDQ or for 64bit CQO Code: SUB EAX,ECX CDQ AND EAX,EDX ADD ECX,EAX P4 3.2ghz Ok the bencharmking with CMOVL I stand corrected Trying both with random 32bit values running them 07FFFFFFh each ----------------------------------- My re-code above 3734ms 3rd so close to beating cmov dang! Cmp & Cmovl 3625ms 2nd The code below 3406ms 1st ----------------------------------- Code: ;;;INLINE SSE MIN CVTSI2SD XMM0, EAX CVTSI2SD XMM1, ECX MINSD XMM0,XMM1 CVTSD2SI EAX,XMM0 Using Scalar SINGLE (SS) instead of SCALAR DOUBLE (SD) shaves off another 80ms on the benchmark (3375ms). SSE works great for 32bit signed integer, but in 64bit land and unsigned 32bit integers the conditional mov will likely come out on top. |
|||
05 Oct 2005, 04:22 |
|
hopcode 02 Nov 2008, 02:54
revolution wrote: I've always used this sequence for MAX: Very cool !!! A possible way with xmm/sse instruction on Agner Fog Chap 13.1 Edit:2 hopcode[mrk] Last edited by hopcode on 09 Nov 2008, 16:36; edited 4 times in total |
|||
02 Nov 2008, 02:54 |
|
baldr 02 Nov 2008, 22:04
Signed subtraction can easily overflow, so code have to check OF as well... Thus cdq/sar rxx, 31 will give incorrect result (e.g. try 2³¹-1 and -1), while CMOVcc/SETcc examples will do fine.
|
|||
02 Nov 2008, 22:04 |
|
hopcode 09 Nov 2008, 16:51
Edit
/to bladr/ Thanks a lot pointing it out. I like working code so i would avoid the real mess Perhaps, it is more useful choosing the range of the numbers,positive or negative, signed unsigned, consequently apply the right code... hopcode[mrk] Last edited by hopcode on 09 Nov 2008, 19:43; edited 1 time in total |
|||
09 Nov 2008, 16:51 |
|
baldr 09 Nov 2008, 17:30
hopcode,
I'd just pointed to difference between signed and unsigned subtraction result flags (signed eax < ebx iff SF<>OF after sub/cmp eax, ebx, while for unsigned CF is sufficient). Your code appears to go great lengths to correct only one case of the OF being set, 2³¹-1 - (-1). How about 2³¹-2 - (-2)? By the way, signed 0x80000000 is -2³¹, much less than -1. |
|||
09 Nov 2008, 17:30 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.