flat assembler
Message board for the users of flat assembler.

Index > Main > ...because there is no algos section - here you are MAX(A,B)

Author
Thread Post new topic Reply to topic
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
There was a discussion on Orkut.com community named "ASM".
Quote:
Don't Jump!!! 8/17/2005 1:14 AM
heres a programming in ASM problem.

Move the Greater of A and B to a Location C
WITHOUT using any jump instructions.

Code:
;Provided EAX and EBX have some values 0...0FFFFFFFFh
;signed or unsigned
mov  edx,eax ; save A
sub  eax,ebx ; A-B
shr  eax,31  ; Carry (a<b?)
neg  eax     ; eax= 0 | 0FFFFFFFFh
and  ebx,eax ; eax=0?ebx=0:ebx=ebx
not  eax
and  edx,eax ; eax=0?edx=0:edx=edx
add  ebx,edx ; either one is zero so ebx is A+0 or 0+B
    

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 03 Oct 2005, 20:37
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
I've always used this sequence for MAX:
Code:
sub     eax,ebx ;a=a-b
setge   dl
movzx   edx,dl
neg     edx     ;d=0 if a<b, d=-1 if a>=b
and     eax,edx ;a=0 if a<b, a=a-b if a>=b
add     eax,ebx ;a=b if a<b, a=a if a>=b    
You can adjust the "setge" to whatever condition you need for MAX, MIN, signed, unsigned etc. If you make it "setns" then you get that same as your code above.
Post 04 Oct 2005, 01:45
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
Hehe, just for comparison in ARM it looks like this:
Code:
cmp r0,r1
movlt r0,r1    
And using a P2 or later this:
Code:
cmp eax,ebx
cmovl eax,ebx    
Post 04 Oct 2005, 02:02
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
Hmm, that looks great and with CMOVxx the answer is trivial, but is SETxx supported instruction in 8086 for example?
Post 04 Oct 2005, 08:26
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
Quote:
but is SETxx supported instruction in 8086 for example?
SETcc is an 80386 (and up) instruction.
Post 04 Oct 2005, 09:48
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
For signed and unsigned like above you can shorten the code to this which doesn't use SETcc:
Code:
sub eax,ebx ;a=a-b 
mov edx,eax
sar edx,31
not edx ;d=0 if a<b, d=-1 if a>=b 
and eax,edx ;a=0 if a<b, a=a-b if a>=b 
add eax,ebx ;a=b if a<b, a=a if a>=b    
Post 04 Oct 2005, 09:54
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
why I couldn't have thought about that :S SAR fills edx with signbits Smile
Post 04 Oct 2005, 10:57
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
The following can be easily adapted to 32bit
(hint find and replace all R's with E's Very Happy)

Benchmarking with randomly produced numbers will show that it's slightly faster than using Conditional Mov instructions.


Code:
;;;;64bit Fastcall
Max:
;;RCX = num1, RDX = num2
MOV RAX,RCX
SUB RCX,RDX
SBB RDX,RDX   
AND RDX,RCX
SUB RAX,RDX
RET

;;For inline usage you can make the MIN function smaller.
;;Removing the MOV RAX,RDX and replacing RAX with RDX in last line.
Min:
;;RCX = num1, RDX = num2
MOV RAX,RDX   ;;;drop this line of code for inline
SUB RCX,RDX
SBB R8,R8       ;;;use SBB RAX,RAX for inline
AND R8,RCX     ;;;;replace with AND RAX,RCX for inline
ADD RAX,R8    ;;replace with ADD RDX,RAX for inline
RET
    
Post 04 Oct 2005, 20:35
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
r22: Your code works for unsigned numbers, but Madis731 was using signed numbers. SBB x,x propagates the carry whereas SAR x,31 propagates the sign.

I am rather suprised that your code is faster than CMOVcc. What processor are you testing on?
Post 05 Oct 2005, 00:48
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
Yea signed numbers ruin everything.
Just have to replace SBB with CDQ or for 64bit CQO
Code:
SUB EAX,ECX
CDQ
AND EAX,EDX
ADD ECX,EAX
    


P4 3.2ghz
Ok the bencharmking with CMOVL I stand corrected
Trying both with random 32bit values running them 07FFFFFFh each
-----------------------------------
My re-code above 3734ms 3rd so close to beating cmov dang!
Cmp & Cmovl 3625ms 2nd
The code below 3406ms 1st
-----------------------------------

Code:
;;;INLINE SSE MIN
CVTSI2SD XMM0, EAX
CVTSI2SD XMM1, ECX
MINSD XMM0,XMM1
CVTSD2SI EAX,XMM0
    

Using Scalar SINGLE (SS) instead of SCALAR DOUBLE (SD) shaves off another 80ms on the benchmark (3375ms).

SSE works great for 32bit signed integer, but in 64bit land and unsigned 32bit integers the conditional mov will likely come out on top.
Post 05 Oct 2005, 04:22
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
revolution wrote:
I've always used this sequence for MAX:
Code:
sub     eax,ebx ;a=a-b
setge   dl
movzx   edx,dl
neg     edx     ;d=0 if a<b, d=-1 if a>=b
and     eax,edx ;a=0 if a<b, a=a-b if a>=b
add     eax,ebx ;a=b if a<b, a=a if a>=b    
....

Very cool !!!Cool
A possible way with xmm/sse instruction on Agner Fog Chap 13.1
Edit:2

hopcode[mrk]


Last edited by hopcode on 09 Nov 2008, 16:36; edited 4 times in total
Post 02 Nov 2008, 02:54
View user's profile Send private message Visit poster's website Reply with quote
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
Signed subtraction can easily overflow, so code have to check OF as well... Thus cdq/sar rxx, 31 will give incorrect result (e.g. try 2³¹-1 and -1), while CMOVcc/SETcc examples will do fine.
Post 02 Nov 2008, 22:04
View user's profile Send private message Reply with quote
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode
Edit
/to bladr/ Thanks a lot pointing it out.
I like working code so i would avoid the real mess Laughing
Perhaps, it is more useful choosing the range of the numbers,positive
or negative, signed unsigned, consequently apply the right code...

hopcode[mrk]


Last edited by hopcode on 09 Nov 2008, 19:43; edited 1 time in total
Post 09 Nov 2008, 16:51
View user's profile Send private message Visit poster's website Reply with quote
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
hopcode,

I'd just pointed to difference between signed and unsigned subtraction result flags (signed eax < ebx iff SF<>OF after sub/cmp eax, ebx, while for unsigned CF is sufficient). Your code appears to go great lengths to correct only one case of the OF being set, 2³¹-1 - (-1). How about 2³¹-2 - (-2)? Wink By the way, signed 0x80000000 is -2³¹, much less than -1.
Post 09 Nov 2008, 17:30
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.