flat assembler - ...because there is no algos section

Index > Main > ...because there is no algos section - here you are MAX(A,B)

Author

Thread

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 03 Oct 2005, 20:37

There was a discussion on Orkut.com community named "ASM".

Quote:

Don't Jump!!! 8/17/2005 1:14 AM
heres a programming in ASM problem.

Move the Greater of A and B to a Location C
WITHOUT using any jump instructions.

Code:

;Provided EAX and EBX have some values 0...0FFFFFFFFh
;signed or unsigned
mov  edx,eax ; save A
sub  eax,ebx ; A-B
shr  eax,31  ; Carry (a<b?)
neg  eax     ; eax= 0 | 0FFFFFFFFh
and  ebx,eax ; eax=0?ebx=0:ebx=ebx
not  eax
and  edx,eax ; eax=0?edx=0:edx=edx
add  ebx,edx ; either one is zero so ebx is A+0 or 0+B

_________________
My updated idol Very Happy

http://www.agner.org/optimize/

03 Oct 2005, 20:37

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20896
Location: In your JS exploiting you and your system

revolution 04 Oct 2005, 01:45

I've always used this sequence for MAX:

Code:

sub     eax,ebx ;a=a-b
setge   dl
movzx   edx,dl
neg     edx     ;d=0 if a<b, d=-1 if a>=b
and     eax,edx ;a=0 if a<b, a=a-b if a>=b
add     eax,ebx ;a=b if a<b, a=a if a>=b

You can adjust the "setge" to whatever condition you need for MAX, MIN, signed, unsigned etc. If you make it "setns" then you get that same as your code above.

04 Oct 2005, 01:45

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20896
Location: In your JS exploiting you and your system

revolution 04 Oct 2005, 02:02

Hehe, just for comparison in ARM it looks like this:

Code:

cmp r0,r1
movlt r0,r1

And using a P2 or later this:

Code:

cmp eax,ebx
cmovl eax,ebx

04 Oct 2005, 02:02

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 04 Oct 2005, 08:26

Hmm, that looks great and with CMOVxx the answer is trivial, but is SETxx supported instruction in 8086 for example?

04 Oct 2005, 08:26

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20896
Location: In your JS exploiting you and your system

revolution 04 Oct 2005, 09:48

Quote:

but is SETxx supported instruction in 8086 for example?

SETcc is an 80386 (and up) instruction.

04 Oct 2005, 09:48

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20896
Location: In your JS exploiting you and your system

revolution 04 Oct 2005, 09:54

For signed and unsigned like above you can shorten the code to this which doesn't use SETcc:

Code:

sub eax,ebx ;a=a-b 
mov edx,eax
sar edx,31
not edx ;d=0 if a<b, d=-1 if a>=b 
and eax,edx ;a=0 if a<b, a=a-b if a>=b 
add eax,ebx ;a=b if a<b, a=a if a>=b

04 Oct 2005, 09:54

Madis731

Joined: 25 Sep 2003
Posts: 2138
Location: Estonia

Madis731 04 Oct 2005, 10:57

why I couldn't have thought about that :S SAR fills edx with signbits Smile

04 Oct 2005, 10:57

r22

Joined: 27 Dec 2004
Posts: 805

r22 04 Oct 2005, 20:35

The following can be easily adapted to 32bit
(hint find and replace all R's with E's Very Happy

)

Benchmarking with randomly produced numbers will show that it's slightly faster than using Conditional Mov instructions.

Code:

;;;;64bit Fastcall
Max:
;;RCX = num1, RDX = num2
MOV RAX,RCX
SUB RCX,RDX
SBB RDX,RDX   
AND RDX,RCX
SUB RAX,RDX
RET

;;For inline usage you can make the MIN function smaller.
;;Removing the MOV RAX,RDX and replacing RAX with RDX in last line.
Min:
;;RCX = num1, RDX = num2
MOV RAX,RDX   ;;;drop this line of code for inline
SUB RCX,RDX
SBB R8,R8       ;;;use SBB RAX,RAX for inline
AND R8,RCX     ;;;;replace with AND RAX,RCX for inline
ADD RAX,R8    ;;replace with ADD RDX,RAX for inline
RET

04 Oct 2005, 20:35

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20896
Location: In your JS exploiting you and your system

revolution 05 Oct 2005, 00:48

r22: Your code works for unsigned numbers, but Madis731 was using signed numbers. SBB x,x propagates the carry whereas SAR x,31 propagates the sign.

I am rather suprised that your code is faster than CMOVcc. What processor are you testing on?

05 Oct 2005, 00:48

r22

Joined: 27 Dec 2004
Posts: 805

r22 05 Oct 2005, 04:22

Yea signed numbers ruin everything.
Just have to replace SBB with CDQ or for 64bit CQO

Code:

SUB EAX,ECX
CDQ
AND EAX,EDX
ADD ECX,EAX

P4 3.2ghz
Ok the bencharmking with CMOVL I stand corrected
Trying both with random 32bit values running them 07FFFFFFh each
-----------------------------------
My re-code above 3734ms 3rd so close to beating cmov dang!
Cmp & Cmovl 3625ms 2nd
The code below 3406ms 1st
-----------------------------------

Code:

;;;INLINE SSE MIN
CVTSI2SD XMM0, EAX
CVTSI2SD XMM1, ECX
MINSD XMM0,XMM1
CVTSD2SI EAX,XMM0

Using Scalar SINGLE (SS) instead of SCALAR DOUBLE (SD) shaves off another 80ms on the benchmark (3375ms).

SSE works great for 32bit signed integer, but in 64bit land and unsigned 32bit integers the conditional mov will likely come out on top.

05 Oct 2005, 04:22

hopcode

Joined: 04 Mar 2008
Posts: 563
Location: Germany

hopcode 02 Nov 2008, 02:54

revolution wrote:

I've always used this sequence for MAX:

Code:

sub     eax,ebx ;a=a-b
setge   dl
movzx   edx,dl
neg     edx     ;d=0 if a<b, d=-1 if a>=b
and     eax,edx ;a=0 if a<b, a=a-b if a>=b
add     eax,ebx ;a=b if a<b, a=a if a>=b

....

Very cool !!! Cool

A possible way with xmm/sse instruction on Agner Fog Chap 13.1
Edit:2

hopcode[mrk]

Last edited by hopcode on 09 Nov 2008, 16:36; edited 4 times in total

02 Nov 2008, 02:54

baldr

Joined: 19 Mar 2008
Posts: 1651

baldr 02 Nov 2008, 22:04

Signed subtraction can easily overflow, so code have to check OF as well... Thus cdq/sar rxx, 31 will give incorrect result (e.g. try 2³¹-1 and -1), while CMOVcc/SETcc examples will do fine.

02 Nov 2008, 22:04

hopcode

Joined: 04 Mar 2008
Posts: 563
Location: Germany

hopcode 09 Nov 2008, 16:51

Edit
/to bladr/ Thanks a lot pointing it out.
I like working code so i would avoid the real mess Laughing

Perhaps, it is more useful choosing the range of the numbers,positive
or negative, signed unsigned, consequently apply the right code...

hopcode[mrk]

Last edited by hopcode on 09 Nov 2008, 19:43; edited 1 time in total

09 Nov 2008, 16:51

baldr

Joined: 19 Mar 2008
Posts: 1651

baldr 09 Nov 2008, 17:30

hopcode,

I'd just pointed to difference between signed and unsigned subtraction result flags (signed eax < ebx iff SF<>OF after sub/cmp eax, ebx, while for unsigned CF is sufficient). Your code appears to go great lengths to correct only one case of the OF being set, 2³¹-1 - (-1). How about 2³¹-2 - (-2)? Wink

By the way, signed 0x80000000 is -2³¹, much less than -1.

09 Nov 2008, 17:30

< Last Thread | Next Thread >

Forum Rules:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum