flat assembler
Message board for the users of flat assembler.

Index > Main > Best way to do absolute difference with general purpose regs


Which |eax-ebx| is the fastest/best for you?
1
25%
 25%  [ 2 ]
2
0%
 0%  [ 0 ]
3
0%
 0%  [ 0 ]
4
0%
 0%  [ 0 ]
5
0%
 0%  [ 0 ]
6
0%
 0%  [ 0 ]
7
0%
 0%  [ 0 ]
8
0%
 0%  [ 0 ]
9
0%
 0%  [ 0 ]
10
0%
 0%  [ 0 ]
11
0%
 0%  [ 0 ]
12
0%
 0%  [ 0 ]
13
37%
 37%  [ 3 ]
14
0%
 0%  [ 0 ]
15
0%
 0%  [ 0 ]
I don't care
0%
 0%  [ 0 ]
All (It doesn't matter)
12%
 12%  [ 1 ]
None of them (another variant)
0%
 0%  [ 0 ]
I don't understand
25%
 25%  [ 2 ]
Total Votes : 8

Author
Thread Post new topic Reply to topic
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 22 Aug 2008, 09:52
Lately I was hanging around a piece of code and noticed that there are too many ways to do a 32bit absolute difference with general purpose registers.

Well, I haven't tested them, but would like to know which you find the best/fastest way Smile
Code:
;compute |eax-ebx|
abs_dif:
       sub     eax,ebx
;1
       cdq
 xor     eax,edx
     sub     eax,edx
;2
       sbb     edx,edx
     xor     eax,edx
     sub     eax,edx
;3
       cdq
 or      edx,1
       imul    eax,edx
;4
       sbb     edx,edx
     or      edx,1
       imul    eax,edx
;5
       cdq
 bts     edx,0
       imul    eax,edx
;6
       sbb     edx,edx
     bts     edx,0
       imul    eax,edx
;7
       cdq
 lea     edx,[edx*2+1]
       imul    eax,edx
;8
       sbb     edx,edx
     lea     edx,[edx*2+1]
       imul    eax,edx
;9
       xor     edx,edx
     sub     edx,eax
     cmovnc  eax,edx
;10
      mov     edx,0
       sub     edx,eax
     cmovnc  eax,edx
;11
      mov     edx,eax
     neg     eax
 cmovs   eax,edx
;12
      mov     edx,eax
     neg     edx
 cmovns  eax,edx
;13
      imul    edx,eax,-1
  cmovnc  eax,edx
;14
      jnc     @f
  neg     eax
@@:
;15
       jnc     @f
  imul    eax,eax,-1
@@:
;end
    

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 22 Aug 2008, 09:52
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20300
Location: In your JS exploiting you and your system
revolution 22 Aug 2008, 10:47
Best in what sense? Fastest in what sense? I'm being serious here, there are many ways to define how we measure those terms.

In a critical performance loop I would try to use the version that uses the least CPU clock-cycles/resources in the average weighted case. In other non-performance code I would try to use the easiest to understand at a quick glance.


Last edited by revolution on 23 Aug 2008, 06:23; edited 1 time in total
Post 22 Aug 2008, 10:47
View user's profile Send private message Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 22 Aug 2008, 14:26
Code:
sub eax,ebx
jnl @f
neg eax
@@:
    

in general, absolute value is not alone, we have to combine it with other operation, then, i prefer to use an intermediate step to compute it.
Post 22 Aug 2008, 14:26
View user's profile Send private message Visit poster's website Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 22 Aug 2008, 14:35
depends on circumstances ...
Post 22 Aug 2008, 14:35
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 22 Aug 2008, 14:37
(but the list of ways is still cool, it just looks weird inside such overgeneralizing question)
Post 22 Aug 2008, 14:37
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 22 Aug 2008, 14:48
13 all the way! I haven't measured its size or speed yet, but I think Core 2- and future-proof it is (Nehalem etc.).

But what about:
Code:
;...insert SSE-code here... Smile
    
Post 22 Aug 2008, 14:48
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 22 Aug 2008, 15:50
Well, I was trying to code some GCD algorithm and needed to calculate the absolute difference of the 2 numbers each time in a loop and once at the beginning. Also, because of the GCD algorithm, SSE/MMX stuff seem to be unfortunate.

I coded all without testing them, and I'm especially uncertain if variant 13 actually works. the critical point is whether IMUL updates the carry flag as needed.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 22 Aug 2008, 15:50
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4020
Location: vpcmpistri
bitRAKE 23 Aug 2008, 03:54
The one which improves compression of the final product because when a million people download your program every byte counts. Okay, maybe not. Razz

(13) shouldn't work. Carry is only clear for EAX=0 and 1, iirc.

Edit: just tested it - wrong again - carry only set for $80000000.

( http://www.asmcommunity.net/board/index.php?topic=4184.0 )

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup


Last edited by bitRAKE on 11 Sep 2008, 01:53; edited 1 time in total
Post 23 Aug 2008, 03:54
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 23 Aug 2008, 22:24
bitRAKE wrote:

[---]
(13) shouldn't work. Carry is only clear for EAX=0 and 1, iirc.

Edit: just tested it - wrong again - carry only set for $80000000.

...so either we need an imul with no flags set or we need a conditional move before that. Some delayed-writing comes to mind which is possible only with microinstructional level - not accessible to us, humans.

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 23 Aug 2008, 22:24
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
DOS386



Joined: 08 Dec 2006
Posts: 1900
DOS386 24 Aug 2008, 00:35
Nice Smile

1 and 2 look good (hope they work also) ... IMUL, CMOVNTQ and SSE are no-go's IMHO Rolling Eyes
Post 24 Aug 2008, 00:35
View user's profile Send private message Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 24 Aug 2008, 05:06
DOS386 wrote:

IMUL, CMOVNTQ and SSE are no-go's IMHO Rolling Eyes

From what I've tested, CMOVcc is as fast as other simple instructions like add, sub, and, mov... (BTW, there is no CMOVNTQ). So there is now reason why not to use them, except if you code for code size (AFAIR CMOVcc needs 3 bytes with only registers) or what to be for elder than 80686 CPUs.

For some reason, IMUL with only 32bit operands is as fast as the other simple instructions on my CPU, while the other 32x32 -> 64 IMUL and MUL take longer.
Post 24 Aug 2008, 05:06
View user's profile Send private message Reply with quote
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Aug 2008, 07:56
Quote:
For some reason, IMUL with only 32bit operands is as fast as the other simple instructions on my CPU

good to know. I'd quess it is because of smart use of lookup tables in CPU. Then, it's just some (not many) shiftings and addings
Post 24 Aug 2008, 07:56
View user's profile Send private message Visit poster's website AIM Address MSN Messenger ICQ Number Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 24 Aug 2008, 10:13
Its because its a Core I think, at least Agner says that. Maybe newer AMD64's also have 1-cycle imuls. Latency is still bad I think...
Code:
opcode           fused p0-5 p_0 p_1 p_5   lat RcpTh
IMUL        r16,r16    1     1       1         3    1
IMUL       r32,r32    1     1       1         3    1
IMUL       r64,r64    1     1   1             5    2
IMUL       r16,r16,i  1     1       1         3    1
IMUL       r32,r32,i  1     1       1         3    1
IMUL       r64,r64,i  1     1   1             5    2
    
Post 24 Aug 2008, 10:13
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 06 Sep 2008, 21:37
That's what I call "Zwicky Box in action"... Wink

Let's figure out, which variants will not work definitely.

#9 & #10 is an obvious examples:
Code:
        xor     edx,edx
        sub     edx,eax    
will result in CF clear only if eax is zero.

#13: imul will set CF (and OF) only if result does not fit in the destination (as bitRAKE noticed, -1 * (-2^31) does not).

Even sub eax, ebx is not so simple. Both signed and unsigned operands can (and certainly will) overflow the destination.

Difference (which I will refer from now on as R, notice that it is not the eax value after the subtraction) can be anywhere in range from -2^32+1 to 2^32-1. So |R| will nicely fit in unsigned 32-bit.

If operands are unsigned, CF (as a result of sub) indicates the need of negation as a sign bit of 33-bit subtraction result. This leaves us variants 2, 4, 6, 8, 14 and 15 (e.g.
Code:
        cdq
        xor     eax,edx
        sub     eax,edx    
will fail if after subtraction eax had most significant bit set and CF is not set, for example $80000000-0).

What if operands are signed?

Let's first check for signed overflow, that is, for R >= 2^31. In that case OF and SF are set. eax will contain unsigned 32-bit absolute difference.

In range 0 <= R < 2^31 OF and SF are clear. eax is our absolute difference.

In range -2^31 <= R < 0 OF is clear but SF is set. We need to negate eax.

In range -2^32+1 <= R < -2^31 OF is set buf SF is clear. We need to negate eax to obtain unsigned 32-bit absolute difference. Isn't it a kind of magic?

Alas, no code sample adheres to this (negate eax after subtraction if OF<>SF), like this one:
Code:
;16
        jnl     @f
        neg     eax
@@:    
As a general rule of thumb: CF for unsigned, OF xor SF for signed.
Post 06 Sep 2008, 21:37
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 06 Sep 2008, 21:41
new instruction?
why didn't they do this before?

abs eax
abs [eax]
etc...
Post 06 Sep 2008, 21:41
View user's profile Send private message Visit poster's website Reply with quote
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 10 Sep 2008, 23:55
edfed wrote:
new instruction?
why didn't they do this before?

abs eax
abs [eax]
etc...
They're too busy to implement CRC32 hardwired to not-so-good polynomial 0x11EDC6F41... Damn, even AAD/AAM had hidden immediate operand, which extends their use.
Post 10 Sep 2008, 23:55
View user's profile Send private message Reply with quote
Pirata Derek



Joined: 31 Oct 2008
Posts: 259
Location: Italy
Pirata Derek 24 Jul 2009, 11:46
For me the best way to get absolute difference is this:
Code:
sub eax,ebx ; calculate difference
jns @F ; if not negative number then positive difference
neg eax ; convert negative difference to positive
@@: ; exit with eax = absolute (positive) difference
    

Matematical explanation:

1) If A > B ---> A - B = +Delta

2) If A < B ---> A - B = -Delta ---> - ( -Delta ) = +Delta

This way is more comprensible
Post 24 Jul 2009, 11:46
View user's profile Send private message Send e-mail Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.