flat assembler
Message board for the users of flat assembler.

 Index > Main > Best way to do absolute difference with general purpose regs

Which |eax-ebx| is the fastest/best for you?
1  25%  [ 2 ]
2  0%  [ 0 ]
3  0%  [ 0 ]
4  0%  [ 0 ]
5  0%  [ 0 ]
6  0%  [ 0 ]
7  0%  [ 0 ]
8  0%  [ 0 ]
9  0%  [ 0 ]
10  0%  [ 0 ]
11  0%  [ 0 ]
12  0%  [ 0 ]
13  37%  [ 3 ]
14  0%  [ 0 ]
15  0%  [ 0 ]
I don't care  0%  [ 0 ]
All (It doesn't matter)  12%  [ 1 ]
None of them (another variant)  0%  [ 0 ]
I don't understand  25%  [ 2 ]

Author
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 22 Aug 2008, 09:52
Lately I was hanging around a piece of code and noticed that there are too many ways to do a 32bit absolute difference with general purpose registers.

Well, I haven't tested them, but would like to know which you find the best/fastest way
Code:
```;compute |eax-ebx|
abs_dif:
sub     eax,ebx
;1
cdq
xor     eax,edx
sub     eax,edx
;2
sbb     edx,edx
xor     eax,edx
sub     eax,edx
;3
cdq
or      edx,1
imul    eax,edx
;4
sbb     edx,edx
or      edx,1
imul    eax,edx
;5
cdq
bts     edx,0
imul    eax,edx
;6
sbb     edx,edx
bts     edx,0
imul    eax,edx
;7
cdq
lea     edx,[edx*2+1]
imul    eax,edx
;8
sbb     edx,edx
lea     edx,[edx*2+1]
imul    eax,edx
;9
xor     edx,edx
sub     edx,eax
cmovnc  eax,edx
;10
mov     edx,0
sub     edx,eax
cmovnc  eax,edx
;11
mov     edx,eax
neg     eax
cmovs   eax,edx
;12
mov     edx,eax
neg     edx
cmovns  eax,edx
;13
imul    edx,eax,-1
cmovnc  eax,edx
;14
jnc     @f
neg     eax
@@:
;15
jnc     @f
imul    eax,eax,-1
@@:
;end
```

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
22 Aug 2008, 09:52
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 19869
revolution 22 Aug 2008, 10:47
Best in what sense? Fastest in what sense? I'm being serious here, there are many ways to define how we measure those terms.

In a critical performance loop I would try to use the version that uses the least CPU clock-cycles/resources in the average weighted case. In other non-performance code I would try to use the easiest to understand at a quick glance.

Last edited by revolution on 23 Aug 2008, 06:23; edited 1 time in total
22 Aug 2008, 10:47
edfed

Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 22 Aug 2008, 14:26
Code:
```sub eax,ebx
jnl @f
neg eax
@@:
```

in general, absolute value is not alone, we have to combine it with other operation, then, i prefer to use an intermediate step to compute it.
22 Aug 2008, 14:26
vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 22 Aug 2008, 14:35
depends on circumstances ...
22 Aug 2008, 14:35
vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 22 Aug 2008, 14:37
(but the list of ways is still cool, it just looks weird inside such overgeneralizing question)
22 Aug 2008, 14:37

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
13 all the way! I haven't measured its size or speed yet, but I think Core 2- and future-proof it is (Nehalem etc.).

Code:
```;...insert SSE-code here...
```
22 Aug 2008, 14:48
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 22 Aug 2008, 15:50
Well, I was trying to code some GCD algorithm and needed to calculate the absolute difference of the 2 numbers each time in a loop and once at the beginning. Also, because of the GCD algorithm, SSE/MMX stuff seem to be unfortunate.

I coded all without testing them, and I'm especially uncertain if variant 13 actually works. the critical point is whether IMUL updates the carry flag as needed.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
22 Aug 2008, 15:50
bitRAKE

Joined: 21 Jul 2003
Posts: 3884
Location: vpcmipstrm
bitRAKE 23 Aug 2008, 03:54
The one which improves compression of the final product because when a million people download your program every byte counts. Okay, maybe not.

(13) shouldn't work. Carry is only clear for EAX=0 and 1, iirc.

Edit: just tested it - wrong again - carry only set for \$80000000.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

Last edited by bitRAKE on 11 Sep 2008, 01:53; edited 1 time in total
23 Aug 2008, 03:54

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
bitRAKE wrote:

[---]
(13) shouldn't work. Carry is only clear for EAX=0 and 1, iirc.

Edit: just tested it - wrong again - carry only set for \$80000000.

...so either we need an imul with no flags set or we need a conditional move before that. Some delayed-writing comes to mind which is possible only with microinstructional level - not accessible to us, humans.

_________________
My updated idol http://www.agner.org/optimize/
23 Aug 2008, 22:24
DOS386

Joined: 08 Dec 2006
Posts: 1898
DOS386 24 Aug 2008, 00:35
Nice

1 and 2 look good (hope they work also) ... IMUL, CMOVNTQ and SSE are no-go's IMHO
24 Aug 2008, 00:35
MCD

Joined: 21 Aug 2004
Posts: 602
Location: Germany
MCD 24 Aug 2008, 05:06
DOS386 wrote:

IMUL, CMOVNTQ and SSE are no-go's IMHO

From what I've tested, CMOVcc is as fast as other simple instructions like add, sub, and, mov... (BTW, there is no CMOVNTQ). So there is now reason why not to use them, except if you code for code size (AFAIR CMOVcc needs 3 bytes with only registers) or what to be for elder than 80686 CPUs.

For some reason, IMUL with only 32bit operands is as fast as the other simple instructions on my CPU, while the other 32x32 -> 64 IMUL and MUL take longer.
24 Aug 2008, 05:06
vid
Verbosity in development

Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 24 Aug 2008, 07:56
Quote:
For some reason, IMUL with only 32bit operands is as fast as the other simple instructions on my CPU

good to know. I'd quess it is because of smart use of lookup tables in CPU. Then, it's just some (not many) shiftings and addings
24 Aug 2008, 07:56

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Its because its a Core I think, at least Agner says that. Maybe newer AMD64's also have 1-cycle imuls. Latency is still bad I think...
Code:
```opcode           fused p0-5 p_0 p_1 p_5   lat RcpTh
IMUL        r16,r16    1     1       1         3    1
IMUL       r32,r32    1     1       1         3    1
IMUL       r64,r64    1     1   1             5    2
IMUL       r16,r16,i  1     1       1         3    1
IMUL       r32,r32,i  1     1       1         3    1
IMUL       r64,r64,i  1     1   1             5    2
```
24 Aug 2008, 10:13
baldr

Joined: 19 Mar 2008
Posts: 1651
baldr 06 Sep 2008, 21:37
That's what I call "Zwicky Box in action"...

Let's figure out, which variants will not work definitely.

#9 & #10 is an obvious examples:
Code:
```        xor     edx,edx
sub     edx,eax    ```
will result in CF clear only if eax is zero.

#13: imul will set CF (and OF) only if result does not fit in the destination (as bitRAKE noticed, -1 * (-2^31) does not).

Even sub eax, ebx is not so simple. Both signed and unsigned operands can (and certainly will) overflow the destination.

Difference (which I will refer from now on as R, notice that it is not the eax value after the subtraction) can be anywhere in range from -2^32+1 to 2^32-1. So |R| will nicely fit in unsigned 32-bit.

If operands are unsigned, CF (as a result of sub) indicates the need of negation as a sign bit of 33-bit subtraction result. This leaves us variants 2, 4, 6, 8, 14 and 15 (e.g.
Code:
```        cdq
xor     eax,edx
sub     eax,edx    ```
will fail if after subtraction eax had most significant bit set and CF is not set, for example \$80000000-0).

What if operands are signed?

Let's first check for signed overflow, that is, for R >= 2^31. In that case OF and SF are set. eax will contain unsigned 32-bit absolute difference.

In range 0 <= R < 2^31 OF and SF are clear. eax is our absolute difference.

In range -2^31 <= R < 0 OF is clear but SF is set. We need to negate eax.

In range -2^32+1 <= R < -2^31 OF is set buf SF is clear. We need to negate eax to obtain unsigned 32-bit absolute difference. Isn't it a kind of magic?

Alas, no code sample adheres to this (negate eax after subtraction if OF<>SF), like this one:
Code:
```;16
jnl     @f
neg     eax
@@:    ```
As a general rule of thumb: CF for unsigned, OF xor SF for signed.
06 Sep 2008, 21:37
edfed

Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 06 Sep 2008, 21:41
new instruction?
why didn't they do this before?

abs eax
abs [eax]
etc...
06 Sep 2008, 21:41
baldr

Joined: 19 Mar 2008
Posts: 1651
baldr 10 Sep 2008, 23:55
edfed wrote:
new instruction?
why didn't they do this before?

abs eax
abs [eax]
etc...
They're too busy to implement CRC32 hardwired to not-so-good polynomial 0x11EDC6F41... Damn, even AAD/AAM had hidden immediate operand, which extends their use.
10 Sep 2008, 23:55
Pirata Derek

Joined: 31 Oct 2008
Posts: 259
Location: Italy
Pirata Derek 24 Jul 2009, 11:46
For me the best way to get absolute difference is this:
Code:
```sub eax,ebx ; calculate difference
jns @F ; if not negative number then positive difference
neg eax ; convert negative difference to positive
@@: ; exit with eax = absolute (positive) difference
```

Matematical explanation:

1) If A > B ---> A - B = +Delta

2) If A < B ---> A - B = -Delta ---> - ( -Delta ) = +Delta

This way is more comprensible
24 Jul 2009, 11:46
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum