flat assembler
Message board for the users of flat assembler.
Best way to do absolute difference with general purpose regs
Which |eax-ebx| is the fastest/best for you?
Total Votes: 8
revolution 22 Aug 2008, 10:47
Best in what sense? Fastest in what sense? I'm being serious here; there are many ways to define how we measure those terms.
In a performance-critical loop I would try to use the version that uses the fewest CPU clock cycles/resources in the weighted average case. In other, non-performance code I would use the one that is easiest to understand at a quick glance.
Last edited by revolution on 23 Aug 2008, 06:23; edited 1 time in total
edfed 22 Aug 2008, 14:26
Code:
    sub eax,ebx
    jnl @f
    neg eax
@@:
In general, the absolute value does not stand alone; it has to be combined with other operations, so I prefer to compute it as an intermediate step.
vid 22 Aug 2008, 14:35
depends on circumstances ...
vid 22 Aug 2008, 14:37
(but the list of ways is still cool, it just looks weird inside such an overgeneralizing question)
Madis731 22 Aug 2008, 14:48
13 all the way! I haven't measured its size or speed yet, but I think it is Core 2- and future-proof (Nehalem etc.).
But what about:
Code:
    ;...insert SSE-code here...
MCD 22 Aug 2008, 15:50
Well, I was trying to code a GCD algorithm and needed to calculate the absolute difference of the 2 numbers once at the beginning and then each time through a loop. Also, because of the GCD algorithm, SSE/MMX seems like a poor fit.
I coded all of them without testing, and I'm especially uncertain whether variant 13 actually works. The critical point is whether IMUL updates the carry flag as needed.
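For context, the kind of loop described here can be sketched as a subtraction-based GCD. This is only an illustration, not MCD's actual code: the routine name, the register assignment (unsigned a in eax, b in ebx) and the choice of the subtraction-based algorithm are all assumptions.

```asm
use32
; Hypothetical routine: unsigned a in eax, b in ebx; gcd returned in eax.
gcd_by_difference:
        test    eax, eax
        jz      .take_ebx       ; gcd(0, b) = b
        test    ebx, ebx
        jz      .done           ; gcd(a, 0) = a
.step:                          ; invariant: eax and ebx are both nonzero
        mov     edx, eax        ; remember a
        sub     eax, ebx        ; eax = a - b, CF = 1 exactly when a < b
        jnc     .check
        neg     eax             ; a < b: eax = b - a
        mov     ebx, edx        ; the smaller value a becomes the new b
.check:
        test    eax, eax
        jnz     .step           ; stop once the difference reaches zero
.take_ebx:
        mov     eax, ebx        ; the surviving nonzero value is the gcd
.done:
        ret
```

Each pass replaces the pair (a, b) with (|a - b|, min(a, b)), which is why the absolute difference sits on the critical path of the loop.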
bitRAKE 23 Aug 2008, 03:54
The one which improves compression of the final product because when a million people download your program every byte counts. Okay, maybe not.
(13) shouldn't work. Carry is only clear for EAX=0 and 1, iirc.
Edit: just tested it - wrong again - carry only set for $80000000. ( http://www.asmcommunity.net/board/index.php?topic=4184.0 )
Last edited by bitRAKE on 11 Sep 2008, 01:53; edited 1 time in total
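The tested result matches the documented flag behaviour: the two- and three-operand forms of IMUL set CF and OF only when the signed product does not fit in the destination, whereas NEG sets CF for every nonzero operand. A hand-traced sketch with assumed input values (the exact shape of variant 13 is not reproduced here):

```asm
use32
        mov     eax, 5
        imul    edx, eax, -1    ; edx = -5 fits in 32 bits: CF = OF = 0
        mov     eax, 0x80000000
        imul    edx, eax, -1    ; +2^31 does not fit in signed 32 bits: CF = OF = 1
        mov     eax, 5
        neg     eax             ; CF = 1, because the operand was nonzero
        xor     eax, eax
        neg     eax             ; CF = 0, the only case where NEG clears it
```

So the carry left behind by IMUL says nothing about the sign of the value that was multiplied by -1.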
Madis731 23 Aug 2008, 22:24
bitRAKE wrote:
...so either we need an imul that sets no flags or we need a conditional move before it. Some kind of delayed write comes to mind, but that is possible only at the micro-instruction level, which is not accessible to us humans.
DOS386 24 Aug 2008, 00:35
Nice
1 and 2 look good (hope they work also) ... IMUL, CMOVNTQ and SSE are no-go's IMHO
MCD 24 Aug 2008, 05:06
DOS386 wrote:
From what I've tested, CMOVcc is as fast as other simple instructions like add, sub, and, mov... (BTW, there is no CMOVNTQ). So there is no reason not to use it, unless you are optimizing for code size (AFAIR CMOVcc needs 3 bytes with register-only operands) or want to support CPUs older than the 80686. For some reason, IMUL with only 32-bit operands is as fast as the other simple instructions on my CPU, while the 32x32 -> 64 forms of IMUL and MUL take longer.
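As an illustration of the CMOVcc approach under discussion, here is a minimal sketch. It is not necessarily identical to any of the poll's variants, and it assumes signed inputs in eax and ebx and a P6-class or newer CPU:

```asm
use32
; Assumed inputs: signed a in eax, b in ebx.
; Result: eax = |a - b| as an unsigned 32-bit value; edx is clobbered.
        mov     edx, ebx
        sub     edx, eax        ; edx = b - a (mod 2^32)
        sub     eax, ebx        ; eax = a - b (mod 2^32); flags come from this sub
        cmovl   eax, edx        ; SF <> OF means a < b, so take b - a instead
```

For unsigned inputs, cmovb (move if CF is set) would take the place of cmovl.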
vid 24 Aug 2008, 07:56
Quote:
For some reason, IMUL with only 32bit operands is as fast as the other simple instructions on my CPU
Good to know. I'd guess it is because of smart use of lookup tables in the CPU. Then it's just a few shifts and adds.
Madis731 24 Aug 2008, 10:13
It's because it's a Core, I think; at least Agner says that. Maybe newer AMD64s also have 1-cycle imuls. Latency is still bad, I think...
Code:
opcode fused p0-5 p_0 p_1 p_5 lat RcpTh
IMUL r16,r16     1 1 1 3 1
IMUL r32,r32     1 1 1 3 1
IMUL r64,r64     1 1 1 5 2
IMUL r16,r16,i   1 1 1 3 1
IMUL r32,r32,i   1 1 1 3 1
IMUL r64,r64,i   1 1 1 5 2
baldr 06 Sep 2008, 21:37
That's what I call "Zwicky Box in action"...
Let's figure out which variants definitely will not work. #9 and #10 are obvious examples:
Code:
    xor edx,edx
    sub edx,eax
#13: imul will set CF (and OF) only if the result does not fit in the destination (as bitRAKE noticed, -1 * (-2^31) does not).
Even sub eax, ebx is not so simple. Both signed and unsigned operands can (and certainly will) overflow the destination. The difference (which I will refer to from now on as R; note that it is not the eax value after the subtraction) can be anywhere in the range from -2^32+1 to 2^32-1, so |R| will fit nicely in an unsigned 32-bit value.
If the operands are unsigned, CF (as a result of sub) indicates the need for negation, since it is the sign bit of the 33-bit subtraction result. This leaves us variants 2, 4, 6, 8, 14 and 15 (e.g.
Code:
    cdq
    xor eax,edx
    sub eax,edx
What if the operands are signed? Let's first check for signed overflow, that is, for R >= 2^31. In that case OF and SF are set, and eax will contain the unsigned 32-bit absolute difference.
In the range 0 <= R < 2^31, OF and SF are clear; eax is our absolute difference.
In the range -2^31 <= R < 0, OF is clear but SF is set; we need to negate eax.
In the range -2^32+1 <= R < -2^31, OF is set but SF is clear; we need to negate eax to obtain the unsigned 32-bit absolute difference.
Isn't it a kind of magic? Alas, no code sample adheres to this (negate eax after subtraction if OF<>SF), like this one:
Code:
;16
    jnl @f
    neg eax
@@:
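The unsigned rule above (negate when sub sets CF) can also be applied without a branch. A minimal sketch, assuming unsigned a in eax and b in ebx; it is not necessarily one of the poll's numbered variants:

```asm
use32
; Assumed inputs: unsigned a in eax, b in ebx.
; Result: eax = |a - b|; edx is clobbered.
        sub     eax, ebx        ; CF = 1 exactly when a < b
        sbb     edx, edx        ; edx = -1 if CF was set, otherwise 0
        xor     eax, edx        ; with edx = -1 this is NOT eax, otherwise a no-op
        sub     eax, edx        ; subtracting -1 adds 1: two's complement negation
```

The mask comes from the carry rather than from bit 31 of eax, which is the difference from a cdq-based sequence: cdq copies the sign bit of the 32-bit result, not the sign of the 33-bit difference.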
edfed 06 Sep 2008, 21:41
new instruction?
why didn't they do this before?
abs eax
abs [eax]
etc...
baldr 10 Sep 2008, 23:55
edfed wrote: new instruction?
Pirata Derek 24 Jul 2009, 11:46
For me the best way to get absolute difference is this:
Code:
    sub eax,ebx ; calculate difference
    jns @F      ; if the result is not negative, it is already the positive difference
    neg eax     ; convert negative difference to positive
@@:             ; exit with eax = absolute (positive) difference
Mathematical explanation:
1) If A > B ---> A - B = +Delta
2) If A < B ---> A - B = -Delta ---> - ( -Delta ) = +Delta
This way is more comprehensible.
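One caveat under the signed interpretation: jns tests only SF, so it agrees with jnl (SF = OF) as long as the subtraction does not overflow. A hand-traced sketch with assumed input values where the two differ:

```asm
use32
        mov     eax, 0x80000000 ; a = -2^31
        mov     ebx, 1          ; b = 1
        sub     eax, ebx        ; eax = 0x7FFFFFFF, SF = 0, OF = 1
        jns     @f              ; taken, because SF = 0
        neg     eax
@@:                             ; eax stays 0x7FFFFFFF, but |a - b| = 0x80000001

        mov     eax, 0x80000000 ; same inputs again
        mov     ebx, 1
        sub     eax, ebx        ; same flags: SF = 0, OF = 1
        jnl     @f              ; not taken, because SF <> OF
        neg     eax             ; eax = 0x80000001 = 2^31 + 1
@@:
```

When both inputs are known to be close enough that a - b fits in a signed 32-bit value, the two branches behave identically.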
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.