flat assembler
Message board for the users of flat assembler.

Index > Windows > Java faster than ASM!?

Goto page Previous  1, 2, 3, 4, 5
Author
Thread Post new topic Reply to topic
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
r22 wrote:
On a 64bit system you should be able to calculate the binary reciprical accurate enough to remove any error for all 32bit numbers using the 64x64->128 MUL.
Yes, easy enough, indeed you only need one more bit (33bits total) and a shift in your reciprocal code to make all 32 bit values correct
r22 wrote:
I wonder if calculating the reciprical on the fly using SSE2 would be faster or slower than DIV.
Unfortunately no, the accuracy of the SSE reciprocal is only ~12 bits.

BTW: RCPSD? No such instruction.
Post 07 Feb 2008, 15:42
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
LocoDelAssembly wrote:
Code:
include 'win32ax.inc'
use_reciprocal = 0 ; Comment this line to use DIV instruction
.data 
    start_time dd 0 
    _output rb 20 
.code 

start: 
            invoke MessageBox,NULL,"Click Ok to start","Speed Test",MB_OK 
            invoke GetTickCount 
            mov [start_time],eax 
            xor ecx, ecx
            mov ebx, 024924925h
            mov edi, 7

            align 16
            place:
            if defined use_reciprocal
                mov eax, ecx
                mul ebx
                mov eax, ecx
                sub eax, edx
                shr eax, 1
                add eax, edx
                shr eax, 2
            else
                xor edx, edx
                mov eax, ecx
                div edi
            end if
                dec ecx
                jnz place

            invoke GetTickCount
            sub eax, [start_time] 
            invoke wsprintf,_output,"%d milliseconds",eax 
            invoke MessageBox,NULL,_output,"Speed Test",MB_OK 
            invoke ExitProcess,0

 .end start    

On Athlon64 2.0 GHz Venice:
With DIV ~89000 ms
With reciprocal ~10000 ms

I have to confess that I cheated Razz The reciprocal code was generated by Visual C++ 2005 Express by using volatile on all variables.

Code:
int _tmain(int argc, _TCHAR* argv[])
{
00401000  sub         esp,8 
   _asm{int 3}
00401003  int         3    
        volatile unsigned int count = 1000000000;
00401004  mov         dword ptr [esp],3B9ACA00h 
       volatile unsigned int nop;
  while (count)
0040100B  mov         eax,dword ptr [esp] 
0040100E  test        eax,eax 
00401010  je          wmain+32h (401032h) 
 {
              nop = count / 7;
00401012  mov         ecx,dword ptr [esp] 
00401015  mov         eax,24924925h 
0040101A  mul         eax,ecx 
0040101C  sub         ecx,edx 
0040101E  shr         ecx,1 
00401020  add         ecx,edx 
00401022  shr         ecx,2 
00401025  mov         dword ptr [esp+4],ecx 
                count--;
00401029  add         dword ptr [esp],0FFFFFFFFh 
0040102D  mov         ecx,dword ptr [esp] 
00401030  jne         wmain+12h (401012h) 
   }
}
00401032  add         esp,8 
00401035  ret     

I tested the reciprocal against all posible dividends and produces correct result for them all, and if the count max value is the same as in the original code you can remove all the SHR-ADD because it will produce correct result without them (but with bigger dividends it works wrong, I not tested from exactly what dividend starts to fails but I tested 100000000 and below and worked)


I couldn't get this reciprocal trick to work for dividing by 10...why is that? I get the wrong answer

I used 0x1999999A ...what did I do wrong?
Post 14 Feb 2008, 06:36
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
You can't simply change the constant, you also need to change the fixup code that follows.

For div 10 you can do this:
Code:
mov eax,value
mov ecx,0xCCCCCCCD ;ecx=ceil(2^35/10)
mul ecx ;edx=value*8/10
shr edx,3 ;edx=quotient, the result    
This works for all 32 bit values.
Post 14 Feb 2008, 07:38
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2906
Location: [RSP+8*5]
bitRAKE
For the range [0,$7FFFFFFF]:
Code:
    imul eax,[number],$CCCCCCCD
    shr eax,1    
...more flexible with regard to register use, and faster.
Post 14 Feb 2008, 16:13
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
bitRAKE wrote:
For the range [0,$7FFFFFFF]:
Code:
    imul eax,[number],$CCCCCCCD
    shr eax,1    
...more flexible with regard to register use, and faster.
Hmm, let's see. My input value is 1, now mul 0xCCCCCCCD --> 0xCCCCCCCD, now shr val,1 --> 0x66666666. Somehow that does not appear to be the result of 1/10(=0).

bitRAKE, what are you on about?
Post 14 Feb 2008, 16:40
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 2906
Location: [RSP+8*5]
bitRAKE
It only works for multiples of ten. Confused
(Let me get some coffee in me and wake up, lol.)
Post 14 Feb 2008, 17:10
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
revolution wrote:
You can't simply change the constant, you also need to change the fixup code that follows.

For div 10 you can do this:
Code:
mov eax,value
mov ecx,0xCCCCCCCD ;ecx=ceil(2^35/10)
mul ecx ;edx=value*8/10
shr edx,3 ;edx=quotient, the result    
This works for all 32 bit values.


Thanks

Do you have a link on more information about the reciprocal trick?
Post 14 Feb 2008, 22:36
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
itsnobody wrote:
Do you have a link on more information about the reciprocal trick?
Hmm, I don't have a link, because it can all be worked out with simple maths. Somewhere up there previously I show how to compute the expected range of accuracy for this type of thing.
Post 14 Feb 2008, 22:44
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
revolution wrote:
itsnobody wrote:
Do you have a link on more information about the reciprocal trick?
Hmm, I don't have a link, because it can all be worked out with simple maths. Somewhere up there previously I show how to compute the expected range of accuracy for this type of thing.


Ok thanks
Post 14 Feb 2008, 22:52
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
Post 15 Feb 2008, 06:49
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
Madis731 wrote:
http://www.azillionmonkeys.com/qed/adiv.html ?


Thanks, I made this macro using the formula:
Code:
macro RECPDIV a,b { ;a/b
      mov ebx,(0x100000000+(b-1))/b
      mov eax,a
      mul ebx
}
    


Works for all 32-bit values, it's a LOT faster than regular div, around 8-10 times faster

Here's the average results of the speed tests:
Java - 2600 milliseconds
ASM regular div - 2600 milliseconds
ASM reciprocal div - 280 milliseconds


Last edited by itsnobody on 17 Feb 2008, 08:56; edited 3 times in total
Post 17 Feb 2008, 07:52
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
itsnobody wrote:
I made this macro using the formula:
Code:
macro RECPDIV a,b { ;a/b
      mov ebx,(0x100000000+(b-1))/b
      mov eax,a
      mul ebx
      shr eax,5 ;Result in edx
}
    


Works for all 32-bit values, ...
The only problem is your macro doesn't work for all input values. A simple calculation will show the first value to fail for b=3 is a=2147483648.

BTW: Why do you have "shr eax,5"?
Post 17 Feb 2008, 08:42
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
revolution wrote:
itsnobody wrote:
I made this macro using the formula:
Code:
macro RECPDIV a,b { ;a/b
      mov ebx,(0x100000000+(b-1))/b
      mov eax,a
      mul ebx
      shr eax,5 ;Result in edx
}
    


Works for all 32-bit values, ...
The only problem is your macro doesn't work for all input values. A simple calculation will show the first value to fail for b=3 is a=2147483648.

BTW: Why do you have "shr eax,5"?


That's right, shr eax, 5 is unnecessary, its edited out now, I don't know how I confused myself to adding that in, must have been thinking about dividing by 2^32 or something

As for it failing at a=2147483648, it returns 715,827,883 when the correct integer answer should be 715,827,882

Seems to work for all values less than that though


Last edited by itsnobody on 17 Feb 2008, 09:00; edited 1 time in total
Post 17 Feb 2008, 08:53
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
Here is a table for b<=50
Code:
b     First errored input value
3  2147483648
5 1073741824
6 2147483651
7 1431655770
9 858993461
10 1073741829
11        613566766
12 536870915
13 1073741824
14        429496731
15 306783389
17 268435456
18 306783395
19 330382107
20 1073741839
21        252645140
22 238609315
23 390451587
24 536870927
25 1073741824
26        1073741837
27        858993470
28 178956987
29 330382122
30 306783389
31 159072872
33 148102349
34 268435473
35 178956994
36 134217755
37 143165579
38 134217747
39 252645158
40 178956999
41 1073741824
42        113025485
43 159072866
44 107374211
45 306783404
46 126322577
47 858993478
48 134217743
49 429496759
50 1073741849    
Post 17 Feb 2008, 08:58
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
revolution wrote:
Here is a table for b<=50
Code:
b      First errored input value
3  2147483648
5 1073741824
6 2147483651
7 1431655770
9 858993461
10 1073741829
11        613566766
12 536870915
13 1073741824
14        429496731
15 306783389
17 268435456
18 306783395
19 330382107
20 1073741839
21        252645140
22 238609315
23 390451587
24 536870927
25 1073741824
26        1073741837
27        858993470
28 178956987
29 330382122
30 306783389
31 159072872
33 148102349
34 268435473
35 178956994
36 134217755
37 143165579
38 134217747
39 252645158
40 178956999
41 1073741824
42        113025485
43 159072866
44 107374211
45 306783404
46 126322577
47 858993478
48 134217743
49 429496759
50 1073741849    


Wow

Is there any way to fix these errors using a fixed formula? Or is it not possible to use a fixed formula?
Post 17 Feb 2008, 09:03
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17253
Location: In your JS exploiting you and your system
revolution
itsnobody wrote:
Is there any way to fix these errors using a fixed formula? Or is it not possible to use a fixed formula?
You can use a fixed formula for all values of b if you use some adjustment code after the mul. But you can save time for many values that do not need adjustment. See the code I posted for /10. It needs no adjustment other than a single shift.

But for b<=50: 7,14,19,21,27,28,31,35,37,38,39,42,45 will all require additional adjustment other than a simple shift.
Post 17 Feb 2008, 09:10
View user's profile Send private message Visit poster's website Reply with quote
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
revolution wrote:
itsnobody wrote:
Is there any way to fix these errors using a fixed formula? Or is it not possible to use a fixed formula?
You can use a fixed formula for all values of b if you use some adjustment code after the mul. But you can save time for many values that do not need adjustment. See the code I posted for /10. It needs no adjustment other than a single shift.

But for b<=50: 7,14,19,21,27,28,31,35,37,38,39,42,45 will all require additional adjustment other than a simple shift.


Yeah, I guess I'll have to modify the macro to include the correction errors
Post 17 Feb 2008, 11:41
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2, 3, 4, 5

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.