flat assembler
Message board for the users of flat assembler.
Index
> Main > Code optimization (AKA C vs. Asm) Goto page Previous 1, 2, 3, 4, 5 Next |
Author |
|
Borsuc 30 Dec 2009, 17:23
tthsqe wrote: ritRAKE, @f0dder: how is it difficult to implement it in that folder-scanning example? You just replace the call with a jmp but still push/pop the folder handles on the stack. (of course you could use an array instead of the stack if you wanted but that's not necessarily). (also replace the ret with a corresponding jmp obviously) _________________ Previously known as The_Grey_Beast |
|||
30 Dec 2009, 17:23 |
|
tthsqe 31 Dec 2009, 00:49
Ah, so you were talking about only keeping track of TWO values. I gave that a shot, but the results were about 50% slower than the built in function:
a = 0; b = 1; Accumulate[ Table[Timing[c = (b - a)^2; {a, b} = {a^2 + c, b^2 + c};][[1]], {i, 1, 30}]][[18 ;; 30]] {0.015, 0.047, 0.078, 0.171, 0.374, 0.78, 1.841, 4.118, 8.798, 18.361, 38.719, 81.713, 178.371} The built in function probably uses a host of other optimizations as well as the one you mentioned. (The timing must be accumulated because although mathematica caches computed results, this seems not affect the calculation of further results) |
|||
31 Dec 2009, 00:49 |
|
bitRAKE 31 Dec 2009, 04:27
I'd rather code it in x86 and compare.
Mathematica is awesome, but I no longer have a license. |
|||
31 Dec 2009, 04:27 |
|
tthsqe 31 Dec 2009, 06:05
Mathematica is awesome, but the new version 7 has some kind debugger deflector, so you can' see what's going on in ollydbg.
Quote:
Please do - I would not be uninterested in the results. But, you are going to have to code a good multiprecision multiplier (f(2^30) is about 10 million qwords), and I hear that mathematica has adopted the GMP library just for this reason. |
|||
31 Dec 2009, 06:05 |
|
bitRAKE 31 Dec 2009, 15:15
Don't you mean a good specialized squaring routined? Mathematica has adopted much open source on the back-end from what I've read.
|
|||
31 Dec 2009, 15:15 |
|
Borsuc 31 Dec 2009, 15:32
Can't you use floating point though?
bitRAKE wrote: Mathematica is awesome For computations (not symbolic) use PARI. Light and very fast. _________________ Previously known as The_Grey_Beast |
|||
31 Dec 2009, 15:32 |
|
bitRAKE 01 Jan 2010, 02:59
True, from a certain perspective Mathematica is bloated, but we cannot ignore the wider audience and scope of utility covered by Mathematica - from the initial design Wolfram has had a desire to bring mathematics to a wider audience as well as easing the digital use of existing mathematics in education. I have much respect for this - both the man and his work.
|
|||
01 Jan 2010, 02:59 |
|
tthsqe 01 Jan 2010, 04:14
borsuc,
what do you have in mind for a floating point approach? |
|||
01 Jan 2010, 04:14 |
|
bitRAKE 01 Jan 2010, 20:48
tthsqe wrote: f(2^30) is about 10 million qwords _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
01 Jan 2010, 20:48 |
|
tthsqe 02 Jan 2010, 04:18
I'm not sure how to find that out. Here is the cpuz report, though.
|
|||||||||||
02 Jan 2010, 04:18 |
|
Alphonso 02 Jan 2010, 04:46
What is this code really about?
Code: f: ;argument passed in eax cmp eax,1 jbe .1 push ebx lea ebx,[eax-2] dec eax call f xchg eax,ebx call f add eax,ebx pop ebx .1 ret It also seems the thread has turned into a 'fibber what's his name' optimization. Just out of curiosity how long does it take mathematica to produce an answer for say fib(1,000,000)? Happy new year to everyone, even though I'm a day late. |
|||
02 Jan 2010, 04:46 |
|
LocoDelAssembly 02 Jan 2010, 05:53
Alphonso, it computes this:
Code: f(0) = 1 f(1) = 1 f(n) = (f(n-1) + f(n-2))(mod 2^32) The LUT is of course the way to go for computing the real fib inside the bounds of not-astronomically-big unsigned integers but the idea was not to use precomputed values (note that in any moment the discussion was directed to real-life use except for the arbitrary precision). About the direction of this thread yeah, I was thinking about merging part of it into the other thread that was specific of fib. |
|||
02 Jan 2010, 05:53 |
|
bitRAKE 02 Jan 2010, 06:10
tthsqe wrote: I'm not sure how to find that out. Here is the cpuz report, though. _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
02 Jan 2010, 06:10 |
|
Alphonso 02 Jan 2010, 07:23
LocoDelAssembly wrote: The idea was not to use precomputed values Code: f: cmp eax,1 jbe .1 mov ecx,eax xor eax,eax mov ebx,eax inc ebx @@: push eax add eax,ebx pop ebx dec ecx jnz @b .1: ret |
|||
02 Jan 2010, 07:23 |
|
revolution 02 Jan 2010, 07:52
Alphonso: You may have missed the discussion about sticking with the (inefficient) recursive algorithm for the purposes of testing. So your standard linear approach above, while certainly much faster, was not the intended purpose of all the past posts in here.
Although, I am at a loss to understand what the recursive test is supposed to prove or solve? |
|||
02 Jan 2010, 07:52 |
|
LocoDelAssembly 02 Jan 2010, 08:11
It started from here when tthsqe was talking about the inefficient Assembly code his compilers generated and that his own was the best. Then I've shown my code which was better on my AMD, then there was also some discussion about using iteration with an stack, and then... Well, you'll have to read
|
|||
02 Jan 2010, 08:11 |
|
tthsqe 02 Jan 2010, 08:41
Alphonso,
I guess the point was the obvious fact that it the really put the effort in you can gain some improvements over some compilers - they just don't know where to put the variables in the case of that c code. mathematica timings for various values: f(1000^2): 0.05 f(1024^2): 0.05 f(1000^3): 106 f(1024^3): 112 Also, it was mentioned before, but the xadd operation can replace your push, add, pop sequence. don't forget about xadd ! bitRAKE, how did you get that I have two cores? only one 64 bit core trapped in a 32 bit OS .... anyways, the result for three runs of the program was 1.83, 2.08, 1.84 GB/s |
|||
02 Jan 2010, 08:41 |
|
Alphonso 02 Jan 2010, 14:12
Thanks for those numbers tthsqe. I tried going up to f(1 million) using the add method and three streams but it took about 2 minutes, so pretty slow really. Thanks for the tip on XADD but it seems to run 25% slower than a push-add-pop on my laptop.
Code: ;;Test XADD vs Push-Add-Pop format PE GUI 4.0 include 'win32a.inc' TestTime=0x10000000 ;======================================== section '.code' code readable executable ;----------------------------------------;Loop Time Check invoke GetTickCount mov [TStart],eax xor eax,eax mov ebx,1 align 16 mov ecx,TestTime @@: dec ecx jnz @b invoke GetTickCount sub eax,[TStart] mov [LoopTime],eax ;----------------------------------------;PushPop Time Check invoke GetTickCount mov [TStart],eax xor eax,eax mov ebx,1 align 16 mov ecx,TestTime @@: push eax add eax,ebx pop ebx dec ecx jnz @b invoke GetTickCount sub eax,[TStart] mov [PushPopTime],eax ;----------------------------------------;Xadd Time Check invoke GetTickCount mov [TStart],eax xor eax,eax mov ebx,1 align 16 mov ecx,TestTime @@: xadd eax,ebx dec ecx jnz @b invoke GetTickCount sub eax,[TStart] mov [XaddTime],eax ;----------------------------------------;Results mov eax,[LoopTime] sub [PushPopTime],eax sub [XaddTime],eax cinvoke wsprintf,Buff,wsformat,eax,[PushPopTime],[XaddTime] invoke MessageBox,0,Buff,Tit,1 MB_RIGHT or MB_ICONINFORMATION invoke ExitProcess,0 ;======================================== section '.data' data readable writeable TStart dd ? LoopTime dd ? PushPopTime dd ? XaddTime dd ? Tit db 'P-P vs XADD',0 wsformat db 'Loop Time : %04ums',10,'Push Pop Time : %04ums',10,'Xadd Time : %04ums',0 Buff rb 128 ;======================================== section '.idata' import data readable writeable library kernel,'KERNEL32.DLL',\ user,'USER32.DLL' import kernel,\ GetTickCount,'GetTickCount',\ ExitProcess,'ExitProcess' import user,\ MessageBox,'MessageBoxA',\ wsprintf,'wsprintfA' ;======================================== |
|||
02 Jan 2010, 14:12 |
|
Borsuc 02 Jan 2010, 19:39
My initial suggestion with float was because of the much higher dynamic range, unfortunately addition doesn't do very well in floats and since fibonacci sequence is additions the precision loss might be huge. It may not have been such a good idea.
|
|||
02 Jan 2010, 19:39 |
|
Goto page Previous 1, 2, 3, 4, 5 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.