flat assembler
Message board for the users of flat assembler.
Mandelbrot Benchmark FPU/SSE2 released
Kuemmel 17 Apr 2006, 19:05
Hi people,
thanks to a lot of help here and many hours spent studying source code and looking for fast routines, I coded a Mandelbrot fractal benchmark to see how good or bad recent CPUs are at double-precision calculation. You can download the executable and source from: http://www.mikusite.de/pages/x86.htm It's the 'KMB 0.3' file.

What is interesting is that the FPU of the Pentium 4 is really bad relative to its clock speed, but on SSE2 it flies. On the other hand, normalised to the clock speed, the efficiency of AMD on SSE2 is the same. Another interesting finding is the good performance of the Pentium M, and that AMD hasn't changed anything in their FPU for years, judging by the result of my old Athlon. Does anybody have a PII, PIII or K6 to test? Would be interesting.

The FPU code should work from Pentium Pro level upwards (I used an instruction that is not in the instruction set of earlier Pentiums). The SSE2 code of course runs only on compatible CPUs. I didn't implement SSE2 detection yet; that should be done in the future. The graphics display is written in DirectDraw and consumes less than 1 or 2% of the computation time.

Any comments or results welcome. It's my first ever x86 assembler code, so don't kick me too hard
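On the missing SSE2 detection: a minimal sketch of how such a check could look, written in C rather than FASM and not taken from KMB. It relies on the documented fact that CPUID leaf 1 sets bit 26 of EDX when SSE2 is present. The `<cpuid.h>` header and `__get_cpuid` are GCC/Clang specifics (an assumption about the toolchain), and `has_sse2` and `pick_code_path` are hypothetical names used only for illustration.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Returns 1 if the CPU reports SSE2 support, 0 otherwise.
   CPUID leaf 1 sets bit 26 of EDX when SSE2 is available. */
int has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* CPUID leaf 1 is not available */
    return (int)((edx >> 26) & 1u);
#else
    return 0;                    /* not an x86 build, so no SSE2 */
#endif
}

/* Hypothetical dispatcher: decides which benchmark passes may run. */
const char *pick_code_path(void)
{
    return has_sse2() ? "FPU + SSE2" : "FPU only";
}
```

A benchmark could call `has_sse2()` once at startup and skip the SSE2 pass instead of crashing on older CPUs such as a Pentium III.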
vid 17 Apr 2006, 19:34
notebook with AMD Turion 64
FPU - 99.941, then 2 times 102.something
SSE - 139.518, then 140
good?
ps: good start
vid 17 Apr 2006, 20:50
yes, your calculation is precise, it's 1.6ghz
madmatt 17 Apr 2006, 22:02
Hi Kuemmel,
Here are my results on a 2.7 GHz Celeron processor (desktop computer):
FPU -> 95.314
SSE2 -> 223.737
I have a question: are you including video writes in your timing? Do you have a version that plots in system memory? This may improve speed on some systems.
r22 18 Apr 2006, 04:46
AMD X2 3800+, 2 GHz per core, 1 GB low-end DDR RAM
132,005 FPU
179,135 SSE
I'm running Windows XP 64-bit, so I think my results are skewed a little because the app is running over WOW64 for 32-bit compatibility.
Kuemmel 18 Apr 2006, 05:47
Thanks guys, I'll include your results in my table soon!
@madmatt: Yes, I include the video time. But on my Sempron 3100+ I found that it takes less than 1% of the computation time, so I think it's okay to include it. Of course, for a truly fair comparison it would be better to take it out, but I don't want to make the benchmark 'too synthetic'; I'd rather keep it closer to the real world. But thanks for the hint about the memory; at the moment I plot directly to the screen, as far as I understand DirectDraw.
cod3b453 18 Apr 2006, 07:10
I keep getting a memory access violation
Code:
    ...
    004011BB 8B00 mov eax, dword ptr [eax] ; from cominvk ???
    ...
chris 18 Apr 2006, 11:13
My result:
FPU 102.89
SSE2 134.817
on Pentium M 1.6 GHz

The strange thing is that the precompiled program runs OK, but if I compile it on my machine I also get an access violation, in MessageBoxA. It looks like stack corruption, since eip is loaded with some value from the stack.
Code:
    ...
    invoke CloseWindow,[mainhwnd]
    ;int3
    invoke MessageBox,NULL,text_result,caption,[flags] ; <-- 0xc0000005
    invoke ExitProcess, [msg.wParam]
    ...
LocoDelAssembly 18 Apr 2006, 13:52
AMD Athlon64 3200+ (socket 939, clock 2.0 GHz)
FPU 129.594
SSE 176.149
My SSE performance is worse than a Celeron
Vasilev Vjacheslav 18 Apr 2006, 14:43
Intel Northwood (socket 475, clock 2,3 GHz)
fpu: 92.141
sse: 220.615
ps. approx. the same as madmatt has
r22 18 Apr 2006, 17:24
Interesting results, AMD processors seem to have better FPU but worse SSE performance. I wonder if this holds true in 64-bit as well, or if it's just a side effect of 32-bit computing on a 64-bit processor?
Kuemmel 18 Apr 2006, 17:53
@r22: Basically you are right; on the other hand, it looks like both have the same SSE2 unit, it's just that Intel can clock their processors higher... For 64-bit I don't know... I don't expect any change. As for the Intel Conroe CPU coming in the 3rd quarter of this year, Intel promised almost double the SSE2 performance at the same clock speed as a Pentium M, on whose architecture it's based. Early benchmarks seem to confirm this. There are even rumors that the whole FPU will be kicked out in some years, with a complete switch to SSE. The benefit is obvious when you program it... the FPU has this stupid 8-register stack and SSE has 8 'real' registers. I hope to manage to code a multi-threaded version of the benchmark, so a dual-core processor should really be able to reach double speed.
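The clock-speed point can be checked with simple arithmetic on figures already posted in this thread. The sketch below is only an illustration in C (it is not part of KMB); `score_per_ghz` and the `result` table are hypothetical names, and the clock speeds are the ones the posters gave.

```c
#include <stdio.h>

/* One posted result: CPU name, clock in GHz, FPU and SSE2 scores. */
struct result {
    const char *cpu;
    double ghz, fpu, sse2;
};

/* Figures as posted in this thread by madmatt, LocoDelAssembly and chris. */
static const struct result results[] = {
    { "Celeron 2.7 GHz",        2.7,  95.314, 223.737 },
    { "Athlon64 3200+ 2.0 GHz", 2.0, 129.594, 176.149 },
    { "Pentium M 1.6 GHz",      1.6, 102.89,  134.817 },
};

/* Benchmark score divided by the clock speed in GHz. */
double score_per_ghz(double score, double ghz)
{
    return score / ghz;
}

/* Prints both scores of every table entry normalised to 1 GHz. */
void print_normalised(void)
{
    size_t i;
    for (i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-24s FPU/GHz %6.1f  SSE2/GHz %6.1f\n",
               results[i].cpu,
               score_per_ghz(results[i].fpu, results[i].ghz),
               score_per_ghz(results[i].sse2, results[i].ghz));
}
```

Per GHz, the SSE2 scores come out at roughly 82.9 (Celeron), 88.1 (Athlon64) and 84.3 (Pentium M), so quite close together, while the FPU scores are roughly 35.3, 64.8 and 64.3.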
Regarding the compiler errors/other errors... I don't know yet. My include directory is a mess of lots of different include files from when I started with flat assembler, older versions, and include files from other users here. I will try compiling it with a 'fresh' latest FASM installation... maybe I will come to some conclusion about it... or did anybody already manage to compile it on a 'fresh' FASM installation?
vbVeryBeginner 18 Apr 2006, 18:22
my result, on p4 2.66
fpu: 94+-
sse2: 195+-
UCM 18 Apr 2006, 20:48
hmm, odd...
on my Athlon 64 X2 4200+ 2.2 GHz:
Test 1
FPU 142.006
SSE2 195.621
Test 2
FPU 142.964
SSE2 195.621
Kuemmel 18 Apr 2006, 22:29
I've put all your new results on my webpage, have a look if you like! The list is ordered by maximum SSE2 performance. But for the FPU you can easily see that the Athlons take the crown. All your results are more or less as expected, except:
@Vasilev Vjacheslav: Are you sure your CPU runs at 2.3 GHz? If I'm not wrong, it should be more like 2.5 GHz, or they did some nice enhancements for the Northwood P4.
@vbVeryBeginner: Your results seem too slow... are you sure you aren't running other applications, or in power-save mode or something?
vbVeryBeginner 18 Apr 2006, 23:21
@kuemmel
is the directx version an issue? dxdiag shows me 8.1 (4.08.01.0810) and i am using onboard vga via/s3g unichrome igp
Madis731 19 Apr 2006, 08:39
http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-2992.png
FPU: 87.704
SSE2: 212.521

The Pentium IIIs were the best CPUs ever - I still have one at home - and I'm not planning on letting go: http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-697.png - too bad it hasn't got an SSE2/3 instruction set, but at least it has SSE. I will test the FPU on it when I get home...

I'm a bit worried about the switch between FPU and SSE in your SSE code:
Code:
    fstp qword[rz_temp_fpu] ;Here it must wait for the FPU to finish what it's doing
    xorpd xmm2,xmm2 ; xmm2: 0 | 0 (zeros xmm6)
Code:
    shufpd xmm5,xmm5,0 ; xmm5: iz | iz
    MOV [plot_x],0
    x_loop:
    ;Here it's the other way round - although SSE is MUCH faster so I'm not too worried about this...
    fld qword[dz]

Is there a possibility to do all the code in SSE?
Second suggestion: since Pentium 4s are so bad with branch misprediction penalties and lots of other stuff, you could make minor replacements, like replacing all INCs/DECs with ADD...1/SUB...1. Maybe then (hopefully) Pentium 4s won't lose *that* much
Kuemmel 19 Apr 2006, 17:11
Hi guys,
I uploaded an INCLUDE directory to my webpage, so for those who want to recompile it, this should help. Get it from the same page as the benchmark. It works with the latest FASM version, but I didn't have time to check what's really wrong or different. It didn't work with a normal new installation of the latest FASM.

@vbVeryBeginner: Hm, I don't know, I also run it on a very cheap shared-memory graphics chip on my notebook with an AMD Sempron. There the results are consistent... any virus scanner or other resource-stealing software running?

@madis731: I'll stay tuned for your PIII result! Your P4 result is also strange (check my website, I included it). It seems too slow compared to the others, and the P4 that has the lead has a real shitty graphics card! About an SSE adaptation for the PIII: the problem is that SSE has only single precision; SSE2 introduced double precision. So theoretically I could even calculate 4 pixels at a time instead of 2 like now (SSE registers are always 128 bits, so either 4 packed single-precision values, 4x32 bit, or 2 packed double-precision values, 2x64 bit)... but when you go very 'deep' into a Mandelbrot fractal, single precision isn't precise enough. You are right about the mix of FPU and SSE2; this could be optimized to SSE2 only, but at the moment it's not really an issue, as the routines spend up to 2400 times more time in the SSE2-only or FPU-only iteration loop than in the part of the code you mentioned. But I'll keep it in mind for the next release.

I'm still worried about some P4 results... there seems to be a strange variation, where I thought the CPU core is really the same and the graphics normally shouldn't matter, as you can see on the AMD systems...
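To illustrate the single-precision limit mentioned above: once the zoom is deep enough, two neighbouring pixel coordinates round to the same 32-bit float, while a 64-bit double still tells them apart. This is only a sketch in C with made-up coordinates (-0.75 and the step sizes are assumptions chosen for illustration), and both function names are hypothetical.

```c
/* Returns 1 if two neighbouring pixel coordinates, `step` apart,
   are still different numbers after rounding to single precision. */
int distinct_in_float(double c, double step)
{
    volatile float a = (float)c;           /* volatile forces a real 32-bit store */
    volatile float b = (float)(c + step);
    return a != b;
}

/* The same check in double precision. */
int distinct_in_double(double c, double step)
{
    volatile double a = c;
    volatile double b = c + step;
    return a != b;
}
```

Near -0.75 a float can only resolve steps of about 6e-8 (2^-24), so with a pixel spacing of 1e-9 `distinct_in_float(-0.75, 1e-9)` returns 0 while `distinct_in_double(-0.75, 1e-9)` returns 1. With a coarse spacing such as 1e-3 both return 1.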
Madis731 19 Apr 2006, 20:47
Hi, I'm back with my PIII results:
the exact specs are the ones that I already linked: http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-697.png
Code:
    FPU: 43.731 ;When you calculate the FPU speed then it's 62473i/MHz
                ;which is almost as good as other CPUs in that category
    SSE2: crashed of course
Last edited by Madis731 on 20 Apr 2006, 12:21; edited 1 time in total
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.