flat assembler
Message board for the users of flat assembler.
Mandelbrot Benchmark FPU/SSE2 released
Madis731 30 Nov 2009, 20:22
10 years ago I remember finding http://www.toymaker.info/Games/html/text.html useful. Maybe there are better ones now.
bitshifter 30 Nov 2009, 20:33
tthsqe wrote: "BTW, I find the pop up box annoying too, is there an api function to easily draw strings to the screen?"
The best way would be to dump the results into a txt file so it's easy to copy/paste your results back into a thread.
_________________
Coding a 3D game engine with fasm is like trying to eat an elephant, you just have to keep focused and take it one 'byte' at a time.
Madis731 02 Dec 2009, 08:57
Or don't show anything, put the results on the clipboard (through WinAPI), and let the user know somehow that you did it.
For example, a shortcut key "S" to copy the info to the clipboard...
tthsqe 07 Dec 2009, 08:08
Madis731,
Here is a better version. On my computer the gflops value does jump around, probably due to the timer (QPC), but the flopc (flops per clock cycle) value is very stable. If you know your clock speed, just use flopc to work out the gflops. Mine maxed out around 3.293 flopc.
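The flopc-to-gflops conversion described in this post is a single multiplication. Here is a minimal sketch in Python, assuming the flopc figure is per core and that every core runs at the same clock. The function names, the 3.2 GHz clock and the 4-core count are illustrative assumptions, not values the benchmark reports.

```python
def gflops_from_flopc(flopc, clock_ghz, cores=1):
    """Convert flops per clock cycle (assumed per core) into aggregate GFLOPS."""
    if clock_ghz <= 0 or cores <= 0:
        raise ValueError("clock_ghz and cores must be positive")
    return flopc * clock_ghz * cores


def flopc_from_gflops(gflops, clock_ghz, cores=1):
    """Inverse conversion: recover per-core flops per cycle from a GFLOPS reading."""
    if clock_ghz <= 0 or cores <= 0:
        raise ValueError("clock_ghz and cores must be positive")
    return gflops / (clock_ghz * cores)


if __name__ == "__main__":
    # 3.293 flopc is the peak quoted in the post above;
    # the clock speed and core count are made up for illustration.
    print(gflops_from_flopc(3.293, clock_ghz=3.2, cores=4))
```

Going the other way is handy when the gflops reading is noisy: divide it by clock speed times core count and compare the result against the more stable flopc display.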
Madis731 07 Dec 2009, 10:38
It shows 14.5 GFLOPS max (~2.9 FLOP/c).
EDIT: ~3.12 FLOP/c in the black area.
tthsqe 08 Dec 2009, 08:22
Thanks for trying it. Would someone with a Core i7 mind trying it with HT on/off? Here is a version with a minor bug fixed (I think the code is 100% correct now).
LocoDelAssembly 08 Dec 2009, 20:43
OK, test system for all cases:
Memory: two 2 GB DDR2 800 MHz sticks, 5.0-5-5-18-24-T2 (working in unganged dual channel mode).
CPU: AMD Phenom II X4 955 (3.2 GHz full, 800 MHz idle), 64 KB instruction + 64 KB data L1 cache, 512 KB L2 cache each core, shared 6 MB L3 cache.
Video: nVidia GeForce 8600 GTS.
OS: Windows 7.
xorpd!
KMB_V0.57_2T_X1.exe: Speed [million iterations / second] : 806,589
KMB_V0.57_2T_X2.exe: Speed [million iterations / second] : 1472,073
KMB_V0.57_2T_X3.exe: Speed [million iterations / second] : 851,829
KMB_V0.57_2T_X4.exe: Speed [million iterations / second] : 1901,334
KMB_V0.57_MT_X1.exe: Speed [million iterations / second] : 1535,603
KMB_V0.57_MT_X2.exe: Speed [million iterations / second] : 2779,852
KMB_V0.57_MT_X3.exe: Speed [million iterations / second] : 1621,580
KMB_V0.57_MT_X4.exe: Speed [million iterations / second] : 3478,698
Kuemmel
---------------------------
Kümmel Mandelbrot Benchmark V 0.53I-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 1436.887
Logical CPU cores detected : 4
CPU Brand detected : AMD Phenom(tm) II X4 955 Processor
---------------------------
---------------------------
Kümmel Mandelbrot Benchmark V 0.53I-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 3818.517
Logical CPU cores detected : 4
CPU Brand detected : AMD Phenom(tm) II X4 955 Processor
---------------------------
---------------------------
Kümmel Mandelbrot Benchmark V 0.53I-32b-MT_SSE4.1
---------------------------
Sorry, your CPU does not support SSE4.1...
---------------------------
Haven't looked into the sources much so I'll ask: do you warm up the cores before starting the measurements?
[edit] tthsqe, I've also tested your MandelbrotPlot, but the ones on page 19 all crash here and the one on page 18 shows me no stats (do I have to do something to make them visible?).
Kuemmel 08 Dec 2009, 22:49
tthsqe wrote: "Thanks for trying it. Would someone with a core i7 mind trying it with HT on/off? Here is a version with a minor bug fixed (I think the code is 100% correct now)."
My i7 @ 3200 MHz / Windows Vista 64 shows the following in the black area:
HT on: around 43.3 GFlops peak
HT off: around 43.1 GFlops peak
...so there may be almost no difference...
@LocoDelAssembly: Neither Xorpd! nor I use any kind of warm-up code... I guess it's not a big issue, but of course I can't confirm it... I think we had this discussion before... what would be a good 'warm up'? Your results show the same efficiency as the results I got on my webpage; I just wonder why the normally faster 64-bit MT_X4 version is slower than the 32-bit one...
LocoDelAssembly 08 Dec 2009, 23:31
Quote:
I believe something like this per core should be enough:
Code:
        call    [GetTickCount]          ; eax = milliseconds since system start
        lea     ebx, [eax+1000]         ; ebx = deadline, 1000 ms from now (ebx is preserved across the stdcall call)
.loop:
        call    [GetTickCount]
        cmp     eax, ebx                ; unsigned compare; assumes the tick counter does not wrap during the wait
        jb      .loop                   ; keep spinning until the deadline has passed
I think I've said this already, but I'll say it again: it would be better to perform the writes to regular RAM rather than directly to video memory, so that the benchmark is not contaminated by the video card's own capabilities (unless you really want to know the CPU performance under these conditions).
If you are discarding the first run, or always taking the max score rather than the average, then simply ignore everything I've said above (except the video card thing).
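To make the warm-up and scoring options in this post concrete, here is a rough sketch in Python rather than fasm. It assumes a hypothetical `work` callable that returns the number of iterations it performed; `warm_up`, `run_benchmark` and `summarize` are illustrative names, not part of any of the benchmarks in this thread. It also does not pin the thread to a core, which a real per-core warm-up would need.

```python
import time


def warm_up(seconds=1.0):
    """Busy-wait for `seconds` so the core can leave its idle clock state."""
    deadline = time.perf_counter() + seconds
    spins = 0
    while time.perf_counter() < deadline:
        spins += 1
    return spins


def run_benchmark(work, runs=5):
    """Call `work()` `runs` times and return one iterations-per-second score per run.

    `work` is a hypothetical callable that returns how many iterations it did.
    """
    scores = []
    for _ in range(runs):
        start = time.perf_counter()
        iterations = work()
        elapsed = time.perf_counter() - start
        scores.append(iterations / elapsed)
    return scores


def summarize(scores, discard_first=True):
    """Return (best, mean), optionally dropping the first (cold) run."""
    kept = scores[1:] if discard_first and len(scores) > 1 else scores
    return max(kept), sum(kept) / len(kept)


if __name__ == "__main__":
    warm_up(0.5)
    scores = run_benchmark(lambda: sum(1 for _ in range(200_000)), runs=5)
    best, mean = summarize(scores)
    print(f"best {best:.0f} it/s, mean {mean:.0f} it/s")
```

Reporting the best score, or dropping the first run, hides most of the clock ramp-up even without an explicit warm-up. An explicit warm-up matters mainly when the average over all runs is reported.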
windwakr 08 Dec 2009, 23:53
I just ran version "Kümmel Mandelbrot Benchmark V 0.53I-32b-MT" (latest on the site) and these are my results:
Code:
FPU Speed: 481.548
SSE2 Speed: 1113.876
Cores: 2
Intel Pentium D 3.4GHZ (it's a 945)
It surprised me how much faster the SSE version is compared to the FPU version.
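To put a number on that gap, here is a small Python sketch that computes the SSE2-to-FPU ratio from the scores posted in this thread (million iterations per second). The `speedup` helper and the dictionary layout are just illustrative.

```python
def speedup(fast, slow):
    """How many times higher the `fast` score is than the `slow` one."""
    if slow <= 0:
        raise ValueError("slow score must be positive")
    return fast / slow


# (FPU score, SSE2 score) as posted in this thread for V 0.53I-32b-MT
results = {
    "Pentium D 945, 2 cores": (481.548, 1113.876),
    "Phenom II X4 955, 4 cores": (1436.887, 3818.517),
}

if __name__ == "__main__":
    for cpu, (fpu, sse2) in results.items():
        print(f"{cpu}: SSE2 is {speedup(sse2, fpu):.2f}x the FPU score")
```

Both machines land above 2x, so the SSE2 advantage is not specific to one CPU vendor in these two samples.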
kalambong 18 Dec 2009, 00:49
Someone wrote a plugin for Photoshop which draws 3D fractals.
Something that looks like [image] or [image]
More info at http://www.subblue.com/blog/2009/12/13/mandelbulb
Project page at http://www.subblue.com/projects/mandelbulb
More pictures at http://www.subblue.com/gallery/album/89 and http://www.flickr.com/photos/subblue/
f0dder 18 Dec 2009, 13:31
Oh, the bottom picture is that broccoli thing mentioned in that über-long thread Kuemmel posted a link to - I ended up skimming it to the end, nice pictures there
kalambong 19 Dec 2009, 11:34
Thanks for the reminder about that über-long thread.
Went there and saw that it's finally locked; coincidentally, they locked the thread today.
Kuemmel 19 Dec 2009, 13:13
...that thread was just closed because it became too big, so everything is now distributed across 4 threads here:
http://www.fractalforums.com/the-3d-mandelbulb/
...at the moment it looks like, with this 8th-order broccoli Mandelbulb, they hardly use CPU-based stuff. Iteration depth is often only up to 20 or so, so the focus is almost totally on the GPU... I really wonder if it's worth porting to ASM, as lots of trigonometric functions are needed that, to be 'correct', might have to be executed by standard x87 code... anyway, the pics there are great, explore them if you've got time!
kalambong 08 Apr 2010, 04:47
Oh, btw, any update on your benchmark?
kalambong 18 Apr 2010, 09:40
Just in case you guys are still interested in Mandelbulb, someone made a video of it:
http://www.youtube.com/watch?v=W3x4uJJqs_w
http://vimeo.com/10740680
Enjoy!
Kuemmel 22 Jan 2011, 23:54
No code update yet, just some results added for the new Intel Sandy Bridge. This might be interesting for some. The gain (compared to the former Intel i7-9xx architecture, for Hyper-Threading CPUs) is around +18% for the FPU version and around +3% for SSE2/SSE4.
What one can see is that, despite the again twice-as-wide instruction path, the gain isn't very big. I think that the instruction units are really busy with the code; there's not much more computing 'air' that could be used. The verdict for me is clearly that any heavy floating-point code now needs to switch to AVX to use the full potential of these new CPUs. I hope to find some time this year to give it a try.
Madis731 24 Jan 2011, 07:20
Actually, the instruction path isn't twice the size AFAIK, and even the SSE datapath is still 128 bits wide. I think what we see here is that SB is a bit better clock-for-clock.
Even if it were true, you wouldn't be able to spot the difference right away because the code needs to use AVX. You need to have Windows 7 SP1 (still in RC) to be able to use AVX (which I find very disturbing). Every 256-bit AVX instruction that accesses data does so in 128-bit strides, and while the instructions themselves aren't slower, you cannot have as many instructions running in parallel as with SSE.
http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=99&Itemid=1&limit=1&limitstart=6
Quote:
revolution 24 Jan 2011, 07:33
Madis731 wrote:
But, of course, that quote does appear to assume a certain data usage model that may not be evident in some applications. I expect that in some fields of application AVX can give a 100% improvement over SSE.
Copyright © 1999-2024, Tomasz Grysztar.