flat assembler
Message board for the users of flat assembler.

Mandelbrot Benchmark FPU/SSE2 released

Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
Nope, my computer is far too old to run that. Did you install it, or do you know somebody who did and could do a test run?
Post 25 Nov 2008, 17:02
kalambong (Joined: 08 Nov 2008, Posts: 165)
I'm testing it on a quad-core PC. It's not very stable, by the way. I'll try to run the benchmark this weekend.
Post 26 Nov 2008, 02:30
adnimo (Joined: 18 Jul 2008, Posts: 49)
I could finally try this out on the Athlon XP :)

Here are the results of 3 runs. Note that the first run was done with a lot of programs running in the background; even though the benchmark runs at realtime priority, I think that affected the result a bit.


Quote:
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
Run 1: Speed [Million Iterations / Second] : 204.865
Run 2: Speed [Million Iterations / Second] : 205.000
Run 3: Speed [Million Iterations / Second] : 205.000



AMD Athlon(TM) XP 2600+

Hope that helps.

PS: It only took about 20 to 30 seconds to run. It's running at 1.92 GHz, though I guess you could tell that from the results. I could try at 2.3 GHz (I think that's the next clock step) if you want.
Post 28 Nov 2008, 12:49
adnimo (Joined: 18 Jul 2008, Posts: 49)
Same processor at 2.3 GHz:


Quote:
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
Run 1: Speed [Million Iterations / Second] : 246.478
Run 2: Speed [Million Iterations / Second] : 246.380
Run 3: Speed [Million Iterations / Second] : 246.283



I wonder why the results got slightly slower after each run.
Post 28 Nov 2008, 13:03
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
Hi people,

I coded a modified version of my latest benchmark. The differences are:

- It detects the number of logical cores and creates the same number of threads (all older versions always created 16 threads, which is unnecessary and adds some small overhead on machines with few cores), and it displays the core count at the end.

- Bug fixed: on a machine with more than 16 cores the program would crash; it now supports up to 32 logical cores.

It's attached. Please send some results; anything is welcome, especially from machines with many cores and HyperThreading, to see whether the core detection works correctly.

...and Happy New Year 2009!!!


Attachment: KMB_V_0.53H_mod.zip (27.99 KB, downloaded 83 times)

Post 31 Dec 2008, 13:52
asmfan (Joined: 11 Aug 2006, Posts: 392, Location: Russian)
Once again
Code:
                invoke  CloseWindow, [mainhwnd]
    

doesn't destroy mainhwnd; it just minimizes it. If you click its taskbar button, the black window covers all top-level windows with a black 800*600 rectangle until someone presses Enter in the active MessageBox with the results.
Post 31 Dec 2008, 15:04
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
asmfan wrote:
...doesn't destroy mainhwnd; it just minimizes it. If you click its taskbar button, the black window covers all top-level windows with a black 800*600 rectangle until someone presses Enter in the active MessageBox with the results.

As I said earlier in this thread, when I change "CloseWindow" to "DestroyWindow", my computers no longer show any result at all, so I'm kind of lost on how to fix your problem...
Post 31 Dec 2008, 15:25
revolution (When all else fails, read the source; Joined: 24 Aug 2004, Posts: 17473, Location: In your JS exploiting you and your system)
The MS docs have this to say:
Win32 docs wrote:
The CloseWindow function minimizes (but does not destroy) the specified window
...
The window is minimized by reducing it to the size of an icon and moving the window to the icon area of the screen. Windows displays the window's icon instead of the window and draws the window's title below the icon.

To destroy a window, an application must use the DestroyWindow function.
Post 31 Dec 2008, 15:31
Ivan2k2 (Joined: 08 Sep 2004, Posts: 80, Location: Russia, Angarsk)
Laptop with a T8100 and Vista SP1 32-bit:
SSE2: 1465.557
FPU: 617.535
Post 31 Dec 2008, 15:48
bitRAKE (Joined: 21 Jul 2003, Posts: 2940, Location: vpcmipstrm)
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 5382.133
Logical CPU cores detected : 8

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 2320.920
Logical CPU cores detected : 8

(Seems very consistent, but I only ran it a dozen times.)
Post 31 Dec 2008, 16:19
kempis (Joined: 12 Jun 2008, Posts: 49)
Intel(R) Celeron(R) D CPU 3.06GHz:
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 511.534
Logical CPU cores detected : 1

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 220.299
Logical CPU cores detected : 1
Post 31 Dec 2008, 23:43
dacid (Joined: 31 Aug 2008, Posts: 57)
AMD Athlon 64 X2 4200+ (2.2 GHz)

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 720.516
Logical CPU cores detected : 2

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 493.345
Logical CPU cores detected : 2
Post 01 Jan 2009, 08:54
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
I released the modified version as V0.53I on my page.

Basically, I implemented logical CPU core detection, which sets the maximum number of threads created (previously always 16 threads, which can cause some overhead). Furthermore, the result message box now displays the number of logical CPU cores and the CPU brand string, similar to Kempis' Julia code. The iteration code remains unchanged for SSE2 and FPU. I fixed some alignments and got rid of some leftover code. There is still no solution for the DestroyWindow/CloseWindow issue, which seems to be a problem only on some machines.

Website: http://www.mikusite.de/pages/x86.htm
Link: http://www.mikusite.de/x86/KMB_V0.53I-32b-MT.zip

@bitRAKE: I got another test result from a dual 5450 system (similar to yours). His system seems to run at almost the efficiency of the single-CPU results, unlike yours. In case you're interested, the results were posted here:
http://forums.2cpu.com/showthread.php?t=76178&page=8

Dual-CPU systems seem to be complicated to configure, I guess.
Post 05 Jan 2009, 17:46
asmfan (Joined: 11 Aug 2006, Posts: 392, Location: Russian)
Kuemmel wrote:
Still no solution for the DestroyWindow/CloseWindow issue, which seems to be a problem only on some machines.

Just call DestroyWindow instead of CloseWindow, and check MSDN for what these functions do:
CloseWindow
DestroyWindow
The first one doesn't send WM_CLOSE, as you assume it does.
If what you really need is WM_CLOSE, send WM_SYSCOMMAND with SC_CLOSE, i.e. use SendMessage/PostMessage instead of CloseWindow.
But I'm not sure it will all work, because there is no message loop at all, although some messages are sent directly to WindowProc rather than through the message loop.
Rewrite your WindowProc so it returns the correct values for the WM_* messages it processes, and eliminate unneeded processing. WM_PAINT: is it needed with an empty handler?
CS_DBLCLKS or CS_HREDRAW or CS_VREDRAW: what are they for?

Post 05 Jan 2009, 18:53
bitRAKE (Joined: 21 Jul 2003, Posts: 2940, Location: vpcmipstrm)
Kuemmel wrote:
@bitRAKE: I got another test result from a dual 5450 system (similar to yours). His system seems to run at almost the efficiency of the single-CPU results, unlike yours. In case you're interested, the results were posted here:
http://forums.2cpu.com/showthread.php?t=76178&page=8

Dual-CPU systems seem to be complicated to configure, I guess.
Thanks. That is a rather extreme difference.
It might be the video driver and/or a faster bus having an impact.
(He's pulling 3x the watts my system does!)

[edit] Finally figured it out. The Harpertown processors have additional hardware prefetch features, and enabling any one of them increases the L1 and L2 cache latency!
Code:
        Latency (cycles)
        OFF    ON    <--- advanced prefetch: DCU, IP, or DCA
L1        3     4
L2       15    18
Here are the updated benchmark numbers:
Code:
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
Speed [Million Iterations / Second] : 6212.214
Logical CPU cores detected : 8  

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
Speed [Million Iterations / Second] : 2702.717
Logical CPU cores detected : 8    
...I still don't know why the C-states must be enabled in EIST; otherwise I get a 40% drop. :(

Thank you for bringing this to light.

Post 05 Jan 2009, 19:41
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
@bitRAKE: Okay, nice, now you get the same result as he did. I updated my page.

asmfan wrote:
Just call DestroyWindow instead of CloseWindow, and check MSDN for what these functions do:
CloseWindow
DestroyWindow
The first one doesn't send WM_CLOSE, as you assume it does.
If what you really need is WM_CLOSE, send WM_SYSCOMMAND with SC_CLOSE, i.e. use SendMessage/PostMessage instead of CloseWindow.
But I'm not sure it will all work, because there is no message loop at all, although some messages are sent directly to WindowProc rather than through the message loop.
Rewrite your WindowProc so it returns the correct values for the WM_* messages it processes, and eliminate unneeded processing. WM_PAINT: is it needed with an empty handler?
CS_DBLCLKS or CS_HREDRAW or CS_VREDRAW: what are they for?

Thanks for the hints. I finally went through all the Windows code. (It was just historical copy/paste from examples, which made this mess, I guess. The CS_... styles were just leftover copy/paste; I replaced everything at window creation with what is really needed, I hope.)

I tracked down why DestroyWindow was not working: the problem was in the WindowProc routine. I took your advice about eliminating unneeded processing and also looked at what Kempis did in his JuliaSSE. As far as I can see, I basically don't need any message processing, since it's a benchmark and no interactive window is needed.

I now call DestroyWindow at the end and stripped the whole WindowProc down to this (similar to Kempis' version, I think):
Code:
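proc WindowProc  hwnd, wmsg, wparam, lparam
  ; no message is handled here: everything goes to the default
  ; window procedure, and its result (in eax) is returned as-is
  invoke  DefWindowProc, [hwnd], [wmsg], [wparam], [lparam]
  ret
endp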

On my machines everything works fine now, and the window is finally destroyed correctly. Maybe it's a matter of programming philosophy; some might think there are window messages that should be handled here?
Everybody seems to handle this differently, judging from other examples, especially for DirectDraw and OpenGL. The corrected version is attached.


Attachment: KMB_V0.53I-32b-MT.zip (33.95 KB, downloaded 46 times)

Post 06 Jan 2009, 12:00
asmfan (Joined: 11 Aug 2006, Posts: 392, Location: Russian)
Much better :) But not perfect.
I see that msg.wParam and the msg struct are declared but not used at all.
For a benchmark it is good enough, but if it ever becomes something like a screensaver, it will definitely need a non-greedy message loop (see MSDN for details) with PeekMessage, with the drawing done inside that loop...
Also, to reduce the size: move initialized variables to the beginning of the section and uninitialized ("?") ones to the end. For alignment's sake, put large initialized structures first, then large variables (16, 8, 4, 2 bytes, in that order) at the very beginning, and strings and 1-byte-aligned variables last. Immediately after that should come the uninitialized data, aligned (on 16, for example), ordered by the same priority as above.
Post 06 Jan 2009, 12:49
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
asmfan wrote:
Much better :) But not perfect.
I see that msg.wParam and the msg struct are declared but not used at all.
Another leftover from the past. I deleted the declaration and now use "invoke ExitProcess, 0". The latest files can be found on my homepage.
asmfan wrote:
For a benchmark it is good enough, but if it ever becomes something like a screensaver, it will definitely need a non-greedy message loop (see MSDN for details) with PeekMessage, with the drawing done inside that loop...
I agree totally; some people want to get back from a screensaver to the OS once in a while ;)

I'll take care of the order of the variables in my next projects, I promise. It's just that over the past 2 years I've seen some weird changes in the results when changing too much there; at the moment it runs at its best, and it still fits in any L1 cache anyway, except maybe on ancient CPUs.
Post 06 Jan 2009, 14:15
geppy (Joined: 06 Jan 2009, Posts: 16)
Core 2 Duo E8400 at 3584 MHz (Wolfdale)

KMB_V0.53I-32b-MT_SSE2
Speed [Million Iterations / Second] : 2492.325
Logical CPU cores detected : 2

KMB_V0.53I-32b-MT_FPU
Speed [Million Iterations / Second] : 1050.172
Logical CPU cores detected : 2

;-------------------------------------------------------------

Pentium M at 1.86 GHz (Dothan)

KMB_V0.53I-32b-MT_SSE2
Speed [Million Iterations / Second] : 205.000
Logical CPU cores detected : 1

KMB_V0.53I-32b-MT_FPU
Speed [Million Iterations / Second] : 221.314
Logical CPU cores detected : 1

;--------------------------------------------------------------

Also tested on an AMD Athlon 64 X2 4200+ at 2200 MHz (EDITED, was 2000 MHz) and got about the same result as on your web page.


Post 06 Jan 2009, 14:33
Kuemmel (Joined: 30 Jan 2006, Posts: 198, Location: Stuttgart, Germany)
@geppy: Thanks for the results! They'll be on my homepage soon.

While looking through my copy/paste code, I may have found another issue:
Code:
invoke CreateThread,NULL, 0, thread_draw_fpu, ebx, \
            REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
...snip...
invoke SetThreadAffinityMask,dword [edi],esi    
Since I can't find anything on MSDN documenting that REALTIME_PRIORITY_CLASS can be passed here, I suspect it was a leftover from something else. The realtime priority is set beforehand by SetPriorityClass anyway.

So I think this version (which also sets the thread priority, possibly a good idea, and drops the unneeded tId) would do the job correctly, and as well as or even better than before?
Code:
invoke  CreateThread, NULL, 0, thread_draw_fpu, ebx, CREATE_SUSPENDED, NULL
...snip...
invoke  SetThreadPriority, dword [edi], THREAD_PRIORITY_TIME_CRITICAL
invoke  SetThreadAffinityMask, dword [edi], esi
It seems to work as well as before here and makes more sense to me...
Post 06 Jan 2009, 15:23
Copyright © 1999-2020, Tomasz Grysztar.