flat assembler
Message board for the users of flat assembler.
Windows > Mandelbrot Benchmark FPU/SSE2 released
Kuemmel 25 Nov 2008, 17:02
Nope... my computer is far too old to run that. Did you install it, or do you know somebody who did and can make a test run?
kalambong 26 Nov 2008, 02:30
I'm testing it on a quad-core PC. Not very stable, btw. Will try to run the benchmark this weekend.
adnimo 28 Nov 2008, 13:03
Same processor at 2.3 GHz:
Quote: ---------------------------
I wonder why the results became slower after each run.
Kuemmel 31 Dec 2008, 13:52
Hi people,
I coded a modified version of my latest benchmark. The differences are:
- It detects the number of logical cores and sets up an equal number of threads (all old versions always set up 16 threads, which is of course not necessary and creates some small overhead on machines with few cores...) and displays the core count at the end.
- Bug fixed: on a machine with more than 16 cores a crash would happen; it now goes up to 32 logical cores.
I attached it. Please send some results, anything is welcome, especially from machines with lots of cores and HyperThreading, to see if the core detection works well.
...and Happy New Year 2009 !!!
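A minimal sketch of the thread-count logic described above, not the benchmark's actual code. The post does not say which API the benchmark uses, so GetSystemInfo, the raw 36-byte SYSTEM_INFO buffer, and the names MAX_THREADS and nthreads are assumptions made for illustration. It reads the logical core count, clamps it to the 32-core limit mentioned above, and shows the result in a message box.

```asm
format PE GUI 4.0
entry start

include 'win32a.inc'

MAX_THREADS = 32                        ; upper limit named in the post (assumed name)

section '.data' data readable writeable

  _title   db 'Logical cores',0
  _fmt     db 'Threads to create: %u',0
  _text    rb 64                        ; buffer for the formatted message
  nthreads dd ?                         ; hypothetical variable holding the thread count
  sysinfo  rb 36                        ; raw 32-bit SYSTEM_INFO structure

section '.code' code readable executable

  start:
        invoke  GetSystemInfo, sysinfo
        mov     eax, [sysinfo+20]       ; dwNumberOfProcessors is at offset 20
        cmp     eax, MAX_THREADS
        jbe     @f
        mov     eax, MAX_THREADS        ; never create more than 32 threads
    @@:
        mov     [nthreads], eax
        cinvoke wsprintf, _text, _fmt, eax
        invoke  MessageBox, NULL, _text, _title, MB_OK
        invoke  ExitProcess, 0

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          user32,'USER32.DLL'

  include 'api\kernel32.inc'
  include 'api\user32.inc'
```

The clamp is what keeps a fixed-size handle table safe on machines that report more cores than the table has slots.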
asmfan 31 Dec 2008, 15:04
Once again:
Code:
invoke CloseWindow, [mainhwnd]
doesn't destroy mainwnd, it just minimizes it. If you click its tab on the taskbar, a black 800*600 rectangle covers all top-level windows until someone presses Enter on the active MessageBox window with the results.
Kuemmel 31 Dec 2008, 15:25
asmfan wrote: ...doesn't destroy mainwnd, it just minimizes it. If you click its tab on the taskbar, a black 800*600 rectangle covers all top-level windows until someone presses Enter on the active MessageBox window with the results.
As I said before in this thread, after changing from "CloseWindow" to "DestroyWindow" my computers no longer show any result at all... so I'm kind of lost as to how to fix your problem...
revolution 31 Dec 2008, 15:31
The MS docs have this to say:
Win32 docs wrote: The CloseWindow function minimizes (but does not destroy) the specified window.
Ivan2k2 31 Dec 2008, 15:48
Laptop with T8100 and Vista SP1 32-bit:
SSE2: 1465.557
FPU: 617.535
bitRAKE 31 Dec 2008, 16:19
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 5382.133
Logical CPU cores detected : 8

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 2320.920
Logical CPU cores detected : 8

(Seems very consistent, but I only ran it a dozen times.)
kempis 31 Dec 2008, 23:43
Intel(R) Celeron(R) D CPU 3.06GHz:

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 511.534
Logical CPU cores detected : 1

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 220.299
Logical CPU cores detected : 1
dacid 01 Jan 2009, 08:54
AMD Athlon 64 X2 4200+ (2.2 GHz):

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
---------------------------
Speed [Million Iterations / Second] : 720.516
Logical CPU cores detected : 2

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
---------------------------
Speed [Million Iterations / Second] : 493.345
Logical CPU cores detected : 2
Kuemmel 05 Jan 2009, 17:46
I released the modified version as V0.53I on my page.
So basically I implemented the logical CPU core detection, which sets the maximum number of threads created (before, it was always set to 16 threads, which can cause some overhead). Furthermore, the result message box now displays the number of logical CPU cores and the CPU brand, similar to Kempis' Julia code. The iteration code remains unchanged for SSE2 and FPU. I fixed some alignments and got rid of some leftover stuff. Still no solution regarding Destroy/CloseWindow, which seems to be a problem only on some machines.
Website: http://www.mikusite.de/pages/x86.htm
Link: http://www.mikusite.de/x86/KMB_V0.53I-32b-MT.zip
@BitRAKE: I got another test result with a Dual 5450 x 2 (similar to yours...). Compared to yours, his system seems to run at almost the efficiency of the single-CPU results. For your interest, the results were posted here: http://forums.2cpu.com/showthread.php?t=76178&page=8
The dual-CPU stuff seems to be complicated to configure, I guess.
asmfan 05 Jan 2009, 18:53
Kuemmel wrote: Still no solution regarding the Destroy/CloseWindow, what seems to be problem only on some machines.
Just call DestroyWindow instead of CloseWindow, and use MSDN to check what these functions are for:
CloseWindow
DestroyWindow
The first doesn't send WM_CLOSE as you suppose. If WM_CLOSE is what you really need, send WM_SYSCOMMAND with SC_CLOSE, i.e. use SendMessage/PostMessage instead of CloseWindow. But I'm not sure it will all work, because there is no message loop at all, although some messages can be sent directly to the WindowProc without going through the message loop.
Rewrite your WindowProc so it returns correct values if it processes WM_* messages. Eliminate unneeded processing. WM_PAINT: is it needed with empty processing? CS_DBLCLKS or CS_HREDRAW or CS_VREDRAW: what are they for?
_________________
Any offers?
bitRAKE 05 Jan 2009, 19:41
Kuemmel wrote: @BitRAKE: I got another test Result with a Dual 5450 x 2 (similar to yours...). His systems seems to run quite at almost the efficiency of the single CPU results compared to yours. For your interest, the results were posted here:
Might be the video driver, and/or faster bus having an impact. (He's pulling 3x the watts my system does!)
[edit] Finally figured it out. The Harpertown processors have additional hardware prefetch features, and enabling any one of them increases the latency for L1 and L2 cache!
Code:
Latency   OFF   ON    <--- advanced prefetch DCU, IP, or DCA
L1        3     4
L2        15    18
Code:
Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_SSE2
Speed [Million Iterations / Second] : 6212.214
Logical CPU cores detected : 8

Kümmel Mandelbrot Benchmark V 0.53H-32b-MT_FPU
Speed [Million Iterations / Second] : 2702.717
Logical CPU cores detected : 8
Thank you for bringing this to light.
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Kuemmel 06 Jan 2009, 12:00
@BitRAKE: Okay, nice, now it's the same result as he got. I updated my page.
asmfan wrote: Just make DestroyWindow not CloseWindow call and use MSDN to check what these functions for.
Thanks for the hints. So finally I went over all of the Windows code. (It was just historical copy/paste from examples that made this mess, I guess. The CS_... flags were just copy/paste crap; I replaced everything at window creation with the stuff that is really needed, I hope.) I tracked down that the reason DestroyWindow was not working was in the WindowProc routine. I took your advice about eliminating unneeded processing and also looked at what Kempis did with his JuliaSSE. As far as I can see, I basically don't need any processing, as it's a benchmark and no interactive windows are needed. I now use DestroyWindow at the end and stripped the whole WindowProc down to this (similar to Kempis, I think):
Code:
proc WindowProc hwnd, wmsg, wparam, lparam
  invoke DefWindowProc, [hwnd], [wmsg], [wparam], [lparam]
  ret
endp
On my machines everything works fine with this now, and the window is finally destroyed correctly. Maybe it's a matter of programming philosophy; some might think there are window messages that should be handled here? Everybody seems to handle this differently, as I see in other examples, especially for DDRAW and OPENGL. The corrected version is attached.
asmfan 06 Jan 2009, 12:49
Much better) But imperfect.
I see msg.wParam and the msg struct declared but not used at all. For a bench it is nice enough, but if it becomes something like a screensaver it will definitely need a non-greedy message loop (see MSDN for details) with PeekMessage and the drawing done inside this loop...
Also, to reduce the size: move initialized variables to the beginning of the section and uninitialized ("?") ones to the end. Also, for alignment's sake, put huge initialized structures first, then variables in descending size (16, 8, 4, 2 bytes), and finally strings and 1-byte-aligned variables. Immediately after that, the uninitialized data, aligned (on 16, for example), should follow in the same order of priority.
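A minimal sketch of the kind of non-greedy loop meant here, assuming the layout of the standard fasm window template; the class name, window title, and the idle label are made up for illustration, and the Sleep call is only a placeholder for drawing a frame. PeekMessage with PM_REMOVE returns immediately when the queue is empty, so drawing can happen between messages instead of the thread blocking in GetMessage.

```asm
format PE GUI 4.0
entry start

include 'win32a.inc'

section '.data' data readable writeable

  _class db 'SKETCHWIN',0               ; hypothetical class name
  _title db 'PeekMessage loop sketch',0

  wc  WNDCLASS 0,WindowProc,0,0,NULL,NULL,NULL,COLOR_BTNFACE+1,NULL,_class
  msg MSG                               ; uninitialized data kept at the end

section '.code' code readable executable

  start:
        invoke  GetModuleHandle,0
        mov     [wc.hInstance],eax
        invoke  LoadCursor,0,IDC_ARROW
        mov     [wc.hCursor],eax
        invoke  RegisterClass,wc
        test    eax,eax
        jz      exit
        invoke  CreateWindowEx,0,_class,_title,WS_VISIBLE+WS_OVERLAPPEDWINDOW,\
                100,100,400,300,NULL,NULL,[wc.hInstance],NULL
        test    eax,eax
        jz      exit

  main_loop:
        invoke  PeekMessage,msg,NULL,0,0,PM_REMOVE
        test    eax,eax
        jz      idle                    ; queue empty: do our own work
        cmp     [msg.message],WM_QUIT
        je      exit
        invoke  TranslateMessage,msg
        invoke  DispatchMessage,msg
        jmp     main_loop
  idle:
        invoke  Sleep,1                 ; placeholder: draw one frame here instead
        jmp     main_loop

  exit:
        invoke  ExitProcess,0

proc WindowProc hwnd,wmsg,wparam,lparam
        cmp     [wmsg],WM_DESTROY
        je      .wmdestroy
        invoke  DefWindowProc,[hwnd],[wmsg],[wparam],[lparam]
        ret
  .wmdestroy:
        invoke  PostQuitMessage,0       ; makes the loop see WM_QUIT
        xor     eax,eax
        ret
endp

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          user32,'USER32.DLL'

  include 'api\kernel32.inc'
  include 'api\user32.inc'
```

The data section also follows the ordering suggested above in a small way: initialized strings and the WNDCLASS come first, the uninitialized MSG last.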
Kuemmel 06 Jan 2009, 14:15
asmfan wrote: Much better) But imperfect.
asmfan wrote: For bench it is nice enough but if it will become something like screencaver it will definitely need ungreedy message loop (see MSDN for details) with PeekMessage and drawing inside this loop...
I'll take care of the order of the variables in my next projects, promise. It's just that over the past 2 years I experienced some weird changes in the results when I changed too much there; at the moment it runs at its best, and it still fits in any 1st-level cache anyway, except maybe on ancient CPUs.
geppy 06 Jan 2009, 14:33
Core2Duo E8400 3584MHz (Wolfdale)
KMB_V0.53I-32b-MT_SSE2
Speed [Million Iterations / Second] : 2492.325
Logical CPU cores detected : 2
KMB_V0.53I-32b-MT_FPU
Speed [Million Iterations / Second] : 1050.172
Logical CPU cores detected : 2
;-------------------------------------------------------------
Pentium M 1.86GHz (Dothan)
KMB_V0.53I-32b-MT_SSE2
Speed [Million Iterations / Second] : 205.000
Logical CPU cores detected : 1
KMB_V0.53I-32b-MT_FPU
Speed [Million Iterations / Second] : 221.314
Logical CPU cores detected : 1
;--------------------------------------------------------------
Tested on AMD Athlon 64 4200+ X2 2200MHz (EDITED, was 2000MHz) and got about the same result as on your webpage.
Last edited by geppy on 06 Jan 2009, 15:49; edited 1 time in total
Kuemmel 06 Jan 2009, 15:23
@geppy: Thanks for the results! They'll be on my homepage soon.
While looking through my copy/paste code I found another thing, maybe:
Code:
invoke CreateThread, NULL, 0, thread_draw_fpu, ebx, \
       REALTIME_PRIORITY_CLASS or CREATE_SUSPENDED, tId
...snip...
invoke SetThreadAffinityMask, dword [edi], esi
So I think this (which also sets the thread priority, probably a good thing, and drops the unneeded tId) would do the job correctly, and as well or even better? ->
Code:
invoke CreateThread, NULL, 0, thread_draw_fpu, ebx, CREATE_SUSPENDED, NULL
...snip...
invoke SetThreadPriority, dword [edi], THREAD_PRIORITY_TIME_CRITICAL
invoke SetThreadAffinityMask, dword [edi], esi
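A minimal sketch of how the second variant might fit into a complete program, under assumptions: the fixed NTHREADS count, the handles table, the empty worker routine, and the one-bit-per-core mask in esi are all made up for illustration and are not taken from the benchmark. Each thread is created suspended, given its priority and affinity, and only then resumed. THREAD_PRIORITY_TIME_CRITICAL is defined locally in case the headers in use do not provide it.

```asm
format PE GUI 4.0
entry start

include 'win32a.inc'

NTHREADS = 2                            ; hypothetical fixed count for the sketch
THREAD_PRIORITY_TIME_CRITICAL = 15      ; Win32 value, defined here to be safe

section '.data' data readable writeable

  _title  db 'Threads',0
  _done   db 'All worker threads finished.',0
  handles rd NTHREADS                   ; one handle slot per thread

section '.code' code readable executable

  start:
        xor     ebx,ebx                 ; ebx = thread index, passed as parameter
        mov     esi,1                   ; esi = affinity mask, bit n = core n
        mov     edi,handles             ; edi -> current handle slot
  create:
        invoke  CreateThread,NULL,0,worker,ebx,CREATE_SUSPENDED,NULL
        mov     [edi],eax
        invoke  SetThreadPriority,dword [edi],THREAD_PRIORITY_TIME_CRITICAL
        invoke  SetThreadAffinityMask,dword [edi],esi
        invoke  ResumeThread,dword [edi] ; thread starts running only now
        shl     esi,1                   ; next thread goes to the next core
        add     edi,4
        inc     ebx
        cmp     ebx,NTHREADS
        jb      create

        invoke  WaitForMultipleObjects,NTHREADS,handles,TRUE,-1 ; -1 = INFINITE
        invoke  MessageBox,NULL,_done,_title,MB_OK
        invoke  ExitProcess,0

proc worker param
        xor     eax,eax                 ; real work would go here; exit code 0
        ret
endp

section '.idata' import data readable writeable

  library kernel32,'KERNEL32.DLL',\
          user32,'USER32.DLL'

  include 'api\kernel32.inc'
  include 'api\user32.inc'
```

Return values are not checked here; on a machine with fewer cores than NTHREADS, SetThreadAffinityMask would simply fail for the extra masks and those threads would keep the default affinity.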
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.