flat assembler
Message board for the users of flat assembler.
Windows > Mandelbrot Benchmark FPU/SSE2 released
Kuemmel 04 Aug 2008, 19:09
bitRAKE wrote:
Currently, I'm using the on-board video, and doubt that has any effect on the results. My guess would be memory contention of the thread data between the two CPUs. This could be easily tested by having threads select a data area based on which CPU is being used - cacheline aligned and all that goodness. Eh, I'm lazy though, so maybe just 16 copies of the work you've already done - should see a change. Dirty cachelines going across the bus has to slow things down.

...hm, okay. Just for a test, can you run my very old, slow version and post the results? It's here: http://www.mikusite.de/x86/KMB_V0.53_MT.zip
I did the threading and drawing a bit differently... maybe that gives a hint...
bitRAKE 05 Aug 2008, 03:41
Kuemmel wrote:
Just for a test, can you run my very old slow version and post the results? It's here: http://www.mikusite.de/x86/KMB_V0.53_MT.zip

Also, I ran your ten times version from the 2cpu.com thread: ( 2319.196 FPU, 4241.355 SSE2 ). All cores very close to 100% for the complete duration. I'm tuning up (i.e. turning off) Vista and only expect those numbers to improve. I'll create a test of the current version the day after tomorrow.
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
f0dder 05 Aug 2008, 04:20
Sorry for OT, but: is there any particular reason you got that monster machine, bitRAKE? Or is it "because I could"?
Kuemmel 05 Aug 2008, 16:49
bitRAKE wrote:
( 1125.018 FPU, 2314.821 SSE2 ) Aren't all threads still writing to the same global data? Are you able to discern anything additional from these data points? All cores hit 100% as before.

Hm, at least your CPUs show 100% load, compared to the 70% of the 2cpu.com guy. It seems that the later version is less efficient; maybe it's really the changed screen drawing and threading. I will try to make some test versions, maybe without graphics output... I might need about 4 weeks, though, as I'll be going on holiday... In the meantime, feel free to experiment with my code if you want to... It's just weird that these things happen only with 2 CPUs; with 1 CPU, even with 4 cores, everything seems okay...
Madis731 05 Aug 2008, 17:27
Maybe the new Nehalem will give us answers, when the memory controller is brought onto the chip and memory-based operations are faster. Q4 will be interesting, as AnandTech "promises" the same WOW! effect that Core 2 had.
bitRAKE 07 Aug 2008, 23:10
Finally got the computer all together and development software moved over. (No more complaints about taking up the kitchen table.) My previous comments are in error - now I think it has more to do with Windows threading than anything happening with the memory. Was able to get a substantial (imho) 11% improvement with only 8 threads and an interleave of two:
Code:
2249.829 / 4190.117 ; 16 threads, interleave 1
2329.580 / 4276.216 ; 16 threads, interleave 2
2392.059 / 4459.482 ; 8 threads, interleave 1
2429.290 / 4676.610 ; 8 threads, interleave 2
Code:
4845.812 - 16 threads, interleave 1
4771.547 - 16 threads, interleave 2
5083.158 - 8 threads, interleave 1
5001.501 - 8 threads, interleave 2
Quickman bench.log:
Code:
3.567s - 4 threads
1.835s - 8 threads
2.031s - 16 threads
2.014s - 32 threads
*16 threads actually run faster without SetThreadAffinityMask!
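A quick check of the 11% figure, as a sketch only. It assumes the two columns of the first table are FPU / SSE2 (the order used in earlier posts) and that the comparison meant is 8 threads, interleave 2 against 16 threads, interleave 1. The helper name `gain_percent` is made up for this illustration.

```python
def gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved / baseline - 1.0) * 100.0

# Values copied from the first table above.
# Assumption: column order is FPU / SSE2.
fpu_16t_i1, sse2_16t_i1 = 2249.829, 4190.117   # 16 threads, interleave 1
fpu_8t_i2, sse2_8t_i2 = 2429.290, 4676.610     # 8 threads, interleave 2

fpu_gain = gain_percent(fpu_16t_i1, fpu_8t_i2)
sse2_gain = gain_percent(sse2_16t_i1, sse2_8t_i2)

print(f"FPU gain:  {fpu_gain:.1f}%")
print(f"SSE2 gain: {sse2_gain:.1f}%")
```

Under those assumptions it is the SSE2 column that lands in the neighbourhood of the quoted 11%; the FPU column improves by less.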
Kuemmel 13 Aug 2008, 03:47
bitRAKE wrote:
2329.580 / 4276.216 ; 16 threads, interleave 2

...hm, very interesting!!! Did you check if a higher interleave, like 10 or so, has an even better effect?
bitRAKE 13 Aug 2008, 04:03
Kuemmel wrote:
Edit: just rechecked...have many processes running, too:
Code:
; 4247.126 ; 16 threads, interleave 1
; 4396.672 ; 16 threads, interleave 2
; 4427.855 ; 16 threads, interleave 3
; 4459.482 ; 16 threads, interleave 4
; 4427.854 ; 16 threads, interleave 5
; 4427.855 ; 16 threads, interleave 6
; 4427.855 ; 16 threads, interleave 8
; 4365.926 ; 16 threads, interleave 10
; 3865.805 ; 16 threads, interleave 25
; 4335.608 ; 8 threads, interleave 1
; 4557.135 ; 8 threads, interleave 2
; 4524.112 ; 8 threads, interleave 3
; 4540.564 ; 8 threads, interleave 4
; 4524.112 ; 8 threads, interleave 5
; 4491.565 ; 8 threads, interleave 6
; 4491.565 ; 8 threads, interleave 8
; 4459.482 ; 8 threads, interleave 10
; 4135.619 ; 8 threads, interleave 25
Edit Again: Sorry, those figures (new ones above) had SetThreadAffinityMask commented throughout! Using SetThreadAffinityMask:
Code:
; 4396.672 ; 8 threads, interleave 1
; 4607.583 ; 8 threads, interleave 2
; 4607.583 ; 8 threads, interleave 3
; 4607.583 ; 8 threads, interleave 4
; 4607.583 ; 8 threads, interleave 5
; 4607.583 ; 8 threads, interleave 6
; 4524.112 ; 8 threads, interleave 8
; 4507.780 ; 8 threads, interleave 10
; 4190.117 ; 8 threads, interleave 25
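For readers wondering what an "interleave" setting might look like in code, here is a rough sketch. The thread does not spell out how the benchmark build discussed here defines it, so this is an assumption: image rows are handed out round-robin, and each thread takes `interleave` consecutive rows per turn. The function name `rows_for_thread` and its parameters are hypothetical, not taken from the benchmark source.

```python
def rows_for_thread(thread_id, n_threads, interleave, height):
    """Rows computed by one thread under an ASSUMED round-robin scheme:
    each thread takes `interleave` consecutive rows, then skips the rows
    belonging to the other threads."""
    rows = []
    stride = n_threads * interleave          # rows consumed per full round
    for start in range(thread_id * interleave, height, stride):
        # Clip the last chunk so it never runs past the image height.
        rows.extend(range(start, min(start + interleave, height)))
    return rows

# Example: 8 threads, interleave 2, 32-row image.
for t in range(8):
    print(t, rows_for_thread(t, 8, 2, 32))
```

With an interleave of 1 every thread touches every n-th row; larger values give each thread longer contiguous runs, at the cost of a coarser split of the expensive regions of the image. Whether that matches the trade-off behind the figures above is not something this sketch can confirm.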
Kuemmel 20 Aug 2008, 05:07
Got first Via Nano (1.8 GHz) result:
FPU: 112,380 MIter (Efficiency: 62,4)
SSE2: 379,070 MIter (Efficiency: 210,6)
...not bad, I would say, especially SSE2 compared to that old VIA stuff...
Madis731 20 Aug 2008, 10:34
Has anyone got an Atom? I can't find any Eee PC 900 that has one. They keep coming with Celerons and whatnot.
EDIT: Oh, I just found N270 entries on Kümmel's site, so no problem.
Btw, here are E8400@3GHz times with XP 32-bit:
Code:
FPU 857.004 / Eff. 142.8
SSE2 2010.717 / Eff. 335.1
SSE2PM 1926.937 / Eff. 321.2
And the Q6600@2.4GHz with Server 2003 64-bit:
Code:
FPU 1363.161 / Eff. 142.0
SSE2 3121.637 / Eff. 325.2
SSE2PM 3030.716 / Eff. 315.7
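The "Eff." column appears to be the score divided by clock speed times core count, i.e. MIter/s per GHz per core. That formula is inferred from the posted numbers, not quoted from the benchmark itself, so treat the sketch below as a consistency check only. The core counts (2 for the E8400, 4 for the Q6600) are filled in from the CPU specifications, and the function name `efficiency` is made up here.

```python
def efficiency(miter_per_s, clock_ghz, cores):
    """ASSUMED definition: million iterations per second, per GHz, per core."""
    return miter_per_s / (clock_ghz * cores)

# (label, score, clock in GHz, cores, efficiency as posted)
posted = [
    ("E8400 FPU",    857.004,  3.0, 2, 142.8),
    ("E8400 SSE2",   2010.717, 3.0, 2, 335.1),
    ("E8400 SSE2PM", 1926.937, 3.0, 2, 321.2),
    ("Q6600 FPU",    1363.161, 2.4, 4, 142.0),
    ("Q6600 SSE2",   3121.637, 2.4, 4, 325.2),
    ("Q6600 SSE2PM", 3030.716, 2.4, 4, 315.7),
]

for label, score, ghz, cores, eff_posted in posted:
    eff = efficiency(score, ghz, cores)
    print(f"{label:13s} computed {eff:6.1f}  posted {eff_posted:6.1f}")
```

If the inferred formula is right, the dual-core and quad-core Core 2 parts come out almost identical per GHz and per core, which is what one would expect from the same microarchitecture.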
MCD 25 Aug 2008, 04:29
The reason why AMD CPUs do so badly in this benchmark is that they are much more optimized for 3DNow! than for SSE, even the newer ones that got SSE3 and SSSE3.
So I would like to see someone make a comparative benchmark: one that uses the SSE1/2/3 code for Intel CPUs and the 3DNow!, 3DNow!+, MMX, MMX+ code for the AMD CPUs (you can take my Mandelbrot benchmark code, which I posted earlier in this thread, for that). Does anyone have enough CPUs of both brands?
Madis731 25 Aug 2008, 08:45
@MCD: The problem is that AMD's 3DNow! (jeesh, how hard it is to type this thing: no-caps, caps, caps, no-caps, no-caps, caps) uses only a 64-bit datatype, the same as MMX. Since SSE registers are 128 bits wide, this means it has to be at least 2x faster than Intel on any SSE calculation to be faster overall...
I think if we can prove that AMD's 3DNow! can reach about half the speed of Intel SSE, then we can agree that they have done a good job!
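The 2x argument is just register-width arithmetic, sketched below for single-precision floats. The register widths (64-bit for 3DNow!/MMX, 128-bit for SSE) are architectural facts; the comparison itself is a simplification that ignores clock speed, latency and how many instructions each CPU can issue per cycle. The helper name `lanes` is made up for the example.

```python
def lanes(register_bits, element_bits):
    """How many elements of a given size fit in one SIMD register."""
    return register_bits // element_bits

FLOAT32_BITS = 32

lanes_3dnow = lanes(64, FLOAT32_BITS)    # 3DNow! works on 64-bit MMX registers
lanes_sse = lanes(128, FLOAT32_BITS)     # SSE works on 128-bit XMM registers

# Per-instruction speed-up 3DNow! would need just to break even with SSE.
break_even = lanes_sse / lanes_3dnow

print(f"3DNow!: {lanes_3dnow} floats per instruction")
print(f"SSE:    {lanes_sse} floats per instruction")
print(f"3DNow! needs {break_even:.0f}x the instruction throughput to match")
```

This only covers single precision; which precision the SSE2 path of the benchmark uses is not stated in this part of the thread.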
bitRAKE 17 Sep 2008, 04:52
Wow, big change with new video card:
0.53H-32b-MT_FPU : 2562.961
0.53H-32b-MT_SSE2 : 5525.022 (with 8 threads 5889.822)
Kuemmel 13 Nov 2008, 19:39
Hi guys,
I got the first i7 Nehalem result (Intel Core i7 920 / 4000 MHz / 4 Cores / HT on):
FPU: 2221,806 MIter/s - Efficiency: 138,9
SSE2: 6151,010 MIter/s - Efficiency: 384,4
It was just one run, so some inaccuracy is possible. Anyway, the result means about +13% for SSE2 and -4% for FPU compared to a Core2Duo at the same clock. The FPU result seems a bit strange, but the SSE2 gain is more or less on the level of other floating-point-intensive benchmarks... so another nice achievement by Intel... and with memory-intensive benches that thing seems to fly... I guess spring '09 is time for shopping.
@bitRAKE, I just discovered your mail now. Really interesting with the graphics card; that bottleneck didn't show up at all on 1-CPU systems... strange.
Ivan2k2 14 Nov 2008, 11:24
P8400 - 2.26 GHz - Vista 32-bit
SSE2 1508.037
FPU 653.404
adnimo 17 Nov 2008, 05:17
Kuemmel wrote:
Sorry for the delay! I tried it on the P2 today, and sadly it's running 9x - the screen just went black and I could see the hourglass cursor, but nothing seemed to be going on... in fact I couldn't return to the system, oh well. How long do you think it would take to benchmark an Athlon XP 2600? I could try on that one (didn't see any in your table).
Kuemmel 17 Nov 2008, 18:00
...no problem about being late... it seems to be a problem with the OS on the PII, I guess.
An Athlon XP 2600 would be interesting! From my Athlon result I would think it delivers about 227,xxx MIter/s for FPU. SSE2 isn't supported anyway.
adnimo 18 Nov 2008, 11:49
How long do you think it would take to finish the benchmark on that Athlon?
Regarding the OS issue, I don't have a spare XP license, so I can't really do much on that side, sadly.
kalambong 25 Nov 2008, 03:53
Have you tried running the benchmark on the new Windows 7 beta?
Copyright © 1999-2024, Tomasz Grysztar.