flat assembler
Message board for the users of flat assembler.
Kuemmel 10 May 2006, 17:44
New results seem to confirm the effect of Hyper-Threading:
http://www.mikusite.de/pages/x86.htm
Strangely enough, the effect doesn't carry over to the Intel dual cores. One guy tested a Pentium EE with dual core and Hyper-Threading... same result per MHz as a Pentium D without HT... really interesting, but I have no explanation for it yet...
|||
Madis731 10 May 2006, 19:49
BTW, the guys who get bad results with multiple cores should make sure that ALL their CORES AND THREADS actually spin up and don't end up in a power-saving ritual. They should leave Task Manager running during the first run and check whether really 2 threads (4 threads on the EE) go up to at least 95% each. If they do, then run the test again without anything running in the background - I mean NOTHING: you can shut down many services, disable your LAN and shut down the antivirus (I think you know why disabling the LAN is necessary).

...on to my post...

What I've read about HT is that it only doubles the ALU and nothing else. I haven't heard anything about a doubled integer unit. The FPU is also an ALU, but I think you can draw a line there, where the FPU would have to do multiple multiplies or divisions at once. It can only do 1-clock math.

Sure, I can test it with one thread; that is "affinity". You can set it in Task Manager, but I'm not sure I can find a way to *start* a process with a given affinity. I know for sure that I can change it dynamically at runtime. Maybe there's an API for starting processes with a default or a specific affinity.

I thought I'd PM you, Kuemmel, but I figured that others would like to know my results too. So...
The score on my http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-697.png was: 43.979
The problem now is that I'm at home and can't test the Prescott.
...and the results are in: http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-2651.png
Code:
FPU : 98.070
SSE2: 234.093
Remember the Pentium II?
http://enos.itcollege.ee/~mkalme/PAHN/Up/cpu-350.png
22.166
I can see the relation here: 350*2=700MHz and 22.166*2=44.332, which is only a little higher per MHz. If you look closely, it has a "half-pumped" FSB compared to my P!!!, because both run at 100MHz FSB. Another fact is that the Pentium II has 2x more L2 cache. My P4 at home has the same amount of L2 as the Pentium II did, but it has less L1 data cache (code is about the same: 12K micro-ops >= 16KB).

EDIT: Okay, now I tested it both ways on my Prescott at work:
Code:
With 1 and 2 threads respectively:
SSE2: 222.666 / 323.726
FPU : 95.316 / 174.792
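To put numbers on the comparison, here is a small Python sketch that turns the scores quoted in this post into a 1-thread vs. 2-thread gain and a score per MHz. The helper names `speedup` and `score_per_mhz` are made up for this illustration, and the 350/700 MHz clocks are the rounded figures used in the post, not measured values.

```python
def speedup(single_thread_score, two_thread_score):
    """Return how many times faster the two-thread run is."""
    return two_thread_score / single_thread_score


def score_per_mhz(score, clock_mhz):
    """Normalise a benchmark score by the CPU clock in MHz."""
    return score / clock_mhz


# Prescott scores quoted above (1 thread / 2 threads).
sse2_gain = speedup(222.666, 323.726)
fpu_gain = speedup(95.316, 174.792)
print(f"SSE2 gain from the second thread: {sse2_gain:.2f}x")
print(f"FPU  gain from the second thread: {fpu_gain:.2f}x")

# Pentium II at 350 MHz vs. the 700 MHz figure used in the post.
print(f"PII : {score_per_mhz(22.166, 350):.4f} per MHz")
print(f"P!!!: {score_per_mhz(43.979, 700):.4f} per MHz")
```

On these figures the FPU path gains noticeably more from the second logical processor than the SSE2 path does, and the Pentium II comes out slightly ahead per MHz.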
|||
kuscsikp 11 May 2006, 10:27
Hi people!
http://board.flatassembler.net/topic.php?t=5232
It is another CPU benchmark (works in Linux, Windows and DOS too). It is not finished yet (I am working hard on it). If someone wants to help me, please send me some results!
|||
Kuemmel 11 May 2006, 14:47
Madis731 wrote: EDIT: ...
First, thanks for all the testing! The PII result again shows that they didn't change much in the FPU itself compared with the PIII or P-M; only the P4 sucks without HT and is still bad even with HT... but the clock rate matters...
EDIT: When I look at the factor between the PII and the Intel Core Duo... the rule of thumb that processor power doubles every two years is more than fulfilled: 8 years between them gives 2*2*2*2=16, and here the factor is >19.
What's still strange is that the HT effect doesn't show up with the dual core (as you can see from a result on my home page)... maybe HT can't deal with that... I must say all the results from the AMD architecture are way more consistent... it looks kind of hard to optimize for Intel, or in a common way for all... that's one more reason for me to look forward to the Conroe architecture... in July we'll know more...
@kuscsikp: I'll give it a run when I'm back home!
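The doubling arithmetic can be checked, and turned around, with a few lines of Python. This is only a sketch of the rule-of-thumb calculation from the post. The function names are invented here, and the factor of 19 is the lower bound quoted above, not an exact measurement.

```python
import math


def expected_factor(years, doubling_period_years=2.0):
    """Performance factor predicted by 'doubles every N years'."""
    return 2.0 ** (years / doubling_period_years)


def implied_doubling_period(years, observed_factor):
    """Doubling period (in years) that would explain the observed factor."""
    return years / math.log2(observed_factor)


print(expected_factor(8))                        # 16.0
print(round(implied_doubling_period(8, 19), 2))  # 1.88
```

So a factor above 19 in 8 years corresponds to a doubling period of a bit under 1.9 years for this particular benchmark.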
|||
Kuemmel 11 May 2006, 21:45
Hi guys,
I'm still discovering new things. Because the variables are now local, their order seems to matter; I found this out when I had to add a new local variable to the proc declaration. The difference wasn't too big (a result of 158,xxx vs. 163,xxx) but noticeable for the SSE2 code... so either the actual position of the variable is a problem because of the cache or whatever, or it's the alignment...
Can I align local variables somehow in the proc? I tried:
Code:
proc ...uses ebi ..., data:dq,
locals
align 16
data dq ...
endl
...but it didn't work... error...
|||
f0dder 11 May 2006, 22:02
Hm, I don't know if FASM can align local variables - so we should keep QWORD variables first, grouped together. And we also need to do some align-by-16 to ebp/esp.
|||
Kuemmel 12 May 2006, 20:36
f0dder wrote: Hm, I don't know if FASM can align local variables - so we should keep QWORD variables first, grouped together. And we also need to do some align-by-16 to ebp/esp.
Hm, how can I achieve this? I see in a hex editor that the addresses of the local variables are encoded like [ebp - 0xx], but where or what is ebp, and how do I modify the location?
|||
f0dder 13 May 2006, 10:42
It should be possible by manipulating the prologue code... currently the prologue is something like:
Code:
.code:00401480 push ebp
.code:00401481 mov ebp, esp
.code:00401483 sub esp, 4Ch
Instead of this, ESP should be adjusted enough so there's room for both the locals and any necessary alignment. Then, "mov ebp, esp" should be changed into code that makes sure EBP is aligned-to-16. Of course this poses some problems wrt. accessing function variables... *mumble*
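The address arithmetic behind such a prologue can be modelled in a few lines of Python. This is a sketch of one common variant, not the code FASM's proc macro generates: it keeps EBP as the ordinary frame pointer (so arguments stay reachable through it) and aligns ESP instead, which means the locals would be addressed relative to ESP. The entry value of ESP is a made-up example; 4Ch is the locals size from the listing above.

```python
def align_down(address, alignment=16):
    """Round an address down to a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return address & ~(alignment - 1)


def aligned_frame(esp_on_entry, locals_size, alignment=16):
    """Model: push ebp / mov ebp, esp / sub esp, locals_size / and esp, -16."""
    ebp = esp_on_entry - 4            # push ebp stores 4 bytes; mov ebp, esp
    esp = ebp - locals_size           # sub esp, locals_size
    esp = align_down(esp, alignment)  # and esp, -alignment
    return ebp, esp


# Hypothetical ESP on entry, with the 4Ch bytes of locals from the listing.
ebp, esp = aligned_frame(0x0012FF78, 0x4C)
print(hex(ebp), hex(esp))
```

Because the AND can only lower ESP, the reserved area never shrinks below `locals_size`. The price is that the distance between EBP and the locals is no longer a compile-time constant, which is the access problem mentioned above.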
|||
Kuemmel 13 May 2006, 11:50
I think I'll make a post in the main thread... should be interesting for non-fractal people, too.
|||
Madis731 13 May 2006, 11:50
...or the ideas&projects section
|||
Kuemmel 14 May 2006, 21:12
Finally I made a new release on
http://www.mikusite.de/pages/x86.htm
It's now version 0.51 MT and includes some things found out during the evaluation:
- A possible memory violation caused by the threads accessing the same variable is fixed; the variable is now local.
- Regarding the alignment I didn't find a general solution. I just looked at general optimization documents and sorted the local variables in what seems to be the optimal way: first the qwords, then the dwords, adding dummies to each group to fill up the space needed for 16-byte alignment of the dwords and qwords. It seems to work okay, and I gained about 1-2 percent performance for the SSE2 version.
- The benchmark is now also repeated 5 times to get more stable results; it was just over too fast on the fast dual-core CPUs.
If somebody is still in the mood to test...
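The "qwords first, then dwords, plus dummies" ordering can be sketched as a small layout calculation in Python. This only illustrates the idea described in the release notes. The variable names are hypothetical, and the offsets are relative to a base that is assumed to be 16-byte aligned already.

```python
def layout_locals(variables, alignment=16):
    """Place variables largest-size-first and pad each size group.

    variables: list of (name, size_in_bytes) pairs.
    Returns (offsets, total_size), where offsets maps name -> byte offset.
    """
    offsets = {}
    offset = 0
    # Walk the size classes from largest to smallest (qwords, then dwords).
    for size in sorted({var_size for _, var_size in variables}, reverse=True):
        for name, var_size in variables:
            if var_size == size:
                offsets[name] = offset
                offset += var_size
        # Dummy bytes so that the next group starts on an aligned boundary.
        offset += -offset % alignment
    return offsets, offset


# Hypothetical locals: three qwords and one dword, declared in mixed order.
offsets, total = layout_locals([("zr", 8), ("n", 4), ("zi", 8), ("cr", 8)])
print(offsets, total)
```

With this ordering every qword sits on an 8-byte boundary and every group starts on a 16-byte boundary, at the cost of a few padding bytes.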
|||
LocoDelAssembly 15 May 2006, 00:32
FPU: 131; SSE2: 179
Athlon 64 3200+ (2.0 GHz Venice core, Socket 939), 1GB DDR400 Dual-Channel 3.0-3-3-8
PS: Single core of course
|||
Kuemmel 15 May 2006, 05:57
Hi guys,
a Taiwanese overclocker got his hands on the upcoming mobile CPU from Intel, the Core 2, internally called 'Merom', clocked at 3200 MHz. Check this out:
32 bit - Win
FPU: 406,195
SSE2: 888,284
64 bit - Win
FPU: 404,612
SSE2: 893,381
I can only say: WOW!!! The SSE2 performance per MHz is up by about 60%! The FPU performance is the same as with the first Core architecture.
|||
UCM 15 May 2006, 21:39
SSE2: 392, 393.110, 393.110
FPU: 286.232, 285.969, 283.948
Athlon X2 4200+, 2.2 GHz dual-core
|||
Kuemmel 06 Jun 2006, 21:10
Hi guys,
I'm still discovering new things; Hyper-Threading especially keeps me busy. In all the previous versions the multithreading was limited to 2 threads assigned to 2 cores. Now I made a version using 4 threads assigned to 4 cores. Look what happened on the Intel Pentium D 965 EE Presler (Hyper-Threading and dual core, 3733 MHz):
2 thread version result: 235,674 (FPU) 549,834 (SSE2)
4 thread version result: 333,172 (FPU) 671,360 (SSE2)
...so the benefit of Hyper-Threading that is visible in the single-core results also works for dual core if 4 threads are set up instead of 2. This brings me to the conclusion that my benchmark should somehow detect
1) how many cores are available and
2) whether Hyper-Threading is available
and then set up as many threads as is useful. How can these two things be detected?
And this brings me to the next question... are there any theories about Hyper-Threading, about how many threads can be useful?
Can the guys with a single-core P4 with Hyper-Threading test the 4 thread version again to see if it has any positive effect? The link is: http://www.mikusite.de/x86/KMB_V_0.52_4_test.zip
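One simple way to pick the thread count is to ask the operating system for the number of logical processors and split the work evenly between that many threads. The Python sketch below shows the idea with `os.cpu_count()`. Note that this count includes Hyper-Threading siblings and does not tell cores and HT apart, so it only answers the "how many threads" part of the question. The row-based split is an assumption about how the benchmark divides its work, and the 480-row image height is made up.

```python
import os


def worker_count(fallback=1):
    """Number of logical processors (HT siblings count as separate ones)."""
    return os.cpu_count() or fallback


def split_rows(total_rows, workers):
    """Split rows 0..total_rows-1 into near-equal contiguous (start, end) ranges."""
    base, extra = divmod(total_rows, workers)
    ranges, start = [], 0
    for i in range(workers):
        length = base + (1 if i < extra else 0)  # first 'extra' workers get one more
        ranges.append((start, start + length))
        start += length
    return ranges


threads = worker_count()
print(threads, split_rows(480, threads))
```

On a dual core with Hyper-Threading this would report 4 and produce four row ranges, matching the 4 thread test above.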
|||
Madis731 07 Jun 2006, 06:08
The Extreme Edition actually has 4 threads - 2 cores, and each core has 2 threads. You should try with 8 threads and see if there is any benefit.
|||
Kuemmel 07 Jun 2006, 15:38
Madis731 wrote: Extreme edition has actually 4 threads - 2 cores and each core has 2 threads. You should try with 8 threads and see if there would be any benefit.
...okay, the problem is that I can't test that system any more... so can Hyper-Threading cope with just 2 threads, or even more?
EDIT: I checked the Intel page: "HT Technology allows a single Pentium 4 processor to function as two virtual or logical processors. There's still just one physical Pentium 4 processor in your PC - but the processor can execute two threads simultaneously"... so it seems that it's limited, but maybe you can prove it.
Could you also try the 4 thread version on your single core with Hyper-Threading? I thought you had a system like this.
Cheers & thanks!
|||
f0dder 07 Jun 2006, 17:15
HyperThreading is (or should be, anyway :-s) more limited than SMP or multicore machines, since the logical processors share physical execution units...
I'm not sure if there's some decent way to get CPU count, but perhaps the environment variable NUMBER_OF_PROCESSORS is the thing to read?
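Reading that variable is straightforward. Here is a minimal Python sketch that falls back to a default when the variable is missing or not a number. The function name and the fallback value of 1 are choices made for this example. On Windows the variable reports logical processors, so as far as I know a single-core P4 with HT enabled would show up as 2.

```python
import os


def processors_from_env(environ=None, default=1):
    """Read NUMBER_OF_PROCESSORS, falling back to 'default' if unusable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("NUMBER_OF_PROCESSORS", "")
    try:
        count = int(raw)
    except ValueError:
        return default  # variable missing or not a number
    return count if count > 0 else default


print(processors_from_env())
```

Passing a dictionary instead of the real environment makes the helper easy to try out, e.g. `processors_from_env({"NUMBER_OF_PROCESSORS": "4"})`.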
|||
Madis731 07 Jun 2006, 17:30
Kuemmel - did you read the post I made 10 May 2006, 21:49 closely?
At the very end I made a Prescott test with 1 and 2 threads. I did it by changing boot.ini, effectively adding a startup option to make it use only 1 part of the CPU. I'm sorry, but I can't really understand what you don't understand or need.
Hyper-Threading has 2 virtual processors, but it only means 2xALU. Dual-Core has 2 full virtual processors and the effectiveness is better. Older EEs just had more cache, but today they have BOTH HT&DC - this actually means that it has 2 virtual processors in one chip and each of these virtual processors has 2 threads running on different ALUs. This is why I said 4 threads. My Prescott is only 2-threaded, but according to some tests by Intel, some applications can perform well with 150-160 threads. I don't know the logic behind it, but it seems that the syncing is better with this many threads. I think even one CPU with one thread can benefit a lot from multiple threads.
EDIT: I think I understand now.
|||
Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.