flat assembler
Message board for the users of flat assembler.
Windows > Julia Set Example/Benchmark
bitRAKE 22 Dec 2008, 16:27
GetProcessAffinityMask is the source of the count.
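As a rough illustration of the remark above: the affinity mask has one bit per logical processor the process may run on, so the count is the number of set bits. The sketch below is in C rather than FASM for brevity; `count_mask_bits` and `processors_available` are hypothetical helper names, and only the Windows branch calls the real `GetProcessAffinityMask`.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Count the set bits in an affinity mask (one bit per logical processor). */
unsigned count_mask_bits(uint64_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clears the lowest set bit */
        ++count;
    }
    return count;
}

#ifdef _WIN32
/* Hypothetical wrapper: number of logical processors this process may use,
   or 0 if the API call fails. */
unsigned processors_available(void)
{
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return 0;
    return count_mask_bits((uint64_t)process_mask);
}
#endif
```

A mask of 0xFF gives 8, which would match an 8-processor machine. Note that this counts logical processors only; it does not tell physical cores and Hyper-Threading siblings apart.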
Kuemmel 22 Dec 2008, 19:33
Okay... I'll read up on all these functions. If that's a reliable way, maybe it's an idea to make a small template application as a standard FASM example of how to correctly count cores/processors/HT.
An Intel paper does it with CPUID (http://www.developers.net/intelmcshowcase/view/2093), so I'm just not sure what the perfect way to do it is, but judging from the results the Julia app achieves here, it seems to work correctly.
By the way... the loop unrolling doesn't help much, as the dependency chains are not broken. As Xorpd! said, you have to go for independent instruction blocks and also use several exits from the loop. The first step would be to free some registers for that, e.g. by using memory operands for xmm7 (cy), xmm5 (fsq_max), xmm6 (cx), etc., and then using those registers for another block of calculations. You might have a look at http://www.mikusite.de/x86/KMB_V0.53H-32b-MT.zip. Of course you'll end up with much more complicated code, but for the Mandelbrot, speed gains were more than 200 % on Core2Duo.
EDIT: Detection doesn't seem to be that trivial...: http://www.devx.com/go-parallel/Article/27398/2046 I'll try to compile the example and see what it all does...
Last edited by Kuemmel on 22 Dec 2008, 23:21; edited 1 time in total
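The point about independent instruction blocks and several loop exits can be sketched in scalar C (not the SSE code of the benchmark itself). The function names and the bailout value 4.0 standing in for fsq_max are assumptions for illustration. Two points A and B are advanced in the same loop body; neither update depends on the other, so the CPU is free to overlap the two chains, and each point has its own exit.

```c
/* Reference: iterate one point z -> z*z + c and count the steps taken. */
int julia_iter(double x, double y, double cx, double cy, int max_iter)
{
    int n = 0;
    while (n < max_iter && x * x + y * y <= 4.0) {  /* 4.0: assumed bailout */
        double t = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = t;
        ++n;
    }
    return n;
}

/* Iterate two points per loop pass. Chains A and B are independent. */
void julia_iter_pair(double xa, double ya, double xb, double yb,
                     double cx, double cy, int max_iter,
                     int *na, int *nb)
{
    int n = 0;
    while (n < max_iter) {
        if (xa * xa + ya * ya > 4.0) {      /* exit 1: A escaped */
            *na = n;
            *nb = n + julia_iter(xb, yb, cx, cy, max_iter - n);
            return;
        }
        if (xb * xb + yb * yb > 4.0) {      /* exit 2: B escaped */
            *nb = n;
            *na = n + julia_iter(xa, ya, cx, cy, max_iter - n);
            return;
        }
        double ta = xa * xa - ya * ya + cx; /* chain A */
        double tb = xb * xb - yb * yb + cx; /* chain B */
        ya = 2.0 * xa * ya + cy;
        yb = 2.0 * xb * yb + cy;
        xa = ta;
        xb = tb;
        ++n;
    }
    *na = max_iter;                         /* exit 3: neither escaped */
    *nb = max_iter;
}
```

In the SSE version each "point" here would be a whole register of 4 points, which is why registers holding constants (cx, cy, fsq_max) have to be freed first.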
kempis 26 Dec 2008, 06:38
Kuemmel, sorry for the late answer to your question.
bitRAKE wrote: GetProcessAffinityMask is the source of the count.
Yes, thanks.
shoorick 26 Dec 2008, 07:26
Works correctly (Celeron 300 test)
kempis 26 Dec 2008, 23:39
Kuemmel wrote: By the way... the loop unrolling doesn't help much, as the dependency chains are not broken. As Xorpd! said, you have to go for independent instruction blocks and also use several exits from the loop.
This is the modification of JuliaSSE (JuliaSSE-1).
Quote: Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
kempis 27 Dec 2008, 00:08
Quote: works correct (celeron 300 test)
LocoDelAssembly 27 Dec 2008, 01:55
I think he means a Celeron of the Pentium II series, so you will need to go down to MMX (if you can really find it useful; it is the best SIMD instruction set that P6 has).
bitRAKE 27 Dec 2008, 02:03
This newest version caused a system reset the first time I ran it (about half way through). System has been up for several weeks. Second time it went without a problem - (I think it's the video driver) must research further...
Code:
Julia SSE Benchmark
---------------------------
Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Number of processors: 8
Framerate: 317.565918 fps
Last edited by bitRAKE on 27 Dec 2008, 02:39; edited 1 time in total
LocoDelAssembly 27 Dec 2008, 02:23
Quote:
But not the benchmark: user-mode code is not supposed to be able to cause a system reset, so you had either a BSOD immediately followed by a system reboot (this is the factory setting), or a hardware problem (or both). Do you have minidumps enabled?
bitRAKE 27 Dec 2008, 02:42
No BSOD, and the dump didn't happen either (kernel memory dump enabled).
revolution 27 Dec 2008, 03:14
I finally have some time to look at this. Some notes:
The first thing I did before running was to replace REALTIME_PRIORITY_CLASS with IDLE_PRIORITY_CLASS, and THREAD_PRIORITY_TIME_CRITICAL with THREAD_PRIORITY_IDLE. What was the reason for all the high-priority stuff? It creates the problems that others have noted here: inability to exit, and all other processes being paused. Any program with a large computation requirement should never be high priority; quite the opposite, it should be low priority so that other functions can still happen. Putting it at realtime priority may increase the framerate (very) slightly, but the cost of freezing all other tasks for such a minor increase is too high IMO.
With those changes, and with the new constants added to the 'system.inc' file, it assembled. I ran it on my Dell laptop.
Quote: Intel(R) Pentium(R) M processor 1300MHz
Kuemmel 27 Dec 2008, 16:15
kempis wrote: Great... thank you very much for your idea...
Okay, I see that you interleave the instructions. Sometimes it can also be beneficial to keep the blocks together and just do the iteration-end check at the end. As far as I can see, you also wait until all 4 points are finished. So the next (but painful for the code logic) optimization step is to check whether one of the 4 points is finished and then insert the next point into that 4-point SSE mask. Of course you have to keep track of where you are in your line and whether it has ended... It was tough work in the Mandelbrot app, but it gained a lot, as you don't waste any iterations.
@Revolution: Hm, about the thread priority... as it's benchmarking, I would try to get all the frames you can get as long as the computer isn't crashing... I would still go for realtime priority... until now I haven't really heard of many problems with that. If idle were faster or the same, there would of course be no question what to do...
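The refill idea described above (replace a finished point with the next one from the line instead of waiting for all 4) can be sketched in scalar C, with an array of 4 "lanes" standing in for the 4 slots of an SSE register. This is only an illustration under assumptions: `julia_row_refill`, its parameters and the bailout value 4.0 are hypothetical and not taken from the JuliaSSE or Mandelbrot sources.

```c
#define LANES 4

/* Compute iteration counts for one line of points. A lane that finishes is
   refilled at once with the next unprocessed point, so no lane idles while
   the others are still iterating. */
void julia_row_refill(const double *xs, const double *ys, int count,
                      double cx, double cy, int max_iter, int *iters_out)
{
    double x[LANES] = {0}, y[LANES] = {0};
    int n[LANES] = {0};
    int slot[LANES];            /* index of the point in each lane, -1 = empty */
    int next = 0;               /* next point of the line not yet loaded */
    int active = 0;             /* number of lanes currently holding a point */

    for (int l = 0; l < LANES; ++l) {
        slot[l] = -1;
        if (next < count) {
            x[l] = xs[next]; y[l] = ys[next]; n[l] = 0;
            slot[l] = next++;
            ++active;
        }
    }

    while (active > 0) {
        for (int l = 0; l < LANES; ++l) {
            if (slot[l] < 0)
                continue;
            if (n[l] >= max_iter || x[l] * x[l] + y[l] * y[l] > 4.0) {
                iters_out[slot[l]] = n[l];      /* point finished: store it */
                if (next < count) {             /* refill from the line */
                    x[l] = xs[next]; y[l] = ys[next]; n[l] = 0;
                    slot[l] = next++;
                } else {                        /* line has ended */
                    slot[l] = -1;
                    --active;
                }
                continue;
            }
            double t = x[l] * x[l] - y[l] * y[l] + cx;
            y[l] = 2.0 * x[l] * y[l] + cy;
            x[l] = t;
            ++n[l];
        }
    }
}
```

The `slot` array is the bookkeeping referred to in the post: it records which point of the line each lane holds, and `next` records how far along the line the loop has got.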
revolution 27 Dec 2008, 16:50
I would only consider using realtime priority for short running tasks, that is, tasks that are expected to take a few milliseconds at most. I know of only two situations where realtime is appropriate. 1) For short running (<10ms) benchmarks (a user level program), and 2) A device driver that requires a fast response time from the CPU (a system level program). So if program 1 is realtime priority and also takes many seconds of contiguous CPU time while running, then program 2 may not be able to service the device properly. Things like keyboard, mouse, network, USB-drives etc. can become unresponsive while a realtime priority task is running.
Setting idle priority will make it slower, but not by much (I encourage you to measure the difference). But putting the other tasks out of action for 131 seconds is too long IMO. We should allow users to at least keep their background tasks running, to avoid the possibility of causing trouble. We can never be sure just what each user requires from their system, and we should not assume that they run (or don't run) the same things we do. I know that, so far, it doesn't really cause great trouble for most people here, but it is not really best practice either. It just seems to be a simple change to make for the sake of system stability.
Last edited by revolution on 27 Dec 2008, 18:31; edited 1 time in total
LocoDelAssembly 27 Dec 2008, 18:01
I would go for above-normal priority or perhaps high priority; those would contaminate the measurement very little and the system would still be responsive (the latter makes the system almost unresponsive, but with patience the task manager does appear after pressing ALT+CTRL+DEL, which is better than realtime priority, where the task manager never appears until the realtime-priority thread(s) end).
Additionally, realtime priority is unavailable to non-administrator users, so I think it would be better to stick to high priority (I don't remember whether the realtime priority setting is rejected or lowered to high priority, though).
revolution 27 Dec 2008, 20:30
LocoDelAssembly wrote: ... task manager after pressing ALT+CTRL+DEL ...
LocoDelAssembly 27 Dec 2008, 22:00
Quote:
Neither, and I suppose you have found out that I'm one of those stupid users who use accounts with administrator privileges on a daily basis
revolution 28 Dec 2008, 00:45
LocoDelAssembly: If CTRL+SHIFT+ESC doesn't bring up task manager then it seems you aren't running XP?
LocoDelAssembly 28 Dec 2008, 01:24
Yes it does, but not when the benchmark is running. I also tried your DropMyRights, but although the mouse pointer was responsive at all times (just moving slower than normal), the task manager wasn't usable because it was immediately hidden by the fractal (a consequence of rendering with DDraw without any clipping).
Alt+Ctrl+Del was able to show me the light-blue background after some seconds, but the dialog that has the button to fire up the task manager did not appear until the benchmark (running in the background) finished. (I have the welcome screen and fast user switching disabled.) I guess my doubt about how realtime priority is handled for a non-privileged user is gone now: when an app asks for realtime priority, Windows grants high priority instead of the requested level or a failure status.
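One way to check the observation above from inside a program is to read the priority class back after requesting it. The sketch below is C, not the benchmark's FASM source; `request_priority_class` and `priority_was_granted` are hypothetical helper names, while `SetPriorityClass`, `GetPriorityClass` and `GetCurrentProcess` are the real Windows calls and are only used in the Windows branch.

```c
#ifdef _WIN32
#include <windows.h>

/* Ask for a priority class, then return the class Windows actually applied.
   Returns 0 if either API call fails. */
unsigned long request_priority_class(unsigned long wanted)
{
    HANDLE self = GetCurrentProcess();
    if (!SetPriorityClass(self, wanted))
        return 0;
    return GetPriorityClass(self);
}
#else
/* Non-Windows build: nothing to query, report failure. */
unsigned long request_priority_class(unsigned long wanted)
{
    (void)wanted;
    return 0;
}
#endif

/* 1 if the class read back is exactly the one that was requested. */
int priority_was_granted(unsigned long wanted, unsigned long granted)
{
    return granted != 0 && granted == wanted;
}
```

With REALTIME_PRIORITY_CLASS (0x100) requested, a read-back of HIGH_PRIORITY_CLASS (0x80) would correspond to the silent downgrade described in the post, so a benchmark could print the granted class next to its result.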
revolution 28 Dec 2008, 03:14
LocoDelAssembly: Since you are an admin on your machine perhaps you can try this little test to see just how much of a performance difference idle/realtime will have.
Run the original file and log the result. Then edit the .asm file and replace the two constants as I mentioned above:
REALTIME_PRIORITY_CLASS ---> IDLE_PRIORITY_CLASS
THREAD_PRIORITY_TIME_CRITICAL ---> THREAD_PRIORITY_IDLE
(there is only one place for each constant). Put these two lines in the .inc file:
Code:
IDLE_PRIORITY_CLASS = 040h
THREAD_PRIORITY_IDLE = -15
What is the difference? Perhaps you can post it here.