flat assembler
Message board for the users of flat assembler.

Julia Set Example/Benchmark

bitRAKE



Joined: 21 Jul 2003
Posts: 3055
Location: vpcmipstrm
bitRAKE
GetProcessAffinityMask is the source of the count.
Post 22 Dec 2008, 16:27
Kuemmel



Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
Okay...I'll read about all these functions...if that's a reliable way, maybe it's an idea to make a small template application as a standard FASM example of how to correctly count cores/processors/HT.

As an Intel paper does it with CPUID (http://www.developers.net/intelmcshowcase/view/2093), I'm just not sure what's the perfect way to do it, but judging from the results the Julia app here achieves, it seems to work correctly :)

By the way...the loop unrolling doesn't help much, as the dependency chains are not broken. As Xorpd! said, you have to go for independent instruction blocks and also use several exits from the loop.

The first step would be to free some registers to do that, like using memory access for xmm7 (cy), xmm5 (fsq_max), xmm6 (cx), etc. and use them for another block of calculations. You might have a look at (http://www.mikusite.de/x86/KMB_V0.53H-32b-MT.zip).

Of course you'll end up with much more complicated code, but for the Mandelbrot, speed gains were more than 200% for Core2Duo :)

@EDIT: Detection doesn't seem to be that trivial...:
http://www.devx.com/go-parallel/Article/27398/2046
I'll try to compile the example and see what it does...


Last edited by Kuemmel on 22 Dec 2008, 23:21; edited 1 time in total
Post 22 Dec 2008, 19:33
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
If you already have an OS running then you should use the OS interface to count CPUs. This is because the OS can only run your code on CPUs it knows about.

If you are writing an OS then you should use CPUID etc. to count CPUs. This is because there is no OS there to tell you.

Example: Win98 only supports one CPU. If you used CPUID to count CPUs and started threads based upon that count then you would get nowhere, since Win98 will just run all threads on one CPU core and doesn't know how to run code on the other cores.
Post 22 Dec 2008, 23:11
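
A minimal Python sketch of the counting step discussed above. GetProcessAffinityMask reports a bitmask with one bit per processor the process is allowed to run on, so the usable count is simply the number of set bits. The function name `count_cpus` and the sample masks are hypothetical, and the mask is assumed to have been obtained from the OS already.

```python
def count_cpus(affinity_mask):
    """Return the number of set bits in an affinity bitmask.

    Assumption: affinity_mask is a non-negative integer with one bit per
    logical processor, as a process affinity mask would provide.
    """
    if affinity_mask < 0:
        raise ValueError("affinity mask must be non-negative")
    return bin(affinity_mask).count("1")


# Hypothetical masks, for illustration only.
print(count_cpus(0b11))    # both CPUs of a dual-core machine enabled
print(count_cpus(0xFF))    # all eight CPUs of an eight-way machine enabled
print(count_cpus(0b0101))  # four-CPU machine, process restricted to two CPUs
```

Starting one worker thread per set bit respects any restriction the OS or the user has placed on the process, which a raw CPUID-based count would not.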
kempis



Joined: 12 Jun 2008
Posts: 49
kempis
Kuemmel, sorry for the late answer to your question.
bitRAKE wrote:
GetProcessAffinityMask is the source of the count.

Yes, thanks.
Post 26 Dec 2008, 06:38
shoorick



Joined: 25 Feb 2005
Posts: 1608
Location: Ukraine
shoorick
Works correctly (Celeron 300 test :) )

[Attachment: cel300.png (1.83 KB)]

_________________
UNICODE forever!
Post 26 Dec 2008, 07:26
kempis



Joined: 12 Jun 2008
Posts: 49
kempis
Kuemmel wrote:
By the way...the loop unrolling doesn't help much, as the dependency chains are not broken. As Xorpd! said, you have to go for independent instruction blocks and also use several exits from the loop.
Great... thank you very much for your idea... :)
This is a modified version of JuliaSSE (JuliaSSE-1):

Quote:
Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
Number of processors: 2
Framerate: 81.817627 fps
The framerate increased by about 30%...

[Attachment: JuliaSSE-1.zip (7.05 KB), modification of JuliaSSE]

Post 26 Dec 2008, 23:39
kempis



Joined: 12 Jun 2008
Posts: 49
kempis
Quote:
Works correctly (Celeron 300 test :) )
Actually, the only SSE2 instruction used is psubd... I think it can be modified to be compatible with non-SSE2 processors.
Post 27 Dec 2008, 00:08
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
I think he means a Celeron of the Pentium II series, so you would need to go down to MMX (if you can really find it useful, but it is the best SIMD instruction set the P6 has).
Post 27 Dec 2008, 01:55
bitRAKE



Joined: 21 Jul 2003
Posts: 3055
Location: vpcmipstrm
bitRAKE
This newest version caused a system reset the first time I ran it (about halfway through). The system had been up for several weeks. :( The second time it ran without a problem (I think it's the video driver); must research further...
Code:
Julia SSE Benchmark
---------------------------
Intel(R) Xeon(R) CPU           L5410  @ 2.33GHz

Number of processors: 8

Framerate: 317.565918 fps    


Last edited by bitRAKE on 27 Dec 2008, 02:39; edited 1 time in total
Post 27 Dec 2008, 02:03
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Quote:

Second time it went without a problem - must research further...

But not the benchmark: user-mode code is not supposed to be able to cause a system reset, so you had either a BSOD immediately followed by a system reboot (this is the factory setting), or a hardware problem (or both).

Do you have minidumps enabled?
Post 27 Dec 2008, 02:23
bitRAKE



Joined: 21 Jul 2003
Posts: 3055
Location: vpcmipstrm
bitRAKE
No BSOD, and the dump didn't happen either (kernel memory dump enabled).
Post 27 Dec 2008, 02:42
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
I finally have some time to look at this. Some notes:

The first thing I did before running was to replace REALTIME_PRIORITY_CLASS with IDLE_PRIORITY_CLASS, and to replace THREAD_PRIORITY_TIME_CRITICAL with THREAD_PRIORITY_IDLE. What was the reason for all the high priority stuff? It creates the problems that the others have noted here: inability to exit, and all other processes being paused. Any program that has a large computation requirement should never be high priority; quite the opposite, it should be low priority to allow the other functions to happen. Putting it at realtime priority may allow the framerate to increase (very) slightly, but the cost of freezing all other tasks to get such a minor increase is too high IMO.

With those changes, and with the new constants added to the 'system.inc' file, it assembled.

I ran it on my Dell laptop.
Quote:
Intel(R) Pentium(R) M processor 1300MHz

Number of processors: 1

Framerate: 13.198853 fps
It has a very noticeable slowdown during the times when the screen is mostly black and speeds up again as more colours come into the picture.
Post 27 Dec 2008, 03:14
Kuemmel



Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany
Kuemmel
kempis wrote:
Great... thank you very much for your idea... :)
This is a modified version of JuliaSSE (JuliaSSE-1):
Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
Number of processors: 2
Framerate: 81.817627 fps
The framerate increased by about 30%...

Okay, I see that you interleave the instructions. Sometimes it can also be beneficial to keep the blocks together and just do the iteration-end check at the end.

As far as I can see, you also wait until all 4 points are finished. So the next (but painful for code logic ;) ) optimization step is to check if one of the 4 points is finished and then insert the next point into that 4-point SSE mask. Of course you have to keep track of where you are in your line and whether it has ended...it was tough work in the Mandelbrot app, but it gained a lot, of course, as you don't waste any iterations.

@Revolution: Hm, about the thread priority...as it's benchmarking, I would try to get all the frames you can get as long as the computer isn't crashing...I would still go for realtime priority...so far I haven't really heard of many problems with that. If it were faster or the same with idle, there would of course be no question what to do...
Post 27 Dec 2008, 16:15
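
The refill idea described above can be sketched in plain Python. This is not the Mandelbrot app's SSE code, only an illustration of the bookkeeping under stated assumptions: the four SSE lanes are modelled as list slots, and the names `julia_iterations`, `julia_row_refill`, the escape limit of 4.0 and the sample constant are all hypothetical.

```python
def julia_iterations(zx, zy, cx, cy, max_iter):
    """Scalar reference: iterate z = z*z + c until |z|^2 > 4 or max_iter."""
    n = 0
    while n < max_iter and zx * zx + zy * zy <= 4.0:
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        n += 1
    return n


def julia_row_refill(points, cx, cy, max_iter, lanes=4):
    """Iterate all points using `lanes` slots, refilling a slot the moment
    its point finishes instead of waiting for the other slots."""
    results = [0] * len(points)
    pending = iter(enumerate(points))  # tracks where we are in the line

    def load():
        # Next unprocessed point as [index, zx, zy, count], or None at line end.
        item = next(pending, None)
        if item is None:
            return None
        index, (x, y) = item
        return [index, x, y, 0]

    slots = [load() for _ in range(lanes)]
    while any(slot is not None for slot in slots):
        for i, slot in enumerate(slots):
            if slot is None:
                continue
            index, zx, zy, n = slot
            if n >= max_iter or zx * zx + zy * zy > 4.0:
                results[index] = n   # this point is finished
                slots[i] = load()    # refill only this lane
            else:
                slot[1] = zx * zx - zy * zy + cx
                slot[2] = 2.0 * zx * zy + cy
                slot[3] = n + 1
    return results


# Hypothetical line of 31 points and a sample Julia constant.
row = [(-1.5 + 0.1 * i, 0.3) for i in range(31)]
print(julia_row_refill(row, -0.8, 0.156, 50))
```

Every pass of the `while` loop corresponds to one packed iteration step. Because a finished lane is reloaded immediately, no lane sits idle while a slow neighbour runs to the iteration limit, which is where the gain over waiting for all 4 points comes from.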
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
I would only consider using realtime priority for short running tasks, that is, tasks that are expected to take a few milliseconds at most. I know of only two situations where realtime is appropriate. 1) For short running (<10ms) benchmarks (a user level program), and 2) A device driver that requires a fast response time from the CPU (a system level program). So if program 1 is realtime priority and also takes many seconds of contiguous CPU time while running, then program 2 may not be able to service the device properly. Things like keyboard, mouse, network, USB-drives etc. can become unresponsive while a realtime priority task is running.

Setting idle priority will make it slower, but not by much (I encourage you to measure the difference). But putting the other tasks out of action for 131 seconds is too long IMO. We should allow users to at least keep their background tasks running to avoid the possibility of causing trouble. We can never be sure just what each user requires from their system and should not assume that they run (or don't run) the same things we do (or don't).

I know that, so far, for most here it doesn't really cause great trouble, but it is not really the best practice either. Just seems to be a simple change to make to keep system stability.


Last edited by revolution on 27 Dec 2008, 18:31; edited 1 time in total
Post 27 Dec 2008, 16:50
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
I would go for above normal priority or perhaps high priority. Those would contaminate the measurement only slightly and the system would still be responsive (the latter makes the system almost unresponsive, but with patience the task manager appears after pressing ALT+CTRL+DEL, which is better than real time priority, where the task manager never appears until the real time priority thread(s) end).

Additionally, real time priority is unavailable for non-administrator users so I think it would be better to stick to high priority (I don't remember if the real time priority setting is rejected or lowered to high priority, though).
Post 27 Dec 2008, 18:01
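
To put the priority levels under discussion side by side, here is a small Python sketch that maps a priority class plus a thread priority to the scheduler's base priority (1 to 31). The constant values and class base levels are those of the Win32 headers and the documented scheduling-priority table as far as I know them; the helper `base_priority` is hypothetical and only handles the thread priorities listed.

```python
# Win32 priority-class flags (values as in the Windows headers).
IDLE_PRIORITY_CLASS = 0x40
NORMAL_PRIORITY_CLASS = 0x20
ABOVE_NORMAL_PRIORITY_CLASS = 0x8000
HIGH_PRIORITY_CLASS = 0x80
REALTIME_PRIORITY_CLASS = 0x100

# Thread priority levels relative to the class.
THREAD_PRIORITY_IDLE = -15
THREAD_PRIORITY_NORMAL = 0
THREAD_PRIORITY_TIME_CRITICAL = 15

# Base level of each class when the thread priority is "normal".
CLASS_BASE = {
    IDLE_PRIORITY_CLASS: 4,
    NORMAL_PRIORITY_CLASS: 8,
    ABOVE_NORMAL_PRIORITY_CLASS: 10,
    HIGH_PRIORITY_CLASS: 13,
    REALTIME_PRIORITY_CLASS: 24,
}


def base_priority(priority_class, thread_priority):
    """Hypothetical helper: scheduler base priority for a class/thread pair.

    Handles THREAD_PRIORITY_IDLE, THREAD_PRIORITY_TIME_CRITICAL and the
    relative levels -2..2; other values are outside this sketch.
    """
    realtime = priority_class == REALTIME_PRIORITY_CLASS
    if thread_priority == THREAD_PRIORITY_IDLE:
        return 16 if realtime else 1
    if thread_priority == THREAD_PRIORITY_TIME_CRITICAL:
        return 31 if realtime else 15
    if not -2 <= thread_priority <= 2:
        raise ValueError("thread priority not covered by this sketch")
    return CLASS_BASE[priority_class] + thread_priority


# The benchmark's original setting versus the alternatives suggested here.
print(base_priority(REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL))
print(base_priority(HIGH_PRIORITY_CLASS, THREAD_PRIORITY_NORMAL))
print(base_priority(ABOVE_NORMAL_PRIORITY_CLASS, THREAD_PRIORITY_NORMAL))
print(base_priority(IDLE_PRIORITY_CLASS, THREAD_PRIORITY_IDLE))
```

The realtime, time-critical combination lands at the very top of the range, above everything else running on the machine, while the idle combination lands at the very bottom. The above-normal and high classes sit in between.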
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
LocoDelAssembly wrote:
... task manager after pressing ALT+CTRL+DEL ...
CTRL+SHIFT+ESC doesn't work for you then?
Post 27 Dec 2008, 20:30
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Quote:

CTRL+SHIFT+ESC doesn't work for you then?

Neither, and I suppose you have figured out that I'm one of those stupid users who use accounts with administrator privileges on a daily basis :)
Post 27 Dec 2008, 22:00
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
LocoDelAssembly: If CTRL+SHIFT+ESC doesn't bring up task manager then it seems you aren't running XP?
Post 28 Dec 2008, 00:45
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Yes it does, but not when the benchmark is running. I also tried your DropMyRights, but although the mouse pointer was responsive at all times (though moving slower than normal), the task manager wasn't usable because it was immediately hidden by the fractal (a consequence of rendering with DDraw without any clipping).

Alt+Ctrl+Del was able to show me the light-blue background after some seconds, but the dialog that has the button to fire up the task manager was not able to appear until the benchmark (running in the background) finished. (I have the welcome screen and fast user switching disabled.)

I guess that my doubt about how real time priority is handled for non-privileged users is gone now. When an app asks for real time priority, Windows grants high priority rather than granting the requested level or returning a failure status.
Post 28 Dec 2008, 01:24
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
LocoDelAssembly: Since you are an admin on your machine perhaps you can try this little test to see just how much of a performance difference idle/realtime will have.

Run the original file and log the result.

Edit the .asm file and replace the two constants as I mentioned above:
REALTIME_PRIORITY_CLASS ---> IDLE_PRIORITY_CLASS
THREAD_PRIORITY_TIME_CRITICAL ---> THREAD_PRIORITY_IDLE
(there is only one place for each constant).
Put these two lines in the .inc file:
Code:
IDLE_PRIORITY_CLASS    = 040h
THREAD_PRIORITY_IDLE  = -15    
Log the result.

What is the difference? Perhaps you can post it here.
Post 28 Dec 2008, 03:14
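
For anyone running the comparison described above, a tiny Python helper for reporting the difference between the two logged framerates. The function name `percent_change` is hypothetical and the two values below are placeholders, not measurements from this thread.

```python
def percent_change(baseline_fps, new_fps):
    """Relative change of new_fps against baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline framerate must be positive")
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Placeholder values only: substitute the two framerates you logged.
realtime_fps = 100.0
idle_fps = 98.0
print(f"idle vs. realtime: {percent_change(realtime_fps, idle_fps):+.2f}%")
```

Averaging several runs of each build before comparing is advisable, since run-to-run variation can be as large as the effect being measured.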
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.