flat assembler
Message board for the users of flat assembler.

Julia Set Example/Benchmark

bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
FWIW, I lost about 2 fps.

kempis, I tried to change the resolution and missed something. Is there a gotcha somewhere? Changed sse_y_scale and sse_x_step as well as a number of constants throughout the code. Then there is a "shl esi,12" for the line width. What did I miss?
Post 28 Dec 2008, 03:32
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17279
Location: In your JS exploiting you and your system
Thanks bitRAKE.
bitRAKE wrote:
Framerate: 317.565918 fps
bitRAKE wrote:
I lost about 2 fps
So ~315.565918/317.565918 = 99.37%. Or ~0.63% drop in performance.
Post 28 Dec 2008, 03:43
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
OK, this was the change:
Code:
include "system.inc"
IDLE_PRIORITY_CLASS     = 040h
THREAD_PRIORITY_IDLE    = -15

REALTIME_PRIORITY_CLASS       = IDLE_PRIORITY_CLASS
THREAD_PRIORITY_TIME_CRITICAL = THREAD_PRIORITY_IDLE
    


Result:
Quote:
---------------------------
Julia SSE Benchmark
---------------------------
AMD Athlon(tm) 64 Processor 3200+

Number of processors: 1

Framerate: 23.498535 fps


Removing everything below 'include "system.inc"':
Quote:
---------------------------
Julia SSE Benchmark
---------------------------
AMD Athlon(tm) 64 Processor 3200+

Number of processors: 1

Framerate: 24.436951 fps


My drop was around 4%. However, the fact that Firefox was open during the test may be mostly responsible here (I see FF using 11% CPU from time to time even when I'm doing nothing).
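
The percentages quoted in this thread can be rechecked with a few lines of Python. This is only a sketch: the helper name `percent_drop` is made up for illustration, and the inputs are the framerates posted above.

```python
def percent_drop(before_fps, after_fps):
    """Return the relative performance drop, in percent, going from before_fps to after_fps."""
    return (1.0 - after_fps / before_fps) * 100.0


# bitRAKE's figures as used by revolution (317.565918 fps minus about 2 fps).
print(round(percent_drop(317.565918, 315.565918), 2))
# The two Athlon 64 runs above (without and with the priority override).
print(round(percent_drop(24.436951, 23.498535), 2))
```

Both results agree with the rounded figures given in the posts (about 0.63% and about 4%).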

I think the benchmark should run at high priority, BUT it should provide a means to kill it easily, perhaps by creating an extra thread to respond to input (which should not interfere much, since it should be idle most of the time and in some situations perhaps all the time).
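
A rough sketch of that structure, in Python rather than fasm: worker threads spin until an input-reader thread, which blocks most of the time, tells them to stop. The names `run_until_input`, `worker` and `input_reader` are hypothetical, and a `queue.Queue` stands in for keyboard input. Python's `threading` module has no priority control, so this shows only the shape of the idea, not the scheduling behaviour discussed here.

```python
import queue
import threading


def run_until_input(worker_count, inputs):
    """Run busy workers until something arrives on `inputs`; return per-worker loop counts."""
    stop = threading.Event()
    counts = [0] * worker_count

    def worker(index):
        # Stand-in for one computation thread rendering frames.
        while not stop.is_set():
            counts[index] += 1

    def input_reader():
        # Blocks (stays idle) until input arrives, then asks the workers to quit.
        inputs.get()
        stop.set()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    threads.append(threading.Thread(target=input_reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    q = queue.Queue()
    # Simulate a key press arriving 50 ms after the benchmark starts.
    threading.Timer(0.05, q.put, args=("quit",)).start()
    print(run_until_input(2, q))
```

In a real Windows build the reader would wait on a message or console event instead of a queue, and its priority relative to the workers is exactly the open question in this thread.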
Post 28 Dec 2008, 04:05
revolution
... and that extra input-reading thread needs to run at a higher priority than the computation threads.
Post 28 Dec 2008, 04:12
LocoDelAssembly
Not really necessary. If all of them run at equal priority, the input will be processed with a latency that is imperceptible to us, so I see no need to shorten it further. Remember that the input reader thread will be competing with only one computation thread per core, so it has a good chance of being scheduled very soon (and several times a second if needed) once the input arrives.

Also, if no affinity mask is set for the input reader thread, it has an advantage over the computation threads because it can be dispatched on whichever core is available, while its competitors (the computation threads, not the system-wide threads) have to wait for a specific core.
Post 28 Dec 2008, 05:14
revolution
That assumes a lot. Would it not be easier to simply set the priority one notch higher and not have to concern ourselves with how the underlying OS will schedule it?
Post 28 Dec 2008, 05:22
LocoDelAssembly
The problem is that there is nothing higher than high priority for non-privileged users, so you would have to lower the workers to "above normal" priority, which is more prone to performance drops than high priority.
Post 28 Dec 2008, 05:31
revolution
Each thread has a finer control than just the process priority.

But, I suggest to you that "above normal" would not suffer one iota of performance loss when compared to "high priority". Unless you specifically have another task running that also uses high-priority (unlikely though).
Post 28 Dec 2008, 05:35
bitRAKE
No one besides me micro-manages their processes, lol?
Post 28 Dec 2008, 06:00
revolution
Since it is being discussed here, below are all the Windows priority assignments. The first column (WP) is the internal Windows priority. You can see that some internal priorities can be obtained in more than one way, and also that some priorities are affected by whether the task has focus or not (foreground/background).
Code:
WP|F/B Process |Process Class                  |Thread Priority
--+------------+-------------------------------+-----------------------------
01              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_IDLE
01              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_IDLE
01              NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_IDLE
01              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_IDLE
01              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_IDLE
02              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_LOWEST
03              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_BELOW_NORMAL
04              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_NORMAL
04              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_LOWEST
05              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_ABOVE_NORMAL
05              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_BELOW_NORMAL
05 Background   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_LOWEST
06              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_HIGHEST
06              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_NORMAL
06 Background   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_BELOW_NORMAL
07              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_ABOVE_NORMAL
07 Background   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_NORMAL
07 Foreground   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_LOWEST
08              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_HIGHEST
08 Background   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_ABOVE_NORMAL
08 Foreground   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_BELOW_NORMAL
08              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_LOWEST
09 Background   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_HIGHEST
09 Foreground   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_NORMAL
09              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_BELOW_NORMAL
10 Foreground   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_ABOVE_NORMAL
10              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_NORMAL
11 Foreground   NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_HIGHEST
11              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_ABOVE_NORMAL
11              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_LOWEST
12              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_HIGHEST
12              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_BELOW_NORMAL
13              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_NORMAL
14              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_ABOVE_NORMAL
15              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_HIGHEST
15              HIGH_PRIORITY_CLASS             THREAD_PRIORITY_TIME_CRITICAL
15              IDLE_PRIORITY_CLASS             THREAD_PRIORITY_TIME_CRITICAL
15              BELOW_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_TIME_CRITICAL
15              NORMAL_PRIORITY_CLASS           THREAD_PRIORITY_TIME_CRITICAL
15              ABOVE_NORMAL_PRIORITY_CLASS     THREAD_PRIORITY_TIME_CRITICAL
16              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_IDLE
17              REALTIME_PRIORITY_CLASS         -7
18              REALTIME_PRIORITY_CLASS         -6
19              REALTIME_PRIORITY_CLASS         -5
20              REALTIME_PRIORITY_CLASS         -4
21              REALTIME_PRIORITY_CLASS         -3
22              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_LOWEST
23              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_BELOW_NORMAL
24              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_NORMAL
25              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_ABOVE_NORMAL
26              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_HIGHEST
27              REALTIME_PRIORITY_CLASS         3
28              REALTIME_PRIORITY_CLASS         4
29              REALTIME_PRIORITY_CLASS         5
30              REALTIME_PRIORITY_CLASS         6
31              REALTIME_PRIORITY_CLASS         THREAD_PRIORITY_TIME_CRITICAL
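
The table follows a simple pattern, which the Python sketch below tries to capture: each process class has a base value, the thread priority is added as an offset, and the IDLE and TIME_CRITICAL thread levels pin the result to fixed values. The function name `windows_priority` and the `CLASS_BASE` dictionary are illustrative only, and the base values are read off the table rather than taken from any API. The thread offsets assume the usual numeric values (LOWEST = -2 up to HIGHEST = 2, IDLE = -15, TIME_CRITICAL = 15).

```python
THREAD_PRIORITY_IDLE = -15
THREAD_PRIORITY_TIME_CRITICAL = 15

# WP value for THREAD_PRIORITY_NORMAL in each class, read off the table.
CLASS_BASE = {
    "IDLE_PRIORITY_CLASS": 4,
    "BELOW_NORMAL_PRIORITY_CLASS": 6,
    "NORMAL_PRIORITY_CLASS": 7,          # background; foreground gets +2
    "ABOVE_NORMAL_PRIORITY_CLASS": 10,
    "HIGH_PRIORITY_CLASS": 13,
    "REALTIME_PRIORITY_CLASS": 24,
}


def windows_priority(process_class, thread_priority=0, foreground=False):
    """Return the internal priority (WP column) for a class/thread-level pair."""
    realtime = process_class == "REALTIME_PRIORITY_CLASS"
    if thread_priority == THREAD_PRIORITY_IDLE:
        return 16 if realtime else 1
    if thread_priority == THREAD_PRIORITY_TIME_CRITICAL:
        return 31 if realtime else 15
    low, high = (-7, 6) if realtime else (-2, 2)
    if not low <= thread_priority <= high:
        raise ValueError("thread priority outside the range listed in the table")
    base = CLASS_BASE[process_class]
    if process_class == "NORMAL_PRIORITY_CLASS" and foreground:
        base += 2
    return base + thread_priority


print(windows_priority("HIGH_PRIORITY_CLASS"))          # workers at "high"
print(windows_priority("ABOVE_NORMAL_PRIORITY_CLASS"))  # workers at "above normal"
```

For the debate above, this puts normal-level threads at 13 in the high class and at 10 in the above-normal class, both above a foreground normal-class thread at 9.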
Post 28 Dec 2008, 09:59
bitRAKE
Turning off all the power saving features resulted in the program displaying about half the FPS - I don't believe the results are accurate because the perceived execution time was the same and other benchmarks were not affected.
Post 29 Dec 2008, 03:59
revolution
The execution time is fixed by the LOG2_TICK_MAX parameter. 2^17 = 131072ms.
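
As a quick check of that arithmetic, here is a small Python sketch. It assumes the reported framerate is simply frames rendered divided by the fixed run time. The function name `framerate` is hypothetical, and the frame count 41624 is not taken from the program: it was chosen because it reproduces the 317.565918 fps figure posted earlier in the thread.

```python
LOG2_TICK_MAX = 17                  # value quoted in the post above
TICK_MAX_MS = 1 << LOG2_TICK_MAX    # 2^17 = 131072 ms, roughly 131 seconds


def framerate(frames_rendered, elapsed_ms=TICK_MAX_MS):
    """Frames per second over a run whose length is given in milliseconds."""
    return frames_rendered * 1000.0 / elapsed_ms


print(TICK_MAX_MS)
print(round(framerate(41624), 6))
```

If that assumption holds, a run always lasts the same wall-clock time and only the frame count varies, which would fit the observation that the perceived execution time did not change.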
Post 29 Dec 2008, 04:18
bitRAKE
Why would it be slower with all the cores fixed at 2.33 GHz rather than powered down? Could the C1E Halt have that big an impact? I don't see how Enhanced Intel SpeedStep Technology (EIST) could have an impact.

[edit] It seems to be the GV1/3 EIST setting that affects it.

Code:
C1E     EIST    Prefetch

on      off     off             175.811768 fps
on      C       off             175.964355 fps
off     GV1/3   off             316.040039 fps
(tired of rebooting for now, lol)


Last edited by bitRAKE on 29 Dec 2008, 05:45; edited 1 time in total
Post 29 Dec 2008, 05:00
revolution
Maybe the video memory bus clock is slower? Maybe the main RAM bus clock is slower? Maybe the CPU is thermally throttling?
Post 29 Dec 2008, 05:39
bitRAKE
The FPS are higher when the throttling options are enabled - that is why it doesn't make sense. Other CPU/RAM speed tests are the same.
Post 29 Dec 2008, 05:47
revolution
You can't disable the thermal throttling; it is an automatic function of the CPU to prevent damage. By enabling the power saving features, perhaps the CPU runs cooler and thus does not need to modulate the clock to keep cool. Anyhow, it is just a thought; it might be something completely different, of course.
Post 29 Dec 2008, 06:13
kempis



Joined: 12 Jun 2008
Posts: 49
Kuemmel wrote:
Okay, I see that you interleave the instructions. Sometimes it can also be beneficial to keep the blocks united and just do the iteration-end check at the end.
Here is the new version. It gives better results. :)


Filename: JuliaSSE-1.1.zip
Filesize: 7 KB

Post 29 Dec 2008, 10:04
kempis
Quote:
---------------------------
Julia SSE Benchmark
---------------------------
Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
Number of processors: 2
Framerate: 88.294983 fps
Post 29 Dec 2008, 10:09
revolution
Quote:
---------------------------
Julia SSE Benchmark
---------------------------
Intel(R) Pentium(R) M processor 1300MHz

Number of processors: 1

Framerate: 15.014648 fps
Post 29 Dec 2008, 10:32
kempis
bitRAKE wrote:
kempis, I tried to change the resolution and missed something. Is there a gotcha somewhere? Changed sse_y_scale and sse_x_step as well as a number of constants throughout the code. Then there is a "shl esi,12" for the line width. What did I miss?


I'm sorry :( ...
It's difficult to change the resolution.
If you want to do it, you need to change the code in several places (the iteration body, the initialization, the data definitions, etc.).

But maybe this can help:
sse_y_scale: 3/1024 = 3/SCREEN_WIDTH
sse_x_step: 8*sse_y_scale
sse_x_left: -1.5, -1.5+sse_y_scale, ...
You need to change the code in DDraw->SetDisplayMode.
You need to change the code "cmp edi,1024" (above .draw_line).
You need to check the line size.
You need to change "shl esi,12" (esi = esi * line size); this code assumes that line size = 4*SCREEN_WIDTH.
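
The relationships in the list above can be tabulated for a given width with a short Python sketch. The function name `resolution_constants` and the dictionary keys are made up for illustration. The 4 bytes per pixel comes from the "line size = 4*SCREEN_WIDTH" remark, and the real line size reported by DirectDraw may differ, which is why it needs checking. sse_x_left is left out because its full lane list is not given here.

```python
def resolution_constants(screen_width, bytes_per_pixel=4):
    """Derive the width-dependent constants listed above for one screen width."""
    line_size = bytes_per_pixel * screen_width
    if line_size & (line_size - 1):
        # "shl esi,N" only works when the line size is a power of two.
        raise ValueError("line size is not a power of two; the shift must become a multiply")
    sse_y_scale = 3.0 / screen_width
    return {
        "sse_y_scale": sse_y_scale,           # 3/SCREEN_WIDTH
        "sse_x_step": 8 * sse_y_scale,        # 8*sse_y_scale
        "cmp_edi": screen_width,              # operand of "cmp edi,1024"
        "line_size": line_size,               # assumed 4*SCREEN_WIDTH
        "shl_esi": line_size.bit_length() - 1,  # shift count in "shl esi,12"
    }


print(resolution_constants(1024))
```

For the original width of 1024 this gives a shift count of 12, matching the existing code. A width such as 800 raises the error, since 3200 bytes per line cannot be expressed as a shift.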


BTW: when you test the code, change "cmp eax,TICK_MAX" (at .new_frame) to "cmp eax,xxx".
Your execution time is then xxx milliseconds.
Post 29 Dec 2008, 10:42
Copyright © 1999-2020, Tomasz Grysztar.
