flat assembler
Message board for the users of flat assembler.
Main > Performance difference Windows and Linux with FASM
revolution 13 Oct 2015, 15:15
I suspect that it is because of the unaligned variable "one". The ELF formats do not automatically align things as the PE format does.
Try inserting "align 16" or similar before the declaration of "one".
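A minimal sketch of that suggestion, assuming "one" is an SSE/AVX floating-point constant in the data section (its actual definition is not shown in the thread):

```asm
; hypothetical data section; "one" is assumed to be a packed-double constant
align 16              ; force 16-byte alignment (flat ELF output does not add it)
one dq 1.0, 1.0       ; placeholder definition; the real declaration may differ
```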
typedef 13 Oct 2015, 15:15
Check for differences in CPU, memory, background activity, etc.
coen 13 Oct 2015, 15:45
revolution wrote: I suspect that it is because of the unaligned variable "one". The ELF formats do not automatically align things as the PE format does.
I didn't know that FASM aligns automatically in PE format but not in ELF format, so thanks for that piece of useful information. However, I tried it and it didn't make a difference in this case. The 'one' variable is used only 30 times when running this code.
typedef wrote: Check difference in CPU, memory, background activities etc.
The server is a test server and doesn't have any other background processes running that could cause the difference in performance. Also, I ran the test several times and posted the average times here (I should have mentioned that in my opening post).
revolution 13 Oct 2015, 23:16
coen wrote: I didn't know that FASM aligns automatically in PE format but not in ELF format, so thanks for that piece of useful information.
revolution 13 Oct 2015, 23:20
As for your problem, there could be many reasons why you see a difference. Perhaps Linux is clocking the CPU differently because of a light task load or some power management setting.
coen 14 Oct 2015, 07:16
revolution wrote: As for your problem, there could be many reasons why you see a difference. Perhaps Linux is clocking the CPU differently because of a light task load or some power management setting.
I've double-checked this, but the CPU is an Intel Xeon that always runs at 2.3 GHz, so no throttling there; thanks for the suggestion though. I'm going to run some CPU benchmarks later using Geekbench to see whether it also shows a performance difference between the OSes. That might help me determine whether the difference is caused by my source code or not.
fasmnewbie 14 Oct 2015, 11:39
coen
For a 2-second difference (which is quite a lot), is there a chance that your code is actually involved in memory swapping or some other memory-related issue under Linux? AFAIK, nothing on the north bridge should produce a 2-second difference. It must have come from the south (e.g. the hard disk).
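A quick way to check the swapping hypothesis is to watch the swap columns while the benchmark runs; a sketch assuming `vmstat` (from procps) is available:

```shell
# Sample memory/swap counters once per second for a few seconds; non-zero
# values in the si/so (swap-in/swap-out) columns during the benchmark
# would indicate swapping.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 3
else
    echo "vmstat not installed"
fi
```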
coen 14 Oct 2015, 14:12
@fasmnewbie, I'm running the code as shown above, so there's no memory or I/O access going on at all, just CPU registers. I also think this big a performance difference shouldn't be possible for just CPU instructions, and yet it is.
randall 14 Oct 2015, 14:31
You can use likwid (https://github.com/RRZE-HPC/likwid) to find out what is going on on Linux.
JohnFound 14 Oct 2015, 14:52
Try playing a little with the CPU frequency governor. Set it to "performance" instead of "ondemand" and test again.
Code: sudo cpupower frequency-set -g performance
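To verify the governor actually changed, the active setting can be read back from sysfs; a sketch assuming the standard Linux cpufreq path (which is absent on some virtual machines):

```shell
# Print the governor currently active on CPU 0; after the cpupower command
# above this should read "performance".
gov=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$gov" ]; then
    cat "$gov"
else
    echo "cpufreq interface not available"
fi
```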
coen 14 Oct 2015, 15:06
@JohnFound, thanks for your input. I've tried it, but sadly no difference in the results.
coen 14 Oct 2015, 15:20
Well... I've run some Geekbench benchmarks to see if it also shows a difference in performance and it does.
Windows result: 3106
CentOS result: 2536
A difference of approximately 20%, the same as with my test code. I still have no idea how this is possible, but at least I know the problem is not in my code or in FASM.
l_inc 14 Oct 2015, 15:35
coen
This might be related to the lazy context-switching policy. An OS normally tends to avoid saving the full context of an old thread unless newer threads need to use the unsaved part. Depending on how many other threads in the system use AVX, performance might increase due to the avoided saving overhead, or degrade due to the excessive exception rate required to lazily save the context. AFAIR, Windows engages lazy switching for the first use of the coprocessor/MMX/SSE/AVX state and then uses non-lazy context saving when switching between threads that have been spotted using the corresponding extension. Maybe CentOS does something different. Or maybe CentOS is just slow at context switching in general. This can easily be checked by comparing the runtimes of simple empty loops.
_________________
Faith is a superposition of knowledge and fallacy
revolution 14 Oct 2015, 16:03
If it is related to context switching and/or exceptions/interrupts then you can try setting the affinity to core 2 or 3 or something. Most OSes use core 0 for the house-keeping tasks and the other cores won't be affected.
l_inc 14 Oct 2015, 16:13
revolution
Quote: If it is related to context switching and/or exceptions/interrupts then you can try setting the affinity to core 2 or 3 or something.
And the highest priority. I'm not sure what you mean by "house-keeping", but threads are normally assigned to cores in some form of round-robin, and interrupt delivery is also often distributed evenly among cores.
_________________
Faith is a superposition of knowledge and fallacy
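The affinity and priority suggestions could be combined in a single invocation; a sketch assuming the benchmark binary is called `./slowlin` (`taskset` is from util-linux, and a negative nice value requires root):

```shell
# Pin the benchmark to core 2 and raise its priority; run via sudo if a
# negative nice value is refused. "./slowlin" is an assumed binary name.
if [ -x ./slowlin ]; then
    taskset -c 2 nice -n -20 ./slowlin
else
    echo "slowlin binary not present; command shown for reference only"
fi
```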
coen 14 Oct 2015, 20:10
I tried running it with a higher priority using the 'nice' command, but that didn't make a difference either.
I don't know what to try anymore at this point. As a last resort I'm going to reinstall CentOS and hope that the problem will be gone with a fresh install.
ACP 14 Oct 2015, 22:22
coen wrote:
If you do that, you may lose the trail of the problem if it vanishes after the reinstall. At least make a backup before overwriting your CentOS installation. I am really curious what the source of the delay is. One more question that comes to mind: are you running a stock kernel?
Melissa 15 Oct 2015, 03:19
I don't have Windows but Linux version on my i7 4790 shows:
[bmaxa@maxa-pc assembler]$ time ./slowlin
real 0m5.966s
user 0m5.953s
sys 0m0.000s
The kernel is 4.3-rc5. Since the max turbo is 4.0 GHz, it is approximately the same as your Windows time at 2.3 GHz.
coen 15 Oct 2015, 12:40
ACP wrote:
I've reinstalled CentOS and the performance difference is gone! Sadly, I didn't see your post until it was too late, so I did not make a backup... I was running a stock kernel. I want to thank all of you for trying to find the source of this weird performance issue; unfortunately we'll never know what the real cause was, and for that I apologize.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.