flat assembler
Message board for the users of flat assembler.

Differences in compile time for different code

system error



Joined: 01 Sep 2013
Posts: 670
system error 12 Dec 2016, 23:12
I've been wondering about this for quite some time but only now remembered to ask. I've noticed a significant compile-time difference for the same code looped 10 million times: one version with a complete prologue and epilogue (shorter time, normal) and one without a proper stack frame setup (longer compile time). Sample code:

Code:
align 32
StringCopy:
        ;enter   0,0
        push    rsi rdi rcx rdx rbx
        mov     rsi,rbx
        mov     rdi,rax
        mov     rdx,rcx
        mov     rbx,rcx
        shr     rbx,3
        mov     rcx,rbx
        rep     movsq
        shl     rbx,3
        nop
        sub     rdx,rbx
        mov     rcx,rdx
        rep     movsb
        pop     rbx rdx rcx rdi rsi
        ;leave
        ret    


As posted, this code compiles slower (3.4 sec), but it behaves OK (0.1 sec) if I uncomment the prologue/epilogue lines. This isn't the only code showing this anomaly; other routines behave the same way, particularly when millions of loop iterations are involved. What could be the reason?
system error 12 Dec 2016, 23:20
I don't think it's the code, though. Could it be something in the compiler's optimization?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20358
Location: In your JS exploiting you and your system
revolution 13 Dec 2016, 00:20
I would suspect alignment. Something is probably crossing a cache boundary. Shifting the output code up or down by a few bytes does affect the compile time.
system error 13 Dec 2016, 00:43
Hmm, this is just a small piece of code. Cache won't be that big of an issue, IMO. Could it be that FASM has a special 'arrangement' for a framed stack vs. a naked stack?
revolution 13 Dec 2016, 01:21
system error wrote:
Hmm, this is just a small piece of code.
Not relevant when you do something 10 million times.
system error wrote:
Cache won't be that big of an issue, IMO.
Did you try my suggestion of shifting the output a few bytes up/down? Prefix a nop to the top of the code or something and see how compile times change.
system error wrote:
Could it be that FASM has a special 'arrangement' for a framed stack vs. a naked stack?
No.
system error 13 Dec 2016, 01:36
Doesn't work either. I think with align 32, there should be enough nops up there. Injecting another one will induce an odd address for cache pickup. It must be something else.
revolution 13 Dec 2016, 01:45
Don't use align; just put one, two, or three nops as the first instruction(s) to show the effect.

BTW: If you post your entire test code with whatever 10 million looping setup you have then others can test it also.
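
A minimal sketch of that experiment, assuming the StringCopy routine from the first post. The `use64` line and the flat binary output are assumptions made so the fragment assembles on its own; the poster's actual test file and its 10 million loop setup are not shown in the thread.

```asm
use64                           ; assumption: 64-bit code, default flat binary output

; No "align 32" here, so each nop below shifts the routine by exactly one byte.
        nop                     ; try one, two, or three nops and compare the reported times
StringCopy:
        ;enter   0,0
        push    rsi rdi rcx rdx rbx
        mov     rsi,rbx
        mov     rdi,rax
        mov     rdx,rcx
        mov     rbx,rcx
        shr     rbx,3
        mov     rcx,rbx
        rep     movsq
        shl     rbx,3
        nop
        sub     rdx,rbx
        mov     rcx,rdx
        rep     movsb
        pop     rbx rdx rcx rdi rsi
        ;leave
        ret
```

Assemble it once per nop count and note the time fasm prints for each run.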
system error 13 Dec 2016, 01:59
Now it compiles in 4.5 secs. ho ho ho. This is interesting! I am moving to PC now to see how it behaves.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1635
Location: Toronto, Canada
AsmGuru62 13 Dec 2016, 21:00
I had a project once: 5 MB of code files (~50 files), compiled in ~1 sec.
You're saying the code you posted compiles in 4.5 sec?!
I must be missing something.
system error 14 Dec 2016, 09:16
gee i don't know AsmGuru62... maybe you're missing your brain, perhaps?

Here's the code to test. Test the compile time for both cases:

a. With enter/leave disabled
b. With enter/leave enabled

Your times may differ, but the compile-time anomaly still persists.


Attachment: testit.asm (969 bytes)

fragment



Joined: 11 Jan 2017
Posts: 3
Location: Berlin
fragment 14 Jan 2017, 18:28
Test results on my laptop (Celeron N2830/Silvermont):

fasm (version: 1.71.57) compile time:
enter/leave on = 0.1 - 0.2 seconds
enter/leave off = 0.7 - 0.8 seconds

PS: So your trend continues: the more code, the faster. Looks like a lot of extra work in the future ;)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8356
Location: Kraków, Poland
Tomasz Grysztar 14 Jan 2017, 18:45
Do you have any antivirus software monitoring your file accesses? Perhaps it detects something suspicious in one of the two variants and interferes with fasm writing the executable file. On my machine I see no difference in compile time between the ENTER/LEAVE variant and the one without it.
system error 15 Jan 2017, 22:35
Tomasz, it could have something to do with your table entries, especially those ending with 0. "enter" is one of them. I don't know exactly what the problem is because I don't really understand your table, but it could be the suspect.
Tomasz Grysztar 16 Jan 2017, 12:51
There is nothing special about these entries: 0 is the value of the parameter passed to the instruction handler. For the "enter" handler this parameter is not used at all and could be any other value.

The behavior that you describe is very irregular; currently I have no way of reproducing it on my own.
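
To illustrate what "entries ending with 0" refers to, here is a simplified, hypothetical sketch of a mnemonic table in which each entry holds the mnemonic text, one parameter byte, and an offset to a handler. The labels, the layout, and the handler bodies are assumptions made for illustration only; they are not copied from the fasm source.

```asm
use64                           ; assumption: only needed so the sketch assembles standalone

; Hypothetical layout, not the actual fasm tables.
instruction_handler:            ; base address that the handler offsets are relative to

add_handler:                    ; a handler that would make use of its parameter byte
        ret
enter_handler:                  ; a handler that ignores its parameter byte
        ret

mnemonic_table:
        db 'add',00h            ; mnemonic text followed by the parameter byte
        dw add_handler-instruction_handler
        db 'enter',0            ; parameter is 0, but it is unused, so any value would do
        dw enter_handler-instruction_handler
```

In a layout like this, the trailing 0 is just data handed to the handler, so it would not by itself make one entry faster or slower to process than another.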
system error 17 Jan 2017, 10:44
Ok.

But it's weird, though. I tested it on Linux and on other PCs, and the anomaly is still there. The compile time returns to normal only after I replace the instructions with, or inject, any entries ending with 0 (e.g., xchg, bswap, enter).

But it's probably nothing.
revolution 18 Jan 2017, 09:33
system error wrote:
Your times may differ, but the compile-time anomaly still persists.
Here are my results:
Code:
C:\Documents and Settings\We are the Borg\Our Documents>fasm testit1.asm
flat assembler  version 1.71.58  (3145344 kilobytes memory)
3 passes, 0.1 seconds, 2048 bytes.

C:\Documents and Settings\We are the Borg\Our Documents>fasm testit2.asm
flat assembler  version 1.71.58  (3145344 kilobytes memory)
3 passes, 0.1 seconds, 2048 bytes.

C:\Documents and Settings\We are the Borg\Our Documents>fc testit1.asm testit2.asm
Comparing files testit1.asm and testit2.asm
***** testit1.asm
StringCopy:
        ;enter   0,0
        push    rsi rdi rcx rdx rbx
***** testit2.asm
StringCopy:
        enter    0,0
        push    rsi rdi rcx rdx rbx
*****

***** testit1.asm
        pop     rbx rdx rcx rdi rsi
        ;leave
        ret
***** testit2.asm
        pop     rbx rdx rcx rdi rsi
        leave
        ret
*****    
No difference detected. It must be something on your system.
redsock



Joined: 09 Oct 2009
Posts: 430
Location: Australia
redsock 18 Jan 2017, 19:58
Isn't it supposed to be "We are Borg"? :)
revolution 19 Jan 2017, 02:16
redsock wrote:
Isn't it supposed to be "We are Borg"? :)
Resistance is futile.
system error 19 Jan 2017, 17:20
I don't know revo. This is the result on both 'my systems'. It seems consistent with fragment's finding.

Code:
D:\FASMW57>fasm testit.asm
flat assembler  version 1.71.57  (863059 kilobytes memory)
3 passes, 0.1 seconds, 2048 bytes. ;ENTER/LEAVE enabled

D:\FASMW57>fasm testit.asm
flat assembler  version 1.71.57  (863943 kilobytes memory)
3 passes, 0.7 seconds, 2048 bytes.

D:\FASMW57>fasm testit.asm
flat assembler  version 1.71.57  (861630 kilobytes memory)
3 passes, 0.8 seconds, 2048 bytes.

D:\FASMW57>fasm testit.asm
flat assembler  version 1.71.57  (861117 kilobytes memory)
3 passes, 0.8 seconds, 2048 bytes.

D:\FASMW57>fasm testit.asm
flat assembler  version 1.71.57  (861255 kilobytes memory)
3 passes, 0.7 seconds, 2048 bytes.    
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 19 Jan 2017, 17:42
On Linux, on a slower CPU and with Fresh IDE (FASM 1.71.58), I didn't find any measurable difference. For both versions:

Preprocessing time: 150..170ms
Parsing: 40ms
Assembling: 9ms
Formatting: 50ms