flat assembler
Message board for the users of flat assembler.

Why LOCK can make it faster?

l4m2 21 Sep 2023, 19:51
Code:
format ELF64 executable 3

op equ add
; Uncomment the next line to emit a CS segment prefix (0x2e) in place of each LOCK:
; lock fix rept 1 { db 0x2e }
_start:
        mov eax, 10000000
a:      lock op dword [msg], 1
        lock op dword [msg], 2
        lock op dword [msg], 3
        lock op dword [msg], 4
        lock op dword [msg], 5
        lock op dword [msg], 1
        lock op dword [msg], 2
        lock op dword [msg], 3
        lock op dword [msg], 4
        lock op dword [msg], 5
        dec eax
        jnz a
        mov eax, 1
        mov ebx, 0
        int 0x80
align 4
msg dd 0    


With LOCK it's 2.106s
Without lock it's 2.970s


revolution 22 Sep 2023, 02:00
They both perform appallingly badly. Using lock makes it slightly less appalling, but still far short of anything near optimal.

You are writing to your active code cache line. The CPU has to assume you are doing self-modifying code (SMC) and forces a cache reload and/or pipeline flush on every cycle.
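One way to sanity-check the cache-line claim is to divide the addresses by the line size and compare the results. The sketch below is plain shell arithmetic. It assumes 64-byte cache lines, a load base of 0x400000 with 0x78 bytes of ELF headers, and 8-byte encodings for each prefixed add; under those assumptions the loop in the first listing ends at 0x4000D0 and msg sits at 0x4000E0. These addresses were worked out by hand, not read from a real binary, and line_of and same_line are names made up for this example.

```shell
# Assumed values, derived by hand from the first listing (not measured).
LINE=64                 # assumed cache-line size in bytes
LOOP_END=0x4000D0       # assumed address of the last byte of "jnz a"
MSG=0x4000E0            # assumed address of msg after "align 4"

# Index of the cache line that holds a given address.
line_of() { echo $(( $1 / LINE )); }

# Succeeds when both addresses fall into the same cache line.
same_line() { [ "$(line_of "$1")" -eq "$(line_of "$2")" ]; }

if same_line "$LOOP_END" "$MSG"; then
    echo "msg shares a cache line with loop code"
else
    echo "msg is in a different cache line"
fi
```

Landing in a different line is only a minimum requirement. How far data has to be from executing code before the CPU stops treating writes as possible SMC is CPU-specific.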

Try this code instead.
Code:
;       for P in cs ds es lock ; do fasm -d PREFIX=$P l4m2.asm && time ./l4m2 ; done

format ELF64 executable 3

op equ add

_start:
        mov eax, 10000000
a:      PREFIX op dword [msg], 1
        PREFIX op dword [msg], 2
        PREFIX op dword [msg], 3
        PREFIX op dword [msg], 4
        PREFIX op dword [msg], 5
        PREFIX op dword [msg], 1
        PREFIX op dword [msg], 2
        PREFIX op dword [msg], 3
        PREFIX op dword [msg], 4
        PREFIX op dword [msg], 5
        dec eax
        jnz a
        mov eax, 1
        mov ebx, 0
        int 0x80

segment writeable
align 4
msg dd 0    
Now you should notice that the run times are considerably shorter, and that lock imposes more delay than a segment prefix.

But as for the specific question posed in the title: I don't know. CPU internals are weird and somewhat unpredictable.

As a general rule, keep your cache happy and things usually go much smoother. Don't mix code with data. Specific results will always vary.
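Because results vary from run to run, it can help to repeat each measurement and compare the fastest runs rather than a single sample. The helper below is a minimal sketch, not part of the original posts: best_of is a made-up name, it assumes GNU date (whose %N format gives nanoseconds), and taking the minimum is just one common way to reduce noise.

```shell
# Hypothetical helper: run a command N times and print the fastest wall time in ms.
# Assumes GNU date, where %N expands to nanoseconds.
best_of() {
    runs=$1; shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        # Keep the smallest time seen so far.
        if [ -z "$best" ] || [ "$ms" -lt "$best" ]; then best=$ms; fi
        i=$(( i + 1 ))
    done
    echo "$best"
}

# Example: best_of 5 ./l4m2
```

Used inside the loop from the listing above, this would replace the single `time ./l4m2` call with something like `best_of 5 ./l4m2` for each prefix.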


revolution 22 Sep 2023, 06:34
It is also possible to see lock perform worse than cs if the alignment and position of msg are changed.
Code:
;       for P in cs lock ; do fasm -d PREFIX=$P l4m2.asm && time ./l4m2 ; done

format ELF64 executable 3

op equ add

_start:
        mov     eax, 10000000
        align   64
        msg     dd 0x90909090
a:      PREFIX op dword [msg], 1
        PREFIX op dword [msg], 2
        PREFIX op dword [msg], 3
        PREFIX op dword [msg], 4
        PREFIX op dword [msg], 5
        PREFIX op dword [msg], 1
        PREFIX op dword [msg], 2
        PREFIX op dword [msg], 3
        PREFIX op dword [msg], 4
        PREFIX op dword [msg], 5
        dec     eax
        jnz     a
        mov     eax, 1
        mov     ebx, 0
        int     0x80    
Putting msg into the same cache line makes lock the worse option on my system.
Code:
~ for P in cs lock ; do fasm -d PREFIX=$P l4m2.asm && time ./l4m2 ; done
flat assembler  version 1.73.31  (16384 kilobytes memory)
1 passes, 228 bytes.

real    0m3.399s
user    0m3.344s
sys     0m0.000s
flat assembler  version 1.73.31  (16384 kilobytes memory)
1 passes, 228 bytes.

real    0m4.269s
user    0m4.176s
sys     0m0.008s    
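To put those two timings in perspective, they can be converted into a rough cost per prefixed instruction. The sketch below uses the "real" times from the run above and the loop's 10,000,000 iterations of 10 instructions each. It ignores process startup and the dec/jnz pair, so the figures are approximate, and ns_per_op is a name made up for this example.

```shell
# Hypothetical helper: nanoseconds per instruction for a run of $1 seconds over $2 instructions.
ns_per_op() { awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", s / n * 1e9 }'; }

OPS=$(( 10000000 * 10 ))    # 10,000,000 iterations x 10 prefixed instructions

echo "cs:   $(ns_per_op 3.399 "$OPS") ns per instruction"
echo "lock: $(ns_per_op 4.269 "$OPS") ns per instruction"

# Relative slowdown of lock compared with cs for this run.
awk 'BEGIN { printf "lock/cs: %.2f\n", 4.269 / 3.399 }'
```

On these numbers that is roughly 34 ns versus 43 ns per instruction, with lock about 1.26 times slower than cs.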



l4m2 22 Sep 2023, 06:37
revolution wrote:
You are writing to your active code cache line.
I noticed it maybe hours after I posted, using
Code:
db 1024 dup ?
I copied from TIO's Hello world (https://tio.run/)
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
