flat assembler
Message board for the users of flat assembler.

Another question: lock mov [edi],eax (for example)

Zoltanmatey31 16 Jan 2023, 19:47
Would it work? It's not on the list in the documentation, though:


"memory operand:
add, adc, and, btc, btr, bts, cmpxchg, cmpxchg8b, dec, inc, neg, not, or, sbb, sub,
xor, xadd and xchg. If the lock prefix is used with one of these instructions and the
source operand is a memory operand, an undefined opcode exception may be generated.
An undefined opcode exception will also be generated if the lock prefix is used with
any instruction not in the above list."


mov definitely has a form with a memory destination.
revolution 16 Jan 2023, 23:08
It won't work. And it doesn't make sense anyway. There is no need for lock if you are only storing a value, or only reading a value. The only use case that is meaningful is when you read-modify-write a value. So lock has been restricted to only those cases.
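To illustrate the distinction, here is a minimal sketch, assuming Linux x86-64 and fasm. The names `counter` and `start` and the use of the exit status to show the result are illustrative, not from the thread. The plain mov store needs no prefix, while xadd is a read-modify-write instruction from the documented list, so lock is meaningful there:

```asm
format ELF64 executable
entry start

SYS64_EXIT = 60

segment readable executable
start:
        mov     dword [counter],5       ; plain aligned store: no lock needed (or allowed)
        mov     eax,1
        lock xadd dword [counter],eax   ; read-modify-write: lock makes the whole update atomic
                                        ; eax now holds the old value (5), [counter] holds 6
        mov     edi,[counter]           ; pass the counter back as the exit status
        mov     eax,SYS64_EXIT
        syscall

segment readable writeable
counter dd 0
```

Run from a shell, the exit status (`echo $?`) should be 6: the stored 5 plus the 1 added by xadd.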
Furs 17 Jan 2023, 14:05
mov is atomic without a lock. Don't be misled by all the bullshit around the internet claiming stuff like "if you don't use the LOCK prefix you'll end up with two different copies in different cache lines of the same memory!". It's not true. Cache invalidation is not rocket science. This can't happen. At least not on x86.
Zoltanmatey31 17 Jan 2023, 17:52
OK. I was probably thinking along these lines: if an instruction handles memory in multiple steps and needs that memory, then why not always lock it?
Or, if so, locking could always be necessary as a procedure done by hand.
Furs 18 Jan 2023, 14:31
Zoltanmatey31 wrote:
OK. I was probably thinking along these lines: if an instruction handles memory in multiple steps and needs that memory, then why not always lock it?
Or, if so, locking could always be necessary as a procedure done by hand.
It's similar to how you don't use a mutex for data that is only accessed by one thread. Efficiency. LOCKing the bus is expensive.
revolution 18 Jan 2023, 18:49
Furs wrote:
LOCKing the bus is expensive.
Very much so.

Some results I recorded previously showed a 44x slowdown (user time: 0.072 s without lock versus 3.152 s with it).
Code:
~ ./no_lock 

real    0m0.081s
user    0m0.072s
sys     0m0.004s

~ ./with_lock 

real    0m3.202s
user    0m3.152s
sys     0m0.000s    
The source was this:
Code:
format elf64 executable

SYS64_EXIT              = 60
MEM_COUNT               = 1 shl 16
LOOPS                   = 1 shl 7

segment executable readable writeable

        mov     rdx,LOOPS
    .loop_outer:
        lea     rsi,[foo]
        mov     ecx,MEM_COUNT
    .loop_inner:
        lock                    ;toggle these two instructions for lock vs no lock
;       nop                     ;toggle these two instructions for lock vs no lock
        inc     qword[rsi]
        add     rsi,8
        loop    .loop_inner
        dec     rdx
        jnz     .loop_outer
        mov     eax,SYS64_EXIT
        syscall

foo     rq      MEM_COUNT    
Individual results will vary, of course. But I doubt there exists a computer that can improve much upon the 44x difference. Maybe a computer with a higher clock speed does worse? Try it.

Note: the foo array is unaligned. This was a deliberate choice for the test that was done at the time.
DimonSoft 19 Jan 2023, 15:08
Comparing the summing of matrix elements by rows versus by columns usually gave me something around 40x. Just as an example of something close in its effect on performance.
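For anyone who wants to try that comparison, a rough sketch follows, assuming Linux x86-64 and fasm. The matrix size `N`, the `BY_COLUMNS` switch, and all labels are hypothetical choices made for illustration; this is not the code behind the 40x figure. The matrix is written once before summing so that every page is actually backed by memory:

```asm
format ELF64 executable
entry start

SYS64_EXIT = 60
N          = 1 shl 12           ; assumed dimension: N x N dwords = 64 MiB
BY_COLUMNS = 1                  ; 1 = walk down the columns, 0 = walk along the rows

segment readable executable
start:
        lea     rdi,[matrix]    ; touch every element first so all pages are mapped
        mov     ecx,N*N
        mov     eax,1
        rep stosd

        xor     eax,eax         ; running sum
        xor     ebx,ebx         ; outer index
    .outer:
        xor     ecx,ecx         ; inner index
    .inner:
if BY_COLUMNS = 1
        imul    rsi,rcx,N*4     ; row = inner index: successive reads are N*4 bytes apart
        lea     rsi,[rsi+rbx*4] ; column = outer index
else
        imul    rsi,rbx,N*4     ; row = outer index
        lea     rsi,[rsi+rcx*4] ; column = inner index: successive reads are 4 bytes apart
end if
        add     eax,[matrix+rsi]
        inc     ecx
        cmp     ecx,N
        jb      .inner
        inc     ebx
        cmp     ebx,N
        jb      .outer

        xor     edi,edi         ; exit status 0
        mov     eax,SYS64_EXIT
        syscall

segment readable writeable
matrix  rd      N*N
```

Assemble it once with each value of `BY_COLUMNS` and compare the two builds with `time`, as in the lock test above. The ratio will depend on the machine and on `N`.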
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.