flat assembler
Message board for the users of flat assembler.

fasmg extremely slow on Linux with x86/split lock detection

MateoConLechgua



Joined: 30 Sep 2024
Posts: 5
MateoConLechgua 30 Sep 2024, 06:39
Hello, fasmg currently runs extremely slowly on Linux with recent kernel changes to support x86 split lock detection. With fasmg version g.kd3c I get this message in the system logs:

Code:
x86/split lock detection: #AC: fasmg/78798 took a split_lock trap at address: 0x4031ea    


It seems that newer kernels impose a penalty on programs that use split locks, which now makes them run very slowly: https://lwn.net/Articles/911219/

I would greatly appreciate any help on fixing this issue!
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 30 Sep 2024, 10:23
I'm afraid it would be a lot of work to modify fasmg to make it compliant. While fasmg uses correctly aligned structures in places where it really mattered for performance, there are several areas where it was preferred to tightly pack the data streams, which in turn causes relatively frequent unaligned memory accesses, unavoidable in its current architecture.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 30 Sep 2024, 11:31
fasmg.sh
Code:
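#!/bin/bash
# Turn off the split lock penalty, run fasmg, turn it back on, and keep fasmg's exit status
sysctl -w kernel.split_lock_mitigate=0 ; fasmg "$@" ; status=$?
sysctl -w kernel.split_lock_mitigate=1 ; exit $status
It might need a sudo or two there, since changing sysctl values requires root.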
MateoConLechgua



Joined: 30 Sep 2024
Posts: 5
MateoConLechgua 30 Sep 2024, 15:16
Thank you for the reply! Yes, I realize that the sysctl and kernel boot parameters exist. The main issue is that fasmg is distributed as the assembler/linker for a toolchain, so we would now have to ask users to perform these operations as root, which wasn't needed before. I am curious, though, why locks are needed at all: isn't fasmg a single-threaded process? I'm confused about why any accesses would have to be atomic in the first place.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 30 Sep 2024, 15:21
It is probably an xchg instruction.
Intel wrote:
If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 30 Sep 2024, 15:26
You could try rewriting the xchg's with equivalent code.
Code:
; xchg eax,[mem]
push ebx
mov ebx,[mem] ; if mem=esp then mov ebx,[mem+4]
mov [mem],eax ; if mem=esp then mov [mem+4],eax
mov eax,ebx
pop ebx    
Furs



Joined: 04 Mar 2016
Posts: 2505
Furs 30 Sep 2024, 15:40
revolution wrote:
You could try rewriting the xchg's with equivalent code.
Code:
; xchg eax,[mem]
push ebx
mov ebx,[mem] ; if mem=esp then mov ebx,[mem+4]
mov [mem],eax ; if mem=esp then mov [mem+4],eax
mov eax,ebx
pop ebx    
If you're going to push/pop anyway, why not just push and pop directly with a memory operand?
MateoConLechgua



Joined: 30 Sep 2024
Posts: 5
MateoConLechgua 30 Sep 2024, 15:44
I think it was just an example; three XORs would work too. I'll look into changing those, thank you!
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 30 Sep 2024, 16:22
Furs wrote:
If you're going to push/pop anyway, why not just push and pop directly with a memory operand?
- Three XORs require four mem ops, all unaligned.

- push reg/push mem/pop reg/pop mem takes six mem ops, two of them unaligned.

- push/mov/mov/pop takes four mem ops, two of them unaligned.

The last one has the fewest mem ops and minimises the unaligned accesses.

It needs to be tested, but often minimising mem ops, and especially minimising non-aligned mem ops, can be a great efficiency win, and maybe also a runtime win.
Furs



Joined: 04 Mar 2016
Posts: 2505
Furs 01 Oct 2024, 14:41
revolution wrote:
Furs wrote:
If you're going to push/pop anyway, why not just push and pop directly with a memory operand?
- Three XORs require four mem ops, all unaligned.

- push reg/push mem/pop reg/pop mem takes six mem ops, two of them unaligned.

- push/mov/mov/pop takes four mem ops, two of them unaligned.

The last one has the fewest mem ops and minimises the unaligned accesses.

It needs to be tested, but often minimising mem ops, and especially minimising non-aligned mem ops, can be a great efficiency win, and maybe also a runtime win.
I mean just:
Code:
; xchg eax, [mem]
push dword [mem]
mov [mem], eax
pop eax    
I don't see how it's more memory ops than yours?
macomics



Joined: 26 Jan 2021
Posts: 957
Location: Russia
macomics 01 Oct 2024, 16:54
Code:
push dword [mem]    
This instruction has the same limitations as xchg.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 01 Oct 2024, 18:11
Furs wrote:
I mean just:
Code:
; xchg eax, [mem]
push dword [mem]
mov [mem], eax
pop eax    
I don't see how it's more memory ops than yours?
Yeah, that is good also. :)

Four mem ops, two unaligned.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20343
Location: In your JS exploiting you and your system
revolution 01 Oct 2024, 18:12
macomics wrote:
Code:
push dword [mem]    
The same command with the same limitations as xchg
push does not allow lock, and doesn't have any implicit lock.
Ville



Joined: 17 Jun 2003
Posts: 304
Ville 01 Oct 2024, 19:36
MateoConLechgua wrote:
x86/split lock detection: #AC: fasmg/78798 took a split_lock trap at address: 0x4031ea
Code:
0x4031df: 0x67 0xf6 0x43 0x01 0x01  test   [ebx+0x1],byte 0x1
0x4031e4: 0x74 0x12                 je     0x12
0x4031e6: 0x67 0x8d 0x46 0xfc       lea    eax,[esi-0x4]
0x4031ea: 0x67 0x87 0x02            xchg   [edx],eax
0x4031ed: 0x67 0x8d 0x56 0xfc       lea    edx,[esi-0x4]    
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 02 Oct 2024, 06:57
It seems I exaggerated the issue. I reviewed 107 XCHG instructions and only two of them deal with the aforementioned packed blocks. I replaced them both and released the new version "g.kkod".

Please let me know if there are more traps, I may have missed something.
macomics



Joined: 26 Jan 2021
Posts: 957
Location: Russia
macomics 02 Oct 2024, 08:07
Tomasz Grysztar wrote:
Please let me know if there are more traps, I may have missed something.
I checked it on source/linux/x64/fasmg.asm.

Code:
$ dmesg | grep 'Linux version'
[    0.000000] Linux version 6.6.52-calculate (root@localhost) (gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP PREEMPT_DYNAMIC Tue Sep 24 21:17:50 +04 2024
$ cat /proc/cpuinfo | grep 'model name' | head --lines=1 -
model name      : 11th Gen Intel(R) Core(TM) i5-11600KF @ 3.90GHz
$ fasmg fasmg.asm ./fasmg
flat assembler  version g.kd3c
5 passes, 0.4 seconds, 70638 bytes.
$ ./fasmg fasmg.asm ./fasmg.x64
flat assembler  version g.kkod
5 passes, 0.4 seconds, 70638 bytes.
$ dmesg | grep 'split lock detection'
$    
MateoConLechgua



Joined: 30 Sep 2024
Posts: 5
MateoConLechgua 08 Oct 2024, 04:09
Just saw this! It works perfectly, thank you so much :)
MateoConLechgua



Joined: 30 Sep 2024
Posts: 5
MateoConLechgua 11 Oct 2024, 00:41
I have one more split lock that appears to trigger on certain inputs:

Code:
x86/split lock detection: #AC: fasmg/212583 took a split_lock trap at address: 0x408323    


It would be great to get this one patched up too :)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8354
Location: Kraków, Poland
Tomasz Grysztar 11 Oct 2024, 06:32
This, on the other hand, is something that got misaligned when it should not have been. I corrected it in "g.kl0e".

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.