flat assembler
Message board for the users of flat assembler.
  
fasmg extremely slow on linux with x86/split lock detection
| MateoConLechgua 30 Sep 2024, 06:39
Hello, fasmg currently runs extremely slowly on Linux following recent kernel changes to support x86 split lock detection. With fasmg version g.kd3c I get this message in the system logs:
Code:
x86/split lock detection: #AC: fasmg/78798 took a split_lock trap at address: 0x4031ea
It seems that newer kernels impose a penalty on programs that use split locks, which makes them run really slowly now: https://lwn.net/Articles/911219/
I would greatly appreciate any help on fixing this issue!
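For anyone collecting these reports, here is a minimal sketch of how the trap lines could be reduced to "program address" pairs. The function name `split_lock_traps` is made up for illustration, and the pattern only assumes the message format quoted above.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print
# "program address" for every split_lock trap message found.
# The expected line format is the one quoted in this thread.
split_lock_traps() {
    sed -n 's/.*#AC: \([^/]*\)\/[0-9]* took a split_lock trap at address: \(0x[0-9a-fA-F]*\).*/\1 \2/p'
}

# Usage (reading the kernel log may need elevated rights on some systems):
#   dmesg | split_lock_traps | sort | uniq -c
```

Piping the result through `sort | uniq -c` gives a rough count per address, which is the information needed to locate the offending instructions.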
 | 
| revolution 30 Sep 2024, 11:31
fasmg.sh
Code:
#!/bin/bash
# Run as root: switch split lock mitigation off around the fasmg run, then back on.
sysctl kernel.split_lock_mitigate=0 ; fasmg "$@" ; sysctl kernel.split_lock_mitigate=1
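A variation on the wrapper idea, offered as a hedged sketch: instead of forcing the setting back to 1, it remembers the previous value and restores that. The function name `run_unmitigated` and the `SPLIT_LOCK_KNOB` override are hypothetical. The default path is the procfs file behind the `kernel.split_lock_mitigate` sysctl, and writing to it still requires root.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): run a command with split lock
# mitigation switched off, then restore whatever value was set before.
# SPLIT_LOCK_KNOB is an illustrative override so the logic can be tried
# on an ordinary file; the default path needs root to write.
KNOB="${SPLIT_LOCK_KNOB:-/proc/sys/kernel/split_lock_mitigate}"

run_unmitigated() {
    local old status=0
    old=$(cat "$KNOB") || return 1      # remember the current setting
    echo 0 > "$KNOB" || return 1        # disable the mitigation
    "$@" || status=$?                   # run the wrapped command
    echo "$old" > "$KNOB"               # put the old setting back
    return "$status"
}

# Usage (as root): run_unmitigated fasmg source.asm output.bin
```

The setting is system-wide, so every other process is also unmitigated while the command runs.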
 | 
| MateoConLechgua 30 Sep 2024, 15:16
Thank you for the reply! Yes, I realize that the sysctl and kernel boot parameters exist. The main issue is that fasmg is distributed as the assembler/linker for a toolchain, so we would now need users to perform these operations as root, whereas before that wasn't needed. I am curious, though, why locks are needed: isn't fasmg just a single-core process? I'm just confused about why accesses would have to be atomic in the first place.
 | 
| revolution 30 Sep 2024, 15:21
It is probably an xchg instruction.
Intel wrote: If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL.
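To connect this with the kernel message: a locked access only becomes a split lock when its operand spans two cache lines. A small sketch of that condition follows. The function name `spans_cache_lines` is made up for illustration, and the 64-byte default line size is an assumption about the CPU.

```shell
#!/bin/bash
# Hypothetical check: succeeds (exit status 0) when an access of SIZE bytes
# starting at ADDR touches two different cache lines.
# The third argument is the line size; 64 bytes is assumed when omitted.
spans_cache_lines() {
    local addr size line
    addr=$(( $1 )); size=$(( $2 )); line=$(( ${3:-64} ))
    # Compare the line index of the first and the last byte accessed.
    [ $(( addr / line )) -ne $(( (addr + size - 1) / line )) ]
}

# Usage: spans_cache_lines 0x103e 4 && echo "dword at 0x103e crosses a line"
```

An aligned dword can never satisfy this condition, which is why an xchg on naturally aligned data does not trap.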
 | 
| revolution 30 Sep 2024, 15:26
You could try rewriting the xchg's with equivalent code.
Code:
; xchg eax,[mem]
push ebx
mov ebx,[mem]   ; if mem=esp then mov ebx,[mem+4]
mov [mem],eax   ; if mem=esp then mov [mem+4],eax
mov eax,ebx
pop ebx
 | 
| Furs 30 Sep 2024, 15:40
revolution wrote: You could try rewriting the xchg's with equivalent code.
If you're going to push/pop anyway, why not just push and pop directly with memory operand?
 | 
| MateoConLechgua 30 Sep 2024, 15:44
I think it was just an example; you could just use three xors too. I'll see about changing those, thank you!
 | 
| revolution 30 Sep 2024, 16:22
Furs wrote: If you're going to push/pop anyway, why not just push and pop directly with memory operand?
- With push reg/push mem/pop reg/pop mem that is six mem ops, two of those unaligned.
- With push/mov/mov/pop that is four mem ops, two of those unaligned.
The last one has fewer mem ops, and minimises the unaligned accesses. It needs to be tested, but often minimising mem ops, and especially minimising non-aligned mem ops, can be a great efficiency win, and maybe also a runtime win.
 | 
| Furs 01 Oct 2024, 14:41
revolution wrote:
I mean just:
Code:
; xchg eax, [mem]
push dword [mem]
mov [mem], eax
pop eax
 | 
| macomics 01 Oct 2024, 16:54
Code:
push dword [mem]
 | 
| revolution 01 Oct 2024, 18:11
Furs wrote: I mean just:
Four mem ops, two unaligned.
 | 
| revolution 01 Oct 2024, 18:12 macomics wrote: 
 | |||
 | 
| Ville 01 Oct 2024, 19:36
MateoConLechgua wrote: x86/split lock detection: #AC: fasmg/78798 took a split_lock trap at address: 0x4031ea
Code:
0x4031df: 0x67 0xf6 0x43 0x01 0x01    test [ebx+0x1],byte 0x1
0x4031e4: 0x74 0x12                   je 0x12
0x4031e6: 0x67 0x8d 0x46 0xfc         lea eax,[esi-0x4]
0x4031ea: 0x67 0x87 0x02              xchg [edx],eax
0x4031ed: 0x67 0x8d 0x56 0xfc         lea edx,[esi-0x4]
 | 
| Tomasz Grysztar 02 Oct 2024, 06:57
It seems I exaggerated the issue. I reviewed 107 XCHG instructions and only two of them deal with the aforementioned packed blocks. I replaced them both and released the new version "g.kkod".
Please let me know if there are more traps, I may have missed something.
 | 
| macomics 02 Oct 2024, 08:07
Tomasz Grysztar wrote: Please let me know if there are more traps, I may have missed something.
Code:
$ dmesg | grep 'Linux version'
[ 0.000000] Linux version 6.6.52-calculate (root@localhost) (gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614, GNU ld (Gentoo 2.42 p3) 2.42.0) #1 SMP PREEMPT_DYNAMIC Tue Sep 24 21:17:50 +04 2024
$ cat /proc/cpuinfo | grep 'model name' | head --lines=1 -
model name : 11th Gen Intel(R) Core(TM) i5-11600KF @ 3.90GHz
$ fasmg fasmg.asm ./fasmg
flat assembler version g.kd3c
5 passes, 0.4 seconds, 70638 bytes.
$ ./fasmg fasmg.asm ./fasmg.x64
flat assembler version g.kkod
5 passes, 0.4 seconds, 70638 bytes.
$ dmesg | grep 'split lock detection'
$
 | 
| MateoConLechgua 08 Oct 2024, 04:09
Just saw this! It works perfectly, thank you so much.
 | 
| MateoConLechgua 11 Oct 2024, 00:41
I have one more split lock that appears to trigger on certain inputs:
Code:
x86/split lock detection: #AC: fasmg/212583 took a split_lock trap at address: 0x408323
Would be great to get this one patched up too.
 | 
| Tomasz Grysztar 11 Oct 2024, 06:32
This, on the other hand, is something that got misaligned while it should not be. I corrected it in "g.kl0e".
 | 
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.