flat assembler
Message board for the users of flat assembler.

[solved] to report malloc bug in fasmg on macOS

Melissa



Joined: 12 Apr 2012
Posts: 125
Melissa 18 Sep 2021, 08:04
The latest macOS Monterey beta causes a bug in fasmg x64:
bmaxa@Branimirs-Air asmFish % fasmg "x86/fish.asm" "asmfish" -e 100 -i "VERSION_OS='X'" -i "VERSION_POST = 'popcnt'"
Error: not enough memory to complete the assembly.

but:
Code:
bmaxa@Branimirs-Air memtest % cat mem.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  for (long int i=4096;;i*=2){
    void *p = malloc(i);
    if (p)free(p);
    else {
      printf("max %ld\n", i);
      break;
    }
  }
}
    

bmaxa@Branimirs-Air memtest % ./mem
max 140737488355328
bmaxa@Branimirs-Air memtest % symbols mem
mem [arm64, 0.250362 seconds]:
9599C034-0152-3550-9A7F-01E288534647 /Users/bmaxa/projects/memtest/mem [AOUT, PIE, FaultedFromDisk, MMap64]
0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
0x0000000100000000 ( 0x4000) __TEXT SEGMENT
0x0000000100000000 ( 0x3ed8) MACH_HEADER
0x0000000100003ed8 ( 0x74) __TEXT __text
0x0000000100003ed8 ( 0x74) main [FUNC, EXT, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
0x0000000100003f4c ( 0x24) __TEXT __stubs
0x0000000100003f4c ( 0xc) DYLD-STUB$$free [DYLD-STUB, LENGTH, NameNList, MangledNameNList, NList]
0x0000000100003f58 ( 0xc) DYLD-STUB$$malloc [DYLD-STUB, LENGTH, NameNList, MangledNameNList, NList]
0x0000000100003f64 ( 0xc) DYLD-STUB$$printf [DYLD-STUB, LENGTH, NameNList, MangledNameNList, NList]
0x0000000100003f70 ( 0x3c) __TEXT __stub_helper
0x0000000100003fac ( 0x9) __TEXT __cstring
0x0000000100003fb8 ( 0x48) __TEXT __unwind_info
0x0000000100004000 ( 0x4000) __DATA_CONST SEGMENT
0x0000000100004000 ( 0x8) __DATA_CONST __got
0x0000000100008000 ( 0x4000) __DATA SEGMENT
0x0000000100008000 ( 0x18) __DATA __la_symbol_ptr
0x0000000100008018 ( 0x8) __DATA __data
0x0000000100008018 ( 0x8) _dyld_private [NameNList, MangledNameNList, NList]
0x000000010000c000 ( 0x4000) __LINKEDIT SEGMENT

So I guess it calls the wrong version of malloc, or somehow goes beyond a limit.
This is on a 13" MacBook Air with an M1 processor.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 18 Sep 2021, 08:08
This most likely means that it was not able to get an allocation in 32-bit addressable space, because this is what it needs to function.
Melissa



Joined: 12 Apr 2012
Posts: 125
Melissa 18 Sep 2021, 08:18
Tomasz Grysztar wrote:
This most likely means that it was not able to get an allocation in 32-bit addressable space, because this is what it needs to function.


It worked on earlier beta versions. Should I report this to Apple?

Edit:
Reported. It is their problem, as it worked before.
Melissa



Joined: 12 Apr 2012
Posts: 125
Melissa 04 Nov 2021, 12:15
Reply from Apple:
Quote:

After reviewing your feedback, we have some additional information for you, or some additional information, or action is necessary for this issue:

We make no guarantee that returned addresses will fit in 32-bit space. If fasmg can't handle that, it needs to change.

All in all, I will start working on a translation to aarch64, so it will be covered, no worries :P
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 31 Dec 2021, 17:22
I came up with a new possible solution for the cases when malloc refuses to return memory in 32-bit space, and I have already applied it to the x64 Linux version of fasmg. I switched to my own implementation of malloc (originally written for the purpose of porting to KolibriOS and DOS), which in turn uses valloc to request larger blocks of memory from the OS interface.

The x64 Linux implementation of valloc uses sys_mmap, hoping that either the MAP_32BIT flag or the address hint is going to be respected:
Code:
valloc:
; in: ecx = requested minimum size
; out: eax = allocated block, ecx = allocated size, zero if failed
; preserves: ebx, esi, edi
        cmp     ecx,VALLOC_MINIMUM_SIZE
        jbe     valloc_size_minimum
        dec     ecx                     ; round the size up to a multiple of 4096
        and     ecx,(-1) shl 12
        add     ecx,1 shl 12
        jmp     valloc_size_ready
    valloc_size_minimum:
        mov     ecx,VALLOC_MINIMUM_SIZE
    valloc_size_ready:
        push    rbx rsi rdi
        cmp     [local_heap_available],0
        je      valloc_mmap
        cmp     ecx,LOCAL_HEAP_SIZE
        ja      valloc_mmap
        and     [local_heap_available],0 ; the fixed heap is handed out only once
        mov     eax,local_heap
        mov     ecx,LOCAL_HEAP_SIZE
        jmp     valloc_ok
    valloc_mmap:
        xor     r9d,r9d                 ; offset = 0
        or      r8,-1                   ; fd = -1
        mov     r10d,62h                ; MAP_PRIVATE + MAP_ANONYMOUS + MAP_32BIT
        mov     edx,3                   ; PROT_READ + PROT_WRITE
        mov     esi,ecx                 ; length
        xor     edi,edi                 ; addr = 0, let the kernel choose
        mov     eax,9                   ; sys_mmap
        syscall
        cmp     eax,-1
        je      valloc_mmap_with_hint
        mov     ecx,eax                 ; zero-extended low 32 bits of the address
        cmp     rcx,rax                 ; does the address fit in 32 bits?
        jne     valloc_mmap_unusable
        add     ecx,esi                 ; does the end of the block fit as well?
        jnc     mmap_ok
    valloc_mmap_unusable:
        mov     rdi,rax                 ; release the block, length is still in rsi
        mov     eax,11                  ; sys_munmap
        syscall
    valloc_mmap_with_hint:
        mov     r10d,22h                ; MAP_PRIVATE + MAP_ANONYMOUS
        mov     edx,3                   ; PROT_READ + PROT_WRITE
        mov     edi,[mmap_hint]         ; addr = hint saved at valloc_ok
        mov     eax,9                   ; sys_mmap
        syscall
        cmp     eax,-1
        je      valloc_failed
        mov     ecx,eax                 ; same 32-bit checks as above
        cmp     rcx,rax
        jne     valloc_failed
        add     ecx,esi
        jc      valloc_failed
    mmap_ok:
        sub     ecx,eax                 ; ecx = allocated size
    valloc_ok:
        lea     edx,[eax+ecx]           ; end of this block becomes the next hint
        mov     [mmap_hint],edx
        pop     rdi rsi rbx
        retn
    valloc_failed:
        xor     ecx,ecx                 ; ecx = 0 signals failure
        pop     rdi rsi rbx
        retn
But it has an additional protection built in: even if sys_mmap is unwilling to cooperate, it can initially use the fixed heap defined in the BSS section:
Code:
segment readable writeable

  align 1000h

  LOCAL_HEAP_SIZE = 1000000h

  local_heap rb LOCAL_HEAP_SIZE    
This is guaranteed to be in 32-bit memory space simply because the program is not PIE, so the segment has a fixed address. And a 16 MB heap is enough for fasmg self-hosting, so even in the worst case it would at least be possible for fasmg to reassemble itself with a larger heap.

I believe the same trick could be used for other systems, but I would need some testers. Can you help?
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 25 Feb 2022, 23:08
Hi Tomasz,

We encountered this issue as well. A workaround I've found that seems to work for most macOS users is to run fasmg with GuardMalloc (i.e. `DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib fasmg`) or, more annoyingly since it requires root, under `dtruss`.

I guess that changes enough of malloc's behaviour to make fasmg happy, and the "not enough memory" error is gone.

Since we wanted to actually solve the issue rather than work around it, I've taken a look at this topic and tried to adapt your Linux changes with valloc/mmap to macOS. I may be close, but the mmap (sys)call isn't something I've adapted yet, so it obviously crashes for now. Apparently the registers are not the same, plus there's this whole 0x2000000 offset thing to take care of. Anyway, Apple says calling mmap directly is much safer than syscall'ing, so maybe a ccall here would be better (if this is even possible at this point).

Here's the current state of things, let me know if you have any idea, and I'll be glad to test and report back: https://github.com/tgrysztar/fasmg/compare/master...adriweb:mac_x64_valloc (I guess some of the valloc stuff should be refactored in a common part...?)

Thanks
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 26 Feb 2022, 09:40
Adriweb wrote:
Since we wanted to actually solve the issue rather than work around it, I've taken a look at this topic and tried to adapt your Linux changes with valloc/mmap to macOS. I may be close, but the mmap (sys)call isn't something I've adapted yet, so it obviously crashes for now.

It should not be using syscalls (and therefore crashing) until it needs more memory than the initial heap provides. Make sure that the [local_heap_available] variable is initialized; the Linux version does it in the "system_init" routine:
Code:
system_init:
        ; ...
        or      [local_heap_available],1
        retn    
As long as it uses the pre-allocated heap, no syscalls should happen for small sources (even fasmg self-hosting).

Adriweb wrote:
Apparently the registers are not the same, plus there's this whole 0x2000000 offset thing to take care of. Anyway, Apple says calling mmap directly is much safer than syscall'ing, so maybe a ccall here would be better (if this is even possible at this point).
Yes, I think that would be the best route to try. Just add "mmap" and "munmap" to the import statement in the dynamic section, and replace the syscalls like this:
Code:
    valloc_mmap:
        push    rcx
        ccall   mmap,0,rcx, \
                        3, \            ; PROT_READ + PROT_WRITE
                        62h, \          ; MAP_PRIVATE + MAP_ANONYMOUS + MAP_32BIT
                        -1,0
        pop     rsi
        cmp     eax,-1
        je      valloc_mmap_with_hint
        mov     ecx,eax
        cmp     rcx,rax
        jne     valloc_mmap_unusable
        add     ecx,esi
        jnc     mmap_ok
    valloc_mmap_unusable:
        ccall   munmap,rax,rsi
    valloc_mmap_with_hint:
        push    rcx
        mov     edi,[mmap_hint]
        ccall   mmap,rdi,rcx, \
                        3, \            ; PROT_READ + PROT_WRITE
                        22h, \          ; MAP_PRIVATE + MAP_ANONYMOUS
                        -1,0
        pop     rsi
        cmp     eax,-1
        je      valloc_failed
        mov     ecx,eax
        cmp     rcx,rax
        jne     valloc_failed
        add     ecx,esi
        jc      valloc_failed    
I'm writing it on the go and I have no way of testing it, so I give no guarantees - please let me know if that helps.
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 26 Feb 2022, 11:39
Alright, this seems to have worked!
https://github.com/adriweb/fasmg/commit/decc52adb42df0c9293eb1606b1a3669b1d59e38

Self-hosting looks good: a diff of both outputs is empty.

That said, I still get "Error: not enough memory to complete the assembly." on a bigger source, so I'm not sure how to investigate further now.

In addition, the GuardMalloc workaround doesn't work anymore with this, but I guess that's because it acted on malloc, not mmap.

I found that mmap may ignore MAP_32BIT, so maybe that's related? (well, see https://stackoverflow.com/a/69792033/378298)
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 26 Feb 2022, 11:48
Adriweb wrote:
That said, I still get "Error: not enough memory to complete the assembly." on a bigger source, so I'm not sure how to investigate further now.

In addition, the GuardMalloc workaround doesn't work anymore with this, but I guess that's because it acted on malloc, not mmap.

I found that mmap may ignore MAP_32BIT, so maybe that's related? (well, see https://stackoverflow.com/a/69792033/378298)
Yes, there is no reliable cross-platform way to force it to get memory in the low 4G space. That's why my code has a backup call to mmap, which tries to use an address hint instead, but this may also fail.

Do the mmap calls always fail for you, or are they just inconsistent? We could experiment further, trying to find mmap settings that would work.

Also, there is always the option to assemble fasmg with a bigger local heap. The default one is big enough to allow fasmg to re-assemble itself, jump-starting the process.
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 26 Feb 2022, 16:27
It seems to be failing consistently on my end.
Looking at it under dtruss, I see the following:
Code:
60087/0x9cf7fa:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x105306000 0
60088/0x9cf81c:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x1017000 0
60088/0x9cf81c:  mmap(0x0, 0x100000, 0x3, 0x40062, 0xFFFFFFFFFFFFFFFF, 0x0)              = 0xFFFFFFFFFFFFFFFF 9
60088/0x9cf81c:  mmap(0x1014000, 0x7FF806E8D59E, 0x3, 0x40022, 0xFFFFFFFFFFFFFFFF, 0x0)          = 0xFFFFFFFFFFFFFFFF 9    


How would I test your suggestions?
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 26 Feb 2022, 16:38
Adriweb wrote:
Code:
60087/0x9cf7fa:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x105306000 0
60088/0x9cf81c:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x1017000 0
60088/0x9cf81c:  mmap(0x0, 0x100000, 0x3, 0x40062, 0xFFFFFFFFFFFFFFFF, 0x0)              = 0xFFFFFFFFFFFFFFFF 9
60088/0x9cf81c:  mmap(0x1014000, 0x7FF806E8D59E, 0x3, 0x40022, 0xFFFFFFFFFFFFFFFF, 0x0)          = 0xFFFFFFFFFFFFFFFF 9    
It seems that MAP_32BIT might be unsupported on this machine. But it also shows at least one bug in the code I posted previously: the allocation size is corrupted in the second call (rcx no longer holds the size after the first ccall). Please try the following:
Code:
    valloc_mmap:
        push    rcx
        ccall   mmap,0,rcx, \
                        3, \            ; PROT_READ + PROT_WRITE
                        62h, \          ; MAP_PRIVATE + MAP_ANONYMOUS + MAP_32BIT
                        -1,0
        pop     rsi
        cmp     eax,-1
        je      valloc_mmap_with_hint
        mov     ecx,eax
        cmp     rcx,rax
        jne     valloc_mmap_unusable
        add     ecx,esi
        jnc     mmap_ok
    valloc_mmap_unusable:
        ccall   munmap,rax,rsi
    valloc_mmap_with_hint:
        push    rsi
        mov     edi,[mmap_hint]
        ccall   mmap,rdi,rsi, \
                        3, \            ; PROT_READ + PROT_WRITE
                        22h, \          ; MAP_PRIVATE + MAP_ANONYMOUS
                        -1,0
        pop     rsi
        cmp     eax,-1
        je      valloc_failed
        mov     ecx,eax
        cmp     rcx,rax
        jne     valloc_failed
        add     ecx,esi
        jc      valloc_failed    
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 26 Feb 2022, 18:10
The error is still here, although the 2nd argument to the last mmap looks valid now:
Code:
67683/0x9e5bc4:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x10A488000 0
67684/0x9e5bcc:  mmap(0x0, 0x2000, 0x1, 0x40001, 0x3, 0x0)               = 0x1017000 0
67684/0x9e5bcc:  mmap(0x0, 0x100000, 0x3, 0x40062, 0xFFFFFFFFFFFFFFFF, 0x0)              = 0xFFFFFFFFFFFFFFFF 9
67684/0x9e5bcc:  mmap(0x1014000, 0x100000, 0x3, 0x40022, 0xFFFFFFFFFFFFFFFF, 0x0)                = 0xFFFFFFFFFFFFFFFF 9    
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 26 Feb 2022, 18:18
Can you experiment with different flag combinations, maybe without MAP_PRIVATE? Return value -1 suggests that mmap doesn't like some of the argument values.
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 26 Feb 2022, 18:49
Alright, it looks like the values of the MAP_* flags are a bit different on macOS, and now everything seems to work!

Code:
                        9002h, \          ; MAP_PRIVATE + MAP_ANONYMOUS + MAP_32BIT
...
                        1002h, \          ; MAP_PRIVATE + MAP_ANONYMOUS
    


Thanks a lot :)

I've updated my fork just in case https://github.com/adriweb/fasmg/commit/46d532e84d567b3017fb067e3ad977e5d34ed712
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8034
Location: Kraków, Poland
Tomasz Grysztar 26 Feb 2022, 18:55
Thank you! I'm updating the official version with this solution, plus a couple of tiny improvements. Please confirm whether the version I publish works correctly.
Adriweb



Joined: 19 Oct 2019
Posts: 8
Location: France
Adriweb 28 Feb 2022, 08:55
Can confirm, everything works fine here :)

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
