flat assembler
Message board for the users of flat assembler.
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland
fasm as 64-bit ELF executable
While 64-bit systems are in general capable of running 32-bit executables, there are some Linux builds out there that have this feature disabled in the kernel. This makes it impossible to run fasm on such machines without writing a 64-bit version from scratch. Or does it?

I've been thinking about this for a few years already, and I had some ideas on how to run fasm's core in "use64" setting. Now I finally decided to confront my ideas with reality.

The trick lies in the fact that 32-bit addressing is still available in long mode. As long as the address fits in 32 bits, [ebx] can be used instead of [rbx] and the instruction is going to be correctly encoded and executed. Therefore it should suffice to ensure that all memory allocated for fasm's use lies in the low 4 GB of the address space. And fortunately sys_brk, which fasm's Linux interface uses to pre-allocate all the memory that fasm is going to use, appears to do exactly that - it grows the program segment (which we can put at a low address), so the allocated memory should stay at low addresses.
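A minimal sketch of this idea as a standalone program (not part of fasm's sources; the 1 MB size, the label names and the exit codes are assumptions made for illustration). It grows the break with sys_brk, checks that the new break still fits in 32 bits, and then writes through [ebx] and reads back through [rbx]:

```asm
format ELF64 executable 3               ; 3 = Linux
entry start

segment readable executable

start:
        mov     eax,12                  ; sys_brk on x86-64 Linux
        xor     edi,edi                 ; address 0: just report the current break
        syscall
        lea     rdi,[rax+100000h]       ; request 1 MB more (arbitrary size for this sketch)
        mov     rbx,rdi                 ; remember the requested break
        mov     eax,12
        syscall
        cmp     rax,rbx                 ; on success the kernel returns the new break
        jne     failed
        shr     rax,32                  ; is any bit above bit 31 set?
        jnz     failed
        mov     dword [ebx-4],12345678h ; 32-bit addressing (67h prefix) in long mode
        cmp     dword [rbx-4],12345678h ; same location through 64-bit addressing
        jne     failed
        xor     edi,edi                 ; exit code 0: memory is reachable via 32-bit registers
        jmp     finish
failed:
        mov     edi,1                   ; exit code 1: allocation failed or landed above 4 GB
finish:
        mov     eax,60                  ; sys_exit
        syscall
```

Whether the break really stays below 4 GB depends on where the kernel places it relative to the program image, so the check is worth keeping rather than assuming the result.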

But there is another problem - there is no 32-bit PUSH/POP in long mode. These two instructions need to be replaced with macros that emulate a 32-bit stack. For example "push eax" has to become:

Code:
lea rsp,[rsp-4]
mov [rsp],eax

It has to be LEA instead of SUB, because the emulation of PUSH needs to preserve flags. Also, fasm sometimes accesses PUSHed values with instructions like "mov eax,[esp]". Because the stack cannot be assumed to reside in low memory, RSP has to be used for all such addressing. A simple "esp equ rsp" re-definition does it, but a few special cases need to be handled in macros. In some places fasm stores the value of ESP into a variable and later compares this stored value with the current ESP. There is no need to use RSP there, since ESP is enough to uniquely determine the position on the stack. But I also noticed that where fasm checks for stack overflow, it uses sub-optimal code that may not work correctly when ESP wraps around. I'm going to change these checks in upcoming releases.
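To make the flags argument concrete, here is a small hand-written fragment (the labels are hypothetical and it is not taken from fasm's core). The conditional jump after the emulated PUSH still acts on the result of the earlier CMP, which would not hold if SUB were used to adjust RSP:

```asm
use64

emulated_push_keeps_flags:
        cmp     eax,ebx                 ; the flags now describe eax - ebx
        lea     rsp,[rsp-4]             ; LEA adjusts RSP without touching the flags
        mov     [rsp],eax               ; store the "pushed" dword
        jb      was_below               ; still branches on the CMP result
        ; ... code for the eax >= ebx case would go here ...
was_below:
        mov     eax,[rsp]               ; emulated "pop eax": load the dword first,
        lea     rsp,[rsp+4]             ; then release the 4 bytes, again without touching flags
        ret
```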

Similarly, the JMP/CALL instructions with a target specified by a 32-bit variable need to be emulated. And there are also two special cases, "call directive_handler" and "call define_data", where fasm does not use RET but treats these instructions as a kind of PUSH to store the 32-bit address of a procedure on the stack. These two cases need to be recognized by macros, though I'm considering cleaning up fasm's code to get rid of these tricks - they are not really needed there, and they add another unnecessary layer of confusion.

Lastly, fasm uses a couple of instructions that are not available in long mode - SALC and JCXZ. Again, emulating them with macros is possible. I checked that fasm does not rely on them preserving flags, so the macros can be kept simple.

This led me to create the following set of macros, which allow fasm's core to be assembled as long-mode-compatible code:

Code:

esp equ +rsp

macro pushD arg
{
        lea rsp,[rsp-4]
        if arg eqtype eax
                mov dword [rsp],arg
        else
                mov r8d,dword arg
                mov [rsp],r8d
        end if
}

macro popD arg
{
        if arg eqtype [mem]
                mov r8d,[rsp]
                mov dword arg,r8d
        else
                mov arg,dword [rsp]
        end if
        lea rsp,[rsp+4]
}

macro add dest,src
{
        if dest eq esp
                add rsp,src
        else
                add dest,src
        end if
}

macro mov dest,src
{
        if src eq esp
                mov dest,ESP
        else
                mov dest,src
        end if
}

macro cmp dest,src
{
        if dest eq esp
                cmp ESP,src
        else
                cmp dest,src
        end if
}

macro use32
{

        macro push args
        \{
                define arg@push
                irps sym, args \\{
                        define status@push
                        match =dword, sym \\\{
                                define status@push 1
                        \\\}
                        match [any, status@push arg@push sym \\\{
                                define arg@push [any
                                match [mem], arg@push \\\\{
                                        pushD [mem]
                                        define arg@push
                                \\\\}
                                define status@push 1
                        \\\}
                        match [, status@push arg@push sym \\\{
                                define arg@push [
                                define status@push 1
                        \\\}
                        match , status@push \\\{
                                pushD sym
                        \\\}
                \\}
        \}

        macro pop args
        \{
                define arg@pop
                irps sym, args \\{
                        define status@pop
                        match =dword, sym \\\{
                                define status@pop 1
                        \\\}
                        match [any, arg@pop sym \\\{
                                define arg@pop [any
                                match [mem], arg@pop \\\\{
                                        popD [mem]
                                        define arg@pop
                                \\\\}
                                define status@pop 1
                        \\\}
                        match [, status@pop arg@pop sym \\\{
                                define arg@pop [
                                define status@pop 1
                        \\\}
                        match , status@pop \\\{
                                popD sym
                        \\\}
                \\}
        \}

        macro jmp arg
        \{
                if arg eq near eax
                        jmp near rax
                else if arg eq near edx
                        jmp near rdx
                else if arg eqtype [mem]
                        mov r8d,arg
                        jmp near r8
                else
                        jmp arg
                end if
        \}

        macro call arg
        \{
                if arg eq define_data | arg eq directive_handler        ; special cases that do not use RET
                        \local next
                        pushD next
                        if arg <> $
                                jmp arg
                        end if
                        next:
                else if 1
                match =near =dword [mem], arg \\{
                        mov r8d,[mem]
                        call near r8
                else
                \\}
                        call arg
                end if
        \}

        macro salc              ; for fasm's core it does not need to preserve flags
        \{
                setc al
                neg al
        \}

        macro jcxz target       ; for fasm's core it does not need to preserve flags
        \{
                test cx,cx
                jz target
        \}

        use64

}

macro use16
{

        purge push,pop,jmp,call,salc,jcxz

        use16

}

use32

The macros are organized this way to handle the USE16/USE32 switch in the formatter code (it is there to generate the PE stub). The "use16" macro disables some of the instruction emulation macros while entering 16-bit mode - then "use32" switches back to long mode with the emulation macros.

Of course, the interface code needed to be adapted to the x64 environment, too. Since the command line parameters may reside in high memory, the ones that are used by the core have to be copied into a low memory area. I also thought that syscalls might need additional code to align the stack (because the 32-bit PUSH/POP emulation often leaves it terribly misaligned by 64-bit standards), but apparently it works correctly without it (though it definitely would be required in 64-bit Windows).

I'm attaching the additional source files (that need to be merged into fasm's SOURCE directory from the regular downloads) and the assembled 64-bit ELF executable. It seems to work correctly in my tests, but please let me know if you find any instability.


Description: fasm 1.71.57 as 64-bit ELF executable
Download
Filename: fasmon64.zip
Filesize: 72.29 KB
Downloaded: 43 Time(s)



Last edited by Tomasz Grysztar on 08 Dec 2016, 19:24; edited 1 time in total
Post 07 Dec 2016, 19:49
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland
BTW, it almost hurt to write non-trivial macros for fasm 1 after a period of writing only ones for fasmg. At first I cringed a few times, but then I got it going.
Post 07 Dec 2016, 19:55
redsock



Joined: 09 Oct 2009
Posts: 251
Location: Australia
This is very cool, thanks for your painful effort! I have run non-multiarch platforms before, and in some cases they make for very nice minimalist environments.

Are there any standard distros that are not multilib?
Post 07 Dec 2016, 21:56
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland

redsock wrote:
Are there any standard distros that are not multilib?

Note that not having a 32-bit libc available does not necessarily mean having a kernel with no support for 32-bit executables. fasm's executable uses syscalls only and it is able to run on systems with no 32-bit libraries, as long as the kernel supports loading 32-bit ELF files at all.
Post 07 Dec 2016, 22:08
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland
I have updated the attachment with an improved version: I rewrote the 32-bit stack emulation in such a way that PUSH/POP with multiple arguments on a single line generates just one LEA instruction:

Code:

esp equ +rsp

macro pushD [arg]
{
        common
                local offset,total
                offset = 0
                lea rsp,[rsp-total]
        forward
                offset = offset + 4
                if arg eqtype eax
                        mov dword [rsp+total-offset],arg
                else
                        mov r8d,dword arg
                        mov [rsp+total-offset],r8d
                end if
        common
                total = offset
}

macro popD [arg]
{
        common
                local offset
                offset = 0
        forward
                if arg eqtype [mem]
                        mov r8d,[rsp+offset]
                        mov dword arg,r8d
                else
                        mov arg,dword [rsp+offset]
                end if
                offset = offset + 4
        common
                lea rsp,[rsp+offset]
}

macro add dest,src
{
        if dest eq esp
                add rsp,src
        else
                add dest,src
        end if
}

macro mov dest,src
{
        if src eq esp
                mov dest,ESP
        else
                mov dest,src
        end if
}

macro cmp dest,src
{
        if dest eq esp
                cmp ESP,src
        else
                cmp dest,src
        end if
}

macro use32
{

        macro push args
        \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ pushD arg \\}
        \}

        macro pop args
        \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ popD arg \\}
        \}

        macro jmp arg
        \{
                if arg eq near eax
                        jmp near rax
                else if arg eq near edx
                        jmp near rdx
                else if arg eqtype [mem]
                        mov r8d,arg
                        jmp near r8
                else
                        jmp arg
                end if
        \}

        macro call arg
        \{
                if arg eq define_data | arg eq directive_handler        ; special cases that do not use RET
                        \local next
                        pushD next
                        if arg <> $
                                jmp arg
                        end if
                        next:
                else if 1
                match =near =dword [mem], arg \\{
                        mov r8d,[mem]
                        call near r8
                else
                \\}
                        call arg
                end if
        \}

        macro salc              ; for fasm's core it does not need to preserve flags
        \{
                setc al
                neg al
        \}

        macro jcxz target       ; for fasm's core it does not need to preserve flags
        \{
                test cx,cx
                jz target
        \}

        use64

}

macro use16
{

        purge push,pop,jmp,call,salc,jcxz

        use16

}

use32
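
As an illustration of the effect, this is what "push eax ebx" followed by "pop ebx eax" should turn into with the macros above. The expansion is worked out by hand from the pushD/popD definitions, not copied from assembler output, so treat it as a sketch:

```asm
use64

        ; push eax ebx
        lea     rsp,[rsp-8]             ; one LEA for both arguments (total = 8)
        mov     dword [rsp+4],eax       ; first argument lands at the higher address
        mov     dword [rsp],ebx         ; last argument ends up on top of the stack

        ; pop ebx eax
        mov     ebx,dword [rsp]         ; first argument is read from the top
        mov     eax,dword [rsp+4]
        lea     rsp,[rsp+8]             ; one LEA releases both slots
```

The layout matches what two separate 32-bit PUSH instructions would have produced, only the stack pointer is adjusted once instead of twice.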

Post 08 Dec 2016, 08:03
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14795
Location: Lost in translation

Tomasz Grysztar wrote:
I also thought that syscalls might need additional code to align the stack (because the 32-bit PUSH/POP emulation often leaves it terribly misaligned by 64-bit standards), but apparently it works correctly without it (though it definitely would be required in 64-bit Windows).

I have never seen any version of 64-bit Windows that can't run 32-bit code, so perhaps a Windows version is not very important. And with the enormous amount of 32-bit programs still around it would be unbelievable that support would ever be dropped. The only possible use case would be for files larger than ~3GB but that would also require rewriting all the addresses everywhere, so not really practical that I can see.

As for not writing the stack alignment code because it currently works on Linux: A quick search of the Linux specs shows "The stack pointer shall maintain 8-byte alignment", so it appears as though this will be no problem. :) Unless, that is, Linux decides to break everything with the next release. :?


Last edited by revolution on 08 Dec 2016, 16:09; edited 1 time in total
Post 08 Dec 2016, 08:55
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland

revolution wrote:
I have never seen any version of 64-bit Windows that can't run 32-bit code, so perhaps a Windows version is not very important. And with the enormous amount of 32-bit programs still around it would be unbelievable that support would ever be dropped. The only possible use case would be for files larger than ~3GB but that would also require rewriting all the addresses everywhere, so not really practical that I can see.

Also, 32-bit code can be more compact; it is not always advantageous to use 64-bit code instead. It is good to have both options available, IMO.


revolution wrote:
As for not writing the stack alignment code because it currently works on Linux: A quick search of the Linux specs shows "The stack pointer shall maintain 8-byte alignment", so it appears as though this will be no problem. :) Unless, that is, Linux decides to break everything with the next release. :?

The problem is that fasm's emulation of the 32-bit stack sometimes leaves it aligned on a 4-byte but not an 8-byte boundary. But even this seems to work with syscalls.
Post 08 Dec 2016, 09:03
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14795
Location: Lost in translation

Tomasz Grysztar wrote:
The problem is that fasm's emulation of the 32-bit stack sometimes leaves it aligned on a 4-byte but not an 8-byte boundary. But even this seems to work with syscalls.

Okay, I should clarify. The 16-byte alignment requirement for Windows is because of the SSE/AVX instructions used in the kernel and the APIs. But if Linux says 8-byte alignment is okay then that suggests that it either doesn't use those instructions, or it uses the non-aligned equivalents. So either way, even a 1-byte alignment (although terrible) would probably still work fine.

But even so, stack alignment requirements do not affect internal calls within an application. Only API calls need to be aligned, so it would appear that just fasm.asm and system.inc would need to have the stack adjustment code.
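
If such adjustment code were ever wanted around the Linux syscalls, it could look roughly like the sketch below. The macro name is hypothetical and nothing like it is claimed to exist in fasm's sources; it also assumes that R12 is free, which seems plausible for a core that only uses the eight legacy registers plus R8 in the emulation macros:

```asm
use64

macro aligned_syscall                   ; hypothetical helper, not part of fasm
{
        mov     r12,rsp                 ; save the possibly misaligned stack pointer
        and     rsp,-16                 ; round down to a 16-byte boundary
        syscall                         ; the kernel preserves R12 (it clobbers RAX, RCX, R11)
        mov     rsp,r12                 ; restore the emulated 32-bit stack position
}

        mov     eax,60                  ; usage example: sys_exit with code 0
        xor     edi,edi
        aligned_syscall
```

Rounding down is safe here because the stack grows downwards, so nothing already stored above the saved RSP is overwritten.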
Post 08 Dec 2016, 09:09
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland
Does anyone here run 64-bit OpenBSD somewhere? I wonder if this would work there.
Post 08 Dec 2016, 16:28
fasmnewbie



Joined: 01 Mar 2011
Posts: 396
Will this in any way affect anything in elf.inc and import64.inc? These changes kind of scare me because I am relying on both for my SO.
Post 08 Dec 2016, 16:33
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14795
Location: Lost in translation

fasmnewbie wrote:
Will this in any way affect anything in elf.inc and import64.inc?

No.
Post 08 Dec 2016, 16:44
rugxulo



Joined: 09 Aug 2005
Posts: 2112
Location: Usono (aka, USA)

revolution wrote:
I have never seen any version of 64-bit Windows that can't run 32-bit code, so perhaps a Windows version is not very important.



Quoting here:


Microsoft wrote:

The Server Core installation option for Windows Server 2008 R2 allows you to uninstall WoW64. WoW64 is now an optional feature that you can uninstall if it is not necessary to run 32-bit code.



Probably somewhat rare, but who knows.


Señor Naïveté wrote:

And with the enormous amount of 32-bit programs still around it would be unbelievable that support would ever be dropped.



If decades of DOS software has been thrown away, why not also Win32?
Post 08 Dec 2016, 19:13
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland
I have cleaned up some things in fasm's core to make this adaptation a bit simpler and I included this 64-bit executable in the Linux package with the new 1.71.58 release. If I only made it available here, it is probable that the people that may find it useful would never find out about it.
Post 08 Dec 2016, 20:30
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14795
Location: Lost in translation

rugxulo wrote:

revolution wrote:
I have never seen any version of 64-bit Windows that can't run 32-bit code, so perhaps a Windows version is not very important.



Quoting here:


Microsoft wrote:

The Server Core installation option for Windows Server 2008 R2 allows you to uninstall WoW64. WoW64 is now an optional feature that you can uninstall if it is not necessary to run 32-bit code.



Probably somewhat rare, but who knows.

So 32-bit is still supported, but can be removed by the user if desired.

rugxulo wrote:

Señor Naïveté wrote:

And with the enormous amount of 32-bit programs still around it would be unbelievable that support would ever be dropped.


Are you sure it is Señor? Anyhow, it doesn't nullify my point that 32-bit code is still supported.

rugxulo wrote:
If decades of DOS software has been thrown away, why not also Win32?

Because DOS code hit hard limits very quickly, 32-bit code is still well within limits for many purposes. There may come a time when 32-bit falls out of favour, but I don't expect that will be any time soon.
Post 09 Dec 2016, 02:47
system error



Joined: 01 Sep 2013
Posts: 477
interesting... Tomasz, how about that nasty Y2038 issue? Limiting syscall arguments to 32-bit may re-introduce that problem in 64-bit environment. timespec is 64-bit the last time I checked.
Post 09 Dec 2016, 10:35
Tomasz Grysztar
Assembly Artist


Joined: 16 Jun 2003
Posts: 6356
Location: Kraków, Poland

system error wrote:
interesting... Tomasz, how about that nasty Y2038 issue? Limiting syscall arguments to 32-bit may re-introduce that problem in 64-bit environment. timespec is 64-bit the last time I checked.

As long as Linux provides a syscall that can return 64-bit time value, fasm should have no problem in using that value.
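
For example, a sketch along these lines (an assumption about how it could be done, not a quote from fasm's interface code) fetches the time with the x86-64 sys_time call and splits the 64-bit result into two 32-bit halves that 32-bit-style code can work with:

```asm
use64

        mov     eax,201                 ; sys_time on x86-64 Linux
        xor     edi,edi                 ; NULL pointer: only return the value
        syscall                         ; RAX = seconds since the Unix epoch, 64 bits wide
        mov     rdx,rax
        shr     rdx,32                  ; EDX:EAX now holds the value as two 32-bit halves
```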
Post 09 Dec 2016, 11:19
JohnFound



Joined: 16 Jun 2003
Posts: 3434
Location: Bulgaria
I missed this thread from the beginning, but it is remarkable. Very interesting workaround.
Post 09 Dec 2016, 11:42
CandyMan



Joined: 04 Sep 2009
Posts: 210
Location: film "CandyMan" directed through Bernard Rose
Tomasz, is it possible to port fasm to 64-bit Windows (like you did for Linux)? Would code like the following be correct?

Code:
close:
        call    PushAll                 ;\
        mov     rbp,rsp                 ;  > #1
        and     rsp,not 0xF             ;/
        invoke  CloseHandle,rbx
        mov     rsp,rbp                 ;\ #1
        call    PopAll                  ;/
        ret

#1 - put this code around every procedure invoked by fasm

PushAll:
        xchg    r15,[rsp]       ;rip
        push    r14 r13 r12 r11 r10 r9 r8 rbp rsi rdi rdx rcx rbx rax
        push    r15
        mov     r15,[rsp+15*8]
        ret

PopAll:
        pop     r15             ;rip
        pop     rax rbx rcx rdx rdi rsi rbp r8 r9 r10 r11 r12 r13 r14
        xchg    [rsp],r15
        ret


_________________
smaller is better
Post 08 Feb 2017, 21:40
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14795
Location: Lost in translation

comrade wrote:
Don't you need to keep the stack aligned to 16-bytes on 64-bit? It is true for Win64 ABI, not sure if likewise for Linux64.

This was already discussed up-thread. The short answer is no for Linux. But even if it was a requirement (like with Win64) it only affects calls to the OS, not your internal code.

Edit: The original message has disappeared. Oh well.
Post 09 Feb 2017, 04:41
Copyright © 2004-2016, Tomasz Grysztar.