flat assembler
Message board for the users of flat assembler.

Index > Linux > fasm as 64-bit ELF executable

Tomasz Grysztar 09 Jul 2017, 13:40
YONG wrote:
BTW, where can I find a full list of fasm reserved names for the 64-bit compiler?
You can find the complete lists in TABLES.INC; they do not depend on the OS or the bitness.
Furs 09 Jul 2017, 14:17
system error wrote:
how about that nasty Y2038 issue?
Off topic, but this reminds me of how much the media overblew the "Y2K" problem, which is sort of funny in its own way.

I wonder how many people back when this was widespread (don't tell me it still is?) actually set their computer clock to 2038 to see a black hole form? I was pretty disappointed myself, as nothing terrible happened; the computer didn't even freeze.

Though you should post that this version of Fasm should only really be used if your kernel can't run 32-bit executables (on very minimalist installs maybe), since it's quite bloated/slower with all the addressing mode prefixes and the "lea + mov" combination instead of push.
Tomasz Grysztar 09 Jul 2017, 16:12
Furs wrote:
Though you should post that this version of Fasm should only really be used if your kernel can't run 32-bit executables (on very minimalist installs maybe), since it's quite bloated/slower with all the addressing mode prefixes and the "lea + mov" combination instead of push.
Yes, definitely. This was my intention, but I probably should have made it clearer: the only rational reason to use this version is when the kernel has 32-bit execution disabled. It gives no other advantages over the 32-bit version and suffers from the adjustments needed to run in long mode.
YONG 10 Jul 2017, 02:11
Tomasz Grysztar wrote:
YONG wrote:
BTW, where can I find a full list of fasm reserved names for the 64-bit compiler?
You can find the complete lists in TABLES.INC; they do not depend on the OS or the bitness.
Yes! Thanks a lot!
keantoken 01 Aug 2017, 07:29
I read that IA64 CPUs and some old VIA CPUs aren't backwards compatible with 32-bit code.
revolution 01 Aug 2017, 10:00
IA64 is the Itanium instruction set. It is not even close to IA32, although the CPUs implementing IA64 do have a compatibility mode to run IA32 code (albeit slowly) within the IA64 OS.
revolution 13 Feb 2019, 01:42
I think the x64 code can be enhanced. The goal is to have it compile an existing 32-bit application for x64 mode with absolutely ZERO changes to the original source. Sounds impossible at first reading, right?

This goal is achievable, somewhat, but not at a 100% success rate. For applications that use the kernel in a simple manner (like fasm) this goal should actually be quite easy. But as you will see later, it is only "easy" if the application itself doesn't have an internal limitation.

So the plan is to use the "-d" feature on the command line to set things up in a manner suitable to compile the application as-is. That means absolutely NO changes at all to the original source code, and if possible no assumptions about which instructions the target application uses. An important point is to take care that all the flags are correctly handled and not corrupted.

To do this fasm is invoked like this:
Code:
fasm <prefix_file> -dTARGET="'awesome_app.asm'" awesome_app    
Note the two sets of quotes around awesome_app.asm. This is a quirk with bash: it strips the outer double quotes, so only the inner single quotes reach fasm. Maybe other shells don't need this.
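To make the quoting a bit more concrete, here is a rough sketch of what the command-line definition amounts to inside the source. It assumes that "-d" behaves like an ordinary symbolic constant definition placed ahead of the first line; the file name is just the one used above.

```
; Sketch only: roughly what -dTARGET="'awesome_app.asm'" sets up before
; the first line of the prefix file is read.
define TARGET 'awesome_app.asm'

; match replaces TARGET with its value and then binds that value to f,
; so the line below is preprocessed as: include 'awesome_app.asm'
match f,TARGET {include f}
```

With only one set of quotes bash would pass the bare name awesome_app.asm, and the include would presumably be rejected because it expects a quoted string.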

The prefix_file does all the setup and creates an environment in which everything fits into the 32-bit address space. It is shown below. It is incomplete and mostly a proof of concept, but it can be expanded to support more options if needed.
Code:
format elf64 executable 0 at 1 shl 16
entry X86_X64_begin
macro format ignore {}
macro entry address {X86_X64_ENTRY = address}

segment executable readable

macro int value {
        if value = 0x80
                mov     r12,rdi
                mov     r13,rsi
                mov     r14,rcx
                mov     r9,rbp
                mov     r8,rdi
                mov     r10,rsi
                mov     edi,ebx
                mov     esi,ecx
                movzx   eax,word[eax * 2 + X86_X64_syscall_translation_table]
                syscall
                mov     rcx,r14
                mov     rsi,r13
                mov     rdi,r12
        else
                int     value
        end if
}
macro X86_X64_pushD [arg] {
        common
                local offset,total
                offset = 0
                lea esp,[esp-total]
        forward
                offset = offset + 4
                if arg eqtype eax
                        mov dword [esp+total-offset],arg
                else
                        mov r11d,dword arg
                        mov [esp+total-offset],r11d
                end if
        common
                total = offset
}
macro X86_X64_popD [arg] {
        common
                local offset
                offset = 0
        forward
                if arg eqtype [mem]
                        mov r11d,[esp+offset]
                        mov dword arg,r11d
                else
                        mov arg,dword [esp+offset]
                end if
                offset = offset + 4
        common
                lea esp,[esp+offset]
}
macro use32 {
        macro push args \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ X86_X64_pushD arg \\}
        \}
        macro pop args \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ X86_X64_popD arg \\}
        \}
        macro pushfd \{
                pushfq
                POP     r11
                push    r11d
        \}
        macro popfd \{
                pop     r11d
                PUSH    r11
                popfq
        \}
        macro X86_X64_do_target target,instr \{
                if target eqtype [0]
                        mov     r10d,target
                        instr
                        JMP     r10
                else if target eqtype dword[0]
                        mov     r10d,target
                        instr
                        JMP     r10
            match =near =dword[addr],target \\{
                else if 1
                        mov     r10d,[addr]
                        instr
                        JMP     r10
            \\}
            irp reg,ax,bx,cx,dx,si,di,bp,sp \\{
                else if target eq near e\\#reg
                        instr
                        JMP     r\\#reg
                else if target eq e\\#reg
                        instr
                        JMP     r\\#reg
            \\}
                else
                        instr
                        JMP     target
                end if
        \}
        macro jmp target \{ X86_X64_do_target target \}
        macro call target \{
                \local ..return
                X86_X64_do_target target,push ..return
            ..return:
        \}
        macro ret \{
                pop     r11d
                jmp     r11
        \}
        macro retn \{ret\}
        macro das \{
                \local  ..adjust_by_6,..clear_AF,..low_nibble_done,..adjust_by_60,..high_nibble_okay
                pushfw
                mov     r11b,al
                shl     r11,8
                mov     r11b,al
                and     r11b,0xf
                cmp     r11b,9
                ja      ..adjust_by_6
                test    byte[esp],X86_X64_AF
                jz      ..clear_AF
            ..adjust_by_6:
                sub     al,6
                setc    r11b
                or      r11b,X86_X64_AF
                assert  X86_X64_CF = 0x01
                or      byte[esp],r11b
                JMP     ..low_nibble_done
            ..clear_AF:
                and     byte[esp],not X86_X64_AF
            ..low_nibble_done:
                cmp     r11w,0x99ff
                ja      ..adjust_by_60
                test    byte[esp],X86_X64_CF
                jz      ..high_nibble_okay
            ..adjust_by_60:
                sub     al,0x60
                or      byte[esp],X86_X64_CF
            ..high_nibble_okay:
                popfw
        \}
        macro loop target \{
                \local  ..dont_go
                if $ + 2 - target > 128 | $ + 2 - target < -127
                        pushfw
                        dec     ecx
                        jz      ..dont_go       ;loop only jumps while ecx is non-zero
                        popfw
                        jmp     target
                    ..dont_go:
                        popfw
                else
                        loop    target
                end if
        \}
        macro jcxz target \{
                \local  ..dont_go
                pushfw
                test    cx,cx
                jnz     ..dont_go
                popfw
                jmp     target
            ..dont_go:
                popfw
        \}
        macro salc \{
                pushfw
                setc    al
                neg     al
                popfw
        \}
        USE64
}
macro use16 {
        purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
        use16
}
macro use64 {
        purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
        use64
}

X86_X64_CF                      = 0x01
X86_X64_AF                      = 0x10
X86_X64_STACK_TARGET            = 1 shl 32
X86_X64_STACK_SIZE              = 1 shl 22
X86_X64_PAGE_SIZE               = 1 shl 12
X86_X64_ENOMEM                  = 12
X86_X64_AT_SYSINFO_EHDR         = 33
X86_X64_MADV_NORMAL             = 0
X86_X64_PROT_READ               = 0x1
X86_X64_PROT_WRITE              = 0x2
X86_X64_MAP_PRIVATE             = 0x2
X86_X64_MAP_FIXED               = 0x10
X86_X64_MAP_ANONYMOUS           = 0x20
X86_X64_MAP_GROWSDOWN           = 0x100
X86_X64_MAP_FIXED_NOREPLACE     = 0x100000

X86_X64_SYS32_EXIT              = 1
X86_X64_SYS32_READ              = 3
X86_X64_SYS32_WRITE             = 4
X86_X64_SYS32_OPEN              = 5
X86_X64_SYS32_CLOSE             = 6
X86_X64_SYS32_TIME              = 13
X86_X64_SYS32_LSEEK             = 19
X86_X64_SYS32_MMAP              = 90
X86_X64_SYS32_BRK               = 45
X86_X64_SYS32_MADVISE           = 219

X86_X64_SYS64_READ              = 0
X86_X64_SYS64_WRITE             = 1
X86_X64_SYS64_OPEN              = 2
X86_X64_SYS64_CLOSE             = 3
X86_X64_SYS64_LSEEK             = 8
X86_X64_SYS64_MMAP              = 9
X86_X64_SYS64_BRK               = 12
X86_X64_SYS64_MADVISE           = 28
X86_X64_SYS64_EXIT              = 60
X86_X64_SYS64_TIME              = 201

X86_X64_syscall_translation_table rw 376

macro SYS_TRANSLATE [func] {forward store word X86_X64_SYS64_#func at X86_X64_SYS32_#func * 2 + X86_X64_syscall_translation_table}
        SYS_TRANSLATE EXIT,READ,WRITE,OPEN,CLOSE,TIME,LSEEK,MMAP,BRK,MADVISE
purge SYS_TRANSLATE

X86_X64_begin:
        xor     r9,r9                   ;offset
        or      r8,-1                   ;fd
        mov     r10,X86_X64_MAP_PRIVATE or X86_X64_MAP_ANONYMOUS or X86_X64_MAP_FIXED or X86_X64_MAP_GROWSDOWN or X86_X64_MAP_FIXED_NOREPLACE
        mov     edx,X86_X64_PROT_READ or X86_X64_PROT_WRITE
        mov     esi,X86_X64_STACK_SIZE
        mov     edi,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
        mov     eax,X86_X64_SYS64_MMAP
        syscall
        cmp     eax,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
        jnz     .failed

        mov     rdi,rsp
        and     rdi,-X86_X64_PAGE_SIZE
    .loop_find_top_of_stack:
        add     rdi,X86_X64_PAGE_SIZE
        mov     edx,X86_X64_MADV_NORMAL
        mov     esi,1
        mov     eax,X86_X64_SYS64_MADVISE
        syscall
        cmp     rax,-X86_X64_ENOMEM
        jnz     .loop_find_top_of_stack

        ;copy the stack into low memory
        mov     rcx,rdi
        lea     rsi,[rdi - 8]
        mov     edi,X86_X64_STACK_TARGET - 8
        sub     rcx,rsp
        shr     rcx,3
        std
        rep     movsq
        cld

        sub     rsi,rdi         ;rsi = conversion offset
        neg     rsi
        add     edi,8
        mov     esp,edi
        mov     edx,edi
        mov     ebx,2
    .convert_argv_env:
        mov     rax,[edi]
        cmp     rax,rsp
        lea     rcx,[rax + rsi]
        cmovae  rax,rcx
        mov     [edx],eax
        add     edi,8
        add     edx,4
        test    eax,eax
        jnz     .convert_argv_env
        dec     ebx
        jnz     .convert_argv_env

    .convert_auxv:
        ;note that the AT_SYSINFO_EHDR value won't be valid, so it gets removed
        mov     rax,[rdi]
        mov     rbx,[rdi + 8]
        cmp     rbx,rsp
        lea     rcx,[rbx + rsi]
        cmovae  rbx,rcx
        mov     [edx],eax
        mov     [edx + 4],ebx
        add     edi,16
        lea     ecx,[edx + 8]
        cmp     eax,X86_X64_AT_SYSINFO_EHDR
        cmovnz  edx,ecx
        test    eax,eax
        jnz     .convert_auxv
        jmp     X86_X64_ENTRY
    .failed:
        or      rdi,-1
        mov     eax,X86_X64_SYS64_EXIT
        syscall

use32
match f,TARGET {include f}    
All variables and labels are prefixed with "X86_X64_" so as to not clash with anything defined within the target application.
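As an illustration of what the push replacement generates, the following is a hand-written sketch of the code that "push eax ebx" and "push 5" should turn into once the macros are active. It is derived by reading X86_X64_pushD, not taken from an actual listing, so treat it as an approximation.

```
use64
; push eax ebx
        lea     esp,[esp-8]             ; reserve two dword slots, flags untouched
        mov     dword [esp+4],eax       ; first operand lands at the higher address
        mov     dword [esp],ebx         ; last operand ends up at the top of the stack

; push 5
        lea     esp,[esp-4]
        mov     r11d,5                  ; non-register operands go through r11d
        mov     [esp],r11d
```

Every one of these memory accesses uses a 32-bit address in long mode, which is where the extra prefixes and the size increase come from.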

And it works. However, as I mentioned, it doesn't work for everything. Notably, it fails for fasm itself because of the increase in code size. Specifically, this error occurs:
Code:
flat assembler  version 1.73.08  (4014015 kilobytes memory)
..\TABLES.INC [672]:
 dw adx_instruction-instruction_handler
processed: dw adx_instruction-instruction_handler
error: value out of range.    
The code expands so much that the handler offsets, which TABLES.INC stores as 16-bit words, exceed the range of 64kB.
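The same kind of failure can be reproduced in isolation. This is a minimal sketch with made-up labels (base_handler, far_handler and table are not fasm source names); the second table entry should be rejected in the same way as the line from TABLES.INC.

```
use32
base_handler:                           ; stands in for instruction_handler
        ret
        rb 0x10000                      ; padding that pushes the next label past 64kB
far_handler:                            ; stands in for a handler that ended up too far away
        ret
table:
        dw base_handler-base_handler    ; 0, fits in a word
        dw far_handler-base_handler     ; 0x10001, does not fit in a word
```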


Last edited by revolution on 14 Feb 2019, 05:58; edited 1 time in total
revolution 13 Feb 2019, 11:12
It is possible to assemble fasm using this method. It just needs to be made a bit more memory efficient to overcome the 64kB limitation imposed by the structure of the TABLES.INC file.

So the first step is to make a common call gate which saves us a few bytes each time it is used.
Code:
        macro call target \{
                \local ..return
                if target eqtype [0]
                        mov     r10d,target
                        call    X86_X64_call_gate
                else if target eqtype dword[0]
                        mov     r10d,target
                        call    X86_X64_call_gate
            match =near =dword[addr],target \\{
                else if 1
                        mov     r10d,[addr]
                        call    X86_X64_call_gate
            \\}
            irp reg,eax,ebx,ecx,edx,esi,edi,ebp,esp \\{
                else if target eq near reg
                        mov     r10d,reg
                        call    X86_X64_call_gate
                else if target eq reg
                        mov     r10d,reg
                        call    X86_X64_call_gate
            \\}
                else
                        mov     r10d,target
                        call    X86_X64_call_gate
                end if
            ..return:
        \}

;...

X86_X64_call_gate:
        POP     r11
        push    r11d
        jmp     r10    
The next step is to create an int 0x80 gate:
Code:
macro int value {
        if value = 0x80
                CALL    X86_X64_int_gate
        else
                int     value
        end if
}

;...

X86_X64_int_gate:
        mov     r12,rdi
        mov     r13,rsi
        mov     r14,rcx
        mov     r9,rbp
        mov     r8,rdi
        mov     r10,rsi
        mov     edi,ebx
        mov     esi,ecx
        movzx   eax,word[eax * 2 + X86_X64_syscall_translation_table]
        syscall
        mov     rcx,r14
        mov     rsi,r13
        mov     rdi,r12
        RET    
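For reference, the register shuffling in the gate follows from the two kernel calling conventions: the 32-bit interface takes its arguments in ebx, ecx, edx, esi, edi, ebp, while the 64-bit one takes them in rdi, rsi, rdx, r10, r8, r9. A small standalone sketch, not part of the prefix file and with hypothetical labels, that does the same translation by hand for a single write call:

```
format ELF64 executable 3
entry start

segment readable executable
start:
        ; what a 32-bit program would load before "int 0x80" for write(1,msg,msg_len)
        mov     eax,4                   ; 32-bit number of write
        mov     ebx,1                   ; fd
        mov     ecx,msg                 ; buffer
        mov     edx,msg_len             ; length, edx is the third argument in both ABIs

        ; the translation done by hand
        mov     edi,ebx                 ; first argument:  ebx -> rdi
        mov     esi,ecx                 ; second argument: ecx -> rsi
        mov     eax,1                   ; 64-bit number of write
        syscall                         ; note: clobbers rcx and r11

        xor     edi,edi                 ; exit code 0
        mov     eax,60                  ; 64-bit number of exit
        syscall

segment readable
msg     db 'hello from the translated call',10
msg_len = $ - msg
```

The gate does the number lookup through the table instead of hard-coding it, and it saves rdi, rsi and rcx so that the 32-bit code sees them unchanged afterwards.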
But this still wasn't enough, although it does get a bit further before erroring, so there is progress.

The next step is to move the line "include '..\exprcalc.inc'" up one place. Now it looks like this
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'
include '..\assemble.inc'
include '..\formats.inc'
;...    
This gets further and leaves ~300 AVX instructions that can't be reached with a 16-bit offset. But the first rule above, not to make any changes to the code, is now broken. Oh well. This approach is more general IMO and less prone to bugs, so just carry on and see how to get it working with only minimal structural changes and no changes to the code logic.

The next step is to move formats.inc up one line. Now it looks like this
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'
include '..\formats.inc'
include '..\assemble.inc'
include '..\x86_64.inc'
;...    
But that include ordering breaks fasm, because some of the labels in formats.inc are entry points referenced from the instruction tables. It turns out that only nine labels need to stay where the tables expect them. So with a bit of EQU magic we have the final layout.
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'

data_directive          equ moved_data_directive
heap_directive          equ moved_heap_directive
entry_directive         equ moved_entry_directive
extrn_directive         equ moved_extrn_directive
stack_directive         equ moved_stack_directive
format_directive        equ moved_format_directive
public_directive        equ moved_public_directive
section_directive       equ moved_section_directive
segment_directive       equ moved_segment_directive

include '..\formats.inc'

restore data_directive
restore heap_directive
restore entry_directive
restore extrn_directive
restore stack_directive
restore format_directive
restore public_directive
restore section_directive
restore segment_directive

include '..\assemble.inc'

data_directive:         jmp moved_data_directive
heap_directive:         jmp moved_heap_directive
entry_directive:        jmp moved_entry_directive
extrn_directive:        jmp moved_extrn_directive
stack_directive:        jmp moved_stack_directive
format_directive:       jmp moved_format_directive
public_directive:       jmp moved_public_directive
section_directive:      jmp moved_section_directive
segment_directive:      jmp moved_segment_directive

include '..\x86_64.inc'
;...    
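Reduced to a single label, the rename-and-trampoline pattern used in that layout looks roughly like the sketch below. The names foo_directive, base and table are placeholders for illustration only.

```
use32
foo_directive equ moved_foo_directive   ; from here on every foo_directive token is renamed

foo_directive:                          ; preprocessed as "moved_foo_directive:"
        xor     eax,eax                 ; placeholder for the real handler body
        ret

restore foo_directive                   ; the rename stops here

base:                                   ; stands in for instruction_handler
        nop                             ; stands in for the code of assemble.inc
foo_directive:                          ; the original name, now just a trampoline
        jmp     moved_foo_directive

table:
        dw foo_directive-base           ; table entries keep using the old name
```

The handler body can then live anywhere, and only the short jump has to stay within reach of the 16-bit offsets.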
And it assembles with this:
Code:
fasm <prefix_file> -dTARGET="'fasm.asm'" fasm    
The file grows in size by about 30kB. It works just fine. And no logic changes are needed, just the layout changes to compensate for the 64kB limitation.

And the prefix_file is this:
Code:
format elf64 executable 0 at 1 shl 16
entry X86_X64_begin
macro format ignore {}
macro entry address {X86_X64_ENTRY = address}

segment executable readable

macro int value {
        if value = 0x80
                CALL    X86_X64_int_gate
        else
                int     value
        end if
}
macro X86_X64_pushD [arg] {
        common
                local offset,total
                offset = 0
                lea esp,[esp-total]
        forward
                offset = offset + 4
                if arg eqtype eax
                        mov dword [esp+total-offset],arg
                else
                        mov r11d,dword arg
                        mov [esp+total-offset],r11d
                end if
        common
                total = offset
}
macro X86_X64_popD [arg] {
        common
                local offset
                offset = 0
        forward
                if arg eqtype [mem]
                        mov r11d,[esp+offset]
                        mov dword arg,r11d
                else
                        mov arg,dword [esp+offset]
                end if
                offset = offset + 4
        common
                lea esp,[esp+offset]
}
macro use32 {
        macro push args \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ X86_X64_pushD arg \\}
        \}
        macro pop args \{
                local list,arg,status
                define list
                define arg
                irps sym, args \\{
                        define status
                        match =dword, sym \\\{
                                define status :
                        \\\}
                        match [any, status arg sym \\\{
                                define arg [any
                                match [mem], arg \\\\{
                                        match previous, list \\\\\{ define list previous,[mem] \\\\\}
                                        match , list \\\\\{ define list [mem] \\\\\}
                                        define arg
                                \\\\}
                                define status :
                        \\\}
                        match [, status arg sym \\\{
                                define arg [
                                define status :
                        \\\}
                        match , status \\\{
                                match previous, list \\\\{ define list previous,sym \\\\}
                                match , list \\\\{ define list sym \\\\}
                        \\\}
                \\}
                match arg, list \\{ X86_X64_popD arg \\}
        \}
        macro pushfd \{
                pushfq
                POP     r11
                push    r11d
        \}
        macro popfd \{
                pop     r11d
                PUSH    r11
                popfq
        \}
        macro jmp target \{
                if target eqtype [0]
                        mov     r10d,target
                        JMP     r10
                else if target eqtype dword[0]
                        mov     r10d,target
                        JMP     r10
            match =near =dword[addr],target \\{
                else if 1
                        mov     r10d,[addr]
                        JMP     r10
            \\}
            irp reg,ax,bx,cx,dx,si,di,bp,sp \\{
                else if target eq near e\\#reg
                        JMP     r\\#reg
                else if target eq e\\#reg
                        JMP     r\\#reg
            \\}
                else
                        JMP     target
                end if
        \}
        macro call target \{
                \local ..return
                if target eqtype [0]
                        mov     r10d,target
                        call    X86_X64_call_gate
                else if target eqtype dword[0]
                        mov     r10d,target
                        call    X86_X64_call_gate
            match =near =dword[addr],target \\{
                else if 1
                        mov     r10d,[addr]
                        call    X86_X64_call_gate
            \\}
            irp reg,eax,ebx,ecx,edx,esi,edi,ebp,esp \\{
                else if target eq near reg
                        mov     r10d,reg
                        call    X86_X64_call_gate
                else if target eq reg
                        mov     r10d,reg
                        call    X86_X64_call_gate
            \\}
                else
                        mov     r10d,target
                        call    X86_X64_call_gate
                end if
            ..return:
        \}
        macro ret \{
                pop     r11d
                jmp     r11
        \}
        macro retn \{ret\}
        macro das \{
                \local  ..adjust_by_6,..clear_AF,..low_nibble_done,..adjust_by_60,..high_nibble_okay
                pushfw
                mov     r11b,al
                shl     r11,8
                mov     r11b,al
                and     r11b,0xf
                cmp     r11b,9
                ja      ..adjust_by_6
                test    byte[esp],X86_X64_AF
                jz      ..clear_AF
            ..adjust_by_6:
                sub     al,6
                setc    r11b
                or      r11b,X86_X64_AF
                assert  X86_X64_CF = 0x01
                or      byte[esp],r11b
                JMP     ..low_nibble_done
            ..clear_AF:
                and     byte[esp],not X86_X64_AF
            ..low_nibble_done:
                cmp     r11w,0x99ff
                ja      ..adjust_by_60
                test    byte[esp],X86_X64_CF
                jz      ..high_nibble_okay
            ..adjust_by_60:
                sub     al,0x60
                or      byte[esp],X86_X64_CF
            ..high_nibble_okay:
                popfw
        \}
        macro loop target \{
                \local  ..dont_go
                if $ + 2 - target > 128 | $ + 2 - target < -127
                        pushfw
                        dec     ecx
                        jz      ..dont_go       ;loop only jumps while ecx is non-zero
                        popfw
                        jmp     target
                    ..dont_go:
                        popfw
                else
                        loop    target
                end if
        \}
        macro jcxz target \{
                \local  ..dont_go
                pushfw
                test    cx,cx
                jnz     ..dont_go
                popfw
                jmp     target
            ..dont_go:
                popfw
        \}
        macro salc \{
                pushfw
                setc    al
                neg     al
                popfw
        \}
        USE64
}
macro use16 {
        purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
        use16
}
macro use64 {
        purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
        use64
}

X86_X64_CF                      = 0x01
X86_X64_AF                      = 0x10
X86_X64_STACK_TARGET            = 1 shl 32
X86_X64_STACK_SIZE              = 1 shl 22
X86_X64_PAGE_SIZE               = 1 shl 12
X86_X64_ENOMEM                  = 12
X86_X64_AT_SYSINFO_EHDR         = 33
X86_X64_MADV_NORMAL             = 0
X86_X64_PROT_READ               = 0x1
X86_X64_PROT_WRITE              = 0x2
X86_X64_MAP_PRIVATE             = 0x2
X86_X64_MAP_FIXED               = 0x10
X86_X64_MAP_ANONYMOUS           = 0x20
X86_X64_MAP_GROWSDOWN           = 0x100
X86_X64_MAP_FIXED_NOREPLACE     = 0x100000

X86_X64_SYS32_EXIT              = 1
X86_X64_SYS32_READ              = 3
X86_X64_SYS32_WRITE             = 4
X86_X64_SYS32_OPEN              = 5
X86_X64_SYS32_CLOSE             = 6
X86_X64_SYS32_TIME              = 13
X86_X64_SYS32_LSEEK             = 19
X86_X64_SYS32_GETTIMEOFDAY      = 78
X86_X64_SYS32_MMAP              = 90
X86_X64_SYS32_BRK               = 45
X86_X64_SYS32_MADVISE           = 219

X86_X64_SYS64_READ              = 0
X86_X64_SYS64_WRITE             = 1
X86_X64_SYS64_OPEN              = 2
X86_X64_SYS64_CLOSE             = 3
X86_X64_SYS64_LSEEK             = 8
X86_X64_SYS64_MMAP              = 9
X86_X64_SYS64_BRK               = 12
X86_X64_SYS64_MADVISE           = 28
X86_X64_SYS64_EXIT              = 60
X86_X64_SYS64_GETTIMEOFDAY      = 96
X86_X64_SYS64_TIME              = 201

X86_X64_syscall_translation_table rw 376

macro SYS_TRANSLATE [func] {forward store word X86_X64_SYS64_#func at X86_X64_SYS32_#func * 2 + X86_X64_syscall_translation_table}
        SYS_TRANSLATE EXIT,READ,WRITE,OPEN,CLOSE,TIME,LSEEK,GETTIMEOFDAY,MMAP,BRK,MADVISE
purge SYS_TRANSLATE

X86_X64_begin:
        ;map a new stack that ends at the 4GB boundary, so every stack address fits in 32 bits
        xor     r9,r9                   ;offset
        or      r8,-1                   ;fd
        mov     r10,X86_X64_MAP_PRIVATE or X86_X64_MAP_ANONYMOUS or X86_X64_MAP_FIXED or X86_X64_MAP_GROWSDOWN or X86_X64_MAP_FIXED_NOREPLACE
        mov     edx,X86_X64_PROT_READ or X86_X64_PROT_WRITE
        mov     esi,X86_X64_STACK_SIZE
        mov     edi,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
        mov     eax,X86_X64_SYS64_MMAP
        syscall
        cmp     eax,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
        jnz     .failed

        ;find the top of the original stack: step up a page at a time until madvise returns ENOMEM
        mov     rdi,rsp
        and     rdi,-X86_X64_PAGE_SIZE
    .loop_find_top_of_stack:
        add     rdi,X86_X64_PAGE_SIZE
        mov     edx,X86_X64_MADV_NORMAL
        mov     esi,1
        mov     eax,X86_X64_SYS64_MADVISE
        syscall
        cmp     rax,-X86_X64_ENOMEM
        jnz     .loop_find_top_of_stack

        ;copy the stack into low memory
        mov     rcx,rdi
        lea     rsi,[rdi - 8]
        mov     edi,X86_X64_STACK_TARGET - 8
        sub     rcx,rsp
        shr     rcx,3
        std
        rep     movsq
        cld

        sub     rsi,rdi         ;rsi = conversion offset
        neg     rsi
        add     edi,8
        mov     esp,edi
        mov     edx,edi
        mov     ebx,2
    .convert_argv_env:
        mov     rax,[edi]
        cmp     rax,rsp
        lea     rcx,[rax + rsi]
        cmovae  rax,rcx
        mov     [edx],eax
        add     edi,8
        add     edx,4
        test    eax,eax
        jnz     .convert_argv_env
        dec     ebx
        jnz     .convert_argv_env

    .convert_auxv:
        ;note that the AT_SYSINFO_EHDR value won't be valid, so it gets removed
        mov     rax,[rdi]
        mov     rbx,[rdi + 8]
        cmp     rbx,rsp
        lea     rcx,[rbx + rsi]
        cmovae  rbx,rcx
        mov     [edx],eax
        mov     [edx + 4],ebx
        add     edi,16
        lea     ecx,[edx + 8]
        cmp     eax,X86_X64_AT_SYSINFO_EHDR
        cmovnz  edx,ecx
        test    eax,eax
        jnz     .convert_auxv
        jmp     X86_X64_ENTRY
    .failed:
        or      rdi,-1
        mov     eax,X86_X64_SYS64_EXIT
        syscall

use32

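X86_X64_int_gate:
        ;translate an int 0x80 call (eax = number, args in ebx,ecx,edx,esi,edi,ebp)
        ;into a syscall (rax = number, args in rdi,rsi,rdx,r10,r8,r9)
        mov     r12,rdi                 ;save rdi, rsi and rcx (syscall itself clobbers rcx and r11)
        mov     r13,rsi
        mov     r14,rcx
        mov     r9,rbp                  ;arg 6
        mov     r8,rdi                  ;arg 5
        mov     r10,rsi                 ;arg 4
        mov     edi,ebx                 ;arg 1
        mov     esi,ecx                 ;arg 2 (arg 3 is already in edx)
        movzx   eax,word[eax * 2 + X86_X64_syscall_translation_table]
        syscall
        mov     rcx,r14
        mov     rsi,r13
        mov     rdi,r12
        RET                             ;upper case bypasses the ret macro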

X86_X64_call_gate:
        POP     r11
        push    r11d
        jmp     r10

match f,TARGET {include f}    
Note that the int 0x80 gate needs more logic if the code uses kernel calls that return 64-bit data structures or pointers. The most likely case is mmap, which would need to be caught in the gate function to force the allocation below 4GB. fasm only uses brk, so we don't have that problem here.
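
As a rough illustration of catching mmap in the gate, one option might be to OR in Linux's MAP_32BIT flag (0x40 on x86-64), which asks the kernel to place the mapping in the first 2GB. This is only a sketch and not part of the code above. The constant name X86_X64_MAP_32BIT and the label .not_mmap are made up here, and it assumes the caller passes the mmap arguments in registers as the gate's mapping implies. The kernel ignores MAP_32BIT when MAP_FIXED is set. The added cmp/or change the flags, so they are saved and restored around the test.

```asm
X86_X64_MAP_32BIT               = 0x40  ;hypothetical name; Linux x86-64 flag value

X86_X64_int_gate:
        mov     r12,rdi
        mov     r13,rsi
        mov     r14,rcx
        mov     r9,rbp
        mov     r8,rdi
        mov     r10,rsi
        mov     edi,ebx
        mov     esi,ecx
        movzx   eax,word[eax * 2 + X86_X64_syscall_translation_table]
        pushfq                          ;the cmp/or below modify the flags
        cmp     eax,X86_X64_SYS64_MMAP
        jnz     .not_mmap
        or      r10d,X86_X64_MAP_32BIT  ;r10 holds the flags argument of the 64-bit mmap
    .not_mmap:
        popfq                           ;restore the caller's flags before entering the kernel
        syscall
        mov     rcx,r14
        mov     rsi,r13
        mov     rdi,r12
        RET
```

Other calls that return pointers or wide structures would each need their own special case in the same place.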

Also, we can't link to any external libraries without a lot more work. Quite probably many libraries simply can't be used without some extensive 32-bit/64-bit interfacing logic. That is well beyond the scope of this code.


Last edited by revolution on 14 Feb 2019, 05:59; edited 1 time in total
Post 13 Feb 2019, 11:12
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 13 Feb 2019, 17:46
I'm starting to feel that perhaps you should take over further development of fasm 1; it seems you have much more enthusiasm for tuning this old engine than I do. Working with fasmg is so much more satisfying to me that it is often hard to get back to doing something in fasm 1.

What you did here seems like a natural extension of my ideas, starting with complete emulation of instructions like JCXZ and SALC (which I annotated in fasm's core as not needing to preserve flags, as fasm does not depend on this feature anywhere) and so on.

On the other hand, what fascinates me even more is going in a very different direction: writing code that has exactly the same binary form in 32-bit and 64-bit mode and does its job correctly in both cases. I talked about this on my stream and demonstrated that it can be applied at least partially to the fasmg sources, when I modified the instruction encoder so that it would not generate 67h prefixes for addresses. This works correctly only when the code follows some rules: it cannot use negative offsets stored in registers (they would need to be sign-extended when used in 64-bit addressing, but the "clearing upper part of register" rule only provides zero-extension). The source of fasmg is written in a very specific style that allows such tricks to be applied (it is very important, however, to have any such assumptions clearly documented in the source, so that one can avoid breaking the rules when modifying the code).

But your, one could say "opposite", approach provides much more safety and freedom of coding style.
Post 13 Feb 2019, 17:46
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 13 Feb 2019, 19:07
Tomasz Grysztar wrote:
I'm starting to feel that perhaps you should take over further development of fasm 1; it seems you have much more enthusiasm for tuning this old engine than I do. Working with fasmg is so much more satisfying to me that it is often hard to get back to doing something in fasm 1.
I haven't made any attempt to make my code additions fit the fasm style. It would look really odd with the two different styles mixed together.
Tomasz Grysztar wrote:
What you did here seems like a natural extension of my ideas, starting with complete emulation of instructions like JCXZ and SALC (which I annotated in fasm's core as not needing to preserve flags, as fasm does not depend on this feature anywhere) and so on.
I wanted something more generic, not just for fasm. So I made sure the stack contents after a call were accurate, and that the flags are treated correctly even for rarely used instructions like das.
Tomasz Grysztar wrote:
On the other hand, what fascinates me even more is going in a very different direction: writing code that has exactly the same binary form in 32-bit and 64-bit mode and does its job correctly in both cases. I talked about this on my stream and demonstrated that it can be applied at least partially to the fasmg sources, when I modified the instruction encoder so that it would not generate 67h prefixes for addresses. This works correctly only when the code follows some rules: it cannot use negative offsets stored in registers (they would need to be sign-extended when used in 64-bit addressing, but the "clearing upper part of register" rule only provides zero-extension). The source of fasmg is written in a very specific style that allows such tricks to be applied (it is very important, however, to have any such assumptions clearly documented in the source, so that one can avoid breaking the rules when modifying the code).

But your, one could say "opposite", approach provides much more safety and freedom of coding style.
The negative offset thing is a good point. I missed that in my analysis. I suppose a large positive offset has the same problem, as does an index register that is scaled to shift out upper bits holding flags or other data. But removing the 0x67 prefix with the wrapper code would require lots of macros to identify all the memory accesses. Another radical option would be to coax the kernel into making shadow mappings for all the addresses up to 32GB: just keep repeating the same physical addresses at every 4GB step. But x86 uses virtual address caching, so perhaps this idea wouldn't work so well. The cache subsystem could hold multiple values for the same physical address.
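
To make the negative offset problem concrete, here is a small sketch comparing the same access with and without the 0x67 prefix in 64-bit mode. The labels demo, buffer and buffer_end are hypothetical, and the fragment assumes buffer_end lies below 4GB.

```asm
use64
demo:
        mov     ebx,buffer_end          ;assumes buffer_end fits in 32 bits
        mov     ecx,-4                  ;writing ecx zero-extends: rcx = 0x00000000FFFFFFFC
        mov     eax,[ebx + ecx]         ;0x67 prefix: the address wraps at 32 bits = buffer_end - 4
        mov     eax,[rbx + rcx]         ;no prefix: buffer_end + 0xFFFFFFFC, almost 4GB too high
        movsxd  rcx,ecx                 ;explicit sign extension: rcx = -4
        mov     eax,[rbx + rcx]         ;buffer_end - 4 again
        ret

buffer          rd 16
buffer_end:
```

The second load is the one that goes wrong once the prefix is dropped, which is why code meant to run unchanged in both modes has to keep such offsets non-negative or sign-extend them explicitly.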


Last edited by revolution on 14 Feb 2019, 11:55; edited 1 time in total
Post 13 Feb 2019, 19:07
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 13 Feb 2019, 19:27
Another problem with this approach arises if the target application wants to know its EIP value and uses this:
Code:
call $ + 5
pop eax    
The assumption that the next instruction is 5 bytes ahead is broken.
Post 13 Feb 2019, 19:27
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8367
Location: Kraków, Poland
Tomasz Grysztar 13 Feb 2019, 19:41
revolution wrote:
Another problem with this approach arises if the target application wants to know its EIP value and uses this:
Code:
call $ + 5
pop eax    
The assumption that the next instruction is 5 bytes ahead is broken.
I have always strongly disliked this kind of coding and never use such idioms myself. My approach to assembly language has always been to treat it as an abstraction over machine code, one concerned with what an instruction should do and not how it is encoded, leaving the encoding choices (with possible optimization) to the assembler. With this approach in mind:
Code:
call $ + 5    
is not a good way to express the same idea as:
Code:
call @f
@@:    
These two snippets differ semantically and only the second one really conveys what someone using the first one perhaps thought it should mean.
Post 13 Feb 2019, 19:41
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 14 Feb 2019, 06:15
Code1:
Code:
call $ + 5
pop eax    
Code2:
Code:
call @f
@@:
pop eax    
Code 1 above will fail with both of the x64 approaches given in this thread.

Code 2 will succeed only with the second approach, where the native version of call is not used.

I remember seeing 8086 code from the era when delays between port accesses were needed.
Code:
jmp $+2    
Some of the old assemblers of that era would allocate three bytes for the jmp instruction on the first pass, and then replace it with a two-byte jmp and a one-byte nop on the second pass. This could also be seen in many .com files, where the first two instructions were often a two-byte jmp and a nop, followed by the global data bytes.
Post 14 Feb 2019, 06:15
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 15 Feb 2019, 15:32
revolution wrote:
The next step is to move formats.inc up one line ... But with that include ordering we will break fasm because some of the labels are entry points from the instruction tables. It turns out there are only nine labels that are required to be in the correct place.
Actually this is not correct. There is one other place that needs adjustment. The "end data" statement (used for manual placement of PE relocations) fails because of this line:
Code:
        mov     word [ebx],data_directive-instruction_handler    
The value placed into the structures buffer is the wrong offset, so we can't use the code as-is if the data/end data block is used.
Post 15 Feb 2019, 15:32
petelomax



Joined: 19 Jan 2013
Posts: 11
Location: London
petelomax 19 Feb 2020, 14:11
I know this is a bit late, but when I saw this
Code:
lea rsp,[rsp-4]
mov [rsp],eax    

my immediate thought was: wouldn't doing it this way round
Code:
mov [rsp-4],eax
lea rsp,[rsp-4]    

save an AGI stall, at the cost of 1 byte? So I ran a quick test, and to my surprise they appear to run at the same speed. Do modern processors not suffer AGI stalls anymore?
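
For anyone wanting to repeat this kind of comparison, a minimal timing sketch might look like the following. It is not the test that was actually run. The names ITERATIONS, read_tsc, ticks and the loop labels are all made up for illustration, and the extra lea that undoes the stack adjustment is only there to keep the stack balanced. rdtsc is not serialising and the result depends on frequency scaling and whatever else the machine is doing, so treat the numbers as a rough indication at best.

```asm
format ELF64 executable 3
entry start

ITERATIONS = 100000000                  ;arbitrary loop count

macro read_tsc reg {                    ;hypothetical helper: reg = 64-bit time stamp counter
        rdtsc                           ;edx:eax = counter, upper halves of rax and rdx are cleared
        shl     rdx,32
        or      rax,rdx
        mov     reg,rax
}

segment readable executable

start:
        read_tsc r8
        mov     ecx,ITERATIONS
    .variant_a:
        lea     rsp,[rsp-4]             ;adjust first, then store
        mov     [rsp],eax
        lea     rsp,[rsp+4]             ;undo, so the stack stays balanced
        dec     ecx
        jnz     .variant_a
        read_tsc r9
        mov     ecx,ITERATIONS
    .variant_b:
        mov     [rsp-4],eax             ;store first, then adjust
        lea     rsp,[rsp-4]
        lea     rsp,[rsp+4]
        dec     ecx
        jnz     .variant_b
        read_tsc r10

        sub     r10,r9                  ;ticks spent in variant B
        sub     r9,r8                   ;ticks spent in variant A
        mov     [ticks],r9
        mov     [ticks+8],r10

        mov     edx,16                  ;write(1, ticks, 16): two raw 64-bit values
        lea     rsi,[ticks]
        mov     edi,1
        mov     eax,1
        syscall
        xor     edi,edi                 ;exit(0)
        mov     eax,60
        syscall

segment readable writeable

ticks   dq ?,?
```

The program writes the two tick counts as raw binary, so the output needs to be piped through something like od -t u8 to be readable. Running it several times and comparing the spread between runs is more informative than any single pair of numbers.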
Post 19 Feb 2020, 14:11
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 19 Feb 2020, 15:33
Would this effect outweigh effects caused by caching, context switching, paging? Did you take into account stuff like speculative/out-of-order execution? Reorder buffers are there for a reason.
Post 19 Feb 2020, 15:33
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 19 Feb 2020, 17:14
petelomax wrote:
So I ran a quick test ...
Thank you for testing and not simply assuming an outcome.

For something as short-running as fasm, any such micro-optimisations will be so completely swamped by the process start-up and I/O overheads that you won't be able to measure such tiny differences in execution speed. So don't worry about it unless it takes many minutes to run and you need it to be 0.1 seconds faster.
Post 19 Feb 2020, 17:14
petelomax



Joined: 19 Jan 2013
Posts: 11
Location: London
petelomax 21 Feb 2020, 20:15
DUH: on coming back to this I realise that an AGI stall only really matters on a read, when it has to stop the whole world and wait, whereas it can just carry on and leave a write to finish in its own sweet time.
Post 21 Feb 2020, 20:15
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20526
Location: In your JS exploiting you and your system
revolution 22 Feb 2020, 02:26
petelomax wrote:
DUH: on coming back to this I realise that an AGI stall only really matters on a read, when it has to stop the whole world and wait, whereas it can just carry on and leave a write to finish in its own sweet time.
Yeah, there are many things that contemporary CPUs do internally. But each CPU is different so you will likely get different results with each system.
Post 22 Feb 2020, 02:26
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
