flat assembler
Message board for the users of flat assembler.
Furs 09 Jul 2017, 14:17
system error wrote: how about that nasty Y2038 issue?
I wonder how many people, back when this was widespread (don't tell me it still is?), actually set their computer clock to 2038 to see how a black hole is formed? I was pretty disappointed myself as nothing terrible happened; the computer didn't even freeze.
Though you should post that this version of Fasm should only really be used if your kernel can't run 32-bit executables (on very minimalist installs maybe), since it's quite bloated/slower with all the addressing mode prefixes and the "lea + mov" combination instead of push.
Tomasz Grysztar 09 Jul 2017, 16:12
Furs wrote: Though you should post that this version of Fasm should only really be used if your kernel can't run 32-bit executables (on very minimalist installs maybe), since it's quite bloated/slower with all the addressing mode prefixes and the "lea + mov" combination instead of push.
YONG 10 Jul 2017, 02:11
Tomasz Grysztar wrote:
keantoken 01 Aug 2017, 07:29
I read that IA64 CPUs and some old VIA CPUs aren't backwards compatible with 32-bit code.
revolution 01 Aug 2017, 10:00
IA64 is the Itanium instruction set. It is not even close to IA32. Although the CPUs implementing IA64 do have a compatibility mode to run IA32 code (albeit slowly) within the IA64 OS.
revolution 13 Feb 2019, 01:42
I think the x64 code can be enhanced. The goal is to have it compile an existing 32-bit application for x64 mode with absolutely ZERO changes to the original source. Sounds impossible at first reading, right?
This goal is achievable, somewhat, but not at a 100% success rate. For applications that use the kernel in a simple manner (like fasm) this goal should actually be quite easy. But as you will see later it is only "easy" if the application itself doesn't have an internal limitation.
So the plan is to use the "-d" feature on the command line to set things up in a manner suitable to compile the application as-is. So that means absolutely NO changes at all to the original source code. And if possible without any assumptions about what instructions the target application uses. An important point is to take care to ensure all the flags are correctly handled and not corrupted.
To do this fasm is invoked like this:
Code:
fasm <prefix_file> -dTARGET="'awesome_app.asm'" awesome_app
The prefix_file does all the setup and creates an environment with everything fitting into the 32-bit address space. It looks like this below. It is incomplete though. It is mostly a proof of concept. But it can be expanded to support more options if needed.
Code:
format elf64 executable 0 at 1 shl 16
entry X86_X64_begin

macro format ignore {}
macro entry address {X86_X64_ENTRY = address}

segment executable readable

macro int value {
    if value = 0x80
        mov r12,rdi
        mov r13,rsi
        mov r14,rcx
        mov r9,rbp
        mov r8,rdi
        mov r10,rsi
        mov edi,ebx
        mov esi,ecx
        movzx eax,word[eax * 2 + X86_X64_syscall_translation_table]
        syscall
        mov rcx,r14
        mov rsi,r13
        mov rdi,r12
    else
        int value
    end if
}

macro X86_X64_pushD [arg] {
    common
        local offset,total
        offset = 0
        lea esp,[esp-total]
    forward
        offset = offset + 4
        if arg eqtype eax
            mov dword [esp+total-offset],arg
        else
            mov r11d,dword arg
            mov [esp+total-offset],r11d
        end if
    common
        total = offset
}

macro X86_X64_popD [arg] {
    common
        local offset
        offset = 0
    forward
        if arg eqtype [mem]
            mov r11d,[esp+offset]
            mov dword arg,r11d
        else
            mov arg,dword [esp+offset]
        end if
        offset = offset + 4
    common
        lea esp,[esp+offset]
}

macro use32 {
    macro push args \{
        local list,arg,status
        define list
        define arg
        irps sym, args \\{
            define status
            match =dword, sym \\\{
                define status :
            \\\}
            match [any, status arg sym \\\{
                define arg [any
                match [mem], arg \\\\{
                    match previous, list \\\\\{ define list previous,[mem] \\\\\}
                    match , list \\\\\{ define list [mem] \\\\\}
                    define arg
                \\\\}
                define status :
            \\\}
            match [, status arg sym \\\{
                define arg [
                define status :
            \\\}
            match , status \\\{
                match previous, list \\\\{ define list previous,sym \\\\}
                match , list \\\\{ define list sym \\\\}
            \\\}
        \\}
        match arg, list \\{ X86_X64_pushD arg \\}
    \}
    macro pop args \{
        local list,arg,status
        define list
        define arg
        irps sym, args \\{
            define status
            match =dword, sym \\\{
                define status :
            \\\}
            match [any, status arg sym \\\{
                define arg [any
                match [mem], arg \\\\{
                    match previous, list \\\\\{ define list previous,[mem] \\\\\}
                    match , list \\\\\{ define list [mem] \\\\\}
                    define arg
                \\\\}
                define status :
            \\\}
            match [, status arg sym \\\{
                define arg [
                define status :
            \\\}
            match , status \\\{
                match previous, list \\\\{ define list previous,sym \\\\}
                match , list \\\\{ define list sym \\\\}
            \\\}
        \\}
        match arg, list \\{ X86_X64_popD arg \\}
    \}
    macro pushfd \{
        pushfq
        POP r11
        push r11d
    \}
    macro popfd \{
        pop r11d
        PUSH r11
        popfq
    \}
    macro X86_X64_do_target target,instr \{
        if target eqtype [0]
            mov r10d,target
            instr
            JMP r10
        else if target eqtype dword[0]
            mov r10d,target
            instr
            JMP r10
        match =near =dword[addr],target \\{
        else if 1
            mov r10d,[addr]
            instr
            JMP r10
        \\}
        irp reg,ax,bx,cx,dx,si,di,bp,sp \\{
        else if target eq near e\\#reg
            instr
            JMP r\\#reg
        else if target eq e\\#reg
            instr
            JMP r\\#reg
        \\}
        else
            instr
            JMP target
        end if
    \}
    macro jmp target \{ X86_X64_do_target target \}
    macro call target \{
        \local ..return
        X86_X64_do_target target,push ..return
        ..return:
    \}
    macro ret \{
        pop r11d
        jmp r11
    \}
    macro retn \{ret\}
    macro das \{
        \local ..adjust_by_6,..clear_AF,..low_nibble_done,..adjust_by_60,..high_nibble_okay
        pushfw
        mov r11b,al
        shl r11,8
        mov r11b,al
        and r11b,0xf
        cmp r11b,9
        ja ..adjust_by_6
        test byte[esp],X86_X64_AF
        jz ..clear_AF
    ..adjust_by_6:
        sub al,6
        setc r11b
        or r11b,X86_X64_AF
        assert X86_X64_CF = 0x01
        or byte[esp],r11b
        JMP ..low_nibble_done
    ..clear_AF:
        and byte[esp],not X86_X64_AF
    ..low_nibble_done:
        cmp r11w,0x99ff
        ja ..adjust_by_60
        test byte[esp],X86_X64_CF
        jz ..high_nibble_okay
    ..adjust_by_60:
        sub al,0x60
        or byte[esp],X86_X64_CF
    ..high_nibble_okay:
        popfw
    \}
    macro loop target \{
        \local ..dont_go
        if $ + 2 - target > 128 | $ + 2 - target < -127
            pushfw
            dec ecx
            jz ..dont_go ;LOOP only branches while ecx is still nonzero
            popfw
            jmp target
        ..dont_go:
            popfw
        else
            loop target
        end if
    \}
    macro jcxz target \{
        \local ..dont_go
        pushfw
        test cx,cx
        jnz ..dont_go
        popfw
        jmp target
    ..dont_go:
        popfw
    \}
    macro salc \{
        pushfw
        setc al
        neg al
        popfw
    \}
    USE64
}

macro use16 {
    purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
    use16
}

macro use64 {
    purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
    use64
}

X86_X64_CF = 0x01
X86_X64_AF = 0x10

X86_X64_STACK_TARGET = 1 shl 32
X86_X64_STACK_SIZE = 1 shl 22
X86_X64_PAGE_SIZE = 1 shl 12
X86_X64_ENOMEM = 12
X86_X64_AT_SYSINFO_EHDR = 33
X86_X64_MADV_NORMAL = 0
X86_X64_PROT_READ = 0x1
X86_X64_PROT_WRITE = 0x2
X86_X64_MAP_PRIVATE = 0x2
X86_X64_MAP_FIXED = 0x10
X86_X64_MAP_ANONYMOUS = 0x20
X86_X64_MAP_GROWSDOWN = 0x100
X86_X64_MAP_FIXED_NOREPLACE = 0x100000

X86_X64_SYS32_EXIT = 1
X86_X64_SYS32_READ = 3
X86_X64_SYS32_WRITE = 4
X86_X64_SYS32_OPEN = 5
X86_X64_SYS32_CLOSE = 6
X86_X64_SYS32_TIME = 13
X86_X64_SYS32_LSEEK = 19
X86_X64_SYS32_MMAP = 90
X86_X64_SYS32_BRK = 45
X86_X64_SYS32_MADVISE = 219

X86_X64_SYS64_READ = 0
X86_X64_SYS64_WRITE = 1
X86_X64_SYS64_OPEN = 2
X86_X64_SYS64_CLOSE = 3
X86_X64_SYS64_LSEEK = 8
X86_X64_SYS64_MMAP = 9
X86_X64_SYS64_BRK = 12
X86_X64_SYS64_MADVISE = 28
X86_X64_SYS64_EXIT = 60
X86_X64_SYS64_TIME = 201

X86_X64_syscall_translation_table rw 376
macro SYS_TRANSLATE [func] {forward store word X86_X64_SYS64_#func at X86_X64_SYS32_#func * 2 + X86_X64_syscall_translation_table}
SYS_TRANSLATE EXIT,READ,WRITE,OPEN,CLOSE,TIME,LSEEK,MMAP,BRK,MADVISE
purge SYS_TRANSLATE

X86_X64_begin:
    xor r9,r9 ;offset
    or r8,-1 ;fd
    mov r10,X86_X64_MAP_PRIVATE or X86_X64_MAP_ANONYMOUS or X86_X64_MAP_FIXED or X86_X64_MAP_GROWSDOWN or X86_X64_MAP_FIXED_NOREPLACE
    mov edx,X86_X64_PROT_READ or X86_X64_PROT_WRITE
    mov esi,X86_X64_STACK_SIZE
    mov edi,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
    mov eax,X86_X64_SYS64_MMAP
    syscall
    cmp eax,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
    jnz .failed
    mov rdi,rsp
    and rdi,-X86_X64_PAGE_SIZE
.loop_find_top_of_stack:
    add rdi,X86_X64_PAGE_SIZE
    mov edx,X86_X64_MADV_NORMAL
    mov esi,1
    mov eax,X86_X64_SYS64_MADVISE
    syscall
    cmp rax,-X86_X64_ENOMEM
    jnz .loop_find_top_of_stack
    ;copy the stack into low memory
    mov rcx,rdi
    lea rsi,[rdi - 8]
    mov edi,X86_X64_STACK_TARGET - 8
    sub rcx,rsp
    shr rcx,3
    std
    rep movsq
    cld
    sub rsi,rdi ;rsi = conversion offset
    neg rsi
    add edi,8
    mov esp,edi
    mov edx,edi
    mov ebx,2
.convert_argv_env:
    mov rax,[edi]
    cmp rax,rsp
    lea rcx,[rax + rsi]
    cmovae rax,rcx
    mov [edx],eax
    add edi,8
    add edx,4
    test eax,eax
    jnz .convert_argv_env
    dec ebx
    jnz .convert_argv_env
.convert_auxv:
    ;note that the AT_SYSINFO_EHDR value won't be valid, so it gets removed
    mov rax,[rdi]
    mov rbx,[rdi + 8]
    cmp rbx,rsp
    lea rcx,[rbx + rsi]
    cmovae rbx,rcx
    mov [edx],eax
    mov [edx + 4],ebx
    add edi,16
    lea ecx,[edx + 8]
    cmp eax,X86_X64_AT_SYSINFO_EHDR
    cmovnz edx,ecx
    test eax,eax
    jnz .convert_auxv
    jmp X86_X64_ENTRY
.failed:
    or rdi,-1
    mov eax,X86_X64_SYS64_EXIT
    syscall

use32

match f,TARGET {include f}
And it works. However, like I mentioned, it doesn't work for everything. Notably it fails for fasm because of a problem with the increase in size. Specifically this error occurs:
Code:
flat assembler version 1.73.08 (4014015 kilobytes memory)
..\TABLES.INC [672]:
dw adx_instruction-instruction_handler
processed: dw adx_instruction-instruction_handler
error: value out of range.
Last edited by revolution on 14 Feb 2019, 05:58; edited 1 time in total
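For anyone wondering why larger code triggers this error: the failing line stores a handler as a 16-bit offset from instruction_handler, so the handler has to sit within 64 KiB of that label. Below is a minimal sketch of the same situation, not taken from the fasm sources. The names handler_base, near_handler, far_handler and offset_table are made up, and the 70000 bytes of padding simply stand in for code that has grown.

```asm
; sketch only: a word-sized offset table, assuming an x86-64 Linux target
format ELF64 executable 3
entry start

segment readable executable

handler_base:                     ; hypothetical stand-in for instruction_handler
near_handler:
        ret
        rb 70000                  ; hypothetical padding: code that grew past 64 KiB
far_handler:
        ret

start:
        xor     edi,edi           ; exit status 0
        mov     eax,60            ; sys_exit on x86-64 Linux
        syscall

segment readable

offset_table:
        dw near_handler-handler_base      ; offset 0, fits in a word
        ; dw far_handler-handler_base     ; offset 70001 does not fit in a word:
        ;                                 ; uncommenting should give "value out of range"
```

As written this should assemble cleanly; enabling the commented line is what reproduces the complaint.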
revolution 13 Feb 2019, 11:12
It is possible to assemble fasm using this method. It just needs to be made a bit more memory efficient to overcome the 64kB limitation imposed by the structure of the TABLES.INC file.
So the first step is to make a common call gate which saves us a few bytes each time it is used.
Code:
macro call target \{
    \local ..return
    if target eqtype [0]
        mov r10d,target
        call X86_X64_call_gate
    else if target eqtype dword[0]
        mov r10d,target
        call X86_X64_call_gate
    match =near =dword[addr],target \\{
    else if 1
        mov r10d,[addr]
        call X86_X64_call_gate
    \\}
    irp reg,eax,ebx,ecx,edx,esi,edi,ebp,esp \\{
    else if target eq near reg
        mov r10d,reg
        call X86_X64_call_gate
    else if target eq reg
        mov r10d,reg
        call X86_X64_call_gate
    \\}
    else
        mov r10d,target
        call X86_X64_call_gate
    end if
    ..return:
\}
;...
X86_X64_call_gate:
    POP r11
    push r11d
    jmp r10
Code:
macro int value {
    if value = 0x80
        CALL X86_X64_int_gate
    else
        int value
    end if
}
;...
X86_X64_int_gate:
    mov r12,rdi
    mov r13,rsi
    mov r14,rcx
    mov r9,rbp
    mov r8,rdi
    mov r10,rsi
    mov edi,ebx
    mov esi,ecx
    movzx eax,word[eax * 2 + X86_X64_syscall_translation_table]
    syscall
    mov rcx,r14
    mov rsi,r13
    mov rdi,r12
    RET
The next step is to move the line "include '..\exprcalc.inc'" up one place. Now it looks like this:
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'
include '..\assemble.inc'
include '..\formats.inc'
;...
The next step is to move formats.inc up one line. Now it looks like this:
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'
include '..\formats.inc'
include '..\assemble.inc'
include '..\x86_64.inc'
;...
Code:
;...
include '..\exprpars.inc'
include '..\exprcalc.inc'

data_directive equ moved_data_directive
heap_directive equ moved_heap_directive
entry_directive equ moved_entry_directive
extrn_directive equ moved_extrn_directive
stack_directive equ moved_stack_directive
format_directive equ moved_format_directive
public_directive equ moved_public_directive
section_directive equ moved_section_directive
segment_directive equ moved_segment_directive
include '..\formats.inc'
restore data_directive
restore heap_directive
restore entry_directive
restore extrn_directive
restore stack_directive
restore format_directive
restore public_directive
restore section_directive
restore segment_directive

include '..\assemble.inc'

data_directive: jmp moved_data_directive
heap_directive: jmp moved_heap_directive
entry_directive: jmp moved_entry_directive
extrn_directive: jmp moved_extrn_directive
stack_directive: jmp moved_stack_directive
format_directive: jmp moved_format_directive
public_directive: jmp moved_public_directive
section_directive: jmp moved_section_directive
segment_directive: jmp moved_segment_directive

include '..\x86_64.inc'
;...
Code:
fasm <prefix_file> -dTARGET="'fasm.asm'" fasm
And the prefix_file is this:
Code:
format elf64 executable 0 at 1 shl 16
entry X86_X64_begin

macro format ignore {}
macro entry address {X86_X64_ENTRY = address}

segment executable readable

macro int value {
    if value = 0x80
        CALL X86_X64_int_gate
    else
        int value
    end if
}

macro X86_X64_pushD [arg] {
    common
        local offset,total
        offset = 0
        lea esp,[esp-total]
    forward
        offset = offset + 4
        if arg eqtype eax
            mov dword [esp+total-offset],arg
        else
            mov r11d,dword arg
            mov [esp+total-offset],r11d
        end if
    common
        total = offset
}

macro X86_X64_popD [arg] {
    common
        local offset
        offset = 0
    forward
        if arg eqtype [mem]
            mov r11d,[esp+offset]
            mov dword arg,r11d
        else
            mov arg,dword [esp+offset]
        end if
        offset = offset + 4
    common
        lea esp,[esp+offset]
}

macro use32 {
    macro push args \{
        local list,arg,status
        define list
        define arg
        irps sym, args \\{
            define status
            match =dword, sym \\\{
                define status :
            \\\}
            match [any, status arg sym \\\{
                define arg [any
                match [mem], arg \\\\{
                    match previous, list \\\\\{ define list previous,[mem] \\\\\}
                    match , list \\\\\{ define list [mem] \\\\\}
                    define arg
                \\\\}
                define status :
            \\\}
            match [, status arg sym \\\{
                define arg [
                define status :
            \\\}
            match , status \\\{
                match previous, list \\\\{ define list previous,sym \\\\}
                match , list \\\\{ define list sym \\\\}
            \\\}
        \\}
        match arg, list \\{ X86_X64_pushD arg \\}
    \}
    macro pop args \{
        local list,arg,status
        define list
        define arg
        irps sym, args \\{
            define status
            match =dword, sym \\\{
                define status :
            \\\}
            match [any, status arg sym \\\{
                define arg [any
                match [mem], arg \\\\{
                    match previous, list \\\\\{ define list previous,[mem] \\\\\}
                    match , list \\\\\{ define list [mem] \\\\\}
                    define arg
                \\\\}
                define status :
            \\\}
            match [, status arg sym \\\{
                define arg [
                define status :
            \\\}
            match , status \\\{
                match previous, list \\\\{ define list previous,sym \\\\}
                match , list \\\\{ define list sym \\\\}
            \\\}
        \\}
        match arg, list \\{ X86_X64_popD arg \\}
    \}
    macro pushfd \{
        pushfq
        POP r11
        push r11d
    \}
    macro popfd \{
        pop r11d
        PUSH r11
        popfq
    \}
    macro jmp target \{
        if target eqtype [0]
            mov r10d,target
            JMP r10
        else if target eqtype dword[0]
            mov r10d,target
            JMP r10
        match =near =dword[addr],target \\{
        else if 1
            mov r10d,[addr]
            JMP r10
        \\}
        irp reg,ax,bx,cx,dx,si,di,bp,sp \\{
        else if target eq near e\\#reg
            JMP r\\#reg
        else if target eq e\\#reg
            JMP r\\#reg
        \\}
        else
            JMP target
        end if
    \}
    macro call target \{
        \local ..return
        if target eqtype [0]
            mov r10d,target
            call X86_X64_call_gate
        else if target eqtype dword[0]
            mov r10d,target
            call X86_X64_call_gate
        match =near =dword[addr],target \\{
        else if 1
            mov r10d,[addr]
            call X86_X64_call_gate
        \\}
        irp reg,eax,ebx,ecx,edx,esi,edi,ebp,esp \\{
        else if target eq near reg
            mov r10d,reg
            call X86_X64_call_gate
        else if target eq reg
            mov r10d,reg
            call X86_X64_call_gate
        \\}
        else
            mov r10d,target
            call X86_X64_call_gate
        end if
        ..return:
    \}
    macro ret \{
        pop r11d
        jmp r11
    \}
    macro retn \{ret\}
    macro das \{
        \local ..adjust_by_6,..clear_AF,..low_nibble_done,..adjust_by_60,..high_nibble_okay
        pushfw
        mov r11b,al
        shl r11,8
        mov r11b,al
        and r11b,0xf
        cmp r11b,9
        ja ..adjust_by_6
        test byte[esp],X86_X64_AF
        jz ..clear_AF
    ..adjust_by_6:
        sub al,6
        setc r11b
        or r11b,X86_X64_AF
        assert X86_X64_CF = 0x01
        or byte[esp],r11b
        JMP ..low_nibble_done
    ..clear_AF:
        and byte[esp],not X86_X64_AF
    ..low_nibble_done:
        cmp r11w,0x99ff
        ja ..adjust_by_60
        test byte[esp],X86_X64_CF
        jz ..high_nibble_okay
    ..adjust_by_60:
        sub al,0x60
        or byte[esp],X86_X64_CF
    ..high_nibble_okay:
        popfw
    \}
    macro loop target \{
        \local ..dont_go
        if $ + 2 - target > 128 | $ + 2 - target < -127
            pushfw
            dec ecx
            jz ..dont_go ;LOOP only branches while ecx is still nonzero
            popfw
            jmp target
        ..dont_go:
            popfw
        else
            loop target
        end if
    \}
    macro jcxz target \{
        \local ..dont_go
        pushfw
        test cx,cx
        jnz ..dont_go
        popfw
        jmp target
    ..dont_go:
        popfw
    \}
    macro salc \{
        pushfw
        setc al
        neg al
        popfw
    \}
    USE64
}

macro use16 {
    purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
    use16
}

macro use64 {
    purge push,pop,pushfd,popfd,jmp,call,ret,retn,das,loop,jcxz,salc
    use64
}

X86_X64_CF = 0x01
X86_X64_AF = 0x10

X86_X64_STACK_TARGET = 1 shl 32
X86_X64_STACK_SIZE = 1 shl 22
X86_X64_PAGE_SIZE = 1 shl 12
X86_X64_ENOMEM = 12
X86_X64_AT_SYSINFO_EHDR = 33
X86_X64_MADV_NORMAL = 0
X86_X64_PROT_READ = 0x1
X86_X64_PROT_WRITE = 0x2
X86_X64_MAP_PRIVATE = 0x2
X86_X64_MAP_FIXED = 0x10
X86_X64_MAP_ANONYMOUS = 0x20
X86_X64_MAP_GROWSDOWN = 0x100
X86_X64_MAP_FIXED_NOREPLACE = 0x100000

X86_X64_SYS32_EXIT = 1
X86_X64_SYS32_READ = 3
X86_X64_SYS32_WRITE = 4
X86_X64_SYS32_OPEN = 5
X86_X64_SYS32_CLOSE = 6
X86_X64_SYS32_TIME = 13
X86_X64_SYS32_LSEEK = 19
X86_X64_SYS32_GETTIMEOFDAY = 78
X86_X64_SYS32_MMAP = 90
X86_X64_SYS32_BRK = 45
X86_X64_SYS32_MADVISE = 219

X86_X64_SYS64_READ = 0
X86_X64_SYS64_WRITE = 1
X86_X64_SYS64_OPEN = 2
X86_X64_SYS64_CLOSE = 3
X86_X64_SYS64_LSEEK = 8
X86_X64_SYS64_MMAP = 9
X86_X64_SYS64_BRK = 12
X86_X64_SYS64_MADVISE = 28
X86_X64_SYS64_EXIT = 60
X86_X64_SYS64_GETTIMEOFDAY = 96
X86_X64_SYS64_TIME = 201

X86_X64_syscall_translation_table rw 376
macro SYS_TRANSLATE [func] {forward store word X86_X64_SYS64_#func at X86_X64_SYS32_#func * 2 + X86_X64_syscall_translation_table}
SYS_TRANSLATE EXIT,READ,WRITE,OPEN,CLOSE,TIME,LSEEK,GETTIMEOFDAY,MMAP,BRK,MADVISE
purge SYS_TRANSLATE

X86_X64_begin:
    xor r9,r9 ;offset
    or r8,-1 ;fd
    mov r10,X86_X64_MAP_PRIVATE or X86_X64_MAP_ANONYMOUS or X86_X64_MAP_FIXED or X86_X64_MAP_GROWSDOWN or X86_X64_MAP_FIXED_NOREPLACE
    mov edx,X86_X64_PROT_READ or X86_X64_PROT_WRITE
    mov esi,X86_X64_STACK_SIZE
    mov edi,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
    mov eax,X86_X64_SYS64_MMAP
    syscall
    cmp eax,X86_X64_STACK_TARGET - X86_X64_STACK_SIZE
    jnz .failed
    mov rdi,rsp
    and rdi,-X86_X64_PAGE_SIZE
.loop_find_top_of_stack:
    add rdi,X86_X64_PAGE_SIZE
    mov edx,X86_X64_MADV_NORMAL
    mov esi,1
    mov eax,X86_X64_SYS64_MADVISE
    syscall
    cmp rax,-X86_X64_ENOMEM
    jnz .loop_find_top_of_stack
    ;copy the stack into low memory
    mov rcx,rdi
    lea rsi,[rdi - 8]
    mov edi,X86_X64_STACK_TARGET - 8
    sub rcx,rsp
    shr rcx,3
    std
    rep movsq
    cld
    sub rsi,rdi ;rsi = conversion offset
    neg rsi
    add edi,8
    mov esp,edi
    mov edx,edi
    mov ebx,2
.convert_argv_env:
    mov rax,[edi]
    cmp rax,rsp
    lea rcx,[rax + rsi]
    cmovae rax,rcx
    mov [edx],eax
    add edi,8
    add edx,4
    test eax,eax
    jnz .convert_argv_env
    dec ebx
    jnz .convert_argv_env
.convert_auxv:
    ;note that the AT_SYSINFO_EHDR value won't be valid, so it gets removed
    mov rax,[rdi]
    mov rbx,[rdi + 8]
    cmp rbx,rsp
    lea rcx,[rbx + rsi]
    cmovae rbx,rcx
    mov [edx],eax
    mov [edx + 4],ebx
    add edi,16
    lea ecx,[edx + 8]
    cmp eax,X86_X64_AT_SYSINFO_EHDR
    cmovnz edx,ecx
    test eax,eax
    jnz .convert_auxv
    jmp X86_X64_ENTRY
.failed:
    or rdi,-1
    mov eax,X86_X64_SYS64_EXIT
    syscall

use32

X86_X64_int_gate:
    mov r12,rdi
    mov r13,rsi
    mov r14,rcx
    mov r9,rbp
    mov r8,rdi
    mov r10,rsi
    mov edi,ebx
    mov esi,ecx
    movzx eax,word[eax * 2 + X86_X64_syscall_translation_table]
    syscall
    mov rcx,r14
    mov rsi,r13
    mov rdi,r12
    RET

X86_X64_call_gate:
    POP r11
    push r11d
    jmp r10

match f,TARGET {include f}
Also we can't link to any external libraries without a lot more work. Quite probably many libraries simply can't be used without some extensive 32-bit/64-bit interfacing logic. That is well beyond the scope of this code here.
Last edited by revolution on 14 Feb 2019, 05:59; edited 1 time in total
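The call gate idea can also be tried in isolation. What follows is only a sketch with the push/pop/ret macros written out by hand as plain instructions. The labels callee and call_gate are invented for this example, and it assumes the image is loaded below 4 GiB (fasm's default ELF64 load address should satisfy that) so that code addresses fit in 32 bits.

```asm
; sketch only: 8-byte native call converted to a 4-byte return address
format ELF64 executable 3
entry start

segment readable executable

start:
        mov     r10d,callee       ; target address, assumed to fit in 32 bits
        call    call_gate         ; native call pushes an 8-byte return address
        mov     eax,60            ; sys_exit on x86-64 Linux; edi was set by callee
        syscall

callee:
        mov     edi,7             ; exit status, so the round trip is observable
        mov     r11d,[rsp]        ; emulated 32-bit ret: fetch the 4-byte return address
        lea     rsp,[rsp+4]       ; lea adjusts the stack without touching the flags
        jmp     r11               ; writing r11d cleared the upper half of r11

call_gate:
        pop     r11               ; take back the 8-byte return address
        lea     rsp,[rsp-4]       ; make room for a 4-byte slot instead
        mov     [rsp],r11d        ; store only the low 32 bits
        jmp     r10
```

Assembled with fasm and run on x86-64 Linux, the process should come back through the gate and exit with status 7.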
Tomasz Grysztar 13 Feb 2019, 17:46
I'm starting to feel that perhaps you should take over further development of fasm 1, it seems you have much more enthusiasm for tuning this old engine than me.
What you did here seems like a natural extension of my ideas, starting with complete emulation of instructions like JCXZ and SALC (which I annotated as not requiring to preserve flags in fasm's core, as fasm does not depend on this feature anywhere) and so on.
On the other hand, what fascinates me even more is going in a very different direction - making code that could have exactly the same binary form for 32-bit and 64-bit mode and do its job correctly in both cases. I talked about this on my stream and I demonstrated that it can be applied at least partially to fasmg sources, when I modified the instruction encoder so that it would not generate 67h prefixes for addresses. This works correctly only when the code follows some rules - it cannot use negative offsets stored in registers (as they would need to be sign-extended when used in 64-bit addressing, but the "clearing upper part of register" rule only provides zero-extension). But the source of fasmg is written in a very specific style that allows for such tricks to be applied (it is very important, however, to have any such assumptions clearly documented in source, so that one can avoid breaking the rules when modifying the code).
But your, one could say "opposite", approach provides much more safety and freedom of coding style.
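The negative offset rule can be seen without touching memory at all, by comparing what the two address sizes compute for the same register contents. A small sketch for x86-64 Linux follows; the base value 1000h is arbitrary and nothing here is taken from the fasmg source.

```asm
; sketch only: zero-extension versus 32-bit wrap-around in address calculation
format ELF64 executable 3
entry start

segment readable executable

start:
        mov     ebx,1000h         ; stand-in base address; upper half of rbx is zeroed
        mov     eax,-4            ; negative offset; rax becomes 00000000FFFFFFFCh
        lea     rcx,[rbx+rax]     ; 64-bit addressing: rcx = 100000FFCh
        lea     edx,[ebx+eax]     ; 32-bit addressing (67h prefix): edx = 00000FFCh
        xor     edi,edi
        cmp     rcx,rdx           ; rdx holds the zero-extended 32-bit result
        setne   dil               ; exit status 1 when the two addresses differ
        mov     eax,60            ; sys_exit on x86-64 Linux
        syscall
```

The expected exit status is 1: without the 67h prefix the "base - 4" address ends up almost 4 GiB too high, which is why such code has to avoid negative offsets in registers.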
revolution 13 Feb 2019, 19:07
Tomasz Grysztar wrote: I'm starting to feel that perhaps you should take over further development of fasm 1, it seems you have much more enthusiasm for tuning this old engine than me.
Tomasz Grysztar wrote: What you did here seems like a natural extension of my ideas, starting with complete emulation of instructions like JCXZ and SALC (which I annotated as not requiring to preserve flags in fasm's core, as fasm does not depend on this feature anywhere) and so on.
Tomasz Grysztar wrote: On the other hand, what fascinates me even more is going in a very different direction - making code that could have exactly the same binary form for 32-bit and 64-bit mode and do its job correctly in both cases. I talked about this on my stream and I demonstrated that it can be applied at least partially to fasmg sources, when I modified the instruction encoder so that it would not generate 67h prefixes for addresses. This works correctly only when the code follows some rules - it cannot use negative offsets stored in registers (as they would need to be sign-extended when used in 64-bit addressing, but the "clearing upper part of register" rule only provides zero-extension). But the source of fasmg is written in a very specific style that allows for such tricks to be applied (it is very important, however, to have any such assumptions clearly documented in source, so that one can avoid breaking the rules when modifying the code).
Last edited by revolution on 14 Feb 2019, 11:55; edited 1 time in total
revolution 13 Feb 2019, 19:27
Another problem with this approach arises if the target application wants to know its EIP value and uses this:
Code:
call $ + 5
pop eax
Tomasz Grysztar 13 Feb 2019, 19:41
revolution wrote: Another problem with this approach is if the target application is interested to know its EIP value and uses this
Code:
call $ + 5
Code:
call @f
@@:
revolution 14 Feb 2019, 06:15
Code 1:
Code:
call $ + 5
pop eax
Code 2:
Code:
call @f
@@:
pop eax
For code 2 it will only succeed for the second approach where the native version of call is not used. I remember seeing 8086 code from the era when delays between port accesses were needed.
Code:
jmp $+2
revolution 15 Feb 2019, 15:32
revolution wrote: The next step is to move formats.inc up one line ...
But with that include ordering we will break fasm because some of the labels are entry points from the instruction tables. It turns out there are only nine labels that are required to be in the correct place.
Code:
mov word [ebx],data_directive-instruction_handler
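A stub is enough here because the table only needs a label at a small non-negative offset from instruction_handler, and that label is free to jump to wherever the real code ended up. Below is a cut-down sketch of the idea, assuming a word-offset dispatch similar in spirit to fasm's. The names handler_base, handler_stub, moved_handler and offset_table are invented, and the table has a single entry.

```asm
; sketch only: keeping a table-referenced label in range with a jump stub
format ELF64 executable 3
entry start

segment readable executable

moved_handler:                    ; real code, now located below handler_base
        mov     edi,3             ; exit status, so reaching this code is observable
        mov     eax,60            ; sys_exit on x86-64 Linux
        syscall

handler_base:                     ; hypothetical stand-in for instruction_handler

handler_stub:                     ; the label the table refers to stays above the base
        jmp     moved_handler

start:
        movzx   eax,word [offset_table]   ; fetch the 16-bit offset
        add     eax,handler_base          ; rebuild the absolute address
        jmp     rax                       ; dispatch through the stub

segment readable

offset_table:
        dw handler_stub-handler_base      ; 0 here; the moved code itself would be negative
```

If the dispatch works as intended the process exits with status 3, having reached moved_handler through the stub.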
petelomax 19 Feb 2020, 14:11
I know this is a bit late, but when I saw this
Code:
lea rsp,[rsp-4]
mov [rsp],eax
my immediate thought was wouldn't doing it this way round
Code:
mov [rsp-4],eax
lea rsp,[rsp-4]
save an agi stall, at the cost of 1 byte? So I ran a quick test, and to my surprise they appear to run at the same speed... do modern processors not suffer agi stalls anymore?
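For anyone who wants to repeat the comparison, here are the two orderings side by side in a standalone x86-64 Linux program. It is only a functional check with no timing. The test value and the use of rbp as a scratch copy of rsp are arbitrary choices, and storing below rsp before adjusting it assumes nothing else writes there in between, which should hold for ordinary user-mode code.

```asm
; sketch only: both ways of emulating a 4-byte push leave the same dword on the stack
format ELF64 executable 3
entry start

segment readable executable

start:
        mov     eax,12345678h     ; arbitrary test value
        mov     rbp,rsp           ; remember the original stack pointer

        lea     rsp,[rsp-4]       ; ordering 1: adjust first ...
        mov     [rsp],eax         ; ... then store (3-byte encoding: 89 04 24)
        mov     ecx,[rsp]
        mov     rsp,rbp

        mov     [rsp-4],eax       ; ordering 2: store first (4 bytes: 89 44 24 FC) ...
        lea     rsp,[rsp-4]       ; ... then adjust
        mov     edx,[rsp]
        mov     rsp,rbp

        xor     edi,edi
        cmp     ecx,edx
        setne   dil               ; exit status 0 when both orderings stored the same dword
        mov     eax,60            ; sys_exit on x86-64 Linux
        syscall
```

The one byte of extra cost comes from the disp8 in the second ordering's store.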
DimonSoft 19 Feb 2020, 15:33
Would this effect outweigh effects caused by caching, context switching, paging? Did you take into account stuff like speculative/out-of-order execution? Reorder buffers are there for a reason.
revolution 19 Feb 2020, 17:14
petelomax wrote: So I ran a quick test ...
For something as short-running as fasm, any such micro-optimisations will be so completely swamped by the process start-up and IO overheads that you won't be able to measure such tiny differences in execution speed. So don't worry about it unless it takes many minutes to run and you need it to be 0.1 seconds faster.
petelomax 21 Feb 2020, 20:15
DUH: on coming back to this I realise that an agi stall only really matters on read, when it has to stop the whole world and wait, whereas it can just carry on and leave a write to finish in its own sweet time.
revolution 22 Feb 2020, 02:26
petelomax wrote: DUH: on coming back to this I realise that an agi stall only really matters on read, when it has to stop the whole world and wait, whereas it can just carry on and leave a write to finish in its own sweet time.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.