flat assembler
Message board for the users of flat assembler.
efficient proc/endp and push/pop macros (32bit only)
Poll: Are those macros overkill? (Total Votes : 4)
MCD 29 Oct 2007, 10:52
Hi, I have been working on some more advanced and efficient "proc"/"endp" and "push"/"pop" macros over the last few weeks.
Especially in 32bit code, you have the problems of high register pressure and of wasting the register "ebp" whenever a "stdcall" procedure takes arguments, so I use "ebp" for other purposes and access parameters/local variables via "esp" instead of "ebp". Also, there is no code that saves and restores ebp in the prolog/epilog of a procedure. I call this calling convention "efficient call", because it is 1.) a little faster and 2.) smaller than stdcall.

Unfortunately, you must keep track of the parameters by using different immediates when accessing them: [esp+4] is the first parameter at the entry of the procedure, but after any "push" you have to adjust this to [esp+i] with i>4.

Why don't I use registers for such parameters? They actually cause more overhead, because you would have to save the registers holding the parameters anyway; this is at least true for bigger procedures. For smaller procedures, I usually inline them completely with macros or use jumps instead, which is again completely different.

Another problem I had is that I wanted to switch between native register saving ("push eax,edx..") and the more explicit register saving ("sub esp,X  mov [esp+x],eax  mov [esp+y],edx ...") for all the procedures in the code by changing just 1 preprocessor option for optimizing.

These macros are intended for 32bit code only. This is mostly because in 16bit mode you can't use "sp" to address the stack, and in 64bit mode the register pressure is much lower, which largely removes the need to free "rbp".

These macros also extend the "push" and "pop" instructions to let you "push"/"pop" directly:
1.) immediates and memory of size qword
2.) memory of size dqword
3.) MMX and SSE registers and linear ranges of such registers

Still, these macros lack some important features, especially:
1.) allocating local variables inside a "proc endp" block must still be done manually
2.) the pushing and popping of multiple registers with non-native "push"/"pop"s doesn't yet merge the "esp" adjustments into 1 instruction, which is quite unoptimized
But both of these are on the TODO list.

Another drawback is that these macros make the preprocessor and assembler consume an almost unacceptable amount of memory. Tomasz, isn't there any way to tell fasm to deallocate memory? Especially for unused symbolic constants (restore) and macros (purge): I use them in the debugging version, but it turned out that although they restore symbols and macros, they don't deallocate anything. This is especially bad for larger projects.

Code:
;General Options:
;================
;show some general hints?
define show_hints true

;Options for the push and pop macros:
;====================================
;if set to "true", use native "push" and "pop" instructions when possible.
;if set to "false", code all "push" and "pop" instructions with "mov"s,
;which will increase code size, but might run faster on some CPUs.
define native_pushpop true

;what sse mov instruction variant to use for non-native "push"/"pop"s.
;reasonable values are
;for SSE1+: movups
;for SSE2+: movups, movupd, movdqu
;
;the movaXX instructions would be possible, but this requires some
;stack aligning work, which is quite difficult and inefficient on a per
;register basis
define sse_mov movups

;don't modify the flags when working on the stack?
;this is only relevant for non-native "push"/"pop"s,
;as native ones always preserve the flags (except the popf).
;should be set to true, unless you know what you are doing and to save some
;code size.
;Note that you can still save the flags with "pushf" at some point and restore
;them when needed with "popf", but the flags will be destroyed inside
;the "pushf" "popf" block from the beginning!
define preserve_stack_flags true

;Options to the proc, save, return, endp and call macros:
;========================================================
;If the code size of the restoring part is above this value, then all "return"s
;will simply perform jmps to the "endp" epilog.
;Else each "return" will have its own register restoring part.
;Useful for doing performance/size tradeoffs.
define max_code_size_inline_return 1

;use mmx/sse instructions to optimize non-native prolog and epilog even when no
;mmx/sse registers have been specified. This is currently not implemented,
;but this may become useful in the future for optimizing savings/restorings
;of multiple large memory operands.
;mmx/sse instructions are nevertheless always used when explicitly saving
;mm? or xmm? registers
;define implicit_mmx off
;define implicit_sse off;off sse1 sse2 sse3

\t fix 0x9
\n fix 0xA

macro displn [txt]
{
  common
    match all, txt \{
  forward
    display txt
  common
    \}
    display \n
}

virtual at 0
  lldt [eax]
  load .prefix0 from 0
end virtual
if .prefix0 = 0x67
  displn 'error: program not compiled with use32!'
end if

macro _add_esp acc
{
;;;TODO: merge the stack pointer increments of multiple register pushs/pops
;into 1 increment to save size and speed (if the registers are in 1 line only)
;I know how to do this, just haven't got enough time yet.
;Will be fixed ASAP
  if preserve_stack_flags eq true
    lea esp,[esp+acc]
  else
    if acc < 0
      sub esp,-acc
    else
      add esp,acc
    end if
  end if
}

macro _pu arg
{
  local op, size
  size equ
  op equ arg
  last_push_size= 4
  match first rest, arg \{
    size equ first
    op equ rest
    last_push_size= -1
  \}
  match =dqword, size \{ last_push_size= 16 \}
  match =qword , size \{ last_push_size= 8 \}
  match =dword , size \{ last_push_size= 4 \}
  match =word  , size \{ last_push_size= 2 \}  ;WORD
  if op in <ax,cx,dx,bx,sp,bp,si,di>;,cs,ss,ds,es,fs,gs  ;WORD
    last_push_size= 2  ;WORD
  end if  ;WORD
  if last_push_size = 2  ;HINT & WORD
    match =true, show_hints \{  ;HINT & WORD
      displn 'hint: pushing words may cause stack-alignment'#\  ;HINT & WORD
             ' errors on some 32bit systems.'  ;HINT & WORD
    \}  ;HINT & WORD
  end if  ;HINT & WORD
  if op eq flags
    if size eq | size eq dword
      pushf
    else if size eq word  ;WORD
      pushfw  ;WORD
    else;all other sizes aren't supported
      pushd arg
    end if
  rept 8 i:0 \{
  else if op eq mm\#i
    last_push_size= 8
    _add_esp -last_push_size
    movq size [esp],op
  else if op eq xmm\#i
    last_push_size= 16
    _add_esp -last_push_size
    sse_mov size [esp],op
  \}
  else if op eqtype 0 | op eqtype "" | op eqtype 1.0
    if last_push_size = -1 | size eq dqword
      pushd arg
    end if
    if (size eq & op >= -0x80000000 & op <= 0xFFFFFFFF) | arg eqtype 1.0 \
       | size eq dword
      if native_pushpop eq true
        push arg
      else
        _add_esp -last_push_size
        mov dword [esp],op
      end if
    else if size eq word  ;WORD
      if native_pushpop eq true  ;WORD
        pushw op  ;WORD
      else  ;WORD
        _add_esp -last_push_size  ;WORD
        mov word [esp],op  ;WORD
      end if  ;WORD
    else;if size eq qword
      last_push_size= 8
      local tmp
      tmp= arg
      if native_pushpop eq true
        push tmp shr 32
        push tmp and 0xFFFFFFFF
      else
        _add_esp -last_push_size
        mov dword [esp],tmp and 0xFFFFFFFF
        mov dword [esp+4],tmp shr 32
      end if
    end if
  else if op eqtype [ebx]
    match addr], op \{
      if size eq dqword
        push dword addr+12]
        push dword addr+8]
        push dword addr+4]
        push dword addr]
      else if size eq qword
        push dword addr+4]
        push dword addr]
      else if size eq word  ;WORD
        pushw arg  ;WORD
      else
        push arg
      end if
    \}
  else if native_pushpop eq false;if op eqtype eax
    _add_esp -last_push_size
    mov [esp],arg
  else
    push arg
  end if
}

macro _po arg
{
  local op, size
  size equ
  op equ arg
  last_pop_size= 4
  match first rest, arg \{
    size equ first
    op equ rest
    last_pop_size= -1
  \}
  match =dqword, size \{ last_pop_size= 16 \}
  match =qword , size \{ last_pop_size= 8 \}
  match =dword , size \{ last_pop_size= 4 \}
  match =word  , size \{ last_pop_size= 2 \}  ;WORD
  if op in <ax,cx,dx,bx,sp,bp,si,di>;,cs,ss,ds,es,fs,gs  ;WORD
    last_pop_size= 2  ;WORD
  end if  ;WORD
  ;numeric comparison ("="), as in _pu; "eq" would compare the symbol text
  if last_pop_size = 2  ;HINT & WORD
    match =true, show_hints \{  ;HINT & WORD
      displn 'hint: popping words may cause stack-alignment'#\  ;HINT & WORD
             ' errors on some 32bit systems.'  ;HINT & WORD
    \}  ;HINT & WORD
  end if  ;HINT & WORD
  if op eq flags
    if size eq | size eq dword
      popf
    else if size eq word  ;WORD
      popfw  ;WORD
    else;all other sizes aren't supported
      popd arg
    end if
  ;non-native pops read the value first, then release its stack slot
  rept 8 i:0 \{
  else if op eq mm\#i
    last_pop_size= 8
    movq op,size [esp]
    _add_esp last_pop_size
  else if op eq xmm\#i
    last_pop_size= 16
    sse_mov op,size [esp]
    _add_esp last_pop_size
  \}
  else if op eqtype [ebx]
    match addr], op \{
      if size eq dqword
        pop dword addr]
        pop dword addr+4]
        pop dword addr+8]
        pop dword addr+12]
      else if size eq qword
        pop dword addr]
        pop dword addr+4]
      else if size eq word  ;WORD
        popw arg  ;WORD
      else
        pop arg
      end if
    \}
  else if native_pushpop eq false;if op eqtype eax
    mov arg,[esp]
    _add_esp last_pop_size
  else
    pop arg
  end if
}

macro _parse_args sngl_m,mult_m, [arg]
{
  common
;here goes da big preprocessor parser, the match of your life!
    local not_symb, size_prefix, cat_stat, cat
    ;off flank_on on flank_off flank_tmp
    define cat_stat off
    irps i, arg \{
      match =true, size_prefix \\{
        define not_symb false
        match [, i \\\{ define not_symb true \\\}
        match =false, not_symb \\\{
          cat equ cat i
          define cat_stat flank_off
        \\\}
      \\}
      define size_prefix false
      match =off, cat_stat \\{
        match [ , i \\\{ define cat_stat flank_on \\\}
        match =dword , i \\\{ define cat_stat flank_tmp \\\}
        match =qword , i \\\{ define cat_stat flank_tmp \\\}
        match =dqword , i \\\{ define cat_stat flank_tmp \\\}
        ;not recommended though
        match =byte , i \\\{ define cat_stat flank_tmp \\\}
        match =word , i \\\{ define cat_stat flank_tmp \\\}
        match =fword , i \\\{ define cat_stat flank_tmp \\\}
        match =tword , i \\\{ define cat_stat flank_tmp \\\}
      \\}
      match =flank_tmp, cat_stat \\{
        define size_prefix true
        define cat_stat flank_on
      \\}
      define not_symb false
      match =,, i \\{ define not_symb true \\}
      match =false, not_symb \\{
        match =on, cat_stat \\\{
          match all, cat \\\\{ cat equ all\\\\#i \\\\}
        \\\}
        match =flank_on, cat_stat \\\{
          cat equ i
          define cat_stat on
        \\\}
        match ], i \\\{ define cat_stat flank_off \\\}
        match =off, cat_stat \\\{ sngl_m i \\\}
        match =flank_off, cat_stat \\\{
          mult_m cat
          define cat_stat off
        \\\}
      \\}
    \}
}

;offset of first argument within current stack frame at the beginning of proc
define _proc_args_ofs 4
;size of 1 pushed argument on stack
define _proc_arg_size 4

;name of current procedure
;setup by "proc" macro, should be treated readonly by actual code
.proc_name equ

.._proc_prolog= -0x100000000

macro proc name,[arg]
{
  common
    name#:
    .proc_name equ name
    _args_size equ 0
    .args_ofs= _proc_args_ofs
    .saved_regs equ
    .saved_regs_size= 0
    match all, arg \{
  forward
      ;args arg, the angry pirat!
      .\#arg equ esp+.args_ofs + _args_size
      _args_size equ _proc_arg_size + _args_size
  common
    \}
    .args_size= _args_size
    .._proc_prolog= $
    if defined .epilog  ;SYNTAX
      .saved_regs_code_size= ._after_epilog - .epilog - 1
    else  ;SYNTAX
      displn 'error: missing "endp"'  ;SYNTAX
    end if  ;SYNTAX
}

macro _save_sngl op
{
  if op eqtype eax | op eq flags  ;SYNTAX
    _pu op
    .saved_regs_size= .saved_regs_size + last_push_size
    .saved_regs equ op .saved_regs
  else  ;SYNTAX
    displn 'error: can only save registers, memory '# \  ;SYNTAX
           'or flags, skipping argument'  ;SYNTAX
  end if  ;SYNTAX
}

macro _save_mult op
{
  local m
  if op eqtype dword [ebx] | cat eq dword flags | cat eq word flags \
     | cat eqtype dword eax | op eqtype [ebx]  ;SYNTAX
    _pu op
    .saved_regs_size= .saved_regs_size + last_push_size
    .saved_regs equ m .saved_regs
    match all, op \{ define m all \}
  else  ;SYNTAX
    displn 'error: can only save registers, memory '# \  ;SYNTAX
           'or flags, skipping argument'  ;SYNTAX
  end if
}

macro save [arg]
{
  common
    if $ - .._proc_prolog <> 0
      displn 'error: "save" statement not directly after "proc"'  ;SYNTAX
      ;"save" is not allowed in the middle of procedures 'cause neither  ;SYNTAX
      ;preprocessor nor assembler would know how often each "save" would  ;SYNTAX
      ;actually get executed.  ;SYNTAX
      ;They wouldn't know how to change arg_ofs and thus  ;SYNTAX
      ;all argument symbols may get broken!  ;SYNTAX
    else
      _parse_args _save_sngl,_save_mult, arg
      @@:
      match name, .proc_name \{ name equ @b \}
      .args_ofs= .args_ofs + .saved_regs_size
    end if
}

macro _do_epilog
{
  match all, .saved_regs \{
    irps i, all \\{
      if i eqtype eax | i eq flags | i eqtype dword [ebx] \
         | i eq dword flags | i eq word flags | i eqtype dword eax | i eqtype [ebx]
        _po i
      end if
    \\}
  \}
  if .args_size = 0
    ret
  else
    ret .args_size
  end if
}

macro return
{
  if ~.proc_name eq
    if .saved_regs_code_size > max_code_size_inline_return
      jmp .epilog
    else
      _do_epilog
    end if
  else  ;HINT
    match =true, show_hints \{  ;HINT
      displn 'hint: superfluous "return"'  ;HINT
    \}  ;HINT
  end if
}

macro endp
{
  if ~.proc_name eq
    .epilog:
    _do_epilog
    ._after_epilog:
    .saved_regs equ
    .proc_name equ
    local tmp
    tmp:
  else  ;HINT
    match =true, show_hints \{ displn 'hint: superfluous "endp"' \}  ;HINT
  end if
}

macro call proc,[arg]
{
;;;TODO: add optional parameter count checking
  common
    if ~ arg eq
  reverse
      _pu arg
  common
    end if
    call proc
}

pword equ fword
tbyte equ tword
oword equ dqword
xword equ dqword

;some shortcuts, handy for pushing and popping registers
;we must use "fix" because of the commas
mm0..0 fix mm0  ;OPTIONAL
mm0..1 fix mm0,mm1  ;OPTIONAL
mm0..2 fix mm0,mm1,mm2
mm0..3 fix mm0,mm1,mm2,mm3
mm0..4 fix mm0,mm1,mm2,mm3,mm4
mm0..5 fix mm0,mm1,mm2,mm3,mm4,mm5
mm0..6 fix mm0,mm1,mm2,mm3,mm4,mm5,mm6
mm0..7 fix mm0,mm1,mm2,mm3,mm4,mm5,mm6,mm7
mm1..1 fix mm1  ;OPTIONAL
mm1..2 fix mm1,mm2  ;OPTIONAL
mm1..3 fix mm1,mm2,mm3
mm1..4 fix mm1,mm2,mm3,mm4
mm1..5 fix mm1,mm2,mm3,mm4,mm5
mm1..6 fix mm1,mm2,mm3,mm4,mm5,mm6
mm1..7 fix mm1,mm2,mm3,mm4,mm5,mm6,mm7
mm2..2 fix mm2  ;OPTIONAL
mm2..3 fix mm2,mm3  ;OPTIONAL
mm2..4 fix mm2,mm3,mm4
mm2..5 fix mm2,mm3,mm4,mm5
mm2..6 fix mm2,mm3,mm4,mm5,mm6
mm2..7 fix mm2,mm3,mm4,mm5,mm6,mm7
mm3..3 fix mm3  ;OPTIONAL
mm3..4 fix mm3,mm4  ;OPTIONAL
mm3..5 fix mm3,mm4,mm5
mm3..6 fix mm3,mm4,mm5,mm6
mm3..7 fix mm3,mm4,mm5,mm6,mm7
mm4..4 fix mm4  ;OPTIONAL
mm4..5 fix mm4,mm5  ;OPTIONAL
mm4..6 fix mm4,mm5,mm6
mm4..7 fix mm4,mm5,mm6,mm7
mm5..5 fix mm5  ;OPTIONAL
mm5..6 fix mm5,mm6  ;OPTIONAL
mm5..7 fix mm5,mm6,mm7
mm6..6 fix mm6  ;OPTIONAL
mm6..7 fix mm6,mm7  ;OPTIONAL
mm7..7 fix mm7  ;OPTIONAL

mm7..6 fix mm7,mm6  ;OPTIONAL
mm7..5 fix mm7,mm6,mm5
mm7..4 fix mm7,mm6,mm5,mm4
mm7..3 fix mm7,mm6,mm5,mm4,mm3
mm7..2 fix mm7,mm6,mm5,mm4,mm3,mm2
mm7..1 fix mm7,mm6,mm5,mm4,mm3,mm2,mm1
mm7..0 fix mm7,mm6,mm5,mm4,mm3,mm2,mm1,mm0
mm6..5 fix mm6,mm5  ;OPTIONAL
mm6..4 fix mm6,mm5,mm4
mm6..3 fix mm6,mm5,mm4,mm3
mm6..2 fix mm6,mm5,mm4,mm3,mm2
mm6..1 fix mm6,mm5,mm4,mm3,mm2,mm1
mm6..0 fix mm6,mm5,mm4,mm3,mm2,mm1,mm0
mm5..4 fix mm5,mm4  ;OPTIONAL
mm5..3 fix mm5,mm4,mm3
mm5..2 fix mm5,mm4,mm3,mm2
mm5..1 fix mm5,mm4,mm3,mm2,mm1
mm5..0 fix mm5,mm4,mm3,mm2,mm1,mm0
mm4..3 fix mm4,mm3  ;OPTIONAL
mm4..2 fix mm4,mm3,mm2
mm4..1 fix mm4,mm3,mm2,mm1
mm4..0 fix mm4,mm3,mm2,mm1,mm0
mm3..2 fix mm3,mm2  ;OPTIONAL
mm3..1 fix mm3,mm2,mm1
mm3..0 fix mm3,mm2,mm1,mm0
mm2..1 fix mm2,mm1  ;OPTIONAL
mm2..0 fix mm2,mm1,mm0
mm1..0 fix mm1,mm0  ;OPTIONAL

xmm0..0 fix xmm0  ;OPTIONAL
xmm0..1 fix xmm0,xmm1  ;OPTIONAL
xmm0..2 fix xmm0,xmm1,xmm2
xmm0..3 fix xmm0,xmm1,xmm2,xmm3
xmm0..4 fix xmm0,xmm1,xmm2,xmm3,xmm4
xmm0..5 fix xmm0,xmm1,xmm2,xmm3,xmm4,xmm5
xmm0..6 fix xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6
xmm0..7 fix xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7
xmm1..1 fix xmm1  ;OPTIONAL
xmm1..2 fix xmm1,xmm2  ;OPTIONAL
xmm1..3 fix xmm1,xmm2,xmm3
xmm1..4 fix xmm1,xmm2,xmm3,xmm4
xmm1..5 fix xmm1,xmm2,xmm3,xmm4,xmm5
xmm1..6 fix xmm1,xmm2,xmm3,xmm4,xmm5,xmm6
xmm1..7 fix xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7
xmm2..2 fix xmm2  ;OPTIONAL
xmm2..3 fix xmm2,xmm3  ;OPTIONAL
xmm2..4 fix xmm2,xmm3,xmm4
xmm2..5 fix xmm2,xmm3,xmm4,xmm5
xmm2..6 fix xmm2,xmm3,xmm4,xmm5,xmm6
xmm2..7 fix xmm2,xmm3,xmm4,xmm5,xmm6,xmm7
xmm3..3 fix xmm3  ;OPTIONAL
xmm3..4 fix xmm3,xmm4  ;OPTIONAL
xmm3..5 fix xmm3,xmm4,xmm5
xmm3..6 fix xmm3,xmm4,xmm5,xmm6
xmm3..7 fix xmm3,xmm4,xmm5,xmm6,xmm7
xmm4..4 fix xmm4  ;OPTIONAL
xmm4..5 fix xmm4,xmm5  ;OPTIONAL
xmm4..6 fix xmm4,xmm5,xmm6
xmm4..7 fix xmm4,xmm5,xmm6,xmm7
xmm5..5 fix xmm5  ;OPTIONAL
xmm5..6 fix xmm5,xmm6  ;OPTIONAL
xmm5..7 fix xmm5,xmm6,xmm7
xmm6..6 fix xmm6  ;OPTIONAL
xmm6..7 fix xmm6,xmm7  ;OPTIONAL
xmm7..7 fix xmm7  ;OPTIONAL

xmm7..6 fix xmm7,xmm6  ;OPTIONAL
xmm7..5 fix xmm7,xmm6,xmm5
xmm7..4 fix xmm7,xmm6,xmm5,xmm4
xmm7..3 fix xmm7,xmm6,xmm5,xmm4,xmm3
xmm7..2 fix xmm7,xmm6,xmm5,xmm4,xmm3,xmm2
xmm7..1 fix xmm7,xmm6,xmm5,xmm4,xmm3,xmm2,xmm1
xmm7..0 fix xmm7,xmm6,xmm5,xmm4,xmm3,xmm2,xmm1,xmm0
xmm6..5 fix xmm6,xmm5  ;OPTIONAL
xmm6..4 fix xmm6,xmm5,xmm4
xmm6..3 fix xmm6,xmm5,xmm4,xmm3
xmm6..2 fix xmm6,xmm5,xmm4,xmm3,xmm2
xmm6..1 fix xmm6,xmm5,xmm4,xmm3,xmm2,xmm1
xmm6..0 fix xmm6,xmm5,xmm4,xmm3,xmm2,xmm1,xmm0
xmm5..4 fix xmm5,xmm4  ;OPTIONAL
xmm5..3 fix xmm5,xmm4,xmm3
xmm5..2 fix xmm5,xmm4,xmm3,xmm2
xmm5..1 fix xmm5,xmm4,xmm3,xmm2,xmm1
xmm5..0 fix xmm5,xmm4,xmm3,xmm2,xmm1,xmm0
xmm4..3 fix xmm4,xmm3  ;OPTIONAL
xmm4..2 fix xmm4,xmm3,xmm2
xmm4..1 fix xmm4,xmm3,xmm2,xmm1
xmm4..0 fix xmm4,xmm3,xmm2,xmm1,xmm0
xmm3..2 fix xmm3,xmm2  ;OPTIONAL
xmm3..1 fix xmm3,xmm2,xmm1
xmm3..0 fix xmm3,xmm2,xmm1,xmm0
xmm2..1 fix xmm2,xmm1  ;OPTIONAL
xmm2..0 fix xmm2,xmm1,xmm0
xmm1..0 fix xmm1,xmm0  ;OPTIONAL
;this is supposed to be 32bit only macros, so we don't have xmm8..15 ;(

;is there any need for shortcuts to all GPRs? I'm not considering defining those
;because of the myriad of different combinations out there

macro _push [arg]
{
  common _parse_args _pu,_pu, arg
}

macro _pop [arg]
{
  common _parse_args _po,_po, arg
}

;we can't just call the above macros "push" and "pop", or else we will get
;circular macro references if you do something like
;
;proc whatever
;save something
;endp
push fix _push
pop fix _pop

;;;TODO: add pushd, pushw, popd and popw macros

OK, and here is the development version, with all kinds of DEBUGging and TESTing stuff inside.
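For readers new to the "efficient call" idea described at the top of this post, here is a minimal hand-written sketch of what it boils down to without any macros. It is not generated by the macros above and is not taken from the development version; it only assumes plain fasm in use32 mode. The labels add_scaled and add_scaled_mov are hypothetical names used for illustration. The first routine saves a register with native push/pop, the second with explicit lea/mov, and both show how the second argument moves from [esp+8] to [esp+12] once 4 bytes have been reserved on the stack.

```asm
use32

; Hypothetical stdcall-style routine: two dword arguments a and b,
; returns a + 2*b in eax. No ebp frame is set up, so ebp stays free.
add_scaled:
        mov     eax,[esp+4]             ; a: first argument at entry
        push    ebx                     ; esp drops by 4 ...
        mov     ebx,[esp+8+4]           ; ... so b moved from [esp+8] to [esp+12]
        lea     eax,[eax+ebx*2]
        pop     ebx
        ret     8                       ; callee removes both arguments

; Same routine with the register saved by explicit moves instead of
; push/pop. lea adjusts esp without modifying the flags.
add_scaled_mov:
        mov     eax,[esp+4]
        lea     esp,[esp-4]             ; reserve one dword
        mov     [esp],ebx               ; save ebx
        mov     ebx,[esp+8+4]           ; b, offset shifted by the 4 reserved bytes
        lea     eax,[eax+ebx*2]
        mov     ebx,[esp]               ; restore ebx before releasing the slot
        lea     esp,[esp+4]
        ret     8
```

Keeping the "+4" written separately in [esp+8+4] makes the adjustment visible; that bookkeeping is what .args_ofs is meant to automate in the macros.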
_________________
MCD - the inevitable return of the Mad Computer Doggy

Last edited by MCD on 29 Oct 2007, 11:47; edited 1 time in total
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.