flat assembler
Message board for the users of flat assembler.
revolution 20 Dec 2007, 12:48
Stack alignment can be tricky to get right. There is some discussion on ways to do it in the Intel manuals. Basically I think you are aware that the variables need to be aligned. If you don't know, or have no control over, the alignment from the calling function then you can make your own alignment by manipulating esp and using another register (ebx perhaps) to access the stack.
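As a rough illustration of the idea above, here is a minimal sketch (not taken from the thread) that aligns ESP inside a routine and keeps the original stack pointer in EBX so the argument stays reachable. The routine name `aligned_demo`, the 16-byte alignment, the 32 bytes of local space and the single stdcall argument are all assumptions made for the example, and the code is assumed to be assembled in 32-bit mode.

```asm
; Sketch only: aligned_demo is a hypothetical routine taking one dword argument.
aligned_demo:
        push    ebx                 ; EBX is callee-saved, so preserve it first
        mov     ebx, esp            ; EBX = ESP before alignment
                                    ;   [ebx]   = saved EBX
                                    ;   [ebx+4] = return address
                                    ;   [ebx+8] = the argument
        sub     esp, 32             ; reserve 32 bytes of local space
        and     esp, -16            ; round ESP down to a 16-byte boundary

        mov     eax, [ebx+8]        ; arguments are addressed through EBX
        movd    xmm0, eax
        movaps  [esp], xmm0         ; locals are addressed through the aligned ESP
        movaps  [esp+16], xmm0

        mov     esp, ebx            ; undo both the SUB and the AND in one step
        pop     ebx
        ret     4                   ; stdcall: remove the single argument
```

Because EBX holds the pre-alignment value, the routine never needs to know how many bytes the `and` actually removed.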
bitRAKE 20 Dec 2007, 16:35
The raw code would look like:
    sub esp,Needed_Space + (alignment-1)
    and esp,0-alignment ; ESP is aligned
Where alignment is a power of two. Problem is how to restore ESP? So, it's usually put in another register. If you know the stack is aligned prior to your routine then it might be better to pass some fake argument to force correct alignment.
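One possible reading of the fake-argument idea, sketched under the assumption that the caller's ESP is a multiple of 16 before it pushes anything. The names `draw_aligned`, `caller_fragment` and `plot_y_value` are hypothetical, and the code is assumed to be assembled in 32-bit mode. It only works when you control the caller, which is not the case for a routine that the operating system starts directly.

```asm
; Sketch only: all names here are made up for the illustration.
draw_aligned:
        ; On entry: [esp] = return address, [esp+4] = real argument,
        ;           [esp+8] and [esp+12] = the two fake arguments.
        sub     esp, 64                 ; 64 is a multiple of 16, so ESP stays aligned
        mov     eax, [esp+64+4]         ; fetch the real argument
        movd    xmm0, eax
        movaps  [esp], xmm0             ; aligned access without any AND
        add     esp, 64                 ; ESP is restored by plain arithmetic
        ret     12                      ; stdcall: remove all three arguments

caller_fragment:
        ; ASSUMPTION: ESP is a multiple of 16 at this point.
        push    0                       ; fake argument, padding only
        push    0                       ; second fake argument
        push    dword [plot_y_value]    ; the real argument
        call    draw_aligned            ; pushes the 4-byte return address
        ; 3 arguments * 4 bytes + 4 bytes = 16, so the callee starts aligned
```

The trade-off is that no spare register is needed to remember the old ESP, at the cost of a calling convention that every caller has to respect.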
Kuemmel 20 Dec 2007, 17:56
Thanks, I think I'm getting into it...
I saw something in Xorpd!'s 64-bit code like:
Code:
    align 16
    thread_draw_x64_x4:
    .stack_gap = 536
    sub rsp, .stack_gap
    mov [rsp+.stack_gap+8], rdi
    mov [rsp+.stack_gap+16], rsi
    mov [rsp+.stack_gap+24], rbx
    mov [rsp+.stack_gap+32], rbp
    mov [rsp+.stack_gap-8], r15
    mov [rsp+.stack_gap-16], r14
    mov [rsp+.stack_gap-24], r13
    mov [rsp+.stack_gap-32], r12
    .rz_high equ rsp+.stack_gap-40
    movaps [rsp+480], xmm8
    movaps [rsp+464], xmm7
    movaps [rsp+448], xmm6
    movaps [rsp+432], xmm15
    movaps [rsp+416], xmm14
    movaps [rsp+400], xmm13
    movaps [rsp+384], xmm12
    movaps [rsp+368], xmm11
    movaps [rsp+352], xmm10
    movaps [rsp+336], xmm9
    lea rsi, [rsp+208]
    .Re_start_4 equ rsi+112
    .Re_start_3 equ rsi+96
    .Re_start_2 equ rsi+80
    .Re_start_1 equ rsi+64
    ...snip
    xor eax, eax
    mov rdi, [rsp+.stack_gap+8]
    mov rsi, [rsp+.stack_gap+16]
    mov rbx, [rsp+.stack_gap+24]
    mov rbp, [rsp+.stack_gap+32]
    mov r15, [rsp+.stack_gap-8]
    mov r14, [rsp+.stack_gap-16]
    mov r13, [rsp+.stack_gap-24]
    mov r12, [rsp+.stack_gap-32]
    movaps xmm8, [rsp+480]
    movaps xmm7, [rsp+464]
    movaps xmm6, [rsp+448]
    movaps xmm15, [rsp+432]
    movaps xmm14, [rsp+416]
    movaps xmm13, [rsp+400]
    movaps xmm12, [rsp+384]
    movaps xmm11, [rsp+368]
    movaps xmm10, [rsp+352]
    movaps xmm9, [rsp+336]
    add rsp, .stack_gap
    ret
So how would my current 32-bit code look? Here is the current version:
Code:
    proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
    local rz_low:QWORD, rz_high:QWORD,\
          iz_low:QWORD, dz:QWORD,\
          iz_temp_fpu:QWORD, rz_temp_fpu:QWORD,\
          plot_x:DWORD, plot_limit:DWORD,\
          local_iter_count:DWORD, dummy:DWORD
    ...snip
    ret
    endp
Of course rsp would be esp and so on, but I'm still confused about how the argument 'plot_y' is passed. And the whole syntax...?
revolution 20 Dec 2007, 18:05
Open up a debugger and examine the code that that proc macro is generating. It should make more sense to you if you understand the underlying code that is generated.
bitRAKE 20 Dec 2007, 18:50
Maybe something like...
Code:
    ALIGNMENT = 64
    thread_draw_sse2:
    label .plot_y dword at eax+20
    ; label .return dword at eax+16
    label .ebp dword at eax+12
    label .edi dword at eax+8
    label .esi dword at eax+4
    label .ebx dword at eax
    label .rz_low qword at esp
    label .rz_high qword at esp+8
    label .iz_low qword at esp+16
    label .dz qword at esp+24
    label .iz_temp_fpu qword at esp+32
    label .rz_temp_fpu qword at esp+40
    label .plot_x dword at esp+48
    label .plot_limit dword at esp+52
    label .local_iter_count dword at esp+56
    label .dummy dword at esp+60
    lea eax,[esp-4*4] ; register save space: four registers (ebp, edi, esi, ebx)
    ; local space, register save space, and alignment
    sub esp,8*6+4*4 + 4*4 + (ALIGNMENT-1)
    mov [.ebp],ebp
    mov [.edi],edi
    mov [.esi],esi
    mov [.ebx],ebx
    and esp,0-ALIGNMENT
    ; ...don't change EAX or ESP - everything else is available...
    mov ebp,[.ebp]
    mov edi,[.edi]
    mov esi,[.esi]
    mov ebx,[.ebx]
    lea esp,[eax+4*4] ; back to the entry ESP, which points at the return address
    retn 4
Edit: had a line doubled up. Need to actually look at code before posting.
Last edited by bitRAKE on 21 Dec 2007, 05:10; edited 3 times in total
Kuemmel 20 Dec 2007, 23:05
Hi bitRAKE, thanks for the code and explanation! Could it be that this isn't quite FASM syntax? I only get errors when trying to assemble it...
In the meantime I modified the code a little and put it into a disassembler as recommended. It looks like this:
FASM:
Code:
    proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
    locals
      rz_low dq 0
      rz_high dq 0
      iz_low dq 0
      dz dq 0
      iz_temp_fpu dq 0
      rz_temp_fpu dq 0
      plot_x dd 0
      plot_limit dd 0
      local_iter_count dd 0
      dummy dd 0
    endl
DISASSEMBLED:
Code:
    00000A80 55             PUSH ebp
    00000A81 89E5           MOV ebp,esp
    00000A83 83EC40         SUB esp,0x40
    00000A86 53             PUSH ebx
    00000A87 56             PUSH esi
    00000A88 C745C000000000 MOV [ebp-0x40],0x00000000
    00000A8F C745C400000000 MOV [ebp-0x3C],0x00000000
    00000A96 C745C800000000 MOV [ebp-0x38],0x00000000
    00000A9D C745CC00000000 MOV [ebp-0x34],0x00000000
    00000AA4 C745D000000000 MOV [ebp-0x30],0x00000000
    00000AAB C745D400000000 MOV [ebp-0x2C],0x00000000
    00000AB2 C745D800000000 MOV [ebp-0x28],0x00000000
    00000AB9 C745DC00000000 MOV [ebp-0x24],0x00000000
    00000AC0 C745E000000000 MOV [ebp-0x20],0x00000000
    00000AC7 C745E400000000 MOV [ebp-0x1C],0x00000000
    00000ACE C745E800000000 MOV [ebp-0x18],0x00000000
    00000AD5 C745EC00000000 MOV [ebp-0x14],0x00000000
    00000ADC C745F000000000 MOV [ebp-0x10],0x00000000
    00000AE3 C745F400000000 MOV [ebp-0x0C],0x00000000
    00000AEA C745F800000000 MOV [ebp-0x08],0x00000000
    00000AF1 C745FC00000000 MOV [ebp-0x04],0x00000000
...plot_y goes to [ebp+0x08], as can be seen when it is loaded later in the code:
    00000B52 8B4508         MOV eax,[ebp+0x08]
So first esp is moved to ebp and then everything is an offset to that? Sorry, I never thought much about the stack and base pointer... Is it in the end the same as what you've shown, just unaligned? It may be important to know that this function is called 16 times in the code and assigned to different cores of the CPU, even if they are not there, so I guess each time there is a different virtual code location. Would this have any negative impact on modified, aligned code?
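To make the frame in that disassembly easier to picture, here is a hand-written sketch of roughly what the `proc` macro appears to generate, with the layout relative to EBP spelled out in comments. The prologue follows the disassembled bytes; the epilogue is not shown in the listing, so the `pop`/`leave`/`ret 4` ending is an assumption about a stdcall routine with one argument.

```asm
; Sketch only: a manual equivalent of the macro-generated frame.
thread_draw_sse2:
        push    ebp                 ; save the caller's frame pointer
        mov     ebp, esp            ; EBP = fixed reference point for this call
        sub     esp, 0x40           ; 64 bytes of locals below EBP
        push    ebx                 ; "uses ebx esi"
        push    esi

        ; Layout relative to EBP:
        ;   [ebp+8]               plot_y (the argument)
        ;   [ebp+4]               return address
        ;   [ebp]                 saved EBP
        ;   [ebp-0x40]..[ebp-1]   the 64 bytes of locals
        mov     eax, [ebp+8]        ; load plot_y, as in the disassembly

        ; ... body ...

        pop     esi                 ; assumed epilogue (not in the listing)
        pop     ebx
        leave                       ; same as: mov esp, ebp / pop ebp
        ret     4                   ; remove the single argument
```

Nothing in this frame forces EBP to a 16-byte boundary, so the locals are only guaranteed to be 4-byte aligned. Each thread runs on its own stack, and an `and esp, ...` adjustment is computed from whatever ESP is at run time, so a routine that aligns its own stack should behave the same on every thread.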
bitRAKE 21 Dec 2007, 00:53
Sorry, the correct syntax is label, instead of all those locals. I've corrected the above post, and it should run okay on multiple processors. EAX is used instead of EBP: it is easier to use a register that is disposable than to add another push/pop.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.