SSE2 local variables and memory

Kuemmel 20 Dec 2007, 12:25
Hi there, I recently came back to work on my code for the KMB benchmark. I want to use some local variables to store 128-bit SSE values locally, but unfortunately something is wrong. I think it has something to do with alignment, but I can't figure it out.
I have this proc:
Code:
proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
    local   rz_low:QWORD,               
    ...snip...
    


First, there seems to be no way to declare a DQWORD local as needed for SSE, so I just defined it like this:
Code:
proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
locals
    sse2_test_local  rb 16
    rz_low dq 0
...
endl    

Accesses like this one work perfectly inside the proc:
Code:
fstp      [rz_low]    

But things like
Code:
movapd  xmm0,dqword [global_variable]    ; get some global data
movapd  dqword [sse2_test_local],xmm0    ; store locally for later use

just don't seem to work... any clue?
revolution 20 Dec 2007, 12:48
Stack alignment can be tricky to get right. There is some discussion on ways to do it in the Intel manuals. Basically I think you are aware that the variables need to be aligned. If you don't know, or have no control over, the alignment from the calling function then you can make your own alignment by manipulating esp and using another register (ebx perhaps) to access the stack.
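If aligning the stack is not practical, one fallback is the unaligned move form. The following is only a sketch that reuses the names sse2_test_local and global_variable from the first post; it assumes global_variable itself is 16-byte aligned and that the fragment sits inside the same proc. movupd accepts any address, whereas movapd faults on one that is not a multiple of 16, though the unaligned form may be slower on some CPUs.
Code:
        movapd  xmm0,dqword [global_variable]    ; aligned load (assumes the global is 16-byte aligned)
        movupd  dqword [sse2_test_local],xmm0    ; unaligned store: valid at any stack address
        movupd  xmm1,dqword [sse2_test_local]    ; unaligned load back for later use
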
bitRAKE 20 Dec 2007, 16:35
The raw code would look like:

sub esp,Needed_Space + (alignment-1)
and esp,0-alignment
; ESP is aligned

Where alignment is a power of two. The problem is how to restore ESP afterwards, so the original value is usually kept in another register.

If you know the stack is aligned prior to your routine then it might be better to pass some fake argument to force correct alignment.
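
Put together, the save-and-restore idea might look like the sketch below. The routine name, the use of EBX to hold the original ESP, and the 32-byte scratch size are assumptions for illustration only; the fragment is meant for 32-bit code.
Code:
SCRATCH = 32                             ; hypothetical: two 16-byte slots
aligned_scratch_demo:
        push    ebx                      ; keep the caller's EBX
        mov     ebx,esp                  ; remember the unaligned ESP
        sub     esp,SCRATCH + (16-1)     ; data plus worst-case padding
        and     esp,0-16                 ; round ESP down to a 16-byte boundary
        xorpd   xmm0,xmm0
        movapd  dqword [esp],xmm0        ; both slots are safe for aligned moves
        movapd  dqword [esp+16],xmm0
        mov     esp,ebx                  ; undo the allocation and the alignment
        pop     ebx
        ret

EBX must not be changed between the two mov instructions, since it is the only record of the original stack pointer.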
Kuemmel 20 Dec 2007, 17:56
Thanks, I think I'm getting into it...
I saw something like this in Xorpd!'s 64-bit code:
Code:
align 16
thread_draw_x64_x4:
 .stack_gap = 536
   sub    rsp, .stack_gap
   mov    [rsp+.stack_gap+8], rdi
   mov    [rsp+.stack_gap+16], rsi
   mov    [rsp+.stack_gap+24], rbx
   mov    [rsp+.stack_gap+32], rbp
   mov    [rsp+.stack_gap-8], r15
   mov    [rsp+.stack_gap-16], r14
   mov    [rsp+.stack_gap-24], r13
   mov    [rsp+.stack_gap-32], r12
 .rz_high    equ rsp+.stack_gap-40
   movaps [rsp+480], xmm8
   movaps [rsp+464], xmm7
   movaps [rsp+448], xmm6
   movaps [rsp+432], xmm15
   movaps [rsp+416], xmm14
   movaps [rsp+400], xmm13
   movaps [rsp+384], xmm12
   movaps [rsp+368], xmm11
   movaps [rsp+352], xmm10
   movaps [rsp+336], xmm9
   lea    rsi, [rsp+208]
 .Re_start_4 equ rsi+112
 .Re_start_3 equ rsi+96
 .Re_start_2 equ rsi+80
 .Re_start_1 equ rsi+64

...snip

   xor    eax, eax
   mov    rdi, [rsp+.stack_gap+8]
   mov    rsi, [rsp+.stack_gap+16]
   mov    rbx, [rsp+.stack_gap+24]
   mov    rbp, [rsp+.stack_gap+32]
   mov    r15, [rsp+.stack_gap-8]
   mov    r14, [rsp+.stack_gap-16]
   mov    r13, [rsp+.stack_gap-24]
   mov    r12, [rsp+.stack_gap-32]
   movaps xmm8, [rsp+480]
   movaps xmm7, [rsp+464]
   movaps xmm6, [rsp+448]
   movaps xmm15, [rsp+432]
   movaps xmm14, [rsp+416]
   movaps xmm13, [rsp+400]
   movaps xmm12, [rsp+384]
   movaps xmm11, [rsp+368]
   movaps xmm10, [rsp+352]
   movaps xmm9, [rsp+336]
   add    rsp, .stack_gap
   ret
    

So what would my current 32-bit code look like? Here is the current version:
Code:
proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
    local   rz_low:QWORD,           rz_high:QWORD,\
            iz_low:QWORD,           dz:QWORD,\
            iz_temp_fpu:QWORD,      rz_temp_fpu:QWORD,\
            plot_x:DWORD,           plot_limit:DWORD,\
            local_iter_count:DWORD, dummy:DWORD

...snip

    ret
endp

Of course rsp would become esp and so on, but I'm still confused about how the argument 'plot_y' is passed. And about the whole syntax...
revolution 20 Dec 2007, 18:05
Open up a debugger and examine the code that the proc macro generates. It should make more sense to you once you understand the underlying code.
bitRAKE 20 Dec 2007, 18:50
Maybe something like...
Code:
ALIGNMENT = 64
thread_draw_sse2:
; argument and saved registers, addressed through EAX
        label .plot_y           dword at eax+20
;       label .return           dword at eax+16
        label .ebp              dword at eax+12
        label .edi              dword at eax+8
        label .esi              dword at eax+4
        label .ebx              dword at eax

; locals, addressed through the aligned ESP
        label .rz_low           qword at esp
        label .rz_high          qword at esp+8
        label .iz_low           qword at esp+16
        label .dz               qword at esp+24
        label .iz_temp_fpu      qword at esp+32
        label .rz_temp_fpu      qword at esp+40
        label .plot_x           dword at esp+48
        label .plot_limit       dword at esp+52
        label .local_iter_count dword at esp+56
        label .dummy            dword at esp+60

        lea eax,[esp-4*4] ; register save space: 4 dwords just below the return address
; local space, register save space, and alignment
        sub esp,8*6+4*4 + 4*4 + (ALIGNMENT-1)
        mov [.ebp],ebp
        mov [.edi],edi
        mov [.esi],esi
        mov [.ebx],ebx
        and esp,0-ALIGNMENT

; ...don't change EAX or ESP - everything else is available...

        mov ebp,[.ebp]
        mov edi,[.edi]
        mov esi,[.esi]
        mov ebx,[.ebx]
        lea esp,[eax+4*4] ; ESP points at the return address again
        retn 4
I'm usually not so nice to my Windows threads, lol.

Edit: had a line doubled up. Need to actually look at the code before posting.


Last edited by bitRAKE on 21 Dec 2007, 05:10; edited 3 times in total
Kuemmel 20 Dec 2007, 23:05
Hi bitRAKE, thanks for the code and explanation! Could it be that it isn't quite FASM syntax, though? I only get errors when trying to assemble it...

In the meantime I modified the code a little and put it into a disassembler as recommended. It looks like this:
FASM:
Code:
proc thread_draw_sse2 uses ebx esi, plot_y:DWORD
        locals
                rz_low                  dq 0
                rz_high                 dq 0
                iz_low                  dq 0
                dz                      dq 0
                iz_temp_fpu             dq 0
                rz_temp_fpu             dq 0
                plot_x                  dd 0
                plot_limit              dd 0
                local_iter_count        dd 0
                dummy                   dd 0
        endl
    

DISASSEMBLED:
Code:
00000A80  55              PUSH    ebp
00000A81  89E5            MOV     ebp,esp
00000A83  83EC40          SUB     esp,0x40
00000A86  53              PUSH    ebx
00000A87  56              PUSH    esi
00000A88  C745C000000000  MOV     [ebp-0x40],0x00000000
00000A8F  C745C400000000  MOV     [ebp-0x3C],0x00000000
00000A96  C745C800000000  MOV     [ebp-0x38],0x00000000
00000A9D  C745CC00000000  MOV     [ebp-0x34],0x00000000
00000AA4  C745D000000000  MOV     [ebp-0x30],0x00000000
00000AAB  C745D400000000  MOV     [ebp-0x2C],0x00000000
00000AB2  C745D800000000  MOV     [ebp-0x28],0x00000000
00000AB9  C745DC00000000  MOV     [ebp-0x24],0x00000000
00000AC0  C745E000000000  MOV     [ebp-0x20],0x00000000
00000AC7  C745E400000000  MOV     [ebp-0x1C],0x00000000
00000ACE  C745E800000000  MOV     [ebp-0x18],0x00000000
00000AD5  C745EC00000000  MOV     [ebp-0x14],0x00000000
00000ADC  C745F000000000  MOV     [ebp-0x10],0x00000000
00000AE3  C745F400000000  MOV     [ebp-0x0C],0x00000000
00000AEA  C745F800000000  MOV     [ebp-0x08],0x00000000
00000AF1  C745FC00000000  MOV     [ebp-0x04],0x00000000
...plot_y ends up at [ebp+0x08], as can be seen where it is loaded later in the code
00000B52  8B4508          MOV     eax,[ebp+0x08]

So first esp is moved to ebp, and then everything is an offset from that? Sorry, I never really got my head around the stack and base pointers... Is it in the end the same as what you've shown, just unaligned?

Maybe it's important to know that this function is called 16 times in the code and assigned to different cores of the CPU (even if they aren't there), so I guess it runs at a different virtual location each time. Would this have any negative impact on modified, aligned code?
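
For reference, the prologue in the disassembly above implies roughly the following frame. This is a sketch read off that listing, not taken from the macro source, and the position of the saved registers is inferred from the push order. The final three instructions are a hypothetical check (the label .not_aligned is made up and would have to be defined elsewhere in the proc) that could be used to see whether a local happens to be 16-byte aligned on a given call.
Code:
; [ebp+0x08]              plot_y
; [ebp+0x04]              return address
; [ebp+0x00]              caller's EBP
; [ebp-0x40]..[ebp-0x01]  the 64 bytes of locals
; [ebp-0x44]              saved EBX
; [ebp-0x48]              saved ESI   <- ESP after the prologue
;
; EBP is only as aligned as the caller's ESP happened to be, so none of the
; locals is guaranteed to sit on a 16-byte boundary.
        lea     eax,[rz_low]             ; address of the local rz_low
        test    eax,15                   ; any of the low four bits set?
        jnz     .not_aligned             ; then movapd on this address would fault
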
bitRAKE 21 Dec 2007, 00:53
Sorry, the correct syntax is label, instead of all those locals. I've corrected the post above, and it should run okay on multiple processors. EAX is used instead of EBP: it's easier to use a register that is disposable than to add another push/pop.
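
With ESP aligned to ALIGNMENT, locals at ESP-relative offsets that are multiples of 16 can be used with the aligned SSE2 moves. A short sketch of what the body of that routine might contain follows; the choice of instructions is only an illustration, and the labels are the ones defined in the corrected post.
Code:
        xorpd   xmm0,xmm0
        movapd  dqword [.rz_low],xmm0    ; ESP+0:  covers .rz_low and .rz_high
        movapd  dqword [.iz_low],xmm0    ; ESP+16: covers .iz_low and .dz
        fldz
        fstp    [.iz_temp_fpu]           ; x87 accesses work as before
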