flat assembler
Message board for the users of flat assembler.

Hello 64! - Any idea why this fails?

sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 26 Nov 2010, 10:38
There are probably directx/opengl functions that take floats as params.
bitRAKE



Joined: 21 Jul 2003
Posts: 4153
Location: vpcmpistri
bitRAKE 26 Nov 2010, 13:16
revolution wrote:
bitRAKE wrote:
The API uses SSE2 which requires alignment.
But which API? Do you have an example?
My experience has been that some APIs also use XMM registers as temporaries; it is not just the APIs that look SSE2-related. I only recall being surprised, not the exact APIs I was tracing through at the time, sorry.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Feryno



Joined: 23 Mar 2005
Posts: 514
Location: Czech republic, Slovak republic
Feryno 26 Nov 2010, 13:37
sample:

you call [MsgBoxA] with an unaligned stack, rsp = 16*x + 8

MsgBoxA (or one of its subprocedures) tries to clear some space on the stack (e.g. 2 qwords aligned at 16; the procedure doesn't know that you misaligned the stack, it expects that you aligned it properly)
win64 does it like this:
pxor xmm0,xmm0
movdqa [rsp + 16*y],xmm0
the last instruction raises an exception because rsp is not aligned at 16
win does it this way (movdqa with xmm registers) to improve performance

the newer the win version, the more APIs are sensitive to rsp=16*...

everything I found was only writes to the stack using the movdqa instruction; I have never found any other reason why rsp should be aligned at 16
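
For contrast with the failing case described above, here is a minimal sketch of a call made with rsp at a multiple of 16. It assumes the same win64a.inc import macros used elsewhere in this thread. The labels demo_text and demo_caption are made up for illustration.

```asm
format PE64 GUI 5.0
entry start

include 'win64a.inc'

section '.text' code readable executable
  start:
            ; On entry rsp = 16*x + 8, because the call that got us here
            ; pushed one qword (the return address).
            ; 8*5 = 32 bytes of spill space for the callee + 8 bytes of
            ; padding, so rsp becomes a multiple of 16.
            sub     rsp,8*5

            xor     ecx,ecx                 ; hWnd = NULL
            lea     rdx,[demo_text]         ; lpText
            lea     r8,[demo_caption]       ; lpCaption
            xor     r9d,r9d                 ; uType = 0 (MB_OK)
            call    [MessageBox]            ; rsp is 16*y at this point

            xor     ecx,ecx                 ; exit code 0
            call    [ExitProcess]

section '.data' data readable writeable
  demo_caption      db 'Aligned call',0
  demo_text         db 'rsp was a multiple of 16 at the call',0

section '.idata' import data readable
  library kernel,'KERNEL32.DLL',\
          user,'USER32.DLL'

  import kernel,\
         ExitProcess,'ExitProcess'

  import user,\
         MessageBox,'MessageBoxA'
```

Changing the first instruction to sub rsp,8*4 would leave rsp at 16*y + 8 for the call. Whether that actually faults depends on whether the API happens to execute a movdqa on its stack, which varies between Windows versions.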
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20513
Location: In your JS exploiting you and your system
revolution 26 Nov 2010, 13:43
Feryno wrote:
pxor xmm0,xmm0
Don't let tom tobias see that!
Feryno



Joined: 23 Mar 2005
Posts: 514
Location: Czech republic, Slovak republic
Feryno 26 Nov 2010, 14:10
when your app crashes because of stack misalignment (and you don't know yet why it crashes), you can identify it quite easily:
run your app under a debugger
the debugger notifies you about an exception inside an API (addresses like 00000000_7F...... or 00007FFF_........) and the faulting instruction is movdqa [rsp+...],xmm..
then you may be almost 100% sure that you misaligned the stack
then check the stack back trace to identify the procedure which caused it (it may be the immediate caller or a few levels higher than the direct caller). A stack back trace is a debugger feature which scans the stack for recorded return addresses and validates that a call instruction ends at each recorded address, which is the sign of a procedure call. It is not a 100% certain sign, but it can be very useful for reconstructing the call tree when you get into trouble deep inside subprocedures.
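
A complementary trick is to trap at your own call site instead of deep inside the API. The sketch below uses a hypothetical macro, check_rsp_aligned, which is not part of fasm's standard includes. It assumes you run the program under a debugger, since without one the int3 simply terminates the process.

```asm
format PE64 GUI 5.0
entry start

include 'win64a.inc'

; Hypothetical debugging macro (not part of fasm's standard includes):
; stop in the debugger at the call site if rsp is not a multiple of 16.
macro check_rsp_aligned
{
  local rsp_ok
            test    rsp,15                  ; low 4 bits must all be zero
            jz      rsp_ok
            int3                            ; breakpoint: stack is misaligned here
  rsp_ok:
}

section '.text' code readable executable
  start:
            sub     rsp,8*5                 ; spill space + 8 bytes of padding

            check_rsp_aligned               ; changes flags only, no registers
            xor     ecx,ecx                 ; hWnd = NULL
            lea     rdx,[check_text]        ; lpText
            lea     r8,[check_caption]      ; lpCaption
            xor     r9d,r9d                 ; uType = 0 (MB_OK)
            call    [MessageBox]

            xor     ecx,ecx                 ; exit code 0
            call    [ExitProcess]

section '.data' data readable writeable
  check_caption     db 'Alignment check',0
  check_text        db 'rsp passed the mod 16 check',0

section '.idata' import data readable
  library kernel,'KERNEL32.DLL',\
          user,'USER32.DLL'

  import kernel,\
         ExitProcess,'ExitProcess'

  import user,\
         MessageBox,'MessageBoxA'
```

Place the macro immediately before the call instruction. The test clobbers the flags, so it should not sit between a comparison and the jump that depends on it.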
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 26 Nov 2010, 17:13
revolution wrote:
Because call only pushes one qword.
So is it not possible to have the return address on the odd part of 16 bytes? Or perhaps less efficient. idk

revolution wrote:
What about functions with more than four arguments? Using pre-allocation means that you can't use push arg, you would have to use mov [rsp+offset],arg instead. A classic trade-off situation: sometimes you will gain and sometimes you will lose. No single solution will be optimal for everything.
Good answer, what are your thoughts on this method?
Code:
format PE64 GUI 5.0
entry start

struc  MEMORYSTATUSEX {
 .dwLength                   dd LenMEMORYSTATUSEX
 .dwMemoryLoad               dd ?
 .ullTotalPhys               dq ?
 .ullAvailPhys               dq ?
 .ullTotalPageFile           dq ?
 .ullAvailPageFile           dq ?
 .ullTotalVirtual            dq ?
 .ullAvailVirtual            dq ?
 .ullAvailExtendedVirtual    dq ?
}

include 'win64a.inc'

;=======================================
section '.text' code readable executable
;---------------------------------------
  start:
            sub     rsp,8*5

            mov     rcx,memstatex
            call    [GlobalMemoryStatusEx]
            shr     [memstatex.ullTotalPhys],20
            shr     [memstatex.ullAvailPhys],20
            shr     [memstatex.ullTotalPageFile],20
            shr     [memstatex.ullAvailPageFile],20
            shr     [memstatex.ullTotalVirtual],20
            shr     [memstatex.ullAvailVirtual],20

            sub     rsp,8                          ;padding: 5 pushes + 4 spill qwords is an odd count, keep rsp at 16*x
            push    [memstatex.ullAvailVirtual]
            push    [memstatex.ullTotalVirtual]
            push    [memstatex.ullAvailPageFile]
            push    [memstatex.ullTotalPageFile]
            push    [memstatex.ullAvailPhys]
            sub     rsp,8*4                        ;spillover space because of pushes
            mov     rcx,ResultBuff
            mov     rdx,ResultFormat2
            mov     r8d,[memstatex.dwMemoryLoad]
            mov     r9,[memstatex.ullTotalPhys]
            call    [wsprintf]
            add     rsp,8*10                       ;some cleaning up for wsprintf (9 qwords + padding)

            xor     rcx,rcx
            mov     rdx,ResultBuff
            mov     r8,Capt2
            mov     r9,rcx
            call    [MessageBox]

            xor     rcx,rcx
            call    [ExitProcess]

;=======================================
section '.data' data readable writeable
;---------------------------------------
  memstatex         MEMORYSTATUSEX
  LenMEMORYSTATUSEX = $-memstatex

  Capt2             db 'Memory Status 64-bit',0
  ResultFormat2     db 'Memory Load   ',9,'%u%%',10,10
                    db 'Total RAM   ',9,'%u MiB',10
                    db 'Avail RAM   ',9,'%u MiB',10,10
                    db 'Total Page  ',9,'%u MiB',10
                    db 'Avail Page  ',9,'%u MiB',10,10
                    db 'Total Virtual ',9,'%u MiB',10
                    db 'Avail Virtual ',9,'%u MiB',0
  ResultBuff        rb 200

;=======================================
section '.idata' import data readable
;---------------------------------------
  library kernel,'KERNEL32.DLL',\
          user,'USER32.DLL'

  import kernel,\
         GlobalMemoryStatusEx,'GlobalMemoryStatusEx',\
         ExitProcess,'ExitProcess'

  import user,\
         wsprintf,'wsprintfA',\
         MessageBox,'MessageBoxA'     


Tomasz Grysztar wrote:
You have to use "frame" macro to reduce stack allocations to a single one. And please also check the static RSP prologue/epilogue macros.
Yes, I like the look of the frame macro and will read up on it and give it a try, thanks. I probably should update from 1.68 too. I know these double stack ops aren't really going to make much difference to the program, but when I see them, for some strange reason it irritates me.
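
As a rough sketch of what the frame approach could look like, the example below wraps two calls in frame/endf so that the stack space for their parameters is allocated once. It assumes a fasm package whose win64a.inc provides the invoke, frame and endf macros (this may not hold for older releases). The demo_ labels and the format string are made up for illustration.

```asm
format PE64 GUI 5.0
entry start

include 'win64a.inc'

section '.text' code readable executable
  start:
            sub     rsp,8                   ; entry rsp = 16*x + 8, so this makes it 16*y

            frame                           ; one allocation, sized for the largest call inside
              invoke  wsprintf,demo_buf,demo_fmt,1,2,3,4,5
              invoke  MessageBox,0,demo_buf,demo_capt,0
            endf                            ; releases that allocation

            invoke  ExitProcess,0

section '.data' data readable writeable
  demo_capt         db 'frame / endf',0
  demo_fmt          db 'Arguments: %d %d %d %d %d',0
  demo_buf          rb 64

section '.idata' import data readable
  library kernel,'KERNEL32.DLL',\
          user,'USER32.DLL'

  import kernel,\
         ExitProcess,'ExitProcess'

  import user,\
         wsprintf,'wsprintfA',\
         MessageBox,'MessageBoxA'
```

Inside the frame, the arguments beyond the fourth are stored with mov [rsp+offset] rather than pushed, which is the trade-off revolution described above.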
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20513
Location: In your JS exploiting you and your system
revolution 26 Nov 2010, 17:34
Alphonso wrote:
So is it not possible to have the return address on the odd part of 16 bytes? Or perhaps less efficient. idk
It is possible but it doesn't help. You would then need to ensure that the stack is at 16*x+8 before doing the call (rather than 16*x+0). Either way you still have the mod 16 problem to consider.
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 26 Nov 2010, 20:54
revolution,

Perhaps if MS had set EFLAGS.AC earlier (it was available on Pentiums, at least), that would have made them learn it the hard way.
Alphonso



Joined: 16 Jan 2007
Posts: 295
Alphonso 27 Nov 2010, 07:02
Just thought since the OS sets the stack up for us before the call it would have been simple to set the stack pointer but I guess if that were the case then perhaps it would have been done like that.

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
