flat assembler
Message board for the users of flat assembler.

bug in calling convention

Goto page 1, 2  Next
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
I am making a compiler, and its only communication with the OS is through dynamic libraries, so I have routines that call those functions. I am doing something wrong here: when the call returns, the stack slot where the return address should be contains 0, so when the program returns it crashes.

I can see this in x64dbg.

Code:
PUSH RSP
PUSH qword [RSP]
ADD RSP,8
AND SPL,0F0h ; adjust to 16-byte alignment

sub RSP,$20 ; shadow space on stack

mov rcx,[rbp] ; load parameters
mov [rsp],rcx
mov r9,[rbp-1*8]
mov r8,[rbp-2*8]
mov rdx,[rbp-3*8]
mov rcx,[rbp-4*8]

call rax  ; call Windows

add RSP,$20 ; release shadow space
sub rbp,8*5 ; drop parameters (not related to call)
POP RSP ; undo alignment

RET ; ********* crashes here at RET **********
    


This is the code; rax holds the address of WriteFile. It works (it writes to the console), but then it crashes.

Any ideas?
Post 11 Sep 2021, 15:20
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18469
Location: In your JS exploiting you and your system
Before you can alter RSP manually with "AND SPL,..." you need to save the original RSP somewhere.

Usually RBP is used to keep the original RSP value, so that you can recover it later.
Code:
mov rbp, rsp ; save original rsp
and spl, ...
; ...
mov rsp, rbp ; recover original rsp
ret    
There may be other problems in your code also, this is just one thing to be aware of.
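A small C model of the arithmetic behind this advice, purely as an illustration (the helper names are made up for this sketch): AND SPL,0F0h masks only the low byte of RSP, which has the same effect as AND RSP,-16, and since RSP is a multiple of 8 in normal Win64 code it lowers RSP by either 0 or 8 depending on its runtime value. That unknown amount is why the original RSP has to be kept somewhere (RBP in the snippet above) before aligning.

```c
#include <assert.h>
#include <stdint.h>

/* AND SPL,0F0h: only the low byte (SPL) is masked; bits 8..63 are untouched. */
static uint64_t and_spl_f0(uint64_t rsp) {
    return (rsp & ~(uint64_t)0xFF) | (rsp & 0xF0u);
}

/* AND RSP,-16: clear the low four bits of the whole register. */
static uint64_t and_rsp_minus16(uint64_t rsp) {
    return rsp & ~(uint64_t)0xF;
}

/* How far the alignment moved RSP down; this is what must be undone later. */
static uint64_t alignment_delta(uint64_t rsp) {
    assert(rsp % 8 == 0);              /* assumption: RSP is qword-aligned */
    return rsp - and_spl_f0(rsp);      /* 0 or 8 */
}
```

For example, alignment_delta(0x7FF8) is 8 while alignment_delta(0x7FF0) is 0, so an epilogue cannot simply add a constant back.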
Post 11 Sep 2021, 15:32
macomics



Joined: 26 Jan 2021
Posts: 480
Location: Russia
And doesn't it matter that WriteFile takes 5 parameters, not 4? In your case the 5th parameter, lpOverlapped, ends up being the saved RSP value. Console operations complete immediately and do not use deferred I/O, but with files it's just the opposite.
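To make the parameter placement concrete, here is a hedged C sketch of the Win64 rule (the function names are illustrative, not from any library): the first four integer arguments go in RCX, RDX, R8 and R9, the caller always reserves 32 bytes of shadow space at [rsp], arguments 5 and up are stored starting at [rsp+0x20] just before the CALL, and RSP must be 16-byte aligned at the CALL. It assumes RSP is already aligned when the frame is reserved, as it is right after AND SPL,0F0h.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to reserve with SUB RSP before calling a Win64 function taking
   `nargs` integer/pointer arguments, assuming RSP is already 16-byte aligned. */
static size_t win64_outgoing_bytes(size_t nargs) {
    size_t stack_args = nargs > 4 ? nargs - 4 : 0;
    size_t bytes = 0x20 + 8 * stack_args;   /* 32-byte shadow space + stack args */
    return (bytes + 15) & ~(size_t)15;      /* round up: RSP aligned at the CALL */
}

/* Offset from RSP (just before the CALL) where argument `index` (1-based,
   index >= 5) must be stored. Arguments 1..4 travel in RCX, RDX, R8, R9;
   [rsp+0x00]..[rsp+0x18] is their shadow space, owned by the callee. */
static size_t win64_stack_arg_offset(size_t index) {
    assert(index >= 5);
    return 0x20 + 8 * (index - 5);
}
```

For WriteFile (5 arguments) this gives sub rsp,0x30 with lpOverlapped stored at [rsp+0x20]; [rsp] is the callee's shadow slot for RCX, so a value written there is not read as a parameter.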
Post 11 Sep 2021, 16:10
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
I use rbp for another stack, and the stack address at the return-address position is correct, but the value there becomes 0. It is as if Windows clears that stack slot.

With 2 parameters this works fine, but then all parameters are passed in registers.
Code:
; sys-SetConsoleMode
add rbp,8
mov [rbp],rax
mov rax,qword[w6]
; SYS2
PUSH RSP
PUSH qword [RSP]
ADD RSP,8
AND SPL,0F0h
sub RSP,$20
mov rdx,[rbp]
mov rcx,[rbp-1*8]
call rax
add RSP,$20
sub rbp,8*2
POP RSP
; DROP
mov rax,[rbp]
sub rbp,8
    
Post 11 Sep 2021, 16:11
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18469
Location: In your JS exploiting you and your system
You can't use "POP RSP" because the value of RSP has changed, so the value popped will come from a different address, and now your code will crash.

Remember that PUSH/POP use RSP for the address pointer. So changing RSP changes the stack pointer, and your code crashes.
Post 11 Sep 2021, 16:15
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
@macomics,
It is not a specific call; it is just the generic routine for calls with 5 parameters. In the interpreter, which works correctly, this is the code (gcc output):

Code:
loc_4032DE:             ; jumptable 0000000000402846 case 124
mov     rax, [r13+0]
mov     rdx, [r13-18h]
sub     r13, 28h ; '('
mov     rcx, [r13+8]
mov     r9, [r13+20h]
mov     r8, [r13+18h]
mov     [rsp+0D8h-0B8h], rax
call    r11
mov     r11, rax
jmp     loc_402859      ; jumptable 0000000000402846 case 130
    


As you can see, there is no need to adjust the stack for alignment or shadow space there, because it is one big loop without calls and the frame is set up by GCC. When I compile, I use call, which misaligns the stack, so I need to realign it when communicating with the OS.

@revolution
This is a stack-alignment macro I took from this forum (I think Locodelassembly is the author). I tested it in the previous compiler and it worked fine.
Post 11 Sep 2021, 16:30
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18469
Location: In your JS exploiting you and your system
Your instruction "AND SPL,0F0h" adjusts the stack pointer.
Post 11 Sep 2021, 16:37
macomics



Joined: 26 Jan 2021
Posts: 480
Location: Russia
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile wrote:
Considerations for working with synchronous file handles:

If lpOverlapped is NULL, the write operation starts at the current file position and WriteFile does not return until the operation is complete, and the system updates the file pointer before WriteFile returns.
If lpOverlapped is not NULL, the write operation starts at the offset that is specified in the OVERLAPPED structure and WriteFile does not return until the write operation is complete. The system updates the OVERLAPPED Internal and InternalHigh fields before WriteFile returns.
Code:
typedef struct _OVERLAPPED {
  ULONG_PTR Internal;  // WriteFile stores the status here: with lpOverlapped = saved RSP, this overwrites the return address
  ULONG_PTR InternalHigh;
  union {
    struct {
      DWORD Offset;
      DWORD OffsetHigh;
    } DUMMYSTRUCTNAME;
    PVOID Pointer;
  } DUMMYUNIONNAME;
  HANDLE    hEvent;
} OVERLAPPED, *LPOVERLAPPED;    
pabloreda wrote:

loc_4032DE: ; jumptable 0000000000402846 case 124
mov rax, [r13+0]
mov rdx, [r13-18h]
sub r13, 28h ; '('
mov rcx, [r13+8]
mov r9, [r13+20h]
mov r8, [r13+18h]
mov [rsp+0D8h-0B8h], rax ; mov [rsp + 32], rax <- lpOverlapped
call r11
mov r11, rax
jmp loc_402859 ; jumptable 0000000000402846 case 130
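The following C model replays the original routine on a simulated stack to show how this produces the 0 in the return-address slot. It is a simplification under stated assumptions: fake_writefile is a hypothetical stand-in that only reads the 5th argument from the caller's [rsp+0x20] and, if it is non-NULL, stores a status (assumed to be 0, STATUS_SUCCESS, matching the 0 seen in x64dbg) into OVERLAPPED.Internal; the real function does more. Because the routine reserves only $20 bytes, [rsp+0x20] is the saved RSP copy left by the alignment prologue, and that value points at the routine's own return address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack: qword slots covering addresses 0x1000..0x11FF. */
#define STACK_BASE   0x1000u
#define STACK_QWORDS 64u

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_QWORDS];
} Machine;

static uint64_t *slot(Machine *m, uint64_t addr) {
    assert(addr % 8 == 0);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 8u * STACK_QWORDS);
    return &m->mem[(addr - STACK_BASE) / 8];
}

/* PUSH: the value is computed before RSP is decremented, as with
   PUSH RSP and PUSH qword [RSP]. */
static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* Hypothetical stand-in for WriteFile's use of lpOverlapped. */
static void fake_writefile(Machine *m) {
    uint64_t lpOverlapped = *slot(m, m->rsp + 0x20);   /* 5th argument */
    if (lpOverlapped != 0)
        *slot(m, lpOverlapped) = 0;                    /* OVERLAPPED.Internal */
}

/* Replays the routine from the first post; returns the qword RET pops. */
static uint64_t run_original(uint64_t entry_rsp, uint64_t ret_addr,
                             uint64_t fifth_arg) {
    Machine m;
    memset(&m, 0, sizeof m);
    m.rsp = entry_rsp;
    *slot(&m, m.rsp) = ret_addr;       /* left there by the CALL to this routine */

    push(&m, m.rsp);                   /* PUSH RSP                          */
    push(&m, *slot(&m, m.rsp));        /* PUSH qword [RSP]                  */
    m.rsp += 8;                        /* ADD RSP,8                         */
    m.rsp &= ~(uint64_t)0xF;           /* AND SPL,0F0h                      */
    m.rsp -= 0x20;                     /* sub RSP,$20                       */
    *slot(&m, m.rsp) = fifth_arg;      /* mov [rsp],rcx: shadow space only  */
    fake_writefile(&m);                /* call rax                          */
    m.rsp += 0x20;                     /* add RSP,$20                       */
    m.rsp = *slot(&m, m.rsp);          /* POP RSP                           */
    return *slot(&m, m.rsp);           /* what RET would jump to            */
}
```

With either entry alignment (0x1100 or 0x1108) the model returns 0 instead of the return address.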
Post 11 Sep 2021, 16:43
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
OK, so the 5th parameter is placed wrongly, inside the shadow space! Thanks, macomics!
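One possible correction, sketched as a C model on a simulated stack (all names are hypothetical): reserve 0x30 bytes instead of $20 (32 bytes of shadow space plus one 8-byte argument, rounded up to keep 16-byte alignment) and store the 5th parameter at [rsp+0x20]. In the routine that would mean sub RSP,$30, mov [rsp+$20],rcx and add RSP,$30. The saved RSP copy then stays above the area the callee reads, and RET finds the original return address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack: qword slots covering addresses 0x1000..0x11FF. */
#define STACK_BASE   0x1000u
#define STACK_QWORDS 64u

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_QWORDS];
} Machine;

static uint64_t *slot(Machine *m, uint64_t addr) {
    assert(addr % 8 == 0);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 8u * STACK_QWORDS);
    return &m->mem[(addr - STACK_BASE) / 8];
}

static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* Hypothetical WriteFile stand-in: reads the 5th argument from the caller's
   [rsp+0x20], writes OVERLAPPED.Internal if it is non-NULL, returns what it read. */
static uint64_t fake_writefile(Machine *m) {
    uint64_t lpOverlapped = *slot(m, m->rsp + 0x20);
    if (lpOverlapped != 0)
        *slot(m, lpOverlapped) = 0;
    return lpOverlapped;
}

/* Corrected sequence; returns the qword RET pops. */
static uint64_t run_fixed(uint64_t entry_rsp, uint64_t ret_addr,
                          uint64_t fifth_arg, uint64_t *seen_fifth) {
    Machine m;
    memset(&m, 0, sizeof m);
    m.rsp = entry_rsp;
    *slot(&m, m.rsp) = ret_addr;          /* left there by the CALL            */

    push(&m, m.rsp);                      /* PUSH RSP                          */
    push(&m, *slot(&m, m.rsp));           /* PUSH qword [RSP]                  */
    m.rsp += 8;                           /* ADD RSP,8                         */
    m.rsp &= ~(uint64_t)0xF;              /* AND SPL,0F0h                      */
    m.rsp -= 0x30;                        /* sub RSP,$30: shadow + 1 arg       */
    *slot(&m, m.rsp + 0x20) = fifth_arg;  /* mov [rsp+$20],rcx                 */
    *seen_fifth = fake_writefile(&m);     /* call rax                          */
    m.rsp += 0x30;                        /* add RSP,$30                       */
    m.rsp = *slot(&m, m.rsp);             /* POP RSP                           */
    return *slot(&m, m.rsp);              /* what RET would jump to            */
}
```

Passing NULL as the 5th argument, the stand-in sees NULL and the return address survives for both entry alignments.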
Post 11 Sep 2021, 17:19
bitRAKE



Joined: 21 Jul 2003
Posts: 3387
Location: vpcmipstrm
It's a cute trick to restore RSP:
Code:
PUSH RSP
PUSH qword [RSP] ; another copy
ADD RSP,8 ; could be superfluous ;)
AND SPL,0F0h ; adjust to 16-byte alignment    
... the RSP value is put on the stack twice, so whatever the result of the alignment, RSP can be restored.
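To check the claim, here is a C model of those four instructions plus the matching POP RSP on a simulated stack (the names and the 0x1000-based stack are assumptions made for the sketch). It runs the sequence for both possible entry alignments: whichever slot AND SPL,0F0h leaves RSP pointing at, that slot holds a copy of the original RSP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack: qword slots covering addresses 0x1000..0x11FF. */
#define STACK_BASE   0x1000u
#define STACK_QWORDS 64u

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_QWORDS];
} Machine;

static uint64_t *slot(Machine *m, uint64_t addr) {
    assert(addr % 8 == 0);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 8u * STACK_QWORDS);
    return &m->mem[(addr - STACK_BASE) / 8];
}

/* The value is computed before RSP drops, as PUSH RSP / PUSH [RSP] do. */
static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* Runs the prologue and the POP RSP epilogue. Stores the aligned RSP
   in *aligned and returns RSP after POP RSP. */
static uint64_t align_then_restore(uint64_t entry_rsp, uint64_t *aligned) {
    Machine m;
    memset(&m, 0, sizeof m);
    m.rsp = entry_rsp;
    push(&m, m.rsp);                 /* PUSH RSP         */
    push(&m, *slot(&m, m.rsp));      /* PUSH qword [RSP] */
    m.rsp += 8;                      /* ADD RSP,8        */
    m.rsp &= ~(uint64_t)0xF;         /* AND SPL,0F0h     */
    *aligned = m.rsp;
    assert(m.rsp % 16 == 0);
    /* ...balanced code here: every later push is matched by a pop... */
    m.rsp = *slot(&m, m.rsp);        /* POP RSP          */
    return m.rsp;
}
```

Entering at 0x1100 aligns to 0x10F0 and entering at 0x1108 aligns to 0x1100; both come back to the entry value.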
Post 11 Sep 2021, 23:03
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
I got it from here: https://board.flatassembler.net/topic.php?t=11133

The code is doing crazy things now... I need to research more.
Post 12 Sep 2021, 01:27
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18469
Location: In your JS exploiting you and your system
bitRAKE wrote:
It's a cute trick to restore RSP: PUSH RSP
Not really "cute". It's actually useless and confusing. Push/pop rsp should never exist in code unless the intent is to confuse the reader, or appear to be "clever", or something not related to getting code to run. Normal code doesn't ever need it.

It's much simpler to just follow the standard and then everything works fine. The standard itself, the fastcall for x64, is a bit silly IMO, but it does work. There isn't any need to make it worse with storing the stack pointer on the stack. If you already know where the stored stack pointer is on the stack then that means you already know where the stack is pointing, so no need to "restore" it, you had it all along.
Post 12 Sep 2021, 01:53
bitRAKE



Joined: 21 Jul 2003
Posts: 3387
Location: vpcmipstrm
I use ENTER/LEAVE almost everywhere because I like how the VIRTUAL block documents the stack frame. I don't use INVOKE because it doesn't track needed stack depth (maximum parameter count). I've tried a lot of different automatic methods and settled on manually defining the frame.
Code:
MainWindow.InitCloseButton: ;HWND (stack)
        virtual at RBP-.FRAME
                        rq 4
                repeat 13-4,k:5
                .P.k dq ?
                end repeat

                .tbar   dq ?,?

                .FRAME := $-$$
                        rq 2 ; RBP,ret
                .hWnd   dq ?
        end virtual
        enter .FRAME,0    
(The .P.# above are parameters for the win64abi, used by a custom INVOKE-type macro.) Some might think it overly verbose, but I can come back months later and quickly know what is going on, as well as adapt to weird calling conventions. Here is another example:
Code:
SetWindowBlur: ; HWND (on stack)
        virtual at RBP-.FRAME
                        rq 4
                .hMod   dq ? ; HINSTANCE
                .FRAME := $-$$
                        dq ? ; RBP
                        dq ? ; ret
                .hWnd   dq ?
        end virtual
        enter .FRAME,0
        LoadLibraryA <ANSI 'user32'>
        xchg rcx,rax
        jrcxz .fail
        mov [.hMod],rcx
        ; Might be broken in the future due to this undocumented API
        ; https://www.google.com/search?q=SetWindowCompositionAttribute
        GetProcAddress rcx,<ANSI 'SetWindowCompositionAttribute'>
        test rax,rax
        jz @F
        call rax,[.hWnd],ADDR .data ; WINCOMPATTRDATA
@@:
        FreeLibrary [.hMod]
.fail:
        leave
        retn 8    
It's called by passing parameters on the stack and then uses win64abi internally. The alignment is implied, but we can see there is an even number of QWORDs in the VIRTUAL block.

This is just the model that works well for me.

Edit: I use the stack for dynamic storage frequently, and being able to clean up with LEAVE is convenient as well. For more complex frames I add ASSERTs to ensure I don't inadvertently mess up an alignment, but it's all well documented in the frame. Sections of a large frame can be redefined/repurposed, eliminating the need for changes in RSP/RBP.

How would that look without ENTER/LEAVE - almost the same in this case:
Code:
SetWindowBlur: ; HWND (on stack)
        virtual at RSP
                        rq 4
                .hMod   dq ? ; HINSTANCE
                        dq ?
                .FRAME := $-$$
                        dq ? ; ret
                .hWnd   dq ?
        end virtual
        sub rsp,.FRAME
        LoadLibraryA <ANSI 'user32'>
        xchg rcx,rax
        jrcxz .fail
        mov [.hMod],rcx
        GetProcAddress rcx,<ANSI 'SetWindowCompositionAttribute'>
        test rax,rax
        jz @F
        call rax,[.hWnd],ADDR .data ; WINCOMPATTRDATA
@@:
        FreeLibrary [.hMod]
.fail:
        add rsp,.FRAME
        retn 8    
(Note the different base ($$) for VIRTUAL.) My ABI macro uses the parameter space defined locally, whereas doing it automatically would make the frame <-> frameless transition more difficult. If I have the need, I want that option with minimal work.
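The rule of thumb above (an even number of QWORDs in the VIRTUAL block means an aligned frame) can be checked with a little arithmetic. The C sketch below is an illustration, not part of bitRAKE's macros, and it assumes the caller's RSP was 16-byte aligned before it pushed the stack arguments (the post only says the alignment is implied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does RSP end up 16-byte aligned after the prologue?
   frame_bytes: .FRAME reserved by ENTER .FRAME,0 or SUB RSP,.FRAME
   saves_rbp:   true for ENTER (pushes RBP), false for plain SUB RSP */
static bool prologue_leaves_aligned(uint64_t frame_bytes, unsigned stack_params,
                                    bool saves_rbp) {
    uint64_t rsp = 0x10000;                 /* any 16-byte aligned value        */
    rsp -= 8u * stack_params;               /* caller pushes the arguments      */
    rsp -= 8;                               /* CALL pushes the return address   */
    if (saves_rbp)
        rsp -= 8;                           /* ENTER pushes RBP                 */
    rsp -= frame_bytes;                     /* ENTER .FRAME,0 or SUB RSP,.FRAME */
    return rsp % 16 == 0;
}

/* Same answer by counting QWORDs in the VIRTUAL block:
   frame + saved RBP (if any) + return address + stack parameters. */
static bool virtual_block_is_even(uint64_t frame_bytes, unsigned stack_params,
                                  bool saves_rbp) {
    assert(frame_bytes % 8 == 0);
    uint64_t qwords = frame_bytes / 8 + (saves_rbp ? 1 : 0) + 1 + stack_params;
    return qwords % 2 == 0;
}
```

Here .FRAME is 40 bytes for the ENTER version of SetWindowBlur (rq 4 plus .hMod), 48 for the SUB RSP version, and 120 for MainWindow.InitCloseButton (rq 4, nine .P.k slots from repeat 13-4,k:5, and two .tbar), each with one stack parameter; all three come out aligned, while a 40-byte frame without the saved RBP would not.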

_________________
¯\(°_o)/¯ unlicense.org


Last edited by bitRAKE on 19 Sep 2021, 15:52; edited 2 times in total
Post 12 Sep 2021, 02:01
macomics



Joined: 26 Jan 2021
Posts: 480
Location: Russia
revolution wrote:
It's much simpler to just follow the standard and then everything works fine. The standard itself, the fastcall for x64, is a bit silly IMO, but it does work. There isn't any need to make it worse with storing the stack pointer on the stack. If you already know where the stored stack pointer is on the stack then that means you already know where the stack is pointing, so no need to "restore" it, you had it all along.
But the point is to restore the stack pointer to undo the alignment operation, not because it is otherwise necessary. The pop rsp at the exit releases 8 or 16 bytes from the stack, depending on the alignment, and does nothing else (it is just a short undo instruction). Alternatively, you could try this single instruction:
Code:
mov rsp, [rsp+32] ;  instead of add rsp, $20 / pop rsp     
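A C model (hypothetical names, simulated stack) can check that the single instruction is equivalent here: after the alignment prologue and sub rsp,$20, [rsp+32] is exactly the slot holding a copy of the original RSP, so mov rsp,[rsp+32] and add rsp,$20 / pop rsp land on the same value for both entry alignments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated stack: qword slots covering addresses 0x1000..0x11FF. */
#define STACK_BASE   0x1000u
#define STACK_QWORDS 64u

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_QWORDS];
} Machine;

static uint64_t *slot(Machine *m, uint64_t addr) {
    assert(addr % 8 == 0);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 8u * STACK_QWORDS);
    return &m->mem[(addr - STACK_BASE) / 8];
}

static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* Alignment prologue from the thread followed by sub rsp,$20. */
static void prologue(Machine *m, uint64_t entry_rsp) {
    memset(m, 0, sizeof *m);
    m->rsp = entry_rsp;
    push(m, m->rsp);               /* PUSH RSP         */
    push(m, *slot(m, m->rsp));     /* PUSH qword [RSP] */
    m->rsp += 8;                   /* ADD RSP,8        */
    m->rsp &= ~(uint64_t)0xF;      /* AND SPL,0F0h     */
    m->rsp -= 0x20;                /* sub rsp,$20      */
}

static uint64_t epilogue_add_pop(uint64_t entry_rsp) {
    Machine m;
    prologue(&m, entry_rsp);
    m.rsp += 0x20;                 /* add rsp,$20 */
    m.rsp = *slot(&m, m.rsp);      /* pop rsp     */
    return m.rsp;
}

static uint64_t epilogue_mov(uint64_t entry_rsp) {
    Machine m;
    prologue(&m, entry_rsp);
    m.rsp = *slot(&m, m.rsp + 32); /* mov rsp,[rsp+32] */
    return m.rsp;
}
```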
Post 12 Sep 2021, 04:27
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 18469
Location: In your JS exploiting you and your system
You can't use "pop rsp" when the rsp value used to push was a different value than to pop. You get a corrupted stack.
Code:
push rsp        ; rsp before = 0x1000, value after = 0x0ff8
and spl, 0xf0   ; rsp before = 0x0ff8, value after = 0x0ff0
; some code
push something  ; rsp before = 0x0ff0, value after = 0x0fe8
pop something   ; rsp before = 0x0fe8, value after = 0x0ff0
; some more code
pop rsp         ; rsp before = 0x0ff0, value after = unknown!!    
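This trace uses a single PUSH RSP. A small C model of that single-push variant (hypothetical names, simulated stack pre-filled with a junk pattern) shows exactly when it fails: if AND SPL,0F0h does not move RSP, POP RSP reads back the pushed value, but if it moves RSP down by 8, POP RSP loads whatever was in the slot below.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated stack: qword slots covering addresses 0x1000..0x11FF. */
#define STACK_BASE   0x1000u
#define STACK_QWORDS 64u
#define JUNK         0xBAD0BAD0BAD0BAD0ull   /* stands in for old stack data */

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_QWORDS];
} Machine;

static uint64_t *slot(Machine *m, uint64_t addr) {
    assert(addr % 8 == 0);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 8u * STACK_QWORDS);
    return &m->mem[(addr - STACK_BASE) / 8];
}

static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;
    *slot(m, m->rsp) = value;
}

/* push rsp / and spl,0xf0 / (balanced code) / pop rsp; returns the final RSP. */
static uint64_t single_push_restore(uint64_t entry_rsp) {
    Machine m;
    for (unsigned i = 0; i < STACK_QWORDS; i++)
        m.mem[i] = JUNK;
    m.rsp = entry_rsp;
    push(&m, m.rsp);               /* push rsp      */
    m.rsp &= ~(uint64_t)0xF;       /* and spl, 0xf0 */
    /* push something / pop something: no net effect */
    m.rsp = *slot(&m, m.rsp);      /* pop rsp       */
    return m.rsp;
}
```

Entering at 0x1108 restores correctly; entering at 0x1100 yields the junk value.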
Post 12 Sep 2021, 07:26
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
Thank you all, guys.

The only communication with the OS is through LOADLIB and GETPROC; all calls then use the addresses obtained at runtime.

I don't keep parameters or local variables on the RSP stack, so calls inside the program only push return addresses; those two things go on the other stack (it's a Forth).

I need something like this:
Code:
invoke rax,qword[rbp],qword[rbp-1*8],qword[rbp-2*8],qword[rbp-3*8],qword[rbp-4*8]
    

Any recommendation for doing this?

I'm working on https://github.com/phreda4/r3/tree/main/asm
code.asm is the source code
r3fasm.exe is the executable
Post 12 Sep 2021, 12:29
macomics



Joined: 26 Jan 2021
Posts: 480
Location: Russia
revolution wrote:
You can't use "pop rsp" when the rsp value used to push was a different value than to pop. You get a corrupted stack.
Code:
push rsp        ; rsp before = 0x1000, value after = 0x0ff8
and spl, 0xf0   ; rsp before = 0x0ff8, value after = 0x0ff0
; some code
push something  ; rsp before = 0x0ff0, value after = 0x0fe8
pop something   ; rsp before = 0x0fe8, value after = 0x0ff0
; some more code
pop rsp         ; rsp before = 0x0ff0, value after = unknown!!    


Code:
push rsp        ; rsp before = 0x1000, value after = 0x0ff8, [rsp] = 0x1000
push qword [rsp]     ; rsp before = 0x0ff8, value after = 0x0ff0, [rsp] = 0x1000, [rsp+8] too
add rsp, 8                ; rsp before = 0x0ff0, value after = 0x0ff8, [rsp-8] = 0x1000, [rsp] = 0x1000
and spl, 0xf0   ; rsp before = 0x0ff8, value after = 0x0ff0, [rsp] = 0x1000
; some code
push something  ; rsp before = 0x0ff0, value after = 0x0fe8
pop something   ; rsp before = 0x0fe8, value after = 0x0ff0
; some more code
pop rsp         ; rsp before = 0x0ff0, value after = 0x1000    
Post 12 Sep 2021, 12:49
bitRAKE



Joined: 21 Jul 2003
Posts: 3387
Location: vpcmipstrm
pabloreda wrote:
I need somthing like this
Code:
invoke rax,qword[rbp],qword[rbp-1*8],qword[rbp-2*8],qword[rbp-3*8],qword[rbp-4*8]    

any recomentation for doing this?
Unfortunately, the parameters are in reverse order; otherwise the ENTER instruction could do the parameter copy.

Weird: github is interpreting the ASM file as a single line on my end. Actually, looking at the code - your quote is incorrect - the parameters are in the correct order on the stack.
Code:
invoke rax,qword[rbp-4*8],qword[rbp-3*8],qword[rbp-2*8],qword[rbp-1*8],qword[rbp]    
... is what you want. There doesn't appear to be anything with a large enough number of parameters to make the ENTER trickery worth it.
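The ordering can be modelled in C (a sketch with hypothetical names): the Forth data stack grows upward, RBP points at its top item, and the first argument was pushed first, so for an n-argument call argument i lives at [rbp-(n-i)*8]. Reading them in that order gives RCX, RDX, R8, R9 and then the stack slots from [rsp+0x20] upward, which matches the corrected invoke line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: `ds` is the Forth data stack, growing upward, and
   ds[top] is the item RBP points at. Argument i (1-based) of an n-argument
   call is at [rbp-(n-i)*8]. */
static uint64_t forth_arg(const uint64_t *ds, size_t top, size_t n, size_t i) {
    assert(n >= 1 && n <= top + 1);
    assert(i >= 1 && i <= n);
    return ds[top - (n - i)];
}

/* Fill `out` in Win64 order: out[0..3] go to RCX, RDX, R8, R9 and
   out[4..n-1] to [rsp+0x20], [rsp+0x28], ... just before the CALL. */
static void forth_to_win64(const uint64_t *ds, size_t top, size_t n,
                           uint64_t *out) {
    for (size_t i = 1; i <= n; i++)
        out[i - 1] = forth_arg(ds, top, n, i);
}
```

For WriteFile, out[0] is [rbp-4*8] (hFile, in RCX) and out[4] is [rbp] (lpOverlapped, at [rsp+0x20]).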

The ideal solution is not to copy parameters around.
Code:
mov     rcx,[rbp+8*0]
mov     rdx,[rbp+8*1]
mov     r8,[rbp+8*2]
mov     r9,[rbp+8*3]
xchg    rsp,rbp
call    rax
xchg    rsp,rbp
add     rbp,params*8    
... but this would be an architectural shift. RBP currently is moving in the wrong direction, and params would always need to be padded to even count. If params are always <6 then the benefit is null. So, you have the best solution presently. (It's times like this that I wish MS stuck with the regular stack calling convention for 64bit - like it is in 32bit.)

Post 12 Sep 2021, 22:37
pabloreda



Joined: 24 Jan 2007
Posts: 114
Location: Argentina
rbp is not on the same stack. I have 2 stacks: the normal one, with RSP, for call return addresses, and another one elsewhere with rbp as its pointer. This is the key part of any Forth language.
Post 12 Sep 2021, 23:13
bitRAKE



Joined: 21 Jul 2003
Posts: 3387
Location: vpcmipstrm
Of course, I understand that. Just because it's separate doesn't mean it can't be swapped out for external calls. This is why it should be a downward moving stack - especially for 32-bit - which would only need:
Code:
        xchg esp,ebp
        call eax
        xchg esp,ebp    
... and the C-calling convention would correct the EBP value - which would be put back into EBP before executing any more Forth.

Post 13 Sep 2021, 01:28


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.