flat assembler
Message board for the users of flat assembler.

x64 calling convention oddity

Chewy509



Joined: 19 Jun 2003
Posts: 297
Location: Bris-vegas, Australia
Chewy509
Here's one to add to the x64 calling convention...

I spent about 2 hrs last night trying to figure out why repeated calls to certain API functions would cause an exception (around the 3rd or 4th call to the same function with the exact same args).

Triple-checked the calling convention and couldn't find anything wrong. Why would an API call fail on the fourth or fifth time if you were just passing the same parameters to the API call?

So, just out of frustration, I ensured that the slack space ([rsp] -> [rsp+1Fh]) was zero. Once I did that, the calls started working perfectly, irrespective of the number of times I called the API function...

Moral of the story: if a call is failing and you are 100% certain that you are calling the function correctly (or have called it multiple times in the past, and it fails after the 4th or 5th time), try setting the slack space to zero.

eg.
Code:
sub rsp, 20h; slack space (adjust if more than 4 args).
xor rax, rax ;; I use rax, since the call trashes it
mov [rsp], rax
mov [rsp+08h], rax
mov [rsp+10h], rax
mov [rsp+18h], rax
call [API_Function]
add rsp, 20h    


None of the MSDN documentation I've read mentions this as a requirement (if it is one, can someone forward a link?)... so I have no idea why it's needed sometimes and not others...

My guess is that the called API function is using the slack space for its own purposes and assumes that the space will contain zeros. But I could be totally wrong...

PS. ReadFile() is one of the affected API calls.
Post 18 May 2007, 00:09
r22



Joined: 27 Dec 2004
Posts: 805
r22
The x64 fastcall convention (for Windows; I haven't got into the Linux version to see if there's much of a difference) is a pain to use and optimize.

Requirements for stack alignment, reserved empty stack space, and stack-passed arguments beyond the first 4: it's just not comfortable to use.

There's no problem using the invoke macro for win64 but if your goal is to create optimal code (the invoke for win64 fast call isn't) it's a real hassle.
MOV rax,[RANT]
RET 0

Re: Chewy, if it doesn't happen in the SECOND call, why would the stack get corrupted on the THIRD or FOURTH call? Which APIs did you get this abnormal behavior with?
Post 18 May 2007, 01:41
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Moreover, are you sure that by zeroing [RSP+$18] you fix the problem? Perhaps RAX = 0 does the magic. I'm pretty sure that some part of RAX is used in the Linux ABI to indicate the number of SSE registers used or something like that (sorry, I don't remember it clearly), and perhaps the random values in RAX are causing this mess.

[edit]
System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version 0.98 wrote:
For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing ellipsis (...) in the declaration) %al is used as hidden argument to specify the number of SSE registers used. The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of SSE registers used and is in the range 0–8 inclusive.
[/edit]
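
For comparison, here is a minimal sketch of the System V rule quoted above. It assumes a Linux target, a libc `printf` declared elsewhere with `extrn`, and a hypothetical label `fmt` holding a zero-terminated format string such as 'value = %d'. It is a fragment rather than a complete program, and it assumes the stack is already 16-byte aligned at the call.

```asm
; System V AMD64 (Linux) fragment, shown only for contrast with Win64.
; fmt and the extrn'd printf are assumptions, not part of the thread's code.
        lea     rdi, [fmt]              ; 1st integer argument: format string
        mov     esi, 42                 ; 2nd integer argument: value for %d
        xor     eax, eax                ; al = 0: no SSE registers carry arguments
        call    printf                  ; variadic callee reads al
```

The Microsoft x64 convention defines no such hidden argument, so the value left in RAX before a Win64 API call should not matter to the callee.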
Post 18 May 2007, 02:17
Xorpd!



Joined: 21 Dec 2006
Posts: 161
Xorpd!
The moral of many other stories has been: "If you have a problem and don't understand what is going wrong, it does no good to report your perceptions of what the problem is, because if your perceptions were correct, you would understand the problem and would be able to solve it yourself."
If you want help with the problem, provide a minimal but complete source that allows others to reproduce your problem so that we can try to determine what your error actually was. Description of the problem and a workaround are clearly insufficient here.
Post 18 May 2007, 02:52
Chewy509
LocoDelAssembly wrote:
Moreover, are you sure that by zeroing [RSP+$18] you fix the problem?

When that's the only difference between a working example and a non-working example... what else could it be?

PS. Just assemble both and execute. The example displays argc/argv, then reads a file called 'README' and displays its contents.

PPS. The abnormal behavior was noticed with the ReadFile() call.


Attachments:
win64_working.asm (27.9 KB) - Fixed working version
win64_broken.asm (27.63 KB) - Broken exe

Post 18 May 2007, 05:48
Xorpd!
The attachments are totally garbled at my end. Am I alone in experiencing difficulty in reading files attached to this forum? The attachments are invisible until I log on to the forum, and when I click on the download link, I get a garbled file. If I right-click and select "save file as", for some reason the file doesn't get saved.
If you want me to look at it, I don't know what your alternatives are. You could attempt to upload it again, or you could PM me, but that would be my first attempt to read a PM! You could derive my email address from first principles; I can't recall if it's in my profile...
Post 18 May 2007, 06:23
LocoDelAssembly
Code:
;Register renaming

r0 equ rax
r0d equ eax
r0w equ ax
r0b equ al
r1 equ rbx
r1d equ ebx
r1w equ bx
r1b equ bl
r2 equ rcx
r2d equ ecx
r2w equ cx
r2b equ cl
r3 equ rdx
r3d equ edx
r3w equ dx
r3b equ dl
r4 equ rdi
r4d equ edi
r4w equ di
r4b equ dil
r5 equ rsi
r5d equ esi
r5w equ si
r5b equ sil
r6 equ rbp
r6d equ ebp
r6w equ bp
r6b equ bpl
r7 equ rsp
r7d equ esp
r7w equ sp
r7b equ spl

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;; NON-WORKING CODE ;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

_B0__fgetc:
        push r1
        push r2
        push r3
        push r4
        push r5
        push r8
        push r9
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15
        mov r2, qword [r6+_B0__fgetc_handle]
        lea r3, [r6+_B0__fgetc_buffer]
        mov r8, 1
        lea r9, [r6+_B0__fgetc_buffer2]
        mov r0, 0
        push r6
        mov r6, r7
        and r7, -16
        sub r7, 30h
        call [ReadFile] 

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;; WORKING CODE ;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

_B0__fgetc:
        push r1
        push r2
        push r3
        push r4
        push r5
        push r8
        push r9
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15
        mov r2, qword [r6+_B0__fgetc_handle]
        lea r3, [r6+_B0__fgetc_buffer]
        mov r8, 1
        lea r9, [r6+_B0__fgetc_buffer2]
        mov r0, 0
        push r6
        mov r6, r7
        and r7, -16
        sub r7, 30h
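        mov [r7], r0            ; shadow space for the 4 register arguments
        mov [r7+08h], r0
        mov [r7+10h], r0
        mov [r7+18h], r0
        mov [r7+20h], r0        ; 5th argument: lpOverlapped = NULL
        mov [r7+28h], r0        ; padding, not an argument of ReadFile
        call [ReadFile]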


In the non-working code nothing is ever stored at [r7+20h], the stack slot of the fifth argument (lpOverlapped), so you are calling ReadFile as
Code:
invoke ReadFile, [r6+_B0__fgetc_handle], addr r6+_B0__fgetc_buffer, 1, addr r6+_B0__fgetc_buffer2, GARBAGE!!!


and in the working code
Code:
invoke ReadFile, [r6+_B0__fgetc_handle], addr r6+_B0__fgetc_buffer, 1, addr r6+_B0__fgetc_buffer2, NULL
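
Written out by hand, a five-argument call like this could look roughly as follows. This is only a sketch under assumptions: `read_one_byte`, `hFile`, `buffer` and `bytesRead` are hypothetical names, the handle is assumed to be open for synchronous I/O, and the routine is assumed to be entered through a normal `call` so that rsp is 8 modulo 16 on entry.

```asm
; Sketch only: read one byte from an already-open file handle (Win64, fasm).
; hFile (dq), buffer (db) and bytesRead (dd) are hypothetical data labels.
read_one_byte:
        sub     rsp, 28h                ; 20h shadow space + 8h for the 5th argument;
                                        ; this also leaves rsp 16-byte aligned
        mov     rcx, [hFile]            ; 1st arg: hFile
        lea     rdx, [buffer]           ; 2nd arg: lpBuffer
        mov     r8d, 1                  ; 3rd arg: nNumberOfBytesToRead
        lea     r9, [bytesRead]         ; 4th arg: lpNumberOfBytesRead
        mov     qword [rsp+20h], 0      ; 5th arg: lpOverlapped = NULL
        call    [ReadFile]              ; eax is nonzero on success
        add     rsp, 28h
        ret
```

Only the fifth argument's slot at [rsp+20h] has to be initialized. The 20h bytes of shadow space below it belong to the callee and may hold anything before the call.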

    
Post 18 May 2007, 16:52
Chewy509
LocoDelAssembly, try making the last parameter null in the non-working code, e.g. add "mov [rsp+20h], r0" just before the call, and see what happens... ;)

(I must have snipped too much between the two uploads, as I had to recreate the non-working one from the working one). :(
Post 19 May 2007, 05:38
Chewy509
Chewy509 wrote:
LocoDelAssembly, try making the last parameter null in the non-working code, e.g. add "mov [rsp+20h], r0" just before the call, and see what happens... ;)

(I must have snipped too much between the two uploads, as I had to recreate the non-working one from the working one). :(

Thinking about this last night: that last parameter is a pointer to the structure for overlapped I/O. If the file was opened for synchronous I/O only, why would the contents of that parameter matter anyway? And if it did, why wouldn't the call fail the first time, instead of only on the 3rd or 4th call?

But since the parameter was set correctly during testing (and accidentally snipped before upload), I guess that point is null and void? Still, it's something to think about.

A lot doesn't make sense.

PS. Has anyone been able to confirm this issue on their own PC?
Post 19 May 2007, 23:41
LocoDelAssembly
Quote:
If hFile is not opened with FILE_FLAG_OVERLAPPED and lpOverlapped is not NULL, the read operation starts at the offset specified in the OVERLAPPED structure.


If you wonder why it doesn't crash the first N times, check the fifth-argument slot ([RSP+20h] just before the call) each time to see where the garbage points.
Post 20 May 2007, 00:05
Xorpd!
So I finally got around to running your code. Don't know why I should have done so because nobody has ever run my codes -- and they don't even crash!

I made one change to both versions because it seemed to make sense to me:
Code:
; db ((label2-label)/2)-3
;    db ((label2-label)/2)-3
     dw label2-label-3
    

Results:
Code:
test argc, argv application
Argc = 1
Argv = 4198935
Argv[0] = win64_broken

Open File test
File Handle = 0
Reading File Contents:
    

Code:
test argc, argv application
Argc = 1
Argv = 4198935
Argv[0] = win64_working

Open File test
File Handle = 0
Reading File Contents:
    
Post 21 May 2007, 23:57
Chewy509
Xorpd! wrote:

I made one change to both versions because it seemed to make sense to me:
Code:
; db ((label2-label)/2)-3
;    db ((label2-label)/2)-3
     dw label2-label-3
    



if label2 = 16 and label = 2,
then:
((label2-label)/2)-3 = 4
label2-label-3 = 11
Post 22 May 2007, 03:22
Chewy509
After digging through my CVS repository (we all use those, don't we?), I think I found that the original code was incorrect and wouldn't work irrespective of whether the slack space was zero or not. Either that, or the first call to ReadFile() would trash the stack (due to poor parameter passing on my part), but not enough to cause an exception, while the 3rd or 4th call would trash it enough that an exception would be raised.

Anyway, I've got some working code (and my compiler now runs on a reported 6 OSs: 3 confirmed by me, and 3 as reported by others)! :D

So I guess you can ignore this thread...
Post 22 May 2007, 03:28
Xorpd!
Your length may be correct for UTF16_STRING, but the correction I made was for UTF8_STRING. If you look at the data structures in your executable, the lengths of the UTF8_STRINGs come out negative sometimes. Perhaps that is your intent, but it seems weird to a reader of your uncommented code.
I expected your original code to have caused the fault. BTW, I tried Loco's correction and your code then worked.
Post 22 May 2007, 03:43
Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.