flat assembler
Message board for the users of flat assembler.

Stack problem with proc64

Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 22 Aug 2009, 15:26
Azu wrote:
Sorry, but passing args via registers has nothing to do with the stack needing aligned. And the reserved space I mentioned IS the spill..
I agree, but I also agree with Bogdan Ontanu. I mean, I like to pass by registers if it ends up that way too (custom convention) but seriously fastcall64 is retarded.

EDIT: I meant that I like to pass parameters in registers if I don't need to push them later (to 'save' them), unless I want to optimize for size (in which case I would probably use 'pushad', since it's 1 byte even if it's slower... but this is only in 32-bit).

Otherwise (i.e. if I don't use them immediately and have to 'save' them for later) pushing on the stack is usually smaller. Both have advantages, and combining them is useful. However, the worst thing to do is to reserve the stack for that and then pass by registers. It is the most retarded thing I've seen.

Azu wrote:
The L1 cache is as fast as registers? Why do we even have registers, then? >_>
Well, first of all, registers are encoded specially in instructions (smaller), so they could be faster because of THAT, but not because of the access itself (of course, pushing under 32-bit is even smaller than "mov"!). Secondly, for the same reason (encoding), an ordinary instruction such as "mov" can't take two memory operands. The registers are special because their encodings are special.

And of course they are ALWAYS in a special "L1 cache". The stack almost always is too; the rare exception is when, for example, you run a really intensive memory operation that doesn't use the stack at all (i.e. no local variables).

_________________
Previously known as The_Grey_Beast
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 23 Aug 2009, 00:10
If there weren't registers, the encodings they use now could be used for stack positions instead.

E.g.

rax could be qword[rsp]
eax could be dword[rsp]
ax could be word[rsp]
al could be byte[rsp]
rcx could be qword[rsp+8]

etc.

rsp, rip, EFLAGS and RFLAGS might have to be exceptions though.

P.S. It would always be in L1 cache, since it would be used by almost every instruction.
Borsuc 23 Aug 2009, 00:17
But the stack gets pushed/popped while registers stay the same, so it's still a different area. Yes, maybe there is a special "erb" (register pointer) as you described (obviously there isn't, but we can pretend there is), along with the encodings for it.
Azu 23 Aug 2009, 00:23
Yes, existing code would obviously need to be changed before it could run on such an architecture.
Borsuc 23 Aug 2009, 18:22
I meant that, if you want a stack-like usage of registers, probably look at the FPU registers :P
Azu 24 Aug 2009, 03:18
The FPU way sucks.. you can only hold 8 things in it and then its stack is full.. worst of both worlds.
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt 24 Aug 2009, 09:43
I converted the simple OpenGL example 'hello.c' to fasm. I also compiled the 'hello.c' example using Visual Studio Express (64-bit). The file below shows disassembled code from the fasm and C versions. As you can see, fasm is doing some strange things stack-wise. An example:
Code:
   ???  add     rsp, 32                                 ; 00402030 _ 48: 83. C4, 20
   ???  sub     rsp, 32                                 ; 00402034 _ 48: 83. EC, 20
        lea     rcx, [rbp-54H]                          ; 00402038 _ 48: 8D. 4D, AC
        mov     rdx, 0                                  ; 0040203C _ 48: C7. C2, 00000000
        mov     r8, 72                                  ; 00402043 _ 49: C7. C0, 00000048
; Note: Memory operand is misaligned. Performance penalty
        call    qword ptr [imp_memset]                  ; 0040204A _ FF. 15, 0000233C(rel)
   ???  add     rsp, 32                                 ; 00402050 _ 48: 83. C4, 20
   ???  sub     rsp, 32                                 ; 00402054 _ 48: 83. EC, 20
        mov     rcx, 0                                  ; 00402058 _ 48: C7. C1, 00000000
        call    qword ptr [imp_GetModuleHandleA]        ; 0040205F _ FF. 15, 00002093(rel)
    


Also, according to objconv, there are some code alignment problems too.


Attachment: more fasm problems.rar (27.44 KB), Disassembly Code

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20453
Location: In your JS exploiting you and your system
revolution 24 Aug 2009, 09:47
The current fasm macros do not preallocate the stack at the procedure entry. Each function call will do its own stack allocation and deallocation. This was previously discussed at the time that 64-bit support was first being introduced.
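
For contrast with the per-call add/sub pairs in the disassembly above, here is a minimal sketch of hand-written calls that reserve the 32 bytes of parameter space once and reuse it for every call. It assumes the standard win64a.inc headers are on the include path; the choice of functions and the import table are only for illustration.
Code:
        format PE64 console
        entry start

        include 'win64a.inc'

        section '.text' code readable executable

        start:
                sub     rsp, 8+32               ; 8 realigns RSP to 16 after the caller's CALL, 32 is the parameter space, reserved once
                xor     ecx, ecx                ; first argument (lpModuleName) = 0
                call    [GetModuleHandleA]      ; no add/sub rsp around the call
                xor     ecx, ecx                ; first argument (uExitCode) = 0
                call    [ExitProcess]           ; reuses the same 32 bytes

        section '.idata' import data readable writeable

          library kernel32,'KERNEL32.DLL'

          import kernel32,\
                 GetModuleHandleA,'GetModuleHandleA',\
                 ExitProcess,'ExitProcess'

The reservation has to be as large as the biggest parameter list of any call made while it is in place (and never less than 32 bytes).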
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 24 Aug 2009, 09:56
revolution wrote:
The current fasm macros do not preallocate the stack at the procedure entry.

They do if you choose the static RSP prologue/epilogue variant. See Customizing the "proc" thread.
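
A sketch of how the static RSP variant is usually selected, assuming the three symbol names below match the proc64.inc in your fasm package (the Customizing the "proc" thread is the authoritative source). The procedure name GetInstance is hypothetical.
Code:
        format PE64 console
        entry start

        include 'win64a.inc'

        ; assumption: these are the names proc64.inc provides; select them before any "proc" is defined
        prologue@proc equ static_rsp_prologue
        epilogue@proc equ static_rsp_epilogue
        close@proc equ static_rsp_close

        section '.text' code readable executable

        start:
                sub     rsp, 8+32               ; align RSP and reserve parameter space for the direct call to ExitProcess
                call    GetInstance
                xor     ecx, ecx
                call    [ExitProcess]

        proc GetInstance                        ; hypothetical procedure, for illustration only
                invoke  GetModuleHandleA, 0     ; with the static variant both calls are expected
                invoke  GetModuleHandleA, 0     ; to share the frame allocated at procedure entry
                ret
        endp

        section '.idata' import data readable writeable

          library kernel32,'KERNEL32.DLL'

          import kernel32,\
                 GetModuleHandleA,'GetModuleHandleA',\
                 ExitProcess,'ExitProcess'

Disassembling the result, as was done earlier in the thread, is the easiest way to confirm that the add/sub pairs between the calls are gone.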
revolution 24 Aug 2009, 10:09
Tomasz Grysztar wrote:
They do if you choose the static RSP prologue/epilogue variant. See Customizing the "proc" thread.
Oh nice, I missed that thread.

In that case it seems I left out two words from my previous post:

By default the current fasm macros do not preallocate the stack at the procedure entry.
madmatt 24 Aug 2009, 11:46
Tomasz Grysztar wrote:
revolution wrote:
The current fasm macros do not preallocate the stack at the procedure entry.

They do if you choose the static RSP prologue/epilogue variant. See Customizing the "proc" thread.


But isn't this still an error? That code looks like do-nothing code, though in other cases it seems to work as intended.

[EDIT] And another question: is the proc64 macro passing floating point values (single precision?) correctly? If you look at the disassembly, Visual C/C++ is using the xmm0, xmm1, xmm2, etc. registers first, while fasm is using rcx, rdx, etc.

_________________
Gimme a sledge hammer! I'LL FIX IT!
Tomasz Grysztar 24 Aug 2009, 14:12
madmatt wrote:
But isn't this still an error? That code looks like do-nothing code, though in other cases it seems to work as intended.

Do-nothing doesn't do any harm, does it? Why the default macro behavior was chosen to allocate the stack frame separately for each call is explained in that oldest thread.

madmatt wrote:
And another question: is the proc64 macro passing floating point values (single precision?) correctly? If you look at the disassembly, Visual C/C++ is using the xmm0, xmm1, xmm2, etc. registers first, while fasm is using rcx, rdx, etc.

If you read that other thread carefully, you will notice that "float" prefix should be used in such case. Check out the "WIN64/OPENGL" example that comes with fasmw package, too.
Tomasz Grysztar 24 Aug 2009, 14:18
Wait a second, madmatt, it was you who asked this question in the other thread, where I demonstrated a disassembly of a procedure from the OpenGL example with the static RSP frame enabled.
madmatt wrote:
Not a macro question, but, looking at the disassembly, are you passing parameters to OpenGL using xmm registers? I didn't know you could do that. Is this just for 64-bit coding?

Or is someone stealing your identity? ;)
madmatt 24 Aug 2009, 15:14
Quote:
Do-nothing doesn't do any harm, does it? Why the default macro behavior was chosen to allocate the stack frame separately for each call is explained in that oldest thread.

No, it doesn't, but why allow it if it's not needed?

Quote:
If you read that other thread carefully, you will notice that "float" prefix should be used in such case. Check out the "WIN64/OPENGL" example that comes with fasmw package, too.

Well, there's the problem right there :) I didn't read it carefully. OK, I'll do that from now on.

Quote:
Wait a second, madmatt, it was you who asked this question in the other thread, where I demonstrated a disassembly of a procedure from the OpenGL example with the static RSP frame enabled.
Or is someone stealing your identity?

Nope! That was me, then and now. :D When I asked that question my mind was still very much in the 32-bit way of doing things. I re-installed Win7 64-bit and will do much more learning about 64-bit asm programming. I compiled the OpenGL example and it works well, so it must be a problem with my includes or something.

_________________
Gimme a sledge hammer! I'LL FIX IT!
madmatt 24 Aug 2009, 15:59
OK, I got my OpenGL example working well now. :) You pass floats differently with the proc64 macro than with the proc32 macro: 'float dword' for single precision, just 'float' for double precision.
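
A minimal sketch of the two prefixes described above, assuming the standard win64a.inc headers. glClearColor takes four single-precision arguments and glClearDepth takes one double; the GL calls are there only to show the parameter syntax, since a real program would first create a rendering context as the WIN64/OPENGL example does.
Code:
        format PE64 GUI
        entry start

        include 'win64a.inc'

        section '.text' code readable executable

        start:
                sub     rsp, 8                  ; realign RSP to 16 bytes before using invoke
                ; 'float dword' marks a single-precision argument
                invoke  glClearColor, float dword 0.0, float dword 0.0, float dword 0.0, float dword 1.0
                ; a bare 'float' constant is passed as double precision (per the post above)
                invoke  glClearDepth, float 1.0
                invoke  ExitProcess, 0

        section '.idata' import data readable writeable

          library kernel32,'KERNEL32.DLL',\
                  opengl32,'OPENGL32.DLL'

          import kernel32,\
                 ExitProcess,'ExitProcess'

          import opengl32,\
                 glClearColor,'glClearColor',\
                 glClearDepth,'glClearDepth'

Without the prefix the macro treats the value as an integer and loads it into rcx, rdx, etc., which is the behaviour seen in the earlier disassembly.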
Tomasz Grysztar 24 Aug 2009, 16:25
And if you want to get rid of the redundant RSP operations, just use the "frame" macro.
madmatt wrote:
OK, I got my OpenGL example working well now. :) You pass floats differently with the proc64 macro than with the proc32 macro: 'float dword' for single precision, just 'float' for double precision.
Yes. You can also use 'float qword' for double precision.
This is all going to be documented in the new manual on Win32/Win64 headers, but I haven't started it yet.
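
A minimal sketch of the "frame" macro mentioned above, assuming the frame/endf pair from the standard 64-bit headers: the parameter space is allocated once at "frame" and released at "endf", so the calls in between should not emit their own add/sub rsp pairs. The functions called are only for illustration.
Code:
        format PE64 console
        entry start

        include 'win64a.inc'

        section '.text' code readable executable

        start:
                sub     rsp, 8                  ; realign RSP to 16 bytes
                frame                           ; allocate the parameter space just once
                  invoke  GetModuleHandleA, 0
                  invoke  ExitProcess, 0
                endf                            ; release it (not reached here, since ExitProcess does not return)

        section '.idata' import data readable writeable

          library kernel32,'KERNEL32.DLL'

          import kernel32,\
                 GetModuleHandleA,'GetModuleHandleA',\
                 ExitProcess,'ExitProcess'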
madmatt 24 Aug 2009, 16:38
All right, sounds good. Thanks for your help. :) Have you fixed the proc64 macro problem that I first mentioned in this post?

_________________
Gimme a sledge hammer! I'LL FIX IT!
madmatt 27 Aug 2009, 00:36
It seems to be fixed now. Good work, Tomasz! :D
Borsuc 27 Aug 2009, 14:33
I have a question. Does the stack need to be aligned on 16 bytes for access, or is it just the stupid fastcall64 convention that needs that?

In other words, will a custom asm convention need that? (why would it???)
just thought I'd ask (since I'm not programming x64 yet).
MazeGen



Joined: 06 Oct 2003
Posts: 977
Location: Czechoslovakia
MazeGen 27 Aug 2009, 15:19
Jeremy explains it well I think:
The stack pointer (RSP) must be 16-byte aligned when making a call to an API. With some APIs this does not matter, but with other APIs wrong stack alignment will cause an exception. Some APIs will handle the exception themselves and align the stack as required (this will, however, cause performance to suffer). Other APIs (at least on early builds of x64) cannot handle the exception and unless you are running the application under debug control, it will exit.
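
As far as the CPU itself is concerned, push, pop, call and ret work with any RSP value; the 16-byte rule comes from the calling convention, because API code may use aligned SSE accesses (such as movaps) relative to RSP, and those fault on misaligned addresses. A custom asm convention can therefore ignore the rule internally, as long as RSP is realigned before any API call. A minimal sketch of that realignment, assuming the code in the middle stands in for custom-convention code that has left RSP misaligned:
Code:
        format PE64 console
        entry start

        include 'win64a.inc'

        section '.text' code readable executable

        start:                                  ; entered via CALL, so RSP = 16n+8 here
                sub     rsp, 8+32               ; now 16-byte aligned, with parameter space for the final call

                push    rcx                     ; custom-convention code: RSP is misaligned by 8 again
                mov     rbx, rsp                ; keep the current RSP in a callee-saved register
                and     rsp, -16                ; round RSP down to a multiple of 16
                sub     rsp, 32                 ; parameter space for the API call
                xor     ecx, ecx
                call    [GetModuleHandleA]      ; RSP is aligned at the point of the call
                mov     rsp, rbx                ; back to the custom layout
                pop     rcx

                xor     ecx, ecx
                call    [ExitProcess]

        section '.idata' import data readable writeable

          library kernel32,'KERNEL32.DLL'

          import kernel32,\
                 GetModuleHandleA,'GetModuleHandleA',\
                 ExitProcess,'ExitProcess'

Saving RSP in rbx works here because the Win64 convention requires the callee to preserve rbx; a routine that returns to its own caller would have to save and restore rbx itself.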
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
