flat assembler
Message board for the users of flat assembler.

Index > Windows > Stack problem with proc64

madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt
The proc64 macro doesn't seem to align the stack properly when I make a function that uses no local variables. When I try to run it, the program crashes. When I add a local 'qalign:QWORD', everything works like it should.

_________________
Gimme a sledge hammer! I'LL FIX IT!
Post 20 Aug 2009, 15:02
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Is the stack alignment requirement even described in fastcall64? I have already encountered a case where it was required but not described (probably because everyone uses C, and the compiler keeps it aligned automatically).
Post 20 Aug 2009, 15:03
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7802
Location: Kraków, Poland
Tomasz Grysztar
Does your function take no parameters, too? Because I found a bug in the new prologue macro related to such case. Find this line:
Code:
if parmbytes | localbytes
and either remove this line, or attach "| ~fill" to the condition, but in the latter case you are going to have to update the epilogue macro as well. I will upload the update as soon as I can, but this may not be the case for the following few days.
Post 20 Aug 2009, 16:04
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt
Tomasz: Yep, it takes no parameters and no locals. I'm in no hurry, so do what you gotta do first.
Vid (Tomasz?): Here is a webpage I found that explains how the 64bit stack is used and aligned: http://ntcore.com/Files/vista_x64.htm#x64_Assembly. The information there seems to be good.
Post 20 Aug 2009, 18:49
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
wow from what I read from that article, the x64 fastcall is really retarded. No pushes anymore? What has the stack become, global variables? Confused

Not to mention that cache efficiency is reduced.

_________________
Previously known as The_Grey_Beast
Post 20 Aug 2009, 21:18
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Why would the stack need aligned? Does the windows API access the stack using movdqa?? I don't get it.
Post 21 Aug 2009, 02:00
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17715
Location: In your JS exploiting you and your system
revolution
Azu wrote:
Why would the stack need aligned?
Because MS say so
Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are aligned to 16 bytes in order to aid performance.
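As a rough illustration of what that rule means for a function with no parameters and no locals (the case from the first post), here is a minimal fasm sketch. The names my_func and other_func are made up for the example, and this is hand-written code, not what the proc64 macro generates.

```asm
; Sketch only: my_func and other_func are hypothetical names.
use64

my_func:                        ; on entry RSP = 16n + 8, because CALL pushed the return address
        sub     rsp, 8 + 32     ; 8 bytes restore 16-byte alignment,
                                ; 32 bytes are the register home area for the callee
        call    other_func      ; RSP is a multiple of 16 at this CALL
        add     rsp, 8 + 32     ; release the area before returning
        ret

other_func:                     ; stand-in for any Win64 callee
        ret
```

Without the extra 8 bytes, RSP would be misaligned at the inner call, which is consistent with adding one QWORD local making the crash go away.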
Post 21 Aug 2009, 02:07
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Borsuc: Actually, I think the new way is much smarter than stdcall, considering that 99.99% of code is generated by compilers. And of course, a good asm coder can do it the same way. Try disassembling some fastcall64 code, maybe you will change your opinion.
Post 21 Aug 2009, 10:31
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
bogdanontanu
vid wrote:
Borsuc: Actually, I think the new way is much smarter than stdcall, considering that 99.99% of code is generated by compilers. And of course, a good asm coder can do it the same way. Try disassembling some fastcall64 code, maybe you will change your opinion.


I on the other side consider that this fastcall64 is stupid, wrong and pathetic.

An expression of people that do not know or understand ASM and CPU anymore and apply childish academic concepts. A copy paste from RISC wrong concepts.

This was bound to happen sooner or later because new people have lost the knowledge.

This is in fact a return to the dark ages of programming.

The funny part is that someday in "the future" they will re-invent STDCALL ... if they ever evolve back towards intelligence that is ... Smile)

Until then (if ever) of course we do have to use it as it is (with help from macros probably). There is no purpose in fighting against what is.

_________________
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius -- and a lot of courage --
to move in the opposite direction."


Last edited by bogdanontanu on 21 Aug 2009, 14:57; edited 1 time in total
Post 21 Aug 2009, 14:53
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
bogdanontanu wrote:
vid wrote:
Borsuc: Actually, I think the new way is much smarter than stdcall, considering that 99.99% of code is generated by compilers. And of course, a good asm coder can do it the same way. Try disassembling some fastcall64 code, maybe you will change your opinion.


I on the other side consider that this fastcall64 is stupid, wrong and pathetic.

An expression of people that do not know or understand ASM and CPU anymore and apply childish academic concepts. A copy paste from RISC wrong concepts.

This was bound to happen sooner or later because new people have lost the knowledge.

This is in fact a return to the dark ages of programming.

The funny part is that someday in "the future" they will re-invent STDCALL ... if they ever evolve back towards intelligence that is ... Smile)
What's wrong with using registers instead of stack? Most 64-bit CPUs can move things into and out of the registers much faster than stack..

My only complaint is that their choice of registers sucks; why isn't RAX one of them? :/
And don't say "because they return stuff in it so it gets trashed", RCX/RDX/R8/R9 get trashed too!

bogdanontanu wrote:
Until then (if ever) of course we do have to use it as it is (with help from macros probably). There is no purpose in fighting against what is.
Well, you could always just patch their functions to use your own convention instead. Or use the native syscall API which is stdcall still (just be sure to check what version of windows at startup-time via TIB and set the ordinals correctly, or it won't work on other versions of windows).


Last edited by Azu on 21 Aug 2009, 15:06; edited 1 time in total
Post 21 Aug 2009, 14:56
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
First of all, you can see the problem with that last statement of yours.
But really, pushing is much easier and more elegant, and it's even better for size optimization, or at least it was in 32-bit code, but the CPU designers unfortunately may listen to "the crowd" (and they disabled much of the pushing functionality).

NOTE: I'm NOT against using registers as parameters AT ALL. I'm against the stupidity of reserving STATIC stack space even for the registers. I mean I just dropped my jaw when I read it!


@vid: I dunno & don't care, I'm programming software, not cracking Wink
Post 21 Aug 2009, 15:05
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Borsuc wrote:
First of all, you can see the problem with that last statement of yours.
Whoops. Sorry about that, forgot to include the solution. It's there now.


Borsuc wrote:
NOTE: I'm NOT against using registers as parameters AT ALL. I'm against the stupidity of reserving STATIC stack space even for the registers. I mean I just dropped my jaw when I read it!
I think it's because the HLL used to compile windows sucks and doesn't have any macro for local variables.


Borsuc wrote:

@vid: I dunno & don't care, I'm programming software, not cracking Wink
How can you program using the windows API without disassembling it a little first? The MSDN documentation is shit!


Last edited by Azu on 21 Aug 2009, 15:19; edited 1 time in total
Post 21 Aug 2009, 15:08
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Azu wrote:
How can you program using the windows API without disassembling it a little first? The MSDN documentation is shit!
Wow I always followed it when programming in Windows, I thought system APIs are kinda complex for user-mode apps understanding to be honest, so that's why I never bothered. Razz

Plus, the documentation says how you should write even if your version of Windows doesn't have all functionality, like the crappy but so frequently found Microsoft "reserved" parameters -- ironically which almost never get used (but crash if you use them).

_________________
Previously known as The_Grey_Beast
Post 21 Aug 2009, 15:18
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
Borsuc wrote:
Azu wrote:
How can you program using the windows API without disassembling it a little first? The MSDN documentation is shit!
Wow I always followed it when programming in Windows, I thought system APIs are kinda complex for user-mode apps understanding to be honest, so that's why I never bothered. Razz

Plus, the documentation says how you should write even if your version of Windows doesn't have all functionality, like the crappy but so frequently found Microsoft "reserved" parameters -- ironically which almost never get used (but crash if you use them).
If you disassembled first, you'd be able to see what those parameters do, and do stuff with them.. ^^
Post 21 Aug 2009, 15:20
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
bogdanontanu
Azu wrote:
What's wrong with using registers instead of stack? Most 64-bit CPUs can move things into and out of the registers much faster than stack..


If you had experience in programming you would not ask such a stupid question! There is nothing wrong with using registers. It is wrong to use registers for passing parameters.

Here are a few answers so you can meditate upon:

1) Sending parameters by registers is primitive
===================================
A method from DOS era with 2-3 parameters. You have to use the stack anyway. Later on STDCALL was discovered out of experience.

2) It is faster but...
=================
Only for leaf functions and in heavily optimized functions. For general functions that call other functions that call other functions it is a huge mistake. And guess what: this kind of function makes up 90% of the body of an OS or application.

In your example you need to "spill" those registers anyway in most scenarios, because you usually NEED the arguments further on in your code and you cannot reserve (and thus lose) 4 registers.

The argument that you already have many registers and you can waste some of them is never valid. No matter how many registers you have you always need more Wink

3) You need the stack anyway
=======================
For more than 4 parameters and for other reasons.

Stack is in CPU L1 cache always after a function call and hence the speed improvement is not big but the problems created by using registers are big... as you can see from disassembly Wink

4) You need the stack for recursion.
==========================
Recursion is a fundamental programming concept. You can not do it with registers.

5) Writing to stack is NOT faster than PUSH.
================================
Or it should not be. Of course you can make a CPU where this is as you wish it, but conceptually PUSH is always faster... mov [esp+48h],xxx and mov [esp+40h],yyy cannot be computed faster than an implicit and fixed sub esp,8 and mov [esp],xxx... This is basic CPU logic.

6) Stdcall is uniform and can be unwound easily.
====================================
Uniformity and ease of understanding and debugging is very important. It also scales nicely to any number of parameters.

Have you noticed the .pdata structures and the fixation of valid prologues and epilogues that are needed now in order to unwind stack and to provide exception handling?

This is the result of stupidity. They have to patch holes up in very complex and pathetic ways.

Think Occam's razor: when a solution is complex then it is wrong. There is no beauty in complexity and no simplicity inside complexity

Complex = stupid but loved by mind device.

7) It is NOT a new improved concept.
=============================
In fact it is a very old MS-DOS-like idea. It works for primitive CPUs and for primitive RISC CPUs.

They copied it from what RISCs already had because they are unable to understand and to create something new but wanted to "change something" Very Happy

8) It results in larger code
======================
And more complex code with no speed increase whatsoever.

Fixing the stack after the function is already stupid because it adds code after each function call.

I agree that this is needed in rare cases for C calling with an unknown number of parameters, as in sprintf(...), BUT doing this for every function is stupid.

AND to require to also align the stack BEFORE the functions is double stupidity that results in a lot of not needed code.

In inner loops and leaf functions ASM programmers did a lot of tricks and handcrafted optimizations anyway. You do not use STDCALL there. Instead you use handcrafted code. But to "handcraft" every function is priceless...


It is stupid to do a set of dummy non-algorithmic tricks on every common layer of non-critical functions, making the code bloatware with no gain in speed but with a loss of simplicity, because "the compiler can do it". The compiler can do a lot of stupid things if you make it do them...



In conclusion:
===========
Oh dear... human race is going down Wink

Basically it is stupid and primitive BUT we do have to use it as it is... there is no purpose to cry about it.

Just do not fall for believing that it is a good or advanced idea. Think with your own mind instead.

"New" is not necessarily good.... and it is not even new.

It was chosen and "designed" by people with no knowledge or with bad intentions and desires to extract some minimal speed advantage out of instruction level tricks.

A mistake with damaging results because they have no independent brain and copy-paste some "well established" RISC myths and have "heard" that using registers is "faster"....

Yes using registers is faster but NOT for this!


I have noticed that this concept is loved by people that want to use ASM only for a few tiny but complex and super optimized functions and promote HLL for the 99.99% rest of code because they think that this complexity will be too much for writing large applications in ASM while compilers do it for free.

I have seen "advanced" ASM people stating that ASM compilers should not do this because it is too complex and we should leave it to the C compilers or do it manually... ha ha ha

My guess is that with the help of a few macros ASM will fix this "problem" and make fastcall64 as easy in ASM as it is in C or other HLLs.

The difference being that if I want to I can use STDCALL in most of my code and use the stupid FASTCALL64 ONLY where it is needed as an interface with the OS.

_________________
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius -- and a lot of courage --
to move in the opposite direction."


Last edited by bogdanontanu on 21 Aug 2009, 15:53; edited 1 time in total
Post 21 Aug 2009, 15:28
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
bogdanontanu wrote:
Azu wrote:
What's wrong with using registers instead of stack? Most 64-bit CPUs can move things into and out of the registers much faster than stack..


If you had experience in programming you would not ask such a stupid question!

Here are a few answers so you can meditate upon:

1) Sending parameters by registers is a primitive method from DOS era
So is the use of pointers, and strings, and integers. All of those are very old things that were around way back in DOS. But you use them, don't you? So you are a hypocrite.
bogdanontanu wrote:
2) It is faster but only for leaf functions and in heavily optimized functions. For general functions that call other functions that call other functions it is a huge mistake. And guess what: this kind of function makes up 90% of the body of an OS or application.
Outside of heavy optimization, in functions that call many other functions (which is a design flaw most of the time), the performance of needing to save those args isn't going to matter either way.

bogdanontanu wrote:
In your example you need to "spill" those registers anyway in most scenarios, because you usually NEED the arguments further on in your code and you cannot reserve (and thus lose) 4 registers.

The argument that you already have many registers and you can waste some of them is never valid. No matter how many registers you have you always need more Wink
Um.. this is what they use that extra space you reserve for them. I think it's really stupid that they don't do it themselves as local variables though.

bogdanontanu wrote:
3) You need the stack anyway for more than 4 parameters

4) You need the stack for recursion.

Recursion is a fundamental programming concept. You can not do it with registers.
See above..

bogdanontanu wrote:
5) Writing to stack is NOT faster than PUSH. Or it should not be. Of course you can make a CPU where this is as you wish it, but conceptually PUSH is always faster... mov [esp+48h],xxx and mov [esp+40h],yyy cannot be computed faster than an implicit and fixed sub esp,8 and mov [esp],xxx... This is basic CPU logic.
Actually, the fastcall way is to move them into the registers themselves, not to use the register as a pointer to a memory location and move the values there.

bogdanontanu wrote:
6) Stdcall is uniform and can be unwound easily. Uniformity and ease of understanding and debugging is very important. It also scales nicely to any number of parameters.
As does fastcall. o_o

bogdanontanu wrote:
have you noticed the .pdata structures and the fixation of valid prologues and epilogues that are needed now in order to unwind stack and to provide exception handling?
I don't know. I use a flat programming model (no .rdata, .pdata, .reloc, .resource, etc, just one section).

bogdanontanu wrote:
This is the result of stupidity. They have to patch holes up in very complex and pathetic ways.
Their own fault for implementing it in a crappy way >_>

bogdanontanu wrote:
Think Occam's razor: when a solution is complex then it is wrong. There is no beauty in complexity and no simplicity inside complexity

Complex = stupid.
x86-64 is not for you, then. It's a very CISCy (complex) instruction set overall.

bogdanontanu wrote:
7) It is NOT a new improved concept.
Bug testing isn't a new improved concept. Do you not bug test?

bogdanontanu wrote:
8) It results in larger code and more complex code with no speed increase whatsoever.
Weird. What processor did you time this on? On the E8400, it is much faster to move a bunch of values into some registers than it is to push them onto the stack. And they don't need popped off (since they are in registers already), so less code is needed.

bogdanontanu wrote:
In inner loops and leaf functions ASM programmers did a lot of tricks and handcrafted optimizations anyway. You do not use STDCALL there.
I thought we were talking about function calls. It would definitely be bad to put pushes and pops inside a tight inner loop unless absolutely necessary.


I'm really sleepy sorry, I'll reply to the rest of your post tomorrow. Unless somebody else already has.
Post 21 Aug 2009, 15:50
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
What I do hate very much is how varargs were implemented:
Varargs wrote:
If parameters are passed via varargs (for example, ellipsis arguments), then essentially the normal parameter passing applies including spilling the fifth and subsequent arguments. It is again the callee's responsibility to dump arguments that have their address taken. For floating-point values only, both the integer and the floating-point register will contain the float value in case the callee expects the value in the integer registers.
I think they should have used a stack-only approach for them. It is also interesting that you are required to pass the floating point values twice... (if they are among the first four parameters)
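To make the "passed twice" point concrete, here is a small fasm sketch of a call to a varargs function with a double as the second argument. It is only a fragment under assumptions: print_double, fmt and value are invented names, and printf stands in for an import slot that a real program would get from its import table.

```asm
; Sketch only: all labels here are hypothetical.
use64

print_double:
        sub     rsp, 8 + 32         ; align the stack and reserve the home area
        lea     rcx, [fmt]          ; 1st argument: format string
        movsd   xmm1, [value]       ; 2nd argument as a floating-point value
        movq    rdx, xmm1           ; same bits duplicated in the integer register
        call    [printf]
        add     rsp, 8 + 32
        ret

fmt     db '%f', 10, 0
value   dq 1.5
printf  dq ?                        ; placeholder for the imported function pointer
```

The movq line is the extra copy that the quoted rule asks for when the callee takes variable arguments.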
Post 21 Aug 2009, 16:36
bogdanontanu



Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
bogdanontanu
Azu wrote:
So is the use of pointers, and strings, and integers. All of those are very old things that were around way back in DOS. But you use them, don't you?


I am not against OLD things if they are correct.

I am against using primitive things. Primitive = suboptimal solutions used only because at the time you cannot do better, or cannot understand how to.

Like using a wooden blade instead of a steel blade for a weapon.

Pointers are not primitive.

They are a fundamental element of programming. You can hide them in an HLL and pretend that you do not use them but they are there behind the scenes.

However sending arguments to a function via registers is primitive.

Quote:

So you are a hypocrite.


Thank you for your words. I am in fact much more than that.
Do you have anything else to say to me?

Quote:

Outside of heavy optimization, in functions that call many other functions (which is a design flaw most of the time), the performance of needing to save those args isn't going to matter either way.


Organizing and wrapping code correctly in hierarchical layers is NOT a design flaw. It is needed for large and real life projects.

It helps browsing, managing, understanding, debugging, porting and expanding the code base.

It is an intelligent solution. That is why FASM can run on multiple OS targets because it has such a minimal layer at least in some of its interfaces.

That is why my own assembler Sol_Asm can run on multiple OS targets and it is so easy for me to debug and maintain it as a huge ASM project.

Quote:

Um.. this is what they use that extra space you reserve for them. I think it's really stupid that they don't do it themselves as local variables though.


Priceless Wink

Quote:

Actually, the fastcall way is to move them into the registers themselves, not to use the register as a pointer to a memory location and move the values there.


I never said that somebody is using a register as a pointer. I know how win64 fastcall works.

You move the values to registers, but only for the first 4 parameters; the rest are moved to the stack ... and it uses up 4 registers that you will have to move to memory sooner or later.

For the rest of the parameters (and many functions have more than 4) you will have to move them to memory / the stack somehow. PUSH is much harder to use in this context (stack alignment issues and stack reuse) and compilers usually end up moving them as I have shown: mov [esp + 40h],argument[n]

This poses a problem if argument[n] is in fact a memory location, because you can not move memory to memory ... unlike push, where you can do: push [memory]
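The difference described here can be sketched in fasm roughly as follows. This is a fragment for comparison only, not a complete call sequence; arg5 is a made-up variable, and the 20h offset assumes the fifth argument slot sits directly above the 32-byte register home area at the point of the call.

```asm
; Sketch only: arg5 is a hypothetical variable.
use64

; stack-based style: a memory operand can be pushed directly
        push    qword [arg5]

; Win64 style: the slot is preallocated, so the value has to go
; through a scratch register because MOV has no memory-to-memory form
        mov     rax, [arg5]
        mov     [rsp+20h], rax      ; fifth argument slot

arg5    dq ?
```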

The fact that you do not understand this kind of thing (including the need to spill) makes me believe that you have not studied this seriously and made up your mind from "hearsay".

Quote:
As does fastcall. o_o


The essence of it is sending parameters via registers. This DOES NOT scale nicely because registers are a limited resource. They have to use a surrogate of STDCALL for more than 4 parameters anyway...

And about "unwinding" I think you have to study more again ....

Quote:
I don't know. I use a flat programming model (no .rdata, .pdata, .reloc, .resource, etc, just one section).


As I have said above you need to study more. Those sections serve a purpose... ".pdata" is relevant in this unwinding and exception handling discussion for win64 calling convention. It can not be ignored for long.

Quote:

Their own fault for implementing it in a crappy way >_>


Maybe... but this is the logical result of sending parameters in registers.... then adding rest on stack, then trying to align the stack before the call to have parameters aligned ...

In such a way one decision leads to another mistake and another mistake and then "shit happens" because you are trying to optimize at the expense of simplicity.

Quote:
x86-64 is not for you, then. It's a very CISCy (complex) instruction set overall.


I think they made some mistakes with x86-64 design but not many under the circumstances.

When I design my own CPU I will make it simpler inside but still x86-like IMHO. A RISC CPU is a huge error but easier to produce than CISC.

I do like the x86 architecture, hence it is for me. But I will not consider its mistakes to be good things just because I like and use it. I am not lying to myself.

I think the term CISC was coined by RISC zealots as an insult to the x86 architecture.

I guess that x86 is complex inside when compared with a RISC ... BUT it is SIMPLE to use outside from an ASM programmer's point of view and this is what I need.

It would have been better if this external simplicity of the x86 architecture had been obtained with more simplicity inside, but you have to understand that simplicity does not have to be implemented at all costs.


This is the only limitation to simplicity: not simpler than what is needed to get the job done. A RISC CPU is sub-optimal.


Anyway, what we are discussing here is a software calling convention choice, not a hardware choice. There is nothing in the x86 hardware that requires or advocates such a calling convention as win64 fastcall.

It is a choice of some humans that made a mistake. A mistake we have to live with for a very long time but we do not have to "make believe" it is a good thing.

Such is life.

Quote:
Bug testing isn't a new improved concept. Do you not bug test?


I do unit testing, functional testing, fuzzy testing... almost every day at work as a programmer.

I am not against "old" I am against presenting "old" as "new" and "bad" as "better" when in fact they are not.

I also do not change things when there is no need to. When there is no clear improvement. When new things are more complex and provide only doubtful benefits.

That is what I say: this change was not needed, it provides doubtful benefits, it is complex for no reasons other than to cover up conceptual mistakes and it is a copy cat from other architecture and older concepts.

Quote:

bogdanontanu wrote:
8) It results in larger code and more complex code with no speed increase whatsoever.
Weird. What processor did you time this on?


Intel Core 2 Duo at 2Ghz.

Quote:

On the E8400, it is much faster to move a bunch of values into some registers than it is to push them onto the stack.


Of course it is. But this is NOT what I say. I say that using registers for ARGUMENTS in the context of the win64 fastcall convention results in larger, more complex code with no speed increase overall.

Why?
a) Because you will most likely spill the arguments.
b) Because you need to move the "other" arguments onto the stack without using PUSH, and this is slower, generates more code with memory operands and is bigger overall.
c) Because you need to align the stack additionally.

Quote:

And they don't need popped off (since they are in registers already), so less code is needed.


Who pops arguments?

But you do need registers for your inner algorithm and losing 4 is not acceptable. One advantage of x86-64 is having more registers. I do not like being robbed of 4 of them.

You will need to save them if you call other functions inside your function because the new function will use the very same registers...

You do not have to save arguments that are on stack... because they are already "saved" and still always ready and fresh for you to use... STDCALL is simply better.
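A minimal fasm sketch of the saving described above, assuming the usual Win64 layout where the caller reserves a home slot for each register argument just above the return address. The names outer and inner are invented for the example.

```asm
; Sketch only: outer and inner are hypothetical names.
use64

outer:                              ; first argument arrives in RCX
        mov     [rsp+8], rcx        ; spill it into its home slot (reserved by the caller)
        sub     rsp, 8 + 32         ; align and reserve the home area for our own callee
        call    inner               ; inner is free to overwrite RCX
        mov     rcx, [rsp+40+8]     ; reload the saved argument
        add     rsp, 8 + 32
        ret

inner:                              ; stand-in for any function that clobbers RCX
        xor     ecx, ecx
        ret
```

With a stack-passed argument the two mov instructions around the call would not be needed, which is the trade-off being argued about.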

Quote:

I thought we were talking about function calls. It would definitely be bad to put pushes and pops inside a tight inner loop unless absolutely necessary.


Yes of course. What I was trying to say is that nobody uses STDCALL inside heavily optimized inner loops. There you CAN and you SHOULD use registers to the maximum...

But to extrapolate this for general arguments passing is stupid.

Quote:

I'm really sleepy sorry, I'll reply to the rest of your post tomorrow. Unless somebody else already has.


Have a nice sleep.

It takes time and experience to understand these kinds of things; do not worry too much about them. Sleep is of the essence.


I think I have presented my available logical arguments and debating this further is not going to offer more logical arguments from my side.... hence I will rest my case and not debate it anymore.

I guess it is ok to "love" and "believe" in win64 fast call convention.

Just keep an open mind and an eye open for the alternatives and possible mistakes.

_________________
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius -- and a lot of courage --
to move in the opposite direction."
Post 21 Aug 2009, 16:58
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
Azu, the stack isn't slow at all because it's mostly in L1 cache so it is probably the same as registers.

Azu wrote:
Um.. this is what they use that extra space you reserve for them. I think it's really stupid that they don't do it themselves as local variables though.
This is the biggest stupidity. What sense does it make to take registers as parameters only for the function to save them later on the stack? Why not just PUSH the values DIRECTLY on the stack? It avoids the "mov register, value".

For example:
Code:
; Code 1: pushes on stack

push dword [eax]
call SomeFunction

SomeFunction:
...
; do something
...

;finally use the pushed value
add ebx, [esp+4]  ; assuming no other local variables

...
; do some more
...
ret 4    
Compared with:
Code:
mov eax, [eax]
call SomeFunction

SomeFunction:
mov [esp+4], eax

...
; do something
...

;finally use the pushed value
add ebx, [esp+4]  ; assuming no other local variables

...
; do some more
...
ret    
I count one more "mov eax, [eax]" than before (assuming "mov [esp+4], eax" is the same as "push [eax]" which IT IS NOT since it's BIGGER).

this is why fastcall64 is a stupidity. total FAIL.

_________________
Previously known as The_Grey_Beast
Post 21 Aug 2009, 17:55
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu
bogdanontanu wrote:
Azu wrote:
So is the use of pointers, and strings, and integers. All of those are very old things that were around way back in DOS. But you use them, don't you?


I am not against OLD things if they are correct.

I am against using primitive things. Primitive= suboptimal solutions used only because at the time you can not or can not understand how to do better.

Like using a wooden blade instead of a steel blade for a weapon.

Pointers are not primitive.

They are a fundamental element of programming. You can hide them in HLL and pretend that you do not use them but they are there behind the scenes.

However sending arguments to a function via registers is primitive.
Really? My idea of an ideal HLL would be one that optimizes everything so that the arguments always end up in the right registers when a function is called, without wasting time moving them from one another, or to the stack.

bogdanontanu wrote:
Quote:

So you are a hypocrite.


I am in fact much more than that.
I wanted to keep it PG, so I left the other things out. ;)

bogdanontanu wrote:
Quote:

Outside of heavy optimization, in functions that call many other functions (which is a design flaw most of the time), the performance of needing to save those args isn't going to matter either way.


Organizing and wrapping code correctly in hierarchical layers is NOT a design flaw. It is needed for large and real life projects.

It helps browsing, managing, understanding, debugging, porting and expanding the code base.

It is an intelligent solution. That is why FASM can run on multiple OS targets because it has such a minimal layer at least in some of its interfaces.

That is why my own assembler Sol_Asm can run on multiple OS targets and it is so easy for me to debug and maintain it as a huge ASM project.
Obviously you should only use fastcall for libraries that support it. If the API on the OS you're coding for is something else, like stdcall, or cdecl, then use that. Note that when you're using an OS API, portability is already gone, since that API does not exist in other OSs.

bogdanontanu wrote:
Quote:

Um.. this is what they use that extra space you reserve for them. I think it's really stupid that they don't do it themselves as local variables though.


Priceless ;)
Yay.

bogdanontanu wrote:
Quote:

Actually, the fastcall way is to move them into the registers themselves, not to use the register as a pointer to a memory location and move the values there.


I never said that somebody is using a register as a pointer. I know how win64 fastcall works.
But you were saying stuff like mov [reg+offset],value.. that is using it as a pointer..

bogdanontanu wrote:
You move the values to registers but only for the first 4 parameters the rest are moved to stack ... and it uses up 4 registers that you will have to move to memory sooner or later.

For the rest of the parameters (and many functions have more than 4) you will have to move them to the memory / stack somehow. The PUSH is much harder to be done in this context (stack alignment issues and stack reuse) and compilers usually end up moving them as I have shown : mov [esp + 40h],argument[n]

This poses a problem if argument[n] is in fact a memory location because you can not move memory to memory ... unlike push where you can do: push [memory]

The fact that you do not understand this kind of thing (the need for spill also) makes me believe that you have not studied this seriously and made up your mind from "hearsay"
Sorry, but passing args via registers has nothing to do with the stack needing to be aligned. And the reserved space I mentioned IS the spill..
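
To make the disputed case concrete, here is a rough sketch of a Win64 call with five integer arguments, where the fifth comes from memory. CallWithFive, TakesFive and fifth are hypothetical names, and the fragment assumes it is entered through a normal call so that rsp is 8 bytes off 16-byte alignment on entry.

```asm
use64                                   ; fasm: assemble this fragment as 64-bit code

CallWithFive:
        sub     rsp, 40                 ; 32 bytes of home space + 8 bytes for the fifth argument
        mov     ecx, 1                  ; arguments 1-4 go in rcx, rdx, r8, r9
        mov     edx, 2                  ; (32-bit writes zero-extend into the full register)
        mov     r8d, 3
        mov     r9d, 4
        mov     rax, [fifth]            ; there is no memory-to-memory mov, so use a scratch register
        mov     [rsp+32], rax           ; the fifth argument sits just above the home space
        call    TakesFive
        add     rsp, 40
        ret

TakesFive:                              ; hypothetical callee: returns its fifth argument
        mov     rax, [rsp+40]           ; skip the return address (8) and the home space (32)
        ret

fifth   dq 5
```

The extra load through rax is the cost bogdanontanu describes. The single sub rsp, 40 also covers the alignment, since it brings rsp back to a multiple of 16 before the call.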

bogdanontanu wrote:
Quote:
As does fastcall. o_o


The essence of it is sending parameters via registers. This DOES NOT scale nicely because registers are a limited resource. They have to use a surrogate of STDCALL for more than 4 parameters anyway...

And about "unwinding" I think you have to study more again ....
You should take a closer look at the fastcall standard. Like I already said a few times, you have to reserve some extra space on the stack, and the called function will back the registers up in it when/if necessary.

bogdanontanu wrote:
Quote:
I don't know. I use a flat programming model (no .rdata, .pdata, .reloc, .resource, etc, just one section).


As I have said above you need to study more. Those sections serve a purpose... ".pdata" is relevant in this unwinding and exception handling discussion for win64 calling convention. It can not be ignored for long.
I'm sure that whatever goes in it could also go in a single .flat section.

bogdanontanu wrote:
Quote:

Their own fault for implementing it in a crappy way >_>


Maybe... but this is the logical result of sending parameters in registers.... then adding rest on stack, then trying to align the stack before the call to have parameters aligned ...

In such a way one decision leads to another mistake and another mistake and then "shit happens" because you are trying to optimize at the expense of simplicity.
Microsoft's fault, not something inseparable from the concept of passing via register..

bogdanontanu wrote:
Quote:
x86-64 is not for you, then. It's a very CISCy (complex) instruction set overall.


I think they made some mistakes with x86-64 design but not many under the circumstances.

When I design my own CPU I will make it simpler inside but still x-86 like IMHO. A RISC CPU is a huge error but easier to produce than CISC.

I do like x86 architecture hence it is for me. But I will not consider its mistakes to be good things just because I like and use it. I am not lying to myself.

I think the term CISC is designed by RISC zealots to be an insult for x-86 architecture.

I guess that x-86 is complex inside when compared with a RISC ... BUT it is SIMPLE to use outside from an ASM programmer's point of view and this is what I need.

It would have been better if this external simplicity of x86 architecture was obtained with more simplicity inside but you have to understand that simplicity does not have to be implemented at all costs.


This is the only limitation to simplicity: not simpler than what is needed to get the job done. A RISC CPU is sub-optimal.


Anyway, what we are discussing here is a software calling convention choice, not a hardware choice. There is nothing in the x-86 hardware that requires or advocates such a calling convention as win64 fastcall.

It is a choice of some humans that made a mistake. A mistake we have to live with for a very long time but we do not have to "make believe" it is a good thing.

Such is life.
Actually it is simple inside. Most of the modern x86/x64 CPUs actually operate on uops which are pretty much RISC. The instruction set you use runs on top of this and is CISC.

bogdanontanu wrote:
Quote:
Bug testing isn't a new improved concept. Do you not bug test?


I do unit testing, functional testing, fuzzy testing... almost every day at work as a programmer.

I am not against "old" I am against presenting "old" as "new" and "bad" as "better" when in fact they are not.

I also do not change things when there is no need to. When there is no clear improvement. When new things are more complex and provide only doubtful benefits.

That is what I say: this change was not needed, it provides doubtful benefits, it is complex for no reasons other than to cover up conceptual mistakes and it is a copy cat from other architecture and older concepts.
You see no benefit in only executing "mov reg,arg" rather than "push arg" "pop reg"?

bogdanontanu wrote:
Quote:

bogdanontanu wrote:
8) It results in larger code and more complex code with no speed increase what so ever.
Weird. What processor did you time this on?


Intel Core 2 Duo at 2Ghz.

Quote:

On the E8400, it is much faster to move a bunch of values into some registers than it is to push them onto the stack.


Of course it is. But this is NOT what I say. I say that to use registers for ARGUMENTS in the context of win64 fastcall convention results in larger code, more complex code with no speed increase overall.

Why?
a) Because you will most likely spill the arguments.
b) Because you need to move the "other" arguments onto the stack without using PUSH, and this is slower, generates more code with memory operands, and is bigger overall.
c) because you need to align the stack additionally
I really doubt that Windows being crappy and requiring the stack to be aligned, and HLLs being crappy and using mov [esp+offset],arg for pushing and spilling, are essential for passing values in registers.

bogdanontanu wrote:
Quote:

And they don't need popped off (since they are in registers already), so less code is needed.


Who pops arguments?
Oh, that's right. You're guilty of your own critique again. Your beloved stdcall convention doesn't use the fast pop, but rather the slower "mov reg,[esp+offset]".. hehe.

bogdanontanu wrote:
Quote:

I thought we were talking about function calls. It would definitely be bad to put pushes and pops inside a tight inner loop unless absolutely necessary.


Yes of course. What I was trying to present is that nobody uses STDCALL inside heavy optimized inner loops. There you CAN and you SHOULD use registers at maximum...
Yay. We agree.

bogdanontanu wrote:

Quote:

I'm really sleepy sorry, I'll reply to the rest of your post tomorrow. Unless somebody else already has.


Have a nice sleep.

It takes time and experience to understand these kinds of things; do not worry too much about them. Sleep is of the essence.


I think I have presented my available logical arguments and debating this further is not going to offer more logical arguments from my side.... hence I will rest my case and not debate it anymore.

I guess it is ok to "love" and "believe" in win64 fast call convention.

Just keep an open mind and an eye open for the alternatives and possible mistakes.
I hate the way it is implemented, but I really do think there are benefits to passing in registers. Just not for nested functions.



Borsuc wrote:
Azu, the stack isn't slow at all because it's mostly in L1 cache so it is probably the same as registers.
The L1 cache is as fast as registers? Why do we even have registers, then? >_>

Borsuc wrote:
Azu wrote:
Um.. this is what they use that extra space you reserve for them. I think it's really stupid that they don't do it themselves as local variables though.
This is the biggest stupidity. What sense does it make to take registers as parameters only for the function to save them later on the stack? Why not just PUSH the values DIRECTLY on the stack? It avoids the "mov register, value".

For example:
Code:
; Code 1: pushes on stack

push dword [eax]   ; fasm needs an explicit operand size for a memory push
call SomeFunction

SomeFunction:
...
; do something
...

;finally use the pushed value
add ebx, [esp+4]  ; assuming no other local variables

...
; do some more
...
ret 4    
Compared with:
Code:
; Code 2: passes in a register

mov eax, [eax]
call SomeFunction

SomeFunction:
mov [esp+4], eax

...
; do something
...

; finally use the spilled value
add ebx, [esp+4]  ; assuming no other local variables

...
; do some more
...
ret    
I count one more "mov eax, [eax]" than before (assuming "mov [esp+4], eax" is the same as "push [eax]" which IT IS NOT since it's BIGGER).

this is why fastcall64 is a stupidity. total FAIL.
I think the point is for functions that don't call other functions, in which case they don't need this spilling stuff.
Also I think most compilers optimize it so that not even a mov reg,arg is needed (i.e. they arrange it so the argument ends up in the right register to begin with).. if they don't, they suck.
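
A small sketch of that distinction, assuming the Win64 register assignment (rcx, rdx for the first two integer arguments, rax for the result). AddTwo, CallThenAdd and Helper are made-up names for illustration only.

```asm
use64                                   ; fasm: assemble this fragment as 64-bit code

; Leaf function: the arguments are used straight from the registers, no stack traffic.
AddTwo:
        lea     rax, [rcx+rdx]          ; return arg1 + arg2
        ret

; Non-leaf function: arg1 must survive an inner call, so it gets spilled.
CallThenAdd:
        mov     [rsp+8], rcx            ; spill arg1 into the home slot our caller reserved
        sub     rsp, 40                 ; home space for Helper + 8 bytes to realign rsp
        call    Helper                  ; free to clobber rcx, rdx, r8-r11
        add     rsp, 40
        add     rax, [rsp+8]            ; reload arg1 and add it to Helper's result
        ret

Helper:
        xor     eax, eax                ; stand-in body: returns 0
        ret
```

Only the non-leaf case pays for the spill; the leaf never touches memory for its arguments.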




LocoDelAssembly wrote:
What I do hate very much is how varargs were implemented:
Varargs wrote:
If parameters are passed via varargs (for example, ellipsis arguments), then essentially the normal parameter passing applies including spilling the fifth and subsequent arguments. It is again the callee's responsibility to dump arguments that have their address taken. For floating-point values only, both the integer and the floating-point register will contain the float value in case the callee expects the value in the integer registers.
I think they should have used a stack-only approach for them. It is also interesting that you are required to pass the floating point values twice... (if they are among the first four parameters)
Definitely. Varargs have no place in registers. Except maybe a counter that says how many varargs there are.
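
As an illustration of the "passed twice" rule in the quoted text, a sketch of a varargs call with one floating-point argument might look like this. VarargsFn is a hypothetical stand-in for something printf-like, and fmt and value are example data, not anything from the thread.

```asm
use64                                   ; fasm: assemble this fragment as 64-bit code

CallVarargs:
        sub     rsp, 40                 ; home space + alignment, as for any Win64 call
        lea     rcx, [fmt]              ; arg1: pointer to the format string
        movsd   xmm1, qword [value]     ; arg2 is a double, so it goes in xmm1 ...
        movq    rdx, xmm1               ; ... and is duplicated into rdx for the varargs callee
        call    VarargsFn
        add     rsp, 40
        ret

VarargsFn:                              ; stand-in body so the fragment assembles on its own
        ret

fmt     db '%f', 10, 0
value   dq 1.5
```

The second argument occupies slot 2 in both register files (xmm1 and rdx), which is the duplication LocoDelAssembly is pointing at.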
Post 22 Aug 2009, 03:28

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
