flat assembler
Message board for the users of flat assembler.
Windows > Stack problem with proc64
vid 20 Aug 2009, 15:03
Is the stack alignment requirement even described in fastcall64? I have already encountered a case where it was required but not described (probably because everyone uses C, and C compilers keep the stack aligned automatically).
Tomasz Grysztar 20 Aug 2009, 16:04
Does your function take no parameters, too? Because I found a bug in the new prologue macro related to such a case. Find this line:
Code: if parmbytes | localbytes
madmatt 20 Aug 2009, 18:49
Tomasz: Yep, it takes no parameters and no locals. I'm in no hurry, so do what you gotta do first.
Vid (Tomasz?): Here is a webpage I found that explains how the 64-bit stack is used and aligned: http://ntcore.com/Files/vista_x64.htm#x64_Assembly. The information there seems to be good.
Borsuc 20 Aug 2009, 21:18
Wow, from what I read in that article, the x64 fastcall is really retarded. No pushes anymore? What has the stack become, global variables?
Not to mention that cache efficiency is reduced.
_________________
Previously known as The_Grey_Beast
Azu 21 Aug 2009, 02:00
Why would the stack need to be aligned? Does the Windows API access the stack using movdqa?? I don't get it.
revolution 21 Aug 2009, 02:07
Azu wrote: Why would the stack need aligned?
Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are aligned to 16 bytes in order to aid performance.
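To make the 16-byte rule concrete for hand-written code: CALL pushes an 8-byte return address, so on entry to any function RSP is 8 bytes past a 16-byte boundary, and a function that calls others must size its frame so that RSP is a multiple of 16 again at each of its own CALLs. That holds even for a procedure with no parameters and no locals, the kind discussed earlier in this thread. The C model below is only a sketch under stated assumptions: `win64_frame_size` is a hypothetical helper (not part of fasm's macros or any Windows header), and it assumes the function makes at least one call, saves `pushes` registers with PUSH, needs `locals` bytes of local storage, and reserves the 32-byte shadow space plus 8 bytes for each argument beyond the fourth of its largest call.

```c
#include <assert.h>

/* Hypothetical helper: bytes to subtract from RSP in a Win64 prologue,
   after `pushes` PUSH instructions, for a function that makes calls. */
static unsigned win64_frame_size(unsigned pushes, unsigned locals,
                                 unsigned max_call_args)
{
    /* Outgoing area: 32 bytes of shadow space for RCX, RDX, R8, R9,
       plus one 8-byte slot for every argument after the fourth. */
    unsigned outgoing = 32 + (max_call_args > 4 ? (max_call_args - 4) * 8 : 0);
    unsigned frame = locals + outgoing;

    /* RSP was 16-byte aligned just before the caller's CALL. This is how far
       below that point RSP sits after the prologue, modulo 16:
       return address + pushed registers + frame. */
    unsigned misalign = (8 + pushes * 8 + frame) % 16;
    if (misalign != 0)
        frame += 16 - misalign;   /* pad so RSP % 16 == 0 at every CALL */

    assert((8 + pushes * 8 + frame) % 16 == 0);
    return frame;
}
```

For example, `win64_frame_size(0, 0, 0)` gives 40, the familiar `sub rsp, 28h` of a non-leaf function with no locals, while `win64_frame_size(1, 0, 4)` gives 32 because the single PUSH has already restored alignment.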
vid 21 Aug 2009, 10:31
Borsuc: Actually, I think the new way is much smarter than stdcall, considering that 99.99% of code is generated by compilers. And of course, a good asm coder can do it the same way. Try disassembling some fastcall64 code; maybe you will change your opinion.
bogdanontanu 21 Aug 2009, 14:53
vid wrote: Borsuc: Actually, I think the new way is much smarter than stdcall, considering that 99.99% of code is generated by compiler. And of course, good asm coder can do it same way. Try disassembling some fastcall64 code, maybe you will change opinion

I, on the other hand, consider this fastcall64 stupid, wrong and pathetic. It is an expression of people who no longer know or understand ASM and the CPU, and who apply childish academic concepts. A copy-paste of wrong RISC concepts. This was bound to happen sooner or later because new people have lost the knowledge. This is in fact a return to the dark ages of programming. The funny part is that someday in "the future" they will reinvent STDCALL... if they ever evolve back towards intelligence, that is...

Until then (if ever), of course, we do have to use it as it is (probably with help from macros). There is no purpose in fighting against what is.
_________________
"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction."
Last edited by bogdanontanu on 21 Aug 2009, 14:57; edited 1 time in total
Azu 21 Aug 2009, 14:56
bogdanontanu wrote:
My only complaint is that their choice of registers sucks; why isn't RAX one of them? :/ And don't say "because they return stuff in it so it gets trashed"; RCX/RDX/R8/R9 get trashed too!

bogdanontanu wrote: Until then (if ever) of course we do have to use it as it is (with help from macros probably). There is no purpose in fighting against what is.

Last edited by Azu on 21 Aug 2009, 15:06; edited 1 time in total
Borsuc 21 Aug 2009, 15:05
First of all, you can see the problem with that last statement of yours.
But really, pushing is much easier and more elegant, and it's even better for size optimization, or at least it was in 32-bit code; unfortunately, the CPU designers may listen to "the crowd" (and have disabled much of the push functionality). NOTE: I'm NOT against using registers as parameters AT ALL. I'm against the stupidity of reserving STATIC stack space even for the registers. I mean, my jaw just dropped when I read it! @vid: I dunno and don't care; I'm programming software, not cracking.
Azu 21 Aug 2009, 15:08
Borsuc wrote: First of all, you can see the problem with that last statement of yours.
Borsuc wrote: NOTE: I'm NOT against using registers as parameters AT ALL. I'm against the stupidity of reserving STATIC stack space even for the registers. I mean I just dropped my jaw when I read it!
Borsuc wrote:
Last edited by Azu on 21 Aug 2009, 15:19; edited 1 time in total
Borsuc 21 Aug 2009, 15:18
Azu wrote: How can you program using the windows API without disassembling it a little first? The MSDN documentation is shit! Plus, the documentation says how you should write your code even if your version of Windows doesn't have all the functionality, like the crappy but so frequently found Microsoft "reserved" parameters, which, ironically, almost never get used (but crash if you use them).
Azu 21 Aug 2009, 15:20
Borsuc wrote:
bogdanontanu 21 Aug 2009, 15:28
Azu wrote: What's wrong with using registers intead of stack? Most 64-bit CPUs can move things into and out of the registers much faster than stack..

If you had experience in programming you could not ask such a stupid question! There is nothing wrong with using registers. It is wrong to use registers for passing parameters. Here are a few answers for you to meditate upon:

1) Sending parameters by registers is primitive
===================================
It is a method from the DOS era, with 2-3 parameters. You have to use the stack anyway. STDCALL was arrived at later, out of experience.

2) It is faster, but...
=================
Only for leaf functions and heavily optimized functions. For general functions that call other functions that call other functions, it is a huge mistake. And guess what: these kinds of functions make up 90% of the body of an OS or an application. In your example you do need to "spill" those registers anyway in most scenarios, because you usually NEED the arguments further on in your code and you cannot keep 4 registers reserved for them (and thus lose them). The argument that you already have many registers and can waste some of them is never valid. No matter how many registers you have, you always need more.

3) You need the stack anyway
=======================
For more than 4 parameters, and for other reasons. After a function call the stack is always in the CPU's L1 cache, hence the speed improvement is not big, but the problems created by using registers are big... as you can see from disassembly.

4) You need the stack for recursion.
==========================
Recursion is a fundamental programming concept. You cannot do it with registers.

5) Writing to the stack is NOT faster than PUSH.
================================
Or it should not be. Of course you can make a CPU where this is as you wish, but conceptually PUSH is always faster... mov [esp+48h],xxx and mov [esp+40h],yyy are not faster to compute than the implicit, fixed stack-pointer decrement followed by mov [esp],xxx that a PUSH performs... This is basic CPU logic.
6) Stdcall is uniform and can be unwound easily.
====================================
Uniformity and ease of understanding and debugging are very important. It also scales nicely to any number of parameters. Have you noticed the .pdata structures and the fixed set of valid prologues and epilogues that are now needed in order to unwind the stack and provide exception handling? This is the result of stupidity. They have to patch up holes in very complex and pathetic ways. Think of Occam's razor: when a solution is complex, it is wrong. There is no beauty in complexity and no simplicity inside complexity. Complex = stupid, but loved by the mind device.

7) It is NOT a new, improved concept.
=============================
In fact it is a very old, MS-DOS-like idea. It works for primitive CPUs and for primitive RISC CPUs. They copied it from what RISCs already had, because they were unable to understand and create something new but wanted to "change something".

8) It results in larger code
======================
And more complex code, with no speed increase whatsoever. Fixing the stack after the function call is already stupid, because it adds code after each call. I agree that this is needed in the rare case of C calls with an unknown number of parameters, as in sprintf(...), BUT doing it for every function is stupid. AND requiring the stack to also be aligned BEFORE calling the function is double stupidity that results in a lot of unneeded code.

In inner loops and leaf functions, ASM programmers did a lot of tricks and handcrafted optimizations anyway. You do not use STDCALL there; instead you use handcrafted code. But to "handcraft" every function is priceless... It is stupid to apply a set of dummy, non-algorithmic tricks on every common layer of non-critical functions, turning the code into bloatware with no gain in speed but with a loss of simplicity, because "the compiler can do it". The compiler can do a lot of stupid things if you tell it to...

In conclusion:
===========
Oh dear... the human race is going down.

Basically it is stupid and primitive, BUT we do have to use it as it is... there is no purpose in crying about it. Just do not fall into believing that it is a good or advanced idea. Think with your own mind instead. "New" is not necessarily good... and it is not even new. It was chosen and "designed" by people with no knowledge, or with bad intentions and a desire to extract some minimal speed advantage out of instruction-level tricks. It is a mistake with damaging results, because they have no independent brain: they copy-pasted some "well established" RISC myths and had "heard" that using registers is "faster"... Yes, using registers is faster, but NOT for this!

I have noticed that this concept is loved by people who want to use ASM only for a few tiny but complex, super-optimized functions and who promote HLLs for the remaining 99.99% of the code, because they think this complexity would be too much for writing large applications in ASM, while compilers do it for free. I have seen "advanced" ASM people stating that ASM compilers should not do this because it is too complex, and that we should leave it to the C compilers or do it manually... ha ha ha

My guess is that with the help of a few macros, ASM will fix this "problem" and make fastcall64 as easy in ASM as it is in C or other HLLs. The difference is that, if I want to, I can use STDCALL in most of my code and use the stupid FASTCALL64 ONLY where it is needed, as an interface with the OS.
Last edited by bogdanontanu on 21 Aug 2009, 15:53; edited 1 time in total
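For readers following points 2, 3 and 5 above, this is where the convention places integer and pointer arguments: the first four go in RCX, RDX, R8 and R9, but the caller still reserves an 8-byte stack slot for each of them (the shadow or "home" space the callee may spill into), and every later argument is stored with MOV into the next slot. This is a C model, not fasm code; the function names are hypothetical, and floating-point arguments (which travel in XMM0 to XMM3 instead) are ignored.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in order (Microsoft x64 convention). */
static const char *const win64_int_regs[4] = { "rcx", "rdx", "r8", "r9" };

/* Hypothetical: register holding argument `index` (0-based),
   or NULL if the argument is passed only on the stack. */
static const char *win64_arg_reg(unsigned index)
{
    return index < 4 ? win64_int_regs[index] : NULL;
}

/* Hypothetical: offset of the argument's 8-byte slot relative to RSP just
   before the CALL. Slots 0..3 are the shadow space for RCX..R9. */
static unsigned win64_arg_offset_at_call(unsigned index)
{
    return index * 8;
}

/* The same slot seen at the callee's entry point: CALL has pushed an
   8-byte return address, so every offset grows by 8. */
static unsigned win64_arg_offset_at_entry(unsigned index)
{
    return win64_arg_offset_at_call(index) + 8;
}
```

So the fifth argument (index 4) is written with something like `mov [rsp+20h], rax` before the CALL and read from `[rsp+28h]` on entry to the callee.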
Azu 21 Aug 2009, 15:50
bogdanontanu wrote:
bogdanontanu wrote: 2) It is faster but only for leaf functions and in heavy optimized functions. For general functions that call other functions that call other functions it is a huge mistake. And guess what this kind of functions are 90% of OS and applications body.
bogdanontanu wrote: For your example you do need to "spill" those registers anyway in most scenarios beacuse you usually NEED arguments further in your code and you can not preserve (and thus loose) 4 registers.
bogdanontanu wrote: 3)You need the stack anyway for more than 4 parameters
bogdanontanu wrote: 5) Writing to stack is NOT faster than PUSH. or it should not be. Of course you can make a CPU where this is as you wish it but conceptually the PUSH is always faster... mov [esp+48h],xxx and mov [esp+40h],yyy is not faster to be computed than an implicit and fixed mov [esp],xxx and add esp,8... This is basic CPU logic.
bogdanontanu wrote: 6)Stdcall is uniform and can be unwinded easy. Uniformity and ease of understanding and debugging is very important. It also scales nicely to any number of parameters.
bogdanontanu wrote: have you noticed the .pdata structures and the fixation of valid prologues and epilogues that are needed now in order to unwind stack and to provide exception handling?
bogdanontanu wrote: This is the result of stupidity. They have to patch holes up in very complex and pathetic ways.
bogdanontanu wrote: Think Occam's razor: when a solution is complex then it is wrong. There is no beauty in complexity and no simplicity inside complexity
bogdanontanu wrote: 7) It is NOT a new improved concept.
bogdanontanu wrote: It results in larger code and more complex code with no speed increase what so ever.
bogdanontanu wrote: On inner loops an leaf functions ASM programmers did a lot of tricks and handcrafted optimizations anyway. You do not use STDCALL there.
I'm really sleepy, sorry. I'll reply to the rest of your post tomorrow, unless somebody else already has.
LocoDelAssembly 21 Aug 2009, 16:36
What I do hate very much is how varargs were implemented:
Varargs wrote: If parameters are passed via varargs (for example, ellipsis arguments), then essentially the normal parameter passing applies including spilling the fifth and subsequent arguments. It is again the callee's responsibility to dump arguments that have their address taken. For floating-point values only, both the integer and the floating-point register will contain the float value in case the callee expects the value in the integer registers.
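The last sentence of that quote means the integer register receives the raw bits of the double, not a converted integer value: for a `double` in one of the first four positions of a variadic call, the same 64-bit pattern goes into both XMMn and the matching RCX/RDX/R8/R9, so a variadic callee can spill everything from the integer registers into the home slots and walk them uniformly. The helper below is a hypothetical C sketch of that bit copy only; in generated code the equivalent can be a single `movq` from the XMM register to the general-purpose register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the value a Win64 variadic call places in the integer
   register (RCX, RDX, R8 or R9) alongside XMM0..XMM3 when the argument is a
   double. It is the IEEE-754 bit pattern, copied unchanged. */
static uint64_t win64_vararg_double_bits(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* bitwise copy, no conversion */
    return bits;
}
```

For instance, passing 1.0 puts 0x3FF0000000000000 in the integer register, not 1.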
bogdanontanu 21 Aug 2009, 16:58
Azu wrote: So is the use of pointers, and strings, and integers. All of those are very old things that were around way back in DOS. But you use them, don't you?

I am not against OLD things if they are correct. I am against using primitive things. Primitive = suboptimal solutions used only because, at the time, you cannot do better or cannot understand how to do better. Like using a wooden blade instead of a steel blade for a weapon. Pointers are not primitive; they are a fundamental element of programming. You can hide them in an HLL and pretend that you do not use them, but they are there behind the scenes. However, sending arguments to a function via registers is primitive.
Quote:
Thank you for your words. I am in fact much more than that. Do you have anything else to say to me?
Quote:
Organizing and wrapping code correctly in hierarchical layers is NOT a design flaw. It is needed for large, real-life projects. It helps browsing, managing, understanding, debugging, porting and expanding the code base. It is an intelligent solution. That is why FASM can run on multiple OS targets: it has such a minimal layer, at least in some of its interfaces. That is also why my own assembler, Sol_Asm, can run on multiple OS targets, and why it is so easy for me to debug and maintain it as a huge ASM project.
Quote:
Priceless.
Quote:
I never said that anybody is using a register as a pointer. I know how win64 fastcall works. You move the values to registers, but only for the first 4 parameters; the rest are moved to the stack... and it uses up 4 registers that you will have to move to memory sooner or later. The rest of the parameters (and many functions have more than 4) you will have to move to memory/the stack somehow. PUSH is much harder to use in this context (stack alignment issues and stack reuse), so compilers usually end up moving them as I have shown: mov [esp + 40h],argument[n]. This poses a problem if argument[n] is in fact a memory location, because you cannot move memory to memory... unlike with push, where you can do: push [memory]. The fact that you do not understand this kind of thing (the need for spilling, too) makes me believe that you have not studied this seriously and have made up your mind from hearsay.
Quote: As does fastcall. o_o
The essence of it is sending parameters via registers. This DOES NOT scale nicely, because registers are a limited resource. They have to use a surrogate of STDCALL for more than 4 parameters anyway... And about "unwinding", I think you have to study more again...
Quote: I don't know. I use a flat programming model (no .rdata, .pdata, .reloc, .resource, etc, just one section).
As I have said above, you need to study more. Those sections serve a purpose... ".pdata" is relevant to this unwinding and exception-handling discussion for the win64 calling convention. It cannot be ignored for long.
Quote:
Maybe... but this is the logical result of sending parameters in registers... then adding the rest on the stack, then trying to align the stack before the call to have the parameters aligned... In this way one decision leads to another mistake, and another, and then "shit happens", because you are trying to optimize at the expense of simplicity.
Quote: x86-64 is not for you, then. It's a very CISCy (complex) instruction set overall.
I think they made some mistakes with the x86-64 design, but not many under the circumstances. When I design my own CPU, I will make it simpler inside but still x86-like, IMHO. A RISC CPU is a huge error, but easier to produce than a CISC one. I do like the x86 architecture, hence it is for me. But I will not consider its mistakes to be good things just because I like and use it. I am not lying to myself. I think the term CISC was coined by RISC zealots as an insult to the x86 architecture. I guess that x86 is complex inside when compared with a RISC... BUT it is SIMPLE to use from the outside, from an ASM programmer's point of view, and this is what I need. It would have been better if this external simplicity of the x86 architecture had been obtained with more simplicity inside, but you have to understand that simplicity does not have to be implemented at all costs. This is the only limit to simplicity: not simpler than what is needed to get the job done. A RISC CPU is sub-optimal. Anyway, what we are talking about here is a software calling-convention choice, not a hardware choice. There is nothing in the x86 hardware that requires or advocates a calling convention such as win64 fastcall. It is a choice made by some humans who made a mistake. A mistake we have to live with for a very long time, but we do not have to "make believe" it is a good thing. Such is life.
Quote: Bug testing isn't a new improved concept. Do you not bug test?
I do unit testing, functional testing, fuzz testing... almost every day at work as a programmer.
I am not against "old"; I am against presenting "old" as "new" and "bad" as "better" when in fact they are not. I also do not change things when there is no need to: when there is no clear improvement, when new things are more complex and provide only doubtful benefits. That is what I am saying: this change was not needed, it provides doubtful benefits, it is complex for no reason other than to cover up conceptual mistakes, and it is a copycat of another architecture and of older concepts.
Quote:
Intel Core 2 Duo at 2 GHz.
Quote:
Of course it is. But this is NOT what I am saying. I am saying that using registers for ARGUMENTS, in the context of the win64 fastcall convention, results in larger code and more complex code, with no overall speed increase. Why?
a) Because you will most likely spill the arguments.
b) Because you need to move the "other" arguments onto the stack without using PUSH; this is slower, it generates more code for memory arguments, and it is bigger overall.
c) Because you additionally need to align the stack.
Quote:
Who pops arguments? But you do need registers for your inner algorithm, and losing 4 is not acceptable. One advantage of x86-64 is having more registers; I do not like being robbed of 4 of them. You will need to save them if you call other functions inside your function, because the new function will use the very same registers... You do not have to save arguments that are on the stack, because they are already "saved" and always ready and fresh for you to use... STDCALL is simply better.
Quote:
Yes, of course. What I was trying to say is that nobody uses STDCALL inside heavily optimized inner loops. There you CAN and you SHOULD use registers to the maximum... But to extrapolate this to general argument passing is stupid.
Quote:
Have a nice sleep. It takes time and experience to understand these kinds of things; do not worry too much about them. Sleep is of the essence. I think I have presented the logical arguments I have, and debating this further is not going to produce more logical arguments from my side... hence I will rest my case and not debate it any more. I guess it is OK to "love" and "believe in" the win64 fastcall convention. Just keep an open mind, and an eye open for alternatives and possible mistakes.
Borsuc 21 Aug 2009, 17:55
Azu, the stack isn't slow at all: it's mostly in the L1 cache, so it is probably about as fast as registers.
Azu wrote: Um.. this is what they use that extra space you reserve for them.
I think it's really stupid that they don't do it themselves as local variables, though. For example:
Code:
    ; Code 1: pushes on stack
    push [eax]
    call SomeFunction

    SomeFunction:
    ...                 ; do something
    ...
    ; finally use the pushed value
    add ebx, [esp+4]    ; assuming no other local variables
    ...                 ; do some more
    ...
    ret 4
Code:
    mov eax, [eax]
    call SomeFunction

    SomeFunction:
    mov [esp+4], eax
    ...                 ; do something
    ...
    ; finally use the pushed value
    add ebx, [esp+4]    ; assuming no other local variables
    ...                 ; do some more
    ...
    ret

This is why fastcall64 is a stupidity. Total FAIL.
Azu 22 Aug 2009, 03:28
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote: You move the values to registers but only for the first 4 parameters the rest are moved to stack ... and it uses up 4 registers that you will have to move to memory sooner or later.
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
bogdanontanu wrote:
Borsuc wrote: Azu, the stack isn't slow at all because it's mostly in L1 cache so it is probably the same as registers.
Borsuc wrote:
Also, I think most compilers optimize it so that not even a mov reg,arg is needed (i.e. they arrange things so the argument ends up in the right register to begin with). If they don't, they suck.
LocoDelAssembly wrote: What I do hate very much is how varargs were implemented:
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.