flat assembler
Message board for the users of flat assembler.

Stack Realignment "Techniques"

revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 16149
Location: Hyperborea
system error wrote:
You really are CLUELESS ...
You were already asked to cease with the personal name calling.
Post 31 Aug 2017, 05:16
system error



Joined: 01 Sep 2013
Posts: 671
Furs wrote:
I'm asking since I don't know if I should try to even suggest this to GCC patches, but knowing some RETARDS there (who also happen to have broken english), it would probably get ignored. I mean if there's a reasoning for and before sub, or whatever. And it's not only GCC.


Furs, please behave yourself. Calling GCC maintainers RETARDS is not permissible on this board. This is your first post and you're already calling people names. Don't be such a hypocrite -_-
Post 31 Aug 2017, 08:12
system error



Furs wrote:
The part where you use 2 functions when there should only be 1 working in both cases, MORON. ONE function.


Furs, calling people MORON won't do you much good. Please behave and watch your language. This is not your home -_-
Post 31 Aug 2017, 08:15
Furs



Joined: 04 Mar 2016
Posts: 1318
^ ok, officially, dealing with an 8 year old.

And yes, I call some of the GCC maintainers retarded, because they are (only 2 of them btw). You know, I have to deal with their bs often (nothing about programming, they're retarded for management reasons so to speak, and in fact that's why I get so annoyed, cause I spend less time programming having to deal with their bullshit). That's one reason I won't give you link to the patch when it'll be ready, because I want to keep the anonymity. And not like I have to prove anything to an 8 year old.

This thread was a question; I didn't want to waste my time with a possible patch if it had a reason. Question was long answered. You're the only one who thinks it's something else.
Post 31 Aug 2017, 11:51
system error



@Furs

Of course, you'll stay anonymous when sending the "patch". We all know what's coming xD

I am getting the impression that you're a very important person in GCC development and maintenance. You know, things like:

1 - "I OFTEN spend MOST of my time dealing with them rather than my own code". Sounds like an enormous amount of work, time and sacrifice! :o

2 - "Go over the top management of GCC and pointing fingers at them and calling them all retards". Sounds like a true boss of GCC! They deserve it, right?

Wow!! From those two PROOFS of strong words, we believe you. We are impressed! You are a very important person in GCC. Millions of programmers will depend on you! :o

Now get yourself a good book on assembly language and start practicing STACK PROGRAMMING like I asked you to. And don't forget to wash my car after that. Got it, Joe?
Post 31 Aug 2017, 13:21
Furs



I'm not staying anonymous when sending the patch, since that's not even possible (man, you need to grow up for once). Clearly, you have never sent a single patch in your entire life, obvious as fuck. You even have to sign to the FSF that all contributions will be legally owned by FSF, otherwise they won't let you send any patches. (you know, legally sign it, a tough world out there for wannabes like you). Derp.

I'm staying anonymous on this board so the connection (between my real identity with GCC) and here is not linked. Because then I wouldn't be able to call them freely retarded, as it could backfire. Use your brain, it helps.

I think people are perfectly able to read what I post, no need to edit it and twist it based on what you're personally thinking. I said I spend less time programming, not LESS THAN.
Post 31 Aug 2017, 13:50
system error



Furs wrote:
First, in the worst case it will use 64 bytes, however in the best case, it will use only 36. The first one uses at least 64 bytes, and that's its best case.


No, Furs. There's no worst-case/best-case scenario. Are you playing with statistical theory here? If you know basic stack programming, you can allocate exactly 36 bytes for both data on the stack. This is where you fail big time in interpreting low-level code. Your incompetence is exposed wide open by the way you described and interpreted it, no matter how hard you try to hide it from people with your "impression". Caught you red-handed so many times, pal!

And what's the chance that you would call Tomasz Grysztar a retard like you do to those brilliant GCC maintainers? Judging from the way you were "teaching" him about those "asm thingies" in other threads, I guess it's just a matter of time before you show the same attitude against the FASM/FASMG maintainer as well.
Post 31 Aug 2017, 15:18
Furs



Huh? 36 is best case when it doesn't require realignment. 64 is when it requires realignment. You can't control that, because you can't control the incoming alignment. Otherwise, you wouldn't need to realign the stack, so what the hell is your point?

Anyway, back to serious topic (and no, not at you, feel free to ignore, I don't care).

There's another problem I discovered with GCC. When it saves the XMM registers on the stack (the bullshit x64 MS ABI being the only one in existence requiring so; no other ABI (x64 or 32-bit) needs it, I fucking hate it)... it emits shit code like (translated from AT&T):

Code:
push rbp
mov rbp, rsp
sub rsp, 64  ; space for XMM regs saved (4 of them in this case)
and rsp, -32 ; realign stack to 32-bytes (yes, regs are saved unaligned!)
sub rsp, 32  ; stack frame (random stuff, one AVX2 vector in this case)    


This seems stupid, right? Well, maybe it's a missed optimization, I thought, which is easily fixable with an extra pass in the gen_prologue insn hook. So I thought, why can't it be like this:
Code:
push rbp
mov rbp, rsp
sub rsp, 96  ; space for XMM regs saved and stack frame
and rsp, -32 ; realign stack to 32-bytes    
But no, in the source code, they say it's intentional, this is the excerpt:
Code:
      /* The computation of the size of the re-aligned stack frame means
         that we must allocate the size of the register save area before
         performing the actual alignment.  Otherwise we cannot guarantee
         that there's enough storage above the realignment point.  */
    
Is there something I'm missing here? At this point I already merged them with my plugin but I don't know if I did something bad or not.


Last edited by Furs on 31 Aug 2017, 15:51; edited 1 time in total
Post 31 Aug 2017, 15:49
system error



After reviewing Furs' "expert" comment on the first page, I am pretty sure that Furs is trying to apply some sort of statistical theorem (best case, worst case etc) against an innocent and naive stack. This is unprecedented and a true discovery indeed! I am crying in joy!
Post 31 Aug 2017, 15:50
revolution
Furs wrote:
Code:
      /* The computation of the size of the re-aligned stack frame means
         that we must allocate the size of the register save area before
         performing the actual alignment.  Otherwise we cannot guarantee
         that there's enough storage above the realignment point.  */
    
Is there something I'm missing here? At this point I already merged them with my plugin but I don't know if I did something bad or not.
The quote is true, but the implementation seems inefficient. There is no reason to adjust RSP twice; a single adjustment is all that is needed. But you then have to change the RSP offsets in the code that saves/restores the XMM registers.
Post 31 Aug 2017, 20:12
Furs



Well I forgot to mention, they use rbp to save the XMM regs (that's what I meant by "unaligned") -- i.e. they're relative to the saved frame pointer, not the "aligned" stack pointer (rbp's value is before any alignment, obviously).

Since rbp's value does not change, the comment doesn't make much sense to me, to be honest. That comment is in the same "if block", and its sub (the XMM save area size, 64 in my example) is emitted exactly before the alignment operation. I mean, why does there have to be enough storage *above* the realignment spot?

It's as if they treat the "frame" (i.e. below the alignment spot) and the "stuff above the frame" (saved regs, etc) as totally different memory pages that can't coexist together and are in totally separate parts of memory. What the hell, lol. (this is even from the x86-specific hooks, so it's machine-specific, not generic code to cater to other machines also)
Post 31 Aug 2017, 20:40
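The question of whether a single subtraction still leaves "enough storage above the realignment point" can be sanity-checked with a small model. This is only a sketch in Python, assuming the sizes from the example (64 bytes of saved XMM registers, one 32-byte local); the function names are made up for illustration and say nothing about what GCC's unwinder or SEH tables may additionally require.

```python
SAVE_AREA = 64   # four XMM registers, 16 bytes each (from the example)
FRAME = 32       # one 32-byte AVX2 local (from the example)
ALIGN = 32

def merged_prologue(rsp_on_entry):
    """Model: push rbp / mov rbp, rsp / sub rsp, 96 / and rsp, -32."""
    rsp = rsp_on_entry - 8          # push rbp
    rbp = rsp                       # mov rbp, rsp
    rsp -= SAVE_AREA + FRAME        # sub rsp, 96
    rsp &= -ALIGN                   # and rsp, -32
    return rbp, rsp

def layout_is_safe(rsp_on_entry):
    rbp, rsp = merged_prologue(rsp_on_entry)
    save_lo = rbp - SAVE_AREA       # lowest byte written via [rbp-64]
    frame_hi = rsp + FRAME          # one past the aligned local at [rsp]
    # The local must be aligned and must not reach into the save area.
    return rsp % ALIGN == 0 and frame_hi <= save_lo

entry = 0x7FFC12340000              # hypothetical stack address
results = [layout_is_safe(entry + off) for off in range(ALIGN)]
print(all(results))                 # True
```

Because the `and` can only move rsp further down, the rbp-relative save slots always end up above the aligned frame, whatever the incoming alignment was.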
revolution
Is using RBP and unaligned stores necessary for the stack unwinding to be correct? I'd guess that the debugger expects certain things in certain places. But even so I still don't see any reason to adjust RSP twice.
Post 31 Aug 2017, 20:45
Furs



Hmm, now that you mention it, it might be related to the stupid way x64 Windows does SEH (exception handling). Does it even need to restore the registers when unwinding? However, it still doesn't make sense, since that space exists even if the subtraction is done for the frame (the XMM registers would be placed at the exact same offsets).

But since x64 SEH is so wacky it might be that it probably needs to keep a "record" of saved registers at exact instruction boundaries, and it has to do this before allocating the frame (or something like that). If the tables describe the frame allocation as "after" the saved registers (in the instruction stream), then the instructions have to be that way. I've never implemented it manually, only seen GCC use some ".cfi" directives to implement it, so I don't know exact specifics.

The thing is, GCC doesn't even support SEH with MinGW (at least not with mine); it only seems to support it when built via Cygwin. And I am 100% sure I disabled exceptions (because if I need them, I'll do it manually), since it doesn't emit any .cfi directives for this test case.

But they probably just use a generic version without optimizing for specific settings/cases, to "keep the compiler simpler", bleh. Thankfully GCC supports plugins and the gen_prologue is an insn hook so I can redirect it to my function in the plugin without patching GCC for personal use, cause I'm sure they wouldn't accept a patch that "complicates the compiler" unless it provides "obvious benefits" (benefit like idk, not using SEH when it's fucking disabled). [see, this is why I hate 2 of the maintainers, the only cool guy there in this respect seems to be Uros]
Post 31 Aug 2017, 21:23
system error



Why don't you smarty pants leave GCC alone? There must be some valid reason why they do it that way. Reasons that are beyond your shallow understanding of how GCC works. They could place some specific registers via RBP or things that we don't know. For example, GCC interrupt handlers are all callee-saved. And as I recall it, the volatiles must reside exactly at function prologue prior to any locals. They're not used to save the XMM or YMM registers. So WTF do they need alignment for when saving volatiles?

I am not interested in GCC's internals so I may be wrong, but GCC people are not stupid. They are just as good or even better at machine instructions than some "master" I know xD

Why don't you people help Tomasz improve FASM's internals instead of GCC? Hmmm.?? hmmmmm????
Post 01 Sep 2017, 03:39
Furs



system error wrote:
And as I recall it, the volatiles must reside exactly at function prologue prior to any locals. They're not used to save the XMM or YMM registers. So WTF they need alignment for saving volatiles?
??? XMM6+ are required to be saved due to the MS x64 ABI. In fact, that's the entire reason for Intel adding the "vzeroupper" crap and complicating things with it -- instead of just zero-extending by default when they added the ymm registers.

(MS ABI, being "old", only requires the XMM regs to be saved, as in the 128-bit lower part, since that's what existed when it was designed. This is why any ABI reliant on piece of shit SSE is garbage and not future-proof; the Linux x64 ABI has no problems at all here)

I know you were totally clueless about it (when we argued about MS ABI long ago and you proved you don't know a single thing)... but man, after all this time, you still don't know the ABI. No hope for you I guess. I'm seriously starting to think you're a chatbot who learns to parrot from random posts online.

Here's the "full" prologue of the example (GCC generated):
Code:
push rbp
mov rbp, rsp
sub rsp, 64
and rsp, -32
sub rsp, 32
vmovups [rbp-64], xmm6
vmovups [rbp-48], xmm7
vmovups [rbp-32], xmm8
vmovups [rbp-16], xmm9    
Also btw, MS calls these regs NON volatile. Volatile are those which don't need to be saved, like xmm0-xmm5.

system error wrote:
but GCC people are not stupid.
You'd be surprised. Older (ancient) GCC versions sometimes produce much better code, so yes, they *are* stupid to have ruined their own code with junk.

Also, stop telling me what to do. I don't care, especially coming from you.

And why would I help improve FASM? FASM is a good enough tool for me. I'm not happy with the obvious crap code produced by GCC in some cases. FASM, on the other hand, is perfectly fine. I do it out of "necessity".
Post 01 Sep 2017, 14:24
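Why the prologue above uses unaligned stores (vmovups) for the rbp-relative slots can be illustrated with a few lines of Python. This is a rough sketch, not GCC's logic: the addresses are hypothetical, the helper names are invented here, and the "skewed" case assumes a caller that did not keep the stack 16-byte aligned, which is the situation the compiler was told to tolerate.

```python
def rbp_slots(rbp, count=4):
    """Addresses used by vmovups [rbp-64] .. [rbp-16] for `count` registers."""
    return [rbp - 16 * (count - i) for i in range(count)]

def all_aligned(addresses, alignment=16):
    """True if every address is a multiple of `alignment`."""
    return all(a % alignment == 0 for a in addresses)

# With an ABI-conforming caller, rsp is 8 mod 16 at entry, so after
# "push rbp / mov rbp, rsp" rbp is a multiple of 16.
aligned_rbp = 0x7FFC12340000      # hypothetical address
# A caller that left the stack off by 8 shifts rbp by the same amount.
skewed_rbp = aligned_rbp + 8

print(all_aligned(rbp_slots(aligned_rbp)))   # True
print(all_aligned(rbp_slots(skewed_rbp)))    # False
```

Since rbp is captured before the `and rsp, -32`, realigning rsp does nothing for these slots; only slots addressed relative to the realigned rsp would be guaranteed aligned.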
revolution
Furs wrote:
... (GCC generated):
Code:
;...
sub rsp, 64
and rsp, -32
sub rsp, 32
;...    
Yeah, that is kinda dumb. "Should" be:
Code:
;...
sub rsp, 64+32
and rsp, -32
;...    
Gives exactly the same result.
Post 01 Sep 2017, 15:49
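The "exactly the same result" claim is easy to check by brute force. The sketch below models both sequences on a 64-bit register in Python; the base address is arbitrary and the function names are only for illustration.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide

def two_step(rsp):
    """sub rsp, 64 / and rsp, -32 / sub rsp, 32"""
    rsp = (rsp - 64) & MASK64
    rsp &= -32 & MASK64
    rsp = (rsp - 32) & MASK64
    return rsp

def merged(rsp):
    """sub rsp, 64+32 / and rsp, -32"""
    rsp = (rsp - 96) & MASK64
    rsp &= -32 & MASK64
    return rsp

base = 0x00007FFC12340000  # hypothetical stack address
# Try every incoming alignment within two 32-byte windows.
mismatches = [off for off in range(64)
              if two_step(base + off) != merged(base + off)]
print(mismatches)  # []
```

The two agree because the trailing `sub rsp, 32` subtracts a multiple of the alignment, so it can be moved across the `and` without changing the outcome. That would not hold if the frame size were not a multiple of 32.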
system error



Quote:
??? XMM6+ are required to be saved due to MS x64 ABI. In fact, that's the entire reason for Intel adding the "vzeroupper" crap and complicating themselves with it -- instead of just doing it, by default, zero-extension when they added ymm registers.

(MS ABI, being "old", only requires the XMM regs to be saved, as in the 128-bit lowerpar


There is no requirement to save the XMM registers in the 64-bit ABI. Your INCOMPETENCY is showing right from your first sentence! Just because they are volatiles / non-volatiles, that doesn't mean it is a requirement to save their states. You really are a genuine INCOMPETENT, aren't you? xD

Code:
vmovups [rbp-64], xmm6 
vmovups [rbp-48], xmm7 
vmovups [rbp-32], xmm8 
vmovups [rbp-16], xmm9

Also btw, MS calls these regs NON volatile.... (garbage)    


MS 64-bit ABI does not involve the AVX instruction set, you dumbfcuk! You are so caught this time, with your pants down! hahahaha xD
Post 01 Sep 2017, 16:57
revolution
system error wrote:
... you dumbfcuk!
Personal insults are not acceptable. In future I might just decide to delete your entire message if you persist.
Post 01 Sep 2017, 17:03
Furs



system error wrote:
There is no requirement to save the XMM registers in 64-bit ABI.
Microsoft says otherwise.

Microsoft wrote:
Integer arguments are passed in registers RCX, RDX, R8, and R9. Floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte arguments are passed by reference. Parameter passing is described in detail in Parameter Passing. In addition to these registers, RAX, R10, R11, XMM4, and XMM5 are considered volatile. All other registers are non-volatile.
Here's a table from Microsoft. See last entry:
Microsoft wrote:
XMM6:XMM15, YMM6:YMM15 Nonvolatile (XMM), Volatile (upper half of YMM)
Whether your function uses AVX or not, the CALLER expects XMM6-XMM15 to NOT BE CHANGED. Because of the ABI, and only the lower half of them must not be changed (thus only the lower half, the XMM reg, not full YMM, must be saved).

Furthermore, GCC saves them if they are used, so no matter what you say, it does it. I thought you said they aren't idiots...? So they must have a reason for saving them then? Either way, you shot yourself in the foot with this one. ;) (in the example it only saved XMM6-XMM9 because those were the only ones used in the function)

@revolution: Yeah, I think it's 2 cases here:

1) GCC insists that the top of the stack frame be aligned to the same alignment as [rsp]. I don't know why, maybe because of stupid machines where the stack grows upward (since the thing that does this is in reload, not in the x86 prologue stuff -- the sub rsp, 64 in my first case; the 64 comes from LRA/reload, which is a "generic" register allocator for all machines, using hooks for specific regs)

2) It's something related to x64 SEH. I mean, they *on purpose* save all the XMM regs before the alignment using rbp, for some reason. It emits "unaligned" moves, because I told it to not assume anything about the stack. So it could have non-16 byte boundary on entry (for example, Wine needs this in Windows API functions, because some applications break the ABI and don't align the stack before an API call, can't blame them since the ABI sucks -- apparently Windows is also tolerant, or not using SSE, or realigns the stack maybe just to be compatible with those apps, since they work).

So it emits unaligned moves because it insists on putting them "up there" instead of addressing them from rsp (which is why the two subs must be there for some stupid reason). Of course this doesn't answer why the two subs can't be combined -- I think it's either SEH, or they just don't care.
Post 01 Sep 2017, 17:21
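The rule in the quoted Microsoft table (XMM6:XMM15 nonvolatile, only the low 128 bits) can be turned into a tiny helper that works out what a callee has to preserve. This is a minimal sketch in Python; the function names and the example register list are hypothetical, and it models only the XMM part of the ABI.

```python
NONVOLATILE_XMM = range(6, 16)   # XMM6..XMM15, per the quoted table

def xmm_to_save(used):
    """Return the XMM register numbers a callee must preserve."""
    return sorted(r for r in set(used) if r in NONVOLATILE_XMM)

def save_area_bytes(used):
    # Only the low 128 bits (16 bytes) of each register are nonvolatile.
    return 16 * len(xmm_to_save(used))

used = [0, 1, 6, 7, 8, 9]        # hypothetical function body
print(xmm_to_save(used))         # [6, 7, 8, 9]
print(save_area_bytes(used))     # 64
```

With XMM6 to XMM9 in use this gives the 64 bytes seen in the `sub rsp, 64` of the example prologue; a function touching only XMM0 to XMM5 would need no save area at all.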
system error



@Furs

1. I suspect that you've been reading the manual upside down. No wonder... hahahaha xD

Non-volatile simply means the API makes no use of them! They are saved on an as-needed basis only. It is not compulsory or a requirement of the ABI. These are the USER's own registers for the USER's own use and responsibility. The ABI doesn't give a fcuk what, when and how you wanna use them xD

Volatiles = clobbered / scratch registers. It is UP TO YOU whether you want to save them or not before calling an API. It is not an MS 64-bit requirement! The API simply doesn't care. APIs use them as they please.

So in both cases, preserving them is never part of the MS 64-bit requirements. It is up to the users, IF such requirements exist. You are confusing ABI requirements vs User's requirements!

See, this is how you should read technical documentation xD


3. And NO. The MS 64-bit ABI does not involve any AVX instructions like VMOVUPS because Windows runs on non-AVX computers too! Please stop embarrassing yourself, Furs!

Is this too technical for you??? I bet it is! hahaha xD
Post 01 Sep 2017, 18:21

Copyright © 1999-2018, Tomasz Grysztar.

Powered by rwasa.