flat assembler
Message board for the users of flat assembler.
Thread: Furs & system error: BFF (Heap forum)

system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
You got schooled, proven wrong 4 times (including on micro ops where you were flat-out WRONG, bloat, and alignment now), and you still think you're a hot shot?

Go retreat with your tail between your legs. You're like a lost puppy trying to act tough without realizing nobody cares about your whimpering -- you should learn to actually program instead of talking.

You haven't even used functions with more than 4 parameters in MS ABI. Christ. You really don't know anything, do you?

You see, unlike you, I've got years of experience using this shitty MS ABI, so you'd best be quiet and start learning from people with actual knowledge.

I've probably coded more AVX functions with this shitty ABI than you have coded ALL asm functions (not just vector ones) in your life, just get lost thanks.

You know, I obviously complain because I have to actually use it. Because unlike you, some of us do code in asm.



You mean learn from whom, an INCOMPETENT IDIOT? You've proven that you're INCOMPETENT all along.

Are you suggesting your IMAGINARY CALLING CONVENTION and IMAGINARY LIBRARY again with BLOATED function prologues?

HAHAHAHAHA :D
Post 02 Apr 2017, 18:26
Furs



Joined: 04 Mar 2016
Posts: 361
Ah, now I know why you can't write any assembly. To you, anything you write is "imaginary". Only Microsoft's code and their designs exist. Clearly, you tried hard to write your own code instead of using Microsoft's, but in the end it was just imaginary :(

You need more willpower to turn it into real code like I do ;) That's the difference between you and me here -- I write real code, you write imaginary code. Not all of us are slaves to others' (Microsoft's) code, though ;)

I don't even care what you say anymore. To me this thread should be locked (to prevent you from spamming it with "HAHAHAHAHA" and "INCOMPETENT IDIOT", obviously) but never deleted, just so everyone on this board can form their own opinion about you. That is, if they can stand your cringe-worthy posts.

@revolution please lock the thread because he can't control his hysteria, thanks (he even double posts just to spam that)
Post 02 Apr 2017, 18:37
system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
Ah, now I know why you can't write any assembly. [...]



Ahhh, you suddenly became a nice person after such a hard schooling! Surprise, surprise. No name-calling in the post I'm quoting? Wooww. My reflective-style schooling technique must be very effective then :D
Post 02 Apr 2017, 19:30
Trinitek



Joined: 06 Nov 2011
Posts: 255
Great show, guys. :lol:
Post 03 Apr 2017, 01:46
system error



Joined: 01 Sep 2013
Posts: 477
I think I should write a short article on the MS 64-bit ABI, explaining to 32-bit invoke users how to deal with it correctly. Otherwise they'll keep blaming it for no reason, out of their own ignorance and incompetence. But the one we have in here is a special kind of incompetence: he didn't understand it AND called the designers idiots at the same time! What a BIG JOKE! Hahahaha :D
Post 03 Apr 2017, 16:04
MHajduk



Joined: 30 Mar 2006
Posts: 5861
Location: Poland

system error wrote:
I think I should write a short article on the MS 64-bit ABI, explaining to 32-bit invoke users how to deal with it correctly.

I think it's a good idea and I would be glad to see it in the "Examples and Tutorials" section of the forum.
Post 03 Apr 2017, 16:49
system error



Joined: 01 Sep 2013
Posts: 477

MHajduk wrote:

system error wrote:
I think I should write a short article on the MS 64-bit ABI, explaining to 32-bit invoke users how to deal with it correctly.

I think it's a good idea and I would be glad to see it in the "Examples and Tutorials" section of the forum.



Part of it is already in this thread, if readers can ignore the harsh words. Chained alignment is the key to understanding it. Since the target audience is not beginners, it should be easily digested by 32-bit invoke users.
Post 03 Apr 2017, 17:33
Furs



Joined: 04 Mar 2016
Posts: 361

MHajduk wrote:
I think it's a good idea and I would be glad to see it in the "Examples and Tutorials" section of the forum.

You should read this thread and realize you're asking someone who

1) Never coded AVX in his life
2) Mixes up MS ABI (Windows) with Linux ABI for 64-bit (he said RAX is used for variable arguments)
3) Never used a function with more than 4 parameters (that is, never used any function with a parameter on the stack)
4) Claimed PUSH is slower than MOV on "modern CPUs" and was proven wrong by 2 people with said "modern CPUs" (me and zhak), while he himself is using a crappy CPU
5) MISALIGNED THE STACK FOR AVX and when proven wrong all he could say was "OMG HAHAHAHAHAHA"
6) Needs to learn more words in the dictionary than "incompetent" and "hahahaha"

You really want Tutorial from this guy? To do what, mislead newbies?

Just so you know, here's an example of his hysteria (previous page): https://board.flatassembler.net/topic.php?p=195341#195341

He throws around words like "Chained alignment" while still failing to align a 32-byte vector, without even understanding why (he fails at math, and this is simple math).

I'm willing to bet he'll quote this and use his 3 dictionary words, and still ignore the fact he misaligned that vector, just watch.

Even newbies know that the only way to align something that isn't already aligned to at least that multiple (i.e. 16-byte to 32-byte, 8-byte to 32-byte, or X-byte to 32-byte; all are the same case) is to use a bitwise AND. Everyone except system error. He's a special kind of newbie.
Post 03 Apr 2017, 20:00
MHajduk



Joined: 30 Mar 2006
Posts: 5861
Location: Poland
Probably most of the people active on this board will agree that the "Examples and Tutorials" section should be something like a library targeted especially at fasm programmers (because fasm has some individual, unique features absent in other assemblers), hence if someone is willing to write a tutorial, let him or her do it. The form of an article / paper / tutorial imposes some logical structure on the text and forces the use of language that avoids hyper-emotional phrases. This makes all arguments easier to accept or reject based only on the content, not the form.

I have seen many interesting articles published on WASM forum (https://wasm.in/blogs/) and think that it would be good to have something similar here.
Post 03 Apr 2017, 20:43
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 14797
Location: Lost in translation

Furs wrote:
You really want Tutorial from this guy?

Yes. We welcome contributions from everyone.

Furs wrote:
To do what, mislead newbies?

There is no such thing as "the one correct" way to do something in programming.
Post 03 Apr 2017, 21:02
Furs



Joined: 04 Mar 2016
Posts: 361

revolution wrote:
There is no such thing as "the one correct" way to do something in programming.

The results have to be correct though. Otherwise what's the point of a tutorial? For example, failing to align something is definitely not "correct" no matter how you do it. If the goal was to align it, of course. (what else would he even write about then)

Also, he can write anything he wants and nobody is going to stop him (nor should anyone, so I agree). My question was whether you wanted a tutorial from him. Personally, if I could go back in time to before I knew much asm, I wouldn't exactly want to read "wrong" tutorials, no matter how well-intentioned the author was.

Though if he had at least admitted to being clueless and tried to constructively improve his knowledge, he'd probably be able to write correct code by now.

(and I don't mean just the alignment stuff -- I'm talking about the entire ABI thing, since that's what the tutorial would be about -- he doesn't know the ABI: the MS ABI doesn't use RAX as input at all, he didn't know that 5+ parameters go on the stack, etc... that's misleading to newbies who want to learn the ABI IMO; but that's just me :P)
Post 03 Apr 2017, 23:17
system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
I only know to use bitwise AND but boasting about working around AVX programs for years.



Advanced programmers use MOD.

See, I easily throw your life's work into the Recycle Bin despite you claiming to be expert in AVX. You don't know how to use the MOD operator??

ANDing RSP will result in BLOATED function prologues and epilogues. An AVX/AVX512 expert like you should be able to find an integer solution that avoids AND. I suspect you're also INCOMPETENT at simple integer math.
Post 04 Apr 2017, 01:54
system error



Joined: 01 Sep 2013
Posts: 477

MHajduk wrote:
Probably most of the people active on this board will agree that the "Examples and Tutorials" section should be something like a library targeted especially at fasm programmers [...]



The FASM community has a big pool of talent. I think it's language problems that make them a bit shy about writing a structured tutorial. I think the best approach is a Q&A-style tutorial where sentences are short and more to the point. I am also bad at English (not my first language). My spoken English is even worse (not as bad as Wayne Rooney's though. He talks part Gaelic, part Sanskrit).

If there's one thing I like about Stack Overflow, it's the feature that allows higher-ranked members to edit a question into at least a sane, comprehensible tone.
Post 04 Apr 2017, 02:12
system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
mommy, mommy, close this thread please! I can't take it anymore ;(



What?
Post 04 Apr 2017, 02:18
Furs



Joined: 04 Mar 2016
Posts: 361
^ You see what I mean? You want this type of child to make a tutorial?

There is no MOD operator in x86, only DIV, which calculates the remainder as a side effect. I obviously meant in one instruction, smartass. (You can do it with branches in many more instructions, but that's obviously beside the point, and super bloated/slow.) You're such an expert in using MOD, so how about showing some code that uses it, compared to bitwise AND? Let's look at your great solution.

I'm sure we can all learn a great deal from you; weren't you going to write a tutorial? Tutorials are not written with fake quotes, but with code that works. Show us the x86 code then ;) (needless to say, we are talking about asm, right? :?)

You see, it's hard for you to accept the fact I haven't written a single piece of code you could improve on. That's a fact, sorry. Not that my code is the best, but that you haven't improved on anything I posted in this thread.

If you manage to improve my code so that (1) it has the *exact same* output for *all* accepted inputs, (2) performs better on *my* CPU (or any CPU that supports AVX, which implies it has a stack engine), or (3) compiles to *fewer bytes* than my code, then you'll have a point and we can have a civil discussion.

(please note that "my CPU" has a meaning: you know this was a rant initially, and you decided to butt in; if an ABI makes poor use of my CPU I will rant, if you have a problem with that then keep it to yourself?)

Please note that (1) is the most important here. After all, you can do the following:

Code:
mov rcx, some_buffer
call super_fast_sort_algorithm

Which is fine, but my solution is "better" if we ignore (1) like you do:

Code:
mov rcx, some_buffer
call super_fast_sort_algorithm

;...

super_fast_sort_algorithm:
ret

See? My code is instant! And it works, because I tested it when some_buffer was already sorted! Now do you see what your code looks like?

;)
Post 04 Apr 2017, 11:05
system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
^ You see what I mean? You want this type of child to make a tutorial?

[...]



I wasn't asking about how to call a function. Nobody did. Not to offend or anything, but "mov rcx, some_buffer" is not a solution. It's textbook material, Chapter 1: How to Use a Register.

You're not that bright, are you?
Post 04 Apr 2017, 13:06
Furs



Joined: 04 Mar 2016
Posts: 361
Huh? You completely missed the point. It wasn't about calling the function or even about calling conventions.

It was about showing you code that doesn't work (well, it works only if the buffer is already sorted ;)) and is "much faster" and "less bloated" because it doesn't do anything. The point wasn't the function call but the function itself -- I omitted it in the first case because I didn't want to show a real sort implementation, obviously.

Look, it was the same as your code which misaligned the 32-byte vector unless the stack was 32-byte aligned to begin with (i.e. it basically didn't align anything at all and worked if the stack was already aligned to 32-byte upon entry).

Of course it is smaller/less bloated/faster (no bitwise AND, no frame pointer), because it doesn't do ANYTHING in respect to 32-byte alignment. That doesn't mean it's better, since it fails to do what it should.

Nothing to do with calling conventions here.

The reason I ranted about the alignment for MS and Linux ABIs is simply because it is bloated and doesn't help me at all, since I have only two types of functions:

1) functions that don't use vectors, for which the alignment is useless and wastes stack space, these functions are also by far the most common
2) functions that use AVX, for which I have to realign the stack anyway

(note that MS ABI is much worse than Linux ABI, due to Shadow Space and also only "saving" certain XMM registers, not YMM -- that's why Intel probably had to add vzeroupper for AVX, because of how stupidly short-sighted MS ABI was conceived)

But now I was only talking about your code which failed to realign the stack, nothing else and not ABIs. If you have a better method to realign the vectors to 32-bytes on the stack, then be my guest, it would be interesting to see.

By "better" I of course mean either smaller or faster, though you can't really go smaller/faster than one bitwise AND.

(note that the frame pointer is needed regardless of HOW you realign the stack, because you *need* the previous value of the stack pointer, though you can use any register instead of rbp, even an XMM register, to hold it!)
Post 04 Apr 2017, 14:43
system error



Joined: 01 Sep 2013
Posts: 477

Furs wrote:
The reason I ranted about the alignment for MS and Linux ABIs is simply because it is bloated and doesn't help me at all, since I have only two types of functions:

1) functions that don't use vectors, for which the alignment is useless and wastes stack space, these functions are also by far the most common
2) functions that use AVX, for which I have to realign the stack anyway




MS64 ABI is applying the same technique. What do you mean? MOV RCX, SOMESTRING is exactly applied to MSVCRT printf. No vectors. No alignment. Are you sure you're talking about programming, and not cooking?

Wasting space is inevitable in the x86 family. It's been like that since forever. PUSH is a complete waste of stack too!

I don't understand what you mean by that. The limitation of the MS64 ABI is not due to poor design, but a result of the MEMORY LIMITATION of x64 CPUs. All memory operands wider than 64 bits are accessed through pointers anyway, regardless of SSE, AVX or even Knights Landing. Maybe you got the wrong impression about instructions like MOVDQA/VMOVDQA. These are all pointer-based instructions. So suggesting that

MOV RCX,SOME_256_OFFSET

to replace AVX addressing is rather shallow in my opinion because that's exactly what instructions like VMOVDQA is doing. Now look,

mov rcx,offset_256
vmovdqa ymm0,qqword[rcx]

has exactly the same effect as

vmovdqa ymm0,qqword[offset_256]

You are suggesting nothing new here.


Furs wrote:
But now I was only talking about your code which failed to realign the stack, nothing else and not ABIs. If you have a better method to realign the vectors to 32-bytes on the stack, then be my guest, it would be interesting to see.

By "better" I of course mean either smaller or faster, though you can't really go smaller/faster than one bitwise AND.

(note that the frame pointer is needed regardless of HOW you realign the stack, because you *need* the former version of the stack, though you can use any register instead of rbp, even a XMM register, to hold it!)



No frame or frame pointer is needed if you know how to align the stack using simple integer math via SUB RSP. Some people use MOD and some use AND to build a complex integer expression that aligns the stack to certain boundaries. And not many people are willing to share that know-how, including myself. Are you expecting a free lunch from me? You claim to be a master of AVX/AVX2, so it should be no problem for you to come up with an integer expression like that.
Post 04 Apr 2017, 15:59
Furs



Joined: 04 Mar 2016
Posts: 361
Dude, ignore the mov rcx, ffs! The point is the sort function itself, not how it is called.

Ok, let's inline the function so there's literally no call.

Here's my "bloated" code in this analogy:

Code:
; sort function body here operating on big_buffer

Here's yours:

Code:


See? Nothing.

Clearly yours is much smaller than mine, right? Since it has zero instructions. But guess what? It doesn't work unless the buffer is already sorted (because it does nothing). So what is the point of saying mine is bloated? You never even implemented one that works.

I'm saying this cause you keep saying how my frame pointer + AND rsp is "bloated" but you failed to provide an alternative that works for 32-byte vectors on the stack. So I'm stuck with my "bloated option" until you provide me a better one, since you claim it's bloated, clearly you must have a better method available?


system error wrote:
No frame or frame pointer is needed if you know how to align the stack using simple integer math via SUB RSP. Some people use MOD and some use AND to build a complex integer expression that aligns the stack to certain boundaries. And not many people are willing to share that know-how, including myself. Are you expecting a free lunch from me? You claim to be a master of AVX/AVX2, so it should be no problem for you to come up with an integer expression like that.

What are you talking about?

You cannot align the stack with SUB rsp alone unless you know the incoming stack alignment exactly, and it is at least a multiple of the alignment you want.

So if the incoming stack is only 16-byte aligned, you *cannot* align it to 32 bytes with "sub" alone, without using AND or MOD. The reasoning is purely mathematical.

You can have rsp be any multiple of 16, but there's TWO multiples of 16 (values) for EACH 32-byte multiple.

For example, rsp can be 0, which is aligned for both 16 and 32. But rsp can be also 16, which is aligned for 16 but not for 32. rsp can be 32, which is aligned for both, again. But rsp can also be 48, which is aligned only for 16 and not 32.

Do you understand this simple concept?

Subtracting only serves as an offset. For example, here's two cases:


Code:
1) if the incoming rsp is 48, you can subtract 16 to get a 32-byte aligned vector! But this FAILS if rsp is already 32-byte aligned, i.e. if rsp is 32 or 64.

2) So you remove the sub rsp -- this works if rsp is 32-byte aligned (32, 64, etc.) -- but now it FAILS for rsp = 16, 48, etc.

You can adjust the sub's constant all you want, it is *impossible* to solve this with just a linear sub operation. Impossible. Mathematically.

You need either to:

1) sub 16 bytes if rsp is not aligned to 32 or
2) sub 0 bytes if rsp is aligned to 32

Such "if" conditional logic cannot be done with a simple subtraction. You can do with a branch, but that's stupid when you can just use AND.

So why do you need a frame pointer? Well, because it is a conditional operation.

You see, it subtracts either an extra 16 or zero from rsp. But that's based on a condition (its value on entry to the function) that you won't have access to anymore once the adjustment is done, unless you store the original rsp value (or whatever) somewhere.

How can you restore the stack or access parameters on the stack (if any) if that depends on a condition? That condition isn't even available anymore for testing, btw. You lost the original rsp's value -- you cannot test anything, you need to store it somewhere to be able to return from the function (restore the stack).

Most people/compilers use rbp / frame pointer for this purpose, but really you can use anything you want. You can even use a global variable (though that is not thread-safe and is a little stupid), or you can use an XMM register or even store it in the x87 FPU if you want (as long as your function doesn't use it), but I don't know if that works on Windows, you need full 80-bits of precision to be able to store a 64-bit pointer in the FPU, since we'll need a 64-bit significand, and I think Windows lowers the precision by default, you'll have to change the FPU Control Word yourself.


Your "alignment chain" works by propagating the alignment through the entire stack (which is a waste of stack space for functions not using vectors btw), but the problem is it has to propagate ONE alignment, and they chose 16-bytes for this. This means it *cannot* align to 32-bytes without using AND or MOD or whatever.

This method is not scalable to the future, it is not extensible.

If they had chosen a 32-byte stack alignment and propagated that, you wouldn't need to realign the stack for AVX, at the expense of more wasted stack space (even for SSE functions, which need only 16-byte alignment). But what about AVX512 or future vectors?

You see what I'm saying? Such a design is stupid because it is *fixed* and in the future you'll have to "realign" the stack anyway with new vector sizes. That's why the design is shit/flawed. It is flawed at the CORE in concept.

Now, if they were to "propagate" a 512-byte alignment on the stack, that would be a huge waste for functions not using vectors or AVX512. That's why it is obvious that such "solution" to propagate a fixed alignment is stupid.

Just let the damn vector functions realign the stack as they see fit.

What sense does it make to cater an ABI to a particular instruction set that you know will get replaced with better vectors?

Any program that uses

1) no vectors
2) better vectors than SSE

Will suffer from this. It is such a stupid design, seriously. But we're stuck with it and I understand I have to use it if programs are compiled for this ABI already. I understand it is way too late to change it now, that is WHY I am/was ranting originally.

Rants are complaints that you don't really expect to solve anything, you know ;) Just things that piss you off enough to rant about.
Post 04 Apr 2017, 16:57
Furs



Joined: 04 Mar 2016
Posts: 361
Actually, you can even store the old rsp on the stack itself, if you don't need it to access parameters on the stack (i.e. the function takes fewer than 5 parameters under the MS ABI, or whatever). Something like:


Code:
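mov rax, rsp                               ; keep the entry value of rsp
and rsp, -32                               ; round rsp down to a 32-byte boundary
sub rsp, local_vars_size_misaligned_by_8   ; reserve locals (size + 8 is a multiple of 32)
push rax                                   ; save entry rsp; rsp is now 32-byte aligned

;...

pop rsp                                    ; restore the entry rsp in one instruction
ret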

This is less bloated: you don't need an add rsp at the end, you only need 1 pop, and rbp isn't used. You still need the AND though ;)

local_vars_size_misaligned_by_8 simply means the 'size' of your local vars, but such that if you add 8 to them, you get 32-byte alignment. Or whatever, you get the idea, that isn't important. Downside is you can't access anything "above" the return address -- that is, no parameters on the stack.

The point is, you have to store the original rsp just so you can return from the function, at the very least.
Post 04 Apr 2017, 17:34
Powered by phpBB © 2001-2005 phpBB Group.

Copyright © 2004-2016, Tomasz Grysztar.