flat assembler
Message board for the users of flat assembler.

Index > Heap > Furs & system error: BFF

Furs

Joined: 04 Mar 2016
Posts: 1471
lmao this kid.

system error wrote:
Furs, so called smart calling convention:

Code:
push qword [var10] 
push qword [var9] 
push qword [var8] 
push qword [var7] 
push qword [var6] 
push qword [var5] 
push qword [var4] 
push qword [var3] 
push qword [var2] 
push qword [var1]    


Which in CISC definitions, translate to this MAMMOTH, SUPERBLOATED micro-ops, just like his brain! ;D
That's not mine, that's stdcall. YOU said you only care about size now, we're not talking about performance (hey, that is what you said!). You're such a bait joke. You asked for stdcall, I provided it, which is even smaller than mine (but doesn't use registers).

Also, prove your bullshit with a link. Prove what you say is true.

I did, two links to be exact. Go read them and get schooled.

For those who haven't read it: Proof why PUSH is faster than MOV: http://stackoverflow.com/questions/36631576/what-is-the-stack-engine-in-the-sandybridge-microarchitecture

You don't have to tell people what to do. They'll judge it for themselves. Doing otherwise is a sign of desperation like certain religious zealots. Wink

I'm sure they'll listen to a kid who doesn't know what he's talking about because he gets owned by actual proof. Where are your links to prove what you claim for newer processors (use any that supports AVX)?
Post 31 Mar 2017, 15:48
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
Excuse me, what? I really don't understand what you are saying.

Straight from the horse's mouth, so to speak: https://msdn.microsoft.com/en-us/library/ms235286.aspx

Any parameters beyond the first four must be stored on the stack, above the shadow store for the first four, prior to the call.


Oh, before I leave there's one more hard schooling to bitchslap your INCOMPETENT face:

"Any parameters beyond the first four must be stored on the stack"

MS didn't specify that you must use PUSH instruction to store extra params on the stack, YOU INCOMPETENT IDIOT

Checkmate and cheers! Very Happy
Post 31 Mar 2017, 16:02
Furs

Joined: 04 Mar 2016
Posts: 1471
Yeah, except I didn't use PUSH for MS ABI. I used MOV. Why do you think it is so bloated? (you can't use PUSH properly in MS ABI, because of the Shadow Space, you'll have to PUSH 4 dummy values for that on every call).

Seriously, you can't even read assembly code?

I used PUSH for my ABI (which is similar to Linux ABI, minus alignment and passing 5 params in registers not 6, and caller clean ofc which is much more sane), and for stdcall.

You need to read the rules of chess first before claiming checkmate Wink


I mean yeah I'm at fault for getting mad over a clueless kid like you, that's because I didn't expect this forum to have losers, I thought only smarter people end up here, who can at least read your points and make some sense. My bad.
Post 31 Mar 2017, 16:05
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
I plagiarized MS64-ABI and call it my own. Now I regret my sorry life for picking on the wrong person to show off my INCOMPETENCY


Never mind. I understand how you'd feel right now Very Happy
Now back to my original challenge:

Code:
func_fastcall:
    sub rsp,8
    ...
    add rsp,8
    ret    


vs your sacred INCOMPETENT stdcall

Code:
funct_furr_BLOATED_MICROOOOPPS:    
    ;PUSH RBP
    IF StackAddrSize = 64 
    THEN 
    IF OperandSize = 64 
    THEN 
    RSP ← RSP – 8; 
    Memory[SS:RSP] ← SRC; (* push quadword *) 
    ELSE IF OperandSize = 32 
    THEN 
    RSP ← RSP – 4; 
    Memory[SS:RSP] ← SRC; (* push dword *) 
    ELSE (* OperandSize = 16 *) 
    RSP ← RSP – 2; 
    Memory[SS:RSP] ← SRC; (* push word *) 
    FI; 
    ELSE IF StackAddrSize = 32 
    THEN 
    IF OperandSize = 64 
    THEN 
    ESP ← ESP – 8; 
    Memory[SS:ESP] ← SRC; (* push quadword *) 
    ELSE IF OperandSize = 32 
    THEN 
    ESP ← ESP – 4; 
    Memory[SS:ESP] ← SRC; (* push dword *) 
    ELSE (* OperandSize = 16 *) 
    ESP ← ESP – 2; 
    Memory[SS:ESP] ← SRC; (* push word *) 
    FI; 
    ELSE (* StackAddrSize = 16 *) 
    IF OperandSize = 32 
    THEN 
    SP ← SP – 4; 
    Memory[SS:SP] ← SRC; (* push dword *) 
    ELSE (* OperandSize = 16 *) 
    SP ← SP – 2; 
    Memory[SS:SP] ← SRC; (* push word *) 
    FI; 
    FI; 

    mov rbp,rsp
    sub rsp,INCOMPETENT LOCALS
    and rsp,-16
    .... do your INCOMPETENT things
   
    mov rsp,rbp
   
    ;POP RBP
     IF SS is loaded; 
     THEN 
     IF segment selector is NULL 
     THEN #GP(0); 
     FI; 
     IF segment selector index is outside descriptor table limits 
     or segment selector's RPL ≠ CPL 
     or segment is not a writable data segment 
     or DPL ≠ CPL 
     THEN #GP(selector); 
     FI; 
     IF segment not marked present 
     THEN #SS(selector); 
     ELSE 
     SS ← segment selector; 
     SS ← segment descriptor; 
     FI; 
     FI; 
     IF DS, ES, FS, or GS is loaded with non-NULL selector; 
     THEN 
     IF segment selector index is outside descriptor table limits 
     or segment is not a data or readable code segment 
     or ((segment is a data or nonconforming code segment) 
     and (both RPL and CPL > DPL)) 
     THEN #GP(selector); 
     FI; 
     IF segment not marked present 
     THEN #NP(selector); 
     ELSE 
     SegmentRegister ← segment selector; 
     SegmentRegister ← segment descriptor; 
     FI; 
     FI; 
     IF DS, ES, FS, or GS is loaded with a NULL selector 
     THEN 
     SegmentRegister ← segment selector; 
     SegmentRegister ← segment descriptor; 
     FI;

     ret 8*INCOMPETENT IDIOT    


Bloated where? Twisted Evil
Post 01 Apr 2017, 03:54
Furs

Joined: 04 Mar 2016
Posts: 1471
^ Hopeless.

I'm sure you'll end up great in life ignoring facts and proofs while reiterating your childish nonsense over and over again. Because at the end of the day it doesn't matter how you believe the CPU works, sorry. Programming is not a religion.

Read this and understand why nothing of what you say has any value until you back it up with FACTS: https://en.wikipedia.org/wiki/Philosophical_burden_of_proof

My CPU for reference (which supports even AVX2) is Xeon E3-1241 v3, so it would be best if you would, for instance, do that for CPU of this generation (Haswell), I'm not interested in CPUs from 2003 sorry.

(if you insist on using 2003 CPUs, then you admit defeat; I did intentionally say MS ABI is trash and short-sighted, which literally means "not future-proof" or "not designed well for the future", for multiple reasons, not just AVX)
Post 01 Apr 2017, 11:01
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
^ Hopeless.

I'm sure you'll end up great in life ignoring facts and proofs while reiterating your childish nonsense over and over again. Because at the end of the day it doesn't matter how you believe the CPU works, sorry. Programming is not a religion.

Read this and understand why nothing of what you say has any value until you back it up with FACTS: https://en.wikipedia.org/wiki/Philosophical_burden_of_proof

My CPU for reference (which supports even AVX2) is Xeon E3-1241 v3, so it would be best if you would, for instance, do that for CPU of this generation (Haswell), I'm not interested in CPUs from 2003 sorry.

(if you insist on using 2003 CPUs, then you admit defeat; I did intentionally say MS ABI is trash and short-sighted, which literally means "not future-proof" or "not designed well for the future", for multiple reasons, not just AVX)


What do you mean I am ignoring facts and proofs?

On the contrary I am throwing in your INCOMPETENT face the machine-level micro ops semantics, even lower level than the encodings / mnemonics abstraction and there's no proofs and facts more valid than that.

You are simply an I.N.C.O.M.P.E.T.E.N.T big mouth, displaying all the known symptoms of "blaming the uneven floor" syndrome.

Case closed.
Post 01 Apr 2017, 15:39
Furs

Joined: 04 Mar 2016
Posts: 1471
Your micro ops are IMAGINARY. Do you also believe in Flying Pink Unicorns or what? I gave you a link 3 times showing that newer CPUs have a stack engine that makes PUSH faster than MOV because it renames the stack pointer. Here, educate yourself: https://en.wikipedia.org/wiki/Register_renaming. Idiot.

That's what you are ignoring. PROVE that your pseudo-code is anything but gibberish on my Haswell CPU (or anything Sandy Bridge+ which support AVX)! Prove that your "micro ops" are REAL on AVX+ CPUs.


Now, as a friendly tip: I hope you don't take the "instruction explanation" in the Intel Manuals as how they are actually implemented in hardware. That's for illustrative purposes only. You know, it's called pseudo-code (go ahead and click it, since you love ignoring facts).

It's showing you what its operation does, not how it is implemented in hardware. How gullible can you be?

You're a fool if you think hardware is just a different kind of software. Some operations in hardware (especially bit-related operations) can be done orders of magnitude faster than software.

Now as for PUSH, the link you ignored on purpose 3 times explicitly says the hardware renames the stack pointer on a PUSH. So bring up your sources that it is slower, i.e. prove your stupid micro ops are REAL or shut the fuck up already.

Because I've proven with that link that on the Sandy Bridge architecture the stack engine takes care of the stack pointer update, so a PUSH needs no extra micro op compared to a MOV store.


Lastly: micro ops have *nothing* to do with bloat anyway. But with performance. Rolling Eyes

This entire time you haven't provided a single link. Stop claiming shit, I don't give a fuck what you CLAIM, i want PROOF
Post 01 Apr 2017, 16:00
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
Your micro ops are IMAGINARY. Lastly: micro ops have *nothing* to do with bloat anyway. But with performance. Rolling Eyes


Micro-Ops / Microcode are Imaginary? OMG!! Are you really that INCOMPETENT? It's in Intel/AMD manuals! Look at this guy! He makes Intel/AMD Manual on par with Fairy Tale books!!

HAHAHA. OMG I'm so loving this! Very Happy

So multiple 1 byte PUSHes are faster than MOV on the grounds that PUSH does not have microcode or has a 'stack engine'? Are you really that STUPID?

Here's some BASIC EDUCATION for you:

What does a PUSH do?

1. SUB current RSP
2. Save the content to the new RSP.

Where do you think this semantic gets implemented? ==>> MICROCODE. Microcode is not imaginary. It is just as real as your INCOMPETENCY in understanding the Intel/AMD manuals! Because it is a complex/CISC instruction. It doesn't have the simplicity of the RISC-style direct implementation of

SUB RSP,8
MOV [RSP],RCX
...
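
For what it's worth, the two-step description can be checked with a toy model. This is only a sketch of the architectural effect (register and memory state), using a made-up `Machine` class that does not come from any manual; it says nothing about micro-ops or timing.

```python
class Machine:
    """Toy model: a few 64-bit registers plus a sparse memory keyed by address."""

    def __init__(self, rsp=0x1000):
        self.regs = {"rsp": rsp, "rcx": 0}
        self.mem = {}

    def push(self, reg):
        # PUSH reads the source operand before RSP is decremented.
        value = self.regs[reg]
        self.regs["rsp"] -= 8
        self.mem[self.regs["rsp"]] = value

    def sub_mov(self, reg):
        # Explicit two-instruction form: sub rsp,8 / mov [rsp],reg
        self.regs["rsp"] -= 8
        self.mem[self.regs["rsp"]] = self.regs[reg]


def same_state(a, b):
    return a.regs == b.regs and a.mem == b.mem


# Ordinary register: both forms leave identical state.
a, b = Machine(), Machine()
a.regs["rcx"] = b.regs["rcx"] = 0xDEADBEEF
a.push("rcx")
b.sub_mov("rcx")
print(same_state(a, b))  # True

# The one exception: pushing RSP itself stores the value from before the decrement.
c, d = Machine(), Machine()
c.push("rsp")
d.sub_mov("rsp")
print(hex(c.mem[0xFF8]), hex(d.mem[0xFF8]))  # 0x1000 0xff8
```

So the two forms are interchangeable in effect for every source register except RSP itself; which one is faster is a separate question that only measurement on a given CPU can answer.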

On whatever processors (64-bit), PUSH-based instructions are generally slower by 30-40 percent than their RISC counterparts.

Try to time this code (You know how to use 64-bit Windows QueryPerformanceCounter, right? right? ;D )

Code:
funct_fastcall:
    sub rsp,8
    ...
    add rsp,8
    ret    


Loop it 10_000_000 times if you wish. Use SandyBridge, LSD or whatever you have on your so-called state-of-the-art PC. And compare the result to your sacred stdcall 1 byte multiple PUSHes (that you claimed don't have imaginary microcodes)

Code:
funct_furr_SUPERSLOW:
    push rbp
    mov rbp,rsp
    sub rsp,INCOMPETENT LOCALS
    and rsp,-16 
     ...
    mov rsp,rbp
    pop rbp
    ret 8*INCOMPETENT BRAIN
        


Go time it, and see how INCOMPETENT you are. I encourage you to use SANDYBRIDGE to help you with your incompetency too!

Hahahaha Very Happy
Post 01 Apr 2017, 17:11
Furs

Joined: 04 Mar 2016
Posts: 1471
Are you retarded? I said YOUR micro ops are imaginary. YOUR is the keyword. Not all micro ops in general, but the ones YOU think are there. YOU do not know ANY micro op in current processors because Intel do not give this information, idiot.

They only give information about the operation of their processors and instructions, not the detailed implementation. That's a trade secret, obviously. A dummy like you would never get close to know this, lmao.

The rest of your post doesn't make any sense. I thought you wanted to benchmark PUSH vs MOV, not the function realignment of the stack? Don't sidetrack.



So because you are really so desperate, here's a benchmark moving 11 parameters to the stack and back to see the differences between PUSH/POP and MOV (I have to read back from the stack what was written, else the CPU would just execute the next write without waiting for the first write, which would nullify the point of the benchmark):

Code:
include 'include\win64ax.inc'

.code

start:

  invoke GetTickCount
  mov [pushes], eax

  mov ecx, 1 shl 28          ; 2^28 loop iterations
  @@:
    push rax
    push rbx
    push rcx
    push rdx
    push rsi
    push rdi
    push rbp
    push rsp
    push r8
    push r9
    push r10
    pop r10
    pop r9
    pop r8
    pop rsp
    pop rbp
    pop rdi
    pop rsi
    pop rdx
    pop rcx
    pop rbx
    pop rax
    dec ecx
    jnz @b

  invoke GetTickCount
  sub eax, [pushes]
  mov [pushes], eax

  invoke GetTickCount
  mov [movs_], eax

  mov ecx, 1 shl 28
  sub rsp, 8*11              ; reserve 11 qword slots once, outside the loop
  @@:
    mov [rsp], rax
    mov [rsp+8], rbx
    mov [rsp+8*2], rcx
    mov [rsp+8*3], rdx
    mov [rsp+8*4], rsi
    mov [rsp+8*5], rdi
    mov [rsp+8*6], rbp
    mov [rsp+8*7], rsp
    mov [rsp+8*8], r8
    mov [rsp+8*9], r9
    mov [rsp+8*10], r10
    mov r10, [rsp+8*10]
    mov r9, [rsp+8*9]
    mov r8, [rsp+8*8]
    mov rsp, [rsp+8*7]
    mov rbp, [rsp+8*6]
    mov rdi, [rsp+8*5]
    mov rsi, [rsp+8*4]
    mov rdx, [rsp+8*3]
    mov rcx, [rsp+8*2]
    mov rbx, [rsp+8]
    mov rax, [rsp]
    dec ecx
    jnz @b
  add rsp, 8*11

  invoke GetTickCount
  sub eax, [movs_]

  invoke wsprintf,buf,str_fmt,[pushes],rax
  invoke MessageBox,HWND_DESKTOP,buf,"Test",MB_OK
  invoke ExitProcess,0

.end start

.data

pushes dd 0
movs_ dd 0
str_fmt db 'PUSH: %u ms',10,'MOV: %u ms',0
buf db 255 dup ?    
It displays:
Code:
PUSH: 1010 ms
MOV: 1012 ms    
Second run gave me:
Code:
PUSH: 1027 ms
MOV: 1039 ms    
Third run:
Code:
PUSH: 1037 ms
MOV: 1057 ms    


SO SHUT THE FUCK UP BECAUSE PUSH IS FASTER, DEAL WITH IT. The difference is negligible, though, so let's say they are the same speed but PUSH IS 5 TIMES SMALLER.

ON TOP OF THAT, IT IS 5 TIMES SMALLER IN BYTES, and BLOAT deals with SIZE, not PERFORMANCE, so ask again "bloated where" like a broken record.
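
A rough tally of the code size for the two loops above, as a sketch. The byte counts assume the standard x86-64 encodings (1 byte for PUSH of a legacy register, 2 bytes with a REX prefix for r8..r15; 4 bytes for `mov [rsp],reg`, 5 bytes for `mov [rsp+disp8],reg`). They were worked out by hand rather than taken from an assembler listing, and the helper names are made up, so check them against fasm's output before quoting them.

```python
# Registers stored by the first benchmark loop, in order.
REGS = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp", "r8", "r9", "r10"]
EXTENDED = {f"r{i}" for i in range(8, 16)}  # these need a REX prefix


def push_size(reg):
    # push r64: opcode 50+r, plus one REX byte for r8..r15
    return 2 if reg in EXTENDED else 1


def mov_store_size(offset):
    # mov [rsp+offset], r64: REX.W + opcode 89 + ModRM + SIB (an RSP base needs a SIB byte)
    size = 4
    if offset == 0:
        return size
    if -128 <= offset <= 127:
        return size + 1  # disp8
    return size + 4      # disp32


push_total = sum(push_size(r) for r in REGS)
mov_total = sum(mov_store_size(8 * i) for i in range(len(REGS)))
print(push_total, mov_total, round(mov_total / push_total, 2))  # 14 54 3.86
```

Under those assumptions the store half of this particular loop comes out nearer 4:1 than 5:1, because r8..r10 cost an extra prefix byte on the PUSH side and the first MOV has no displacement. The one-time `sub rsp, 8*11` is not counted.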

Not going to waste any more time on this, pathetic you've not even supplied ONE LINK, didn't benchmark anything, all you do is TALK without knowing shit and 0 experience. So keep barking.



As for the "realignment of the stack" you posted, YOU NEED THAT FOR AVX even in MS ABI. Not my problem you don't code for AVX, so be quiet.

I mean this code:
Code:
AVX_func:
    push rbp
    mov rbp,rsp
    and rsp,-32
     ...
    mov rsp,rbp
    pop rbp    
^ that code YOU NEED for AVX. So what's your point? That MS ABI is so shit, it aligns the stack to the wrong vector size, even as we get AVX512 and beyond? Wastes 8 bytes for no reason, and you ask "bloated where"?!??

If not, show me your AVX function and 256-bit vectors on the stack, I want to see how proficient you are in coding for AVX and not outdated SSE.
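
What the `and rsp,-32` in that prologue does, as a small sketch. The addresses are invented for illustration and `align_down` is a hypothetical helper, not something taken from an ABI document.

```python
MASK64 = (1 << 64) - 1


def align_down(addr, alignment):
    # alignment must be a power of two; in two's complement, -alignment has the
    # low log2(alignment) bits clear, so the AND rounds addr down to a multiple of it.
    return addr & (-alignment & MASK64)


rsp = 0x7FFDF8  # made-up stack pointer value right after "push rbp"
for alignment in (16, 32):
    aligned = align_down(rsp, alignment)
    print(alignment, hex(aligned), rsp - aligned)
# 16 0x7ffdf0 8
# 32 0x7ffde0 24
```

The second column is the aligned address and the third is how many bytes of stack the realignment skips for this particular starting value; with a different RSP the skipped amount can be anything from 0 to alignment minus 8.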
Post 01 Apr 2017, 20:08
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
I still lose, I am sorry Sad


It's ok. Not the end of the world though Very Happy

It shows otherwise on my laptop (Silvermont)

PUSH: 2625 ms
MOV: 2532 ms

Second run:

PUSH: 2891 ms
MOV: 2750 ms

Third run:
PUSH: 2875 ms
MOV: 2766 ms

Result:Microsoft ABI knocked you cold every time! Very Happy

Your calling convention is HUMILIATINGLY SLOW! And I wonder why you keep your sacred function prologue and epilogue out of the 'picture'. Are you hiding your INCOMPETENT FACE from the public? Hahahaha Very Happy
Post 01 Apr 2017, 21:17
system error

Joined: 01 Sep 2013
Posts: 671
HAHAHAHA Very Happy Very Happy
Post 01 Apr 2017, 21:18
Furs

Joined: 04 Mar 2016
Posts: 1471
Quote:
Silvermont is a microarchitecture for low-power Atom, Celeron and Pentium branded processors used in systems on a chip (SoCs) made by Intel.
Why do you care about performance? Your CPU is shit for performance.

Calling other people incompetent because they have a better CPU is a loser mentality you know Wink

BTW the previous test has an artificial dependency on rsp via the pushes, try this one. I'm curious about the results, report with your CPU:

Code:
include 'include\win64ax.inc'

.code

start:

  invoke GetTickCount
  mov [pushes], eax

  mov ecx, 1 shl 28
  @@:
    push rax
    push rbx
    push rcx
    push rdx
    push rsi
    push rdi
    push rbp
    push r11
    push r8
    push r9
    push r10
    pop r10
    pop r9
    pop r8
    pop r11
    pop rbp
    pop rdi
    pop rsi
    pop rdx
    pop rcx
    pop rbx
    pop rax
    dec ecx
    jnz @b

  invoke GetTickCount
  sub eax, [pushes]
  mov [pushes], eax

  invoke GetTickCount
  mov [movs_], eax

  mov ecx, 1 shl 28
  sub rsp, 8*11
  @@:
    mov [rsp], rax
    mov [rsp+8], rbx
    mov [rsp+8*2], rcx
    mov [rsp+8*3], rdx
    mov [rsp+8*4], rsi
    mov [rsp+8*5], rdi
    mov [rsp+8*6], rbp
    mov [rsp+8*7], r11
    mov [rsp+8*8], r8
    mov [rsp+8*9], r9
    mov [rsp+8*10], r10
    mov r10, [rsp+8*10]
    mov r9, [rsp+8*9]
    mov r8, [rsp+8*8]
    mov r11, [rsp+8*7]
    mov rbp, [rsp+8*6]
    mov rdi, [rsp+8*5]
    mov rsi, [rsp+8*4]
    mov rdx, [rsp+8*3]
    mov rcx, [rsp+8*2]
    mov rbx, [rsp+8]
    mov rax, [rsp]
    dec ecx
    jnz @b
  add rsp, 8*11

  invoke GetTickCount
  sub eax, [movs_]

  invoke wsprintf,buf,str_fmt,[pushes],rax
  invoke MessageBox,HWND_DESKTOP,buf,"Test",MB_OK
  invoke ExitProcess,0

.end start

.data

pushes dd 0
movs_ dd 0
str_fmt db 'PUSH: %u ms',10,'MOV: %u ms',0
buf db 255 dup ?    
I simply replaced rsp with r11.

I got:
Code:
PUSH: 798 ms
MOV: 812 ms    
Though, such small differences are usually insignificant and "random".

Fact is, push is not slower on any modern CPU.

And nobody forces you to use push if the ABI is well designed.

You can always use mov instead of push, but you can't always use push instead of mov.

Hence the ABI design is shit since it should've tailored it for push -- if a processor has slow push, then simply compile with mov. You can't do that the other way around if the ABI is designed badly, without thinking of the future.


Last edited by Furs on 01 Apr 2017, 22:57; edited 1 time in total
Post 01 Apr 2017, 22:51
zhak

Joined: 12 Apr 2005
Posts: 490
Location: Belarus
I'm fascinated with your dispute. Just executed your sample. My Core i7-4790K gives (for three runs)
Code:
PUSH: 921, 905, 920
MOV:  858, 874, 874
    
Post 01 Apr 2017, 22:56
Furs

Joined: 04 Mar 2016
Posts: 1471
Was that for the last one? Or the first code.

BTW I read system error's benchmark wrong. The difference is actually small, so I'm not sure what he's on about.

Such small difference compared to 5 times the instruction size, yeah. That's bloated.
Post 01 Apr 2017, 22:58
zhak

Joined: 12 Apr 2005
Posts: 490
Location: Belarus
The first one.
The last one gives:
Code:
PUSH: 687, 671, 702 
MOV:  686, 686, 671 
    
Post 01 Apr 2017, 23:11
Furs

Joined: 04 Mar 2016
Posts: 1471
Thanks, so basically they're the same speed (with random jitter, as expected). Either way, this confirms the stack engine's handling of the stack pointer.

I know it won't be enough for system error, but frankly I don't care.

Even in his benchmark, it's only 4% slower. His mumbo jumbo about micro ops clearly doesn't hold, since PUSH is "only" 4% slower on his CPU. Compared to 500% smaller (well 5:1 ratio) that's quite the bloat when using MOV. Razz

To put it in perspective, if MOV were to PUSH in size, what PUSH is to MOV in speed on his CPU, MOV would take 1.04 bytes versus 1 byte for PUSH. But MOV is 5 bytes...
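
The arithmetic behind the "4%" and "1.04 bytes" figures, as a quick sketch using the three Silvermont runs system error posted. The variable names are mine; the timings are copied from his post.

```python
# (PUSH ms, MOV ms) for the three runs reported on the Silvermont laptop.
runs = [(2625, 2532), (2891, 2750), (2875, 2766)]

# How much slower PUSH was than MOV in each run, in percent.
slowdowns = [(push - mov) / mov * 100 for push, mov in runs]
mean_slowdown = sum(slowdowns) / len(slowdowns)
print([round(s, 1) for s in slowdowns], round(mean_slowdown, 1))  # [3.7, 5.1, 3.9] 4.2

# If MOV were larger than a 1-byte PUSH by the same factor that PUSH is slower:
hypothetical_mov_bytes = 1 * (1 + mean_slowdown / 100)
print(round(hypothetical_mov_bytes, 2))  # 1.04
```

Three runs timed with GetTickCount are a small sample, so treat the 4% as a ballpark figure, not a precise measurement.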
Post 01 Apr 2017, 23:51
zhak

Joined: 12 Apr 2005
Posts: 490
Location: Belarus
But don't forget, that even one byte leads to wasting another 511 on disk space and 4095 in memory (assuming win executables with default file/section alignment). So, unless pushed to the limits, saving one or two bytes doesn't win anything
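
A minimal sketch of that rounding, assuming the default PE values of 512 bytes for file alignment and 4096 bytes for section alignment. `round_up` and the example sizes are made up for illustration.

```python
FILE_ALIGNMENT = 512       # assumed default on-disk alignment
SECTION_ALIGNMENT = 4096   # assumed default in-memory alignment


def round_up(size, alignment):
    # Smallest multiple of alignment that is >= size.
    return (size + alignment - 1) // alignment * alignment


for size in (512, 513, 4096, 4097):
    print(size, round_up(size, FILE_ALIGNMENT), round_up(size, SECTION_ALIGNMENT))
# 512 512 4096
# 513 1024 4096
# 4096 4096 4096
# 4097 4608 8192
```

Going from 512 to 513 bytes costs another 511 bytes on disk, and going from 4096 to 4097 costs another 4095 bytes in memory; anywhere else inside a block, a few bytes more or less change nothing.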
Post 02 Apr 2017, 00:30
zhak

Joined: 12 Apr 2005
Posts: 490
Location: Belarus
But on the other side even one byte saves 512 bytes on disk and one page in memory Smile
Post 02 Apr 2017, 00:34
system error

Joined: 01 Sep 2013
Posts: 671
Furs wrote:
Thanks, so basically they're the same speed (with random jitter, as expected). Either way, this confirms the stack engine's handling of the stack pointer.

I know it won't be enough for system error, but frankly I don't care.

Even in his benchmark, it's only 4% slower. His mumbo jumbo about micro ops clearly doesn't hold, since PUSH is "only" 4% slower on his CPU. Compared to 500% smaller (well 5:1 ratio) that's quite the bloat when using MOV. Razz

To put it in perspective, if MOV were to PUSH in size, what PUSH is to MOV in speed on his CPU, MOV would take 1.04 bytes versus 1 byte for PUSH. But MOV is 5 bytes...


No, you INCOMPETENT big mouth. The face-saving question that you should ask YOUR BRAIN right now is: why does an INCOMPETENT attempt to save so much space result in poor, substandard and inferior code on modern CPUs? Remember we haven't applied any 'calling convention' in it yet, the thing that you've been trying to hide from us --> your sacred function prologue and epilogue. HAHAHA Very Happy

To help you put it into perspective (for your own SCHOOLING), here are the possible answers:

Code:
a. You are not as INCOMPETENT as your BIG MOUTH
b. CISC/complex 'imaginary microcodes' don't work quite well on RISC / modern CPU
c. Both    


So, take your pick. There are no other possible choices.

Btw, thanks for the timing code. You really worked that hard just to prove to people that you're a BIG LOSER in the end! Congratulations!

Ayyyyye!! Hahahaha Very Happy
Post 02 Apr 2017, 06:41
Furs

Joined: 04 Mar 2016
Posts: 1471
zhak wrote:
But don't forget, that even one byte leads to wasting another 511 on disk space and 4095 in memory (assuming win executables with default file/section alignment). So, unless pushed to the limits, saving one or two bytes doesn't win anything
Well for a simple program yes, but I mean it does add up. You obviously aren't going to call just one function, and the ABI applies to ANY compliant function!

GCC devs saw a difference of 5%-20% in executable size on average when changing from PUSH to MOV on the Linux ABI (which supports PUSH just fine with no overhead), so they reverted it. GCC uses PUSH by default for any ABI except MS ABI, and that includes the Linux 64-bit ABI. (I mean the options -mpush-args and -mno-accumulate-outgoing-args.)

system error wrote:
No, you INCOMPETENT big mouth. The face-saving question that you should ask YOUR BRAIN right now is: why does an INCOMPETENT attempt to save so much space result in poor, substandard and inferior code on modern CPUs?
Why does the reason matter? Stop grasping at straws in desperation

It's simple math.

Bloat = size. See: https://en.wikipedia.org/wiki/Software_bloat

Quote:
Quote:
Software bloat is a process whereby successive versions of a computer program become perceptibly slower, use more memory, DISK SPACE or processing power, or have higher hardware requirements than the previous version—whilst making only dubious user-perceptible improvements or suffering from feature creep.
In this case, let's analyze it:

Is 5 greater than 1? MOV is more bloated. So accept your defeat.
Is the performance the same for modern CPUs? Yes so we ignore it.

So please ask again "bloated where?!?" you incompetent kid.

Maybe you should learn some math or WORDS before asking questions! Seems all you can say is "incompetent" and "bloated where?!?" like a broken record.

system error wrote:
Remember we haven't applied any 'calling convention' in it yet, the thing that you've been trying to hide from us --> your sacred function prologue and epilogue. HAHAHA Very Happy
No, that is unrelated. Linux 64-bit ABI uses PUSH unlike MS ABI, but it also aligns the stack to 16-bytes. So it is an UNRELATED thing.

Also THAT PROLOG/EPILOG IS **ONLY** IF THE FUNCTION USES SSE. Do you understand this simple fact? It is **ONLY** for SSE.

And by SSE, I mean SSE, not vectors. Not AVX, but SSE.

I don't use SSE, sorry, I use AVX. Mad? Why must I be stuck using stupid SSE because the ABI is short-sighted?

In case you still think you know what you're talking about, show me your AVX2 function and its prolog, or shut up.

I want to see your 32-byte vectors with your MS ABI, show me how "superior" it is! On the other hand, your MS ABI *wastes* a potential 8-bytes of stack per EACH function, whether they use SSE or not!!

Remember the definition of bloat? Yeah, one of them was "increased memory use" and stack is memory.
Post 02 Apr 2017, 11:17
Copyright © 1999-2020, Tomasz Grysztar.