flat assembler
Message board for the users of flat assembler.

What is more faster?

Fulgurance



Joined: 27 Nov 2017
Posts: 276
Fulgurance 16 Dec 2018, 23:30
Hello, I have just one purely technical question: which is faster? Passing function parameters in general-purpose registers (ax, bx, etc.) or passing parameters on the stack? (It's just a question about speed, not about ethics or anything else.)
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 16 Dec 2018, 23:49
It depends. :D

The call itself is faster with register parameter passing. But!

The effort needed to put the parameters into the required registers may cancel out this advantage. And it probably will.

For example, you may first need to save all the registers that hold your program variables onto the stack, then load the same registers with the proper arguments, then call the function, and then restore the registers from the stack.

So, if the function is called from only a few places in the code, where you can arrange for the working registers to already contain the needed values without moving them back and forth, use register parameter passing.

If the function is a general API function meant to be called from places unknown in advance, then use stack parameter passing.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 17 Dec 2018, 00:53
Fulgurance wrote:
Hello, I have just one purely technical question: which is faster? Passing function parameters in general-purpose registers (ax, bx, etc.) or passing parameters on the stack? (It's just a question about speed, not about ethics or anything else.)
There are far too many factors involved so there is no way to know the answer. Each body of code and each CPU/RAM combination will give different results. You will need to test it in your actual code (not in a synthetic test) to see which performs better for you on the systems you intend to use the code on.
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 17 Dec 2018, 06:27
revolution wrote:
There are far too many factors involved so there is no way to know the answer. Each body of code and each CPU/RAM combination will give different results. You will need to test it in your actual code (not in a synthetic test) to see which performs better for you on the systems you intend to use the code on.

revolution, although I see the rationale behind your attitude towards questions about performance and mostly agree that measurement is the only way to know the truth (relative to a specific setup), I wonder how you reconcile it with project time requirements and all this business stuff. I mean, it definitely is the way to go to write a good piece of software, but that takes a lot of time for a reasonably large program, so there should be a trick to writing and measuring several solutions while keeping the boss happy. What is the trick?


revolution 17 Dec 2018, 07:49
For calling conventions (ccall, stdcall, ...) we can use macros. So you can change one macro, reassemble, then test the new version.
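
A minimal sketch of that idea in fasm syntax. The names `USE_REGS`, `invoke2` and `AddTwo` are hypothetical and not taken from any library; the point is only that changing one constant and reassembling flips every call site between register and stack passing.

```asm
; Sketch only: USE_REGS, invoke2, AddTwo and Caller are hypothetical names.
        use32

USE_REGS = 1                    ; 1 = arguments in eax/edx, 0 = arguments on the stack

macro invoke2 target, a, b
{
  if USE_REGS = 1
        mov     eax, a          ; first argument in eax
        mov     edx, b          ; second argument in edx
        call    target
  else
        push    b               ; pushed right to left, callee cleans up
        push    a
        call    target
  end if
}

AddTwo:                         ; returns a + b in eax
  if USE_REGS = 1
        add     eax, edx
        retn
  else
        mov     eax, [esp+4]    ; a
        add     eax, [esp+8]    ; b
        retn    8               ; remove both arguments
  end if

Caller:
        invoke2 AddTwo, 2, 3    ; eax = 5 in either configuration
        retn
```

The function body has to follow the same switch as the call sites, which is why `AddTwo` is also wrapped in the conditional.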

But in general, while the above method may show some variation in runtime, it is quite a broad test. It is usually more prudent to identify the real bottlenecks in the code and work on improving only those places. That way you can often get 90% improvement for 10% of the work.

I doubt that just selecting a calling standard will give any significant improvements in performance. Real performance gains are more often found in optimising the critical loops (maybe by removing calls and inlining instead). But it depends upon the application and the system of course.



Fulgurance 17 Dec 2018, 10:06
JohnFound wrote:
If the function is a general API function meant to be called from places unknown in advance, then use stack parameter passing.


But that is possible with registers as well, no? I don't understand very well.



JohnFound 17 Dec 2018, 10:30
Fulgurance wrote:
JohnFound wrote:
If the function is a general API function meant to be called from places unknown in advance, then use stack parameter passing.


But that is possible with registers as well, no? I don't understand very well.


It is possible. My point was that in the general case, register argument passing can be just as slow as stack parameter passing, or even slower. See the example below. You can see that the stack variant contains fewer instructions and will probably be faster. Not to mention that it is much more readable.

Code:
RegFunction: ; arguments in eax, ebx, ecx, edx
        ; some code here.
        retn


StackFunction: ; arguments .arg1, .arg2, .arg3, .arg4
        ; some code here.
        retn 16


; Use of RegFunction

WithRegisters:

        mov     ecx, 1000       ; first argument

.loop1:

        mov     edx, 1000       ; second argument


.loop2:
        mov     eax, ecx
        imul    eax, edx        ; this is the 3rd argument.

; prepare arguments

        push    ecx edx ; save the loop counters in the stack.

        xchg    eax, ecx        ; first argument in eax, third argument in ecx
        mov     ebx, edx        ; second argument in ebx
        mov     edx, [array1 + 4*eax]   ; 4th argument is from an array, indexed by the first argument (now in eax).


; now call the function:

        call    RegFunction

; then restore the loop counters

        pop     edx ecx

        dec     edx
        jnz     .loop2

        dec     ecx
        jnz     .loop1

; end of program.



; Use of StackFunction:

WithStack:                      ; separate global label, so .loop1/.loop2 do not clash with those above

        mov     ecx, 1000       ; first argument

.loop1:

        mov     edx, 1000       ; second argument


.loop2:
        mov     eax, ecx
        imul    eax, edx        ; this is the 3rd argument.

; prepare arguments
        push    [array1 + 4*ecx]
        push    eax
        push    edx
        push    ecx

; now call the function:

        call    StackFunction

        dec     edx
        jnz     .loop2

        dec     ecx
        jnz     .loop1
    

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9



Fulgurance 17 Dec 2018, 12:55
If I understand correctly, passing parameters in registers generally forces the user to write more lines of code, and in the end makes the program slower?
sts-q



Joined: 29 Nov 2018
Posts: 57
sts-q 17 Dec 2018, 13:23
If I understand correctly, I would say:
Keep it in CPU registers if possible, because CPU registers are fast.
Use the stack if you have no more registers available, because RAM (or some cache) is slower.

I think parameter passing is just one part of the story:
I think you need to be certain about which function needs which registers and parameters,
and then you can figure out when to spill regs to stack.

The more registers there are, the more important this gets.

Best Regards
sts-q



JohnFound 17 Dec 2018, 13:31
Fulgurance wrote:
If I understand correctly, passing parameters in registers generally forces the user to write more lines of code, and in the end makes the program slower?


Yes. And in addition, very often the function's internal code will be forced to save these registers to the stack/memory simply because the algorithm needs it. With stack argument passing they are already saved on the stack.

One exception is when the function does very simple processing and has only one argument. In this case using a register for the argument can be worth it.
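
To illustrate the spilling point, here is a small sketch with hypothetical names (`Helper`, `SumWithHelperReg`, `SumWithHelperStack`); it assumes 32-bit code and that the inner call destroys eax, ecx and edx. The register variant has to save its own arguments before the inner call, while the stack variant simply reads them from memory afterwards.

```asm
; Sketch only: Helper, SumWithHelperReg and SumWithHelperStack are hypothetical names.
        use32

Helper:                         ; stand-in for any call that clobbers eax, ecx, edx
        xor     eax, eax
        xor     ecx, ecx
        xor     edx, edx
        retn

; Register variant: a in eax, b in edx; returns a + b in eax.
SumWithHelperReg:
        push    eax             ; spill both arguments, Helper would destroy them
        push    edx
        call    Helper
        pop     edx             ; reload in reverse order
        pop     eax
        add     eax, edx
        retn

; Stack variant: a at [esp+4], b at [esp+8]; returns a + b in eax.
SumWithHelperStack:
        call    Helper          ; arguments survive in memory, nothing to spill
        mov     eax, [esp+4]    ; a
        add     eax, [esp+8]    ; b
        retn    8               ; remove both arguments
```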



Last edited by JohnFound on 17 Dec 2018, 13:32; edited 1 time in total
Ali.Z



Joined: 08 Jan 2018
Posts: 726
Ali.Z 17 Dec 2018, 13:31
it's not all about calling conventions; consider instruction size, latency, algorithm length ... all these factors have an effect.

_________________
Asm For Wise Humans


revolution 17 Dec 2018, 15:24
Ali.Z wrote:
it's not all about calling conventions; consider instruction size, latency, algorithm length ... all these factors have an effect.
Yes. Plus many other factors: buffers, caches, ports, cores, SMT, exceptions, ... so many factors and no way to predict which factors are most important without trying it.

With regard to execution time modern CPUs are not deterministic!



JohnFound 17 Dec 2018, 17:09
revolution wrote:
With regard to execution time modern CPUs are not deterministic!


Well, well, I don't quite agree. While it is true that the speed depends on too many factors, it still does not vary from 0 to infinity.

There is a bump on the Gaussian curve, and if we have two equivalent code chunks we can more or less predict whether one is "much slower", "much faster" or "almost the same, ±20%" compared with the other. :D

So, yes, if you want to know the exact speed you will never know it, even by making tests. But if you want to know it approximately, that can be determined by analyzing the code.

After all, we need to choose some implementation when we are writing our programs. Creating all possible implementations and then comparing them with tests is obviously impossible.



revolution 17 Dec 2018, 17:54
What I mean is that, with many other things happening asynchronously that are beyond the control or knowledge of the code, there is no way to predict the runtime by statically looking at the code. Even on the same system with the same code, each run encounters different conditions due to outside events and the contents of the cache (etc.) being different.

This is part of the testing regime. You try different variants of the code and see the real effect. Sometimes the results are inconclusive, i.e. the times of different runs appear to show no correlation, and from that you can say the difference is not important. Just use either version since both are about the same.

At the finer scale you will find that runtimes vary each time; this is normal, and you can try to characterise the spread (called the standard deviation, SD, in statistics) to get an idea of how much variation to expect. Then apply normal statistical tests (like Chi-square, T-test etc.) to see if there is a real measurable difference. But this level of comparison is only for the most extreme cases, where the code is run on thousands of systems and even a small 1% change gives a significant benefit in time and/or cost.
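
One possible way to collect the per-run timings that such a comparison needs is to read the time-stamp counter around the code being measured. The sketch below assumes 32-bit code and uses hypothetical names (`MeasureOnce`, `CodeUnderTest`, `elapsed`). Note that `rdtsc` is not a serializing instruction, so a single reading is only approximate; many runs would have to be collected before applying any statistics.

```asm
; Sketch only: MeasureOnce, CodeUnderTest and elapsed are hypothetical names.
        use32

MeasureOnce:
        push    esi
        push    edi
        rdtsc                   ; edx:eax = time-stamp counter before the run
        mov     esi, eax
        mov     edi, edx
        call    CodeUnderTest   ; assumed to preserve esi and edi
        rdtsc                   ; edx:eax = time-stamp counter after the run
        sub     eax, esi        ; 64-bit subtraction: low dword first,
        sbb     edx, edi        ; then high dword with borrow
        mov     [elapsed], eax
        mov     [elapsed+4], edx
        pop     edi
        pop     esi
        retn

CodeUnderTest:                  ; placeholder for the variant being measured
        retn

elapsed dd 0, 0                 ; ticks taken by the last measured run (low, high)
```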
edfed



Joined: 20 Feb 2006
Posts: 4352
Location: Now
edfed 20 Dec 2018, 14:01
it depends on the nature of the data you are computing on.

if the data is the content of a file, let's say you first pass the filename as a pointer to a location in ram.
after that, you use the standard file stream api to read the bytes into ram.
then, you have a pointer to this ram location.

and finally, you process the bytes one by one, using registers to work on them, before saving them to ram and then to the file.

as you can see, this typical use case doesn't have a single preferred method of passing parameters, but each step of the process may or may not have one.

the original C calling convention (used to orient the design of x86 architectures) uses the stack for parameters and EAX for the return value.
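
for illustration, a sketch of that convention (commonly called cdecl) in fasm syntax, assuming 32-bit code; `Add3` and `Caller` are hypothetical names.

```asm
; Sketch only: Add3 and Caller are hypothetical names.
        use32

Add3:                           ; int Add3(int a, int b, int c)
        mov     eax, [esp+4]    ; a
        add     eax, [esp+8]    ; b
        add     eax, [esp+12]   ; c
        retn                    ; plain return, the caller removes the arguments

Caller:
        push    3               ; arguments pushed right to left
        push    2
        push    1
        call    Add3            ; eax = 6
        add     esp, 12         ; caller cleans up three dwords
        retn
```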

i myself use various parameter passing conventions depending on the abstraction level of the function and the intrinsic performance of this function.

a read-file function really doesn't need an optimised way of passing parameters.
a signal computing math function will need very optimised parameter passing.
guignol



Joined: 06 Dec 2008
Posts: 763
guignol 16 Jan 2019, 18:44
this is exactly where AI comes with its compiling "skills"...

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
