flat assembler
Message board for the users of flat assembler.
OS Construction > What is more faster?
JohnFound 16 Dec 2018, 23:49
It depends.
A call by itself is faster with register parameter passing. But the effort needed to put the parameters into the right registers may cancel out this advantage, and it probably will. For example, you may first need to save on the stack all the registers that hold your program's variables, then load those same registers with the proper arguments, then call the function, and then restore the registers from the stack. So if the function is called from only a few places in the code, where you can arrange for the working registers to already contain the needed values without moving them back and forth, use register parameter passing. If the function is a general API function meant to be called from places not known in advance, use stack parameter passing.
revolution 17 Dec 2018, 00:53
Fulgurance wrote:
"Hello, I have just one purely technical question: what is faster? Passing function parameters in general-purpose registers (ax, bx, etc.) or passing parameters on the stack? (It's only a question about speed, not about ethics or anything else.)"
DimonSoft 17 Dec 2018, 06:27
revolution wrote:
"There are far too many factors involved, so there is no way to know the answer. Each body of code and each CPU/RAM combination will give different results. You will need to test it in your actual code (not in a synthetic test) to see which performs better for you on the systems you intend to use the code on."
revolution, although I see the rationale behind your attitude towards performance questions, and mostly agree that measurement is the only way to know the truth (relative to a specific setup), I wonder how you reconcile it with project deadlines and all the business side. Measuring is definitely the way to write a good piece of software, but it takes a lot of time for a reasonably large program, so there must be a trick to writing and measuring several solutions while keeping the boss happy. What is the trick?
revolution 17 Dec 2018, 07:49
For calling conventions (ccall, stdcall, ...) we can use macros, so you can change one macro, reassemble, and test the new version.
But in general, while this method may show some variation in runtime, it is quite a broad test. It is usually more prudent to identify the real bottlenecks in the code and work on improving only those places. That way you can often get 90% of the improvement for 10% of the work. I doubt that just selecting a calling convention will give any significant performance improvement. Real performance gains are more often found by optimising the critical loops (perhaps by removing calls and inlining the code instead). But it depends on the application and the system, of course.
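revolution's "change one macro, rebuild, test" approach can be sketched in C as a rough analogue. This is a hypothetical example, not fasm's own ccall/stdcall macros: the names work_regs, work_mem, struct work_args, run_loops and the ARGS_IN_MEMORY switch are made up for illustration. It assumes an x86-64 System V target, where the first four integer arguments of a call travel in registers (rdi, rsi, rdx, rcx), whereas the struct variant makes the callee read its arguments from memory, much like stack-passed parameters.

```c
#include <stdint.h>

/* Hypothetical four-argument routine, in two flavours. */

/* Variant A: scalar arguments; on x86-64 System V these are passed in registers. */
int64_t work_regs(int64_t a, int64_t b, int64_t c, int64_t d)
{
    return a + b * c - d;
}

/* Variant B: arguments gathered in a struct that the callee reads through a
   pointer, i.e. from memory. */
struct work_args {
    int64_t a, b, c, d;
};

int64_t work_mem(const struct work_args *p)
{
    return p->a + p->b * p->c - p->d;
}

/* The single switch: rebuild with -DARGS_IN_MEMORY to use variant B at every call site. */
#ifdef ARGS_IN_MEMORY
#define CALL_WORK(a, b, c, d) work_mem(&(struct work_args){ (a), (b), (c), (d) })
#else
#define CALL_WORK(a, b, c, d) work_regs((a), (b), (c), (d))
#endif

/* A call site shaped like JohnFound's nested loop; its source stays the same for both variants.
   table must hold at least n + 1 entries (indices 1..n are read). */
int64_t run_loops(int n, const int64_t *table)
{
    int64_t total = 0;
    for (int i = n; i > 0; i--)
        for (int j = n; j > 0; j--)
            total += CALL_WORK(i, j, (int64_t)i * j, table[i]);
    return total;
}
```

Both builds compute the same result, so only the timing of the real program should differ. Note that an optimising compiler may inline either variant, which removes the call entirely, the kind of gain revolution points to.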
Fulgurance 17 Dec 2018, 10:06
JohnFound wrote:
"If the function is a general API function meant to be called from places not known in advance, then use stack parameter passing."
But that is possible with registers as well, no? I don't understand very well.
JohnFound 17 Dec 2018, 10:30
Fulgurance wrote:
It is possible. My point was that in the general case, register argument passing can be even slower than, or equal to, stack parameter passing. See the example below. You can see that the stack variant contains fewer instructions and will probably be faster, not to mention that it is much more readable.
Code:
```asm
RegFunction:
; arguments in eax, ebx, ecx, edx
; some code here.
        retn

StackFunction:
; arguments .arg1, .arg2, .arg3, .arg4
; some code here.
        retn    16

; Use of RegFunction
WithRegisters:
        mov     ecx, 1000               ; first argument
.loop1:
        mov     edx, 1000               ; second argument
.loop2:
        mov     eax, ecx
        imul    eax, edx                ; this is the 3rd argument.

; prepare arguments
        push    ecx edx                 ; save the loop counters in the stack.
        xchg    eax, ecx                ; first argument in eax, third argument in ecx
        mov     ebx, edx                ; second argument in ebx
        mov     edx, [array1 + 4*eax]   ; 4th argument is from an array (indexed by the first argument, now in eax).

; now call the function:
        call    RegFunction

; then restore the loop counters
        pop     edx ecx

        dec     edx
        jnz     .loop2
        dec     ecx
        jnz     .loop1
; end of program.

; Use of StackFunction
WithStack:                              ; new global label, so .loop1/.loop2 below do not clash with those above
        mov     ecx, 1000               ; first argument
.loop1:
        mov     edx, 1000               ; second argument
.loop2:
        mov     eax, ecx
        imul    eax, edx                ; this is the 3rd argument.

; prepare arguments (pushed last to first)
        push    dword [array1 + 4*ecx]  ; 4th argument
        push    eax                     ; 3rd argument
        push    edx                     ; 2nd argument
        push    ecx                     ; 1st argument

; now call the function (assumes StackFunction preserves ecx and edx):
        call    StackFunction

        dec     edx
        jnz     .loop2
        dec     ecx
        jnz     .loop1
```
_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Fulgurance 17 Dec 2018, 12:55
If I understand correctly, passing parameters in registers generally forces the programmer to write more lines of code, and so in the end the program is slower?
sts-q 17 Dec 2018, 13:23
If I understand correctly, I would say:
Keep values in CPU registers if possible, because registers are fast. Use the stack when you run out of registers, because RAM (or even cache) is slower. I think parameter passing is just one part of the story: you need to be certain about which function needs which registers and parameters, and then you can figure out when to spill registers to the stack. The more registers there are, the more important this gets.
Best regards,
sts-q
JohnFound 17 Dec 2018, 13:31
Fulgurance wrote:
"If I understand correctly, passing parameters in registers generally forces the programmer to write more lines of code, and so in the end the program is slower?"
Yes. In addition, very often the function's internal code will be forced to save these registers to the stack or memory anyway, simply because the algorithm needs them. With stack argument passing, they are already saved on the stack. One exception is a function that does very simple processing and has only one argument; in this case, using a register for the argument can be worth it.
Last edited by JohnFound on 17 Dec 2018, 13:32; edited 1 time in total
Ali.Z 17 Dec 2018, 13:31
It's not all about calling conventions: consider instruction size, latency, algorithm length... all these factors affect speed.
_________________ Asm For Wise Humans |
revolution 17 Dec 2018, 15:24
Ali.Z wrote:
"It's not all about calling conventions: consider instruction size, latency, algorithm length... all these factors affect speed."
With regard to execution time, modern CPUs are not deterministic!
JohnFound 17 Dec 2018, 17:09
revolution wrote:
"With regard to execution time, modern CPUs are not deterministic!"
Well, I don't quite agree. While it is true that the speed depends on too many factors, it still does not vary from 0 to infinity. The timings form a bump like a Gaussian curve, so if we have two equivalent code chunks we can more or less predict whether one is "much slower", "much faster" or "almost the same (±20%)". So yes, if you want to know the exact speed you will never know it, even by running tests. But if you want to know it approximately, it can be determined by analyzing the code. After all, we need to choose some implementation when we are writing our programs, and creating all possible implementations and then comparing them with tests is obviously impossible.
revolution 17 Dec 2018, 17:54
What I mean is that, with many other things happening asynchronously beyond the control or knowledge of the code, there is no way to predict the runtime by statically looking at the code. Even with the same code on the same system, each run encounters different conditions because of outside events, different cache contents, etc.
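The run-to-run variation revolution describes is easy to observe. The sketch below assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC; time_runs and sample_workload are hypothetical names, and sample_workload merely stands in for the real code under test. It times the same routine several times in a row and records each duration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical workload standing in for the code under test.
   Every iteration reads and writes the volatile sink, so the loop cannot be optimised away. */
static volatile uint64_t sink;

void sample_workload(void)
{
    for (uint64_t i = 0; i < 1000000; i++)
        sink += i;
}

/* Elapsed nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e9
         + (double)(end->tv_nsec - start->tv_nsec);
}

/* Run fn `runs` times and store each duration (in ns) in out_ns[0..runs-1]. */
void time_runs(void (*fn)(void), double *out_ns, int runs)
{
    for (int r = 0; r < runs; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        out_ns[r] = elapsed_ns(&t0, &t1);
    }
}
```

Calling time_runs(sample_workload, t, 5) and printing t[0..4] usually shows values scattered around a common level rather than identical numbers, even though the code never changes; that spread is what revolution suggests characterising statistically.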
This is part of the testing regime. You try different variants of the code and see the real effect. Sometimes the results are inconclusive, i.e. the times of different runs appear to show no correlation, and from that you can say the difference is not important: just use either version, since both are about the same. At the finer scale you will find that runtimes vary each time. This is normal, and you can try to characterise the spread (the standard deviation, SD, in statistics) to get an idea of how much variance to expect. Then apply standard statistical tests (like chi-square, t-test, etc.) to see whether there is a real measurable difference. But this level of comparison is only for the most extreme cases, where the code runs on thousands of systems and even a small 1% change gives a significant benefit in time and/or cost.
edfed 20 Dec 2018, 14:01
It depends on the nature of the data you are computing on.
If the data is the content of a file, say, you first pass the filename as a pointer to a location in RAM. After that, you use the standard file stream API to read bytes into RAM, which gives you a pointer to that RAM location. Finally, you process the bytes one by one, working on them in registers, before saving them to RAM and then to the file. As you can see, this typical use case has no single preferred way of passing parameters; each step of the process may or may not have one. The original C calling convention (which influenced the design of x86 architectures) uses the stack for parameters and EAX for the return value.
I myself use various parameter-passing conventions depending on the abstraction level of the function and its intrinsic performance needs. A read-file function really doesn't need optimised parameter passing; a signal-processing math function will need very optimised parameter passing.
guignol 16 Jan 2019, 18:44
This is exactly where AI comes in, with its compiling "skills"...
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.