flat assembler
Message board for the users of flat assembler.
What is the fastest instruction to increment the eax register?
|
HaHaAnonymous 03 Feb 2013, 01:35
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:46; edited 1 time in total |
|
JohnFound 03 Feb 2013, 05:19
You can't increment something with xor. As for the rest, it depends on the CPU, so you can't say anything "generally speaking".
|
revolution 03 Feb 2013, 07:03
The answer is: There is no fastest instruction.
Really. Even on the same system and the same CPU, the execution time and latency of any instruction can change depending on the exact circumstances and internal state.
|
AsmGuru62 03 Feb 2013, 13:49
Intel does not recommend using INC/DEC if the next instruction is a conditional branch (like JE, JNE, JC, JNC, etc.).
It is better to use ADD/SUB in this case, because INC/DEC modify only part of the flags register, and that partial update can slow down the code.
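A minimal sketch of the two loop-counter styles being compared, assuming fasm syntax in 32-bit mode. The labels, the iteration count and the empty loop bodies are made up for illustration and do not come from the Intel manual. The comments list which status flags each instruction writes:

```asm
use32                           ; assemble as 32-bit code (flat binary, for inspection only)

        mov     ecx, 100        ; hypothetical iteration count
count_dec:
        ; ... loop body ...
        dec     ecx             ; writes OF, SF, ZF, AF, PF; leaves CF unchanged
        jnz     count_dec       ; jnz reads only ZF

        mov     ecx, 100
count_sub:
        ; ... loop body ...
        sub     ecx, 1          ; writes all six status flags, including CF
        jnz     count_sub
```

Both loops run the same number of times. Whether the partial flag update costs anything depends on the CPU.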
|
shutdownall 03 Feb 2013, 14:19
I think add ecx,1 is faster than inc ecx. This was discussed in several earlier threads. Even if inc sounds simpler, it is just a specialised addition. The internal ALU can normally only add two values and will treat an increment like an add of 1; an internal register first has to be loaded with 1 to do the operation. In any case, add ecx,1 is not slower than inc ecx.
|
HaHaAnonymous 03 Feb 2013, 14:46
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:45; edited 1 time in total |
|
ASM-Man 03 Feb 2013, 19:16
AsmGuru62 wrote: Intel does not recommend using INC/DEC if the next instruction is a conditional branch (like JE,JNE,JC,JNC, etc.).
Hello AsmGuru62,
Thank you for the info. I'd love to get more. Where did you get Intel's recommendations like this?
_________________
I'm not a native speaker of the English language. So, if you find any mistake in what I have written, you are free to fix it for me or tell me.
|
revolution 03 Feb 2013, 19:18
ASM-Man: Intel has been publishing an optimisation manual for many, many years. Perhaps you would be interested in downloading it. Last time I looked, it was on the same webpage as the architecture manuals.
|
AsmGuru62 03 Feb 2013, 23:43
The latest manual is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
Right-click the 'Download PDF' icon on the right side and select "Save Target As...". At the very end of this document you will find a summary of all the rules. It is like the "Ferengi Rules of Acquisition" -- very valuable! Take a look at Rule #33: it talks about replacing INC/DEC with ADD/SUB.
|
ASM-Man 04 Feb 2013, 06:34
Revolution and AsmGuru62: Thanks very much!
|
TheRedPill 04 Feb 2013, 07:22
Thank you for the answers.
|
shutdownall 04 Feb 2013, 13:12
HaHaAnonymous wrote: If it does then the difference is very minimal (I consider non-significant for real use):
If you just imagine the technical realization of an increment operation, it really is an addition. It doesn't matter if you add 1, 2, 4 or something more; the cost is always the same. And you always have to take care of all 16, 32 or 64 bits, because the register could hold any value to increment.
In most programs an inc is not very useful; many times you have to add offsets of 2 for a word, 4 for a dword or 8 for a qword. So I wonder why Intel still offers an extra opcode for an increment instruction, and I think this is only for historical reasons (compatibility of the first 8086 with the 8080 or Z80 CPUs).
I wonder if ARM still has an inc or dec instruction; I'm not sure about it. I think RISC processors won't have INC or DEC, but again I'm not sure. It's not that helpful, but Intel has many more specialized opcodes that are rarely used.
|
HaHaAnonymous 04 Feb 2013, 13:50
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:44; edited 7 times in total |
|
shutdownall 04 Feb 2013, 17:42
I cannot do anything about your paranoia.
add rax,1 is sufficient. There is no need to write 15 zeros in front. It is just a matter of style.
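For reference, a short listing of how these forms encode, assuming fasm's usual choice of the shortest encoding (the sign-extended 8-bit immediate). The padded constant on the last line is only a guess at the kind of spelling being discussed, since the quoted post was removed:

```asm
use32
        inc     eax                     ; 40            (1 byte)
        add     eax, 1                  ; 83 C0 01      (3 bytes, imm8 sign-extended)

use64
        inc     eax                     ; FF C0         (2 bytes; 40h..4Fh are REX prefixes here)
        add     eax, 1                  ; 83 C0 01      (3 bytes)
        inc     rax                     ; 48 FF C0      (3 bytes)
        add     rax, 1                  ; 48 83 C0 01   (4 bytes)
        add     rax, 0000000000000001h  ; same value, so the same 4-byte encoding
```

Leading zeros in the source text do not change the immediate that ends up in the instruction.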
|
Bargest 04 Feb 2013, 18:12
Maybe peeking this "1" from the code takes more time than using a predefined value of "1" for the increment.
In the case of ADD eax, 1 the processor needs to decode the command (add), the register (eax) and the number (1). In the case of INC eax the processor needs to decode just the command (inc) and the register (eax).
UPDATE: Yes, it is really a little bit faster. I wrote this code in my operating system.
Code:
        cli
        mov     r9, 25                      ; repeat the whole measurement 25 times
.loop:
        mov     rcx, 1000*1000*1000*10      ; iterations per measurement
        rdtsc
        shl     rdx, 32
        add     rdx, rax                    ; rdx = 64-bit start timestamp
        mov     r8, rdx
        xor     rax, rax
@@:
        inc     rax                         ; variant under test: INC
        dec     rcx
        jnz     @b
        rdtsc
        shl     rdx, 32
        add     rdx, rax
        sub     rdx, r8                     ; rdx = elapsed ticks
        DebugOut rdx, clRed

        mov     rcx, 1000*1000*1000*10
        rdtsc
        shl     rdx, 32
        add     rdx, rax
        mov     r8, rdx
        xor     rax, rax
@@:
        add     rax, 1                      ; variant under test: ADD
        dec     rcx
        jnz     @b
        rdtsc
        shl     rdx, 32
        add     rdx, rax
        sub     rdx, r8
        DebugOut rdx, clGreen

        dec     r9
        jnz     .loop
        sti
Just one core was working and there were no programs running at all. This stupid loop with "dec rcx | jnz @b" was written to be sure there are no effects of flag tests or anything else specific to the jump. The result is in the attachment (tick counts in hex; red for INC, green for ADD).
UPDATE 2: Tested with the same alignment for both code sequences. Got practically the same result.
_________________
jmp $ ; Happy end!
Last edited by Bargest on 04 Feb 2013, 19:22; edited 6 times in total
|
JohnFound 04 Feb 2013, 18:39
The proper test of "inc" vs "add" is the one described in the Intel manual. We have to benchmark the following two sequences:
Code:
        inc     ecx
        jnz     .foo

        add     ecx, 1
        jnz     .foo
In this code, ECX must be -1 before the increment, so that the result is zero and the branch is not taken, but the flag dependency will probably (according to the Intel manual) affect the performance.
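A rough sketch of how such a test could be timed, assuming fasm syntax in 64-bit mode. The routine name, the iteration count and the choice of R8/R9 as scratch registers are all hypothetical. Since `inc` turns 0 into 1 (which would make `jnz` taken), the sketch reloads ECX with -1 on every pass so that the result is zero and the branch falls through:

```asm
use64
; Returns in RAX the TSC ticks spent on the loop below.
time_inc_jnz:
        mov     r9d, 100000000          ; hypothetical iteration count
        rdtsc
        shl     rdx, 32
        or      rax, rdx
        mov     r8, rax                 ; r8 = start timestamp
.again:
        mov     ecx, -1
        inc     ecx                     ; ecx = 0, ZF = 1 (swap in "add ecx, 1" for the other variant)
        jnz     .never                  ; not taken, but depends on the flags just written
        sub     r9d, 1
        jnz     .again
        rdtsc
        shl     rdx, 32
        or      rax, rdx
        sub     rax, r8                 ; rax = elapsed ticks
.never:
        ret
```

RDTSC is not a serializing instruction and the loop overhead is part of the measurement, so only the difference between the two variants on the same machine means anything.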
|
HaHaAnonymous 04 Feb 2013, 18:59
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 21:44; edited 1 time in total |
|
AsmGuru62 04 Feb 2013, 19:07
It is OK -- we'll just not take your results seriously!
|
shutdownall 04 Feb 2013, 19:25
Bargest wrote: Maybe peeking this "1" from code will take more time than using a predefined value of "1" for increment.
Well, in general I wouldn't care whether the 1 is loaded from the CPU cache or from some internal storage area. It would always need a clock cycle to load the ALU with this value. It is just a small waste of memory, but I wouldn't care about that on a 32- or 64-bit system. The cache is filled line by line automatically in a sequential way anyway.
The main purpose of using INC is to keep the carry flag untouched, which ADD register,1 would modify. This has to be kept in mind; it can be good or bad depending on the situation.
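A classic case where the untouched carry flag matters is multi-word addition. This is a minimal sketch assuming fasm syntax in 32-bit mode; the routine name and the register roles (ESI = source, EDI = destination, ECX = number of dwords, at least 1) are assumptions made for the example:

```asm
use32
; Adds the ECX-dword number at [esi] to the number at [edi], lowest dword first.
add_multiword:
        clc                             ; start with no carry
.next:
        mov     eax, [esi]
        adc     [edi], eax              ; add this dword plus the carry from the previous one
        lea     esi, [esi+4]            ; lea does not touch any flags
        lea     edi, [edi+4]
        dec     ecx                     ; dec leaves CF alone, so the carry survives
        jnz     .next                   ; jnz reads only ZF
        ret
```

Replacing `dec ecx` with `sub ecx, 1` here would overwrite CF on every pass and lose the carry between dwords.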
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.