flat assembler
Message board for the users of flat assembler.

What is the fastest instruction to increment the eax register?

TheRedPill



Joined: 28 Jan 2013
Posts: 18
TheRedPill 03 Feb 2013, 01:30
Hi,

I wonder what the fastest code/instruction is to increment the eax register on 32-bit and 64-bit x86/x64 systems. Is it the inc, the add or the xor variant, or even something else?

Thanks in advance

TRP
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 03 Feb 2013, 01:35
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:46; edited 1 time in total
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 03 Feb 2013, 05:19
You can't increment something with xor (xor eax,1 only adds 1 when bit 0 is clear). As for the rest, it depends on the CPU, so there is no "generally speaking" answer.
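A quick illustration of the xor point, as a minimal sketch in fasm syntax. The starting value 4 is arbitrary and chosen only for the example:

```asm
use32
        mov     eax, 4
        xor     eax, 1          ; eax = 5: bit 0 was clear, so this looks like +1
        xor     eax, 1          ; eax = 4: bit 0 was set, so the same xor acts like -1
```

xor only toggles bits and never carries into the next bit, which is why it cannot serve as a general increment.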
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20757
Location: In your JS exploiting you and your system
revolution 03 Feb 2013, 07:03
The answer is: There is no fastest instruction.

Really.

Even on the same system and the same CPU, the execution time and latency of any instruction can change depending on the exact circumstances and internal state.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1738
Location: Toronto, Canada
AsmGuru62 03 Feb 2013, 13:49
Intel does not recommend using INC/DEC if the next instruction is a conditional branch (like JE, JNE, JC, JNC, etc.).
It is better to use ADD/SUB in this case, because INC/DEC modify
only part of the flags register (they leave CF untouched), and that can slow down the code.
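To make the two variants concrete, here is a minimal sketch in fasm syntax. The labels, the counter value -16 and the empty loop bodies are placeholders for illustration and are not taken from the Intel manual:

```asm
use32
        mov     ecx, -16        ; negative counter that counts up to zero
count_up_inc:
        ; ... loop body ...
        inc     ecx             ; writes OF, SF, ZF, AF, PF; CF keeps its old value
        jnz     count_up_inc    ; the branch reads ZF

        mov     ecx, -16
count_up_add:
        ; ... loop body ...
        add     ecx, 1          ; writes all six arithmetic flags, including CF
        jnz     count_up_add
```

Both loops run the same number of times. The only architectural difference is that the INC version carries the previous CF value through, which is the partial flag update the recommendation refers to. Whether this costs anything depends on the CPU.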
shutdownall



Joined: 02 Apr 2010
Posts: 517
Location: Munich
shutdownall 03 Feb 2013, 14:19
I think

add ecx,1

is faster than

inc ecx

This was discussed in several earlier threads.
Even if inc sounds simpler, it is just a specialised addition.
The internal ALU can normally only add two values and will treat an increment like an add of 1: an internal register first has to be loaded with 1 to do the operation. In any case, add ecx,1 is not slower than inc ecx.
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 03 Feb 2013, 14:46
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:45; edited 1 time in total
ASM-Man



Joined: 11 Jan 2013
Posts: 64
ASM-Man 03 Feb 2013, 19:16
AsmGuru62 wrote:
Intel does not recommend using INC/DEC if the next instruction is a conditional branch (like JE, JNE, JC, JNC, etc.).
It is better to use ADD/SUB in this case, because INC/DEC modify
only part of the flags register (they leave CF untouched), and that can slow down the code.


Hello AsmGuru62,
Thank you for the info. I'd love to get more. Where did you find Intel's recommendations like this?

_________________
I'm not a native speaker of the English language. So, if you find any mistake in what I have written, you are free to fix it for me or tell me about it. :)
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20757
Location: In your JS exploiting you and your system
revolution 03 Feb 2013, 19:18
ASM-Man: Intel has been publishing an optimisation manual for many, many years. Perhaps you would be interested in downloading it. Last time I looked it was on the same webpage as the architecture manuals.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1738
Location: Toronto, Canada
AsmGuru62 03 Feb 2013, 23:43
The latest manual is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

Right-click the 'Download PDF' icon on the right side and select "Save Target As...".

At the very end of this document you will find a summary of all the rules.
It is like the "Ferengi Rules of Acquisition" -- very valuable!
Take a look at Rule #33: it talks about replacing INC/DEC with ADD/SUB.
ASM-Man



Joined: 11 Jan 2013
Posts: 64
ASM-Man 04 Feb 2013, 06:34
Revolution and AsmGuru62: Thanks very much! :)
TheRedPill



Joined: 28 Jan 2013
Posts: 18
TheRedPill 04 Feb 2013, 07:22
Thank you for the answers.
shutdownall



Joined: 02 Apr 2010
Posts: 517
Location: Munich
shutdownall 04 Feb 2013, 13:12
HaHaAnonymous wrote:
If it does then the difference is very minimal (I consider non-significant for real use):


In other words, it doesn't matter whether you use "inc ecx" or "add ecx,1" (at least here :P).

But the difference between "inc [ecx]" and "inc ecx" is much greater, for obvious reasons. :D


If you imagine how an increment operation is implemented in hardware, it is really an addition. It doesn't matter whether you add 1, 2, 4 or something larger; the cost is always the same. And you always have to process all 16, 32 or 64 bits, because the register can hold any value and the carry may ripple through every bit.

In most programs inc is not very useful; often you have to add offsets of 2 for a word, 4 for a dword or 8 for a qword.

So I wonder why Intel still offers an extra opcode for an increment instruction. I think it is only there for historical reasons (compatibility of the first 8086 with the 8080 or Z80 CPUs). I wonder whether ARM still has an inc or dec instruction; I'm not sure about it. I think RISC processors won't have INC or DEC, but again I'm not sure. It's not that helpful, but Intel has many more specialised opcodes that are rarely used.
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 04 Feb 2013, 13:50
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:44; edited 7 times in total
shutdownall



Joined: 02 Apr 2010
Posts: 517
Location: Munich
shutdownall 04 Feb 2013, 17:42
I can not do anything against your paranoia.

add rax,1

is sufficient. There is no need to write 15 zeros in front.
It is just a kind of style.
Bargest



Joined: 09 Feb 2012
Posts: 79
Location: Russia
Bargest 04 Feb 2013, 18:12
Maybe fetching this "1" from the code takes more time than using a predefined value of "1" for the increment.
In the case of ADD eax, 1 the processor needs to decode the command (add), the register (eax) and the number (1). In the case of INC eax the processor needs to decode just the command (inc) and the register (eax).
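For reference, this is what the decoder actually sees. The listing is a sketch in fasm syntax, with the standard machine-code bytes given in the comments. It assumes the assembler picks the short sign-extended imm8 form of ADD, which fasm does for small constants:

```asm
use32
        inc     eax             ; 40            (1 byte)
        add     eax, 1          ; 83 C0 01      (3 bytes, imm8 sign-extended)

use64
        inc     eax             ; FF C0         (2 bytes; 40h..4Fh are REX prefixes in long mode)
        add     eax, 1          ; 83 C0 01      (3 bytes)
        inc     rax             ; 48 FF C0      (3 bytes)
        add     rax, 1          ; 48 83 C0 01   (4 bytes)
```

So the immediate costs one or two extra bytes of code, not a separate memory access. Whether that size difference is measurable depends on the CPU and on the surrounding code.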

UPDATE:
Yes, it really is a little bit faster. :)
I wrote this code in my operating system.
Code:
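  cli                         ; no interrupts while timing
  mov r9, 25                  ; repeat the whole measurement 25 times
.loop:
  mov rcx, 1000*1000*1000*10  ; iterations per measurement
  rdtsc                       ; start timestamp in edx:eax
  shl rdx, 32
  add rdx, rax
  mov r8, rdx                 ; r8 = start time
  xor rax, rax
 @@:
  inc rax                     ; instruction under test
  dec rcx
  jnz @b
  rdtsc                       ; end timestamp
  shl rdx, 32
  add rdx, rax
  sub rdx, r8                 ; rdx = elapsed ticks
  DebugOut rdx, clRed         ; INC result in red

  mov rcx, 1000*1000*1000*10  ; same iteration count for the ADD variant
  rdtsc
  shl rdx, 32
  add rdx, rax
  mov r8, rdx
  xor rax, rax
 @@:
  add rax, 1                  ; instruction under test
  dec rcx
  jnz @b
  rdtsc
  shl rdx, 32
  add rdx, rax
  sub rdx, r8
  DebugOut rdx, clGreen       ; ADD result in green
  dec r9
  jnz .loop
  sti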

Just one core was working and there were no other programs running at all.
This silly loop with "dec rcx | jnz @b" was written to be sure there are no effects of flag tests or anything else specific to the jump.
The result is in the attachment (tick counts in hex; red for INC, green for ADD).

UPDATE 2:
Tested with the same alignment for both code sequences. Got practically the same result.


Attachment: 1234.jpg (280.69 KB)



_________________
jmp $ ; Happy end!


Last edited by Bargest on 04 Feb 2013, 19:22; edited 6 times in total
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 04 Feb 2013, 18:39
The proper test of "inc" vs "add" is the one described in the Intel manual. We have to benchmark the following two code sequences:

Code:
    inc   ecx
    jnz   .foo


    add   ecx, 1
    jnz   .foo


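In this code, if ECX starts from 0 the result is never zero, so the jnz branch is taken every time (use jz instead if the branch should never be taken). Either way, according to the Intel manual, the flag dependency will probably affect the performance.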
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 04 Feb 2013, 18:59
[ Post removed by author. ]


Last edited by HaHaAnonymous on 28 Feb 2015, 21:44; edited 1 time in total
AsmGuru62



Joined: 28 Jan 2004
Posts: 1738
Location: Toronto, Canada
AsmGuru62 04 Feb 2013, 19:07
It is OK -- we'll just not take your results seriously!
:)
shutdownall



Joined: 02 Apr 2010
Posts: 517
Location: Munich
shutdownall 04 Feb 2013, 19:25
Bargest wrote:
Maybe fetching this "1" from the code takes more time than using a predefined value of "1" for the increment.
In the case of ADD eax, 1 the processor needs to decode the command (add), the register (eax) and the number (1). In the case of INC eax the processor needs to decode just the command (inc) and the register (eax).


Well, in general I wouldn't care whether the 1 is loaded from the CPU cache or from some internal storage area. It would always need a clock cycle to load the ALU with this value. It is just a small waste of memory, but I wouldn't care about that on a 32-bit or 64-bit system. The cache is filled line by line automatically and sequentially anyway.

The main purpose of using INC is to keep the carry flag untouched, whereas ADD register,1 would modify it. This has to be kept in mind. It can be good or bad, depending on the case. ;)
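A typical case where the untouched carry flag matters is multi-precision addition. The following is a minimal sketch in fasm syntax. It assumes ESI and EDI already point to two 4-dword little-endian numbers, and the label name is made up for the example:

```asm
use32
; Adds the 128-bit number at [esi] into the 128-bit number at [edi].
        mov     ecx, 4          ; number of dwords to process
        clc                     ; start with no carry
next_limb:
        mov     eax, [esi]
        adc     [edi], eax      ; add this dword plus the carry from the previous one
        lea     esi, [esi+4]    ; LEA advances the pointers without touching any flags
        lea     edi, [edi+4]
        dec     ecx             ; DEC leaves CF alone, so the carry survives to the next ADC
        jnz     next_limb
```

Replacing "dec ecx" with "sub ecx, 1" would overwrite CF on every iteration and break the carry chain, so here the partial flag update is exactly what is wanted.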

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
