flat assembler
Message board for the users of flat assembler.

jcc vs cmov - which is faster?

manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 02 May 2009, 19:37
Hello!
I have a simple question - which function is faster:
Code:
mov eax, [esp+4]
cmp al, '0'
jb .false
cmp al, '9'
ja .false
mov eax, 1
ret
.false:
xor eax, eax
ret    
or
Code:
mov edx, [esp+4]
xor ecx, ecx
mov eax, 1
cmp edx, '0'
cmovb eax, ecx
cmp edx, '9'
cmova eax, ecx
ret    
?

_________________
Sorry for my English...
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 02 May 2009, 20:20
Sorry for not answering your question; I'll just add an extra variant:
Code:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 9
setbe al
movzx eax, al
ret    
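
For readers who find the trick easier to follow in a higher-level language, here is a minimal C sketch of the same idea. The names `is_digit_branchy` and `is_digit_sub` are made up for illustration, and a compiler is free to emit different instructions than the hand-written ones above.

```c
#include <stdint.h>

/* Two comparisons against the bounds, as in the first post. */
static int is_digit_branchy(uint32_t c)
{
    return (c >= '0' && c <= '9') ? 1 : 0;
}

/* Subtract-and-compare: values below '0' wrap around to very large
   unsigned numbers, so one unsigned comparison covers both bounds. */
static int is_digit_sub(uint32_t c)
{
    return (uint32_t)(c - '0') <= 9u;
}
```

The single comparison works only because the subtraction is unsigned; with signed arithmetic, characters below '0' would give small negative values and pass the upper-bound test.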
manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 02 May 2009, 20:46
Huh, I didn't know there were set* instructions... I think your code must be much faster than both of mine. How many cycles does it take?

_________________
Sorry for my English...
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 02 May 2009, 20:50
It's impossible to determine that on current processors (I mean the number of clock cycles, not the 'vague idea' that it is faster, which it most likely is). Try to test it. (run it in a huge loop, so it repeats many times)
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 02 May 2009, 21:02
Also what about this?

Code:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 10
sbb   eax, eax
ret    
Note that this gives -1 instead of 1. Can you use that instead? ;)
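
As a rough illustration of why an all-ones result can be handy, here is a small C model of the `cmp`/`sbb` pair. `digit_mask` and `digit_value_or` are hypothetical helper names, not something from this thread, and the sketch assumes 32-bit unsigned arithmetic.

```c
#include <stdint.h>

/* Returns 0xFFFFFFFF for '0'..'9' and 0 otherwise, mirroring
   "cmp eax, 10" followed by "sbb eax, eax". */
static uint32_t digit_mask(uint32_t c)
{
    return (uint32_t)0 - (uint32_t)((c - '0') < 10u);
}

/* One use for the mask: a branchless select between the digit's
   numeric value and a caller-supplied fallback. */
static uint32_t digit_value_or(uint32_t c, uint32_t fallback)
{
    uint32_t m = digit_mask(c);
    return ((c - '0') & m) | (fallback & ~m);
}
```

If the caller only tests the result for zero or non-zero, -1 and 1 are interchangeable; the mask form mainly pays off when the result is combined with other values using `and`/`or`.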

_________________
Previously known as The_Grey_Beast
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 02 May 2009, 21:02
Somewhat hard to know these days, but I believe that on my Athlon64 it would take 7 cycles (three for "mov eax, [esp+4]"), not counting RET.

BTW, if your function is allowed to return non-zero as true instead of strictly one then this may be faster:
Code:
mov   eax, -'0'
add   eax, [esp+4] ; PERHAPS the CPU will start executing this along with "mov eax, -'0'" since EAX value is not needed while in bus read cycle (i.e. this instruction is not just one micro-op)
cmp   eax, 10
sbb   eax, eax ; -1 if [esp+4] is in range, 0 otherwise
ret    


[edit]Hehe, Borsuc was seconds faster than me :P (but check the small difference)[/edit]
bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 03 May 2009, 04:41
If this is stdcall, shouldn't it use
Code:
retn 4    
to clean up the parameter?

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 03 May 2009, 05:03
Yes, if it is stdcall, but it is OK for cdecl.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 03 May 2009, 06:01
Just for giggles: Here is similar in ARM
Code:
  cmp     r0,'0'          ;carry is set if r0 >= '0'
  rsbcss  r0,r0,'9'       ;carry is set if '0' <= r0 <= '9'
  movcs   r0,1            ;return 1 if within range
  movcc   r0,0            ;return 0 if out of range
  mov     pc,lr    
bitshifter



Joined: 04 Dec 2007
Posts: 796
Location: Massachusetts, USA
bitshifter 03 May 2009, 07:53
LocoDelAssembly wrote:
Yes, if it is stdcall, but it is OK for cdecl.
So to invoke it as ccall, I could do it like this?
Code:
push [value]
call IsDigit
jcc
...
add esp,4    

Oh, and by the way, these are some nice methods, guys :P

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 03 May 2009, 08:17
bitshifter wrote:
So to invoke it as ccall, I could do it like this?
Code:
push [value]
call IsDigit
jcc
...
add esp,4    
You might want to consider this instead:
Code:
push [value]
call IsDigit
add esp,4
cmp eax,0
jcc somewhere
...    
manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 03 May 2009, 09:50
I've tested speed of these functions with this code:
Code:
format PE console 4.0

entry _start

include '%FASMINC%/win32ax.inc'

section '__text__' code readable executable

macro tester func
{
  local ..loop
  call [GetTickCount]
  mov [timestart], eax
  mov ebx, 0FFFFFFFFh
  ..loop:
    push '0'
    call func
    add esp, 4
    sub ebx, 1
    jnz ..loop
  call [GetTickCount]
  sub eax, [timestart]
  push eax
  push fmt
  call [printf]
  add esp, 8
}

_start:
  tester my1
  tester my2
  tester locos1
  tester borsucs
  tester locos2
  xor eax, eax
  ret
  
align 16
my1:
  mov eax, [esp+4]
  cmp al, '0'
  jb .false
  cmp al, '9'
  ja .false
  mov eax, 1
  ret
  .false:
  xor eax, eax
  ret
  
align 16
my2:
  mov edx, [esp+4]
  xor ecx, ecx
  mov eax, 1
  cmp edx, '0'
  cmovb eax, ecx
  cmp edx, '9'
  cmova eax, ecx
  ret
  
align 16
locos1:
  mov   eax, [esp+4]
  sub   eax, '0'
  cmp   eax, 9
  setbe al
  movzx eax, al
  ret
  
align 16
borsucs:
  mov   eax, [esp+4]
  sub   eax, '0'
  cmp   eax, 10
  sbb   eax, eax
  ret
  
align 16
locos2:
  mov   eax, -'0'
  add   eax, [esp+4]
  cmp   eax, 10
  sbb   eax, eax
  ret
  
section '__data__' data readable writable

  fmt db "%d ", 0
  timestart dd 0
  
section '_import_' import readable

  library msvcrt, 'msvcrt.dll',\
    kernel32, 'kernel32.dll'
    
  import msvcrt,\
    printf, 'printf'
    
  include '%FASMINC%/api/kernel32.inc'    
and... the results look strange to me: 11578 13250 13250 11579 11593 (milliseconds, in the order my1, my2, locos1, borsucs, locos2). After a simple change in the testing code -
Code:
mov [timestart], eax    
replaced by
Code:
mov edi, eax    
and
Code:
sub eax, [timestart]    
by
Code:
sub eax, edi    
results are very different: 22531 11594 13234 11594 9922. Why?

_________________
Sorry for my English...
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 03 May 2009, 10:02
Yes indeed, the standard problems with optimisation. It is not an easy thing to do well. There are many things that can affect timing. Things like: which CPU, memory speed, code alignment, operating system, cache size, other running tasks, and so on. You have embarked on a difficult task and there is no simple solution. And any solution you may find will likely only work for your system setup and within that test section only.

One question you should ask yourself is "Is it really going to save me enough time compared to how much time I spend optimising it?" If the answer is yes then go right ahead and optimise it, but one thing to remember is that an isolated piece of code cannot be timed properly as a stand alone section unless you take some very special precautions. This is because the rest of the program, and the rest of the OS, will strongly affect the timings.
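
One common way to reduce (not eliminate) that noise is to repeat the whole measurement several times and keep the fastest run, while making sure the compiler cannot discard the work. The C sketch below shows the shape of such a harness; `time_once`, `best_of` and the choice of `clock()` are assumptions for illustration, not what was used in this thread.

```c
#include <stdint.h>
#include <time.h>

/* Function under test; any variant from this thread would do. */
static int is_digit(uint32_t c)
{
    return (uint32_t)(c - '0') <= 9u;
}

/* Runs the loop `iters` times and returns elapsed CPU time in seconds.
   The results are summed into *sink so the calls cannot be optimised away. */
static double time_once(uint32_t iters, volatile uint32_t *sink)
{
    clock_t start = clock();
    uint32_t acc = 0;
    for (uint32_t i = 0; i < iters; i++)
        acc += (uint32_t)is_digit(i & 0xFFu);   /* varying input */
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Repeats the measurement and keeps the minimum, which is usually the
   run least disturbed by other activity on the machine. */
static double best_of(int runs, uint32_t iters, volatile uint32_t *sink)
{
    double best = time_once(iters, sink);
    for (int r = 1; r < runs; r++) {
        double t = time_once(iters, sink);
        if (t < best)
            best = t;
    }
    return best;
}
```

Even then, the number describes only this loop on this machine; an optimising compiler may inline or vectorise the call, so the result says little about the function's cost inside a real program.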
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 03 May 2009, 14:52
revolution wrote:
You might want to consider this instead:
Code:
push [value]
call IsDigit
add esp,4
cmp eax,0
jcc somewhere
...    
Why cmp eax, 0 and not test eax, eax? :?

By the way, if you want to use jcc directly immediately after the function without storing the result in eax, use this:

Code:
mov   eax, -'0'
add   eax, [esp+4]
cmp   eax, 10
retn  4    
This will set the CARRY flag if it's inside the range (cmp eax, 10 sets CF when eax is below 10 as an unsigned value) and clear it otherwise. ;)
Notice that this will NOT set eax to 0 or -1. eax will represent the "parameter - '0'" in this case.

So you can use something like this:
Code:
push [value]
call IsDigit
jnc NotDigit    
but even better would be to pass eax directly as a parameter, instead of pushing it on the stack -- if you use assembly that is, and a custom calling convention ;)

_________________
Previously known as The_Grey_Beast
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20448
Location: In your JS exploiting you and your system
revolution 03 May 2009, 15:12
Borsuc wrote:
Why cmp eax, 0 and not test eax, eax?
Okay, that will suffice also. Just a holdover from my ARM coding over the last several months. In ARM, cmp has no disadvantage.

BTW: If you want it to be "faster" (in whatever context you are choosing to define that) then make it a macro and forgo the call/ret overhead ...
Code:
macro IsDigit value {
  movzx eax,byte[value]
  sub eax,'0'
  cmp eax,10
}    
... but watch your cache size constraint; in some situations it might hurt more than it helps.
manfred



Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred 03 May 2009, 15:26
Thanks for replies!
By the way - what can I do with that function:
Code:
mov eax, [esp+4]
cmp eax, 'A'
jb .false
cmp eax, 'Z'
ja .checklower
jmp .true
.checklower:
cmp eax, 'a'
jb .false
cmp eax, 'z'
ja .false
.true:
mov eax, 1
ret
.false:
xor eax, eax
ret    
? Is this good:
Code:
;fastcall, return by cf
and eax, 0DFh
sub eax, 'A'
cmp eax, 26
ret    
?
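
One way to check a shortcut like this is to compare it exhaustively against the straightforward version. The C sketch below does that; the function names are made up for illustration, and it models only the arithmetic, not the flag-based return.

```c
#include <stdint.h>

/* Reference: the two-range test from the longer function. */
static int is_alpha_ranges(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Model of "and eax, 0DFh / sub eax, 'A' / cmp eax, 26". */
static int is_alpha_masked(uint32_t c)
{
    return (uint32_t)((c & 0xDFu) - 'A') < 26u;
}

/* Counts inputs in [0, limit) where the two versions disagree. */
static uint32_t count_mismatches(uint32_t limit)
{
    uint32_t n = 0;
    for (uint32_t c = 0; c < limit; c++)
        n += (uint32_t)(is_alpha_ranges(c) != is_alpha_masked(c));
    return n;
}
```

Under these assumptions the two agree for every byte value (0 to 255). They differ once upper bits are set, because the mask 0DFh also clears bits 8 to 31: 141h is rejected by the range version but accepted by the masked one. Masking with 0FFFFFFDFh instead would clear only the case bit.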

_________________
Sorry for my English...
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 03 May 2009, 15:31
manfred, I've tested with the following code:
Code:
format PE console 4.0

entry _start

include 'win32ax.inc'

section '__text__' code readable executable

macro tester func
{
  local ..loop

  invoke Sleep, 1000

  xor eax, eax
  cpuid

  call [GetTickCount]
  mov [timestart], eax
  mov ebx, $80000000

  align 16
  ..loop:
    push ebx ; Instead of push '0', to "sabotage" the branch predictor at my1 a bit
    call func
    add esp, 4
    sub ebx, 1
    jnz ..loop

; Serialize
  xor eax, eax
  cpuid

  call [GetTickCount]
  sub eax, [timestart]
  push eax
  call @f
  db `func, 0
@@:
  push fmt
  call [printf]
  add esp, 12
  align 16
}

_start:

  invoke  GetCurrentProcess
  invoke  SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
  invoke  GetCurrentThread
  invoke  SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL

  tester my1
  tester my2
  tester locos1
  tester borsucs
  tester locos2
  xor eax, eax
  ret
  
align 16
my1:
  mov eax, [esp+4]
  cmp al, '0'
  jb .false
  cmp al, '9'
  ja .false
  mov eax, 1
  ret
  .false:
  xor eax, eax
  ret
  
align 16
my2:
  mov edx, [esp+4]
  xor ecx, ecx
  mov eax, 1
  cmp edx, '0'
  cmovb eax, ecx
  cmp edx, '9'
  cmova eax, ecx
  ret
  
align 16
locos1:
  mov   eax, [esp+4]
  sub   eax, '0'
  cmp   eax, 9
  setbe al
  movzx eax, al
  ret
  
align 16
borsucs:
  mov   eax, [esp+4]
  sub   eax, '0'
  cmp   eax, 10
  sbb   eax, eax
  ret
  
align 16
locos2:
  mov   eax, -'0'
  add   eax, [esp+4]
  cmp   eax, 10
  sbb   eax, eax
  ret
  
section '__data__' data readable writable

  fmt db "%s: ", "%dms", 10, 0
  timestart dd 0
  
section '_import_' import readable

  library msvcrt, 'msvcrt.dll',\
    kernel32, 'kernel32.dll'
    
  import msvcrt,\
    printf, 'printf'
    
  include 'api/kernel32.inc'    


Results with an Athlon64 (Venice):

Code:
C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7563ms
locos1: 7562ms
borsucs: 6484ms
locos2: 6468ms

C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7563ms
locos1: 7563ms
borsucs: 6485ms
locos2: 6485ms

C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7562ms
locos1: 7562ms
borsucs: 6469ms
locos2: 6485ms    


It looks like my2 == locos1 and borsucs == locos2 in terms of timing.
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 09 Jun 2009, 10:33
Borsuc wrote:
revolution wrote:
You might want to consider this instead:
Code:
push [value]
call IsDigit
add esp,4
cmp eax,0
jcc somewhere
...    
Why cmp eax, 0 and not test eax, eax? :?

By the way, if you want to use jcc directly immediately after the function without storing the result in eax, use this:

Code:
mov   eax, -'0'
add   eax, [esp+4]
cmp   eax, 10
retn  4    
This will set the CARRY flag if it's inside the range (cmp eax, 10 sets CF when eax is below 10 as an unsigned value) and clear it otherwise. ;)
Notice that this will NOT set eax to 0 or -1. eax will represent the "parameter - '0'" in this case.

So you can use something like this:
Code:
push [value]
call IsDigit
jnc NotDigit    
but even better would be to pass eax directly as a parameter, instead of pushing it on the stack -- if you use assembly that is, and a custom calling convention ;)
Or fastcall
Nikolay Petrov



Joined: 22 Apr 2004
Posts: 101
Location: Bulgaria
Nikolay Petrov 29 Jun 2009, 22:32
Linear algorithms are always faster.
for example:
Code:
align 16
isdec:
    movzx   eax, byte [esp+4]
    movzx   eax, byte [_is_dec_table+eax]
    ret

align 16
_is_dec_table:
    db 48 dup(0)
    db 10 dup(1)
    db 198 dup(0)    
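
The same table lookup can be sketched in C as follows. This is only an illustration: the table is filled at run time by a hypothetical `init_is_dec_table` helper rather than laid out statically as above.

```c
#include <stdint.h>

static uint8_t is_dec_table[256];

/* Fills the table once: 1 for '0'..'9', 0 for every other byte value. */
static void init_is_dec_table(void)
{
    for (int i = 0; i < 256; i++)
        is_dec_table[i] = (uint8_t)(i >= '0' && i <= '9');
}

/* Only the low byte indexes the table, like "movzx eax, byte [esp+4]". */
static int is_dec(uint32_t c)
{
    return is_dec_table[c & 0xFFu];
}
```

The lookup replaces the arithmetic with a memory access, so its speed depends on the 256-byte table staying in cache.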

_________________
regards
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 30 Jun 2009, 01:23
Nikolay Petrov wrote:
Linear algorithms are always faster.
Always?


align 16
isn't0:
ret


align 16
isn't1:
dec eax
ret

align 16
isn't-1:
inc eax
ret

align 16
isodd:
and eax,1
ret

align 16
issigned:
and eax,1 shl 31 ; keep only the sign bit (bit 31)
ret


Can you make all of those faster with linear algorithms? :P


How about SHA-512?
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
