flat assembler
Message board for the users of flat assembler.

jcc vs cmov - which is faster?
manfred

Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred
Hello!
I have a simple question: which function is faster, this one
Code:
```mov eax, [esp+4]
cmp al, '0'
jb .false
cmp al, '9'
ja .false
mov eax, 1
ret
.false:
xor eax, eax
ret    ```
or
Code:
```mov edx, [esp+4]
xor ecx, ecx
mov eax, 1
cmp edx, '0'
cmovb eax, ecx
cmp edx, '9'
cmova eax, ecx
ret    ```
?
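Before timing them, note that the two versions are not strictly equivalent: the first compares only AL, the second the full 32-bit EDX. The C model below (hypothetical names, written only as an illustration, not the posted code) shows that they agree for a zero-extended character but can differ when the upper bytes of the argument are not zero.

```c
#include <stdint.h>

/* Model of the jcc version: cmp al, ... looks only at bits 0-7. */
static uint32_t my1_model(uint32_t arg)
{
    uint8_t al = (uint8_t)arg;
    if (al < '0' || al > '9')   /* jb .false / ja .false */
        return 0;
    return 1;
}

/* Model of the cmov version: compares the whole 32-bit EDX (unsigned). */
static uint32_t my2_model(uint32_t arg)
{
    uint32_t eax = 1;
    if (arg < '0') eax = 0;     /* cmovb eax, ecx */
    if (arg > '9') eax = 0;     /* cmova eax, ecx */
    return eax;
}
```

For example, an argument of 135h has '5' in its low byte, so the first model returns 1 while the second returns 0.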

_________________
Sorry for my English...
02 May 2009, 19:37
LocoDelAssembly

Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Code:
```mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 9
setbe al
movzx eax, al
ret    ```
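The trick here is that after `sub eax, '0'` every value below '0' wraps around to a huge unsigned number, so one unsigned compare checks both bounds. A minimal C model of this sequence (the function name is hypothetical):

```c
#include <stdint.h>

/* Model of: sub eax, '0' / cmp eax, 9 / setbe al / movzx eax, al */
static uint32_t is_digit_setbe(uint32_t arg)
{
    uint32_t eax = arg - '0';   /* sub eax, '0' (wraps modulo 2^32) */
    return eax <= 9u;           /* setbe: 1 if below or equal (unsigned) */
}
```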
02 May 2009, 20:20
manfred

Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred
Huh, I didn't know there were set* instructions... I think your code must be much faster than both of mine. How many cycles does it take?

_________________
Sorry for my English...
02 May 2009, 20:46
Borsuc

Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
It's impossible to determine that on current processors (I mean the exact number of clock cycles, not the 'vague idea' that it is faster, which it most likely is). Try to test it: run it in a huge loop so it repeats many times.
02 May 2009, 20:50
Borsuc

Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc

Code:
```mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 10
sbb   eax, eax
ret    ```
Note that this gives -1 instead of 1. Can you use that instead?
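To see why `sbb eax, eax` yields -1 or 0, here is a C model of this sequence (the function name is hypothetical). `cmp eax, 10` sets CF exactly when eax < 10 as an unsigned value, and `sbb eax, eax` computes eax - eax - CF = -CF.

```c
#include <stdint.h>

/* Model of: sub eax, '0' / cmp eax, 10 / sbb eax, eax */
static uint32_t is_digit_sbb(uint32_t arg)
{
    uint32_t eax = arg - '0';   /* sub eax, '0' (wraps modulo 2^32) */
    uint32_t cf = eax < 10u;    /* CF after cmp eax, 10 (borrow means eax < 10) */
    return 0u - cf;             /* sbb eax, eax = -CF: 0xFFFFFFFF or 0 */
}
```

Because adding -'0' and subtracting '0' give the same result modulo 2^32, the `mov eax, -'0'` / `add eax, [esp+4]` variant produces the same EAX.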

_________________
Previously known as The_Grey_Beast
02 May 2009, 21:02
LocoDelAssembly

Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
It's hard to know these days, but I believe that on my Athlon64 it would take 7 cycles (three of them for "mov eax, [esp+4]"), not counting the RET.

BTW, if your function is allowed to return any non-zero value as true instead of strictly one, then this may be faster:
Code:
```mov   eax, -'0'
add   eax, [esp+4] ; PERHAPS the CPU will start executing this along with "mov eax, -'0'" since EAX value is not needed while in bus read cycle (i.e. this instruction is not just one micro-op)
cmp   eax, 10
sbb   eax, eax ; -1 if [esp+4] is in range, 0 otherwise
ret    ```

Heh, Borsuc beat me by seconds (but note the small difference).
02 May 2009, 21:02
bitshifter

Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
If this is stdcall, should it use
Code:
`retn 4    `
to clean up the parameter?

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
03 May 2009, 04:41
LocoDelAssembly

Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
Yes, if it is stdcall, but it is OK for cdecl.
03 May 2009, 05:03
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17621
revolution
Just for giggles, here is something similar in ARM:
Code:
```  cmp     r0,'0'                ;carry is set if r0 >= '0'
rsbcss  r0,r0,'9'     ;carry is set if '0' <= r0 <= '9'
movcs   r0,1            ;return 1 if within range
movcc   r0,0            ;return 0 if out of range
mov     pc,lr    ```
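On ARM the carry flag after a compare or subtraction means "no borrow" (set when the first operand is greater than or equal to the second, unsigned), the opposite of x86, where CF set means a borrow occurred. That is why the code above tests carry set for "in range". A C model of the sequence, as a sketch (the function name is hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the ARM range check; C is the ARM carry flag. */
static uint32_t arm_is_digit(uint32_t r0)
{
    bool c = r0 >= '0';              /* cmp r0,'0': C = no borrow of r0 - '0' */
    if (c)                           /* rsbcss executes only when C is set */
        c = (uint32_t)'9' >= r0;     /* rsbs r0,r0,'9': C = no borrow of '9' - r0 */
    return c ? 1u : 0u;              /* movcs r0,1 / movcc r0,0 */
}
```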
03 May 2009, 06:01
bitshifter

Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
LocoDelAssembly wrote:
Yes, if it is stdcall, but it is OK for cdecl.
So to invoke it as ccall, could I do it like this?
Code:
```push [value]
call IsDigit
jcc
...    ```

Oh, and by the way, these are some nice methods, guys.

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
03 May 2009, 07:53
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17621
revolution
bitshifter wrote:
So to invoke it as ccall, could I do it like this?
Code:
```push [value]
call IsDigit
jcc
...    ```
You might want to consider this instead:
Code:
```push [value]
call IsDigit
add esp, 4 ; cdecl: the caller removes the argument
cmp eax,0
jcc somewhere
...    ```
03 May 2009, 08:17
manfred

Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred
I've tested the speed of these functions with this code:
Code:
```format PE console 4.0

entry _start

include '%FASMINC%/win32ax.inc'

macro tester func
{
local ..loop
call [GetTickCount]
mov [timestart], eax
mov ebx, 0FFFFFFFFh
..loop:
push '0'
call func
add esp, 4 ; the tested functions end with a plain ret (cdecl), so drop the argument here
sub ebx, 1
jnz ..loop
call [GetTickCount]
sub eax, [timestart]
push eax
push fmt
call [printf]
add esp, 8 ; printf is cdecl
}

_start:
tester my1
tester my2
tester locos1
tester borsucs
tester locos2
xor eax, eax
ret

align 16
my1:
mov eax, [esp+4]
cmp al, '0'
jb .false
cmp al, '9'
ja .false
mov eax, 1
ret
.false:
xor eax, eax
ret

align 16
my2:
mov edx, [esp+4]
xor ecx, ecx
mov eax, 1
cmp edx, '0'
cmovb eax, ecx
cmp edx, '9'
cmova eax, ecx
ret

align 16
locos1:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 9
setbe al
movzx eax, al
ret

align 16
borsucs:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 10
sbb   eax, eax
ret

align 16
locos2:
mov   eax, -'0'
add   eax, [esp+4]
cmp   eax, 10
sbb   eax, eax
ret

fmt db "%d ", 0
timestart dd 0

library msvcrt, 'msvcrt.dll',\
kernel32, 'kernel32.dll'

import msvcrt,\
printf, 'printf'

include '%FASMINC%/api/kernel32.inc'    ```
And... the results look strange to me: 11578 13250 13250 11579 11593. After a simple change in the testing code, replacing `mov [timestart], eax` with `mov edi, eax` and `sub eax, [timestart]` with `sub eax, edi`, the results are very different: 22531 11594 13234 11594 9922. Why?

_________________
Sorry for my English...
03 May 2009, 09:50
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17621
revolution
Yes indeed, the standard problems with optimisation. It is not an easy thing to do well. There are many things that can affect timing. Things like: which CPU, memory speed, code alignment, operating system, cache size, other running tasks, and so on. You have embarked on a difficult task and there is no simple solution. And any solution you may find will likely only work for your system setup and within that test section only.

One question you should ask yourself is "Is it really going to save me enough time compared to how much time I spend optimising it?" If the answer is yes, then go right ahead and optimise it, but one thing to remember is that an isolated piece of code cannot be timed properly as a standalone section unless you take some very special precautions. This is because the rest of the program, and the rest of the OS, will strongly affect the timings.
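One of the simplest precautions is to repeat a measurement several times and keep the fastest run. A minimal sketch in portable C, assuming standard `clock()` is precise enough for long runs; `time_once`, `time_best_of` and `is_digit_ref` are hypothetical names, and this does not address alignment or the other effects listed above:

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t (*check_fn)(uint32_t);

/* Reference digit test used as the function under measurement. */
static uint32_t is_digit_ref(uint32_t c)
{
    return c - '0' < 10u;
}

/* Time one run of `iterations` calls; returns elapsed clock() ticks.
   The volatile sink keeps the compiler from discarding the calls. */
static clock_t time_once(check_fn fn, uint32_t iterations)
{
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++)
        sink += fn(i & 0xFFu);          /* vary the input between calls */
    clock_t end = clock();
    (void)sink;
    return end - start;
}

/* Repeat and keep the fastest run: interruptions, other processes and
   cold caches mostly add time, so the minimum is the least disturbed sample. */
static clock_t time_best_of(check_fn fn, uint32_t iterations, int runs)
{
    clock_t best = time_once(fn, iterations);
    for (int r = 1; r < runs; r++) {
        clock_t t = time_once(fn, iterations);
        if (t < best)
            best = t;
    }
    return best;
}
```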
03 May 2009, 10:02
Borsuc

Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Borsuc
revolution wrote:
You might want to consider this instead:
Code:
```push [value]
call IsDigit
add esp, 4 ; cdecl: the caller removes the argument
cmp eax,0
jcc somewhere
...    ```
Why cmp eax, 0 and not test eax, eax?

By the way, if you want to use a jcc immediately after the call, without first turning the result into a value in eax, use this:

Code:
```mov   eax, -'0'
add   eax, [esp+4]
cmp   eax, 10
retn  4    ```
This will set the CARRY flag if the argument is inside the range '0'..'9' and clear it otherwise.
Notice that this will NOT set eax to 0 or -1; eax will hold "parameter - '0'" in this case.

So you can use something like this:
Code:
```push [value]
call IsDigit
jnc NotDigit    ```
But even better would be to pass the value directly in eax instead of pushing it on the stack. That works if the caller is written in assembly too and you use a custom calling convention.

_________________
Previously known as The_Grey_Beast
03 May 2009, 14:52
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17621
revolution
Borsuc wrote:
Why cmp eax, 0 and not test eax, eax?
Okay, that works too. It's just a holdover from my ARM coding over the last several months; on ARM, cmp has no disadvantage.

BTW: If you want it to be "faster" (in whatever context you are choosing to define that) then make it a macro and forgo the call/ret overhead ...
Code:
```macro IsDigit value {
movzx eax,byte[value]
sub eax,'0'
cmp eax,10
}    ```
... but watch your cache size constraints; in some situations it might hurt more than it helps.
03 May 2009, 15:12
manfred

Joined: 28 Feb 2009
Posts: 43
Location: Racibórz, Poland
manfred
Thanks for replies!
By the way, what can I do with this function:
Code:
```mov eax, [esp+4]
cmp eax, 'A'
jb .false
cmp eax, 'Z'
ja .checklower
jmp .true
.checklower:
cmp eax, 'a'
jb .false
cmp eax, 'z'
ja .false
.true:
mov eax, 1
ret
.false:
xor eax, eax
ret    ```
? Is this good:
Code:
```;fastcall, return by cf
and eax, 0DFh
sub eax, 'A'
cmp eax, 26
ret    ```
?
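A quick way to check the `and eax, 0DFh` version is to compare it with the two-range test for every byte value. This C sketch (hypothetical names) models both. Clearing bit 5 (20h) folds 'a'..'z' onto 'A'..'Z' and maps no other byte into that range, but the `and` also discards bits 8 to 31, so the two versions differ when the argument has nonzero upper bytes.

```c
#include <stdint.h>

/* Two-range test from the first listing (compares the full 32-bit value). */
static int is_alpha_ranges(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Model of: and eax, 0DFh / sub eax, 'A' / cmp eax, 26
   The result is the carry flag: 1 when eax < 26 (unsigned). */
static int is_alpha_mask(uint32_t c)
{
    uint32_t eax = c & 0xDFu;   /* clear bit 5 and bits 8..31 */
    eax -= 'A';                 /* values below 'A' wrap to huge numbers */
    return eax < 26u;
}

/* Returns 1 if both versions agree for every byte value 0..255. */
static int agrees_on_all_bytes(void)
{
    for (uint32_t c = 0; c < 256u; c++)
        if (is_alpha_ranges(c) != is_alpha_mask(c))
            return 0;
    return 1;
}
```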

_________________
Sorry for my English...
03 May 2009, 15:26
LocoDelAssembly

Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
manfred, I've tested with the following code:
Code:
```format PE console 4.0

entry _start

include 'win32ax.inc'

macro tester func
{
local ..loop

invoke Sleep, 1000

xor eax, eax
cpuid

call [GetTickCount]
mov [timestart], eax
mov ebx, $80000000

align 16
..loop:
push ebx ; Instead of push 0 to "sabotage" the branch predictor at my1 a bit
call func
add esp, 4 ; the tested functions return with a plain ret (cdecl)
sub ebx, 1
jnz ..loop

; Serialize
xor eax, eax
cpuid

call [GetTickCount]
sub eax, [timestart]
push eax
call @f
db `func, 0
@@:
push fmt
call [printf]
add esp, 12 ; printf is cdecl: drop fmt, the name pointer and the time
align 16
}

_start:

invoke  GetCurrentProcess
invoke  SetPriorityClass, eax, REALTIME_PRIORITY_CLASS

tester my1
tester my2
tester locos1
tester borsucs
tester locos2
xor eax, eax
ret

align 16
my1:
mov eax, [esp+4]
cmp al, '0'
jb .false
cmp al, '9'
ja .false
mov eax, 1
ret
.false:
xor eax, eax
ret

align 16
my2:
mov edx, [esp+4]
xor ecx, ecx
mov eax, 1
cmp edx, '0'
cmovb eax, ecx
cmp edx, '9'
cmova eax, ecx
ret

align 16
locos1:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 9
setbe al
movzx eax, al
ret

align 16
borsucs:
mov   eax, [esp+4]
sub   eax, '0'
cmp   eax, 10
sbb   eax, eax
ret

align 16
locos2:
mov   eax, -'0'
add   eax, [esp+4]
cmp   eax, 10
sbb   eax, eax
ret

fmt db "%s: ", "%dms", 10, 0
timestart dd 0

library msvcrt, 'msvcrt.dll',\
kernel32, 'kernel32.dll'

import msvcrt,\
printf, 'printf'

include 'api/kernel32.inc'    ```

Results with an Athlon64 (Venice):

Code:
```C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7563ms
locos1: 7562ms
borsucs: 6484ms
locos2: 6468ms

C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7563ms
locos1: 7563ms
borsucs: 6485ms
locos2: 6485ms

C:\Documents and Settings\Hernan\Escritorio>test.exe
my1: 10531ms
my2: 7562ms
locos1: 7562ms
borsucs: 6469ms
locos2: 6485ms    ```

So my2 and locos1 run at the same speed here, and so do borsucs and locos2.
03 May 2009, 15:31
Azu

Joined: 16 Dec 2008
Posts: 1159
Azu
Borsuc wrote:
but even better would be to pass eax directly as a parameter, instead of pushing it on the stack -- if you use assembly that is, and a custom calling convention
Or fastcall
09 Jun 2009, 10:33
Nikolay Petrov

Joined: 22 Apr 2004
Posts: 101
Location: Bulgaria
Nikolay Petrov
Linear algorithms are always faster.
For example:
Code:
```align 16
isdec:
movzx   eax, byte [esp+4]
movzx   eax, byte [_is_dec_table+eax]
ret

align 16
_is_dec_table:
db 48 dup(0)
db 10 dup(1)
db 198 dup(0)    ```
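The same lookup-table idea in C, as a sketch (names hypothetical): C99 designated initializers play the role of the three `db` lines, and the cast to `uint8_t` mirrors `movzx eax, byte [esp+4]`, so only the low byte of the argument is used.

```c
#include <stdint.h>

/* 256-entry table: 1 for '0'..'9', 0 everywhere else. */
static const uint8_t is_dec_table[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
};

/* One load, no branches; indexes by the low byte only. */
static uint32_t isdec(uint32_t arg)
{
    return is_dec_table[(uint8_t)arg];
}
```

The trade-off is a 256-byte table that has to be in the data cache to be fast; a cache miss on the table can cost far more than the compare it replaces.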

_________________
regards
29 Jun 2009, 22:32
Azu

Joined: 16 Dec 2008
Posts: 1159
Azu
Nikolay Petrov wrote:
Linear algorithms are always faster.
Always?

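Code:
```align 16
isnt0:          ; "isn't 0": eax is nonzero unless it was 0
ret

align 16
isnt1:          ; zero only if eax was 1
dec eax
ret

align 16
isnt_minus1:    ; zero only if eax was -1
inc eax
ret

align 16
isodd:
and eax,1
ret

align 16
issigned:       ; nonzero only if the sign bit (bit 31) is set
and eax,1 shl 31
ret    ```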

Can you make all of those faster with linear algorithms?
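As a rough illustration of the point, here are C models of these routines (names adapted to valid identifiers; a model, not fasm code). Each is a single operation with no branch, and each returns a value that is nonzero exactly when the named property holds, which is all a following jz/jnz needs.

```c
#include <stdint.h>

static uint32_t isnt0(uint32_t eax)       { return eax; }              /* ret */
static uint32_t isnt1(uint32_t eax)       { return eax - 1u; }         /* dec eax */
static uint32_t isnt_minus1(uint32_t eax) { return eax + 1u; }         /* inc eax */
static uint32_t isodd(uint32_t eax)       { return eax & 1u; }         /* and eax, 1 */
static uint32_t issigned(uint32_t eax)    { return eax & 0x80000000u; } /* keep only bit 31 */
```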

30 Jun 2009, 01:23