flat assembler
Message board for the users of flat assembler.
jcc vs cmov - which is faster?
LocoDelAssembly 02 May 2009, 20:20
Sorry for not answering your question; I'll just add an extra variant:
Code:
    mov eax, [esp+4]
    sub eax, '0'
    cmp eax, 9
    setbe al
    movzx eax, al
    ret
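For anyone who wants to cross-check the trick outside an assembler, here is a minimal C sketch of the same idea. It is only a model of the listing, not the poster's code: the name `is_digit_setbe` is made up for illustration, and it assumes a 32-bit argument as in the assembly.

```c
#include <stdint.h>

/* Hypothetical C model of: sub eax, '0' / cmp eax, 9 / setbe al / movzx eax, al.
   Values below '0' wrap around to huge unsigned numbers after the subtraction,
   so a single unsigned compare checks both bounds at once. */
static int is_digit_setbe(uint32_t c)
{
    return (uint32_t)(c - '0') <= 9u;   /* 1 for '0'..'9', 0 otherwise */
}
```

The point of the subtraction is that one "below or equal" test replaces the two compares and two branches of a straightforward range check.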
manfred 02 May 2009, 20:46
Huh, I didn't know there were set* instructions... I think your code must be much faster than my two versions. How many cycles does it take?
_________________
Sorry for my English...
Borsuc 02 May 2009, 20:50
An exact figure is impossible to determine on current processors (I mean the number of clock cycles, not the general idea that it is faster, which it most likely is). Test it instead: run it in a huge loop, so it repeats many times.
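A rough sketch of such a "huge loop" test, written in C so it can be tried without a Windows toolchain. Everything here is an assumption for illustration: the names `is_digit_model` and `time_loop_ms` are hypothetical, `clock()` measures CPU time with coarse resolution, and an optimizing compiler may still reshape the loop, so treat the numbers as a relative hint at best.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical function under test: 1 for '0'..'9', 0 otherwise. */
static int is_digit_model(uint32_t c)
{
    return (uint32_t)(c - '0') <= 9u;
}

/* Calls fn `iterations` times with varying byte values and returns the
   elapsed CPU time in milliseconds. The running sum is written through
   `sink` so the calls cannot simply be discarded as dead code. */
static double time_loop_ms(int (*fn)(uint32_t), uint32_t iterations,
                           volatile uint32_t *sink)
{
    uint32_t sum = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; ++i)
        sum += (uint32_t)fn(i & 0xFFu);
    clock_t end = clock();
    *sink = sum;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_loop_ms(is_digit_model, 100000000u, &sink)` for each candidate and comparing the results gives the same kind of relative ranking the loop test is meant to produce; the absolute values depend on the machine and the compiler.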
Borsuc 02 May 2009, 21:02
Also what about this?
Code:
    mov eax, [esp+4]
    sub eax, '0'
    cmp eax, 10
    sbb eax, eax
    ret
_________________
Previously known as The_Grey_Beast
LocoDelAssembly 02 May 2009, 21:02
It's somewhat hard to know these days, but I believe that on my Athlon64 it would take 7 cycles (three for "mov eax, [esp+4]"), not counting RET.
BTW, if your function is allowed to return any non-zero value as true instead of strictly one, then this may be faster:
Code:
    mov eax, -'0'
    ; PERHAPS the CPU will start executing the ADD along with "mov eax, -'0'",
    ; since the EAX value is not needed during the bus read cycle
    ; (i.e. this instruction is not just one micro-op)
    add eax, [esp+4]
    cmp eax, 10
    sbb eax, eax    ; -1 if [esp+4] is in range, 0 otherwise
    ret
[edit]hehe, Borsuc was seconds faster than me (but check the small difference)[/edit]
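To make the "-1 or 0" result concrete, here is a small C sketch of what the cmp/sbb pair computes. It is a model under assumptions, not the posters' code: `is_digit_mask` is a hypothetical name, and the carry flag is represented by an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical C model of: cmp eax, 10 / sbb eax, eax (after subtracting '0'). */
static int32_t is_digit_mask(uint32_t c)
{
    /* cmp eax, 10 sets the carry flag when (c - '0') is below 10 (unsigned). */
    uint32_t carry = (uint32_t)(c - '0') < 10u;
    /* sbb eax, eax computes eax - eax - carry, i.e. 0 or -1. */
    return -(int32_t)carry;
}
```

Because the true value is all ones, the result can also serve directly as a bit mask, for example `is_digit_mask(c) & value` keeps `value` only when `c` is a digit.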
bitshifter 03 May 2009, 04:41
If this is stdcall, shouldn't it use this?
Code:
    retn 4
_________________
Coding a 3D game engine with fasm is like trying to eat an elephant, you just have to keep focused and take it one 'byte' at a time.
LocoDelAssembly 03 May 2009, 05:03
Yes, if it is stdcall, but it is OK for cdecl.
revolution 03 May 2009, 06:01
Just for giggles, here is something similar in ARM:
Code:
    cmp r0,'0'          ;carry is set if r0 >= '0'
    rsbcss r0,r0,'9'    ;carry is set if '0' <= r0 <= '9'
    movcs r0,1          ;return 1 if within range
    movcc r0,0          ;return 0 if out of range
    mov pc,lr
bitshifter 03 May 2009, 07:53
LocoDelAssembly wrote: Yes, if it is stdcall, but it is OK for cdecl.
Code:
    push [value]
    call IsDigit
    jcc ...
    add esp,4
Oh, and by the way, these are some nice methods, guys.
_________________
Coding a 3D game engine with fasm is like trying to eat an elephant, you just have to keep focused and take it one 'byte' at a time.
revolution 03 May 2009, 08:17
bitshifter wrote: So to invoke it as ccall i could do like this?
Code:
    push [value]
    call IsDigit
    add esp,4
    cmp eax,0
    jcc somewhere
    ...
manfred 03 May 2009, 09:50
I've tested the speed of these functions with this code:
Code:
    format PE console 4.0
    entry _start
    include '%FASMINC%/win32ax.inc'

    section '__text__' code readable executable

    macro tester func
    {
        local ..loop
        call [GetTickCount]
        mov [timestart], eax
        mov ebx, 0FFFFFFFFh
      ..loop:
        push '0'
        call func
        add esp, 4
        sub ebx, 1
        jnz ..loop
        call [GetTickCount]
        sub eax, [timestart]
        push eax
        push fmt
        call [printf]
        add esp, 8
    }

    _start:
        tester my1
        tester my2
        tester locos1
        tester borsucs
        tester locos2
        xor eax, eax
        ret

    align 16
    my1:
        mov eax, [esp+4]
        cmp al, '0'
        jb .false
        cmp al, '9'
        ja .false
        mov eax, 1
        ret
      .false:
        xor eax, eax
        ret

    align 16
    my2:
        mov edx, [esp+4]
        xor ecx, ecx
        mov eax, 1
        cmp edx, '0'
        cmovb eax, ecx
        cmp edx, '9'
        cmova eax, ecx
        ret

    align 16
    locos1:
        mov eax, [esp+4]
        sub eax, '0'
        cmp eax, 9
        setbe al
        movzx eax, al
        ret

    align 16
    borsucs:
        mov eax, [esp+4]
        sub eax, '0'
        cmp eax, 10
        sbb eax, eax
        ret

    align 16
    locos2:
        mov eax, -'0'
        add eax, [esp+4]
        cmp eax, 10
        sbb eax, eax
        ret

    section '__data__' data readable writable
    fmt db "%d ", 0
    timestart dd 0

    section '_import_' import readable
    library msvcrt, 'msvcrt.dll',\
            kernel32, 'kernel32.dll'
    import msvcrt,\
           printf, 'printf'
    include '%FASMINC%/api/kernel32.inc'
Code:
    mov [timestart], eax
Code:
    mov edi, eax
Code:
    sub eax, [timestart]
Code:
    sub eax, edi
_________________
Sorry for my English...
revolution 03 May 2009, 10:02
Yes indeed, the standard problems with optimisation. It is not an easy thing to do well. There are many things that can affect timing: which CPU, memory speed, code alignment, operating system, cache size, other running tasks, and so on. You have embarked on a difficult task and there is no simple solution. And any solution you find will likely hold only for your system setup, and only within that test section.
One question you should ask yourself is "Is it really going to save me enough time compared to how much time I spend optimising it?" If the answer is yes, then go right ahead and optimise it, but remember that an isolated piece of code cannot be timed properly as a stand-alone section unless you take some very special precautions. This is because the rest of the program, and the rest of the OS, will strongly affect the timings.
Borsuc 03 May 2009, 14:52
revolution wrote: You might want to consider this instead:
By the way, if you want to use jcc immediately after the call, without storing the result in eax, use this:
Code:
    mov eax, -'0'
    add eax, [esp+4]
    cmp eax, 10
    retn 4
Notice that this will NOT set eax to 0 or -1. eax will hold "parameter - '0'" in this case. So you can use something like this:
Code:
    push [value]
    call IsDigit
    jnc NotDigit    ; CF is set when the value is a digit, clear otherwise
_________________
Previously known as The_Grey_Beast
revolution 03 May 2009, 15:12
Borsuc wrote: Why cmp eax, 0 and not test eax, eax?
BTW: If you want it to be "faster" (in whatever context you are choosing to define that), then make it a macro and forgo the call/ret overhead ...
Code:
    macro IsDigit value {
        movzx eax,byte[value]
        sub eax,'0'
        cmp eax,10
    }
manfred 03 May 2009, 15:26
Thanks for the replies!
By the way, what can I do with this function?
Code:
    mov eax, [esp+4]
    cmp eax, 'A'
    jb .false
    cmp eax, 'Z'
    ja .checklower
    jmp .true
  .checklower:
    cmp eax, 'a'
    jb .false
    cmp eax, 'z'
    ja .false
  .true:
    mov eax, 1
    ret
  .false:
    xor eax, eax
    ret
Code:
    ;fastcall, return by cf
    and eax, 0DFh
    sub eax, 'A'
    cmp eax, 26
    ret
_________________
Sorry for my English...
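A quick way to convince yourself that the short version agrees with the branchy one is to compare them over every byte value. The C sketch below does that; it is a model under assumptions, with hypothetical names (`is_alpha_branchy`, `is_alpha_folded`, `count_mismatches`), and it reports the carry-flag result of the second listing as 1 or 0.

```c
#include <stdint.h>

/* Hypothetical model of the branchy listing: 1 for 'A'..'Z' or 'a'..'z'. */
static int is_alpha_branchy(uint32_t c)
{
    if (c < 'A') return 0;
    if (c <= 'Z') return 1;
    if (c < 'a') return 0;
    return c <= 'z';
}

/* Hypothetical model of: and eax, 0DFh / sub eax, 'A' / cmp eax, 26.
   Clearing bit 5 folds lowercase onto uppercase; the result is 1 exactly
   when the compare would set the carry flag. */
static int is_alpha_folded(uint32_t c)
{
    return (uint32_t)((c & 0xDFu) - 'A') < 26u;
}

/* Counts the byte values (0..255) on which the two versions disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t c = 0; c < 256; ++c)
        if (is_alpha_branchy(c) != is_alpha_folded(c))
            ++mismatches;
    return mismatches;
}
```

One difference is worth keeping in mind: the AND with 0DFh also discards the upper 24 bits, so the two versions agree for byte values but can differ for arguments above 255 (0x141 is accepted by the folded version and rejected by the branchy one).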
LocoDelAssembly 03 May 2009, 15:31
manfred, I've tested with the following code:
Code:
    format PE console 4.0
    entry _start
    include 'win32ax.inc'

    section '__text__' code readable executable

    macro tester func
    {
        local ..loop
        invoke Sleep, 1000
        xor eax, eax
        cpuid
        call [GetTickCount]
        mov [timestart], eax
        mov ebx, $80000000
        align 16
      ..loop:
        push ebx ; Instead of push 0 to "sabotage" the branch predictor at my1 a bit
        call func
        add esp, 4
        sub ebx, 1
        jnz ..loop
        ; Serialize
        xor eax, eax
        cpuid
        call [GetTickCount]
        sub eax, [timestart]
        push eax
        call @f
        db `func, 0
      @@:
        push fmt
        call [printf]
        add esp, 12
        align 16
    }

    _start:
        invoke GetCurrentProcess
        invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
        invoke GetCurrentThread
        invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL
        tester my1
        tester my2
        tester locos1
        tester borsucs
        tester locos2
        xor eax, eax
        ret

    align 16
    my1:
        mov eax, [esp+4]
        cmp al, '0'
        jb .false
        cmp al, '9'
        ja .false
        mov eax, 1
        ret
      .false:
        xor eax, eax
        ret

    align 16
    my2:
        mov edx, [esp+4]
        xor ecx, ecx
        mov eax, 1
        cmp edx, '0'
        cmovb eax, ecx
        cmp edx, '9'
        cmova eax, ecx
        ret

    align 16
    locos1:
        mov eax, [esp+4]
        sub eax, '0'
        cmp eax, 9
        setbe al
        movzx eax, al
        ret

    align 16
    borsucs:
        mov eax, [esp+4]
        sub eax, '0'
        cmp eax, 10
        sbb eax, eax
        ret

    align 16
    locos2:
        mov eax, -'0'
        add eax, [esp+4]
        cmp eax, 10
        sbb eax, eax
        ret

    section '__data__' data readable writable
    fmt db "%s: ", "%dms", 10, 0
    timestart dd 0

    section '_import_' import readable
    library msvcrt, 'msvcrt.dll',\
            kernel32, 'kernel32.dll'
    import msvcrt,\
           printf, 'printf'
    include 'api/kernel32.inc'
Results with an Athlon64 (Venice):
Code:
    C:\Documents and Settings\Hernan\Escritorio>test.exe
    my1: 10531ms
    my2: 7563ms
    locos1: 7562ms
    borsucs: 6484ms
    locos2: 6468ms

    C:\Documents and Settings\Hernan\Escritorio>test.exe
    my1: 10531ms
    my2: 7563ms
    locos1: 7563ms
    borsucs: 6485ms
    locos2: 6485ms

    C:\Documents and Settings\Hernan\Escritorio>test.exe
    my1: 10531ms
    my2: 7562ms
    locos1: 7562ms
    borsucs: 6469ms
    locos2: 6485ms
It looks like my2 == locos1 and borsucs == locos2.
Azu 09 Jun 2009, 10:33
Borsuc wrote:
Nikolay Petrov 29 Jun 2009, 22:32
Linear algorithms are always faster.
For example:
Code:
    align 16
    isdec:
        movzx eax, byte [esp+4]
        movzx eax, byte [_is_dec_table+eax]
        ret

    align 16
    _is_dec_table:
        db 48 dup(0)
        db 10 dup(1)
        db 198 dup(0)
_________________
regards
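The same table lookup can be sketched in C for comparison. This is an illustrative model only: the names `is_dec_table` and `is_dec` are chosen to echo the listing, and the C99 designated initializers stand in for the three `db ... dup` lines (48 zeros, 10 ones, 198 zeros).

```c
#include <stdint.h>

/* 256-entry table: 1 at indices '0'..'9' (48..57), 0 everywhere else. */
static const uint8_t is_dec_table[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1
};

/* Like movzx eax, byte [esp+4]: only the low byte of the argument is used. */
static int is_dec(uint32_t arg)
{
    return is_dec_table[(uint8_t)arg];
}
```

The lookup has no branches and no compares, but it adds a dependent memory load and occupies 256 bytes of data, so whether it beats the arithmetic versions depends on whether the table stays in cache.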
Azu 30 Jun 2009, 01:23
Nikolay Petrov wrote: Linear algorithms are always faster.
Code:
    align 16
    isn't0:
        ret

    align 16
    isn't1:
        dec eax
        ret

    align 16
    isn't-1:
        inc eax
        ret

    align 16
    isodd:
        and eax,1
        ret

    align 16
    issigned:
        and eax,31 shl 8
        ret
Can you make all of those faster with linear algorithms? How about SHA-512?