flat assembler
Message board for the users of flat assembler.
hopcode 22 Dec 2009, 02:57
Hello everybody,

Here are different techniques for this conditional:

Code:
    IF REG#0
        set REG=CONST_VALUE
    ELSE
        (REG=0)
    END IF

.. or ...

    IF REG=0
        set REG=CONST_VALUE
    ELSE
        set REG=0
    END IF

Sometimes REG holds the return value of a previous function, and it is used to set the flags.

These are the different approaches (as macros):

Simple cmovz/cmovnz, or see http://board.flatassembler.net/topic.php?p=106575#106575 for another compact, complete version, but using a 3rd register.

Code:
    CONST_VALUE = 100h

    macro @setA reg,const {
        test reg,reg
        mov ecx,const
        cmovnz reg,ecx
    }

Variant B, here only for reg#0:

Code:
    macro @setB reg,const {
        mov ecx,const
        neg reg
        mov edx,ecx
        sbb reg,reg
        sub ecx,reg
        and reg,ecx
        and reg,edx
        nop
    }

Variant for eax, using cmpxchg:

Code:
    macro @seteax const,fbool {
        xor edx,edx
        mov ecx,const
        cmpxchg edx,ecx
        match =TRUE,fbool \{
            xor eax,ecx
            xor eax,edx
        \}
        match =FALSE,fbool \{
            or eax,ecx
            and eax,edx
        \}
    }

Variant C (my favorite):

Code:
    macro @setC reg,const,fbool {
        neg reg
        mov ecx,const
        sbb reg,reg
        sub reg,ecx
        match =TRUE,fbool \{ not reg \}
        and reg,ecx
        nop
    }

Any other (better) ideas?

hopcode[mrk]

Last edited by hopcode on 25 Dec 2009, 04:30; edited 1 time in total
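As a rough illustration of what the pseudocode asks for, here is a small C sketch of the branchless mask idea behind the neg/sbb/and sequences discussed in this thread. The function names are hypothetical and do not come from the thread, and the C only models the arithmetic; it says nothing about which instructions a compiler would emit.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones if x != 0, else 0. This is the value that
   `neg reg` followed by `sbb reg,reg` leaves in reg. */
uint32_t mask_if_nonzero(uint32_t x) {
    return 0u - (uint32_t)(x != 0);
}

/* IF REG#0 set REG=CONST ELSE REG=0 */
uint32_t set_if_nonzero(uint32_t reg, uint32_t konst) {
    return mask_if_nonzero(reg) & konst;
}

/* IF REG=0 set REG=CONST ELSE REG=0 */
uint32_t set_if_zero(uint32_t reg, uint32_t konst) {
    return ~mask_if_nonzero(reg) & konst;
}

/* Spot checks using values like those tried in the thread. */
void check_mask_select(void) {
    assert(set_if_nonzero(3u, 0x100u) == 0x100u);
    assert(set_if_nonzero(0u, 0x100u) == 0u);
    assert(set_if_zero(0u, 0x100u) == 0x100u);
    assert(set_if_zero(0x80000000u, 0x100u) == 0u);
}
```

Calling check_mask_select() from a main() is enough to exercise both directions of the conditional.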
hopcode 22 Dec 2009, 05:17
There was really something suspicious about that unaesthetic repetition of sub after sbb!

Code:
    sbb eax,eax
    sub eax,ecx

Yours seems to work OK with the following values (macro to simplify usage):

Code:
    macro @setD result,reg,const,fbool {
        mov eax,result
        neg reg
        sbb reg,reg
        match =FALSE,fbool \{ not reg \}
        and reg,const
        nop
    }

    @setD 3,eax,80000000h,FALSE
    @setD 3,eax,0,FALSE
    @setD 3,eax,-1,FALSE
    @setD 3,eax,1,FALSE
    @setD 80000000h,eax,80000000h,FALSE
    @setD 0,eax,3,FALSE
    @setD -1,eax,CONST_VALUE,FALSE
    @setD 0,eax,CONST_VALUE,FALSE
    @setD 1,eax,CONST_VALUE,FALSE
    @setD 80000000h,eax,CONST_VALUE,FALSE
    @setD CONST_VALUE,eax,CONST_VALUE,FALSE
    @setD -1,eax,80000000h,FALSE
    @setD 0,eax,80000000h,FALSE
    @setD 1,eax,80000000h,FALSE
    @setD CONST_VALUE,eax,80000000h,FALSE
    nop
    nop
    @setD 3,eax,80000000h,TRUE
    @setD 3,eax,0,TRUE
    @setD 3,eax,-1,TRUE
    @setD 3,eax,1,TRUE
    @setD 80000000h,eax,80000000h,TRUE
    @setD 0,eax,3,TRUE
    @setD -1,eax,CONST_VALUE,TRUE
    @setD 0,eax,CONST_VALUE,TRUE
    @setD 1,eax,CONST_VALUE,TRUE
    @setD 80000000h,eax,CONST_VALUE,TRUE
    @setD CONST_VALUE,eax,CONST_VALUE,TRUE
    @setD -1,eax,80000000h,TRUE
    @setD 0,eax,80000000h,TRUE
    @setD 1,eax,80000000h,TRUE
    @setD CONST_VALUE,eax,80000000h,TRUE

Good! Are there other cases not yet considered?

Regards,
hopcode

EDIT: and these other cases:

Code:
    @setD -1,eax,-1,FALSE
    @setD 0,eax,-1,FALSE
    @setD -1,eax,-1,TRUE
    @setD 0,eax,-1,TRUE
LocoDelAssembly 22 Dec 2009, 05:46
Quote:
A correction: I actually meant reg=0 here; EAX was the register I was using for testing. And speaking of reg, your macros will need extra code to handle cases in which reg is one of the registers used for temporary data (ECX and EDX in VariantB).

PS: I have checked VariantC only, I don't know if the others work. VariantA does not seem to follow the pseudocode: if the conditional move is not performed, reg will retain its previous state instead of zero.
edfed 22 Dec 2009, 05:57
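An(other) idea:

Set a register on a condition, without conditional jumps?
If eax = ebx, then eax = 320.

Code:
    cmp eax,ebx
    jne @f
    mov eax,320
    @@:

Without flags:

Code:
    mov ecx,eax
    mov edx,ebx
    sub ecx,edx        ; ecx = eax - ebx, zero only when they are equal
    or cl,ch           ; OR all four bytes of the difference into cl
    rol ecx,8
    or cl,ch
    rol ecx,8
    or cl,ch
    rol ecx,8
    or cl,ch
    mov ch,cl          ; OR the high nibble into the low nibble
    shr cl,4
    and ch,0fh
    or cl,ch
    mov ch,cl          ; OR bits 1..3 down into bit 0 (shift right, not left)
    shr ch,1
    or cl,ch
    shr ch,1
    or cl,ch
    shr ch,1
    or cl,ch
    and ecx,1          ; ecx = 1 if eax <> ebx, else 0
    mov edx,ecx
    xor edx,1          ; edx = 1 if eax = ebx, else 0
    imul edx,320       ; 320 when equal, 0 otherwise
    imul eax,ecx       ; eax kept when different, 0 otherwise
    add eax,edx

I wonder if it is faster than QBasic.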
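The flag-free idea above comes down to two steps: OR-fold every bit of the difference into bit 0, then pick between the old value and the constant by multiplying with that 0/1 bit. A compact C sketch of those two steps follows. It folds by halves rather than byte by byte as the assembly does, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if any bit of x is set, else 0, using only shifts, OR and AND. */
uint32_t any_bit(uint32_t x) {
    x |= x >> 16;   /* fold the upper half onto the lower half */
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;
    return x & 1u;
}

/* if (a == b) a = k;  written without comparisons or branches */
uint32_t set_if_equal(uint32_t a, uint32_t b, uint32_t k) {
    uint32_t differ = any_bit(a - b);   /* 1 when a != b */
    uint32_t same   = differ ^ 1u;      /* 1 when a == b */
    return a * differ + k * same;       /* arithmetic select */
}

void check_set_if_equal(void) {
    assert(set_if_equal(5u, 5u, 320u) == 320u);
    assert(set_if_equal(5u, 6u, 320u) == 5u);
}
```

The fold needs five shift/OR steps for 32 bits, which is part of why this route is so much longer than a flag-based sequence.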
hopcode 22 Dec 2009, 06:19
I have got an idea.

Code:
    macro @setD result,reg,const,fbool {
        mov eax,result
        mov ecx,const
        neg reg
        sbb reg,reg
        match =TRUE,fbool \{ xchg reg,ecx \}
        match =FALSE,fbool \{ not reg \}
        and reg,ecx
    }

7 bytes instead of 8 for the TRUE case.
8 bytes for the FALSE case, as in your snippet (excluding the mov instructions).

LocoDelAssembly wrote:
I have checked VariantC only, don't know if the others work....

Perhaps something interesting may be possible with cmpxchg and its 64-bit extension on 8 bytes!

PS: I am concentrating on visually training the logic, as when reading code that is not encapsulated in macros.

Greetings,
hopcode
baldr 22 Dec 2009, 08:41
hopcode,
I've lost your idea.

The and instruction is commutative, so xchg reg, ecx before and reg, ecx is effectively a nop (unless ecx is also an output value).

cmp reg, 1 sets CF iff neg reg clears it, and (for a 32-bit reg) it is only 1 byte longer (not is always 2 bytes).
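The carry-flag relationship between neg and cmp reg, 1 can be checked with a tiny C model. This is only a sketch of the flag arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CF after `neg reg`: set unless the operand was 0. */
int cf_after_neg(uint32_t reg) {
    return reg != 0;
}

/* CF after `cmp reg, 1`: set when reg < 1 as unsigned, i.e. only for 0. */
int cf_after_cmp1(uint32_t reg) {
    return reg < 1u;
}

/* `sbb reg, reg` computes reg - reg - CF, so it yields 0 or 0xFFFFFFFF. */
uint32_t sbb_same(int cf) {
    return 0u - (uint32_t)cf;
}

/* The two carry flags are always complementary. */
void check_cf_complement(void) {
    const uint32_t samples[] = { 0u, 1u, 3u, 0x100u, 0x80000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(cf_after_neg(samples[i]) != cf_after_cmp1(samples[i]));
    }
}
```

Because the flags are complementary, picking cmp reg, 1 instead of neg reg inverts the mask produced by the following sbb, which is what makes a separate not unnecessary.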
hopcode 22 Dec 2009, 16:01
Hello, baldr.

You are welcome here. It has been a long time since we heard from you...

OK.

baldr wrote:
I've lost your idea

Using xchg reg,ecx, the concept is to preserve the quantity found in eax:
    if eax#0, ecx=number
    if eax=0, ecx=0
but that works only on the TRUE branch (i.e. when eax#0), because the last instruction (and reg,ecx) destroys the CF.

Quote:
result and eax are not used in macro, are they?

and represents the REG in the macro.

Quote:
cmp reg, 1 sets CF iff neg reg clears it, and (for 32-bit reg) only 1 byte longer (not is always 2 bytes).

Yes, I read about it some time ago in a "gem". But take, for example, this code, which works for all the above cases...

Code:
    macro @setD result,reg,const,fbool {
        mov eax,result
        mov ecx,const
        cmp reg,1
        sbb reg,reg
        match =TRUE,fbool \{ xor reg,ecx \}
        and reg,ecx
        nop
    }

It has (excluding the mov instructions) 7 bytes for the FALSE branch and 9 bytes for the TRUE branch.

The question is how to preserve information about the quantity found in eax without using any register other than eax/ecx. That is to say: was EAX 0, or was EAX a number?

Regards,
hopcode
baldr 22 Dec 2009, 17:42
hopcode,
Let's see:
- eax will be equal to result regardless of the other parameters' values;
- if reg was 0, it will be 0 if fbool was TRUE, const otherwise;
- if reg was not 0, it will be const if fbool was TRUE, 0 otherwise.

Is that what you're trying to achieve?

About cmp reg, 1/neg reg: choose one based on fbool before sbb reg, reg and voila -- you don't have to not it, just and.

To preserve the value in reg you may choose some other register for the result (the sbb/and combo does not depend on the previous register value).
hopcode 23 Dec 2009, 02:41
baldr wrote:
Is it what you're trying to achieve?

Yes, exactly so. The user tells in fbool how the result should be handled:
    IF RESULT # 0 (and fbool=TRUE) set REG=CONST
    IF RESULT = 0 (and fbool=FALSE) set REG=CONST

Quote:
...choose one based on fbool...

That is a good idea!

Quote:
...you don't have to not it, just and...

How can you do it? You must know whether REG#0 or REG=0 in every case, whether fbool=TRUE or FALSE.

So in this way, we have in the code below:
1) --- 8 bytes for each branch
2) --- ECX tells us what was in EAX: ECX=1 means EAX was a number, ECX=0 means EAX was 0

ECX is important for design/scope readability.

The test macro:

Code:
    macro @setD result,reg,const,fbool {
        mov ecx,result
        mov eax,const
        match =TRUE,fbool \{
            neg ecx
            sbb ecx,ecx
            and reg,ecx
            neg ecx
        \}
        match =FALSE,fbool \{
            cmp ecx,1
            sbb ecx,ecx
            and reg,ecx
            inc ecx
        \}
        nop
    }

The ready-to-use macro is:

Code:
    macro @set reg,const,fbool {
        xchg reg,ecx
        push const
        match =TRUE,fbool \{
            neg ecx
            pop reg
            sbb ecx,ecx
            and reg,ecx
            neg ecx
        \}
        match =FALSE,fbool \{
            cmp ecx,1
            pop reg
            sbb ecx,ecx
            and reg,ecx
            inc ecx
        \}
    }

Regards,
hopcode
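To make the two outputs of the macro easier to follow (the selected value in reg and the 0/1 indicator left in ECX), here is a C sketch that walks through each branch one instruction at a time. The struct and function names are hypothetical, and the model assumes reg is a register other than ecx.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t reg;   /* CONST or 0 */
    uint32_t ecx;   /* 1 if the tested value was nonzero, else 0 */
} set_result;

/* fbool = TRUE: neg ecx / sbb ecx,ecx / and reg,ecx / neg ecx */
set_result set_true(uint32_t tested, uint32_t konst) {
    uint32_t cf  = (uint32_t)(tested != 0); /* CF after neg ecx */
    uint32_t ecx = 0u - cf;                 /* sbb ecx,ecx: 0 or 0xFFFFFFFF */
    set_result r;
    r.reg = konst & ecx;                    /* and reg,ecx (reg holds CONST) */
    r.ecx = 0u - ecx;                       /* neg ecx: 0xFFFFFFFF becomes 1 */
    return r;
}

/* fbool = FALSE: cmp ecx,1 / sbb ecx,ecx / and reg,ecx / inc ecx */
set_result set_false(uint32_t tested, uint32_t konst) {
    uint32_t cf  = (uint32_t)(tested < 1u); /* CF after cmp ecx,1 */
    uint32_t ecx = 0u - cf;                 /* sbb ecx,ecx */
    set_result r;
    r.reg = konst & ecx;                    /* and reg,ecx */
    r.ecx = ecx + 1u;                       /* inc ecx: 0xFFFFFFFF becomes 0 */
    return r;
}

void check_set_branches(void) {
    assert(set_true(5u, 0x100u).reg == 0x100u && set_true(5u, 0x100u).ecx == 1u);
    assert(set_false(0u, 0x100u).reg == 0x100u && set_false(0u, 0x100u).ecx == 0u);
}
```

In both branches ECX ends up describing the tested value rather than the outcome: it is 1 whenever the input was nonzero, regardless of fbool.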
baldr 23 Dec 2009, 09:45
hopcode,
I still don't get it. Two separate macros, for example:

Code:
    macro @setZ reg, value {
        cmp reg, 1
        sbb reg, reg
        and reg, value
    }

    macro @setNZ reg, value {
        neg reg
        sbb reg, reg
        and reg, value
    }

They set ZF appropriately, so you may use setz/setnz if you need a 0/1 value. Or you may insert sbb ecx, ecx after/before sbb reg, reg -- it doesn't alter CF.

Specify exactly what the macro should do.

hopcode wrote:
IF RESULT # 0 (and fbool=TRUE) set REG=CONST
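A small C model of the two macros shows what ZF looks like after the final and. It is a sketch with hypothetical names; note the assumption that value is nonzero, since with value = 0 the result is always 0 and ZF no longer says anything about the input.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t reg;   /* register contents after the macro */
    int zf;         /* ZF as left by the final `and` */
} flags_result;

/* @setZ: cmp reg,1 / sbb reg,reg / and reg,value */
flags_result set_z(uint32_t reg, uint32_t value) {
    uint32_t mask = 0u - (uint32_t)(reg < 1u);  /* all-ones only if reg was 0 */
    flags_result r;
    r.reg = mask & value;
    r.zf  = (r.reg == 0);
    return r;
}

/* @setNZ: neg reg / sbb reg,reg / and reg,value */
flags_result set_nz(uint32_t reg, uint32_t value) {
    uint32_t mask = 0u - (uint32_t)(reg != 0);  /* all-ones only if reg was not 0 */
    flags_result r;
    r.reg = mask & value;
    r.zf  = (r.reg == 0);
    return r;
}

void check_zf_after_set(void) {
    assert(set_nz(5u, 0x100u).zf == 0);  /* condition met, setnz would give 1 */
    assert(set_z(5u, 0x100u).zf == 1);   /* condition not met */
    assert(set_nz(5u, 0u).zf == 1);      /* value = 0 hides the input */
}
```

With a nonzero value, ZF is clear exactly when the macro stored value, so a following setnz produces 1 when the condition was met.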
r22 23 Dec 2009, 15:43
TEST + CMOV is the fastest and shortest method.
hopcode 23 Dec 2009, 15:47
baldr wrote:
..what macro should do...

Yes, as specified earlier in the thread.

By the way, is the sign " # " the reason for the misunderstanding? I typed it to mean "<>", so
    IF EAX # 0
corresponds to
    IF EAX <> 0
If so, sorry, and thanks for helping.
baldr 23 Dec 2009, 16:00
r22,
CMOVcc's source operand can't be immediate (AND's can). |
Borsuc 23 Dec 2009, 16:27
baldr wrote:
_________________ Previously known as The_Grey_Beast |
LocoDelAssembly 23 Dec 2009, 17:02
Quote:
Indeed, but the CMP and MOV tempReg, constant could be executed in parallel so there is a little hope for it to be faster. The only problem is the need for the extra register... |
r22 23 Dec 2009, 19:59
@Borsuc
@baldr

On a Core2Duo, TEST+CMOV is slightly faster. Here is some simple test code to illustrate this. Perhaps on P4 (and older) or even AMD this may not be true. Using the extra register is a possible concern and may push you to another solution when optimizing, but for the general case TEST+CMOV seems superior, even if it's just for reading clarity.

Using an immediate value

Code:
    format PE console
    include 'win32a.inc'
    entry start

    section ".data" data readable writeable
    szFormat db '%d',13,10,0
    _repeat db 'Repeat Test?',0
    _pause db 'pause',0
    align 8
    _VALUE dd 99

    section ".code" code readable executable
    TEST_COUNT = 1FFFFFFFh
    VALUE = 99

    start:
    ;;;;;; TEST Borsuc
        mov ebp, TEST_COUNT
        call [GetTickCount]
        mov ebx, eax
    align 16
    .TST1:
        call [GetTickCount]
        ; and eax, 1 ;;pseudo random 0 or 1
        neg eax
        sbb eax, eax
        and eax, VALUE
        sub ebp, 1
        jnz .TST1
        call [GetTickCount]
        sub eax, ebx
        cinvoke printf, szFormat, eax
    ;;;;;; TEST r22
        mov ebp, TEST_COUNT
        call [GetTickCount]
        mov ebx, eax
    align 16
    .TST2:
        call [GetTickCount]
        ; and eax, 1 ;;pseudo random 0 or 1
        mov ecx, VALUE
        test eax, eax
        cmovnz eax, ecx
        sub ebp, 1
        jnz .TST2
        call [GetTickCount]
        sub eax, ebx
        cinvoke printf, szFormat, eax
    ;;;;;;
        invoke MessageBox, 0, _repeat, _repeat, MB_YESNO
        cmp eax, IDYES
        je start
        cinvoke system, _pause
        invoke ExitProcess, 0

    section '.idata' import data readable writeable
    library kernel,'KERNEL32.DLL',\
            msvcrt,'msvcrt.dll',\
            user,'USER32.DLL'
    import kernel,\
           GetTickCount,'GetTickCount',\
           ExitProcess,'ExitProcess'
    import msvcrt,\
           printf,'printf',\
           system,'system'
    import user,\
           wsprintf,'wsprintfA',\
           MessageBox,'MessageBoxA'
LocoDelAssembly 23 Dec 2009, 20:15
AMD Athlon64 Venice
Code:
    2734
    2719
    2735
    2672
    2703
    2656
    2734
    2656
    2875
    2656
    2734
    2672
LocoDelAssembly 23 Dec 2009, 21:05
I've made some changes:

Code:
    format PE console
    include 'win32a.inc'
    entry setPriority

    section ".data" data readable writeable
    szFormat db '%d',13,10,0
    _repeat db 'Repeat Test?',0
    _pause db 'pause',0
    align 8
    _VALUE dd 99

    section ".code" code readable executable
    TEST_COUNT = 01FFFFFFFh
    VALUE = 99

    start:
    ;;;;;; TEST Borsuc
        mov ebp, TEST_COUNT
        call [GetTickCount]
        push eax
        xor esi, esi
    align 64
    .TST1:
        mov edi, esi
        not esi
        xor eax, eax
        cpuid
        ; call [GetTickCount]
        ; and eax, 13
        neg edi
        sbb edi, edi
        and edi, VALUE
        ; Do something with the value
        add edi, 7
        lea edx, [edi+3]
        sub ebp, 1
        jnz .TST1
        call [GetTickCount]
        pop ebx
        sub eax, ebx
        push eax
        push szFormat
    ;;;;;; TEST r22
        mov ebp, TEST_COUNT
        call [GetTickCount]
        push eax
    align 64
    .TST2:
        mov edi, esi
        not esi
        xor eax, eax
        cpuid
        ; call [GetTickCount]
        ; and eax, 13
        mov ecx, VALUE
        test edi, edi
        cmovnz edi, ecx
        ; Do something with the value
        add edi, 7
        lea edx, [edi+3]
        sub ebp, 1
        jnz .TST2
        call [GetTickCount]
        pop ebx
        sub eax, ebx
        push eax
        push szFormat
    ;;;;;;
        call [printf]
        add esp, 8
        call [printf]
        add esp, 8
        invoke MessageBox, 0, _repeat, _repeat, MB_YESNO
        cmp eax, IDYES
        je start
        cinvoke system, _pause
        invoke ExitProcess, 0

    setPriority:
        invoke GetCurrentProcess
        mov ebx, eax
        invoke SetPriorityClass, eax, REALTIME_PRIORITY_CLASS
        invoke GetCurrentThread
        invoke SetThreadPriority, eax, THREAD_PRIORITY_TIME_CRITICAL
        jmp start

    section '.idata' import data readable writeable
    library kernel,'KERNEL32.DLL',\
            msvcrt,'msvcrt.dll',\
            user,'USER32.DLL'
    import kernel,\
           GetCurrentProcess, 'GetCurrentProcess',\
           SetPriorityClass, 'SetPriorityClass',\
           GetCurrentThread, 'GetCurrentThread',\
           SetThreadPriority, 'SetThreadPriority',\
           GetTickCount,'GetTickCount',\
           ExitProcess,'ExitProcess'
    import msvcrt,\
           printf,'printf',\
           system,'system'
    import user,\
           wsprintf,'wsprintfA',\
           MessageBox,'MessageBoxA'

It gave me this:

Code:
    13969
    14265
    13985
    14250
    13969
    14266

Removing the "Do something" part (intended to generate a dependency on the conditionally set register) makes both take the same time.

Code:
    13984
    13985
    13984
    13985
    13985
    13968

And finally, removing the CPUID part but retaining the "Do something" part (and with TEST_COUNT set to 0FFFFFFFFh):

Code:
    6454
    6453
    6453
    6453
    6453
    6453
    6454
    6453

The conclusion is that there is no conclusion here.

[edit]WARNING: My code prints the results in reverse order, so in the first test cmov won.[/edit]

Last edited by LocoDelAssembly on 23 Dec 2009, 22:34; edited 1 time in total
baldr 23 Dec 2009, 22:02
r22,
Your test mostly measures call [GetTickCount] performance.

I replaced it with mov eax, ebp/and eax, 1 and unrolled the loops 256 times. With priority boosted to ABOVE_NORMAL_PRIORITY_CLASS/THREAD_PRIORITY_HIGHEST, the cmov variant wins by ~20% on a Celeron D 315 (Prescott@2.26). Strangely, LocoDelAssembly's test gives almost identical (delta<0.03%) results for both.

Here's my test code:

Code:
    format PE console
    include 'win32ax.inc'
    entry start

    section ".data" data readable writeable
    szFormat db '%d',13,10,0
    _repeat db 'Repeat Test?',0
    _pause db 'pause',0
    align 8
    _VALUE dd 99

    section ".code" code readable executable
    TEST_COUNT = 1FFFFFh
    VALUE = 99

    start:
    ABOVE_NORMAL_PRIORITY_CLASS = 0x00008000
        invoke SetPriorityClass, <invoke GetCurrentProcess>, ABOVE_NORMAL_PRIORITY_CLASS
        invoke SetThreadPriority, <invoke GetCurrentThread>, THREAD_PRIORITY_HIGHEST
    ;;;;;; TEST Borsuc
        mov ebp, TEST_COUNT
        call [GetTickCount]
        mov ebx, eax
    align 16
    .TST1:
        ; call [GetTickCount]
        mov eax, ebp
        and eax, 1
        rept 256 {
            neg eax
            sbb eax, eax
            and eax, VALUE
        }
        sub ebp, 1
        jnz .TST1
        call [GetTickCount]
        sub eax, ebx
        cinvoke printf, szFormat, eax
    ;;;;;; TEST r22
        mov ebp, TEST_COUNT
        call [GetTickCount]
        mov ebx, eax
    align 16
    .TST2:
        ; call [GetTickCount]
        mov eax, ebp
        and eax, 1
        rept 256 {
            mov ecx, VALUE
            test eax, eax
            cmovnz eax, ecx
        }
        sub ebp, 1
        jnz .TST2
        call [GetTickCount]
        sub eax, ebx
        cinvoke printf, szFormat, eax
    ;;;;;;
        invoke MessageBox, 0, _repeat, _repeat, MB_YESNO
        cmp eax, IDYES
        je start
        cinvoke system, _pause
        invoke ExitProcess, 0

    section '.idata' import data readable writeable
    library kernel,'KERNEL32.DLL',\
            msvcrt,'msvcrt.dll',\
            user,'USER32.DLL'
    import kernel,\
           GetCurrentProcess, "GetCurrentProcess",\
           SetPriorityClass, "SetPriorityClass",\
           GetCurrentThread, "GetCurrentThread",\
           SetThreadPriority, "SetThreadPriority",\
           GetTickCount,'GetTickCount',\
           ExitProcess,'ExitProcess'
    import msvcrt,\
           printf,'printf',\
           system,'system'
    import user,\
           MessageBox,'MessageBoxA'

Writing code (roughly) equivalent to cmp eax, 1/sbb eax, eax/and eax, VALUE is more complex than it seems?
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.