flat assembler
Message board for the users of flat assembler.

SETx 4 bytes

randomdude



Joined: 01 Jun 2012
Posts: 83
randomdude 26 Mar 2015, 14:24
Code:
xor eax,eax
(condition)
setX al    

Code:
(condition)
setX al
and eax,1    

Code:
(condition)
setX al
movzx eax,al    

Code:
(condition)
setX cl
movzx eax,cl    

which one would be the fastest on most modern computers? :)

i have seen the first option being used the most, but it has the drawback of needing at least 2 registers (eax is zeroed before the condition, so the condition has to use another one)
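
For reference, all four sequences exist because `setX al` writes only the low byte of eax; the upper 24 bits keep whatever was there before. The following is a rough C model of that behaviour, not a benchmark. The helper names (`setx_al`, `variant_xor`, `variant_and`, `variant_movzx`, `check_variants`) are made up for illustration, and the model says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `setX al`: only the low 8 bits of eax are written,
   the upper 24 bits keep their previous contents. */
static uint32_t setx_al(uint32_t eax, int cond)
{
    return (eax & 0xFFFFFF00u) | (cond ? 1u : 0u);
}

/* Variant 1: xor eax,eax / (condition) / setX al */
static uint32_t variant_xor(uint32_t stale, int cond)
{
    uint32_t eax = stale ^ stale;   /* xor eax,eax; it has to come before the
                                       condition because xor rewrites the flags */
    return setx_al(eax, cond);
}

/* Variant 2: (condition) / setX al / and eax,1 */
static uint32_t variant_and(uint32_t stale, int cond)
{
    return setx_al(stale, cond) & 1u;
}

/* Variants 3 and 4: (condition) / setX r8 / movzx eax,r8 */
static uint32_t variant_movzx(uint32_t stale, int cond)
{
    return setx_al(stale, cond) & 0xFFu;   /* zero-extend the low byte */
}

/* Asserts that every variant yields exactly 0 or 1 for the given
   leftover register contents. */
static void check_variants(uint32_t stale)
{
    for (int cond = 0; cond <= 1; cond++) {
        uint32_t expected = (uint32_t)cond;
        assert(variant_xor(stale, cond) == expected);
        assert(variant_and(stale, cond) == expected);
        assert(variant_movzx(stale, cond) == expected);
    }
}
```

Calling `check_variants(0xDEADBEEFu)` from a `main` passes, while a bare `setx_al(0xDEADBEEFu, 1)` returns 0xDEADBE01, which is the stale upper bits leaking through.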


Last edited by randomdude on 26 Mar 2015, 14:33; edited 4 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20489
Location: In your JS exploiting you and your system
revolution 26 Mar 2015, 14:29
randomdude wrote:
Code:
xor eax,eax
(condition)
setX al    


Code:
(condition)
setX al
and eax,1    


Code:
(condition)
setX al
movzx eax,al    


which one would be the fastest on most modern computers? :)
There is no way to know for sure. Every CPU is different. And the surrounding code and various other internal CPU states will affect the runtime performance.

randomdude 26 Mar 2015, 14:35
sure, i knew you would say that :D but isn't there an option which would be preferred over the rest in most cases? just like xor eax,eax vs and eax,0 vs mov eax,0 etc

revolution 26 Mar 2015, 14:41
randomdude wrote:
... but isn't there an option which would be preferred over the rest in most cases?
I doubt it. Each will have its own problems and benefits. I would suggest testing it on your target system. If it is important enough to post about, then it must be important enough to test properly.
randomdude wrote:
just like xor eax,eax vs and eax,0 vs mov eax,0 etc
There was a thread here recently where HaHaAnonymous showed that using mov vs xor was inconclusive. One CPU showed "mov" faster and another CPU showed "xor" faster. So there are no rules of thumb here.

Measure, don't guess.
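
One way to act on that advice is a small timing harness. The sketch below is in C and makes several assumptions: `candidate_a` and `candidate_b` are hypothetical stand-ins for the routines being compared (in practice they would be the assembled variants), `time_variant` and `best_of` are names invented here, and `clock()` is a coarse timer, so the iteration count has to be large for the numbers to mean anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*variant_fn)(uint32_t);

/* Hypothetical stand-ins for the code under test. */
static uint32_t candidate_a(uint32_t x) { return x != 0; }
static uint32_t candidate_b(uint32_t x) { return (uint32_t)!!x; }

/* Calls fn `iterations` times and returns the elapsed CPU time in seconds.
   The results are summed into *sink so the calls have a visible effect. */
static double time_variant(variant_fn fn, uint32_t iterations, uint32_t *sink)
{
    assert(fn != NULL && sink != NULL);
    uint32_t acc = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++)
        acc += fn(i);
    clock_t stop = clock();
    *sink = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Repeats the measurement and keeps the fastest run, which tends to be
   the one least disturbed by other activity on the machine. */
static double best_of(variant_fn fn, int runs, uint32_t iterations, uint32_t *sink)
{
    assert(runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t = time_variant(fn, iterations, sink);
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}
```

A typical use would be `best_of(candidate_a, 5, 100000000u, &sink)` for each candidate, run on the target CPU, with the two sinks compared afterwards to confirm both variants computed the same thing.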

randomdude 26 Mar 2015, 14:49
not really important, but i use setx a lot and i always wonder which method to use and whether i'm actually optimizing my code or doing the opposite :/ is there at least any info about how fast movzx is or when to use it?

revolution 26 Mar 2015, 14:57
randomdude wrote:
is there at least any info about how fast movzx is or when to use it?
Agner Fog may have some information. But things change quickly with each new CPU.

However I would suggest that since you don't actually know if your code is an optimisation or not then you can't even know if that code is a bottleneck. Perhaps you are looking deeply into something that is only making a 0.0001% difference?

randomdude 26 Mar 2015, 15:10
Quote:
Perhaps you are looking deeply into something that is only making a 0.0001% difference?

isn't this the main reason to choose asm for programming? >:)

revolution 26 Mar 2015, 15:22
randomdude wrote:
Quote:
Perhaps you are looking deeply into something that is only making a 0.0001% difference?

isn't this the main reason to choose asm for programming? >:)
Not at all. We choose assembly because it is stylish, fashionable and trendy. All the cool kids are into assembly.
redsock



Joined: 09 Oct 2009
Posts: 435
Location: Australia
redsock 26 Mar 2015, 18:45
revolution wrote:
Agner Fog may have some information. But things change quickly with each new CPU.
SETcc has been around for a while and doesn't seem to have changed a great deal in recent years (at least for Intel).

My understanding is that the 8 bit register write causes a false dependency on the previous value of the register, and as such can create a false dependency chain if you aren't careful.

Instruction latency doc says SETcc latency is 1, throughput 0.5, but again that requires the dependency on the destination to be resolved completely due to the partial register write.

I use CMOVcc in most cases to avoid the false dependency, but agree with revolution that optimization at this level really needs the bigger picture of what all of the dependencies and instruction latencies really are in order to determine which of your variants really is the better choice.

revolution 26 Mar 2015, 23:23
Talking about instruction latencies and throughputs is premature without knowing which CPU(s) you want to run the code on.

redsock 27 Mar 2015, 00:18
randomdude wrote:
which one would be the fastest on most modern computers? :)
Haha, internal projection and interpretation of that comment I suppose, but I absolutely agree that without specific CPU information, throughputs and latencies are useless. I superimposed my own idea of what "modern computers" means and opened up the Intel x86_64 pages from the last few years.

:) my bad
Mikl___



Joined: 30 Dec 2014
Posts: 143
Location: Russian Federation, Irkutsk
Mikl___ 27 Mar 2015, 00:26
Code:
setX al
cbw
cwd    
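
Read literally, `cbw` sign-extends al into ax and `cwd` sign-extends ax into the pair dx:ax, so in 32-bit code this sequence leaves the upper half of eax untouched; `cwde` is the instruction that extends ax into eax. A small C model of the three instructions shows the difference. The `regs` struct and the `model_*` and `check_widening` names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t eax, edx; } regs;

/* cbw: ax = sign-extended al; the upper 16 bits of eax are not written. */
static void model_cbw(regs *r)
{
    uint32_t al = r->eax & 0xFFu;
    uint32_t ax = (al & 0x80u) ? (0xFF00u | al) : al;
    r->eax = (r->eax & 0xFFFF0000u) | ax;
}

/* cwd: dx = 0xFFFF if ax is negative, else 0; upper 16 bits of edx are kept. */
static void model_cwd(regs *r)
{
    uint32_t dx = (r->eax & 0x8000u) ? 0xFFFFu : 0x0000u;
    r->edx = (r->edx & 0xFFFF0000u) | dx;
}

/* cwde: eax = sign-extended ax. */
static void model_cwde(regs *r)
{
    uint32_t ax = r->eax & 0xFFFFu;
    r->eax = (ax & 0x8000u) ? (0xFFFF0000u | ax) : ax;
}

/* Runs both sequences on registers holding stale data, with al = 1
   standing in for the result of setX al. */
static void check_widening(void)
{
    regs a = { 0xDEADBE01u, 0x12345678u };
    model_cbw(&a);
    model_cwd(&a);
    assert(a.eax == 0xDEAD0001u);   /* upper half of eax is still stale */
    assert(a.edx == 0x12340000u);   /* dx:ax together hold the value 1 */

    regs b = { 0xDEADBE01u, 0x12345678u };
    model_cbw(&b);
    model_cwde(&b);
    assert(b.eax == 1u);            /* full 32-bit result in eax */
    assert(b.edx == 0x12345678u);   /* edx is not touched at all */
}
```

Whether the dx:ax form or the eax form is wanted depends on how the result is consumed; the model only shows which bits each sequence writes.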
l4m2



Joined: 15 Jan 2015
Posts: 674
l4m2 27 Mar 2015, 14:01
Mikl___ wrote:
Code:
setX al
cbw
cwd    
That'd be slow! !! !!!

l4m2 27 Mar 2015, 14:03
The first one is smallest

revolution 27 Mar 2015, 14:03
l4m2 wrote:
Mikl___ wrote:
Code:
setX al
cbw
cwd    
That'd be slow! !! !!!
Can you prove that?

l4m2 27 Mar 2015, 14:46
revolution wrote:
l4m2 wrote:
Mikl___ wrote:
Code:
setX al
cbw
cwd    
That'd be slow! !! !!!
Can you prove that?
The 2 c** instructions take 6 ticks on Pentium, and they're not commonly used, so no one optimized them afterwards.

randomdude 27 Mar 2015, 17:05
talking about speed...

recently i wanted to optimize a function to get rid of branches, but i needed a conditional move instruction and cmovcc is not guaranteed to be supported on every cpu, even after the Pentium Pro. so i used the only one left... cmpxchg

Code:
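proc strnlen c str,num

        xor     eax,eax                 ; al = 0 is the byte scasb looks for; eax = 0 is also the result when num = 0
        mov     ecx,dword[num]          ; ecx = maximum number of bytes to scan
        mov     edx,ecx                 ; edx = num, returned when no terminator is found
        jecxz   .end                    ; num = 0: return 0
        push    edi
        mov     edi,dword[str]
        repne   scasb                   ; stop at a zero byte (ZF=1) or when ecx reaches 0 (ZF=0)
        mov     eax,edx                 ; assume not found: result = num (mov leaves the flags intact)
        jne     @F                      ; ZF=0: no terminator within num bytes
        lea     eax,[edi-1]             ; edi points one past the terminator
        sub     eax,dword[str]          ; result = offset of the terminator
        @@:
        pop     edi
        .end:
        ret
endp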

Code:
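proc strnlen c str,num

        xor     eax,eax                 ; al = 0 is the byte scasb looks for; eax = 0 is also the result when num = 0
        mov     ecx,dword[num]          ; ecx = maximum number of bytes to scan
        jecxz   .end                    ; num = 0: return 0
        push    edi
        mov     edx,ecx                 ; edx = num
        mov     edi,dword[str]
        repne   scasb                   ; stop at a zero byte (ZF=1) or when ecx reaches 0 (ZF=0)
        lea     ecx,[edi-1]             ; lea does not change the flags set by scasb
        setne   al                      ; eax = 0 if found, 1 if not found
        sub     ecx,dword[str]          ; ecx = offset of the last byte scanned
        add     eax,edx                 ; eax = num if found, num+1 if not found
        cmpxchg edx,ecx                 ; if eax = edx (found) then edx = ecx, else eax = edx
        mov     eax,edx                 ; result = offset of the terminator if found, else num
        pop     edi
        .end:
        ret
endp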

does anyone know if it's worth using this rare instruction?
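
The register-to-register `cmpxchg edx,ecx` compares eax with edx; if they are equal it copies ecx into edx, otherwise it copies edx into eax. The branchless version relies on eax being num when the terminator was found and num+1 when it was not. Below is a C model of that data flow, checked against a plain loop. It is only a sketch of the logic: the function names are invented here, the model itself uses ordinary C branches, and it says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of `cmpxchg dest,src` with eax as the implicit accumulator:
   if eax == dest then dest = src, else eax = dest. */
static void model_cmpxchg(uint32_t *eax, uint32_t *dest, uint32_t src)
{
    if (*eax == *dest)
        *dest = src;
    else
        *eax = *dest;
}

/* Follows the branchless strnlen step by step. */
static uint32_t strnlen_cmpxchg_model(const char *str, uint32_t num)
{
    assert(str != NULL);
    uint32_t eax = 0;                   /* xor eax,eax */
    uint32_t ecx = num;                 /* mov ecx,[num] */
    if (ecx == 0)                       /* jecxz .end */
        return eax;
    uint32_t edx = ecx;                 /* mov edx,ecx */

    /* repne scasb: scan until a zero byte is seen or ecx runs out. */
    uint32_t scanned = 0;
    int found = 0;
    while (ecx != 0) {
        ecx--;
        if (str[scanned++] == '\0') {
            found = 1;
            break;
        }
    }

    ecx = scanned - 1;                  /* lea ecx,[edi-1] / sub ecx,[str] */
    eax = found ? 0u : 1u;              /* setne al */
    eax += edx;                         /* add eax,edx: num or num+1 */
    model_cmpxchg(&eax, &edx, ecx);     /* cmpxchg edx,ecx */
    return edx;                         /* mov eax,edx */
}

/* Straightforward loop used as the reference result. */
static uint32_t strnlen_reference(const char *str, uint32_t num)
{
    uint32_t n = 0;
    while (n < num && str[n] != '\0')
        n++;
    return n;
}
```

For "hello" with num = 10 the model returns 5, and with num = 3 it returns 3, matching the reference loop in both cases.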
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1178
Location: Unknown
HaHaAnonymous 28 Mar 2015, 00:29
Quote:

isn't this the main reason to choose asm for programming?

Not mine at least. My main reason is freedom, before everything.
16bitPM



Joined: 08 Jul 2011
Posts: 30
16bitPM 14 Jul 2015, 11:13
It's an ingenious use of an otherwise very task-specific instruction. Well done!
AFAIK no one ever used the reg,reg encoding...

As for your question: you could make a test-case and time the whole thing. That's the only way to know for sure...
OR look up the timings for i486 and calculate the exact clocks for your subroutine.



randomdude wrote:
talking about speed...

recently i wanted to optimize a function to get rid of branches, but i needed a conditional move instruction and cmovcc is not guaranteed to be supported on every cpu, even after the Pentium Pro. so i used the only one left... cmpxchg

Code:
proc strnlen c str,num

        xor     eax,eax
        mov     ecx,dword[num]
        jecxz   .end
        push    edi
        mov     edx,ecx
        mov     edi,dword[str]
        repne   scasb
        lea     ecx,[edi-1]
        setne   al
        sub     ecx,dword[str]
        add     eax,edx
        cmpxchg edx,ecx
        mov     eax,edx
        pop     edi
        .end:
        ret
endp    

does anyone know if it's worth using this rare instruction?