flat assembler
Message board for the users of flat assembler.
revolution
randomdude wrote:
randomdude
Sure, I knew you would say that.
revolution
randomdude wrote: ... but isn't there an option that would be preferred over the rest in most cases?
randomdude wrote: just like xor eax,eax vs and eax,0 vs mov eax,0 etc.
Measure, don't guess.
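Speed has to be measured on the target CPU, but size and flag behaviour are fixed properties of each encoding and can simply be looked up. The small Python table below (the helper name `describe` is illustrative) lists one valid 32-bit encoding per variant. xor eax,eax is the shortest, and mov eax,0 is the only one that leaves the flags untouched, which matters if the zeroing sits between a cmp and the setcc or jcc that reads its result. Many recent x86 cores also recognise xor eax,eax as a zeroing idiom that does not wait for the old value of EAX, but how much that buys you is, again, something to measure.

```python
# One valid 32-bit encoding of each way to zero EAX mentioned above.
# An assembler may choose an equivalent form (e.g. 33 C0 for xor eax,eax).
ZEROING_IDIOMS = {
    # mnemonic       (machine code bytes,                   writes flags?)
    "xor eax,eax": (bytes([0x31, 0xC0]),                    True),
    "and eax,0":   (bytes([0x83, 0xE0, 0x00]),              True),
    "mov eax,0":   (bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  False),
}


def describe(name: str) -> str:
    """Return a one-line summary: size in bytes, hex bytes, flag behaviour."""
    code, writes_flags = ZEROING_IDIOMS[name]
    flags = "writes flags" if writes_flags else "leaves flags alone"
    return f"{name:<12} {len(code)} bytes  {code.hex(' ')}  {flags}"


if __name__ == "__main__":
    for name in ZEROING_IDIOMS:
        print(describe(name))
```

The table says nothing about latency or throughput; it only settles the size question without a benchmark.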
randomdude
Not really important, but I use setX a lot and I'm never sure which method to use, or whether I'm actually optimizing my code or doing the opposite :/ Is there at least any info about how fast movzx is, or when to use it?
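Before speed, the variants differ in correctness: setcc writes only AL, so bits 8..31 of EAX must be cleared either before the compare (xor eax,eax, which has to come before the cmp because xor overwrites the flags) or afterwards (movzx eax,al). A minimal Python model, with registers as plain ints and hypothetical function names, shows the three orderings for a signed "less than" test. It models results only, not timing or partial-register effects.

```python
def setcc_al(eax: int, cond: bool) -> int:
    """setcc al: AL = 1 if cond else 0; bits 8..31 of EAX keep their old value."""
    return (eax & 0xFFFF_FF00) | (1 if cond else 0)


def movzx_eax_al(eax: int) -> int:
    """movzx eax,al: EAX = AL zero-extended to 32 bits."""
    return eax & 0xFF


# Each variant computes (a < b) the way "cmp a,b / setl al" would.
# All take old_eax so they share a signature, even if they ignore it.

def no_clear(old_eax: int, a: int, b: int) -> int:
    # cmp a,b / setl al                -> bits 8..31 are leftovers from old_eax
    return setcc_al(old_eax, a < b)


def xor_first(old_eax: int, a: int, b: int) -> int:
    # xor eax,eax / cmp a,b / setl al  -> xor first, since it overwrites the flags
    eax = 0
    return setcc_al(eax, a < b)


def movzx_after(old_eax: int, a: int, b: int) -> int:
    # cmp a,b / setl al / movzx eax,al
    return movzx_eax_al(setcc_al(old_eax, a < b))


if __name__ == "__main__":
    junk = 0xDEADBEEF  # whatever EAX held before
    for variant in (no_clear, xor_first, movzx_after):
        print(f"{variant.__name__:<12} {variant(junk, 1, 2):#010x}")
```

Only no_clear leaks the old upper bits (0xdeadbe01 here); the other two both yield a clean 0 or 1, and choosing between them is the performance question the thread is about.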
revolution
randomdude wrote: is there at least any info about how fast movzx is or when to use it?
However, I would suggest that since you don't actually know whether your code is an optimisation or not, you can't even know whether that code is a bottleneck. Perhaps you are looking deeply into something that is only making a 0.0001% difference?
randomdude
Quote: Perhaps you are looking deeply into something that is only making a 0.0001% difference?
Isn't this the main reason to choose asm for programming?
revolution
randomdude wrote:
redsock
revolution wrote: Agner Fog may have some information. But things change quickly with each new CPU.
My understanding is that the 8-bit register write causes a false dependency on the previous value of the register, and as such can create a false dependency chain if you aren't careful. The instruction latency doc says SETcc latency is 1, throughput 0.5, but again that requires the dependency on the destination to be resolved completely due to the partial register write.
I use CMOVcc in most cases to avoid the false dependency, but I agree with revolution that optimization at this level really needs the bigger picture of all the dependencies and instruction latencies in order to determine which of your variants is really the better choice.
revolution
Talking about instruction latencies and throughputs is premature without knowing which CPU(s) you want to run the code on.
redsock
randomdude wrote: which one would be the fastest on most modern computers?
Mikl___
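Code:
    setX al     ; AL = 0 or 1 (X = any condition)
    cbw         ; AH = 0, since AL is never negative
    cwd         ; DX = 0, since AX is never negative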
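Why this sequence works: setX leaves AL at 0 or 1, both with the sign bit clear, so cbw's sign extension of AL into AX behaves like a zero extension (AH = 0), and cwd's sign extension of AX into DX:AX then clears DX. A Python model (function names hypothetical, registers as plain ints) makes this explicit. In 16-bit code cbw (98h) and cwd (99h) are one byte each; in 32-bit code those opcodes mean cwde and cdq, and cbw/cwd need a 66h operand-size prefix. Only AX and DX are written: the upper halves of EAX and EDX are left as they were.

```python
def setcc_al(eax: int, cond: bool) -> int:
    """setcc al: AL = 1 if cond else 0; bits 8..31 of EAX unchanged."""
    return (eax & 0xFFFF_FF00) | (1 if cond else 0)


def cbw(eax: int) -> int:
    """cbw: AX = AL sign-extended to 16 bits; bits 16..31 of EAX unchanged."""
    al = eax & 0xFF
    ax = (al | 0xFF00) if al & 0x80 else al
    return (eax & 0xFFFF_0000) | ax


def cwd(eax: int, edx: int) -> int:
    """cwd: DX = 0xFFFF if AX is negative, else 0; bits 16..31 of EDX unchanged.
    Returns the new EDX."""
    dx = 0xFFFF if eax & 0x8000 else 0x0000
    return (edx & 0xFFFF_0000) | dx


def setx_cbw_cwd(eax: int, edx: int, cond: bool):
    """Run 'setX al / cbw / cwd' and return the new (EAX, EDX)."""
    eax = setcc_al(eax, cond)
    eax = cbw(eax)          # AL is 0 or 1, sign bit clear -> AH = 0
    edx = cwd(eax, edx)     # AX is 0 or 1, sign bit clear -> DX = 0
    return eax, edx


if __name__ == "__main__":
    for cond in (False, True):
        eax, edx = setx_cbw_cwd(0x1234ABCD, 0x5678EF01, cond)
        print(f"cond={cond!s:<5} AX={eax & 0xFFFF:#06x} DX={edx & 0xFFFF:#06x}")
```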
l4m2
Mikl___ wrote:
l4m2
The first one is the smallest.
revolution
l4m2 wrote:
l4m2
revolution wrote:
randomdude
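Talking about speed...
Recently I wanted to optimize a function to get rid of its branches, but I needed a conditional move, and CMOVcc is not guaranteed to be supported on every CPU after the Pentium Pro. So I used the only one left... cmpxchg.
Code:
    proc strnlen c str,num
            xor     eax,eax              ; EAX = 0; AL = 0 is the byte repne scasb looks for
            mov     ecx,dword[num]
            mov     edx,ecx              ; EDX = num, the result if no terminator is found
            jecxz   .end                 ; num = 0: return 0
            push    edi
            mov     edi,dword[str]
            repne   scasb                ; scan at most num bytes; ZF = 1 if a zero was found
            mov     eax,edx              ; assume not found: return num
            jne     @F
            lea     eax,[edi-1]          ; found: EDI points one past the terminator
            sub     eax,dword[str]       ; length = terminator - str
    @@:
            pop     edi
    .end:
            ret
    endp
Code:
    proc strnlen c str,num
            xor     eax,eax              ; EAX = 0; AL = 0 is the byte repne scasb looks for
            mov     ecx,dword[num]
            jecxz   .end                 ; num = 0: return 0
            push    edi
            mov     edx,ecx              ; EDX = num
            mov     edi,dword[str]
            repne   scasb                ; ZF = 1 if the terminator was found
            lea     ecx,[edi-1]
            setne   al                   ; AL = 1 if not found (bits 8..31 of EAX are still 0)
            sub     ecx,dword[str]       ; ECX = length, if found
            add     eax,edx              ; EAX = num if found, num+1 if not
            cmpxchg edx,ecx              ; EAX == EDX: EDX = ECX (length); else EAX = EDX (num)
            mov     eax,edx
            pop     edi
    .end:
            ret
    endp
Does anyone know if it's worth using this rare instruction?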
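The trick works because add eax,edx leaves EAX equal to num exactly when repne scasb found the terminator. cmpxchg edx,ecx compares EAX with EDX and, only when they are equal, loads the computed length into EDX; otherwise EDX still holds num. Either way the result ends up in EDX with no jump. The Python model below (hypothetical names, buffer offsets instead of addresses) follows the second routine register by register and checks it against a plain reference strnlen. It only shows that the result matches the branchy version; whether it is faster is a separate question the model cannot answer.

```python
M32 = 0xFFFF_FFFF


def cmpxchg(eax: int, dest: int, src: int):
    """cmpxchg dest,src with 32-bit operands, no lock prefix.
    If EAX == dest: ZF = 1 and dest <- src.
    Otherwise:      ZF = 0 and EAX  <- dest.
    Returns the new (eax, dest, zf)."""
    if eax == dest:
        return eax, src, 1
    return dest, dest, 0


def strnlen_cmpxchg(buf: bytes, num: int) -> int:
    """Register-level model of the branch-free strnlen from the post.
    `buf` stands in for the memory at [str]; offsets from str replace addresses.
    buf must contain a zero byte or at least num bytes."""
    eax = 0                                  # xor eax,eax  (AL = 0: the byte to find)
    ecx = num & M32                          # mov ecx,dword[num]
    if ecx == 0:                             # jecxz .end
        return eax
    edx = ecx                                # mov edx,ecx
    edi = 0                                  # mov edi,dword[str]  (offset 0)
    zf = 0
    while ecx != 0:                          # repne scasb
        zf = 1 if buf[edi] == (eax & 0xFF) else 0
        edi += 1
        ecx -= 1
        if zf:
            break
    ecx = edi - 1                            # lea ecx,[edi-1] (offset, so sub ecx,[str] is folded in)
    eax = (eax & 0xFFFF_FF00) | (1 - zf)     # setne al
    eax = (eax + edx) & M32                  # add eax,edx: num if found, num+1 if not
    eax, edx, _ = cmpxchg(eax, edx, ecx)     # found: EDX <- length; else EAX <- EDX
    return edx                               # mov eax,edx


def strnlen_ref(buf: bytes, num: int) -> int:
    """Plain reference: index of the first zero byte, capped at num."""
    n = 0
    while n < num and buf[n] != 0:
        n += 1
    return n


if __name__ == "__main__":
    cases = [(b"hello\0", 10), (b"hello\0", 3), (b"\0", 5), (b"abc", 0), (b"abc", 3)]
    for data, n in cases:
        assert strnlen_cmpxchg(data, n) == strnlen_ref(data, n)
    print("cmpxchg model agrees with the reference on", len(cases), "cases")
```

Note that wrap-around is harmless: with num = 0FFFFFFFFh and no terminator, num+1 wraps to 0, which still differs from num.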
HaHaAnonymous
Quote:
Not mine, at least. My main reason is freedom, above everything else.
16bitPM
randomdude wrote: talking about speed...
It's an ingenious use of an otherwise very task-specific instruction. Well done!
AFAIK no one ever used the reg,reg encoding... As for your question: you could make a test case and time the whole thing. That's the only way to know for sure... OR look up the timings for the i486 and calculate the exact clock count for your subroutine.
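The "make a test case and time the whole thing" approach has the same shape in any language: first confirm the candidates agree, then time many calls and keep the best of several runs, since the minimum is the run least disturbed by interrupts and other processes. A Python sketch with the standard timeit module; `best_time` and the two strnlen stand-ins are hypothetical, written only to illustrate the method, and are not the assembly routines above, which would have to be assembled and timed on the CPU you care about.

```python
import timeit


def best_time(stmt: str, namespace=None, number: int = 1_000, repeat: int = 5) -> float:
    """Run `stmt` `number` times per run, for `repeat` runs, and return the
    fastest run divided by `number`, i.e. seconds per call."""
    runs = timeit.repeat(stmt, number=number, repeat=repeat, globals=namespace)
    return min(runs) / number


# Two stand-in candidates (hypothetical, for illustration only).
def strnlen_loop(buf: bytes, num: int) -> int:
    n = 0
    while n < num and buf[n] != 0:
        n += 1
    return n


def strnlen_find(buf: bytes, num: int) -> int:
    i = buf.find(b"\0", 0, num)
    return num if i < 0 else i


if __name__ == "__main__":
    data = b"x" * 200 + b"\0"
    for candidate in (strnlen_loop, strnlen_find):
        # The test case first: every candidate must give the same answers.
        assert candidate(data, 1000) == 200
        assert candidate(data, 50) == 50
        t = best_time("f(data, 1000)", namespace={"f": candidate, "data": data})
        print(f"{candidate.__name__:<14} {t * 1e9:8.0f} ns per call")
```

Timing only one call, or only one run, mostly measures noise; repeating and taking the minimum is what makes the comparison meaningful.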
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.