flat assembler
Message board for the users of flat assembler.
Could it be optimized some way?
revolution 30 Nov 2018, 01:13
You need to define what to optimise for. Which string lengths, which CPUs, and which systems do you want it to run faster on?
If you just want it "fast", then you need to specify the string lengths you expect to be dealing with. Different algorithms and approaches each work best for only a subset of string lengths. The underlying system also affects the runtimes: memory access speeds, and whether or not the data and instructions are already in the cache, will change the results. Some CPUs have circuitry to execute certain instructions more efficiently than others, so you could take advantage of that if your intended CPUs have such things. There is no universal code that will always be faster, with all string lengths, on all systems, with all CPUs, in all production code. Each of those things needs to be accounted for to achieve the desired "fast" result.
DimonSoft 30 Nov 2018, 07:14
ProMiNick wrote:
scasb would do a better job if you manage to hold a pointer to the source string in EDI, which seems easy since you already do a few eXCHanGes between ESI and EDI. As for performance, REP xxxxS is said to be well optimized on both modern and ancient processors, except for a small number of models in between. Even if the rumours are false, I find it better to express the algorithm in a more declarative way that can become faster at some point in time: a string instruction with a REP prefix is much easier to recognize than one of the infinite ways to do the same thing with simple instructions, and that's why this construction can be expected to be optimized better most of the time. Again, it's a good idea to measure whether the piece of code you're trying to optimize is really worth it. If it is, it's also a good idea to have some statistics from different hardware: I've seen Intel processors be unbelievably good at optimizing calls to short FPU procedures (about 1.5 times faster), while AMDs, on the contrary, became twice as slow on the same code, and there might be cases where the winner and the loser swap places.
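A minimal sketch of the scasb approach described in this post, assuming a 32-bit Linux ELF target so that it can be assembled and run on its own with fasm. The label `msg` and the use of the exit status to report the result are illustrative assumptions and do not come from the original code in this thread.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        mov     edi, msg        ; SCASB compares AL with the byte at [EDI]
        xor     eax, eax        ; AL = 0, the byte we are looking for
        or      ecx, -1         ; no length limit, as in the thread
        cld                     ; scan toward higher addresses
        repne   scasb           ; stops after the first byte equal to AL
        not     ecx
        dec     ecx             ; ECX = number of bytes before the terminator
        mov     ebx, ecx        ; report the length as the exit status
        mov     eax, 1          ; sys_exit (Linux int 0x80 ABI, an assumption)
        int     0x80

segment readable

msg     db 'flat assembler', 0  ; hypothetical test string
```

The whole search is one prefixed instruction, which is the "declarative" form the post argues for. Whether it is faster than a hand-written loop still has to be measured on the CPUs you care about.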
ProMiNick 30 Nov 2018, 08:56
I have a question about rep:
ecx is checked before the 1st rep iteration of a string instruction, but the flag condition is checked only after an iteration has executed: repe scasb with nonzero ecx runs at least once, and that is independent of the value of the zero flag just before the repe scasb instruction. Or am I wrong?
revolution 30 Nov 2018, 09:34
Yes. The zero flag is not checked before the loop starts.
You can also test it to make sure your CPU behaves that way.
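One way to run such a test, as a minimal sketch assuming a 32-bit Linux ELF target. The label `buf` and the exit-status convention are hypothetical choices for illustration. The idea is to enter repe scasb with ZF cleared and ecx = 1, pointing at a byte that equals AL, and then see whether the compare happened.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        mov     edi, buf
        mov     al, 'A'         ; equal to the byte at [EDI]
        mov     ecx, 1          ; allow exactly one iteration
        cld
        test    esp, esp        ; ESP is nonzero, so this clears ZF
        repe    scasb           ; entered with ZF = 0 and ECX = 1
        ; If the compare ran:            ECX = 0 and ZF = 1 (bytes equal).
        ; If ZF had been checked first:  ECX = 1 and ZF = 0 (nothing ran).
        mov     ebx, ecx        ; exit status 0 = ran once, 1 = skipped
        mov     eax, 1          ; sys_exit (Linux int 0x80 ABI, an assumption)
        int     0x80

segment readable

buf     db 'A'                  ; hypothetical one-byte buffer
```

After running it, an exit status of 0 means the first iteration executed regardless of the incoming zero flag, which is the behaviour confirmed in the reply above.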
ProMiNick 30 Nov 2018, 13:39
Thanks revolution.
I added:
Code:
macro seekrachar achar,from {
  match [<],from {std\}
  match [>],from {cld\}
  match any,achar {mov al,achar\}
  repne scasb
}
Code:
@copy_path:
        or      ecx,-1
        movastr [>]
        dec     edi
        seekrachar '/',[<]
        mov     esi,ebx
        test    esi,esi
        jz      .outof..loop
  .via..loop:
        cmp     word[esi],'..'
        jne     .outof..loop
        add     esi,3           ; skip one '../' (one level up)
        seekrachar ;'/',[<]
        jmp     .via..loop
  .outof..loop:
        cld
        add     edi,2
        test    esi,esi
        jz      .str_lp4
        movastr ;[>]
        dec     edi
  .str_lp4:
        mov     esi,ebp
        movastr ;[>]
        ret }
The previous variant did not even work, but this one does. I have no need to control lengths, so ecx is just -1.
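The backward scan that seekrachar performs with `[<]` can be sketched on its own as follows. This is an illustration under assumptions, not the poster's code: it targets 32-bit Linux ELF, and the labels `path` and `path_end` and the sample string are hypothetical. It also shows where EDI ends up after the scan, which is presumably why the code above uses `add edi,2`.

```asm
format ELF executable 3
entry start

segment readable executable

start:
        mov     edi, path_end - 1   ; last character of the string
        mov     al, '/'
        or      ecx, -1             ; no length limit
        std                         ; scan toward lower addresses
        repne   scasb               ; stops after the last '/' is compared
        cld                         ; restore the usual direction
        ; SCASB decrements EDI after every compare, so EDI now points one
        ; byte below the '/' that matched. EDI+2 is the byte after the '/'.
        add     edi, 2
        sub     edi, path
        mov     ebx, edi            ; exit status = offset of last component
        mov     eax, 1              ; sys_exit (Linux int 0x80 ABI, an assumption)
        int     0x80

segment readable

path     db 'a/bb/ccc'              ; hypothetical sample path
path_end db 0
```

With ecx set to -1 there is no bound on the scan, so this sketch relies on the string containing at least one '/'. The same caveat applies to any unbounded repne scasb.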
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.