flat assembler
Message board for the users of flat assembler.

Could it be optimized some way? (Macroinstructions)

ProMiNick



Joined: 24 Mar 2012
Posts: 798
Location: Russian Federation, Sochi
ProMiNick 30 Nov 2018, 00:40
Code:
macro movastr from
{ local ..move
        match [<],from {std\}
        match [>],from {cld\}
..move:
        lodsb
        stosb
        test    al,al
        jnz     ..move }    
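For reference, here is a rough C model of what `movastr` does. It is an illustrative sketch, not ProMiNick's code. Each iteration is LODSB (read [ESI], step ESI) plus STOSB (write [EDI], step EDI). The loop ends right after the NUL has been copied, which leaves both pointers one step past it. The `dir` parameter (+1 for CLD, -1 for STD) is an assumed stand-in for the direction flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the movastr macro: a LODSB/STOSB loop.
   dir = +1 models CLD, dir = -1 models STD (the direction flag).
   *esi and *edi advance exactly as the string instructions would,
   so on return both point one step past the copied NUL terminator. */
void movastr_model(const char **esi, char **edi, int dir)
{
    assert(dir == 1 || dir == -1);
    char al;
    do {
        al = **esi;          /* lodsb: AL = [ESI] */
        *esi += dir;         /*        ESI += dir */
        **edi = al;          /* stosb: [EDI] = AL */
        *edi += dir;         /*        EDI += dir */
    } while (al != '\0');    /* test al,al / jnz ..move */
}
```

This is why the code below uses `dec edi` (or `dec esi` after an exchange) after `movastr` whenever it wants to continue at the terminator.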


Code:
macro seekachar achar*,from
{ local ..seek
        match [<],from {std\}
        match [>],from {cld\}
..seek:
        lodsb
        cmp     al,achar
        jnz     ..seek }    
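A matching C sketch of `seekachar` follows. Again it is only a model, with `dir` standing in for the direction flag. LODSB steps ESI before the comparison, so when the loop exits ESI is one step past the matching byte in the scan direction. Like the macro, the model has no length limit.

```c
#include <assert.h>

/* Hypothetical C model of the seekachar macro: LODSB followed by CMP AL,achar.
   dir = +1 models CLD, dir = -1 models STD.
   As in the assembly, there is no bound: the caller must guarantee that
   achar occurs in the scan direction. On return *esi points one step
   past the matching byte (in the scan direction). */
void seekachar_model(const char **esi, char achar, int dir)
{
    assert(dir == 1 || dir == -1);
    char al;
    do {
        al = **esi;            /* lodsb: AL = [ESI] */
        *esi += dir;           /*        ESI += dir */
    } while (al != achar);     /* cmp al,achar / jnz ..seek */
}
```

After a backward scan, ESI sits one byte below the '/'. The `add esi,2` in @copy_path compensates for that.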


Code:
@copy_path:
;in ebx - relative path
;in ebp - file name
;in esi - current dir
;in/out edi - output path
        movastr [>]
        mov     esi,ebx
        xchg    esi,edi
        dec     esi
        seekachar '/',[<]
        test    edi,edi ; test    ebx,ebx - may be even more readable
        jz      .outof..loop
.via..loop:
        cmp     word[edi],'..'
        jne     .outof..loop
        add     edi,3
        seekachar '/';[<]
        jmp     .via..loop
.outof..loop:
        cld
        add     esi,2
        xchg    esi,edi
        test    esi,esi ; test    ebx,ebx - may be even more readable
        jz      .str_lp4
        movastr ;[>]
        dec     edi
.str_lp4:
        mov     esi,ebp
        movastr ;[>]
        ret    


Is counting the length and then using REP string operations faster than this approach? Or maybe I have a bug somewhere? Any suggestions are welcome...
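For comparison, the "count the length, then use REP" approach is usually written as `or ecx,-1` / `xor al,al` / `repne scasb` / `not ecx`. That sequence leaves the string length plus one (the NUL) in ECX, and `rep movsb` then copies that many bytes. Below is a C model of that arithmetic. It is a sketch in which `memcpy` stands in for REP MOVSB. Note that the string is traversed twice. As far as I know, the fast-string optimization on recent Intel CPUs is documented for REP MOVS/STOS, not for REPNE SCASB, so the counting pass may dominate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the "count first, then REP" approach:
       or    ecx,-1
       xor   al,al
       repne scasb      ; scan for the NUL, ECX counts down
       not   ecx        ; ECX = length + 1 (NUL included)
   followed by rep movsb with that count.
   Returns the number of bytes copied, including the NUL. */
size_t count_then_copy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    uint32_t ecx = UINT32_MAX;          /* or ecx,-1 */
    const char *edi = src;
    /* repne scasb: compare AL (0) with [EDI], EDI++, ECX--, stop on match */
    while (ecx != 0) {
        char c = *edi++;
        ecx--;
        if (c == '\0')
            break;                      /* ZF = 1: NUL found */
    }
    ecx = ~ecx;                         /* not ecx: bytes scanned = len + 1 */
    memcpy(dst, src, ecx);              /* rep movsb with ECX = len + 1 */
    return ecx;
}
```

For "abc", ECX goes from 0FFFFFFFFh down to 0FFFFFFFBh, and `not ecx` gives 4.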

_________________
I don't like to refer to one person as "you".
My soul requires the word "thou" instead.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 30 Nov 2018, 01:13
You need to define what you are optimising for. Faster for which string lengths, on which CPUs, on which systems?

If you just want it "fast", then you need to specify the string lengths you expect to be dealing with. Different algorithms and approaches work better for different ranges of string lengths. The underlying system also affects the run times: memory access speed, and whether the data and instructions are already in the cache, will change the results. Some CPUs have circuitry that executes certain instructions more efficiently than others do, so you could take advantage of that if your target CPUs have such features.

There is no universal code that will always be faster, with all string lengths, on all systems, with all CPUs, in all production code. Each of those things needs to be accounted for to achieve the desired "fast" result.
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 30 Nov 2018, 07:14
ProMiNick wrote:
Code:
macro seekachar achar*,from
{ local ..seek
        match [<],from {std\}
        match [>],from {cld\}
..seek:
        lodsb
        cmp     al,achar
        jnz     ..seek }    

SCASB would do a better job if you manage to keep the pointer to the string being searched in EDI. That seems easy, since you already do a few eXCHanGes between ESI and EDI.

As for performance, REP xxxxS is said to be well optimized on both modern and ancient processors, except for a small number of models in between. And even if the rumours are false, I find it better to express the algorithm in a more declarative way, so that it has a chance to become faster at some point: a string instruction with a REP prefix is much easier for the hardware to recognize than any of the countless equivalent sequences of simple instructions. That's why this construct can be expected to be optimized better most of the time.

Again, it’s a good idea to measure whether the piece of code you’re trying to optimize is really worth it. And if it is, it’s also a good idea to gather statistics across different hardware. I’ve seen Intel processors be unbelievably good at optimizing calls to short FPU procedures (about 1.5 times faster), while AMD processors, on the contrary, became twice as slow on the same code. There may also be cases where the winner and the loser swap places.
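Here is a minimal timing harness in that spirit. It is a C sketch built on several assumptions. It times a byte-at-a-time copy (the shape of `movastr`) against a count-then-copy version for a chosen string length. The function names are hypothetical, and `clock()` only measures CPU time coarsely. A C compiler may also rewrite either loop, for example into a library call, so drawing conclusions about the assembly itself still requires timing the assembly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Byte-at-a-time copy, the same shape as the movastr LODSB/STOSB loop. */
static void copy_bytewise(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;
}

/* Count first, then copy a known number of bytes (REP MOVSB-like). */
static void copy_counted(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src) + 1);
}

/* Rough timing sketch: copies a string of length len `reps` times and
   returns the CPU seconds reported by clock(). */
double time_copy(void (*copy)(char *, const char *), size_t len, long reps)
{
    char *src = malloc(len + 1);
    char *dst = malloc(len + 1);
    assert(src != NULL && dst != NULL);
    memset(src, 'x', len);
    src[len] = '\0';

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        copy(dst, src);
    clock_t end = clock();

    assert(memcmp(dst, src, len + 1) == 0);  /* the copy must be correct */
    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call it with the lengths you actually expect, for example `time_copy(copy_bytewise, 16, 1000000)` versus `time_copy(copy_counted, 16, 1000000)`. Repeat the runs on each target CPU.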
ProMiNick



Joined: 24 Mar 2012
Posts: 798
Location: Russian Federation, Sochi
ProMiNick 30 Nov 2018, 08:56
I have a question about REP:
in string instructions, ECX is checked before the first REP iteration,
but the other condition (the zero flag) is only checked after the first iteration. So REPE SCASB with a nonzero ECX runs at least once, regardless of the value of the zero flag just before the REPE SCASB instruction. Or am I wrong?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 30 Nov 2018, 09:34
Yes. The zero flag is not checked before the loop starts.

You can also test it to make sure your CPU behaves correctly.
ProMiNick



Joined: 24 Mar 2012
Posts: 798
Location: Russian Federation, Sochi
ProMiNick 30 Nov 2018, 13:39
Thanks, revolution.
I added:
Code:
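macro seekrachar achar,from
{       match [<],from {std\}            ; [<] - scan backward
        match [>],from {cld\}            ; [>] - scan forward
        match any,achar {mov al,achar\}  ; load AL only if achar is given
        repne   scasb }                  ; caller sets ECX (-1 = effectively unbounded)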


Code:
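@copy_path:
;in ebx - relative path (may be 0)
;in ebp - file name
;in esi - current dir
;in/out edi - output path
        or      ecx,-1          ; ECX = 0FFFFFFFFh, so repne scasb never stops on ECX = 0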
        movastr [>]
        dec     edi
        seekrachar '/',[<]
        mov     esi,ebx
        test    esi,esi
        jz      .outof..loop
.via..loop:
        cmp word[esi],'..'
        jne .outof..loop
        add esi,3 ;skip one level up: '../'
        seekrachar ;'/',[<]
        jmp .via..loop
.outof..loop:
        cld
        add     edi,2
        test    esi,esi
        jz      .str_lp4
        movastr ;[>]
        dec     edi
.str_lp4:
        mov     esi,ebp
        movastr ;[>]
        ret


The previous variant didn't even work, but this one does. I have no need to limit the lengths, so ECX is simply set to -1.
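Here is a C model of what this working @copy_path computes. It is illustrative only. The parameter names are assumptions mapped to the registers in the comments, and the output buffer is assumed to be large enough. The function first cuts the current path back to its last `/`. It then drops one more component for each leading `../` of the relative path, and finally appends the rest of the relative path and the file name. The model differs from the assembly in two ways. First, it checks all three bytes of `../`, whereas `cmp word[esi],'..'` only checks two and assumes a `/` follows. Second, when no `/` is left, it returns an error instead of scanning past the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the second @copy_path variant.
   out  ~ EDI: output buffer, assumed large enough
   cur  ~ ESI: current directory path
   rel  ~ EBX: relative path, may be NULL
   name ~ EBP: file name
   Returns 0 on success, or -1 if a '/' the assembly would search for is
   missing (the assembly has no such check). */
int copy_path(char *out, const char *cur, const char *rel, const char *name)
{
    assert(out != NULL && cur != NULL && name != NULL);

    size_t n = strlen(cur);
    memcpy(out, cur, n + 1);                 /* movastr [>] */

    /* dec edi / seekrachar '/',[<]: scan backward, starting at the NUL */
    ptrdiff_t slash = (ptrdiff_t)n;
    while (slash >= 0 && out[slash] != '/')
        slash--;
    if (slash < 0)
        return -1;

    /* .via..loop: each leading "../" removes one more path component */
    while (rel != NULL && strncmp(rel, "../", 3) == 0) {
        rel += 3;                            /* add esi,3 */
        slash--;                             /* scan resumes below the '/' */
        while (slash >= 0 && out[slash] != '/')
            slash--;
        if (slash < 0)
            return -1;
    }

    /* .outof..loop: add edi,2 -> continue just after the '/' found */
    char *dst = out + slash + 1;
    if (rel != NULL) {                       /* movastr / dec edi */
        size_t r = strlen(rel);
        memcpy(dst, rel, r);
        dst += r;
    }
    strcpy(dst, name);                       /* mov esi,ebp / movastr */
    return 0;
}
```

For example, the current path `/a/b/c.asm` (or `/a/b/`) combined with the relative path `../d/` and the file name `e.inc` gives `/a/d/e.inc`.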

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.