flat assembler
Message board for the users of flat assembler.
r22 26 Apr 2006, 19:30
String opcodes are slow. You can almost always write faster code without specialized instructions like stos, cmps* and the rep prefix.
Replace cmp dl,0 / je .found_end_s1 with test dl,dl / jz .found_end_s1 and you may get a small speed boost.
Code:
; in:  eax = byte to fill with, zero-extended (000000XXh)
;      ebx = length (number of bytes to set)
;      esi = start address
; out: eax = 0; ebx and edx are destroyed
memset:
        mov     edx,01010101h
        mul     edx                     ; eax = XXXXXXXXh (byte copied into all 4 lanes), edx = 0
.LPqword:
        sub     ebx,8                   ; at least 8 bytes left?
        js      .skip
        mov     dword [esi+ebx+4],eax   ; fill 8 bytes, working down from the end
        mov     dword [esi+ebx],eax
        jmp     .LPqword
.skip:
        add     ebx,8                   ; 0..7 bytes remain
.finishup:
        dec     ebx
        js      .end
        mov     byte [esi+ebx],al       ; store the leftover bytes one at a time
        jmp     .finishup
.end:
        xor     eax,eax
        ret     0
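To make the routine's logic easy to check, here is a rough C model of it. This is a sketch, not the poster's code and not a drop-in replacement for the C library's memset; the name memset_model is hypothetical. It mirrors the three steps of the assembly: broadcast the byte with a multiply by 01010101h, fill eight bytes per iteration working down from the end, then store the last 0..7 bytes singly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the FASM memset routine above:
   value ~ AL, len ~ EBX, dst ~ ESI. */
static void memset_model(uint8_t *dst, uint8_t value, size_t len)
{
    assert(dst != NULL || len == 0);

    /* mov edx,01010101h / mul edx: copy the byte into all four lanes,
       e.g. 0xAB * 0x01010101 = 0xABABABAB. */
    uint32_t pattern = (uint32_t)value * 0x01010101u;

    /* .LPqword: two dword stores fill 8 bytes, working down from the end. */
    while (len >= 8) {
        len -= 8;
        memcpy(dst + len + 4, &pattern, sizeof pattern);
        memcpy(dst + len, &pattern, sizeof pattern);
    }

    /* .finishup: store the remaining 0..7 bytes one at a time. */
    while (len > 0) {
        len--;
        dst[len] = value;
    }
}
```

Because all four bytes of the pattern are equal, the dword stores produce the same memory contents on little- and big-endian machines.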
Patrick_ 26 Apr 2006, 20:37
Thanks, that's nice to know.
LocoDelAssembly 26 Apr 2006, 20:45
Patrick_, did you check what happens if you replace loop .comparison with dec ecx/jnz .comparison? If I remember correctly, loop is worse than the equivalent instruction pair.
Patrick_ 27 Apr 2006, 00:27
locodelassembly wrote: "Patrick_, did you check what happens if you replace loop .comparison with dec ecx/jnz .comparison? If I remember correctly, loop is worse than the equivalent instruction pair."
Actually, yes, I did, since I also remember loop being very slow; however, it made no difference whatsoever.
Borsuc 01 May 2006, 11:00
AMD says you should use loop, so it's better on AMD anyway.
f0dder 01 May 2006, 11:12
The_Grey_Beast wrote: "AMD says you should use loop, so it's better on AMD anyway."
It's not worth it, though, considering the massive speed hit you'll take on Intel architectures. As for the string instructions, "rep movsd" and "rep stosd" are relatively fast, while cmps* is slow. Even movsd/stosd can be beaten, though, especially if you use non-cached (non-temporal) writes... but you have to ask yourself what minimum CPU you want to require.
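As an illustration of avoiding cmps*, here is a sketch in C of a byte-by-byte string compare built from ordinary loads and compares, ending on a zero-byte test in the spirit of the test dl,dl / jz .found_end_s1 pair from the first post. The thread's original compare routine is not shown, so this is only an assumption about its general shape, and the name compare_strings is hypothetical. Whether such a loop actually beats repe cmpsb depends on the CPU and on the code the compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cmps-free string compare: plain byte loads, one compare
   between the strings, and an end-of-string test. Returns <0, 0 or >0,
   with the same sign convention as strcmp. */
static int compare_strings(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    for (;;) {
        unsigned char c1 = *a++;   /* ordinary byte loads, no cmpsb */
        unsigned char c2 = *b++;
        if (c1 != c2)
            return (int)c1 - (int)c2;   /* first differing byte decides */
        if (c1 == 0)                    /* zero test: both strings ended together */
            return 0;
    }
}
```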
Borsuc 01 May 2006, 13:12
Yep, "rep movsd" is especially useful when doing things "inline", that is, without calling functions.
If I recall correctly, loop performs the same as "dec ecx/jnz" on the Pentium, but it's faster on AMD. Why not use it? (OK, correct me if I'm wrong.) It's also smaller, if you're optimizing for size.
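The size argument can be checked against the instruction encodings. The sketch below records the 32-bit-mode bytes of each form as C arrays and checks their lengths at compile time; the array names are hypothetical, and the branch targets (jumping back to the loop's first instruction) are chosen only for illustration. The first post's cmp dl,0 versus test dl,dl pair is included too. Note that 49h encodes dec ecx only outside 64-bit mode, where that byte is a REX prefix instead.

```c
#include <assert.h>

/* 32-bit-mode encodings; rel8 is counted from the end of the jump. */
static const unsigned char loop_form[]    = { 0xE2, 0xFE };       /* loop to itself (rel8 = -2) */
static const unsigned char dec_jnz_form[] = { 0x49,               /* dec ecx */
                                              0x75, 0xFD };       /* jnz back to dec (rel8 = -3) */

/* The zero test from the first post. */
static const unsigned char cmp_form[]  = { 0x80, 0xFA, 0x00 };    /* cmp dl,0 */
static const unsigned char test_form[] = { 0x84, 0xD2 };          /* test dl,dl */

/* loop saves one byte over dec/jnz; test saves one byte over cmp with 0. */
static_assert(sizeof loop_form == 2, "loop rel8 is 2 bytes");
static_assert(sizeof dec_jnz_form == 3, "dec ecx + jnz rel8 is 3 bytes");
static_assert(sizeof cmp_form == 3, "cmp r/m8, imm8 is 3 bytes");
static_assert(sizeof test_form == 2, "test r/m8, r8 is 2 bytes");
```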
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.