flat assembler
Message board for the users of flat assembler.
Why is this code _so_ much slower?
r22 26 Apr 2006, 19:30
String instructions are slow. You can almost always write faster code without specialized instructions like stos and cmps* or the rep prefix.
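Replace

    cmp dl, 0
    je .found_end_s1

with

    test dl, dl
    jz .found_end_s1

and you may get a little speed boost. Both set ZF the same way for a zero check, but test dl,dl needs no immediate byte, so it encodes one byte shorter (2 bytes vs. 3).

For example, a memset that avoids rep stos:

Code:
    ; in:  al  = byte to set
    ;      ebx = length (how many bytes to set), must be below 80000000h
    ;            because the loops test the sign flag
    ;      esi = address to start at
    ; out: eax = 0; ebx and edx are clobbered
    memset:
        movzx eax,al            ; eax = 000000XXh, so the multiply below is exact
        mov edx,01010101h
        mul edx                 ; eax = XXXXXXXXh (edx:eax = eax*edx, edx ends up 0)
    .LPqword:                   ; fill 8 bytes per pass, working down from the end
        sub ebx,8
        js .skip                ; fewer than 8 bytes left
        mov dword[esi+ebx+4],eax
        mov dword[esi+ebx],eax
        jmp .LPqword
    .skip:
        add ebx,8               ; ebx = remaining 0..7 bytes
    .finishup:                  ; store the leftover bytes one at a time
        dec ebx
        js .end
        mov byte[esi+ebx],al
        jmp .finishup
    .end:
        xor eax,eax
        ret 0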
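The same memset strategy can be sketched in C, under a few assumptions: `memset_sketch` is a hypothetical name, `memcpy` stands in for the possibly unaligned `mov dword` stores, and both loops walk down from the end of the buffer the way the assembly does. It only illustrates the structure (byte replicated with a multiply by 01010101h, two 32-bit stores per 8-byte chunk, then a byte tail); whether it beats rep stos on a given CPU is something you would have to time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the assembly memset: replicate the byte into
   a 32-bit pattern, store 8 bytes per pass from the end of the buffer
   downward, then finish the remaining 0..7 bytes one at a time. */
void memset_sketch(unsigned char *dst, unsigned char value, size_t len)
{
    /* 0xXX * 0x01010101 = 0xXXXXXXXX, the same trick as "mul edx" */
    uint32_t pattern = (uint32_t)value * 0x01010101u;

    /* main loop: two 32-bit stores per 8-byte chunk, highest chunk first */
    while (len >= 8) {
        len -= 8;
        memcpy(dst + len + 4, &pattern, 4); /* memcpy keeps unaligned stores legal */
        memcpy(dst + len, &pattern, 4);
    }

    /* tail: the leftover bytes, again from the end toward the start */
    while (len > 0) {
        len--;
        dst[len] = value;
    }
}
```

Filling `buf + 3` with 21 bytes, for instance, exercises two 8-byte chunks plus a 5-byte tail and leaves the bytes outside that range untouched.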
Patrick_ 26 Apr 2006, 20:37
Thanks, that's nice to know.
LocoDelAssembly 26 Apr 2006, 20:45
Patrick_, did you check what happens if you replace loop .comparison with dec ecx/jnz .comparison? If I remember correctly, loop is worse than the equivalent instruction pair.
Patrick_ 27 Apr 2006, 00:27
locodelassembly wrote:
"Patrick_, did you check what happens if you replace loop .comparison with dec ecx/jnz .comparison? If I remember correctly, loop is worse than the equivalent instruction pair."

Actually, yes, I did, since I also remember loop being very slow; however, it made no difference whatsoever.
Borsuc 01 May 2006, 11:00
AMD says you should use loop, so it's better on AMD anyway.
f0dder 01 May 2006, 11:12
The_Grey_Beast wrote:
"AMD says you should use loop, so it's better anyway on AMD"

It's not worth it, though, considering the massive speed hit you'll take on Intel architectures. As for the string instructions, "rep movsd" and "rep stosd" are relatively fast, while cmps* is slow. Even movsd/stosd can be beaten, especially if you use non-cached writes (non-temporal stores such as movntq or movnti, which need an SSE- or SSE2-capable CPU)... but you have to ask yourself what minimum CPU you want to require.
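As a rough sketch of avoiding `repe cmpsb` in an equality test, the C function below compares four bytes at a time as 32-bit words and only falls back to single bytes for the last 0..3 bytes. `mem_equal_sketch` is a hypothetical name, not code from this thread. It answers only "equal or not"; a strcmp-style ordering would still have to locate the first differing byte, and any speed gain over cmps* is an assumption to verify by timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for a "repe cmpsb" equality check: compare four
   bytes per step as 32-bit words, then compare any leftover bytes singly. */
bool mem_equal_sketch(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;

    /* word loop: 4 bytes per iteration */
    for (; i + 4 <= len; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, 4); /* unaligned-safe 32-bit loads */
        memcpy(&wb, b + i, 4);
        if (wa != wb)
            return false;      /* at least one byte in this word differs */
    }

    /* tail loop: the remaining 0..3 bytes */
    for (; i < len; i++)
        if (a[i] != b[i])
            return false;

    return true;
}
```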
Borsuc 01 May 2006, 13:12
Yep, "rep movsd" is especially useful for inline copies, that is, without calling a function.
If I recall correctly, loop runs at the same speed as "dec ecx/jnz" on the Pentium, but it's faster on AMD, so why not use it? (OK, correct me if I'm wrong.) It's also smaller, if you are optimizing for size: loop with a short target is 2 bytes, while dec ecx/jnz is 3.
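To make the semantic side of this concrete, here is a small, hypothetical C model; the names `cpu_model`, `step_loop`, `step_dec_jnz` and `count_iterations` are invented for illustration. Both `loop` and `dec ecx`/`jnz` decrement ECX and branch while it is nonzero, so they run a loop body the same number of times; the difference is that `loop` leaves the flags alone, while `dec` writes ZF, SF, OF, AF and PF (but not CF). The model tracks only ECX and ZF (starting with ZF clear) and says nothing about speed, which is what the thread is actually debating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model: only ECX and ZF are tracked.
   Real DEC also updates SF, OF, AF and PF, but never CF. */
struct cpu_model {
    uint32_t ecx;
    bool zf;
};

/* LOOP target: ECX = ECX - 1, branch if ECX != 0, flags untouched. */
bool step_loop(struct cpu_model *c)
{
    c->ecx--;
    return c->ecx != 0;
}

/* DEC ECX / JNZ target: same counter update and branch condition,
   but DEC writes ZF, which JNZ then tests. */
bool step_dec_jnz(struct cpu_model *c)
{
    c->ecx--;
    c->zf = (c->ecx == 0);
    return !c->zf;
}

/* Runs a bottom-tested loop (body first, then the closing instruction)
   and returns how many times the body executed. A start value of 0 would
   wrap ECX and run 2^32 times with either form, as on real hardware, so
   callers here should pass start >= 1. */
uint32_t count_iterations(uint32_t start, bool use_loop, bool *zf_out)
{
    struct cpu_model c = { start, false };
    uint32_t n = 0;
    do {
        n++; /* stands in for the loop body */
    } while (use_loop ? step_loop(&c) : step_dec_jnz(&c));
    *zf_out = c.zf;
    return n;
}
```

With a start value of 5, both forms report 5 iterations; afterwards ZF is still clear in the `loop` version but set in the `dec ecx`/`jnz` version.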
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.