flat assembler
Message board for the users of flat assembler.
new version of dynamic string library
roticv 19 Sep 2003, 08:26
Replacing string opcodes with branches? I think string opcodes are slow..
decard 19 Sep 2003, 13:05
Hi roticv,
I have looked into some documents about optimizing assembly code, and realized that you're right. Your version of StrLen should run faster on Pentium and above (I don't know about older machines). I don't have enough experience with more complex optimization (all those U and V pipes, branch prediction... too difficult to deal with for now), and since I think optimizing string operations is very important, maybe you would be a better person to take over StrLib? What do you think?
regards, decard
scientica 19 Sep 2003, 13:15
I have an (old) tool which shows how the code will pair. I'll see if I can find it on the net (if not, I'll upload it, unless I find some text that prohibits it).
_________________
... a professor saying: "use this proprietary software to learn computer science" is the same as English professor handing you a copy of Shakespeare and saying: "use this book to learn Shakespeare without opening the book itself." - Bradley Kuhn
roticv 19 Sep 2003, 13:15
Try this
Code:
strlen:
        mov     ecx, [esp+4]            ; first parameter
code_base:
        mov     eax, 1
        cpuid
        test    edx, 800000h
        db      2Eh                     ; prediction hint: not taken
        jz      no_mmx_code
mmx_code:
@@:
        mov     al, byte ptr [ecx]
        inc     ecx
        test    al, al
        je      done
        test    ecx, 7
        jne     @B
        pxor    mm0, mm0
@@:
        movq    mm1, qword [ecx]
        movq    mm2, qword [ecx + 8]
        movq    mm3, qword [ecx + 16]
        movq    mm4, qword [ecx + 24]
        movq    mm5, qword [ecx + 32]
        movq    mm6, qword [ecx + 40]
        pcmpeqb mm1, mm0
        pcmpeqb mm2, mm0
        pcmpeqb mm3, mm0
        pcmpeqb mm4, mm0
        pcmpeqb mm5, mm0
        pcmpeqb mm6, mm0
        por     mm1, mm2
        por     mm3, mm4
        por     mm5, mm6
        por     mm1, mm3
        por     mm1, mm5
        add     ecx, 48
        packsswb mm1, mm1
        movd    eax, mm1
        test    eax, eax
        jz      @B
        sub     ecx, 48
        emms
no_mmx_code:
        cmp     byte [ecx],0
        lea     ecx, [ecx+1]
        jnz     no_mmx_code
        sub     ecx, [esp][4]
        xchg    eax, ecx
        dec     eax                     ; return value in eax
Don't worry about optimisation, we will optimise it while we go along... Anyway, don't mind if I remove the stack frame. I do not see the need for a stack frame for string functions. Don't mind if any mistakes pop up. I was coding in a notepad and did not attempt to compile the code.
scientica 19 Sep 2003, 13:19
roticv 19 Sep 2003, 13:37
Okay, I just realised that I made a tiny mistake: ebx and ecx are modified by cpuid (damnable).
Code:
strlen:
code_base:
        mov     eax, 1
        push    ebx                     ; cpuid overwrites ebx, so save it
        cpuid
        mov     ecx, [esp+8]            ; first parameter (+8 because ebx was just pushed)
        test    edx, 800000h            ; CPUID function 1, EDX bit 23 = MMX
        db      2Eh                     ; prediction hint: not taken
        jz      no_mmx_code
mmx_code:
@@:
        mov     al, byte [ecx]          ; byte loop until ecx is 8-byte aligned
        inc     ecx
        test    al, al
        je      done
        test    ecx, 7
        jne     @b
        pxor    mm0, mm0
@@:
        movq    mm1, qword [ecx]        ; load 48 bytes
        movq    mm2, qword [ecx + 8]
        movq    mm3, qword [ecx + 16]
        movq    mm4, qword [ecx + 24]
        movq    mm5, qword [ecx + 32]
        movq    mm6, qword [ecx + 40]
        pcmpeqb mm1, mm0                ; 0FFh in every byte that was zero
        pcmpeqb mm2, mm0
        pcmpeqb mm3, mm0
        pcmpeqb mm4, mm0
        pcmpeqb mm5, mm0
        pcmpeqb mm6, mm0
        por     mm1, mm2                ; merge the six compare results
        por     mm3, mm4
        por     mm5, mm6
        por     mm1, mm3
        por     mm1, mm5
        add     ecx, 48
        packsswb mm1, mm1
        movd    eax, mm1
        test    eax, eax
        jz      @b                      ; no zero byte in this chunk
        sub     ecx, 48
        emms
no_mmx_code:
        cmp     byte [ecx], 0
        lea     ecx, [ecx+1]            ; lea leaves the flags from cmp intact
        jnz     no_mmx_code
done:                                   ; ecx = address just past the terminator
        pop     ebx
        sub     ecx, [esp+4]
        xchg    eax, ecx
        dec     eax                     ; return value in eax
Last edited by roticv on 19 Sep 2003, 13:58; edited 2 times in total
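For anyone who wants to check the logic of the routine without an assembler, here is a rough C model of the same three phases: a byte loop up to alignment, a chunked scan for a zero byte, and a final byte loop to pin down the exact position. It is only a sketch and not roticv's code: it scans 8 bytes per step instead of 48, and it swaps the pcmpeqb/por test for the well-known subtract-and-mask bit trick. The names strlen_words and has_zero_byte are made up for this example. Like the MMX version, it reads whole aligned 8-byte words, so it assumes the buffer extends to the end of the aligned word that holds the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if at least one of the 8 bytes in v is zero. */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical model of the aligned, chunked scan.
   Assumption: the aligned 8-byte word containing the terminator
   lies entirely inside the caller's buffer. */
size_t strlen_words(const char *s)
{
    const char *p = s;

    /* Phase 1: byte loop until p is 8-byte aligned. */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Phase 2: skip aligned 8-byte words that contain no zero byte. */
    for (;;) {
        uint64_t v;
        memcpy(&v, p, sizeof v);
        if (has_zero_byte(v))
            break;
        p += 8;
    }

    /* Phase 3: byte loop to find the exact terminator position. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Calling strlen_words on a string stored in a zero-padded, 8-byte-aligned buffer should give the same result as the standard strlen.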
Tomasz Grysztar 19 Sep 2003, 13:49
An offtopic bit: you are using the following construction in your code (the feature of latest Intel processors):
Code:
        db      2Eh                     ; prediction hint: not taken
        jz      no_mmx_code
It's enough to write it this way:
Code:
        cs jz   no_mmx_code
and if you want it to be more logical, you can define some aliases for this purpose, for example:
Code:
        lt equ ds       ; likely taken
        ut equ cs       ; unlikely taken

        ut jz   no_mmx_code
or even define them as macros, to allow them as prefixes only...
JohnFound 19 Sep 2003, 14:03
Hi, guys.
IMO: we don't need speed optimization, especially in exchange for size. Maybe later we will make some ultra-fast libraries. Writing the strlib without string instructions (mov al, [esi] / inc esi instead of lodsb) is good, because it doesn't bloat the code much and it's easy for beginners to read, but duplicating routines with and without MMX is not a good idea, I think.
Regards.
decard 19 Sep 2003, 14:47
Well... I agree that for now we shouldn't optimize string functions with MMX. (BTW: doesn't the cpuid make StrLen too slow?... or maybe not?) So in the next release StrLen will start from "no_mmx_code:"...
roticv, IMO you are right that there is no need for a stack frame... I was simply converting those routines to stdcall with macros. It will be fixed in the next release.
regards
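On the cpuid question: cpuid is a serializing instruction, so running it on every StrLen call is likely to cost more than it saves on short strings. One common way around that is to run the feature test once and remember which routine to call afterwards. The C sketch below only illustrates that idea. Every name in it is hypothetical, cpu_has_mmx_stub is a stand-in that does not execute cpuid at all, and the "MMX" routine is just a placeholder that reuses the byte loop. It is also not thread-safe as written.

```c
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Counts how often the (expensive) detection runs. */
int detect_calls = 0;

/* Hypothetical stand-in for the cpuid test (a real build would check
   CPUID function 1, EDX bit 23). It pretends MMX is absent. */
static int cpu_has_mmx_stub(void)
{
    detect_calls++;
    return 0;
}

/* Plain byte loop, the same idea as the no_mmx_code path. */
static size_t strlen_bytes(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Placeholder for an MMX routine; here it simply reuses the byte loop. */
static size_t strlen_mmx_placeholder(const char *s)
{
    return strlen_bytes(s);
}

static size_t strlen_resolve(const char *s);

/* Starts out pointing at the resolver, which replaces itself on first use. */
static strlen_fn strlen_impl = strlen_resolve;

static size_t strlen_resolve(const char *s)
{
    strlen_impl = cpu_has_mmx_stub() ? strlen_mmx_placeholder : strlen_bytes;
    return strlen_impl(s);
}

/* Public entry point: one indirect call, detection runs only once. */
size_t str_len(const char *s)
{
    return strlen_impl(s);
}
```

After the first call to str_len, detect_calls stays at 1 no matter how many more strings are measured. Whether the extra indirect call is acceptable for a size-oriented library is a separate question.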
JohnFound 19 Sep 2003, 15:25
decard wrote: roticv, IMO you are right about no need for stack frame... I was just simply converting those routines to stdcall with macros. In next release it will be fixed.
Of course roticv is right, but please, please, please, keep the source readable. It's very important. Describe the parameters very clearly, and which [sp+???] corresponds to which parameter. Note that if you use the stack, the offset for the same parameter will be different in different places of the routine, which invites bugs.
regards.
decard 20 Sep 2003, 10:59
When I was testing the stdcall version of StrLib, I forgot about StrDel... and of course it had one stupid bug... It's fixed now, and the release now includes roticv's version of StrLen.
BTW: what do you think about the NumToStr routine: is it better to create two functions (one for unsigned numbers and one for signed), or to code one function with an additional parameter that specifies whether to treat the number as signed or unsigned?
scientica 20 Sep 2003, 11:44
Just one word from me: will you write a "wrapper" for the str functions, so that registers are preserved (while still leaving the register-parameter-passing version available, for compatibility and "hand in hand code")?
decard 20 Sep 2003, 11:56
OK, no problem... by compatibility you mean preserving the function name? I wanted NumToStr to be the name of a wrapper function, but you are the StrLib user.
But what about my question?
scientica 20 Sep 2003, 12:05
Suggestion for wrapper names:
StrNum and StrNumU
IMO it's better to have two functions rather than one with an argument specifying whether it's a signed or unsigned number.
JohnFound 20 Sep 2003, 12:16
decard wrote: BTW: what do you think about the NumToStr routine: is it better to create two functions (one for unsigned numbers, and one for signed), or maybe to code one function with additional parameter that will specify whether to treat the number as signed or unsigned...??
There are two NumToStr functions: NumToStr (signed) and NumToStrU (unsigned). I think we should rename these functions to _NumToStr and _NumToStrU and write a wrapper function:
Code:
ntsSigned         = $00000
ntsUnsigned       = $10000
ntsZeroTerminated = $20000
ntsFixedWidth     = $40000

ntsBin  = $02
ntsQuad = $04
ntsOct  = $08
ntsDec  = $0a
ntsHex  = $10

;***********************************************************
; NumToStr - converts number to any radix.
;   num   - number to convert
;   str   - handle of the string. If NULL - creates new string.
;   index - Offset in string where to put converted number.
;   flags:
;     byte 0   - contains radix for the conversion.
;     byte 1   - number of digits if ntsFixedWidth is set.
;     byte 2,3 - flags.
; Returns:
;   eax - handle of the string (new one or passed in [str])
;   edx - pointer to the string.
;
;***********************************************************
proc NumToStr, num, str, index, flags

; Example of using:
        stdcall NumToStr, $12345, NULL, 1, ntsUnsigned or ntsHex
        mov     byte [edx], '$'
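To make the proposed flags layout concrete, here is a small C model of how such a wrapper might decode the flags dword and produce the digits. The flag and radix values are the ones from the post; everything else is an assumption made for this sketch: the function name num_to_str, the caller-supplied buffer with a capacity (instead of a string handle and index), the return value, uppercase digits, and the reading of ntsFixedWidth as "pad with leading zeros, never truncate".

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the post; the C names are made up for this sketch. */
enum {
    NTS_SIGNED          = 0x00000,
    NTS_UNSIGNED        = 0x10000,
    NTS_ZERO_TERMINATED = 0x20000,
    NTS_FIXED_WIDTH     = 0x40000,
    NTS_BIN = 0x02, NTS_QUAD = 0x04, NTS_OCT = 0x08,
    NTS_DEC = 0x0a, NTS_HEX  = 0x10
};

/* Writes num into out (at most cap bytes) and returns the number of
   characters written, not counting an optional terminator.
   Returns 0 on a bad radix or if the result does not fit. */
size_t num_to_str(uint32_t num, char *out, size_t cap, uint32_t flags)
{
    static const char digit_chars[] = "0123456789ABCDEF";
    uint32_t radix = flags & 0xFFu;          /* byte 0: radix */
    size_t width = (flags >> 8) & 0xFFu;     /* byte 1: digit count */
    char tmp[32];                            /* digits in reverse order */
    size_t ndigits = 0, pad = 0, total, pos = 0;
    int negative = 0;

    if (radix < 2 || radix > 16)
        return 0;

    /* Signed mode: emit '-' and convert the magnitude. */
    if (!(flags & NTS_UNSIGNED) && (num & 0x80000000u)) {
        negative = 1;
        num = 0u - num;
    }

    do {
        tmp[ndigits++] = digit_chars[num % radix];
        num /= radix;
    } while (num != 0);

    /* Assumed meaning of ntsFixedWidth: left-pad with zeros. */
    if ((flags & NTS_FIXED_WIDTH) && width > ndigits)
        pad = width - ndigits;

    total = (size_t)negative + pad + ndigits
          + ((flags & NTS_ZERO_TERMINATED) ? 1u : 0u);
    if (total > cap)
        return 0;

    if (negative)
        out[pos++] = '-';
    for (; pad > 0; pad--)
        out[pos++] = '0';
    while (ndigits > 0)
        out[pos++] = tmp[--ndigits];
    if (flags & NTS_ZERO_TERMINATED)
        out[pos] = '\0';
    return pos;
}
```

Leaving out NTS_ZERO_TERMINATED lets the caller drop the digits into the middle of an existing string without cutting it short, which is the case the index argument in the proposal is aimed at.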
decard 20 Sep 2003, 13:03
Well, John, I like your idea. That would be a very powerful routine... To get more 'specific' routines we could use some macros...
But what about the 'ntsZeroTerminated' flag? What would be its purpose?
regards
JohnFound 20 Sep 2003, 13:11
decard wrote: But what about 'ntsZeroTerminated' flag? what would be its purpose?
When you convert a number to a string, it's a rare case that you need a plain string with only one number in it. In most cases you need to insert the number into some other string along with other text. Because of that, I removed the zero terminator from the original NumToStr functions. The [index] argument is there for the same reason. Of course, you can make a plain number string and then use other string functions to concatenate it with any other string, but in most cases that is not optimal.
regards.
decard 20 Sep 2003, 13:21
Sounds good, so I'm starting to code it... thanks!
roticv 20 Sep 2003, 16:40
Attached is one StrLCase, one StrUCase, one StrCopyMMX. One thing: I preserved all the registers that were used; uncomment that if the register preservation is not needed. :/ Grr.. irritating... txt file not accepted.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.