flat assembler
Message board for the users of flat assembler.
Index
> Main > String length Goto page 1, 2, 3, 4, 5, 6 Next |
Author |
|
JohnFound 26 Oct 2013, 17:56
If you want to optimize for speed you definitely should not compare byte by byte. It is the slowest possible solution. Below is the version of StrLen procedure from FreshLib. As you can see, it reads and compares the string by double words.
But what is more important, using ASCIIZ is not the best option both for speed and security. As you can see, when the input string is native FreshLib string (With handle >= $c0000000) it uses format where on the address-4 the actual length is stored (so called Pascal strings). In this case, the speed of the StrLen is O(1), which is the fastest possible at all. Code: proc StrLen, .hString begin mov eax, [.hString] cmp eax, $c0000000 jb .pointer stdcall StrPtr, eax jc .error mov eax, [eax+string.len] clc return .error: xor eax, eax stc return .pointer: push ecx edx esi edi ; align on dword .byte1: test eax, 3 jz .scan cmp byte [eax], 0 je .found inc eax jmp .byte1 .scan: mov ecx, [eax] mov edx, [eax+4] lea eax, [eax+8] lea esi, [ecx-$01010101] lea edi, [edx-$01010101] not ecx not edx and esi, ecx and edi, edx and esi, $80808080 and edi, $80808080 or esi, edi jz .scan sub eax, 9 ; byte 0 was found: so search by bytes. .byteloop: lea eax, [eax+1] cmp byte [eax], 0 jne .byteloop .found: sub eax, [.hString] clc pop edi esi edx ecx return endp |
|||
26 Oct 2013, 17:56 |
|
Sasha 26 Oct 2013, 18:37
Scanning by dwords is fast, but I still can't understand how to do it, even looking at your code. And this is just to find a zero byte, and what about more complex scan, like parser? Fasm is very fast, it works with bytes as far as I could understand.
|
|||
26 Oct 2013, 18:37 |
|
HaHaAnonymous 26 Oct 2013, 19:53
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 19:40; edited 2 times in total |
|||
26 Oct 2013, 19:53 |
|
JohnFound 26 Oct 2013, 20:11
Well, it actually uses quad words, by using the parallel instruction execution.
|
|||
26 Oct 2013, 20:11 |
|
Sasha 26 Oct 2013, 20:30
HaHaAnonymous wrote: Off-topic note: Scanning byte by byte you can get length of millions of strings in less than a second if I'm not mistaken (human memory can fail). Let's take a javascript parser. Do you think it is optimized so hard, to scan by dwords and in parallel?) Btw, I still can't imagine how to scan text by dwords... parallel instruction execution, yes... May be sse? Note, also, that dwords must be aligned! |
|||
26 Oct 2013, 20:30 |
|
JohnFound 27 Oct 2013, 04:21
Sasha wrote: Btw, I still can't imagine how to scan text by dwords... Hm, did you read my source? What is not clear? Ask specific questions. _________________ Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9 |
|||
27 Oct 2013, 04:21 |
|
Sasha 27 Oct 2013, 07:29
Trying to understand the .scan: section. I understand less or more what should be there, but still don't see it. You make manipulations and than or esi,edi gives you an answer if there was a zero byte inside the qword.
But when I sad "can't imagine" I meant pars string for words and expressions. |
|||
27 Oct 2013, 07:29 |
|
JohnFound 27 Oct 2013, 08:14
"Parsing" strings usually means comparing one string with another. It is possible as well and pretty easy. You only need to carefully care about the end cases, when the string length is not multiply of 4 (8 ).
|
|||
27 Oct 2013, 08:14 |
|
uart777 27 Oct 2013, 08:28
John: Not true. Byte copy is not the slowest and what you presented is far from the fastest. With that "stdcall StrPtr", your method may be the slowest. And those are not parallel instructions.
Sasha: For small text, inline 8-32BIT copy/scan is fast. For long text, 64BIT-128BIT MMX/XMM copy/scan is recommended. N-01010101h subtracts 1 from each byte in 32BIT, and if it contains zero/s, sign bit/s will be set. Search for Agner Fog's site, optimization tutorial/PDF, one of the best references. |
|||
27 Oct 2013, 08:28 |
|
AsmGuru62 27 Oct 2013, 12:53
If we talking about parsing, then it is possible that the length of a string is not needed at all.
There is a method of parsing called 'indestructive parsing', where the string being parsed is not destroyed, i.e. zero bytes are not written into the string. Instead, the parser simply picks tokens into following structure: Code: struct TOKEN pchValue dd ? ; points to token text within parsed string nLength dd ? ; # of characters in the token ends This eliminates the need for taking the token length, since it is already will be known by scanning the parsed string. Also, an advantage of such method is that original string is left intact. Usual parsing will force a full copy of an original string to be parsed and destroyed by null characters. |
|||
27 Oct 2013, 12:53 |
|
Sasha 27 Oct 2013, 13:06
This a meant by parser. This is the way, I search for the begining of the token, while space or tab are delimiters, zero is a terminator, and I don't want to exceed the maxlen. Tested, working.
Code: proc FindToken Offset,MaxLen mov eax,[Offset] mov ecx,[MaxLen] add ecx,eax xor edx,edx dec eax .loop: inc eax cmp eax,ecx ja .finish_too_big mov dl,byte[eax] cmp edx,020h je .loop cmp edx,09h je .loop or edx,edx je .finish_no_token .finish: ;Returns an offset in eax ret .finish_too_big: or eax,-1 ret .finish_no_token: xor eax,eax ret endp Last edited by Sasha on 28 Oct 2013, 16:55; edited 2 times in total |
|||
27 Oct 2013, 13:06 |
|
Sasha 27 Oct 2013, 13:34
Add this proc, and we are ready to parse the command line!(tested,fixed,working)
Code: proc GetTokenLength Offset,MaxLen mov eax,[Offset] mov ecx,[MaxLen] add ecx,eax dec eax xor edx,edx cmp byte[eax+1],022h jne .loop inc eax .quotedstring: inc eax cmp eax,ecx ja .finish_too_big mov dl,byte[eax] cmp edx,022h je .finish_quote or edx,edx jne .quotedstring sub eax,[Offset] ret .loop: inc eax cmp eax,ecx ja .finish_too_big mov dl,byte[eax] cmp edx,020h je .finish cmp edx,09h je .finish or edx,edx jne .loop .finish: sub eax,[Offset] ret .finish_quote: inc eax sub eax,[Offset] ret .finish_too_big: or eax,-1 ret endp Some problems found, edited. Now the token can have a zero length, if we pointed to space. |
|||
27 Oct 2013, 13:34 |
|
Sasha 28 Oct 2013, 17:00
Quote:
This is what I've done. Quote: Usual parsing will force a full copy of an original string to be parsed and destroyed by null characters. Full copy doesn't destroy the original string, too. Also the full copy of a token has one good advantage: it will become aligned! |
|||
28 Oct 2013, 17:00 |
|
ejamesr 29 Oct 2013, 16:07
Sasha,
You can speed up small loops by eliminating a comparison operation. The idea is to have one register point to the end of the data you are scanning (your code currently sets ecx to point to the end, which is good). Then, use another register as a negative counter; set it equal to the negative count, and then increment until it is no longer negative, at which point the end of the data has been met. Here's the basic idea you can adapt: Code: mov eax, [Offset] mov ecx, [MaxLen] add eax, ecx ; point to end of data neg ecx ; negative counter .newLoop: mov dl, [eax+ecx] ; at first iteration, this grabs the very first byte ; ... do your tests here; use 'dl' rather than edx, as it's faster ; in this context. ; ... but if you want to use edx, change above ; to 'movzx edx, byte [eax+ecx] inc eax ; this one operations updates the counter AND ; checks to see if it is within bounds js .newLoop ; keep going as long as eax is negative ; loop flows through when end is reached .doFinish: add eax, [MaxLen] ; eax is neg, so add rather than sub ret I havne't compiled the above so there may be an error or two, it's up to you to adapt and test it. Enjoy! Eric P.S. In the .loop section of your code, ecx points to the first invalid position at the end of the string; you might consider using 'jae .finish_too_big', since if eax is equal to ecx, it's already past the end of the string. |
|||
29 Oct 2013, 16:07 |
|
ejamesr 29 Oct 2013, 16:38
John,
Since Sasha was focusing on speed here, I would like to point out two things that could make your code slightly faster and smaller (the speed difference is negligible and will not be seen in the vast majority of scenarios, so you can certainly disregard my comments). In the first snippet: Code: proc StrLen, .hString begin mov eax, [.hString] cmp eax, $c0000000 jb .pointer stdcall StrPtr, eax jc .error mov eax, [eax+string.len] clc return In the second snippet: Code: ; byte 0 was found: so search by bytes. .byteloop: lea eax, [eax+1] cmp byte [eax], 0 jne .byteloop Just my unasked-for two cents. Regards, Eric |
|||
29 Oct 2013, 16:38 |
|
JohnFound 29 Oct 2013, 17:26
ejamesr, thanks for the suggestions. These are really things I missed.
|
|||
29 Oct 2013, 17:26 |
|
Sasha 04 Nov 2013, 20:54
Quote:
Seems it's what I've done in the third example of strlen. Quote:
First, I thought, that MaxLen will be a maximum length of the string excluding the zero byte. But then I have understood, that it is more intuitive to pass the maximum available memory in the MaxLen, and have already changed the ja to jae. Thank you for your attention. PS. What do you think, is it more optimal and intuitive to use MaxAddr instead of MaxLen? MaxLen needs to be decremented every time, but MaxAddr always stays the same. |
|||
04 Nov 2013, 20:54 |
|
ejamesr 04 Nov 2013, 23:31
Correct... but only for your third example of strlen. And, Oops!, my untested code snippet had some errors, it should have been more like this:
Code: mov eax, [Offset] mov ecx, [MaxLen] add eax, ecx ; point to end of data neg ecx dec ecx ; use as negative counter .newLoop: inc ecx ; this one operation updates the counter AND ; checks to see if it is within bounds jns .finishTooBig ; exit when ecx no longer negative mov dl, [eax+ecx] ; at first iteration, this grabs the very first byte ; ... do your tests here; use 'dl' rather than edx, as it's faster ; in this context. ; ... but if you want to use edx, change above ; to 'movzx edx, byte [eax+ecx] ; loop flows through when token is reached .doFinish: lea eax, [eax+ecx] ; make eax point to token start ret My comment was actually directed to your implementations of FindToken and GetTokenLength. In both of those you use the following code: Code: inc eax
cmp eax,ecx
ja .finish_too_big If you use the 'negative counter' method I described here, you could eliminate the 'cmp eax, ecx' line and it would be slightly faster (using nine lines instead of ten for the inner loop). Study it carefully and you can see. And there's still at least a couple more small changes that could speed up your loop even more, if you wanted. Best wishes, Eric |
|||
04 Nov 2013, 23:31 |
|
revolution 04 Nov 2013, 23:35
Where are the speed measurement figures? Without measurement how do you know if you are improving anything?
|
|||
04 Nov 2013, 23:35 |
|
Goto page 1, 2, 3, 4, 5, 6 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.