flat assembler
Message board for the users of flat assembler.

Index > Main > string array compare

Author
Thread Post new topic Reply to topic
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 12 Jun 2011, 18:14
Hi.

I have an array of strings and i'd like to check if my source matches with some of the strings.

Code:
str_array dd str_list.a,str_list.c,str_list.f, etc.

str_list:
 .a db 'apple',0
 .c db 'chair',0
 .f db 'foo',0
 etc.

source db 'weee',0    

Very basic routine i suppose, but i'm a little bit messy Embarassed

_________________
Sorry if bad english.
Post 12 Jun 2011, 18:14
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 12 Jun 2011, 18:30
Do you have a question? Or some code you wish to contribute?

Or did you just want to find an excuse to post "weee"? Razz
Post 12 Jun 2011, 18:30
View user's profile Send private message Visit poster's website Reply with quote
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 12 Jun 2011, 18:43
my question is implicit, i want a solution <.>

my current code just compare the first 4 bytes in edx, lol.

Code:
bnf_instruction:
        push    esi
        push    edi

        mov     eax,instr_list ; string array
        xor     ecx,ecx

.for:   mov     esi,keyword ; source
        lea     edi,[eax+ecx*4]

        mov     edx,[esi]
        cmp     [edi],edx
        jne     @f
        mov     eax,1
        jmp     .done
        @@:

        inc     ecx
        cmp     ecx,instr_list.size
        jne     .for

        xor     eax,eax
 .done:
        pop     edi
        pop     esi
        ret     


PS: i dont like to put code bc I like to see the solutions without influence with my codes. Also bc my codes are weird(?) for me.
Post 12 Jun 2011, 18:43
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 12 Jun 2011, 18:52
If you only need a simple solution then just do a byte-by-byte compare until you get either zero or a mismatch.

There is no need to complicate with dword compares. If your string is only 1 or 2 characters then you could get incorrect results. Also strings that match the first 4 characters might not match later characters.

PS: it is always best to post your code. At least that shows you have tried to do it and are not just being lazy and asking others to do work for you.
Post 12 Jun 2011, 18:52
View user's profile Send private message Visit poster's website Reply with quote
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 14 Jun 2011, 23:52
there we go:

Code:
bnf_instruction:
        push    esi
        push    edi

        mov     ebx,instr_list
        mov     edx,keyword
        mov     ecx,instr_list.size
.for:   mov     esi,edx
        mov     edi,[ebx+ecx*4]

    @@: mov     al,[esi]
        cmp     [edi],al
        jne     .neq
        inc     esi
        inc     edi
        test    al,al
        jnz     @b
        jmp     .done

  .neq: dec     ecx
        jns     .for

 .done: pop     edi
        pop     esi
        ret    


I really need this to be fast. Can we optimize it?

_________________
Sorry if bad english.
Post 14 Jun 2011, 23:52
View user's profile Send private message Reply with quote
idle



Joined: 06 Jan 2011
Posts: 440
Location: Ukraine
idle 15 Jun 2011, 02:02
Code:
.for:   mov     edi,[ebx+ecx*4]      ;warning: +4 for element #1
                                     ; you could keep number of items at offset 0
                                     ; warning disappears then
...
        test    al,al                ;pedantic shift
        jz      .done
        inc     esi
        inc     edi
        jmp     @b
    
Post 15 Jun 2011, 02:02
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20414
Location: In your JS exploiting you and your system
revolution 15 Jun 2011, 02:57
Teehee wrote:
I really need this to be fast. Can we optimize it?
Sure we can. But you don't need to alter the byte-for-byte matching code since that would only give you a linear, and insignificant, speed-up.

Instead this will give exponential speed up:

Sort the elements as they are added.
Use a binary search to find elements.
Post 15 Jun 2011, 02:57
View user's profile Send private message Visit poster's website Reply with quote
Coty



Joined: 17 May 2010
Posts: 553
Location: &#9216;
Coty 15 Jun 2011, 03:12
You could use REPE/REPZ, CMPS/CMPSB/CMPSW/CMPSD and JZ/JNZ instructions witch would be slightly faster, and more faster on Hyper threading CPUs (Since Intel says to avoid loops on hyper threading because its slower)...

Here is a small example using DOS, but you could use it even in 64bit mode... (Well the actual compare code anyway)

Code:
org 0x0100
use16

        cld                    ; set direction flag, so we read from left to right.
        mov  di, string_one
        mov  si, string_two
        mov  cx, 6             ; Maximum string size. (imma do 6 for "weeeee")
        ;----------------------;
        ; Here we compare...   ;
        ;----------------------;
        repe cmpsb             ; repete while equal; compare string, byte by byte
                               ;    NOTE: we can also use "cmpsw" to compare word by word 
                               ;    and "cmpsd" to compare dword by dword.
        jnz  not_equal         ; If there not equal jump to the not equal label...
        ;----------------------;
        ; Display our cool msg ;
        ;----------------------;
        mov  ah, 0x09
        mov  dx, imahappyhappyhorse
        int  0x21
        jmp  _end
        ;----------------------;
        ; Display our fail msg ;
        ;----------------------;
not_equal:
        mov  ah, 0x09
        mov  dx, NOOOOOOOOOOOOOOOOO
        int  0x21

_end:
        xor  ax, ax
        int  0x16
        int  0x20

string_one: db "weeeee"
string_two: db "weee3e"

imahappyhappyhorse: db "Strings are equal!$"
NOOOOOOOOOOOOOOOOO: db "Strings not equal!$"     

_________________
http://codercat.org/
Post 15 Jun 2011, 03:12
View user's profile Send private message Send e-mail Visit poster's website Reply with quote
Teehee



Joined: 05 Aug 2009
Posts: 570
Location: Brazil
Teehee 15 Jun 2011, 12:01
Quote:
Sort the elements as they are added.
Use a binary search to find elements.


ouch, its time to implement a binary search Surprised

@Coty

but i need to know the string lenght in order to use 'rep cmpsb'. will I need to spend time to count it lenghts or i just need to set a max value (like 15, once my biggest world has that size) and it will repeat just while the bytes are equals regardless of the loop lenght?

_________________
Sorry if bad english.
Post 15 Jun 2011, 12:01
View user's profile Send private message Reply with quote
Coty



Joined: 17 May 2010
Posts: 553
Location: &#9216;
Coty 15 Jun 2011, 12:27
What I normally do is set (E)CX to something like 20, then create buffers for my strings 20 bytes in length.

Code:
string_one: db "weeeee", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
string_two: db "weee3e", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0    
string_three: db "longwordorIthinkso", 0, 0
    


Note that if you don't add the buffers repe cmpsb will continue to read on past your string until data is un-equal! Or CX expires, this may give undesired results.

_________________
http://codercat.org/
Post 15 Jun 2011, 12:27
View user's profile Send private message Send e-mail Visit poster's website Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4352
Location: Now
edfed 15 Jun 2011, 13:05
coty, this method isn't good at all.

it is very hard to write strings like that.
why not use align 32, then, words with up to 31 letters (align 32-db 0)

but it is by far, better to do with the list in the teehees first post.

by this way, it will let the list become better, containing a lot of information by expending fields, and so on.

to compare the strings, it is very simple.


just give two pointers to your comparator.

for example, (e)di for the sting to compare, and (e)si for the dictionary word.

to define the si register, just load the pointer to current word in list

le list will contain N elements. if every elements are dword pointers, means that the size in byte will give us the numer of elements*4
this count will be a value, somewhere in the source (a label, a dword...)

it is better to record this size as a dword, for example, the first dword of the list is the size of the list.

the edi string (our string to seek) is just a string like db 'string',0

the esi string comes from a list, then, from si if we are vicious.
ebx then will be used as index in the dictionary.
ebx can be picked from the size of the list, and decremented for every comparaison.

Code:
mov edi,stringtotest
mov esi,dictionary
mov ebx,[esi]
shr ebx,2
dec ebx
jl .end
@@:
mov esi,[esi+ebx*4+4] ;point to the element(ebx*4) of the list(+4)
call compare
jc .ok
dec ebx
jnl @b
.end:
clc
.ok:
ret
;....
compare:
;edi=string to compare
;esi=string from dictionary
;...type your code here
.ok:
stc
ret
    


the code of the compare function cannot be more than... 10 lines, and will pass the result with carry flag.
if carry, then, string found.
Post 15 Jun 2011, 13:05
View user's profile Send private message Visit poster's website Reply with quote
typedef



Joined: 25 Jul 2010
Posts: 2909
Location: 0x77760000
typedef 15 Jun 2011, 18:10
i thought you can just loop through the comparative string and check byte by byte against the base string, and if you run into a null character you are done.

if base string is greater than the latter string, then there's no match. vice versa.
Post 15 Jun 2011, 18:10
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.