flat assembler
Message board for the users of flat assembler.

smallest case insensitive compare?




wht36 26 Jun 2011, 17:31
There are many fast case-insensitive string comparison routines, but I would like to see the smallest one. Here is my small (23 bytes) code snippet for case-insensitive comparison of two strings.
Code:
equal:                          ;some code here for match
        ;e.g.   cmp     al,0    ;test for end of both strings
scan:   mov     ah,[edi]
        lodsb
        inc     edi
        xor     ah,al           ;ah = 0 if the two bytes are identical
        je      equal
        cmp     ah,20h          ;test for possible upper vs lower case
        jne     fail
        or      ah,al           ;coerce to lower case
        sub     ah,'a'
        cmp     ah,'z'-'a'      ;unsigned range check for 'a'..'z'
        jbe     equal
fail:                           ;some code here for fail
        ;e.g.   cmp     al,13   ;ignore cr/lf
This won't work with accented letters, but otherwise it should work OK. Does anyone have something smaller? Many thanks!
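
For anyone who wants to sanity-check the logic without a debugger, here is a rough C model of the per-byte test in the snippet above. It is only a sketch: the names ci_byte_equal, ref_lower and count_disagreements are made up for illustration, and it assumes plain ASCII (no accented letters), as stated above.

```c
#include <stdint.h>

/* Reference: lower-case an ASCII letter, leave every other byte alone. */
uint8_t ref_lower(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + 32) : c;
}

/* Model of the xor / cmp 20h / or / sub / cmp sequence.
   a plays the role of al (from lodsb), b the role of ah (from [edi]). */
int ci_byte_equal(uint8_t a, uint8_t b)
{
    uint8_t x = (uint8_t)(a ^ b);      /* xor ah,al                    */
    if (x == 0)
        return 1;                      /* identical bytes              */
    if (x != 0x20)
        return 0;                      /* differ in more than bit 5    */
    x = (uint8_t)((x | a) - 'a');      /* or ah,al / sub ah,'a'        */
    return x <= 'z' - 'a';             /* jbe: unsigned range 'a'..'z' */
}

/* Count byte pairs where the model and the reference disagree. */
int count_disagreements(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (ci_byte_equal((uint8_t)a, (uint8_t)b) !=
                (ref_lower((uint8_t)a) == ref_lower((uint8_t)b)))
                bad++;
    return bad;
}
```

If the model is faithful, count_disagreements() returns 0 over all 65536 byte pairs, and pairs such as '[' and '{' are rejected because the range check fails.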


Last edited by wht36 on 28 Jun 2011, 07:19; edited 1 time in total



bitshifter 26 Jun 2011, 20:27
20 bytes, possibly dangerous, somewhat incomplete, and totally untested...
Code:
stricmp:
; in: ds:si -> string 1
;     ds:di -> string 2
  lodsb
  and  al,0xDF  ;force uppercase
  xchg al,ah
  xchg si,di
  lodsb
  xchg si,di
  and  al,0xDF  ;force uppercase
  cmp  al,ah
  jnz  notsame
  test ah,ah
  jnz  stricmp
same:
; ...
notsame:
; ...
    

Now where's Loco when you need him?


LocoDelAssembly 26 Jun 2011, 22:04
bitshifter, not sure what you expect from me? :? Maybe it is MHajduk you are actually waiting for? :P

Anyway, your code handles some cases incorrectly: for instance, "[" becomes the up-cased version of "{", so the check wht36 makes to ensure the byte is in the ['a'..'z'] range is necessary.

If this becomes some sort of contest for inlined pseudo-stricmp code, both your submission and wht36's would be incorrect: part of the solution should be to stop at the first zero byte. Otherwise the routine will always end in the fail/notsame path or in an access violation (or even in an infinite loop in a 16-bit environment under the right conditions, such as comparing buffers full of zeroes).
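
The first point can be checked with a few lines of C. This is just a sketch with made-up helper names (masked_equal, ascii_upper, count_false_matches), assuming ASCII; it models the and al,0xDF comparison and counts the byte pairs it accepts even though they are not the same letter.

```c
#include <stdint.h>

/* Model of the masked test: clear bit 5 of both bytes, then compare. */
int masked_equal(uint8_t a, uint8_t b)
{
    return (a & 0xDF) == (b & 0xDF);
}

/* Reference: upper-case only bytes in 'a'..'z'. */
uint8_t ascii_upper(uint8_t c)
{
    return (c >= 'a' && c <= 'z') ? (uint8_t)(c - 32) : c;
}

/* Count ordered pairs (a, b) that the masked test accepts
   although they do not match under the reference. */
int count_false_matches(void)
{
    int n = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (masked_equal((uint8_t)a, (uint8_t)b) &&
                ascii_upper((uint8_t)a) != ascii_upper((uint8_t)b))
                n++;
    return n;
}
```

masked_equal('[', '{') is true here, as is masked_equal('@', '`'). The same mask also maps a space (20h) to zero, so masked_equal(' ', 0) holds too, which matters for any end-of-string test done after masking.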



Picnic 26 Jun 2011, 22:23
Twenty messy bytes that just came out of my head.
I'm too sleepy to try nineteen.

In: si, di
Out: zero flag set = equal

Code:
equal:
        lodsb
        or      al, al          ;end of string 1?
        jz      @F
        mov     ah, [di]
        inc     di
        or      ax, 2020h       ;force lowercase
        cmp     al, ah
        jz      equal
        ret
@@:
        cmp     byte [di], 0    ;equal only if string 2 ends here too
        ret
    


Zzz...



bitshifter 26 Jun 2011, 23:06
31 bytes (and hopefully bulletproof)
Code:
uprswp:
  lodsb
  xchg si,di
  cmp  al,'a'
  jb   .done
  cmp  al,'z'
  ja   .done
  sub  al,32
.done:
  ret

stricmp:
; in:  ds:si -> string 1, ds:di -> string 2
; out: zf = 1 if same, zf = 0 if not same
  call uprswp
  xchg ah,al
  call uprswp
  cmp  ah,al
  jnz  .done
  cmp  al,0
  jnz  stricmp
.done:
  ret
    

Although the real stricmp returns <, = or >
and follows an ABI calling convention...

In the real world:
I would give up a couple of bytes to turn uprswp into a plain toupper,
just so it could be reused later (if part of a library or something).
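
For reference, the < = > behaviour mentioned above could look roughly like this in C. This is a sketch only: ci_compare and to_upper_ascii are hypothetical names, it handles ASCII letters only, and it does not claim to match any particular C library's stricmp.

```c
/* Reusable helper, in the spirit of turning uprswp into toupper. */
unsigned char to_upper_ascii(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 32) : c;
}

/* Returns <0, 0 or >0 like a three-way string compare, ignoring ASCII case. */
int ci_compare(const char *s1, const char *s2)
{
    for (;;) {
        unsigned char a = to_upper_ascii((unsigned char)*s1++);
        unsigned char b = to_upper_ascii((unsigned char)*s2++);
        if (a != b)
            return (int)a - (int)b;   /* first difference decides the order */
        if (a == 0)
            return 0;                 /* both strings ended together        */
    }
}
```

The zero check comes after the comparison, so a shorter string sorts before a longer one that starts with it, and the loop never reads past either terminator.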



bitshifter 27 Jun 2011, 04:33
30 bytes (some rearranging and call/ret bashing)
Code:
; in:  ds:si -> string 1, ds:di -> string 2
; out: zf = 1 if same, zf = 0 if not same

stricmp:
  push .here
.muck:
  lodsb
  xchg si,di
  cmp  al,'a'
  jb   .done
  cmp  al,'z'
  ja   .done
  sub  al,32
.done:
  ret
.here:
  xchg ah,al
  call .muck
  cmp  ah,al
  jnz  .done
  cmp  al,0
  jnz  stricmp
  ret    

I still have not tested any of this crap :shock:



wht36 27 Jun 2011, 08:28
LocoDelAssembly wrote:
...If this becomes some sort of contest for inlined pseudo-stricmp code, both your submission and wht36's would be incorrect: part of the solution should be to stop at the first zero byte. Otherwise the routine will always end in the fail/notsame path or in an access violation (or even in an infinite loop in a 16-bit environment under the right conditions, such as comparing buffers full of zeroes).

OK, a full strcmpi would be 28 bytes:
Code:
equal:                          ;some code here for match
        cmp     al,0            ;test for end of both strings
        je      done

strcmpi:        ;compares ESI to EDI; returns ZF set if complete match, ESI & EDI after last char matched
        mov     ah,[edi]
        lodsb
        inc     edi
        xor     ah,al
        je      equal
        cmp     ah,20h          ;test for possible upper vs lower case
        jne     fail
        or      ah,al           ;coerce to lower case
        sub     ah,'a'
        cmp     ah,'z'-'a'
        jbe     equal
fail:                           ;some code here for fail
        ;e.g.   cmp     al,13   ;ignore cr/lf
done:   ret

With a bit more mangling, bitshifter's code would also come to 28 bytes (I haven't tested it, though):
Code:
stricmp:
  push .here
.muck:
  lodsb
  xchg si,di
  sub  al,'a'
  cmp  al,'z'-'a'
  ja   .done
  sub  al,32
.done:
  ret
.here:
  xchg ah,al
  call .muck
  cmp  ah,al
  jnz  .done
  cmp  al,0-'a'
  jnz  stricmp
  ret    
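
Since the mangled version is untested, here is a rough C model of the changed .muck step that can be checked exhaustively. It is a sketch under assumptions: the function names are invented for illustration, and only the byte transformation is modelled, not the call/ret flow or the si/di swapping.

```c
#include <stdint.h>

/* Model of .muck after the change: bias by 'a', then fold lower case
   onto upper case while staying in the biased range. */
uint8_t muck(uint8_t c)
{
    c = (uint8_t)(c - 'a');            /* sub al,'a'                */
    if (c <= 'z' - 'a')                /* cmp al,'z'-'a' / ja .done */
        c = (uint8_t)(c - 32);         /* sub al,32                 */
    return c;
}

/* Reference upper-casing without the bias. */
uint8_t ref_upper(uint8_t c)
{
    return (c >= 'a' && c <= 'z') ? (uint8_t)(c - 32) : c;
}

/* Count bytes where muck(c) differs from ref_upper(c) - 'a' (mod 256). */
int count_muck_errors(void)
{
    int bad = 0;
    for (int c = 0; c < 256; c++)
        if (muck((uint8_t)c) != (uint8_t)(ref_upper((uint8_t)c) - 'a'))
            bad++;
    return bad;
}
```

Because every byte ends up shifted by the same constant, equality between the two processed bytes is preserved, and the terminator shows up as 0-'a', which is what the final cmp al,0-'a' tests for.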



Madis731 27 Jun 2011, 10:38
So an SSE4.2 approach would lose here: right out of the box it takes 41 bytes. Adding por xmm,[MASK_20h] and some code to make it case-insensitive brings it up to 66 bytes.

strlen would be a better candidate for this kind of comp. http://www.strchr.com/strcmp_and_strlen_using_sse_4.2