flat assembler
Message board for the users of flat assembler.

Thread: Uppercasing characters (DOS forum)

adroit 26 Jun 2010, 00:46
I was trying to find a way to convert lowercase characters to uppercase when I found a tutorial on ASCII characters. Then I conjured up the code below.

The upcase function takes each character and checks whether it lies strictly between 60h and 7Bh, that is, in the range 61h ('a') to 7Ah ('z'), so the character _is_ a lowercase letter.
If the character falls within that range, it is ANDed with 11011111b.
The 1 bits leave the original bits unchanged, while the 0 at bit 5 clears that bit, which turns the character into its uppercase form.
Code:
   org 100h

   macro pause
   {
         push  ax
         xor   ax,ax
         int   16h
         pop   ax
   }

   jmp start

   message db "hello world!",0

   start:
         mov   si,message
         call  upcase          ; convert to uppercase (leaves SI past the end of the string)
         mov   si,message      ; reload SI with the start of the string
         call  pstr            ; go print string
         pause                 ; pause to view result
         int 20h               ; exit program




;===========================================
; FUNCTIONS
;-------------------------------------------
;  Converts lower cased character into upper
;  cased.
;-------------------------------------------
   upcase:
   ;in:  si = string

         push  ax
         xor   ax,ax           ; clear AX
      .loop:
         mov   al,[si]         ; load byte from SI
         cmp   al,60h          ; is byte < 'a'
         jbe   @f              ; then, go get next byte
         cmp   al,7Bh          ; is byte > 'z'
         jae   @f              ; then, go get next byte
                               ; Only bytes in the range 'a' - 'z' are converted.

         and   al,11011111b    ; clear bit 5
       @@:
         mov   [si],al         ; update character stored in data, with uppercased character
         inc   si              ; step forward to next character
         or    al,al           ; is this the end of the string?
         jz   .done            ; if so, then return.
         jmp  .loop            ; else, go convert next character

      .done:
         pop   ax
         ret

;-------------------------------------------
;  Prints string
;-------------------------------------------
   pstr:
   ;in:  si = string
      .loop:
         lodsb
         or    al,al
         jz   .done
         mov   ah,0Eh
         int   10h
         jmp  .loop

      .done:
         ret    

_________________
meshnix
bitshifter 26 Jun 2010, 01:04
This is not the best, but it is very simple.
Code:
toupper: ; DS:SI -> ASCIIZ string
        lodsb
        test    al,al
        jz      .finished
        cmp     al,'a'
        jb      toupper
        cmp     al,'z'
        ja      toupper
        sub     al,32
        dec     si
        mov     byte[ds:si],al
        inc     si
        jmp     toupper
  .finished:
        ret      

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.
LocoDelAssembly 26 Jun 2010, 01:36
Let's simplify it further.
Code:
toupper: ; DS:SI -> ASCIIZ string
        lodsb
        test    al,al
        jz      .finished

        sub     al,'a'           ; --+
        cmp     al,'z'-'a'       ;   |
        ja      toupper          ;   |
                                 ;   |
        sub     al, 'a'-'A' - 'a'; <-+
        mov     [si-1],al
        jmp     toupper

  .finished:
        ret    
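A note on why the single comparison is enough: after `sub al,'a'` the lowercase letters land on 0..25, and every other byte wraps around to 26..255 when read as unsigned, so one `ja` rejects both sides of the range. The sketch below applies the same idea to a single character. `upcase_al` is a hypothetical helper name, not something from the posts above, and it uses `add al,'A'` in place of the equivalent `sub al,'a'-'A' - 'a'`.

```asm
; upcase_al (hypothetical helper): AL = character in, AL = result out
upcase_al:
        sub     al,'a'          ; 'h' (68h) -> 07h, '!' (21h) -> C0h, '{' (7Bh) -> 1Ah
        cmp     al,'z'-'a'      ; compare against 25 (19h)
        ja      .restore        ; unsigned "above": C0h and 1Ah are rejected here
        add     al,'A'          ; 07h + 41h = 48h = 'H'
        ret
  .restore:
        add     al,'a'          ; undo the subtraction, byte is returned unchanged
        ret
```

Because the comparison is unsigned, a byte below 'a' is rejected by the same `ja` that rejects a byte above 'z'.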
b1528932 26 Jun 2010, 11:46
Upper/lower case isn't something you can simply convert to.

You need a character map which includes entries for characters that have lowercase and uppercase versions (and a field indicating whether both, one, or neither exists). Another issue is encoding: characters can be stored as ANSI, UTF-16, UTF-8, or in other ways. This requires not only a map, but an API.
revolution 26 Jun 2010, 12:23
b1528932: This is the DOS forum. So things like UTF8, ANSI etc. are pretty much ruled out for most programs. I think it is safe to assume that ASCII will be the character set used in 99.99% of cases.
adroit 26 Jun 2010, 12:54
bitshifter, it is much simpler than mine, and easier to understand too.

LocoDelAssembly,
Code:
...
sub al, 'a'-'A' - 'a'
...    
Nice, but it computes to -65. Does the compiler treat signed integers as unsigned integers, or does it just wrap around?

b1528932, when I said convert I didn't mean converting ASCII to Unicode.
revolution 26 Jun 2010, 12:59
MeshNix wrote:
... it computes to -65. Does the compiler treat signed integers as unsigned integers, or does it just wrap around?
Values from -128 to +255 are valid for 8-bit types.

As you might expect, -1==255, ... ,-128==+128.
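To see this in the assembler itself, the fragment below is a minimal sketch, assuming fasm's usual behaviour of accepting any value from -128 to 255 for a byte operand. It is meant to be assembled and inspected in a hex dump, not run. Each pair should produce identical bytes.

```asm
        org 100h

        sub     al,-65          ; immediate byte is BFh
        sub     al,191          ; immediate byte is BFh as well

        db      -1,255          ; FFh FFh
        db      -128,128        ; 80h 80h
        db      'a'-'A' - 'a'   ; BFh, the constant from LocoDelAssembly's code
```

The arithmetic behind it is modulo 256: 256 - 65 = 191 = BFh.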
adroit 26 Jun 2010, 13:16
This would mean that -65==191, doesn't it?
revolution 26 Jun 2010, 13:19
MeshNix wrote:
This would mean that -65==191, doesn't it?
Yes
Picnic 26 Jun 2010, 13:32
this should also work, although raises LocoDelAssembly's code to 21 bytes.

Code:
.if ( al > 60h & al < 7bh )
 xor al, 20h       ; toggle case
.endif
    
adroit 26 Jun 2010, 17:00
revolution, how does this actually work? -65 was treated as if it was unsigned. Does signed or unsigned matter in ASCII?



Picnic wrote:
this should also work, although raises LocoDelAssembly's code to 21 bytes.
Code:
.if ( al > 60h & al < 7bh ) 
        xor al, 20h       ; toggle case 
.endif 
    

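Actually, that range (61h to 7Ah) holds the lowercase letters, so this XOR makes the character uppercase.
To lowercase an uppercase character instead, try: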
Code:
.if ( al > 40h & al < 5Bh )
        xor al, 20h       ; toggle case
.endif    

Great! Now I know how to do reverse uppercasing.
revolution 26 Jun 2010, 17:28
The CPU doesn't know or care about signed/unsigned, it is all just binary. Only the programmers will know about signs when testing the flags.
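A small sketch of that point, using made-up label names: the same byte BFh is compared against 10 twice, and the choice between `ja` (unsigned) and `jg` (signed) is what gives it a sign.

```asm
        mov     al,0BFh         ; one bit pattern: 191 if unsigned, -65 if signed
        cmp     al,10
        ja      above_unsigned  ; taken: in the unsigned view, 191 > 10
        nop                     ; skipped
above_unsigned:
        cmp     al,10
        jg      greater_signed  ; not taken: in the signed view, -65 < 10
        nop                     ; executed, execution falls through to the label
greater_signed:
```

`cmp` sets the flags the same way both times; only the conditional jump picks an interpretation.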
adroit 26 Jun 2010, 18:02
Oh, I see; it's up to the programmer.
edfed 26 Jun 2010, 21:00
Maybe just a 256-byte lookup table would be enough, and better.
for example, to change the case of éàèêîëï, all these chars are in ascii, depending on the charset used of course.
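A minimal sketch of the table idea, assuming fasm 1 syntax and a zero-terminated string at DS:SI as in the routines above. The names `upcase_xlat` and `upper_table` are made up for illustration. The single accented entry assumes code page 437, where é is 82h and É is 90h; other code pages would need different entries.

```asm
upcase_xlat:                    ; in: DS:SI -> ASCIIZ string; clobbers AL, BX, SI
        mov     bx,upper_table
  .next:
        mov     al,[si]
        xlatb                   ; AL = [DS:BX+AL]
        mov     [si],al
        inc     si
        test    al,al           ; entry 0 maps to 0, so the terminator survives
        jnz     .next
        ret

upper_table:                    ; 256 entries generated at assembly time
        repeat 256
          c = % - 1             ; % counts 1..256 inside repeat
          if c >= 'a' & c <= 'z'
            db c - 20h          ; plain ASCII letters
          else if c = 82h       ; assumption: code page 437, e-acute
            db 90h              ; its uppercase form in that code page
          else
            db c                ; everything else maps to itself
          end if
        end repeat
```

The loop has no range checks at all; every case decision lives in the table, which is what makes per-code-page variants easy to swap in.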
DOS386 27 Jun 2010, 11:26
> http://board.flatassembler.net/topic.php?t=9736 search for "SSUPPER"

> to change the case of éàèêîëï, all these chars are in ascii, depending on the charset used of course.

Where? This never worked, AFAIK.
rugxulo 28 Jun 2010, 06:36
revolution wrote:
b1528932: This is the DOS forum. So things like UTF8, ANSI etc. are pretty much ruled out for most programs. I think it is safe to assume that ASCII will be the character set used in 99.99% of cases.


It seems Europeans are more conscious of this than others. And, IIRC, even Tomasz uses this (int 21h, 652xh) in FASMD:

http://www.delorie.com/djgpp/doc/rbinter/id/77/31.html

P.S. I always just used "and dl,0DFh" to uppercase. Toggle is "xor dl, 20h". And, unless I'm remembering wrong, "or dl, 20h" will lowercase.
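The three masks side by side, as a sketch with example values worked out by hand. They only make sense when DL already holds a letter, which is why the routines earlier in the thread check the range first.

```asm
        mov     dl,'q'          ; 71h
        and     dl,0DFh         ; clear bit 5 -> 51h 'Q'
        xor     dl,20h          ; flip bit 5  -> 71h 'q'
        or      dl,20h          ; set bit 5   -> 71h 'q' (already lowercase)

        mov     dl,'1'          ; 31h, not a letter
        and     dl,0DFh         ; -> 11h, a control code, so check the range first
```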
Copyright © 1999-2024, Tomasz Grysztar.