flat assembler
Message board for the users of flat assembler.

String to Integer and vice versa
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 04 Dec 2009, 05:13
I'm trying to make a string (consisting of the digits 0-9) to integer converter. The only easy way I could think of was to get the string length and, say for "1587", do something like
1*10^3 +
5*10^2 + ...
I know how to do everything this requires except the "^x" part. Also, if anyone knows how to convert multi-digit numbers to strings, that would be greatly appreciated too.
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
kohlrak 04 Dec 2009, 05:22
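From number: divide by 10 and add '0' (yes, the character) to the remainder, repeating until the number is zero; the digits come out in reverse order, so you'll have to store the characters backwards. From string: take a character, subtract '0', multiply the running total by 10 and add the digit, then repeat until the end of the string.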
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 04 Dec 2009, 05:31
Thanks, I see why that works (the ASCII codes for '0'-'9' are consecutive). My original function used that principle; I was just being stupid and making it harder than it had to be.
Anyway, thanks for answering my noob-ish question.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 04 Dec 2009, 17:26
ASCII to integer:
Code:
;
; REQUIREMENTS before entering this code:
; esi points at string (first char of string already loaded in 'al', so it points to second char)
; also, eax is zeroed except for 'al'...
;

  ;
  ; Convert the ANSI decimal chars to integers
  ;
  xor al, '0'                   ; Translates '0'..'9' to 0..9
  xor edx, edx                  ; prepare edx to store the converted number
  cmp al, 10                    ; is it outside this range? (chars smaller than '0' will get negative values)
  jae .not_number               ; yeah, so it's not a digit (negative numbers are BIG when doing UNSIGNED comparisons)

  @@:
    lea edx, dword [edx+4*edx]  ; edx*=5
    lea edx, dword [eax+2*edx]  ; edx=edx*2+eax
                                ; Therefore we computed  "edx*10 + eax" (this is for decimal ANSI to integer conversion)
    lodsb
    xor al, '0'
    cmp al, 10
    jb @b                       ; more digits, so convert some more

  ;
  ; at this point, edx contains the number, and al a non-digit character (that ended the number)
  ;    

_________________
Previously known as The_Grey_Beast
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 04 Dec 2009, 18:21
Quote:

(chars smaller than '0' will get negative values)

Since you used XOR instead of SUB that is not going to happen. The unsigned comparison is still important though.
edfed



Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 04 Dec 2009, 18:31
another str2int routine.
Code:
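str2num:
;esi= signed string, formatted as below:
;   db '        900098     ',0
;   there can be many spaces ' ' before the number; it ends with ' ', '.', ',', '!' or 0.
;ebx= returned 32-bit signed value (n! if the number is followed by '!')
;esi= points at the character that ended the number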

@@:
        mov     al,[esi]
        inc     esi
        cmp     al,' '
        je      @b
@@:
        mov     al,[esi]
        inc     esi
        cmp     al,' '
        je @f
        cmp     al,'.'
        je @f
        cmp     al,','
        je @f
        cmp     al,'!'
        je @f
        cmp     al,0
        jne     @b
@@:
        dec     esi
        push    esi
        dec     esi
        xor     ebx,ebx
        mov     edx,1
@@:
        movzx   eax,byte[esi]
        dec     esi
        cmp     al,'-'
        je      .neg
        cmp     al,' '
        je      .end
        cmp     al,','
        je      .end
        cmp     al,'.'
        je      .end
        sub     al,'0'
        jl      @f
        cmp     al,9
        jg      @f
        imul    eax,edx
        add     ebx,eax
        imul    edx,10
        jmp     @b
@@:
        pop     esi
.!?:
        cmp byte[esi],'!'
        jne @f
;        inc esi
        cmp ebx,0
        jle .null
        mov eax,1
.!:
        imul eax,ebx
        dec ebx
        jne .!
        mov ebx,eax
@@:
        stc
        ret
.neg:
        mov     al,[esi]
        cmp     al,' '
        jne     @b
        neg     ebx
.end:
        pop     esi
        stc
        ret
.null:
        xor ebx,ebx
        clc
        ret         


This can be very interesting for formula computations.

Strings like:

db ' 21123 * 32143 + 4343 ',0 can be interpreted easily with this little algorithm:

Code:
compute:
        mov     [result],0
        mov     esi,string
@@:
        call    str2num         ; value in ebx, esi at the char that ended it
        cmp     byte[esi],0
        je      .end
        call    str2operator
        cmp     byte[esi],0
        jne     @b
.end:
        ret
    

But it needs the str2operator function to work, and some adjustments.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 04 Dec 2009, 22:18
LocoDelAssembly wrote:
Since you used XOR instead of SUB that is not going to happen. The unsigned comparison is still important though.
They will because of 'cmp'... also, xor works in ASCII because '0' is 30h and the digits '0'..'9' (30h..39h) differ from it only in the low four bits, but yeah, I could have used sub instead.

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 04 Dec 2009, 22:36
Perhaps I didn't understand the context of the comment but what I saw is that, for instance, the NULL char will be transformed to '0' which is well above 10 but below 128. And no char will get its sign flipped because of that XOR (contrary to what SUB could do).

I don't mean your code isn't correct, I was just talking about the comment I quoted.
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
kohlrak 04 Dec 2009, 22:56
Borsuc wrote:
ASCII to integer:

I just woke up and this is the first thing I see... To think I made a routine like this just before going to bed (for AFS), and it doesn't even compare to this.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 04 Dec 2009, 23:57
LocoDelAssembly wrote:
Perhaps I didn't understand the context of the comment but what I saw is that, for instance, the NULL char will be transformed to '0' which is well above 10 but below 128. And no char will get its sign flipped because of that XOR (contrary to what SUB could do).

I don't mean your code isn't correct, I was just talking about the comment I quoted.
Oh, I see the confusion. Yes, sub would have been clearer, but '0' (30h) is a much higher ASCII code than 9 (the largest digit value), and it is "aligned" so that '0'..'9' differ from it only in the low four bits (so xor works).

I agree that sub is more general-purpose and works in all cases (unlike xor); the xor-ASCII trick is just something I saw a long time ago (not only for string-to-int), that's why.



Here's a tip for those that don't know:

General-purpose bound comparison (i.e. checking whether a number is between x and y, INCLUDING x and y, with a single unsigned comparison):

Code:
sub reg, x
cmp reg, y-x+1
jb .true    

edfed



Joined: 20 Feb 2006
Posts: 4324
Location: Now
edfed 05 Dec 2009, 02:30
Code:
sub reg,x ; instead of defining x & y, why not define x & size?
cmp reg,y-x+1
jb true
    


can be improved like this:
Code:
base dd ? ;vector approach
size dd ? ;module & argument
sub reg,[base] ; use the stored values, not the labels' addresses
jl @f
sub reg,[size]
jge @f
true:
.
@@:
    
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 05 Dec 2009, 03:05
You are using two conditional jumps, and the code was supposed to avoid that. Also, Borsuc's "y-x+1" already plays the "size" role, and "x" alone the "base" role.

Your code has the advantage of being able to define the boundaries at run-time, but it should be something more like this:
Code:
base dd ?
size dd ?
.
.
; More code
.
.

sub reg, [base]
cmp reg, [size]
jae out_of_range ; unsigned (reg - base) >= size: outside the range

inside_range:
.
.
.
out_of_range:
.
.
.
    
(No difference with Borsuc except for the imms replaced with [mem]s)
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 05 Dec 2009, 05:25
I ended up with this,
Code:
str_to_int:
pushad
mov si, str_to_convert
mov eax, 00000000h
mov ebx, eax
@@:
lodsb
sub al, '0'
cmp al, 00h
jb @f
cmp al, 09h
ja invalid_input
movsx ebx, al
imul eax, 10
add eax, ebx
jmp @b
@@:
cmp al, 00h - '0'
jne invalid_input
popad
retn         

It's screwed up, why? It causes the program to end without completing its execution.

You're probably wondering why I didn't just copy one of the many GREAT examples; if you are, I'll explain. If I did that, I would neglect actually learning the code, because I'm lazy like that. So I force myself to rewrite every function I get help on, just in a slightly different way. I find that more helpful for learning.

btw, thanks for all the responses.
windwakr



Joined: 30 Jun 2004
Posts: 827
windwakr 05 Dec 2009, 05:47
I haven't looked at your routine, but I'd just like to point out that you don't need all those zero's to empty out eax. Just "mov eax, 0" will do.

You could even use "xor eax, eax", which is 3 bytes smaller. A lot of people think using "mov reg, 0" is better than "xor reg, samereg", and a lot of people think the opposite way. I personally use the xor way any time I code anything.
There's some discussion on the subject here:
http://board.flatassembler.net/topic.php?t=6339&postdays=0&postorder=asc&start=0

_________________
----> * <---- My star, won HERE
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 05 Dec 2009, 05:52
Quote:

I haven't looked at your routine, but I'd just like to point out that you don't need all those zero's to empty out eax. Just "mov eax, 0" will do.

I do that to help me keep in mind how much can fit into the reg. Like for al, I use 00h.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 05 Dec 2009, 16:55
I like "xor eax, eax"; to my mind it signals a special "clear register" instruction.

also my code is full of comments and whitespace columns (whether asm or HLL)... I prefer clean & descriptive code.
Tyler



Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
Tyler 06 Dec 2009, 01:26
Quote:

also my code is full of comments and whitespace columns

I don't comment much because I screw up a lot and have to go back and change my code. If I commented every time I changed it I would spend more time commenting than coding.

btw, can anyone tell me where I screwed up?
Code:
str_to_int:
pushad
xor ax, ax
xor bx, bx
xor cx, cx
@@:
lodsb
cmp al, 00h ;Check for null termination
je @f
sub al, '0'
cmp al, 00h ;This and next 3 lines check if it's a valid #
jb invalid_input
cmp al, 09h
ja invalid_input
movsx bx, al ;Should I change this to movzx?
imul cx, 10 ;adjust for next digit
add cx, bx ;insert next digit
jmp @b ;next digit
@@:
mov [int_num1], cx
popad
retn    
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 06 Dec 2009, 18:22
Quote:

cmp al, 00h ;This and next 3 lines check if it's a valid #
jb invalid_input

This jump will never be taken, because no unsigned number is below 0. You may have wanted to use JL, but you don't actually need this check at all (the JA below already covers "negative" numbers, since it interprets them as very high values).

I can't see any error; what did you find wrong with the code?

I'll give you another version just in case, but as far as I can tell your code already converts a string to a number, provided it is below 2^16.

Code:
str_to_int:
pusha ; No need to save the 32-bit registers: we never destroy the upper 16 bits of any of them (and does the rest of the code even need them preserved?)
xor ax, ax
xor cx, cx
jmp .loadChar

.processChar:
sub al, '0'
cmp al, 9
ja .invalidInput

imul cx, 10
add cx, ax ; It is OK doing this, AH is always zero in this code

.loadChar:
lodsb
test al, al
jnz .processChar

mov [int_num1], cx
mov [int_error], 0

.return:
popa
retn

.invalidInput:
mov [int_error], 1
jmp .return    


[edit]
Forgot to add: this is how to use the results:
Code:
mov si, string
call str_to_int
cmp [int_error], 0
jne .invalidInt
; You can use int_num1 here    
DOS386



Joined: 08 Dec 2006
Posts: 1898
DOS386 06 Dec 2009, 22:58
> String to Integer and vice versa

You can find such code in the FASM parser (string -> integer) and in the FASM IDE (integer -> string), the latter even twice (YES, I did report this "bug" ...).
bitshifter



Joined: 04 Dec 2007
Posts: 790
Location: Massachusetts, USA
bitshifter 07 Dec 2009, 03:18
Now... process 4-byte chunks without jumping.
Then you will be cooking with gas.

Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.