flat assembler
Message board for the users of flat assembler.

Index > Main > Binary number to decimal ASCII string conversion

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 06 Aug 2005, 19:50
Hi, this is my liberal interpretation of an algorithm I found to convert a binary number to its decimal ASCII representation. The algorithm is from "AMD Athlon™ Processor x86 Code Optimization Guide" page 139. Since they don't explain it conveniently, it took me a while to get it Confused , so I decided to rewrite the code, comment it, and share with you guys.
Hope it's usefull to someone!


Ops, there was a bug, I corrected it.


Description: Binary do decimal ASCII string - corrected
Download
Filename: num2decc.asm
Filesize: 4.24 KB
Downloaded: 1783 Time(s)

Post 06 Aug 2005, 19:50
View user's profile Send private message Reply with quote
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 09 Aug 2005, 22:58
From: Terje Mathisen <terje.mathisen@NOSPAM.PLEASE.hda.hydro.com>
Date: April 8,2005 Fri PM 02:26:16 EDT
To: rugxulo@NOSPAM.PLEASE.bellsouth.net
Subject: Re: code from asm gems website: GPL? PD?


rugxulo@NOSPAM.PLEASE.bellsouth.net wrote:

> Hey, I hope this e-mail address of yours is valid. Anyways, I've seen some
>code of yours on a website called ASM Gems (or something similar). Is this code
>GPL, public domain, or what?
>
> http://www.df.lth.se/~john_e/gems/gem0033.html
>
>

Feel free to consider it GPL.

I have written a _much_ better binary to ascii converter, however. Smile

It is so fast that AMD "borrowed" it (without attribution) for their
optimization guide. Sad

Terje

--
- <Terje.Mathisen@NOSPAM.PLEASE.hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Code:
/* Convert an unsigned 32-bit number into a 10-digit ascii value */
/* (c) Terje Mathisen */
/* GPL 2005 */
char *dtoa(char *buf, unsigned long n)
{
__asm {
mov ebx,[n]
mov eax,2814749767

mul ebx

shr ebx,1
xor ecx,ecx

mov edi,[buf]
add eax,ebx

adc ecx,edx
mov eax,100000

shr ecx,16 // ECX = high part
mov ebx,[n]

imul eax,ecx // High part * 100k

sub ebx,eax // Low part
mov eax,429497

mul ecx

mov ecx,eax
add dl,'0'

mov eax,429497
mov [edi],dl

mul ebx

mov ebx,eax
add ecx,7

shr ecx,3
add dl,'0'

mov [edi+5],dl
add ebx,7

shr ebx,3
lea ecx,[ecx+ecx*4]

mov edx,ecx
and ecx,0fffffffh

shr edx,28
lea ebx,[ebx+ebx*4]

add dl,'0'
mov eax,ebx

shr eax,28
mov [edi+1],dl

and ebx,0fffffffh
add al,'0'

mov [edi+6],al
lea ecx,[ecx+ecx*4]

lea ebx,[ebx+ebx*4]
mov edx,ecx

mov eax,ebx
and ecx,07ffffffh

shr edx,27
and ebx,07ffffffh

shr eax,27
add dl,'0'

add al,'0'
mov [edi+2],dl

mov [edi+7],al
lea ecx,[ecx+ecx*4]

lea ebx,[ebx+ebx*4]
mov edx,ecx

mov eax,ebx
and ecx,03ffffffh

shr edx,26
and ebx,03ffffffh

shr eax,26
add dl,'0'

add al,'0'
mov [edi+3],dl

mov [edi+8],al
lea ecx,[ecx+ecx*4]

shr ecx,25
lea ebx,[ebx+ebx*4]

shr ebx,25
add cl,'0'

add bl,'0'
mov [edi+10],ah

mov [edi+4],cl
mov [edi+9],bl
}

return buf;
}    
Post 09 Aug 2005, 22:58
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 10 Aug 2005, 08:33
I don't understand why this post was neccessary!? He refrenced clearly where he took it:
El Tangas wrote:
... of an algorithm I found [---] algorithm is from "AMD Athlon™ Processor x86 Code Optimization Guide" page 139...

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 10 Aug 2005, 08:33
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 10 Aug 2005, 16:33
Ha, well, I didn't want to infringe anyone's real or perceived copyright Rolling Eyes

Anyway, I see:

/* Convert an unsigned 32-bit number into a 10-digit ascii value */
/* (c) Terje Mathisen */
/* GPL 2005 */

but AMD's guide is from 2002...
Post 10 Aug 2005, 16:33
View user's profile Send private message Reply with quote
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 11 Aug 2005, 18:49
And in the spirit of optimization contests, this is the optimized version of the code I posted.
<brag>
Its faster than Terje's by 20% on pentium mmx and by 10% on athlon xp.
</brag>

Code:
magic1  equ     0a7c5ac47h
magic2  equ     068db8badh

bin2ascii:
        mov     eax,magic1
        mul     esi
        shr     esi,3

        add     esi,eax
        adc     edx,0

        shr     esi,20                  ;separate remainder
        mov     ebx,edx
        shl     ebx,12
        and     edx,0FFFF0000h          ;mask quotient

        and     ebx,0FFFFFFFh           ;remove quotient nibble from remainder.
        mov     eax,magic2
        add     esi,ebx
        mul     edx

        lea     esi,[4*esi+esi+5]       ;multiply by 5 and round up
        mov     eax,edx
        shr     edx,28
        mov     ebx,esi
        shr     ebx,27
        and     eax,0FFFFFFFh
        add     dl,'0'
        and     esi,07FFFFFFh
        mov     [edi],dl
        add     bl,'0'
        mov     [edi+5],bl

        lea     eax,[4*eax+eax+5]       ;mul by 5 and round up
        lea     esi,[4*esi+esi]
        mov     edx,eax
        shr     edx,27
        and     eax,07FFFFFFh
        mov     ebx,esi
        add     dl,'0'
        shr     ebx,26
        mov     [edi+1],dl
        and     esi,03FFFFFFh
        add     bl,'0'
        mov     [edi+6],bl
        
        lea     eax,[4*eax+eax]
        lea     esi,[4*esi+esi]
        mov     edx,eax
        shr     edx,26
        and     eax,03FFFFFFh
        mov     ebx,esi
        add     dl,'0'
        shr     ebx,25
        mov     [edi+2],dl
        and     esi,01FFFFFFh
        add     bl,'0'
        mov     [edi+7],bl
        
        lea     eax,[4*eax+eax]
        lea     esi,[4*esi+esi]
        mov     edx,eax
        shr     edx,25
        and     eax,01FFFFFFh
        mov     ebx,esi
        add     dl,'0'
        shr     ebx,24
        mov     [edi+3],dl
        and     esi,00FFFFFFh
        add     bl,'0'
        mov     [edi+8],bl
        
        lea     edx,[4*eax+eax]
        shr     edx,24
        lea     ebx,[4*esi+esi]
        shr     ebx,23
        add     dl,'0'
        mov     [edi+4],dl
        add     bl,'0'
        mov     [edi+9],bl
        
        ret
    
Post 11 Aug 2005, 18:49
View user's profile Send private message Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 09 Sep 2005, 10:23
The version of my code that was posted here had the GPL notice added before posting, I was asked in email if this was OK. If you have discovered one or more tweaks that can make it run slightly faster on current CPUs, that's great.

I actually invented this approach in 1999, I was told a few years later by a friend that AMD had used it without attribution. :-(

Anyway, it still remember the thrill when I realized that it might be possible to do the entire 10-digit conversion faster than what's needed for a single DIV opcode inside the inner loop of the standard algorithm. :-)

Terje

_________________
"almost all programming can be viewed as an exercise in caching"
Post 09 Sep 2005, 10:23
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 09 Sep 2005, 12:24
Hi Terje,
I hope you didn't just pass by...what are you working on currently and are you familiar with FASM too? Maybe you have completely abandoned ASM :S

P.S. How did you find this board? Google gave this by searching your name or something?
Post 09 Sep 2005, 12:24
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 12 Sep 2005, 07:19
I am indeed passing by, I got an email tip about your tweaks.

I still write asm code, mostly in the form on inline asm in C programs, my last serious all-asm project was the asm version of DFC. DFC was one of the also-rans in the Advanced Encryption Standard competion.

Together with a few other guys we managed to triple the speed of that entry, which was enough to go from one of the 3-4 slowest (out of 15) to one of the 3-4 fastest:

Full duplex 100 Mbit Ethernet encrypt/decrypt on a 10 year old 200 MHz PPro. :-)

Terje

_________________
"almost all programming can be viewed as an exercise in caching"
Post 12 Sep 2005, 07:19
View user's profile Send private message Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1166
Location: Overflow
Matrix 12 Sep 2005, 12:29
yo Terje Mathisen
cool dude Cool

maeby you could join the fasm community and come from c to fasm.

machine code vs. c
Post 12 Sep 2005, 12:29
View user's profile Send private message Visit poster's website Reply with quote
HyperVista



Joined: 18 Apr 2005
Posts: 691
Location: Virginia, USA
HyperVista 12 Sep 2005, 13:04
This is one of the coolest websites I've seen in a while. Check it out!!

http://www.confluence.org/visitor.php?id=157 Cool

Thanks Terje!
Post 12 Sep 2005, 13:04
View user's profile Send private message Visit poster's website Reply with quote
THEWizardGenius



Joined: 14 Jan 2005
Posts: 382
Location: California, USA
THEWizardGenius 12 Sep 2005, 19:36
I assume this code converts a fp number to ASCII; there are many algorithms around for converting integers to ASCII.

I don't understand this; can someone explain how this works (I'm not that mathematically-oriented yet, as I just started high school math)?
Post 12 Sep 2005, 19:36
View user's profile Send private message AIM Address Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 13 Sep 2005, 17:42
No, not fp->ascii, but unsigned -> ascii:

The usual algorithm uses repeated division by 10, saving the remainders, while my optimized code is an order of magnitude faster.

The key idea is that by first splitting the number into two parts, effectively div/mod 100000, you get 5 digits in each half. This division is done using a reciprocal multiplication, and the remainder is generated after back-multiplication and subtraction.

The actual digit extractions happen after each 5-digit value is scaled (with MUL) such that the top 4 bits directly contain the top digit.

Extract this (with a SHR), then mask away and multiply the remainder with 10/2 = 5 using LEA, and you can repeat the process.

Terje

Terje
Post 13 Sep 2005, 17:42
View user's profile Send private message Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 13 Sep 2005, 17:52
Matrix wrote:
yo Terje Mathisen
cool dude 8)

maeby you could join the fasm community and come from c to fasm.

machine code vs. c


The days of pure asm programming are definitely gone, at least on x86. I wrote a lot of x86 asm code in my days (around 3 MB of source code by 1991), but the most fun was probably my ascii executable. :-)

Maybe I should write a little pst about that?

Terje

_________________
"almost all programming can be viewed as an exercise in caching"
Post 13 Sep 2005, 17:52
View user's profile Send private message Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 13 Sep 2005, 17:58
HyperVista wrote:
This is one of the coolest websites I've seen in a while. Check it out!!

http://www.confluence.org/visitor.php?id=157 8)

Thanks Terje!


The DCP project is a lot of fun, I've been the Scandinavian coordinator for some years.

It is the perfect geek/nerd excuse for actually going outside the usual streets and trails. :-)

My chief hobby is orienteering though: http://www.orienteering.org

This is a combination of physical and mental exercise, relatively easy to explain, but at least as difficult as golf to really master.

Terje[/url]

_________________
"almost all programming can be viewed as an exercise in caching"
Post 13 Sep 2005, 17:58
View user's profile Send private message Reply with quote
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 13 Sep 2005, 21:34
THEWizardGenius:

It took me a while to understand this algo, too, but this allowed me to rewrite it from scratch, and in the end I noticed I used one less mul instruction than the original, resulting in faster code. I comented the code in my first post the best I could (its not optimized, for clarity), if you follow the code and see what is happening with a hex capable calculator, I think you will eventually understand it.

This algorithm is based on base conversion of fractions, you can search the web for more information.

And for really criptic code, my best optimized version of this algo:

Code:
magic1  equ     0a7c5ac47h 
magic2  equ     068db8badh 

bin2ascii_test:
        mov     eax,magic1
        mul     esi
        shr     esi,3
        add     esi,eax
        adc     edx,0
        shr     esi,20                  ;separate remainder
        mov     ebx,edx
        shl     ebx,12
        and     edx,0FFFF0000h          ;mask quotient
        and     ebx,0FFFFFFFh           ;remove quotient nibble from remainder.
        mov     eax,magic2
        mul     edx
        add     esi,ebx
        mov     eax,edx
        shr     edx,28
        and     eax,0FFFFFFFh
        lea     esi,[4*esi+esi+5]       ;multiply by 5 and round up
        add     dl,'0'
        mov     ebx,esi
        and     esi,07FFFFFFh
        shr     ebx,27
        mov     [edi],dl
        add     bl,'0'
        lea     eax,[4*eax+eax+5]       ;mul by 5 and round up
        mov     [edi+5],bl
        lea     esi,[4*esi+esi]
        mov     edx,eax
        and     eax,07FFFFFFh
        shr     edx,27
        lea     ebx,[esi+0c0000000h]
        shr     ebx,26              
        and     esi,03FFFFFFh       
        add     dl,'0'              
        lea     eax,[4*eax+eax]     
        mov     [edi+1],dl          
        lea     esi,[4*esi+esi]
        mov     [edi+6],bl
        lea     edx,[eax+0c0000000h]         
        shr     edx,26              
        and     eax,03FFFFFFh       
        lea     ebx,[esi+60000000h] 
        and     esi,01FFFFFFh       
        shr     ebx,25              
        lea     eax,[4*eax+eax]
        mov     [edi+2],dl          
        lea     esi,[4*esi+esi]
        mov     [edi+7],bl
        lea     edx,[eax+60000000h]
        shr     edx,25            
        and     eax,01FFFFFFh     
        lea     ebx,[esi+30000000h]
        mov     [edi+3],dl   
        shr     ebx,24       
        and     esi,00FFFFFFh
        mov     [edi+8],bl   
        lea     edx,[4*eax+eax+30000000h]
        shr     edx,24                  
        lea     ebx,[4*esi+esi+18000000h]
        shr     ebx,23                  
        mov     [edi+4],dl
        mov     [edi+9],bl
        ret              
    


Terje:
I think the Fasm community would be very interested in your posts, if you have some time for that.
Post 13 Sep 2005, 21:34
View user's profile Send private message Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 16 Sep 2005, 09:51
I wonder one thing:

You have obviously saved MUL operations by avoiding the need for back-multiplication and subtraction, so just to be certain:

Have you done either a mathematical proof or an exhaustive test to verify that your shortcut does work for all possible 32-bit inputs?

Terje
Post 16 Sep 2005, 09:51
View user's profile Send private message Reply with quote
El Tangas



Joined: 11 Oct 2003
Posts: 120
Location: Sunset Empire
El Tangas 16 Sep 2005, 18:45
Yes, I tested for all 32 bit numbers, by converting to ASCII and back, and comparing with the original value.
In fact, I had to add some code to increase the precision, because the test failed the first time.
Post 16 Sep 2005, 18:45
View user's profile Send private message Reply with quote
Terje Mathisen



Joined: 09 Sep 2005
Posts: 7
Terje Mathisen 19 Sep 2005, 10:44
Quote:

In fact, I had to add some code to increase the precision, because the test failed the first time.


Right, that was why I asked. If the initial reciprocal MUL had been a bit more prescise, then the fractional result would have been directly usable for the lower 5 digits of the result.

Terje
Post 19 Sep 2005, 10:44
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 16 Mar 2007, 15:18
64-bit! Smile

I discovered 2 things:
1) lea r32/64,[r32/64+0c0000000h] doesn't work because of the encoding limits. This introduces more code and
2) you can fuse two magics into one possibly reducing code

this may result in a third interesting thing, that you can do INT64_2_ASCII conversion...well, lets see
Post 16 Mar 2007, 15:18
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat 16 Mar 2007, 15:39
What kind of program does enough number to ascii conversions to make this worthwhile?
Post 16 Mar 2007, 15:39
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.