flat assembler
Message board for the users of flat assembler.
Index
> Main > Binary number to decimal ASCII string conversion Goto page 1, 2 Next 
Author 

El Tangas
Hi, this is my liberal interpretation of an algorithm I found to convert a binary number to its decimal ASCII representation. The algorithm is from "AMD Athlonâ„˘ Processor x86 Code Optimization Guide" page 139. Since they don't explain it conveniently, it took me a while to get it , so I decided to rewrite the code, comment it, and share with you guys.
Hope it's usefull to someone! Ops, there was a bug, I corrected it.


06 Aug 2005, 19:50 

rugxulo
From: Terje Mathisen <terje.mathisen@NOSPAM.PLEASE.hda.hydro.com>
Date: April 8,2005 Fri PM 02:26:16 EDT To: rugxulo@NOSPAM.PLEASE.bellsouth.net Subject: Re: code from asm gems website: GPL? PD? rugxulo@NOSPAM.PLEASE.bellsouth.net wrote: > Hey, I hope this email address of yours is valid. Anyways, I've seen some >code of yours on a website called ASM Gems (or something similar). Is this code >GPL, public domain, or what? > > http://www.df.lth.se/~john_e/gems/gem0033.html > > Feel free to consider it GPL. I have written a _much_ better binary to ascii converter, however. It is so fast that AMD "borrowed" it (without attribution) for their optimization guide. Terje   <Terje.Mathisen@NOSPAM.PLEASE.hda.hydro.com> "almost all programming can be viewed as an exercise in caching" Code: /* Convert an unsigned 32bit number into a 10digit ascii value */ /* (c) Terje Mathisen */ /* GPL 2005 */ char *dtoa(char *buf, unsigned long n) { __asm { mov ebx,[n] mov eax,2814749767 mul ebx shr ebx,1 xor ecx,ecx mov edi,[buf] add eax,ebx adc ecx,edx mov eax,100000 shr ecx,16 // ECX = high part mov ebx,[n] imul eax,ecx // High part * 100k sub ebx,eax // Low part mov eax,429497 mul ecx mov ecx,eax add dl,'0' mov eax,429497 mov [edi],dl mul ebx mov ebx,eax add ecx,7 shr ecx,3 add dl,'0' mov [edi+5],dl add ebx,7 shr ebx,3 lea ecx,[ecx+ecx*4] mov edx,ecx and ecx,0fffffffh shr edx,28 lea ebx,[ebx+ebx*4] add dl,'0' mov eax,ebx shr eax,28 mov [edi+1],dl and ebx,0fffffffh add al,'0' mov [edi+6],al lea ecx,[ecx+ecx*4] lea ebx,[ebx+ebx*4] mov edx,ecx mov eax,ebx and ecx,07ffffffh shr edx,27 and ebx,07ffffffh shr eax,27 add dl,'0' add al,'0' mov [edi+2],dl mov [edi+7],al lea ecx,[ecx+ecx*4] lea ebx,[ebx+ebx*4] mov edx,ecx mov eax,ebx and ecx,03ffffffh shr edx,26 and ebx,03ffffffh shr eax,26 add dl,'0' add al,'0' mov [edi+3],dl mov [edi+8],al lea ecx,[ecx+ecx*4] shr ecx,25 lea ebx,[ebx+ebx*4] shr ebx,25 add cl,'0' add bl,'0' mov [edi+10],ah mov [edi+4],cl mov [edi+9],bl } return buf; } 

09 Aug 2005, 22:58 

El Tangas
Ha, well, I didn't want to infringe anyone's real or perceived copyright
Anyway, I see: /* Convert an unsigned 32bit number into a 10digit ascii value */ /* (c) Terje Mathisen */ /* GPL 2005 */ but AMD's guide is from 2002... 

10 Aug 2005, 16:33 

El Tangas
And in the spirit of optimization contests, this is the optimized version of the code I posted.
<brag> Its faster than Terje's by 20% on pentium mmx and by 10% on athlon xp. </brag> Code: magic1 equ 0a7c5ac47h magic2 equ 068db8badh bin2ascii: mov eax,magic1 mul esi shr esi,3 add esi,eax adc edx,0 shr esi,20 ;separate remainder mov ebx,edx shl ebx,12 and edx,0FFFF0000h ;mask quotient and ebx,0FFFFFFFh ;remove quotient nibble from remainder. mov eax,magic2 add esi,ebx mul edx lea esi,[4*esi+esi+5] ;multiply by 5 and round up mov eax,edx shr edx,28 mov ebx,esi shr ebx,27 and eax,0FFFFFFFh add dl,'0' and esi,07FFFFFFh mov [edi],dl add bl,'0' mov [edi+5],bl lea eax,[4*eax+eax+5] ;mul by 5 and round up lea esi,[4*esi+esi] mov edx,eax shr edx,27 and eax,07FFFFFFh mov ebx,esi add dl,'0' shr ebx,26 mov [edi+1],dl and esi,03FFFFFFh add bl,'0' mov [edi+6],bl lea eax,[4*eax+eax] lea esi,[4*esi+esi] mov edx,eax shr edx,26 and eax,03FFFFFFh mov ebx,esi add dl,'0' shr ebx,25 mov [edi+2],dl and esi,01FFFFFFh add bl,'0' mov [edi+7],bl lea eax,[4*eax+eax] lea esi,[4*esi+esi] mov edx,eax shr edx,25 and eax,01FFFFFFh mov ebx,esi add dl,'0' shr ebx,24 mov [edi+3],dl and esi,00FFFFFFh add bl,'0' mov [edi+8],bl lea edx,[4*eax+eax] shr edx,24 lea ebx,[4*esi+esi] shr ebx,23 add dl,'0' mov [edi+4],dl add bl,'0' mov [edi+9],bl ret 

11 Aug 2005, 18:49 

Terje Mathisen
The version of my code that was posted here had the GPL notice added before posting, I was asked in email if this was OK. If you have discovered one or more tweaks that can make it run slightly faster on current CPUs, that's great.
I actually invented this approach in 1999, I was told a few years later by a friend that AMD had used it without attribution. :( Anyway, it still remember the thrill when I realized that it might be possible to do the entire 10digit conversion faster than what's needed for a single DIV opcode inside the inner loop of the standard algorithm. :) Terje _________________ "almost all programming can be viewed as an exercise in caching" 

09 Sep 2005, 10:23 

Madis731
Hi Terje,
I hope you didn't just pass by...what are you working on currently and are you familiar with FASM too? Maybe you have completely abandoned ASM :S P.S. How did you find this board? Google gave this by searching your name or something? 

09 Sep 2005, 12:24 

Terje Mathisen
I am indeed passing by, I got an email tip about your tweaks.
I still write asm code, mostly in the form on inline asm in C programs, my last serious allasm project was the asm version of DFC. DFC was one of the alsorans in the Advanced Encryption Standard competion. Together with a few other guys we managed to triple the speed of that entry, which was enough to go from one of the 34 slowest (out of 15) to one of the 34 fastest: Full duplex 100 Mbit Ethernet encrypt/decrypt on a 10 year old 200 MHz PPro. :) Terje _________________ "almost all programming can be viewed as an exercise in caching" 

12 Sep 2005, 07:19 

Matrix
yo Terje Mathisen
cool dude maeby you could join the fasm community and come from c to fasm. machine code vs. c 

12 Sep 2005, 12:29 

HyperVista
This is one of the coolest websites I've seen in a while. Check it out!!
http://www.confluence.org/visitor.php?id=157 Thanks Terje! 

12 Sep 2005, 13:04 

THEWizardGenius
I assume this code converts a fp number to ASCII; there are many algorithms around for converting integers to ASCII.
I don't understand this; can someone explain how this works (I'm not that mathematicallyoriented yet, as I just started high school math)? 

12 Sep 2005, 19:36 

Terje Mathisen
No, not fp>ascii, but unsigned > ascii:
The usual algorithm uses repeated division by 10, saving the remainders, while my optimized code is an order of magnitude faster. The key idea is that by first splitting the number into two parts, effectively div/mod 100000, you get 5 digits in each half. This division is done using a reciprocal multiplication, and the remainder is generated after backmultiplication and subtraction. The actual digit extractions happen after each 5digit value is scaled (with MUL) such that the top 4 bits directly contain the top digit. Extract this (with a SHR), then mask away and multiply the remainder with 10/2 = 5 using LEA, and you can repeat the process. Terje Terje 

13 Sep 2005, 17:42 

Terje Mathisen
Matrix wrote: yo Terje Mathisen The days of pure asm programming are definitely gone, at least on x86. I wrote a lot of x86 asm code in my days (around 3 MB of source code by 1991), but the most fun was probably my ascii executable. :) Maybe I should write a little pst about that? Terje _________________ "almost all programming can be viewed as an exercise in caching" 

13 Sep 2005, 17:52 

Terje Mathisen
HyperVista wrote: This is one of the coolest websites I've seen in a while. Check it out!! The DCP project is a lot of fun, I've been the Scandinavian coordinator for some years. It is the perfect geek/nerd excuse for actually going outside the usual streets and trails. :) My chief hobby is orienteering though: http://www.orienteering.org This is a combination of physical and mental exercise, relatively easy to explain, but at least as difficult as golf to really master. Terje[/url] _________________ "almost all programming can be viewed as an exercise in caching" 

13 Sep 2005, 17:58 

El Tangas
THEWizardGenius:
It took me a while to understand this algo, too, but this allowed me to rewrite it from scratch, and in the end I noticed I used one less mul instruction than the original, resulting in faster code. I comented the code in my first post the best I could (its not optimized, for clarity), if you follow the code and see what is happening with a hex capable calculator, I think you will eventually understand it. This algorithm is based on base conversion of fractions, you can search the web for more information. And for really criptic code, my best optimized version of this algo: Code: magic1 equ 0a7c5ac47h magic2 equ 068db8badh bin2ascii_test: mov eax,magic1 mul esi shr esi,3 add esi,eax adc edx,0 shr esi,20 ;separate remainder mov ebx,edx shl ebx,12 and edx,0FFFF0000h ;mask quotient and ebx,0FFFFFFFh ;remove quotient nibble from remainder. mov eax,magic2 mul edx add esi,ebx mov eax,edx shr edx,28 and eax,0FFFFFFFh lea esi,[4*esi+esi+5] ;multiply by 5 and round up add dl,'0' mov ebx,esi and esi,07FFFFFFh shr ebx,27 mov [edi],dl add bl,'0' lea eax,[4*eax+eax+5] ;mul by 5 and round up mov [edi+5],bl lea esi,[4*esi+esi] mov edx,eax and eax,07FFFFFFh shr edx,27 lea ebx,[esi+0c0000000h] shr ebx,26 and esi,03FFFFFFh add dl,'0' lea eax,[4*eax+eax] mov [edi+1],dl lea esi,[4*esi+esi] mov [edi+6],bl lea edx,[eax+0c0000000h] shr edx,26 and eax,03FFFFFFh lea ebx,[esi+60000000h] and esi,01FFFFFFh shr ebx,25 lea eax,[4*eax+eax] mov [edi+2],dl lea esi,[4*esi+esi] mov [edi+7],bl lea edx,[eax+60000000h] shr edx,25 and eax,01FFFFFFh lea ebx,[esi+30000000h] mov [edi+3],dl shr ebx,24 and esi,00FFFFFFh mov [edi+8],bl lea edx,[4*eax+eax+30000000h] shr edx,24 lea ebx,[4*esi+esi+18000000h] shr ebx,23 mov [edi+4],dl mov [edi+9],bl ret Terje: I think the Fasm community would be very interested in your posts, if you have some time for that. 

13 Sep 2005, 21:34 

Terje Mathisen
I wonder one thing:
You have obviously saved MUL operations by avoiding the need for backmultiplication and subtraction, so just to be certain: Have you done either a mathematical proof or an exhaustive test to verify that your shortcut does work for all possible 32bit inputs? Terje 

16 Sep 2005, 09:51 

El Tangas
Yes, I tested for all 32 bit numbers, by converting to ASCII and back, and comparing with the original value.
In fact, I had to add some code to increase the precision, because the test failed the first time. 

16 Sep 2005, 18:45 

Terje Mathisen
Quote:
Right, that was why I asked. If the initial reciprocal MUL had been a bit more prescise, then the fractional result would have been directly usable for the lower 5 digits of the result. Terje 

19 Sep 2005, 10:44 

Madis731
64bit!
I discovered 2 things: 1) lea r32/64,[r32/64+0c0000000h] doesn't work because of the encoding limits. This introduces more code and 2) you can fuse two magics into one possibly reducing code this may result in a third interesting thing, that you can do INT64_2_ASCII conversion...well, lets see 

16 Mar 2007, 15:18 

Goplat
What kind of program does enough number to ascii conversions to make this worthwhile?


16 Mar 2007, 15:39 

Goto page 1, 2 Next < Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992020, Tomasz Grysztar. Also on YouTube, Twitter.
Website powered by rwasa.