flat assembler
Message board for the users of flat assembler.
Index
> Main > Binary number to decimal ASCII string conversion Goto page 1, 2 Next |
Author |
|
El Tangas 06 Aug 2005, 19:50
Hi, this is my liberal interpretation of an algorithm I found to convert a binary number to its decimal ASCII representation. The algorithm is from "AMD Athlon™ Processor x86 Code Optimization Guide" page 139. Since they don't explain it conveniently, it took me a while to get it , so I decided to rewrite the code, comment it, and share with you guys.
Hope it's usefull to someone! Ops, there was a bug, I corrected it.
|
|||||||||||
06 Aug 2005, 19:50 |
|
rugxulo 09 Aug 2005, 22:58
From: Terje Mathisen <terje.mathisen@NOSPAM.PLEASE.hda.hydro.com>
Date: April 8,2005 Fri PM 02:26:16 EDT To: rugxulo@NOSPAM.PLEASE.bellsouth.net Subject: Re: code from asm gems website: GPL? PD? rugxulo@NOSPAM.PLEASE.bellsouth.net wrote: > Hey, I hope this e-mail address of yours is valid. Anyways, I've seen some >code of yours on a website called ASM Gems (or something similar). Is this code >GPL, public domain, or what? > > http://www.df.lth.se/~john_e/gems/gem0033.html > > Feel free to consider it GPL. I have written a _much_ better binary to ascii converter, however. It is so fast that AMD "borrowed" it (without attribution) for their optimization guide. Terje -- - <Terje.Mathisen@NOSPAM.PLEASE.hda.hydro.com> "almost all programming can be viewed as an exercise in caching" Code: /* Convert an unsigned 32-bit number into a 10-digit ascii value */ /* (c) Terje Mathisen */ /* GPL 2005 */ char *dtoa(char *buf, unsigned long n) { __asm { mov ebx,[n] mov eax,2814749767 mul ebx shr ebx,1 xor ecx,ecx mov edi,[buf] add eax,ebx adc ecx,edx mov eax,100000 shr ecx,16 // ECX = high part mov ebx,[n] imul eax,ecx // High part * 100k sub ebx,eax // Low part mov eax,429497 mul ecx mov ecx,eax add dl,'0' mov eax,429497 mov [edi],dl mul ebx mov ebx,eax add ecx,7 shr ecx,3 add dl,'0' mov [edi+5],dl add ebx,7 shr ebx,3 lea ecx,[ecx+ecx*4] mov edx,ecx and ecx,0fffffffh shr edx,28 lea ebx,[ebx+ebx*4] add dl,'0' mov eax,ebx shr eax,28 mov [edi+1],dl and ebx,0fffffffh add al,'0' mov [edi+6],al lea ecx,[ecx+ecx*4] lea ebx,[ebx+ebx*4] mov edx,ecx mov eax,ebx and ecx,07ffffffh shr edx,27 and ebx,07ffffffh shr eax,27 add dl,'0' add al,'0' mov [edi+2],dl mov [edi+7],al lea ecx,[ecx+ecx*4] lea ebx,[ebx+ebx*4] mov edx,ecx mov eax,ebx and ecx,03ffffffh shr edx,26 and ebx,03ffffffh shr eax,26 add dl,'0' add al,'0' mov [edi+3],dl mov [edi+8],al lea ecx,[ecx+ecx*4] shr ecx,25 lea ebx,[ebx+ebx*4] shr ebx,25 add cl,'0' add bl,'0' mov [edi+10],ah mov [edi+4],cl mov [edi+9],bl } return buf; } |
|||
09 Aug 2005, 22:58 |
|
El Tangas 10 Aug 2005, 16:33
Ha, well, I didn't want to infringe anyone's real or perceived copyright
Anyway, I see: /* Convert an unsigned 32-bit number into a 10-digit ascii value */ /* (c) Terje Mathisen */ /* GPL 2005 */ but AMD's guide is from 2002... |
|||
10 Aug 2005, 16:33 |
|
El Tangas 11 Aug 2005, 18:49
And in the spirit of optimization contests, this is the optimized version of the code I posted.
<brag> Its faster than Terje's by 20% on pentium mmx and by 10% on athlon xp. </brag> Code: magic1 equ 0a7c5ac47h magic2 equ 068db8badh bin2ascii: mov eax,magic1 mul esi shr esi,3 add esi,eax adc edx,0 shr esi,20 ;separate remainder mov ebx,edx shl ebx,12 and edx,0FFFF0000h ;mask quotient and ebx,0FFFFFFFh ;remove quotient nibble from remainder. mov eax,magic2 add esi,ebx mul edx lea esi,[4*esi+esi+5] ;multiply by 5 and round up mov eax,edx shr edx,28 mov ebx,esi shr ebx,27 and eax,0FFFFFFFh add dl,'0' and esi,07FFFFFFh mov [edi],dl add bl,'0' mov [edi+5],bl lea eax,[4*eax+eax+5] ;mul by 5 and round up lea esi,[4*esi+esi] mov edx,eax shr edx,27 and eax,07FFFFFFh mov ebx,esi add dl,'0' shr ebx,26 mov [edi+1],dl and esi,03FFFFFFh add bl,'0' mov [edi+6],bl lea eax,[4*eax+eax] lea esi,[4*esi+esi] mov edx,eax shr edx,26 and eax,03FFFFFFh mov ebx,esi add dl,'0' shr ebx,25 mov [edi+2],dl and esi,01FFFFFFh add bl,'0' mov [edi+7],bl lea eax,[4*eax+eax] lea esi,[4*esi+esi] mov edx,eax shr edx,25 and eax,01FFFFFFh mov ebx,esi add dl,'0' shr ebx,24 mov [edi+3],dl and esi,00FFFFFFh add bl,'0' mov [edi+8],bl lea edx,[4*eax+eax] shr edx,24 lea ebx,[4*esi+esi] shr ebx,23 add dl,'0' mov [edi+4],dl add bl,'0' mov [edi+9],bl ret |
|||
11 Aug 2005, 18:49 |
|
Terje Mathisen 09 Sep 2005, 10:23
The version of my code that was posted here had the GPL notice added before posting, I was asked in email if this was OK. If you have discovered one or more tweaks that can make it run slightly faster on current CPUs, that's great.
I actually invented this approach in 1999, I was told a few years later by a friend that AMD had used it without attribution. :-( Anyway, it still remember the thrill when I realized that it might be possible to do the entire 10-digit conversion faster than what's needed for a single DIV opcode inside the inner loop of the standard algorithm. :-) Terje _________________ "almost all programming can be viewed as an exercise in caching" |
|||
09 Sep 2005, 10:23 |
|
Madis731 09 Sep 2005, 12:24
Hi Terje,
I hope you didn't just pass by...what are you working on currently and are you familiar with FASM too? Maybe you have completely abandoned ASM :S P.S. How did you find this board? Google gave this by searching your name or something? |
|||
09 Sep 2005, 12:24 |
|
Terje Mathisen 12 Sep 2005, 07:19
I am indeed passing by, I got an email tip about your tweaks.
I still write asm code, mostly in the form on inline asm in C programs, my last serious all-asm project was the asm version of DFC. DFC was one of the also-rans in the Advanced Encryption Standard competion. Together with a few other guys we managed to triple the speed of that entry, which was enough to go from one of the 3-4 slowest (out of 15) to one of the 3-4 fastest: Full duplex 100 Mbit Ethernet encrypt/decrypt on a 10 year old 200 MHz PPro. :-) Terje _________________ "almost all programming can be viewed as an exercise in caching" |
|||
12 Sep 2005, 07:19 |
|
Matrix 12 Sep 2005, 12:29
yo Terje Mathisen
cool dude maeby you could join the fasm community and come from c to fasm. machine code vs. c |
|||
12 Sep 2005, 12:29 |
|
HyperVista 12 Sep 2005, 13:04
This is one of the coolest websites I've seen in a while. Check it out!!
http://www.confluence.org/visitor.php?id=157 Thanks Terje! |
|||
12 Sep 2005, 13:04 |
|
THEWizardGenius 12 Sep 2005, 19:36
I assume this code converts a fp number to ASCII; there are many algorithms around for converting integers to ASCII.
I don't understand this; can someone explain how this works (I'm not that mathematically-oriented yet, as I just started high school math)? |
|||
12 Sep 2005, 19:36 |
|
Terje Mathisen 13 Sep 2005, 17:42
No, not fp->ascii, but unsigned -> ascii:
The usual algorithm uses repeated division by 10, saving the remainders, while my optimized code is an order of magnitude faster. The key idea is that by first splitting the number into two parts, effectively div/mod 100000, you get 5 digits in each half. This division is done using a reciprocal multiplication, and the remainder is generated after back-multiplication and subtraction. The actual digit extractions happen after each 5-digit value is scaled (with MUL) such that the top 4 bits directly contain the top digit. Extract this (with a SHR), then mask away and multiply the remainder with 10/2 = 5 using LEA, and you can repeat the process. Terje Terje |
|||
13 Sep 2005, 17:42 |
|
Terje Mathisen 13 Sep 2005, 17:52
Matrix wrote: yo Terje Mathisen The days of pure asm programming are definitely gone, at least on x86. I wrote a lot of x86 asm code in my days (around 3 MB of source code by 1991), but the most fun was probably my ascii executable. :-) Maybe I should write a little pst about that? Terje _________________ "almost all programming can be viewed as an exercise in caching" |
|||
13 Sep 2005, 17:52 |
|
Terje Mathisen 13 Sep 2005, 17:58
HyperVista wrote: This is one of the coolest websites I've seen in a while. Check it out!! The DCP project is a lot of fun, I've been the Scandinavian coordinator for some years. It is the perfect geek/nerd excuse for actually going outside the usual streets and trails. :-) My chief hobby is orienteering though: http://www.orienteering.org This is a combination of physical and mental exercise, relatively easy to explain, but at least as difficult as golf to really master. Terje[/url] _________________ "almost all programming can be viewed as an exercise in caching" |
|||
13 Sep 2005, 17:58 |
|
El Tangas 13 Sep 2005, 21:34
THEWizardGenius:
It took me a while to understand this algo, too, but this allowed me to rewrite it from scratch, and in the end I noticed I used one less mul instruction than the original, resulting in faster code. I comented the code in my first post the best I could (its not optimized, for clarity), if you follow the code and see what is happening with a hex capable calculator, I think you will eventually understand it. This algorithm is based on base conversion of fractions, you can search the web for more information. And for really criptic code, my best optimized version of this algo: Code: magic1 equ 0a7c5ac47h magic2 equ 068db8badh bin2ascii_test: mov eax,magic1 mul esi shr esi,3 add esi,eax adc edx,0 shr esi,20 ;separate remainder mov ebx,edx shl ebx,12 and edx,0FFFF0000h ;mask quotient and ebx,0FFFFFFFh ;remove quotient nibble from remainder. mov eax,magic2 mul edx add esi,ebx mov eax,edx shr edx,28 and eax,0FFFFFFFh lea esi,[4*esi+esi+5] ;multiply by 5 and round up add dl,'0' mov ebx,esi and esi,07FFFFFFh shr ebx,27 mov [edi],dl add bl,'0' lea eax,[4*eax+eax+5] ;mul by 5 and round up mov [edi+5],bl lea esi,[4*esi+esi] mov edx,eax and eax,07FFFFFFh shr edx,27 lea ebx,[esi+0c0000000h] shr ebx,26 and esi,03FFFFFFh add dl,'0' lea eax,[4*eax+eax] mov [edi+1],dl lea esi,[4*esi+esi] mov [edi+6],bl lea edx,[eax+0c0000000h] shr edx,26 and eax,03FFFFFFh lea ebx,[esi+60000000h] and esi,01FFFFFFh shr ebx,25 lea eax,[4*eax+eax] mov [edi+2],dl lea esi,[4*esi+esi] mov [edi+7],bl lea edx,[eax+60000000h] shr edx,25 and eax,01FFFFFFh lea ebx,[esi+30000000h] mov [edi+3],dl shr ebx,24 and esi,00FFFFFFh mov [edi+8],bl lea edx,[4*eax+eax+30000000h] shr edx,24 lea ebx,[4*esi+esi+18000000h] shr ebx,23 mov [edi+4],dl mov [edi+9],bl ret Terje: I think the Fasm community would be very interested in your posts, if you have some time for that. |
|||
13 Sep 2005, 21:34 |
|
Terje Mathisen 16 Sep 2005, 09:51
I wonder one thing:
You have obviously saved MUL operations by avoiding the need for back-multiplication and subtraction, so just to be certain: Have you done either a mathematical proof or an exhaustive test to verify that your shortcut does work for all possible 32-bit inputs? Terje |
|||
16 Sep 2005, 09:51 |
|
El Tangas 16 Sep 2005, 18:45
Yes, I tested for all 32 bit numbers, by converting to ASCII and back, and comparing with the original value.
In fact, I had to add some code to increase the precision, because the test failed the first time. |
|||
16 Sep 2005, 18:45 |
|
Terje Mathisen 19 Sep 2005, 10:44
Quote:
Right, that was why I asked. If the initial reciprocal MUL had been a bit more prescise, then the fractional result would have been directly usable for the lower 5 digits of the result. Terje |
|||
19 Sep 2005, 10:44 |
|
Madis731 16 Mar 2007, 15:18
64-bit!
I discovered 2 things: 1) lea r32/64,[r32/64+0c0000000h] doesn't work because of the encoding limits. This introduces more code and 2) you can fuse two magics into one possibly reducing code this may result in a third interesting thing, that you can do INT64_2_ASCII conversion...well, lets see |
|||
16 Mar 2007, 15:18 |
|
Goplat 16 Mar 2007, 15:39
What kind of program does enough number to ascii conversions to make this worthwhile?
|
|||
16 Mar 2007, 15:39 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.