flat assembler
Message board for the users of flat assembler.

Another hurdle on the algorithmic alley: BIN2HEX ASCII

Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
The ugly thingy I've got so far... ugh. Keeping the mask in EDI only makes it nicer, not necessarily faster.
The objective is to write the most optimal routine for converting a 32-bit integer in EAX into an 8-byte ASCII hex string at [ESI].

Let the optimizations begin:
Code:
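toString:; IN=eax ; OUT=qword[esi] binary converted to ASCII HEX
        mov      edi,0F0F0F0Fh                  ; mask: low nibble of every byte
        mov      ecx,eax                        ; copy of input (ecx is reassigned below)
        mov      ebx,eax
        shr      eax,4
        and      ebx,edi                        ; ebx = low nibble of each byte
        and      eax,edi                        ; eax = high nibble of each byte
        mov      ecx,eax                        ; keep copies of the nibbles
        mov      edx,ebx
        add      ebx,06060606h                  ; nibble+6 sets bit 4 only if nibble >= 10
        add      eax,06060606h
        shr      ebx,4
        shr      eax,4
        and      ebx,edi                        ; 1 per byte for A..F, 0 for 0..9
        and      eax,edi
        lea      edx,[ebx*8+edx+30303030h]      ; nibble + '0' + 8*flag
        lea      ecx,[eax*8+ecx+30303030h]
        sub      edx,ebx                        ; nibble + '0' + 7*flag = '0'..'9','A'..'F'
        sub      ecx,eax                        ; ecx = high-nibble digits, edx = low-nibble digits
        xchg     ch,dl                          ; interleave the digits and reverse byte order
        rol      ecx,16
        rol      edx,16
        xchg     ch,dl
        rol      ecx,16
        xchg     cx,dx
        rol      edx,16
        mov      [esi],ecx                      ; digits 1..4 (most significant first)
        mov      [esi+4],edx                    ; digits 5..8
        ret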
    


It's 111 bytes when pasted into an empty ASM file and assembled.
About 34 clocks (??????) on my Core Duo, which is really weird; sometimes it's even a 1:4 compression of operations into one clock... (6 ports)
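To follow the arithmetic of the routine above, here is a C model of it: a sketch that assumes x86-style little-endian stores and uses hypothetical helper names (`rol16`, `xchg_ch_dl`, `xchg_cx_dx`, `to_hex8`). It shows the two tricks: adding 6 to each nibble sets bit 4 exactly when the nibble is 10..15, giving a per-byte "letter" flag, and the `lea [f*8+n+30303030h]` / `sub f` pair computes `n + '0' + 7*f`, which maps 10 to 'A' because '9' + 1 + 7 = 'A'. The xchg/rol shuffle is emulated register for register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "rol r32,16": swap the two 16-bit halves. */
static uint32_t rol16(uint32_t x) { return (x << 16) | (x >> 16); }

/* "xchg ch,dl": swap byte 1 of *a with byte 0 of *b. */
static void xchg_ch_dl(uint32_t *a, uint32_t *b)
{
    uint32_t ch = (*a >> 8) & 0xFFu;
    uint32_t dl = *b & 0xFFu;
    *a = (*a & ~0x0000FF00u) | (dl << 8);
    *b = (*b & ~0x000000FFu) | ch;
}

/* "xchg cx,dx": swap the low 16 bits of *a and *b. */
static void xchg_cx_dx(uint32_t *a, uint32_t *b)
{
    uint32_t cx = *a & 0xFFFFu, dx = *b & 0xFFFFu;
    *a = (*a & 0xFFFF0000u) | dx;
    *b = (*b & 0xFFFF0000u) | cx;
}

/* Write v as 8 uppercase hex digits to out (no terminating zero). */
void to_hex8(uint32_t v, char out[8])
{
    const uint32_t m = 0x0F0F0F0Fu;              /* edi */
    uint32_t lo = v & m;                         /* ebx: low nibble of each byte  */
    uint32_t hi = (v >> 4) & m;                  /* eax: high nibble of each byte */
    /* Per byte at most 15+6 = 21, so no carry crosses into the next byte. */
    uint32_t lo_f = ((lo + 0x06060606u) >> 4) & m;
    uint32_t hi_f = ((hi + 0x06060606u) >> 4) & m;
    /* lea then sub: n + '0' + 8*f - f = n + '0' + 7*f */
    uint32_t edx = lo_f * 8 + lo + 0x30303030u - lo_f;
    uint32_t ecx = hi_f * 8 + hi + 0x30303030u - hi_f;
    /* Same shuffle as the asm. */
    xchg_ch_dl(&ecx, &edx);
    ecx = rol16(ecx);
    edx = rol16(edx);
    xchg_ch_dl(&ecx, &edx);
    ecx = rol16(ecx);
    xchg_cx_dx(&ecx, &edx);
    edx = rol16(edx);
    /* mov [esi],ecx / mov [esi+4],edx as little-endian stores. */
    for (int i = 0; i < 4; i++) {
        out[i]     = (char)(ecx >> (8 * i));
        out[4 + i] = (char)(edx >> (8 * i));
    }
}
```

Calling `to_hex8(0x12345678, buf)` fills `buf` with the eight characters `12345678`, just like the qword the asm writes to [ESI].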

_________________
My updated idol: http://www.agner.org/optimize/
Post 14 Sep 2006, 20:34
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
I tried this:
Code:
include 'win32axp.inc'
.code
start:
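  invoke SetPriorityClass, 0, REALTIME_PRIORITY_CLASS   ; NB: 0 is not a valid process handle (GetCurrentProcess returns -1), so this call fails and the priority stays unchanged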
.loop:
rdtsc
push eax
        mov     eax, $12345678
        mov     esi, buffer
        call    toString
rdtsc
pop edx
        dec     [counter]
        jnz     .loop
        int3
toString:; IN=eax ; OUT=qword[esi] binary converted to ASCII HEX


        mov      edi,0F0F0F0Fh
        mov      ecx,eax 
        mov      ebx,eax 
        shr      eax,4 
        and      ebx,edi 
        and      eax,edi 
        mov      ecx,eax 
        mov      edx,ebx 
        add      ebx,06060606h 
        add      eax,06060606h 
        shr      ebx,4 
        shr      eax,4 
        and      ebx,edi 
        and      eax,edi 
        lea      edx,[ebx*8+edx+30303030h] 
        lea      ecx,[eax*8+ecx+30303030h] 
        sub      edx,ebx 
        sub      ecx,eax 
        xchg     ch,dl 
        rol      ecx,16 
        rol      edx,16 
        xchg     ch,dl 
        rol      ecx,16 
        xchg     cx,dx 
        rol      edx,16 
        mov      [esi],ecx 
        mov      [esi+4],edx
        ret

.data
        counter dd 3
        buffer  rb 16
.end start    


At int3 EDX-EAX=12596 and the string is "9681B46D" instead of "12345678". <- IGNORE THIS SENTENCE!!

How do you measure the clocks? I hope my Athlon64 isn't so bad that it really takes 12596 clocks.
[edit] I added code to repeat the call and the second time it took only 92 cycles (measuring with the same method)[/edit]
[edit2] With the new code I count only 22 cycles!![/edit2]


Last edited by LocoDelAssembly on 14 Sep 2006, 21:45; edited 2 times in total
Post 14 Sep 2006, 21:13
Madis731



Joined: 25 Sep 2003
Posts: 2141
Location: Estonia
Madis731
erm
1) mov eax,12345678h
2) RDTSC => eax=XXXXXXXXh (your input value is overwritten)
3) translate it to ASCII
Of course you get wrong results...

P.S. You should make the loop run something like 100000 times, using EBP or another register that the routine doesn't touch as the counter. Also, I measured the time with CALL and RET included, so the thirty-something clocks are "call-to-call", which also covers decrementing EBP and testing it for zero.
Post 14 Sep 2006, 21:18
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
HAHAHAHAHAHA, you are right!

Something here must be interrupting my CPU very often, because it typically spends more than 1000 cycles (repeating at least three times), but if I use realtime priority I get just 21 cycles!! That's a very big difference; what could be preempting the CPU so often?

Regards

PS: I'll edit my previous post with the new code.
Post 14 Sep 2006, 21:42
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
mmm, I tested without SetPriorityClass and it took the same time. If I comment out ".code", ".data" and ".end start" I get ~1000 clocks again, BUT if I put "rb 256" between RET and "counter dd 3" I get 22 clocks. It seems to be very bad to write data too near the code that is executing!!
Post 14 Sep 2006, 21:54
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Quote:

mmm, I tested without SetPriorityClass and took the same time. If I comment ".code", ".data" and ".end start" I get 1000~ clocks again, BUT, if I put "rb 256" between RET and "counter dd 3" I get 22 clocks. Seems that is very bad write too near of code that it's executing!!

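Indeed, and this has been known for a long while: x86 CPUs treat a store that lands close to code currently in the pipeline as possible self-modifying code and flush it. I didn't know either, until I bumped into it while wondering why some of Herbert Kleebauer's code was so slow.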
Post 14 Sep 2006, 22:02
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat
The first mov ecx,eax is redundant.

edit: It seems the code is actually slower without it: an average of 28 cycles with it, 28.357 without it. That's really strange...
Post 15 Sep 2006, 18:02
View user's profile Send private message Reply with quote
Vasilev Vjacheslav



Joined: 11 Aug 2004
Posts: 392
Vasilev Vjacheslav
Goplat, maybe it's placed there for alignment.
Post 16 Sep 2006, 04:39
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
With the redundant MOV I get 16 cycles, and without it 15 cycles.

(I get 22 cycles if I put a CPUID at the .loop label, just before the timing starts, and 21 cycles without the redundant MOV.)
Post 16 Sep 2006, 14:10
r22



Joined: 27 Dec 2004
Posts: 805
r22
A LUT (lookup table) might be a bit faster:
Code:
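MOVZX EDX,AL                   ; byte 0 (least significant)
MOVZX ECX,AH                   ; byte 1
MOVZX EDX,WORD[LUT + EDX*2]    ; two ASCII digits for byte 0
SHR EAX,16                     ; bring bytes 2 and 3 into AL/AH
MOVZX ECX,WORD[LUT + ECX*2]    ; two ASCII digits for byte 1
MOV WORD[ESI+6],DX             ; least significant byte goes last
MOV WORD[ESI+4],CX
MOVZX EDX,AL                   ; byte 2
MOVZX ECX,AH                   ; byte 3 (most significant)
MOVZX EDX,WORD[LUT + EDX*2]
MOVZX ECX,WORD[LUT + ECX*2]
MOV WORD[ESI+2],DX
MOV WORD[ESI],CX               ; most significant byte goes first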

LUT dw '00','01','02','03','04','05','06','07','08','09','0A','0B','0C','0D','0E','0F'
...
dw 'F0','F1','F2','F3','F4','F5','F6','F7','F8','F9','FA','FB','FC','FD','FE','FF'

    
Post 17 Sep 2006, 17:46
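A C sketch of the lookup-table approach above, assuming a 512-byte table laid out like the `dw '00','01',...,'FF'` definition (two ASCII digits per byte value, high digit first). `lut`, `build_lut` and `to_hex8_lut` are hypothetical names; the table is filled at run time here only to keep the example short, whereas the asm version assembles it as data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Entry n (2 bytes) holds the hex digits of byte value n, high digit first. */
static char lut[512];

/* Fill the table once before use. */
void build_lut(void)
{
    const char *digits = "0123456789ABCDEF";
    for (int n = 0; n < 256; n++) {
        lut[2 * n]     = digits[n >> 4];
        lut[2 * n + 1] = digits[n & 15];
    }
}

/* One table lookup per byte; the stores mirror [ESI+6], +4, +2 and +0. */
void to_hex8_lut(uint32_t v, char out[8])
{
    memcpy(out + 6, &lut[2 * (v & 0xFFu)], 2);          /* AL before SHR EAX,16 */
    memcpy(out + 4, &lut[2 * ((v >> 8) & 0xFFu)], 2);   /* AH before SHR EAX,16 */
    memcpy(out + 2, &lut[2 * ((v >> 16) & 0xFFu)], 2);  /* AL after SHR EAX,16 */
    memcpy(out + 0, &lut[2 * (v >> 24)], 2);            /* AH after SHR EAX,16 */
}
```

The trade-off is 512 bytes of data for fewer ALU operations; whether it actually wins probably depends on the table staying in cache between calls.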
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
You might want to take a look at this thread from asmcommunity.
Post 17 Sep 2006, 17:58
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
r22, just for the record, you can do this:

Code:
LUT dw '000102030405060708090A0B0C0D0E0F'     


For lots of little strings, you may (or may not) prefer it. Just FYI.
Post 18 Sep 2006, 17:21
UCM



Joined: 25 Feb 2005
Posts: 285
Location: Canada
UCM
rugxulo: No, you can't; it will say "value out of range".
Post 19 Sep 2006, 00:21
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
Oops, I meant db, not dw:

Code:
LUT db '000102030405060708090A0B0C0D0E0F'
    
Post 19 Sep 2006, 22:02
UCM



Joined: 25 Feb 2005
Posts: 285
Location: Canada
UCM
OR, you could do this:
Code:
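repeat 256                 ; one 2-character entry per byte value
   a = %-1                 ; % counts from 1, so a = 0..255
   l = a and 0xF           ; low nibble
   h = a shr 4             ; high nibble
   if h > 9                ; high digit first
      db 'A'+(h-10)
   else
      db '0'+h
   end if
   if l > 9                ; then the low digit
      db 'A'+(l-10)
   else
      db '0'+l
   end if
end repeat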
    

(tested)

Okay, it's not great, but it's more compact.
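The logic of the `repeat 256` block can be cross-checked with a small C equivalent (the function name `gen_hex_table` is hypothetical): for each byte value it emits the high-nibble digit, then the low-nibble digit, choosing between '0'+n and 'A'+(n-10) exactly as the macro does. Its first 32 bytes should be the same as rugxulo's `db '000102030405060708090A0B0C0D0E0F'` line.

```c
#include <assert.h>
#include <string.h>

/* Fill table[512] the way the fasm macro does. */
void gen_hex_table(char table[512])
{
    for (int i = 1; i <= 256; i++) {      /* fasm's % runs 1..256 */
        int a = i - 1;
        int l = a & 0xF;                  /* low nibble  */
        int h = a >> 4;                   /* high nibble */
        table[2 * a]     = (char)(h > 9 ? 'A' + (h - 10) : '0' + h);
        table[2 * a + 1] = (char)(l > 9 ? 'A' + (l - 10) : '0' + l);
    }
}
```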
Post 20 Sep 2006, 00:41

Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.