flat assembler
Message board for the users of flat assembler.

Index > Main > Best way to convert hexadecimal string to binary string?

Author
Thread Post new topic Reply to topic
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 16 Nov 2008, 02:48
Hello, can anyone please help me optimise (size or speedwise) my code below that converts a hexadecimal coded string into a binary string?
Code:
;Store eg. D0a as the ascii cr lf
;SI=> start of non-hex phrase
;DI-> where string will be stored => after new string
Hex2str:
    push    edi
 db      3Ch     ;Disable next byte
.sto: stosb           ;Store 1 digit
      call    HTOA    ;Go get a hex digit
 jc      .sto    ;If no error, go store it
   mov     ecx,edi
     pop     edi     ;DI->where char will be stored
   sub     ecx,edi
     shr     ecx,1   ;If even number of hex-codes
        adc     edi,0
       jecxz   .notHex ;If no digits left, quit
.cat:   push    esi
 mov     esi,edi
.chg:    lodsw
       shl     al,4
        add     al,ah
       stosb
       loop    .chg
        pop     esi
.notHex:ret

;HTOA lodsb and sets CF if char in AL is a hex digit (0-9, A-F, a-f).
; and converts AL into the associated binary value.
;If not a hex digit, then SI is bumped back, and CF cleared
HTOA:  lodsb
       sub     al,'0'        ;reduce decimal digits to their binary values
       cmp     al,10   ;if AL 0-9 can quit now
     jb      .ret
        and     al,0DFh ;coerce letters to upper case
       sub     al,7    ;reduce A--F range to 10--15
        cmp     al,10   ;was input below the A--F range?
    jb      ErrChar
     cmp     al,16   ;Carry If input below a--f range
    ja      ErrChar
.ret:    ret    
I know that HTOA can probably be optimised speedwise using an xlat table. I seem to remember AAM 10 can also be used somehow, but I don't exactly know how though.


Last edited by wht36 on 17 Nov 2008, 10:40; edited 1 time in total
Post 16 Nov 2008, 02:48
View user's profile Send private message Reply with quote
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 16 Nov 2008, 03:45
Code:
;Store eg. D0a as the ascii cr lf
;SI=> start of non-hex phrase
;DI-> where string will be stored => after new string
Hex2str:
  push    edi
 db      3Ch     ;Disable next byte
.sto: stosb           ;Store 1 digit
      lodsb
       sub     al,'0'        ;reduce decimal digits to their binary values
       cmp     al,10   ;if AL 0-9 can quit now
     jb      .sto
        and     al,0DFh ;coerce letters to upper case
       sub     al,7    ;reduce A--F range to 10--15
        cmp     al,10   ;was input below the A--F range?
    jb      .pack
       cmp     al,16   ;Carry If input below a--f range
    jb      .sto    ;If no error, go store it
.pack: dec     esi     ;Bump back to where we stopped
      mov     ecx,edi
     pop     edi     ;DI->where char will be stored
   sub     ecx,edi
     shr     ecx,1   ;If even number of hex-codes
        adc     edi,0
       jecxz   .notHex ;If no digits left, quit
    push    esi
 mov     esi,edi
.chg:    lodsw
       shl     al,4
        add     al,ah
       stosb
       loop    .chg
        pop     esi
.notHex:ret    

Minor update
Post 16 Nov 2008, 03:45
View user's profile Send private message Reply with quote
Mac2004



Joined: 15 Dec 2003
Posts: 314
Mac2004 16 Nov 2008, 19:30
One optimization crossed my mind. I didn't test it but here's my idea:

Code:
 
;Input: al contains a nibble to be converted.

and eax,0xf                   ;Clear extra bits
mov ebx,hex_table
mov al,byte[ebx+eax]          ;Get needed character fromt the converions table.


hex_table: db '0123456789ABCDEF',0
    


It's not the whole code, but I hope you get the idea....Smile

regards,
Mac2004
Post 16 Nov 2008, 19:30
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 16 Nov 2008, 21:56
Code:
and al,0fh
mov ebx,htbl
xlat
...
htbl db '0123456789ABCDEF'
    

no need to test, it works.

but do you feel so hard to make it?
Code:
        and al,0fh
        add al,'0'
        cmp al,'9'
        jle @f
        add al,'A'-'9'-1
@@:
    


sorry, i've misanderstood.
then, i correct with a

great, good idea to make a hex2num func.
Post 16 Nov 2008, 21:56
View user's profile Send private message Visit poster's website Reply with quote
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 17 Nov 2008, 10:34
Thanks, nice tip with the xlatb, I'll keep it in mind. It's the opposite of what I am trying to do though (I am trying to convert the string '0' ... 'F' etc to binary 0-F, not the other way round e.g. given the hex codes '0D0A' it outputs crlf).

I guess I may have conveyed the wrong idea with my title (have changed it just in case).

I should also note that the code need to be case insensitive and need to stop at invalid characters (such as : and @) in the input. This means the code below is not acceptable because it converts : and @ as well
Code:
sub al,'0'
aam 16
aad 9    

The code below comes close but converts @
Code:
sub al,'0'
and al,0xDF
aam 16
daa
aad 9    
Post 17 Nov 2008, 10:34
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 17 Nov 2008, 17:39
Code:
; ASCII hexadecimal digit to nibble

; $30-$39 =>  0 - 9
  sub al,$30
  cmp al,10
  jc .done
; $31-$36 => $11-$16
  and al,$DF ; uppercase
; $11-$16 => 10 - 15
  sub al,7
  cmp al,16
  jc .done

; do something about the unknown digit

.done:    
...this might not work, but seems right.
(the FASM source code would need such a routine as well)
Post 17 Nov 2008, 17:39
View user's profile Send private message Visit poster's website Reply with quote
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 17 Nov 2008, 19:16
Not bad, but would parse :;<=>?@` ($3A-$40, $60) incorrectly though.

The code below works correctly and reduces the size by 2 bytes
Code:
      sub     al,':'    ;Is it a number
     jb      .num
        and     al,0xDF ;Coerce to uppercase
        sub     al,7    ;Is it a letter
     jb      .bad
.num:       add     al,10   ;Convert to binary
  cmp     al,16
       jb      .good   ;Store if 0..9 A..F    


.bad: ... ; not 0..9 and not A..F
.good: ... ; 0..9 and A..F


Last edited by wht36 on 18 Nov 2008, 03:50; edited 1 time in total
Post 17 Nov 2008, 19:16
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4353
Location: Now
edfed 17 Nov 2008, 19:56
an other way to go is to create a LUT with valid chars.

Code:
* db -1
+ db -1
- db -1
/ db -1
...
0 db 0
1 db 1
2 db 2
3 db 3
4 db 4
5 db 5
6 db 6
7 db 7
8 db 8
9 db 9
...
A db 10
B db 11
C db 12
D db 13
E db 14
F db 15
....
a db 10
b db 11
c db 12
d db 13
e db 14
f db 15
...
other ascii codes db -1
    

but it needs 256 bytes of LUT only for this task.
the code will be reduced a maximum because error detection is the -1 number.
Code:
there:
...
mov al,hexchar
mov ebx,htbl
xlat
cmp al,0
jl .error
add ah,al
...
loop there
...
.error:
...
    
Post 17 Nov 2008, 19:56
View user's profile Send private message Visit poster's website Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 18 Nov 2008, 01:01
Code:
   movzx eax,al ; clamp to byte range
   movzx eax,[hextab+eax]
   shr eax,1 ; set carry flag on error
   jc .error

hextab db \
1,1,1,1,1,1,1,...

0,2,4,6,8,10,12,14,16,18,1,1,1,1,1,\
1,20,22,24,26,28,30,1,1,1,1...    
Could do several in parallel, too. Razz
Post 18 Nov 2008, 01:01
View user's profile Send private message Visit poster's website Reply with quote
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 18 Nov 2008, 04:40
edfed wrote:
an other way to go is to create a LUT with valid chars.

but it needs 256 bytes of LUT only for this task.
the code will be reduced a maximum because error detection is the -1 number.
Code:
there:
...
mov al,hexchar
mov ebx,htbl
xlat
cmp al,0
jl .error
add ah,al
...
loop there
...
.error:
...
    

Nice! I have this in mind as well and was thinking of doing this for speed optimisation.

bitRAKE wrote:
Code:
   movzx eax,al ; clamp to byte range
   movzx eax,[hextab+eax]
   shr eax,1 ; set carry flag on error
   jc .error

hextab db \
1,1,1,1,1,1,1,...

0,2,4,6,8,10,12,14,16,18,1,1,1,1,1,\
1,20,22,24,26,28,30,1,1,1,1...    
Could do several in parallel, too. Razz

Ooh, I like this very much, it looks quite fast! What do you mean by parallel? As in dword reads with parallel conversion of all 4 bytes (like in the speed optimised uppercase/lower case string conversions)? Would it be possible in this instance?
Post 18 Nov 2008, 04:40
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 18 Nov 2008, 05:00
Code:
movzx eax,[esi+0]
movzx ebx,[esi+1]
movzx ecx,[esi+2]
movzx edx,[esi+3]
add esi,4
mov al,[hextab+eax]
mov bl,[hextab+ebx]
mov cl,[hextab+ecx]
mov dl,[hextab+edx]
...    
...is kind of what I had in mind. Then I started wondering how best to combine the valid nibbles as they become validated...instead of using bit zero, bit 4 might work better:
Code:
and eax,$10
jnz .err0
shl ebx,32-4
jc .err1
shld eax,ebx,4
shl ecx,32-4
jc .err2
shld eax,ecx,4
shl edx,32-4
jc .err3
shld eax,edx,4    
...should be a better way - like single branch SSE2 using a multiply to combine all valid terms. Razz

Here is one way to convert a 64-bit number to hexadecimal ASCII:
Code:
    align 16
_0F  db 16 dup ($0F)
_REV: rept 16 i { db 16-i }
hex  db "0123456789ABCDEF"

...

    movq xmm0,rcx ; 64-bit input in RCX
    movdqa xmm1,xmm0
    psrlw xmm0,4
    punpcklbw xmm1,xmm0

    movdqa xmm0,dqword [hex]
    pand xmm1,dqword [_0F]
    pshufb xmm0,xmm1 ; SSSE3
    pshufb xmm0,dqword [_REV] ; SSSE3
    movdqu dqword [rdx],xmm0 ; store string of bytes where RDX points    
That PSHUFB instruction is quite interesting, imho.
Post 18 Nov 2008, 05:00
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 18 Nov 2008, 07:50
bitRAKE wrote:
That PSHUFB instruction is quite interesting, imho.

Brilliant! Smile
Post 18 Nov 2008, 07:50
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 18 Nov 2008, 18:29
bitRAKE's PSHUFB usage in that snippet caused me to stop working for a few moments and marvel.

Taking full advantage of siMD, I love it.
Post 18 Nov 2008, 18:29
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
iic2



Joined: 26 Jun 2008
Posts: 122
iic2 19 Nov 2008, 02:30
PSHUFB... What kind of processor must you have to use this instruction? It don't seem to work on my P3.
Post 19 Nov 2008, 02:30
View user's profile Send private message Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 19 Nov 2008, 02:38
Core2 Duo has it (at least my brother's computer has SSSE3)
Post 19 Nov 2008, 02:38
View user's profile Send private message Reply with quote
wht36



Joined: 18 Sep 2005
Posts: 106
wht36 25 Nov 2008, 15:17
bitRAKE wrote:
Code:
movzx eax,[esi+0]
movzx ebx,[esi+1]
movzx ecx,[esi+2]
movzx edx,[esi+3]
add esi,4
mov al,[hextab+eax]
mov bl,[hextab+ebx]
mov cl,[hextab+ecx]
mov dl,[hextab+edx]
...    


Very nice code! Can I ask that, in general, would a dword read result in a protection fault/cpu exception if it crosses the boundary of an allocated memory block?
Post 25 Nov 2008, 15:17
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 26 Nov 2008, 01:54
wht36 wrote:
Can I ask that, in general, would a dword read result in a protection fault/cpu exception if it crosses the boundary of an allocated memory block?
No - not in general. Some instructions are specifically designed to only read from aligned addresses (exception generated otherwise). Also, the processor has a feature to force all memory accesses to be aligned (exception generated otherwise), but no operating system I know of uses it.

Of course, a single dword read from memory could be distributed to the other registers:
Code:
mov eax,[esi]
add esi,4
movzx edx,al
movzx ecx,ah
bswap eax
movzx ebx,ah
movzx eax,al    
...but I doubt this would perform faster for uncached data, and certainly would impact the performance for cached data due to the dependancy chain.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 26 Nov 2008, 01:54
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.