flat assembler
Message board for the users of flat assembler.

Index > Main > Reverse bit order of a single byte

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Any simple way to reverse the order of bits of a single byte?

For example, 11010001b => 10001011b.

Currently, I am using a rather cumbersome approach:
- test each bit of the input byte, and then
- set the corresponding bit of the output byte.

Thanks,
YONG
Post 23 Mar 2005, 04:06
View user's profile Send private message Visit poster's website Reply with quote
ronware



Joined: 08 Jan 2004
Posts: 179
Location: Israel
ronware
The fastest is to simply have a lookup table with the correct values. If you can spare 256 bytes, this is hard to beat.
Post 23 Mar 2005, 05:08
View user's profile Send private message Visit poster's website AIM Address Yahoo Messenger Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
The lookup table approach is a bit faster.

Thank ronware for the input.

YONG
Post 23 Mar 2005, 11:49
View user's profile Send private message Visit poster's website Reply with quote
rea



Joined: 14 Nov 2004
Posts: 92
rea
Search the win32asm forum there is one interesting in the algorithm section Wink, but dont know if is fast than a lookup.
Post 23 Mar 2005, 15:35
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
Code:
Algorithm (SLOW)
;AL = 11101100b
MOV ECX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
SHRD DX,AX,CL
ROL DX,1
SHR AX,1
MOV AL,DL
;AL = 00110111b


LookUp Table (FASTEST)
label: db 00000000b, 10000000b, 01000000b ... etc 256bytes
MOVZX EAX,AL
MOV AL, BYTE [LABEL+EAX]
    
Post 23 Mar 2005, 19:29
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
Hello

Code:
bitswap: ; swap bits in al
mov ah,al
xor al,al
mov cx,8
@l0:
shl al,1
shr ah,1
adc al,0
loop @l0
ret

bitswap: ; swap bits in al
mov cx,8
@l0:
shl al,1
rcr ah,1
loop @l0
mov al,ah
ret
    


Techno Forever !!!!

regards ...
Post 24 Mar 2005, 02:38
View user's profile Send private message Visit poster's website Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Idea from the win32asm forum:

Loop 8 times
- Shift right (shr) the input byte by 1 bit => Carry, and then
- Rotate through carry left (rcl) the output byte by 1 bit.

This approach is very flexible since it can easily deal with
16-bit or 32-bit input. However, I don't think this would be
very fast, because there are two bit-shifting operations per loop
and eight loops are required for a single byte.

Thank rea, r22 and Matrix for the input.

YONG
Post 24 Mar 2005, 08:12
View user's profile Send private message Visit poster's website Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
YONG wrote:
Idea from the win32asm forum:

Loop 8 times
- Shift right (shr) the input byte by 1 bit => Carry, and then
- Rotate through carry left (rcl) the output byte by 1 bit.

This approach is very flexible since it can easily deal with
16-bit or 32-bit input. However, I don't think this would be
very fast, because there are two bit-shifting operations per loop
and eight loops are required for a single byte.

Thank rea, r22 and Matrix for the input.

YONG
Indeed, this gives very small and flexible code, but is goddamned slow if you have lots of data to reverse. You should only use this kind of code only on initialization of the lookup table once. The lookuptable approach seems to be the best one. But you need to use fixed size elements, e.g. you can only reverse 8bits directly, so when you want to reverse other sized values, say words or dwords, it's best to use multiple lookup table values in reverse order too:
Code:
label: db 00000000b, 10000000b, 01000000b; ... etc 256bytes 

reverse_word_ax:
push edx ecx
movzx edx,al 
mov ch,[label+edx] 
mov dl,ah
mov cl,[label+edx]
mov ax,cx
pop ecx edx
ret

reverse_dword_eax:
push edx ecx
movzx edx,al
mov ch,[label+edx]
mov dl,ah
mov cl,[label+edx]
shr eax,16
shl ecx,16
mov dl,al
mov ch,[label+edx]
mov dl,ah
mov cl,[label+edx]
mov eax,ecx
pop ecx edx
ret
    

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||


Last edited by MCD on 29 Mar 2005, 13:14; edited 1 time in total
Post 24 Mar 2005, 14:15
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
The only worth while way seems to be the lookup table.
I wish there were MMX/XMMX instruction where you could do multiple lookups at the same time that would be awsum.

MM0 = 8 packed bytes

PLTB MM0,ESI <= something like that and return the result bytes in mm0
;MM0 = 8 packed return bytes from byte[ESI+Byte]
It's a shame processor engineers don't think like me Very Happy
Post 24 Mar 2005, 18:02
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
It seems that the lookup table approach is the best if
there is enough room for the table.

Here is the lookup table for those who may also want to
use this approach.

Thank MCD and r22 for the input.

YONG


Description:
Download
Filename: lookup.txt
Filesize: 1.62 KB
Downloaded: 353 Time(s)

Post 25 Mar 2005, 12:26
View user's profile Send private message Visit poster's website Reply with quote
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio
this should be shorter and faster.
Code:
reverse_dword_eax:
push ebx
mov ebx,table
xlat
mov bl,ah
mov ah,[ebx]
bswap eax
xlat
mov bl,ah
mov ah,[ebx]
pop ebx
ret

aling 256 
table:
db 0,10000000b ......
    
Post 25 Mar 2005, 18:52
View user's profile Send private message Visit poster's website Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Just want to point out two tiny bugs.

Code by MCD:

reverse_dword_eax:
push eax edx ecx
...
mov eax,ecx
pop ecx edx eax
ret

Since eax is used as an input/output variable,
pushing it at the beginning is unnecessary and
popping it at the end replaces its output value
by its input value.

Code by Octavio:

reverse_dword_eax:
...
xlat
...
xlat
...
ret

Since xlat requires a "dummy" operand,
use xlatb instead, which requires no operand.

Thank Octavio for the input.

YONG
Post 26 Mar 2005, 11:16
View user's profile Send private message Visit poster's website Reply with quote
Octavio



Joined: 21 Jun 2003
Posts: 366
Location: Spain
Octavio
There was a bug ,xlatb can not be used because bl is modified.
Code:
reverse_dword_eax:
push ebx
mov ebx,table
mov bl,al
mov al,[ebx]
mov bl,ah
mov ah,[ebx]
bswap eax
mov bl,al
mov al,[ebx]
mov bl,ah
mov ah,[ebx]
pop ebx
ret

aling 256 
table:
db 0,10000000b ......
    
[/quote]
Post 26 Mar 2005, 13:55
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
XLAT like the string instructions is SLOW just use
mov ebx, Table
movzx eax,al ;the byte
mov al, [ebx+eax]

if you have an array of bytes that need reversing
Code:
xor EAX,EAX
mov ESI, Array
mov EDI, RevArray
mov EBX, Table
mov ECX, [Length]
sub ECX,1
.theLoop:
JS .allDone ;when ECX is negative
mov AL, byte[ESI+ECX]
mov AL, byte[EBX+EAX]
mov byte[EDI+ECX],AL
sub ECX,1
jmp .theLoop
.allDone:    
Post 26 Mar 2005, 20:27
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
I tested the following two loops, one using xlatb
and the other without xlatb. The two virtually
took the same amount of time to finish 1.8MB of data.
The xlatb loop is even a bit shorter.
Code:
;
; 20 bytes
                xor     bh, bh
                mov     di, buffer
        Loop1:
                mov     bl, [di]
                mov     al, [lookup+bx]
                mov     [di], al
                inc     di
                cmp     di, buffer+bufferSize
                jb      Loop1
;
; 18 bytes
                mov     bx, lookup
                mov     di, buffer
        Loop2:
                mov     al, [di]
                xlatb
                mov     [di], al
                inc     di
                cmp     di, buffer+bufferSize
                jb      Loop2
;
lookup          db      00h, ..., 0ffh  ; 256 bytes
bufferSize      =       128
buffer          rb      bufferSize
    

Thank Octavio and r22 for the input.

YONG
Post 27 Mar 2005, 19:07
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
Isn't there a mathematical solution for that - something shorter than look-up and faster than the solutions offered here.

I mean logical explanation is very short for this algorithm:
FOR x=0 to y
xor bit(x),bit(y-x)
xor bit(y-x),bit(x)
xor bit(x),bit(y-x)
NEXT
We KNOW the answer to any input without even calculating - it's not like adding with carry. There must be an easier solution.
Post 27 Mar 2005, 21:46
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
ATV



Joined: 31 Aug 2004
Posts: 109
Location: Finland
ATV
Code:
bitswap_al:    ; swap bits in al (destroy: ah)
  mov ah,al
  and ax,55aah ; ah = 0?0?0?0? , al = ?0?0?0?0
  ror al,2
  or  al,ah
  mov ah,al
  and ax,9966h ; ah = ?00??00? , al = 0??00??0
  rol al,4
  or  al,ah
  ror al,1
  ret
    
Post 29 Mar 2005, 06:01
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17667
Location: In your JS exploiting you and your system
revolution
ATV: Nice routine. 8 bits in 9 instructions.

Can this be extended to 16 and 32 bits also?
Post 29 Mar 2005, 07:50
View user's profile Send private message Visit poster's website Reply with quote
ATV



Joined: 31 Aug 2004
Posts: 109
Location: Finland
ATV
32bit algo found in http://aggregate.org/MAGIC/
Code:
bitswap_ax:      ; swap bits in ax
  push dx
  mov  dx,ax
  and  ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0
  and  dx,05555h ; dx = 0?0?0?0?0?0?0?0?
  ror  ax,2
  or   ax,dx
  mov  dx,ax
  and  ax,06666h ; ax = 0??00??00??00??0
  and  dx,09999h ; dx = ?00??00??00??00?
  ror  ax,4
  or   ax,dx
  mov  dx,ax
  and  ax,01e1eh ; ax = 000????0000????0
  and  dx,0e1e1h ; dx = ???0000????0000?
  ror  ax,8
  or   ax,dx
  ror  ax,1
  pop  dx
  ret
    
Post 29 Mar 2005, 08:40
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
Hi,
I'm gonna try something:
Code:
;Bitreversal in a DWORD 90.8clk average of 104857600 passes
;The code is too long to be called optimized - N E 1 wanna try to make
;it faster?
  mov eax,12345678h
  mov ebx,eax
  and ebx,055555555h
  and eax,0AAAAAAAAh
  ror al,2
  rol eax,8
  ror al,2
  rol eax,8
  ror al,2
  rol eax,8
  ror al,2
  rol eax,8
  or  eax,ebx
  mov ebx,eax
  and ebx,099999999h
  and eax,066666666h
  rol al,4
  rol eax,8
  rol al,4
  rol eax,8
  rol al,4
  rol eax,8
  rol al,4
  rol eax,8
  or  eax,ebx
  ror al,1
  rol eax,8
  ror al,1
  rol eax,8
  ror al,1
  rol eax,8
  ror al,1
  rol eax,8
  bswap eax
    

EDIT: Hehee You were faster Very Happy

Now try this:
Code:
;32-bit try on two 16-bit passes
;59.0clk on average 10485760 passes
; Question 104857600 passes gave me 18 clocks Question
  mov  dx,ax
  and  ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0
  and  dx,05555h ; dx = 0?0?0?0?0?0?0?0?
  ror  ax,2
  or   ax,dx
  mov  dx,ax
  and  ax,06666h ; ax = 0??00??00??00??0
  and  dx,09999h ; dx = ?00??00??00??00?
  ror  ax,4
  or   ax,dx
  mov  dx,ax
  and  ax,01e1eh ; ax = 000????0000????0
  and  dx,0e1e1h ; dx = ???0000????0000?
  ror  ax,8
  or   ax,dx
  ror  ax,1
  ror  eax,16
  mov  dx,ax
  and  ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0
  and  dx,05555h ; dx = 0?0?0?0?0?0?0?0?
  ror  ax,2
  or   ax,dx
  mov  dx,ax
  and  ax,06666h ; ax = 0??00??00??00??0
  and  dx,09999h ; dx = ?00??00??00??00?
  ror  ax,4
  or   ax,dx
  mov  dx,ax
  and  ax,01e1eh ; ax = 000????0000????0
  and  dx,0e1e1h ; dx = ???0000????0000?
  ror  ax,8
  or   ax,dx
  ror  ax,1
  ror  eax,16
    
Post 29 Mar 2005, 08:56
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.