flat assembler
Message board for the users of flat assembler.
Index
> Main > Reverse bit order of a single byte Goto page 1, 2 Next |
Author |
|
ronware 23 Mar 2005, 05:08
The fastest is to simply have a lookup table with the correct values. If you can spare 256 bytes, this is hard to beat.
|
|||
23 Mar 2005, 05:08 |
|
YONG 23 Mar 2005, 11:49
The lookup table approach is a bit faster.
Thank ronware for the input. YONG |
|||
23 Mar 2005, 11:49 |
|
rea 23 Mar 2005, 15:35
Search the win32asm forum there is one interesting in the algorithm section , but dont know if is fast than a lookup.
|
|||
23 Mar 2005, 15:35 |
|
r22 23 Mar 2005, 19:29
Code: Algorithm (SLOW) ;AL = 11101100b MOV ECX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 SHRD DX,AX,CL ROL DX,1 SHR AX,1 MOV AL,DL ;AL = 00110111b LookUp Table (FASTEST) label: db 00000000b, 10000000b, 01000000b ... etc 256bytes MOVZX EAX,AL MOV AL, BYTE [LABEL+EAX] |
|||
23 Mar 2005, 19:29 |
|
Matrix 24 Mar 2005, 02:38
Hello
Code: bitswap: ; swap bits in al mov ah,al xor al,al mov cx,8 @l0: shl al,1 shr ah,1 adc al,0 loop @l0 ret bitswap: ; swap bits in al mov cx,8 @l0: shl al,1 rcr ah,1 loop @l0 mov al,ah ret Techno Forever !!!! regards ... |
|||
24 Mar 2005, 02:38 |
|
YONG 24 Mar 2005, 08:12
Idea from the win32asm forum:
Loop 8 times - Shift right (shr) the input byte by 1 bit => Carry, and then - Rotate through carry left (rcl) the output byte by 1 bit. This approach is very flexible since it can easily deal with 16-bit or 32-bit input. However, I don't think this would be very fast, because there are two bit-shifting operations per loop and eight loops are required for a single byte. Thank rea, r22 and Matrix for the input. YONG |
|||
24 Mar 2005, 08:12 |
|
MCD 24 Mar 2005, 14:15
YONG wrote: Idea from the win32asm forum: Code: label: db 00000000b, 10000000b, 01000000b; ... etc 256bytes reverse_word_ax: push edx ecx movzx edx,al mov ch,[label+edx] mov dl,ah mov cl,[label+edx] mov ax,cx pop ecx edx ret reverse_dword_eax: push edx ecx movzx edx,al mov ch,[label+edx] mov dl,ah mov cl,[label+edx] shr eax,16 shl ecx,16 mov dl,al mov ch,[label+edx] mov dl,ah mov cl,[label+edx] mov eax,ecx pop ecx edx ret _________________ MCD - the inevitable return of the Mad Computer Doggy -||__/ .|+-~ .|| || Last edited by MCD on 29 Mar 2005, 13:14; edited 1 time in total |
|||
24 Mar 2005, 14:15 |
|
r22 24 Mar 2005, 18:02
The only worth while way seems to be the lookup table.
I wish there were MMX/XMMX instruction where you could do multiple lookups at the same time that would be awsum. MM0 = 8 packed bytes PLTB MM0,ESI <= something like that and return the result bytes in mm0 ;MM0 = 8 packed return bytes from byte[ESI+Byte] It's a shame processor engineers don't think like me |
|||
24 Mar 2005, 18:02 |
|
YONG 25 Mar 2005, 12:26
It seems that the lookup table approach is the best if
there is enough room for the table. Here is the lookup table for those who may also want to use this approach. Thank MCD and r22 for the input. YONG
|
|||||||||||
25 Mar 2005, 12:26 |
|
Octavio 25 Mar 2005, 18:52
this should be shorter and faster.
Code: reverse_dword_eax: push ebx mov ebx,table xlat mov bl,ah mov ah,[ebx] bswap eax xlat mov bl,ah mov ah,[ebx] pop ebx ret aling 256 table: db 0,10000000b ...... |
|||
25 Mar 2005, 18:52 |
|
YONG 26 Mar 2005, 11:16
Just want to point out two tiny bugs.
Code by MCD: reverse_dword_eax: push eax edx ecx ... mov eax,ecx pop ecx edx eax ret Since eax is used as an input/output variable, pushing it at the beginning is unnecessary and popping it at the end replaces its output value by its input value. Code by Octavio: reverse_dword_eax: ... xlat ... xlat ... ret Since xlat requires a "dummy" operand, use xlatb instead, which requires no operand. Thank Octavio for the input. YONG |
|||
26 Mar 2005, 11:16 |
|
Octavio 26 Mar 2005, 13:55
There was a bug ,xlatb can not be used because bl is modified.
Code: reverse_dword_eax: push ebx mov ebx,table mov bl,al mov al,[ebx] mov bl,ah mov ah,[ebx] bswap eax mov bl,al mov al,[ebx] mov bl,ah mov ah,[ebx] pop ebx ret aling 256 table: db 0,10000000b ...... |
|||
26 Mar 2005, 13:55 |
|
r22 26 Mar 2005, 20:27
XLAT like the string instructions is SLOW just use
mov ebx, Table movzx eax,al ;the byte mov al, [ebx+eax] if you have an array of bytes that need reversing Code: xor EAX,EAX mov ESI, Array mov EDI, RevArray mov EBX, Table mov ECX, [Length] sub ECX,1 .theLoop: JS .allDone ;when ECX is negative mov AL, byte[ESI+ECX] mov AL, byte[EBX+EAX] mov byte[EDI+ECX],AL sub ECX,1 jmp .theLoop .allDone: |
|||
26 Mar 2005, 20:27 |
|
YONG 27 Mar 2005, 19:07
I tested the following two loops, one using xlatb
and the other without xlatb. The two virtually took the same amount of time to finish 1.8MB of data. The xlatb loop is even a bit shorter. Code: ; ; 20 bytes xor bh, bh mov di, buffer Loop1: mov bl, [di] mov al, [lookup+bx] mov [di], al inc di cmp di, buffer+bufferSize jb Loop1 ; ; 18 bytes mov bx, lookup mov di, buffer Loop2: mov al, [di] xlatb mov [di], al inc di cmp di, buffer+bufferSize jb Loop2 ; lookup db 00h, ..., 0ffh ; 256 bytes bufferSize = 128 buffer rb bufferSize Thank Octavio and r22 for the input. YONG |
|||
27 Mar 2005, 19:07 |
|
Madis731 27 Mar 2005, 21:46
Isn't there a mathematical solution for that - something shorter than look-up and faster than the solutions offered here.
I mean logical explanation is very short for this algorithm: FOR x=0 to y xor bit(x),bit(y-x) xor bit(y-x),bit(x) xor bit(x),bit(y-x) NEXT We KNOW the answer to any input without even calculating - it's not like adding with carry. There must be an easier solution. |
|||
27 Mar 2005, 21:46 |
|
ATV 29 Mar 2005, 06:01
Code: bitswap_al: ; swap bits in al (destroy: ah) mov ah,al and ax,55aah ; ah = 0?0?0?0? , al = ?0?0?0?0 ror al,2 or al,ah mov ah,al and ax,9966h ; ah = ?00??00? , al = 0??00??0 rol al,4 or al,ah ror al,1 ret |
|||
29 Mar 2005, 06:01 |
|
revolution 29 Mar 2005, 07:50
ATV: Nice routine. 8 bits in 9 instructions.
Can this be extended to 16 and 32 bits also? |
|||
29 Mar 2005, 07:50 |
|
ATV 29 Mar 2005, 08:40
32bit algo found in http://aggregate.org/MAGIC/
Code: bitswap_ax: ; swap bits in ax push dx mov dx,ax and ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0 and dx,05555h ; dx = 0?0?0?0?0?0?0?0? ror ax,2 or ax,dx mov dx,ax and ax,06666h ; ax = 0??00??00??00??0 and dx,09999h ; dx = ?00??00??00??00? ror ax,4 or ax,dx mov dx,ax and ax,01e1eh ; ax = 000????0000????0 and dx,0e1e1h ; dx = ???0000????0000? ror ax,8 or ax,dx ror ax,1 pop dx ret |
|||
29 Mar 2005, 08:40 |
|
Madis731 29 Mar 2005, 08:56
Hi,
I'm gonna try something: Code: ;Bitreversal in a DWORD 90.8clk average of 104857600 passes ;The code is too long to be called optimized - N E 1 wanna try to make ;it faster? mov eax,12345678h mov ebx,eax and ebx,055555555h and eax,0AAAAAAAAh ror al,2 rol eax,8 ror al,2 rol eax,8 ror al,2 rol eax,8 ror al,2 rol eax,8 or eax,ebx mov ebx,eax and ebx,099999999h and eax,066666666h rol al,4 rol eax,8 rol al,4 rol eax,8 rol al,4 rol eax,8 rol al,4 rol eax,8 or eax,ebx ror al,1 rol eax,8 ror al,1 rol eax,8 ror al,1 rol eax,8 ror al,1 rol eax,8 bswap eax EDIT: Hehee You were faster Now try this: Code: ;32-bit try on two 16-bit passes ;59.0clk on average 10485760 passes ; 104857600 passes gave me 18 clocks mov dx,ax and ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0 and dx,05555h ; dx = 0?0?0?0?0?0?0?0? ror ax,2 or ax,dx mov dx,ax and ax,06666h ; ax = 0??00??00??00??0 and dx,09999h ; dx = ?00??00??00??00? ror ax,4 or ax,dx mov dx,ax and ax,01e1eh ; ax = 000????0000????0 and dx,0e1e1h ; dx = ???0000????0000? ror ax,8 or ax,dx ror ax,1 ror eax,16 mov dx,ax and ax,0aaaah ; ax = ?0?0?0?0?0?0?0?0 and dx,05555h ; dx = 0?0?0?0?0?0?0?0? ror ax,2 or ax,dx mov dx,ax and ax,06666h ; ax = 0??00??00??00??0 and dx,09999h ; dx = ?00??00??00??00? ror ax,4 or ax,dx mov dx,ax and ax,01e1eh ; ax = 000????0000????0 and dx,0e1e1h ; dx = ???0000????0000? ror ax,8 or ax,dx ror ax,1 ror eax,16 |
|||
29 Mar 2005, 08:56 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.