flat assembler
Message board for the users of flat assembler.

Index > Main > Reverse bit order of a single byte

Goto page Previous  1, 2
Author
Thread Post new topic Reply to topic
ATV



Joined: 31 Aug 2004
Posts: 109
Location: Finland
ATV
Post 29 Mar 2005, 11:42
View user's profile Send private message Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
Thanks YONG for the small bugfix, I must have missed this.

All direct reversing algorythms herein are good, but lookup tables are still faster. And r22 is right, processor engineerer missed this kind of instruction. There is even no reversing instruction that works with the standard integer registers. This could be added as a new instruction, but you must also take into acount the actual use/familiarity of it. I guess this kind of instruction is seldom used. But on the other hand, it's very easy to implement, you don't need any logical elements for it, just some multilayered crossing conducting paths.

The problem with all this is that there are also many others, too many other bit operations to add as instructions. A sophisticated but effective solution for this would be to insert a kind of universal instruction that allows to exchange any specified bit with another. Say, lets call it SHFB or BSHF for "Bit shuffle". It could work a bit like those PSHUFB/W and SHUFPS MMX/SSE instructions, thus the bit positions would be specified by a third immediate or CL register. But I think that there would be more than 8 bits, which makes it difficult to implement it this way.

I which I were an Intel Engineerer right now. I really think I should study Computer Engineering Wink
Post 29 Mar 2005, 13:29
View user's profile Send private message Reply with quote
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias
Thanks ATV for the two excellent references. WHY is this procedure (32 bit reversal) useful? When would you use it? How is it superior to USING A LOT OF MEMORY? Does someone imagine that bit reversal is somehow more elegant than employing a large quantity of memory to accomplish the same purpose?
Isn't this thinking, i.e. the idea of finding a slick method to perform bit reversal efficiently, a RELIC of bygone days when memory was so precious? If MCD wishes to study computer engineering, he/she will profit from recognizing the need to FIRST DEFINE what one seeks to accomplish in writing a program, THEN explore various solutions, whether elegant or mundane. The real key to good computer engineering is to write PROGRAMS that are readily understood by anyone. Obscure, convoluted, arcane thinking, leading to a FAST, REALLY FAST, TERRIFICALLY FAST piece of CODE, that no one else can understand, is NOT ENGINEERING. It is babble. cheers Smile
Post 29 Mar 2005, 15:09
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Quote:

Obscure, convoluted, arcane thinking, leading to a FAST, REALLY FAST, TERRIFICALLY FAST piece of CODE, that no one else can understand, is NOT ENGINEERING.

This is why you document your source code. *especially* when you use highly optimized assembly. Avoiding optimizations just to have "readable code" is nonsense.
Post 29 Mar 2005, 15:41
View user's profile Send private message Visit poster's website Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Well, ignoring the issue of speed, the above ideas are all
very interesting. Here is my idea:
Code:
                                        ; al = the input byte
                                        ; destroy ah, dl, dh
                mov     ah, al
                mov     dx, ax
                                        ; al = ah = dl = dh = the input byte
                and     ax, 2211h
                and     dx, 8844h
                                        ; al = 000?000?b
                                        ; ah = 00?000?0b
                                        ; dl = 0?000?00b
                                        ; dh = ?000?000b
                ror     al, 1
                ror     ah, 3
                rol     dl, 3
                rol     dh, 1
                                        ; al = ?000?000b
                                        ; ah = 0?000?00b
                                        ; dl = 00?000?0b
                                        ; dh = 000?000?b
                or      ax, dx
                or      al, ah
                                        ; al = the output byte
    


Thank ATV, revolution, Madis731, MCD, tom tobias and f0dder for the input.

YONG
Post 29 Mar 2005, 16:48
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Madis731
It was an interesting topic and I looked at the gems - I loved the idea of using the same 256byte array 4 times on a DWORD Very Happy

YONG Your idea is very simple but it takes 55 clocks though 32bit one takes 21 clocks and you can "and" your way to 8bits in one clock so you'll get 22 clocks Wink try that would you
Post 29 Mar 2005, 18:54
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
tom tobias wrote:
WHY is this procedure (32 bit reversal) useful? When would you use it?
If MCD wishes to study computer engineering, he/she will profit from recognizing the need to FIRST DEFINE what one seeks to accomplish in writing a program, THEN explore various solutions, whether elegant or mundane. The real key to good computer engineering is to write PROGRAMS that are readily understood by anyone. Obscure, convoluted, arcane thinking, leading to a FAST, REALLY FAST, TERRIFICALLY FAST piece of CODE, that no one else can understand, is NOT ENGINEERING. It is babble. cheers Smile

Indeed I can't think of so many uses for this, but assume you have a serial (input) device which sends each single bits to a controler which merges them into a byte which then may be used further by a PC driver software. Imagine the controller puts them into the reversed order, than you will have to rereverse them in the driver code. Unfortunately, it seems that similar kind of throwing bits widely across hardware I/O ports is widely common. Well, you have logical and shift instructions to do this, but a shorter, easier and faster method which does the job in one step would be nice.

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 30 Mar 2005, 12:07
View user's profile Send private message Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Here are the 16-bit & the 32-bit versions of my previous idea:

Code:
; 
; 8-bit input - 10 instructions - shown earlier
;

;
; 16-bit input - 19 instructions
;
                                        ; ax = the input word
                                        ; destroy bx, cx, dx
                mov     bx, ax
                mov     cx, ax
                mov     dx, ax
                                        ; ax = bx = cx = dx = the input word
                and     ax, 1111h
                and     bx, 2222h
                and     cx, 4444h
                and     dx, 8888h

                ror     al, 1
                ror     ah, 1
                ror     bl, 3
                ror     bh, 3
                rol     cl, 3
                rol     ch, 3
                rol     dl, 1
                rol     dh, 1

                or      ax, dx
                or      ax, cx
                or      ax, bx
                                        ; Still requires to swap al and ah
                xchg    al, ah
                                        ; ax = the output word
;
; 32-bit input - 30 instructions
;
                                        ; eax = the input dword
                                        ; destroy ebx, ecx, edx
                mov     ebx, eax
                mov     ecx, eax
                mov     edx, eax
                                        ; eax = ebx = ecx = edx
                                        ; = the input dword
                and     eax, 11111111h
                and     ebx, 22222222h
                and     ecx, 44444444h
                and     edx, 88888888h

                ror     al, 1
                ror     ah, 1
                bswap   eax
                ror     al, 1
                ror     ah, 1

                ror     bl, 3
                ror     bh, 3
                bswap   ebx
                ror     bl, 3
                ror     bh, 3

                rol     cl, 3
                rol     ch, 3
                bswap   ecx
                rol     cl, 3
                rol     ch, 3

                rol     dl, 1
                rol     dh, 1
                bswap   edx
                rol     dl, 1
                rol     dh, 1

                or      eax, edx
                or      eax, ecx
                or      eax, ebx
                                        ; eax = the output dword
    


Well, one use of bit reversal, whether it is 8-bit, 16-bit or 32-bit,
is in encryption/decryption algorithms, which acts as a simple, reversible
byte/word/dword substitution mechanism.

Thank Madis731 & MCD for the input.

YONG
Post 30 Mar 2005, 12:33
View user's profile Send private message Visit poster's website Reply with quote
Matrix



Joined: 04 Sep 2004
Posts: 1171
Location: Overflow
Matrix
hello,
YONG
whould you test this?
Code:
bitswap32:
mov cx,32
.l666:
shr eax,1
rcl ebx,1
loopw .l666
mov eax,ebx
ret
    


i'm not sure in that bswap thing is faster ...
Post 30 Mar 2005, 19:02
View user's profile Send private message Visit poster's website Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Well, this is like a beauty contest and the contestants are:

Loop1: Matrix's idea / idea from the win32asm forum

Loop2: Madis731's idea

Loop3: My stupid idea

Loop4: Table-lookup approach
- Within this approach, there are many legs:
- with xlatb
- without xlatb
- align 256
- no alignment
- Let's use the simplest way: without xlatb, no alignment

Assumptions:
- the input dword: eax
- free to use ebx, ecx & edx, i.e., no need to push/pop
- the output dword: one of eax, ebx, ecx & edx

Code:
;
; 29 bytes
;
                mov     di, buffer
        Loop1:
                mov     eax, dword [di]

                mov     cx, 32
        innerLp:
                shr     eax, 1
                rcl     ebx, 1
                loop    innerLp

                mov     dword [di], ebx

                add     di, 4
                cmp     di, buffer+bufferSize
                jb      Loop1

bufferSize      =       64
buffer          rb      bufferSize

;
; 110 bytes
; - Size can be reduced by making routine calls
;

                mov     di, buffer
        Loop2:
                mov     eax, dword [di]

                mov     dx, ax
                and     ax, 0aaaah ; ax = ?0?0?0?0?0?0?0?0
                and     dx, 05555h ; dx = 0?0?0?0?0?0?0?0?
                ror     ax, 2
                or      ax, dx
                mov     dx, ax
                and     ax, 06666h ; ax = 0??00??00??00??0
                and     dx, 09999h ; dx = ?00??00??00??00?
                ror     ax, 4
                or      ax, dx
                mov     dx, ax
                and     ax, 01e1eh ; ax = 000????0000????0
                and     dx, 0e1e1h ; dx = ???0000????0000?
                ror     ax, 8
                or      ax, dx
                ror     ax, 1

                ror     eax, 16

                mov     dx, ax
                and     ax, 0aaaah ; ax = ?0?0?0?0?0?0?0?0
                and     dx, 05555h ; dx = 0?0?0?0?0?0?0?0?
                ror     ax, 2
                or      ax, dx
                mov     dx, ax
                and     ax, 06666h ; ax = 0??00??00??00??0
                and     dx, 09999h ; dx = ?00??00??00??00?
                ror     ax, 4
                or      ax, dx
                mov     dx, ax
                and     ax, 01e1eh ; ax = 000????0000????0
                and     dx, 0e1e1h ; dx = ???0000????0000?
                ror     ax, 8
                or      ax, dx
                ror     ax, 1

;               ror     eax, 16    ; <= bugfix by YONG

                mov     dword [di], eax

                add     di, 4
                cmp     di, buffer+bufferSize
                jb      Loop2

bufferSize      =       64
buffer          rb      bufferSize

;
; 115 bytes
;
                mov     di, buffer
        Loop3:
                mov     eax, dword [di]
                                        ; eax = the input dword
                mov     ebx, eax
                mov     ecx, eax
                mov     edx, eax
                                        ; eax = ebx = ecx = edx
                                        ; = the input dword
                and     eax, 11111111h
                and     ebx, 22222222h
                and     ecx, 44444444h
                and     edx, 88888888h

                ror     al, 1
                ror     ah, 1
                bswap   eax
                ror     al, 1
                ror     ah, 1

                ror     bl, 3
                ror     bh, 3
                bswap   ebx
                ror     bl, 3
                ror     bh, 3

                rol     cl, 3
                rol     ch, 3
                bswap   ecx
                rol     cl, 3
                rol     ch, 3

                rol     dl, 1
                rol     dh, 1
                bswap   edx
                rol     dl, 1
                rol     dh, 1

                or      eax, edx
                or      eax, ecx
                or      eax, ebx
                                        ; eax = the output dword
                mov     dword [di], eax

                add     di, 4
                cmp     di, buffer+bufferSize
                jb      Loop3

bufferSize      =       64
buffer          rb      bufferSize

;
; 45 bytes (+ 256 bytes for the lookup table)
;
                xor     bh, bh
                mov     di, buffer
        Loop4:
                mov     eax, dword [di]

                mov     bl, al
                mov     al, [lookup+bx]
                mov     bl, ah
                mov     ah, [lookup+bx]

                bswap   eax

                mov     bl, ah
                mov     ah, [lookup+bx]
                mov     bl, al
                mov     al, [lookup+bx]

                mov     dword [di], eax

                add     di, 4
                cmp     di, buffer+bufferSize
                jb      Loop4

lookup          db      00h, ..., 0ffh  ; 256 bytes
bufferSize      =       64
buffer          rb      bufferSize

    


Average time taken to finish 9.68MB of data:

Loop1: 31 seconds =(30+32)/2 ; one encryption + one decryption
Loop2: 18.5 seconds =(18+19)/2
Loop3: 18.5 seconds =(18+19)/2
Loop4: 15.5 seconds =(15+16)/2

To summarize:

- the fastest: Loop4
- however, it is also the biggest (due to the lookup table)

- the shortest and the most flexible: Loop1
- however, it is rather slow

- the other ideas are somewhere in the middle of the spectrum


Thank Matrix for the tough challenge.

No more testing!!!

YONG
Post 31 Mar 2005, 09:45
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17450
Location: In your JS exploiting you and your system
revolution
But you didn't fully extend the 16 bit code to 32 bit.

Code:
bitswap_eax:
        mov     edx,eax
        and     eax,0aaaaaaaah
        and     edx,055555555h
        ror     eax,2
        or      eax,edx
        mov     edx,eax
        and     eax,066666666h
        and     edx,099999999h
        ror     eax,4
        or      eax,edx
        mov     edx,eax
        and     eax,01e1e1e1eh
        and     edx,0e1e1e1e1h
        ror     eax,8
        or      eax,edx
        mov     edx,eax
        and     eax,001fe01feh
        and     edx,0fe01fe01h
        ror     eax,16
        or      eax,edx
        ror     eax,1
        ret
    
Post 01 Apr 2005, 01:47
View user's profile Send private message Visit poster's website Reply with quote
YONG



Joined: 16 Mar 2005
Posts: 8000
Location: 22° 15' N | 114° 10' E
YONG
Loop2a: revolution's suggestion

Code:
;
; 113 bytes
;
                mov     di, buffer
        Loop2a:
                mov     eax, dword [di]

                mov     edx, eax
                and     eax, 0aaaaaaaah
                and     edx, 055555555h
                ror     eax, 2
                or      eax, edx
                mov     edx, eax
                and     eax, 066666666h
                and     edx, 099999999h
                ror     eax, 4
                or      eax, edx
                mov     edx, eax
                and     eax, 01e1e1e1eh
                and     edx, 0e1e1e1e1h
                ror     eax, 8
                or      eax, edx
                mov     edx, eax
                and     eax, 001fe01feh
                and     edx, 0fe01fe01h
                ror     eax, 16
                or      eax, edx
                ror     eax, 1


                mov     dword [di], eax

                add     di, 4
                cmp     di, buffer+bufferSize
                jb      Loop2a

bufferSize      =       64
buffer          rb      bufferSize
    


Average time taken - Loop2a: 17.5 seconds = ( 17 + 18 ) / 2
- faster (but also slightly bigger) than Loop2

Thank revolution for the input.

YONG
Post 01 Apr 2005, 09:07
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17450
Location: In your JS exploiting you and your system
revolution
Just for fun:

MMX version

Code:
align 8
c5555   dq      05555555555555555h
c3333   dq      03333333333333333h
c0f0f   dq      00f0f0f0f0f0f0f0fh
c00ff   dq      000ff00ff00ff00ffh

bitswap_mm0:
        movq    mm1,mm0
        psrlq   mm0,1
        pand    mm0,[c5555]
        pand    mm1,[c5555]
        psllq   mm1,1
        por     mm0,mm1
        movq    mm1,mm0
        psrlq   mm0,2
        pand    mm0,[c3333]
        pand    mm1,[c3333]
        psllq   mm1,2
        por     mm0,mm1
        movq    mm1,mm0
        psrlq   mm0,4
        pand    mm0,[c0f0f]
        pand    mm1,[c0f0f]
        psllq   mm1,4
        por     mm0,mm1
        movq    mm1,mm0
        psrlq   mm0,8
        pand    mm0,[c00ff]
        pand    mm1,[c00ff]
        psllq   mm1,8
        por     mm0,mm1
        pshufw  mm0,mm0,0*64+1*16+2*4+3
        ret
    


XMM version

Code:
align 16
d5555   dq      05555555555555555h,05555555555555555h
d3333   dq      03333333333333333h,03333333333333333h
d0f0f   dq      00f0f0f0f0f0f0f0fh,00f0f0f0f0f0f0f0fh
d00ff   dq      000ff00ff00ff00ffh,000ff00ff00ff00ffh

bitswap_xmm0:
        movdqa  xmm1,xmm0
        psrlq   xmm0,1
        pand    xmm0,dqword[d5555]
        pand    xmm1,dqword[d5555]
        psllq   xmm1,1
        por     xmm0,xmm1
        movdqa  xmm1,xmm0
        psrlq   xmm0,2
        pand    xmm0,dqword[d3333]
        pand    xmm1,dqword[d3333]
        psllq   xmm1,2
        por     xmm0,xmm1
        movdqa  xmm1,xmm0
        psrlq   xmm0,4
        pand    xmm0,dqword[d0f0f]
        pand    xmm1,dqword[d0f0f]
        psllq   xmm1,4
        por     xmm0,xmm1
        movdqa  xmm1,xmm0
        psrlq   xmm0,8
        pand    xmm0,dqword[d00ff]
        pand    xmm1,dqword[d00ff]
        psllq   xmm1,8
        por     xmm0,xmm1
        pshuflw xmm0,xmm0,0*64+1*16+2*4+3
        pshufhw xmm0,xmm0,0*64+1*16+2*4+3
        pshufd  xmm0,xmm0,1*64+0*16+3*4+2
        ret
    
Post 04 Apr 2005, 07:48
View user's profile Send private message Visit poster's website Reply with quote
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
wicked! Twisted Evil
Post 04 Apr 2005, 12:26
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.