flat assembler
Message board for the users of flat assembler.

Index > Main > Optimization Challenge: 13-bit technology

Author
Thread Post new topic Reply to topic
DOS386



Joined: 08 Dec 2006
Posts: 1903
DOS386 08 Oct 2024, 17:21
Task:
* a) store 3 UINT13 values incoming in EAX ECX EDX into 5 consecutive memory locations beginning at address incoming in EBX
* b) retrieve those 3 UINT13 values from 5 consecutive memory locations beginning at address incoming in EBX into EAC ECX EDX

Code:
        and    eax, 8191          '' got 13 incoming bits here
        and    ecx, 8191          '' got 13 incoming bits here
        and    edx, 8191          '' got 13 incoming bits here
        mov    [ebx], al          '' 8 bits stored -> 8 burnt and 5 pending
        shr    eax, 5             '' 3 burnt bits and 5 pending bits
        mov    ah, cl             '' +8 -> 3 burnt bits and 13 pending bits
        shr    eax, 3             '' 13 pending bits
        mov    [ebx+1], al        '' 16 bits stored -> 8 burnt and 5 pending
        shr    eax, 5             '' 3 burnt bits and 5 pending bits
        mov    ah, ch             '' +5 -> 3 burnt bits and 10 pending bits
        shr    eax, 3             '' 10 pending bits
        mov    [ebx+2], al        '' 24 bits stored -> 8 burnt and 2 pending
        shr    eax, 2             '' 6 burnt bits and 2 pending bits
        mov    ah, dl             '' +8 -> 6 burnt bits and 10 pending bits
        shr    eax, 6             '' 10 pending bits
        mov    [ebx+3], al        '' 32 bits stored -> 8 burnt and 2 pending
        shr    eax, 2             '' 6 burnt bits and 2 pending bits
        mov    ah, dh             '' +5 -> 6 burnt bits and 7 pending bits
        shr    eax, 6             '' 7 pending bits
        mov    [ebx+4], al        '' 40 bits stored -> all done
        xor    eax, eax           '' nothing left
        ret
    

Code:
        mov    ah, [ebx+2]        '' can't pick into DL directly
        mov    dl, ah             '' 8 bits picked
        mov    dh, [ebx+3]        '' 16 bits picked
        rol    edx, 16
        mov    ah, [ebx]          '' can't pick into DL directly
        mov    dl, ah             '' 24 bits picked
        mov    dh, [ebx+1]        '' 32 bits picked
        mov    eax, edx
        and    eax, 8191          '' got 13 final bits here
        shr    edx, 13
        mov    ecx, edx
        and    ecx, 8191          '' got 13 final bits here
        shr    edx, 11            '' 6 useful bits plus 2 garbage bits low
        mov    dh, [ebx+4]        '' 40 bits picked
        shr    edx, 2             '' discard those 2 garbage bits
        and    edx, 8191          '' got 13 final bits here
        ret
    


The code works, it's just absurdly many steps for such a simple task. Anyone can come up with a more efficient approach?

_________________
Bug Nr.: 12345

Title: Hello World program compiles to 100 KB !!!

Status: Closed: NOT a Bug
Post 08 Oct 2024, 17:21
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4046
Location: vpcmpistri
bitRAKE 08 Oct 2024, 18:34
Is this cheating (BMI2) ...
Code:
EncodeDecode dq 0x7FF_07FF_07FF

        pext rax, rax, [EncodeDecode] ; pack word bits
        mov [rdi], rax
        add rdi, 5 ; 40 bits

        pdep rax, rax, [EncodeDecode] ; unpack bits to words
        mov [rdi], rax
        add rdi, 6 ; three words    

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 08 Oct 2024, 18:34
View user's profile Send private message Visit poster's website Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 977
Location: Russia
macomics 08 Oct 2024, 23:00
bitRAKE wrote:
Code:
EncodeDecode dq 0x7FF_07FF_07FF    
There are only 11 bits here
Code:
EncodeDecode dq 0x1FFF_1FFF_1FFF    


If you restrict yourself to just basic (<Pentium) 32-bit commands
Code:
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        and     ecx, 01111111111111b    ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        shl     ecx, 13                 ; ecx = 000000vvvvvvvvvv`vvv00000`00000000b
        or      eax, ecx                ; eax = 000000vvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     eax, 6                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uu000000b
        shl     edx, 2                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        shr     dl, 2                   ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`00wwwwwwb
        or      al, dl                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuwwwwwwb
        ror     eax, 6                  ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        and     dh, 01111111b           ; edx = xxxxxxxxxxxxxxxx`0wwwwwww`00wwwwwwb
        mov     [ebx], eax              ; store 31-0 bits
        mov     [ebx+4], dh             ; store 39-32 bits    

Code:
        mov     eax, [ebx]              ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        mov     dh, [ebx+4]             ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`xxxxxxxxb
        mov     ecx, eax                ; ecx = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        rol     eax, 6                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuwwwwwwb
        mov     dl, al                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`uuwwwwwwb
        mov     eax, ecx                ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     dl, 2                   ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        shr     ecx, 13                 ; ecx = 0000000000000www`wwwvvvvv`vvvvvvvvb
        shr     edx, 2                  ; edx = 00xxxxxxxxxxxxxx`xxxwwwww`wwwwwwwwb
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        and     ecx, 01111111111111b    ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        and     edx, 01111111111111b    ; edx = 0000000000000000`000wwwww`wwwwwwwwb
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined    


Using shld/shrd
Code:
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined
        ror     eax, 13                 ; eax = uuuuuuuuuuuuuxxx`xxxxxxxx`xxxxxxxxb
        shrd    eax, ecx, 13            ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuxxxxxxb
        shrd    eax, edx, 6             ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     edx, 2                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        and     dh, 01111111b           ; edx = xxxxxxxxxxxxxxxx`0wwwwwww`wwwwww00b
        mov     [ebx], eax              ; store 31-0 bits
        mov     [ebx+4], dh             ; store 39-32 bits    

Code:
        mov     dl, [ebx+4]             ; edx = xxxxxxxxxxxxxxxx`xxxxxxxx`0wwwwwwwb
        mov     eax, [ebx]              ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shld    edx, eax, 6             ; edx = xxxxxxxxxxxxxxxx`xx0wwwww`wwwwwwwwb
        shrd    ecx, eax, 26            ; ecx = vvvvvvvvvvvvvuuu`uuuuuuuu`uuxxxxxxb
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        shr     ecx, 19                 ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        and     edx, 01111111111111b    ; edx = 0000000000000000`000wwwww`wwwwwwwwb
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined    


Last edited by macomics on 09 Oct 2024, 00:47; edited 9 times in total
Post 08 Oct 2024, 23:00
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20355
Location: In your JS exploiting you and your system
revolution 08 Oct 2024, 23:55
What are we supposed to be optimising for?
  • Speed?
  • Efficiency?
  • Size?
  • Beauty?
  • Ease of coding?
  • Ease of understanding?
  • Uses fewest registers?
  • Some other criterion?
  • Some mixed combination of the above?
Post 08 Oct 2024, 23:55
View user's profile Send private message Visit poster's website Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 977
Location: Russia
macomics 09 Oct 2024, 00:33
DOS386 wrote:
Code:
'' can't pick into DL directly    
Why? Any additional restrictions
Post 09 Oct 2024, 00:33
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4347
Location: Now
edfed 09 Oct 2024, 16:36
macomics: good wih shld shrd Smile

can't pick into dl because maybe a 64bits compatibility for future or any bug???? but if 64bits, bitrake's is enough
Post 09 Oct 2024, 16:36
View user's profile Send private message Visit poster's website Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 977
Location: Russia
macomics 09 Oct 2024, 17:36
edfed wrote:
can't pick into dl because maybe a 64bits compatibility for future or any bug???? but if 64bits, bitrake's is enough
Under x64, high part is usually unavailable: ah, bh, ch, dh
Post 09 Oct 2024, 17:36
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4347
Location: Now
edfed 11 Oct 2024, 18:37
macomics wrote:
edfed wrote:
can't pick into dl because maybe a 64bits compatibility for future or any bug???? but if 64bits, bitrake's is enough
Under x64, high part is usually unavailable: ah, bh, ch, dh

oops, true.
then, why?
Post 11 Oct 2024, 18:37
View user's profile Send private message Visit poster's website Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 977
Location: Russia
macomics 11 Oct 2024, 20:09
edfed wrote:
oops, true.
then, why?
mod-reg-r/m byte + rex.bit = 16 options, while registers (AL, AH, BL, BH, CL, CH, DL, DH, SPL, BPL, SIL, DIL, R8L, R9L, R10L, R11L, R12L, R13L, R14L, R15L) 20 options

So that at least one part of 1 byte was available for each register, doubling was thrown out for the first 4 registers.
Intel SDM wrote:
2.2.1.2 More on REX Prefix Fields
...
In the IA-32 architecture, byte registers (AH, AL, BH, BL, CH, CL, DH, and DL) are encoded in the ModR/M byte’s
reg field, the r/m field or the opcode reg field as registers 0 through 7. REX prefixes provide an additional
addressing capability for byte-registers that makes the least-significant byte of GPRs available for byte operations.
Certain combinations of the fields of the ModR/M byte and the SIB byte have special meaning for register encod-
ings. For some combinations, fields expanded by the REX prefix are not decoded. Table 2-5 describes how each
case behaves.

...

3.4.1.1 General-Purpose Registers in 64-Bit Mode
...
Table 3-2. Addressable General Purpose Registers
Code:
; Register Type          Without REX                              With REX
  Byte Registers         AL, BL, CL, DL, AH, BH, CH, DH           AL, BL, CL, DL, DIL, SIL, BPL, SPL, R8B - R15B
  Word Registers         AX, BX, CX, DX, DI, SI, BP, SP           AX, BX, CX, DX, DI, SI, BP, SP, R8W - R15W
  Doubleword Registers   EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP   EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D
  Quadword Registers     N.A.                                     RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8 - R15    
Post 11 Oct 2024, 20:09
View user's profile Send private message Reply with quote
DOS386



Joined: 08 Dec 2006
Posts: 1903
DOS386 13 Oct 2024, 18:28
macomics wrote:
DOS386 wrote:
Code:
'' can't pick into DL directly    
Why? Any additional restrictions


It didn't work when I tried. I can no longer reproduce the problem now. Strange. Sad

macomics wrote:
If you restrict yourself to just basic (<Pentium) 32-bit commands
Code:
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        and     ecx, 01111111111111b    ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        shl     ecx, 13                 ; ecx = 000000vvvvvvvvvv`vvv00000`00000000b
        or      eax, ecx                ; eax = 000000vvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     eax, 6                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uu000000b
        shl     edx, 2                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        shr     dl, 2                   ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`00wwwwwwb
        or      al, dl                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuwwwwwwb
        ror     eax, 6                  ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        and     dh, 01111111b           ; edx = xxxxxxxxxxxxxxxx`0wwwwwww`00wwwwwwb
        mov     [ebx], eax              ; store 31-0 bits
        mov     [ebx+4], dh             ; store 39-32 bits    

Code:
        mov     eax, [ebx]              ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        mov     dh, [ebx+4]             ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`xxxxxxxxb
        mov     ecx, eax                ; ecx = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        rol     eax, 6                  ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuwwwwwwb
        mov     dl, al                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`uuwwwwwwb
        mov     eax, ecx                ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     dl, 2                   ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        shr     ecx, 13                 ; ecx = 0000000000000www`wwwvvvvv`vvvvvvvvb
        shr     edx, 2                  ; edx = 00xxxxxxxxxxxxxx`xxxwwwww`wwwwwwwwb
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        and     ecx, 01111111111111b    ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        and     edx, 01111111111111b    ; edx = 0000000000000000`000wwwww`wwwwwwwwb
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined    


Using shld/shrd
Code:
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined
        ror     eax, 13                 ; eax = uuuuuuuuuuuuuxxx`xxxxxxxx`xxxxxxxxb
        shrd    eax, ecx, 13            ; eax = vvvvvvvvvvvvvuuu`uuuuuuuu`uuxxxxxxb
        shrd    eax, edx, 6             ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shl     edx, 2                  ; edx = xxxxxxxxxxxxxxxx`xwwwwwww`wwwwww00b
        and     dh, 01111111b           ; edx = xxxxxxxxxxxxxxxx`0wwwwwww`wwwwww00b
        mov     [ebx], eax              ; store 31-0 bits
        mov     [ebx+4], dh             ; store 39-32 bits    

Code:
        mov     dl, [ebx+4]             ; edx = xxxxxxxxxxxxxxxx`xxxxxxxx`0wwwwwwwb
        mov     eax, [ebx]              ; eax = wwwwwwvvvvvvvvvv`vvvuuuuu`uuuuuuuub
        shld    edx, eax, 6             ; edx = xxxxxxxxxxxxxxxx`xx0wwwww`wwwwwwwwb
        shrd    ecx, eax, 26            ; ecx = vvvvvvvvvvvvvuuu`uuuuuuuu`uuxxxxxxb
        and     eax, 01111111111111b    ; eax = 0000000000000000`000uuuuu`uuuuuuuub
        shr     ecx, 19                 ; ecx = 0000000000000000`000vvvvv`vvvvvvvvb
        and     edx, 01111111111111b    ; edx = 0000000000000000`000wwwww`wwwwwwwwb
; eax=u*13, ecx=v*13, edx=w*13 - UINT13; x - undefined; 0 or 1 - defined    


This looks very interesting, thanks. I'll test. Smile

revolution wrote:
What are we supposed to be optimising for?
  • Some mixed combination of the above?


YES, a sane mixture of above plus compatibility.

_________________
Bug Nr.: 12345

Title: Hello World program compiles to 100 KB !!!

Status: Closed: NOT a Bug
Post 13 Oct 2024, 18:28
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.