flat assembler
Message board for the users of flat assembler.

Replace al to al,ah,bl,bh,cl,ch,dl,dh million times

Fastestcodes
Joined: 13 Jun 2022
Posts: 75
code:
mov al,
mov al,
mov al,
mov al,
mov al,
mov al,
mov al,
mov al,

What I want:
mov al,
mov ah,
mov bl,
mov bh,
mov cl,
mov ch,
mov dl,
mov dh,

What is the best solution?
Post 17 Aug 2022, 10:02
Tomasz Grysztar
Joined: 16 Jun 2003
Posts: 8028
Location: Kraków, Poland
To re-use an existing solution, you could take my old "fasmg as a preprocessor" example and use it with preprocessing macros like these:
Code:
iterate reg, al,ah,bl,bh,cl,ch,dl,dh
        TARGET#% equ reg
end iterate

COUNTER = 0

namespace MACROS

        macro mov? args&
                match =al =, src, args
                        repeat 1, i:COUNTER+1
                                match dest, TARGET#i
                                        emit mov dest, src
                                end match
                        end repeat
                        COUNTER = (COUNTER + 1) and 111b
                else
                        emit mov args
                end match
        end macro

end namespace    
The script then takes the source like:
Code:
mov al,1
mov al,2
mov al,3
mov al,4
mov al,5
mov al,6
mov al,7
mov al,8
mov al,9
mov al,10
mov al,11
mov al,12
mov al,13
mov al,14
mov al,15
mov al,16    
and converts it (purely as text) into:
Code:
mov al, 1
mov ah, 2
mov bl, 3
mov bh, 4
mov cl, 5
mov ch, 6
mov dl, 7
mov dh, 8
mov al, 9
mov ah, 10
mov bl, 11
mov bh, 12
mov cl, 13
mov ch, 14
mov dl, 15
mov dh, 16    
That example is quite old. With newer facilities like CALM and the two-argument variant of INCLUDE (both added to fasmg since then), a much cleaner framework could be made. But the old one still works and can be used with little additional effort.
Post 17 Aug 2022, 10:41
Tomasz Grysztar
And since you mentioned a million replacements, I tested it with a simple seed script that generates a million MOVs:
Code:
repeat 1000000
        db 'mov al,0',10
end repeat    
Here's how it went on my laptop:
Code:
C:\asm\fasmglab\preprocess>fasmg seed.asm source.asm
flat assembler  version g.jmhx
1 pass, 0.5 seconds, 9000000 bytes.

C:\asm\fasmglab\preprocess>fasmg preprocess.asm -isource='source.asm' final.asm
flat assembler  version g.jmhx
1 pass, 12.7 seconds, 11000000 bytes.    
Of course, unless your source text is actually much more complex, you could do it all with nothing more than regular expressions.
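A minimal sketch of the regular-expression route, written in Python rather than fasmg. It assumes each instruction sits on its own line and that only `mov al,` lines should rotate, which mirrors the macro above (other instructions pass through and do not advance the counter). The names `REGS`, `MOV_AL` and `rotate_destinations` are made up for this illustration.

```python
import re
from itertools import cycle

# Register order taken from the thread; everything else here is illustrative.
REGS = ["al", "ah", "bl", "bh", "cl", "ch", "dl", "dh"]

# Match "mov al," at the start of a line, tolerating spaces or tabs.
MOV_AL = re.compile(r"^([ \t]*mov[ \t]+)al([ \t]*,)", re.IGNORECASE | re.MULTILINE)


def rotate_destinations(source: str) -> str:
    """Give each successive `mov al,` the next destination register in REGS."""
    regs = cycle(REGS)  # wraps back to al after dh
    return MOV_AL.sub(lambda m: m.group(1) + next(regs) + m.group(2), source)


if __name__ == "__main__":
    sample = "".join(f"mov al,{n}\n" for n in range(1, 11))
    print(rotate_destinations(sample), end="")
```

Only the destination operand is touched, so a line such as `mov bl,al` is left alone.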
Post 17 Aug 2022, 10:48
Fastestcodes
https://board.flatassembler.net/topic.php?t=22374
"La00:
pop eax
and al,0feh
and ah,0feh
and bl,0feh
and bh,0feh
and cl,0feh
and ch,0feh
and dl,0feh
and dh,0feh ; if al=00h, the least significant bits of the 8 bytes will be 0,0,0,0,0,0,0,0
jmp Lawm

La01:
.... ; if al=01h, the least significant bits of the 8 bytes will be 0,0,0,0,0,0,0,1

Laff:
... ; if al=0ffh, the least significant bits of the 8 bytes will be 1,1,1,1,1,1,1,1

Lawm: ;write to mem
mov [edi],al
mov [edi+1],ah
...
mov [edi+7],dh"

If I replace the 1s and 0s with "or al,01h" and "and al,0feh", every register will be "al".
Post 17 Aug 2022, 12:24
Fastestcodes
Code:
:00000000j
:00000001j
:00000010j
:00000011j
:00000100j
:00000101j
:00000110j
:00000111j
:00001000j
:00001001j
:00001010j
:00001011j
:00001100j
:00001101j
:00001110j
:00001111j
z
:00010000j
:00010001j
:00010010j
:00010011j
:00010100j
:00010101j
:00010110j
:00010111j
:00011000j
:00011001j
:00011010j
:00011011j
:00011100j
:00011101j
:00011110j
:00011111j
z
:00100000j
:00100001j
:00100010j
:00100011j
:00100100j
:00100101j
:00100110j
:00100111j
:00101000j
:00101001j
:00101010j
:00101011j
:00101100j
:00101101j
:00101110j
:00101111j
z
:00110000j
:00110001j
:00110010j
:00110011j
:00110100j
:00110101j
:00110110j
:00110111j
:00111000j
:00111001j
:00111010j
:00111011j
:00111100j
:00111101j
:00111110j
:00111111j
z
:01000000j
:01000001j
:01000010j
:01000011j
:01000100j
:01000101j
:01000110j
:01000111j
:01001000j
:01001001j
:01001010j
:01001011j
:01001100j
:01001101j
:01001110j
:01001111j
z
:01010000j
:01010001j
:01010010j
:01010011j
:01010100j
:01010101j
:01010110j
:01010111j
:01011000j
:01011001j
:01011010j
:01011011j
:01011100j
:01011101j
:01011110j
:01011111j
z
:01100000j
:01100001j
:01100010j
:01100011j
:01100100j
:01100101j
:01100110j
:01100111j
:01101000j
:01101001j
:01101010j
:01101011j
:01101100j
:01101101j
:01101110j
:01101111j
z
:01110000j
:01110001j
:01110010j
:01110011j
:01110100j
:01110101j
:01110110j
:01110111j
:01111000j
:01111001j
:01111010j
:01111011j
:01111100j
:01111101j
:01111110j
:01111111j
z
:10000000j
:10000001j
:10000010j
:10000011j
:10000100j
:10000101j
:10000110j
:10000111j
:10001000j
:10001001j
:10001010j
:10001011j
:10001100j
:10001101j
:10001110j
:10001111j
z
:10010000j
:10010001j
:10010010j
:10010011j
:10010100j
:10010101j
:10010110j
:10010111j
:10011000j
:10011001j
:10011010j
:10011011j
:10011100j
:10011101j
:10011110j
:10011111j
z
:10100000j
:10100001j
:10100010j
:10100011j
:10100100j
:10100101j
:10100110j
:10100111j
:10101000j
:10101001j
:10101010j
:10101011j
:10101100j
:10101101j
:10101110j
:10101111j
z
:10110000j
:10110001j
:10110010j
:10110011j
:10110100j
:10110101j
:10110110j
:10110111j
:10111000j
:10111001j
:10111010j
:10111011j
:10111100j
:10111101j
:10111110j
:10111111j
z
:11000000j
:11000001j
:11000010j
:11000011j
:11000100j
:11000101j
:11000110j
:11000111j
:11001000j
:11001001j
:11001010j
:11001011j
:11001100j
:11001101j
:11001110j
:11001111j
z
:11010000j
:11010001j
:11010010j
:11010011j
:11010100j
:11010101j
:11010110j
:11010111j
:11011000j
:11011001j
:11011010j
:11011011j
:11011100j
:11011101j
:11011110j
:11011111j
z
:11100000j
:11100001j
:11100010j
:11100011j
:11100100j
:11100101j
:11100110j
:11100111j
:11101000j
:11101001j
:11101010j
:11101011j
:11101100j
:11101101j
:11101110j
:11101111j
z
:11110000j
:11110001j
:11110010j
:11110011j
:11110100j
:11110101j
:11110110j
:11110111j
:11111000j
:11111001j
:11111010j
:11111011j
:11111100j
:11111101j
:11111110j
:11111111j
    

Replace each 0 with "and al,0feh" followed by a line feed (0Ah), each 1 with "or al,01h" followed by a line feed, and each j with "jmp Lawm" followed by a line feed.
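The replacement scheme above leaves every instruction operating on al, which is the problem the thread started with. As a rough sketch (Python, not from the thread), the branches can instead be generated directly with the right register for each bit. It assumes the first digit of each pattern belongs to al and the last to dh, as the quoted La01 comment suggests, and it leaves out anything else a branch might need, such as the `pop eax` in the quoted La00. `branch_block` and `all_branches` are hypothetical names.

```python
# Register order from the thread: first listed bit -> al, last listed bit -> dh.
REGS = ["al", "ah", "bl", "bh", "cl", "ch", "dl", "dh"]


def branch_block(value: int) -> str:
    """Return the assembly text of one branch for a byte value 0..255."""
    bits = f"{value:08b}"  # most significant bit first, like the pattern list
    lines = [f"La{value:02x}:"]
    for reg, bit in zip(REGS, bits):
        # A 1 sets the register's low bit, a 0 clears it.
        lines.append(f"or {reg},01h" if bit == "1" else f"and {reg},0feh")
    lines.append("jmp Lawm")
    return "\n".join(lines) + "\n"


def all_branches() -> str:
    """Concatenate all 256 branches, separated by blank lines."""
    return "\n".join(branch_block(v) for v in range(256))


if __name__ == "__main__":
    print(branch_block(0x01), end="")
```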
Post 17 Aug 2022, 13:03
bitRAKE
Joined: 21 Jul 2003
Posts: 3489
Location: vpcmipstrm
With fasmg we could combine something like macomics' suggestion with the dispatcher:
Code:
L_start:
        and dword [edi],0xFEFEFEFE
        and dword [edi+4],0xFEFEFEFE


... (dispatch) ...


        MAGIC   := 0x8040201008040201
        MASK    := 0x8080808080808080
        repeat 256, v:0
                A = ((MAGIC *v) and MASK) shr 7
                A = A bswap 8
                L#v:
                        if A and 0xFFFFFFFF
                                or dword [edi], A and 0xFFFFFFFF
                        end if
                        if A shr 32
                                or dword [edi+4], A shr 32
                        end if
                        jmp L_end
        end repeat

L_end:
        add edi,8    
... all low bits are cleared up front, and then bits are set as needed. The bit clearing could be moved into the branch code as well, clearing only as needed.
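To see what the MAGIC/MASK arithmetic in the repeat block computes, here is a small Python sketch of the same expression. The constants are copied from the code above; `spread_bits` is a name invented here. The multiply places a copy of v at every 9th bit position, so the copies never overlap and the top bit of byte j ends up holding bit 7-j of v. The byte swap then reverses the order, so byte j holds bit j.

```python
MAGIC = 0x8040201008040201
MASK = 0x8080808080808080


def spread_bits(v: int) -> int:
    """Spread the 8 bits of v into the low bits of 8 bytes (bit j -> byte j)."""
    a = ((MAGIC * v) & MASK) >> 7  # byte j now holds bit (7 - j) of v
    # Equivalent of "A bswap 8": reverse the order of the 8 bytes.
    return int.from_bytes(a.to_bytes(8, "little"), "big")


if __name__ == "__main__":
    for v in (0x01, 0x80, 0xA5):
        a = spread_bits(v)
        # Low dword is OR-ed into [edi], high dword into [edi+4].
        print(f"{v:02x}: low={a & 0xFFFFFFFF:08x} high={a >> 32:08x}")
```

Note that this puts bit 0 of the value in the first byte, whereas the pattern list earlier in the thread reads most significant bit first.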

_________________
¯\(°_o)/¯ unlicense.org
Post 17 Aug 2022, 13:25
bitRAKE
Code:
; RCX bytes of RSI into low bits of bytes in RDI
steganography:
        lea rbp,[.end]
        lea rbx,[.L0]
.start:
        and dword [rdi],0xFEFEFEFE
        and dword [rdi+4],0xFEFEFEFE

        .RANGE := 15
        movzx eax,byte [rsi]
        add rsi,1
        imul eax,.RANGE
        lea rax,[rax+rbx]
        jmp rax

        MAGIC   := 0x8040201008040201
        MASK    := 0x8080808080808080
        repeat 256, v:0
                A = ((MAGIC * v) and MASK) shr 7
                A = A bswap 8
                .L#v:
                        if A and 0xFFFFFFFF
                                or dword [rdi], A and 0xFFFFFFFF
                        end if
                        if A shr 32
                                or dword [rdi+4], A shr 32
                        end if
                        jmp rbp ; .end

                if $ - .L#v > .RANGE
                        err 'branch larger than desired range'
                end if
                while $ - .L#v <> .RANGE
                        db 0x90
                end while
        end repeat

.end:   add rdi,8
        dec rcx
        jnz .start
        retn

repeat 1, len:$ - steganography
        display 9,`len,' bytes for steganography'
end repeat    
... is maybe what you are aiming for? I did some other stuff to keep the code size under 4k. SIMD methods will be faster.
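For checking the output of a routine like the one above, a plain reference model can help. This is a sketch in Python of what the assembly appears to do: for every source byte, clear the low bit of 8 destination bytes and set it from bit j of the source byte in destination byte j. `embed` follows that reading; `extract` is not in the thread and is added here only to allow a round trip.

```python
def embed(cover: bytearray, message: bytes) -> None:
    """Overwrite the low bit of 8 cover bytes per message byte, in place."""
    if len(cover) < 8 * len(message):
        raise ValueError("cover must hold 8 bytes per message byte")
    for i, value in enumerate(message):
        for j in range(8):
            k = 8 * i + j
            # Clear the low bit, then set it from bit j of the message byte.
            cover[k] = (cover[k] & 0xFE) | ((value >> j) & 1)


def extract(cover: bytes, count: int) -> bytes:
    """Rebuild `count` message bytes from the low bits of the cover."""
    out = bytearray()
    for i in range(count):
        out.append(sum((cover[8 * i + j] & 1) << j for j in range(8)))
    return bytes(out)


if __name__ == "__main__":
    cover = bytearray(range(16))
    embed(cover, b"\x01\x80")
    print(list(cover), extract(cover, 2))
```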
Post 17 Aug 2022, 14:12
Overclick
Joined: 11 Jul 2020
Posts: 577
Location: Ukraine
movdqa xmm0,xword[yourdataplace]
pand xmm0,xword[yourmask]
...
movdqa xword[yourdataplace],xmm0

You don't need to separate the data stream by bytes, just use SSE
Post 17 Aug 2022, 19:56
Fastestcodes
Overclick wrote:
movdqa xmm0,xword[yourdataplace]
pand xmm0,xword[yourmask]
...
movdqa xword[yourdataplace],xmm0

You don't need to separate the data stream by bytes, just use SSE

PAND over 16 bytes with a 0xfeff... mask where the LSB should be 0.
POR over 16 bytes with a 0x0001... mask where the LSB should be 1.
Post 22 Aug 2022, 09:23
bitRAKE
I haven't done any testing - just brainstorming with my first coffee ...
Code:
m256_bbits      mm256 db 0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80
m256_01         mm256 db 1

; 3 temp reg, 2 const regs = unroll of four

        vmovntdqa ymm14,[m256_bbits]
        vmovntdqa ymm15,[m256_01]
        align 16,0x90
@@:     iterate <_0,    _1,     _B>,\
                ymm0,   ymm1,   ymm2,\
                ymm3,   ymm4,   ymm5,\
                ymm6,   ymm7,   ymm8,\
                ymm9,   ymm10,  ymm11

                vpbroadcastb    _B, [rsi + % - 1]
                vmovdqa         _1, [rdi + 32*% - 32]
                vpand           _B, _B, ymm14
                vpandn          _0, ymm15, _1
                vpor            _1, ymm15, _1
                vpcmpeqb        _B, _B, ymm14
                vpblendvb       _0, _0, _1, _B
                vmovntdq        [rdi + 32*% - 32], _0
                .UNROLL = %%
        end iterate
        add rsi,.UNROLL
        add rdi,32*.UNROLL
        cmp rsi,rcx
        jnz @B

struc mm256 values&
        label .:32
        values
        granularity = bsr ($ - .)
        assert 6 > granularity
        repeat (1 shl (5 - granularity)) - 1
                values
        end repeat
        assert ($ - .) = sizeof .
end struc    
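Since the code above is untested, a lane-by-lane model may help when reading it. This Python sketch follows one iteration of the unrolled body for a single 32-byte block. `BIT_SELECT` stands in for m256_bbits (the 8 byte pattern repeated to fill 32 lanes) and `blend_block` is a hypothetical name. As written, one source byte is broadcast to all 32 lanes, so its eight bits appear four times per block. Whether that repetition is intended is not stated in the thread.

```python
# m256_bbits: the 8 single-bit selectors, repeated to fill 32 lanes.
BIT_SELECT = bytes([0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80]) * 4


def blend_block(block: bytes, source_byte: int) -> bytes:
    """Model one unrolled step on a 32-byte block, lane by lane."""
    if len(block) != 32:
        raise ValueError("block must be exactly 32 bytes")
    out = bytearray(32)
    for lane in range(32):
        select = BIT_SELECT[lane]
        wanted = (source_byte & select) == select  # vpand + vpcmpeqb
        cleared = block[lane] & 0xFE               # vpandn with the 0x01 constant
        forced = block[lane] | 0x01                # vpor with the 0x01 constant
        out[lane] = forced if wanted else cleared  # vpblendvb
    return bytes(out)


if __name__ == "__main__":
    print(blend_block(bytes(32), 0x05).hex())
```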

Post 24 Aug 2022, 13:08

Copyright © 1999-2020, Tomasz Grysztar.