flat assembler
Message board for the users of flat assembler.

Turn procedure into Macro

MacroZ 12 Oct 2018, 18:01
Can you help me convert this simple 32-bit memory copy routine into a macro? The macro should detect the size of the copy and adjust the generated code accordingly. If the size is 1, it should only emit mov al,[Src] and mov [Dst],al; if the size is 4, it should do the same with eax instead of al. For any other size it should use the code below.

I tried using if CpySize eq 1 and it works, but a problem appears when I pass sizeof.structure, for example: the macro no longer detects the actual size.

I'm not sure whether the match macroinstruction should be used instead, but then how should it be used when I need a separate match for each size?

I want special code to be generated when CpySize is anywhere from 1 to 12; for anything else, it should use the code below.

Code:
proc MemCopy uses esi edi,CpySize,Src,Dst
  cld
  mov ecx,[CpySize]
  mov esi,[Src]
  mov edi,[Dst]
  shr ecx,2
  rep movsd
  mov ecx,[CpySize]
  and ecx,3
  rep movsb
  ret
endp    

If CpySize is 1 it should generate this code:
Code:
mov al,[Src]
mov [Dst],al    

If CpySize is 2 it should generate this code:
Code:
mov ax,[Src]
mov [Dst],ax    

If CpySize is 3 it should generate this code:
Code:
mov ax,[Src]
mov cl,[Src+2]
mov [Dst],ax
mov [Dst+2],cl    

If CpySize is 4 it should generate this code:
Code:
mov eax,[Src]
mov [Dst],eax    

If CpySize is 5 it should generate this code:
Code:
mov eax,[Src]
mov cl,[Src+4]
mov [Dst],eax
mov [Dst+4],cl    

If CpySize is 6 it should generate this code:
Code:
mov eax,[Src]
mov cx,[Src+4]
mov [Dst],eax
mov [Dst+4],cx    

If CpySize is 7 it should generate this code:
Code:
mov eax,[Src]
mov cx,[Src+4]
mov dl,[Src+6]
mov [Dst],eax
mov [Dst+4],cx
mov [Dst+6],dl    

If CpySize is 8 it should generate this code:
Code:
mov eax,[Src]
mov ecx,[Src+4]
mov [Dst],eax
mov [Dst+4],ecx    

If CpySize is 9 it should generate this code:
Code:
mov eax,[Src]
mov ecx,[Src+4]
mov dl,[Src+8]
mov [Dst],eax
mov [Dst+4],ecx
mov [Dst+8],dl    

If CpySize is 10 it should generate this code:
Code:
mov eax,[Src]
mov ecx,[Src+4]
mov dx,[Src+8]
mov [Dst],eax
mov [Dst+4],ecx
mov [Dst+8],dx    

If CpySize is 11 it should generate this code:
Code:
mov eax,[Src]
mov ecx,[Src+4]
mov dx,[Src+8]
mov [Dst],eax
mov [Dst+4],ecx
mov [Dst+8],dx
mov al,[Src+10]
mov [Dst+10],al    

and if CpySize is 12 it should generate this code:
Code:
mov eax,[Src]
mov ecx,[Src+4]
mov edx,[Src+8]
mov [Dst],eax
mov [Dst+4],ecx
mov [Dst+8],edx    

For any other size it should use the code at the top. The macro must also detect the size when CpySize is passed as sizeof.structure. If CpySize is zero, nothing should be generated.
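
The detection problem described above appears to come down to how the two comparison operators work: eq checks whether its operands are exactly the same symbols, while = compares numeric values, so a symbolic size can only match under =. A minimal sketch of the difference, assuming a hypothetical constant sizeof.POINT that stands in for sizeof.structure:

Code:
; sizeof.POINT is a hypothetical constant used only for illustration
sizeof.POINT = 8

if sizeof.POINT eq 8          ; compares the symbols themselves, not their values
  display 'eq matched',13,10
else
  display 'eq did not match',13,10
end if

if sizeof.POINT = 8           ; compares the numeric values
  display '= matched',13,10
end if

With eq, the first test is expected to take the else branch, which is consistent with the behaviour reported in this post; the = test does not depend on how the size was spelled.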
MacroZ 12 Oct 2018, 20:53
I managed to put something together. I don't know if this is "sustainable": it takes a lot of if / else if statements to get this right. Is there an easier way to do it?

Code:
;##############################################################################################################

; Copy a block of memory from one address to another

; Entry:
;       CpySize = Number of bytes to copy (Imm32/Label or ebx)
;       Src = Source address (Imm32/Label or esi)
;       Dst = Destination address (Imm32/Label or edi)
; Used Regs:
;       eax ecx edx (CpySize <= 12)
;       ecx esi edi (CpySize > 12 or CpySize in a register; caller must save esi/edi first)
; Return:
;       None

macro _m_MemCopy CpySize*,Src*,Dst* {
  if ~ CpySize eqtype eax
    if CpySize > 0
      if CpySize = 1
            mov al,byte [Src]
            mov byte [Dst],al
      else if CpySize = 2
            mov ax,word [Src]
            mov word [Dst],ax
      else if CpySize = 3
            mov ax,word [Src]
            mov cl,byte [Src+2]
            mov word [Dst],ax
            mov byte [Dst+2],cl
      else if CpySize = 4
            mov eax,dword [Src]
            mov dword [Dst],eax
      else if CpySize = 5
            mov eax,dword [Src]
            mov cl,byte [Src+4]
            mov dword [Dst],eax
            mov byte [Dst+4],cl
      else if CpySize = 6
            mov eax,dword [Src]
            mov cx,word [Src+4]
            mov dword [Dst],eax
            mov word [Dst+4],cx
      else if CpySize = 7
            mov eax,dword [Src]
            mov cx,word [Src+4]
            mov dl,byte [Src+6]
            mov dword [Dst],eax
            mov word [Dst+4],cx
            mov byte [Dst+6],dl
      else if CpySize = 8
            mov eax,dword [Src]
            mov ecx,dword [Src+4]
            mov dword [Dst],eax
            mov dword [Dst+4],ecx
      else if CpySize = 9
            mov eax,dword [Src]
            mov ecx,dword [Src+4]
            mov dl,byte [Src+8]
            mov dword [Dst],eax
            mov dword [Dst+4],ecx
            mov byte [Dst+8],dl
      else if CpySize = 10
            mov eax,dword [Src]
            mov ecx,dword [Src+4]
            mov dx,word [Src+8]
            mov dword [Dst],eax
            mov dword [Dst+4],ecx
            mov word [Dst+8],dx
      else if CpySize = 11
            mov eax,dword [Src]
            mov ecx,dword [Src+4]
            mov dx,word [Src+8]
            mov dword [Dst],eax
            mov dword [Dst+4],ecx
            mov word [Dst+8],dx
            mov al,byte [Src+10]
            mov byte [Dst+10],al
      else if CpySize = 12
            mov eax,dword [Src]
            mov ecx,dword [Src+4]
            mov edx,dword [Src+8]
            mov dword [Dst],eax
            mov dword [Dst+4],ecx
            mov dword [Dst+8],edx
      else
            cld
            if ~ Src eqtype eax
              mov esi,Src
            end if
            if ~ Dst eqtype eax
              mov edi,Dst
            end if
            if CpySize mod 4 = 0
              mov ecx,CpySize shr 2
              rep movsd
            else
              mov ecx,CpySize shr 2
              rep movsd
              mov ecx,CpySize and 3
              rep movsb
            end if
      end if
    end if
  else
    cld
    mov ecx,CpySize
    if ~ Src eqtype eax
      mov esi,Src
    end if
    if ~ Dst eqtype eax
      mov edi,Dst
    end if
    shr ecx,2
    rep movsd
    mov ecx,CpySize
    and ecx,3
    rep movsb
  end if
}

;##############################################################################################################    
Furs 13 Oct 2018, 12:36
Looks good to me. You could of course "share" some of the code since some cases are similar, but if you find it easier this way just keep it.
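
One possible way to "share" the cases is to let assembly-time arithmetic pick the moves instead of listing every size. The sketch below is only an illustration under stated assumptions: the names _m_SmallCopy, src_buf and dst_buf are hypothetical, and unlike the macro above it uses only eax and stores each piece right after loading it rather than loading everything first.

Code:
use32

; _m_SmallCopy is a hypothetical name; CpySize must be a number known at assembly time
macro _m_SmallCopy CpySize*,Src*,Dst* {
  local offs
  offs = 0
  while CpySize - offs >= 4          ; as many dword moves as fit
    mov eax,dword [Src+offs]
    mov dword [Dst+offs],eax
    offs = offs + 4
  end while
  if CpySize - offs >= 2             ; one word move if 2 or 3 bytes remain
    mov ax,word [Src+offs]
    mov word [Dst+offs],ax
    offs = offs + 2
  end if
  if CpySize - offs = 1              ; one final byte if the size is odd
    mov al,byte [Src+offs]
    mov byte [Dst+offs],al
  end if
}

  _m_SmallCopy 7,src_buf,dst_buf     ; one dword, one word and one byte move pair
  ret

src_buf db 'ABCDEFG'                 ; hypothetical buffers for the example
dst_buf rb 7

Because the loop runs at assembly time, a size of zero generates nothing, and the same macro covers sizes beyond 12 at the cost of longer generated code.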

Your MemCopy function is not very efficient for the last rep movsb, since rep instructions have some startup overhead and only pay off when copying large blocks.

If you target newer CPUs with fast "rep movs" (I assume you do since you use it in the first place), then just use rep movsb for the entire thing. The CPU is smart enough to do the large copy in the most optimal way. Note that this applies to larger blocks only. The beauties of proper CISC.
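
A minimal sketch of that suggestion, assuming a hypothetical register-based convention (esi, edi, ecx) rather than the proc interface from the first post; the label MemCopyBytes is made up for the example:

Code:
use32

; Hypothetical convention, for illustration only:
;   esi = source address, edi = destination address, ecx = number of bytes
; Clobbers ecx, esi, edi
MemCopyBytes:
  cld                ; make sure the copy runs forward
  rep movsb          ; copy ecx bytes from [esi] to [edi]
  ret

Whether this beats the movsd/movsb split depends on the CPU and the block size, so it is worth measuring on the target hardware.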

If you also target older CPUs, then unfortunately "rep movs" is not a fast way to copy memory.
MacroZ 13 Oct 2018, 15:50
I haven't tested rep movsb alone; I will test it. But on 64-bit and a fairly new computer, rep movs should probably be avoided altogether. In my experience, rep stos instructions are very fast (superior to regular instructions), but rep movs instructions are very slow and should be avoided. Some years back I tried writing a 64-bit memcopy routine using rep movs, and it did not perform well compared to regular instructions. Here is the 64-bit memcopy using regular instructions (both macro and procedure):

Code:
;##############################################################################################################

; Copy a block of memory from one address to another

; Entry:
;       CpySize = Number of bytes to copy (Imm64/Label or rcx)
;       Src = Source address (Imm64/Label or rdx)
;       Dst = Destination address (Imm64/Label or r8 )
;       bRbx = Set to TRUE to allow the use of rbx register or FALSE if not
;       bRsi = Set to TRUE to allow the use of rsi register or FALSE if not
;       bRdi = Set to TRUE to allow the use of rdi register or FALSE if not
;       bR12 = Set to TRUE to allow the use of r12 register or FALSE if not
;       bR13 = Set to TRUE to allow the use of r13 register or FALSE if not
;       bR14 = Set to TRUE to allow the use of r14 register or FALSE if not
;       bR15 = Set to TRUE to allow the use of r15 register or FALSE if not
; Used Regs:
;       rbx rsi rdi and r12-r15 (If caller set them to be used in the arguments)
; Return:
;       None

macro _m_MemCopy CpySize*,Src*,Dst*,bRbx*,bRsi*,bRdi*,bR12*,bR13*,bR14*,bR15* {
  local loop32,check8,check4,check1,loop1,bye
  
  maxsize = 39+(bRbx*8)+(bRsi*8)+(bRdi*8)+(bR12*8)+(bR13*8)+(bR14*8)+(bR15*8)
  if Src eqtype rax
    maxsize = maxsize - 8
  end if
  if Dst eqtype rax
    maxsize = maxsize - 8
  end if
  
  if ~ CpySize eqtype rax & CpySize > 0 & CpySize <= maxsize
    qcount = 0
    current_offset = 0
    if CpySize/8 > qcount
      qcount = qcount + 1
      mov rax,qword [Src]
      current_offset = current_offset + 8
    end if
    if bRbx = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov rbx,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bRsi = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov rsi,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bRdi = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov rdi,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bR12 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov r12,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bR13 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov r13,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bR14 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov r14,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if bR15 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov r15,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if CpySize/8 > qcount
      qcount = qcount + 1
      mov rcx,qword [Src+current_offset]
      current_offset = current_offset + 8
    end if
    if ~ Src eqtype rax
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov rdx,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if ~ Dst eqtype rax
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov r8,qword [Src+current_offset]
        current_offset = current_offset + 8
      end if
    end if
    if CpySize and 4 = 4
      mov r9d,dword [Src+current_offset]
      current_offset = current_offset + 4
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov r9,qword [Src+current_offset]
      current_offset = current_offset + 8
    end if
    if CpySize and 2 = 2
      mov r10w,word [Src+current_offset]
      current_offset = current_offset + 2
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov r10,qword [Src+current_offset]
      current_offset = current_offset + 8
    end if
    if CpySize and 1 = 1
      mov r11b,byte [Src+current_offset]
      current_offset = current_offset + 1
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov r11,qword [Src+current_offset]
      current_offset = current_offset + 8
    end if
            
    current_offset = 0
    qcount = 0

    if CpySize/8 > qcount
      qcount = qcount + 1
      mov qword [Dst],rax
      current_offset = current_offset + 8
    end if
    if bRbx = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],rbx
        current_offset = current_offset + 8
      end if
    end if
    if bRsi = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],rsi
        current_offset = current_offset + 8
      end if
    end if
    if bRdi = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],rdi
        current_offset = current_offset + 8
      end if
    end if
    if bR12 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],r12
        current_offset = current_offset + 8
      end if
    end if
    if bR13 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],r13
        current_offset = current_offset + 8
      end if
    end if
    if bR14 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],r14
        current_offset = current_offset + 8
      end if
    end if
    if bR15 = 1
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],r15
        current_offset = current_offset + 8
      end if
    end if
    if CpySize/8 > qcount
      qcount = qcount + 1
      mov qword [Dst+current_offset],rcx
      current_offset = current_offset + 8
    end if
    if ~ Src eqtype rax
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],rdx
        current_offset = current_offset + 8
      end if
    end if
    if ~ Dst eqtype rax
      if CpySize/8 > qcount
        qcount = qcount + 1
        mov qword [Dst+current_offset],r8
        current_offset = current_offset + 8
      end if
    end if
    if CpySize and 4 = 4
      mov dword [Dst+current_offset],r9d
      current_offset = current_offset + 4
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov qword [Dst+current_offset],r9
      current_offset = current_offset + 8
    end if
    if CpySize and 2 = 2
      mov word [Dst+current_offset],r10w
      current_offset = current_offset + 2
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov qword [Dst+current_offset],r10
      current_offset = current_offset + 8
    end if
    if CpySize and 1 = 1
      mov byte [Dst+current_offset],r11b
      current_offset = current_offset + 1
    else if CpySize/8 > qcount
      qcount = qcount + 1
      mov qword [Dst+current_offset],r11
      current_offset = current_offset + 8
    end if
  else if (~ CpySize eqtype rax & CpySize > 0) | (CpySize eqtype rax)
    if ~ CpySize eqtype rax
      mov rcx,CpySize
      mov r9,CpySize
      if ~ Src eqtype rax
        mov rdx,Src
      end if
      if ~ Dst eqtype rax
        mov r8,Dst
      end if
    else
      mov r9,rcx
      if ~ Src eqtype rax
        mov rdx,Src
      end if
      if ~ Dst eqtype rax
        mov r8,Dst
      end if
    end if
    shr rcx,4
    mov r11d,16
    jz check8
  align 8
  loop32:
    mov rax,[rdx]
    mov r10,[rdx+8]
    lea rdx,[rdx+r11]
    mov [r8],rax
    mov [r8+8],r10
    add r8,r11
    sub rcx,1
    jz check8
    mov rax,[rdx]
    mov r10,[rdx+8]
    lea rdx,[rdx+r11]
    mov [r8],rax
    mov [r8+8],r10
    add r8,r11
    sub rcx,1
    jnz loop32
  check8:       
    test r9d,8
    jz check4
    mov rax,[rdx]
    lea rdx,[rdx+8]
    mov [r8],rax
    add r8,8
  check4:
    test r9d,4
    jz check1
    mov eax,[rdx]
    lea rdx,[rdx+4]
    mov [r8],eax
    add r8,4
  check1:
    and r9,3
    jz bye
  align 4
  loop1:
    mov al,[rdx]
    lea rdx,[rdx+1]
    mov [r8],al
    add r8,1
    sub r9,1
    jnz loop1
  bye:
  end if
}
;##############################################################################################################    
Code:
;##############################################################################################################

; Copy a block of memory from one address to another

; Entry:
;   rcx = Number of bytes to copy
;   rdx = Source address
;   r8 = Destination address
; Return:
;   None
proc MemCopy,CpySize,pSrc,pDst
  mov r9,rcx
  shr rcx,4
  mov r11d,16
  jz .check8
align 8
.loop32:
  mov rax,[rdx]
  mov r10,[rdx+8]
  lea rdx,[rdx+r11]
  mov [r8],rax
  mov [r8+8],r10
  add r8,r11
  sub rcx,1
  jz .check8
  mov rax,[rdx]
  mov r10,[rdx+8]
  lea rdx,[rdx+r11]
  mov [r8],rax
  mov [r8+8],r10
  add r8,r11
  sub rcx,1
  jnz .loop32
.check8:        
  test r9d,8
  jz .check4
  mov rax,[rdx]
  lea rdx,[rdx+8]
  mov [r8],rax
  add r8,8
.check4:
  test r9d,4
  jz .check1
  mov eax,[rdx]
  lea rdx,[rdx+4]
  mov [r8],eax
  add r8,4
.check1:
  and r9,3
  jz .ret
align 4
.loop1:
  mov al,[rdx]
  lea rdx,[rdx+1]
  mov [r8],al
  add r8,1
  sub r9,1
  jnz .loop1
.ret:
  ret
endp

;##############################################################################################################    
MacroZ 13 Oct 2018, 22:31
I would like some input on the latest macro. Is it good? Is there anything in the macro variant that should be designed differently? Are there any places where match would be a better fit, perhaps with sub-macros inside the main macro?

That is, if anyone is alive on the forum (it doesn't seem like people are active here).
Furs 14 Oct 2018, 14:18
MacroZ wrote:
I haven't tested rep movsb alone; I will test it. But on 64-bit and a fairly new computer, rep movs should probably be avoided altogether.
Both rep stosb and rep movsb are fast on newer CPUs. I don't know any CPU where only stos is fast compared to movs (well, on older CPUs they're both slow obviously). I know that on Haswell (my CPU) they have both been enhanced to use 256-bit operations (internally).

I found this with a lot more info, if you want: https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy

Your macro seems pretty large, but if it works then it's fine. match is used when you need to do symbol comparisons.

Note that FASM works in two stages. The first is the preprocessor, which deals with text (symbols); match is part of it. The second is the assembly stage. if statements belong to the assembly stage, so they can mostly refer to numbers only (with a few exceptions, such as registers and the like). Symbols defined in the assembly stage (with the = operator) can only hold such numeric values. They can't hold arbitrary text/symbols; for that you need the preprocessor's "equ" and "define".
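
A small sketch of the two kinds of definitions side by side, with hypothetical names dst_reg and count chosen only for the example:

Code:
use64

dst_reg equ rdi          ; preprocessor: dst_reg is replaced by the text "rdi"
count = 4*2              ; assembly stage: count holds the number 8

mov dst_reg,count        ; assembled as: mov rdi,8

Swapping the two would not work the same way: a register name cannot be stored with =, while a value defined with equ is pasted into the line as text before the assembly stage ever sees it.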

match is useful for macros with "custom syntax", instead of just passing parameters normally. e.g. you can extract a parameter that looks like "rdi:true" into "rdi" and "true" and do other sorts of text processing (both of those are text).

But if you pass parameters like "true, true, true, false, true", you don't need it, assuming of course that true expands during preprocessing to some number (1?). Remember: variables in the assembly stage don't contain arbitrary text; all of it has already been replaced by the preprocessor.

You can actually leave "true" and "false" undefined and use match to check a parameter for "=true" (literal text) instead of using if. Then the decision is made at preprocessing time.
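
To make the last two points concrete, here is a minimal sketch that splits a "register:flag" parameter and tests the flag as literal text. The macro name _m_PushIf is hypothetical and exists only for this illustration:

Code:
use64

; _m_PushIf is a hypothetical helper used only to illustrate match
macro _m_PushIf spec* {
  match reg : =true, spec \{ push reg \}   ; "reg" captures the text before the colon
}

_m_PushIf rdi:true     ; the pattern matches, so this should emit: push rdi
_m_PushIf rsi:false    ; no match, so nothing is emitted

Neither true nor false needs to be defined as a constant here, because the comparison happens on the text itself during preprocessing.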
MacroZ 14 Oct 2018, 17:32
Care to show me how you would do the macro prototype?
donn 14 Oct 2018, 18:51
AMD also has some rep movs alternatives in the Software Optimization Guide for AMD64 Processors (publication 25112, Rev. 3.06, September 2005), Section 5.13. It's a bit old, so it's possible their method has been superseded. They have examples, which I found interesting; I implemented part of one and tested it myself. The alignment isn't implemented yet, but copying seemed to work:

Code:

        mov [linearCopy.copyAddress], rcx
        mov [linearCopy.copyDestAddress], rdx
        mov [linearCopy.copySize], r8


        .copySet:

        mov rsi, [linearCopy.copyAddress]
        mov rdi, [linearCopy.copyDestAddress]


        cld


        mov rax, [linearCopy.copySize]
        mov [linearCopy.copySizeRemainder], rax
        shr rax, 101b                           ; Divide by 32
        mov [linearCopy.copySize], rax
        mov rax, [linearCopy.copySize]
        mov rdx, 0
        mov rcx, 100000b
        imul rcx
        mov r10, [linearCopy.copySizeRemainder]
        sub r10, rax
        mov [linearCopy.copySizeRemainder], r10

        mov rax, [linearCopy.copySize]
        cmp rax, 0
        je linearCopy.smallCopyOnly
        
        ;and rsp, -32;align 16                  ; Not working yet
        .copyLarge:                             ; Copy in chunks of 4 qwords. AMD Optimization recommendation. Compare with rep movsq.
        mov r8, [rsi]
        mov r9, [rsi+1000b]
        add rsi, 100000b
        movnti [rdi], r8
        movnti [rdi+1000b], r9
        add rdi, 100000b
        mov r8, [rsi-10000b]
        mov r9, [rsi-1000b]
        dec rax
        movnti [rdi-10000b], r8
        movnti [rdi-1000b], r9
        jnz linearCopy.copyLarge

        .smallCopyOnly:

        mov rcx, [linearCopy.copySizeRemainder] 

        rep movsb                               


        mov rax, [linearCopy.copyDestAddress]
    
MacroZ 14 Oct 2018, 20:24
Even if it is old, people can still make use of it; it doesn't mean it stops there. I was thinking more about the macro implementation itself, but thanks anyway, it's a nice example to keep in mind.
Furs 14 Oct 2018, 21:49
MacroZ wrote:
Care to show me how you would do the macro prototype?
Well I was just giving you some possibilities since you asked about match. In general, you use match when you want to do some text processing like that (i.e. if you want to match the literal "true" text, instead of having it replaced by a number constant).

In this case, I'd rather just specify the registers as a single parameter, each separated by space. It's the cleanest way to call this macro IMO.

Something like this:
Code:
@err fix macro +

macro _m_MemCopy CpySize*,Src*,Dst*,regs* {
  local loop32,check8,check4,check1,loop1,bye
  irp reg, bRbx,bRsi,bRdi,bR12,bR13,bR14,bR15 \{
    local reg
    reg = 0
  \}
  define reg
  irps r, regs \{
    match =rbx, r \\{ define reg bRbx \\}
    match =rsi, r \\{ define reg bRsi \\}
    match =rdi, r \\{ define reg bRdi \\}
    match =r12, r \\{ define reg bR12 \\}
    match =r13, r \\{ define reg bR13 \\}
    match =r14, r \\{ define reg bR14 \\}
    match =r15, r \\{ define reg bR15 \\}
    match , reg \\{
      @err "Bad register"
    \\}
    reg = 1
    restore reg
  \}
  restore reg

  ; more stuff
}    
Use it like:
Code:
_m_MemCopy 1, 2, 3, rbx r13 r15    
(just showing the register parameters of course)

Just FYI, after preprocessing, this will look like:
Code:
; the locals here would be replaced by some local auto-generated names due to our use of local without the \ (so it's part of macro, not irp)
bRbx = 0
bRsi = 0
bRdi = 0
bR12 = 0
bR13 = 0
bR14 = 0
bR15 = 0

bRbx = 1
bR13 = 1
bR15 = 1    
You can also use a bitwise mask of flags (each register = 1 bit) if you want a more efficient assembly process (not that important; it only affects assembly time and memory usage).

It's only slightly important because the assembly stage is multi-pass, so this will get "evaluated" multiple times for each pass if needed.
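
A rough sketch of the bitmask variant, assuming hypothetical flag constants REG_RBX, REG_RSI, REG_RDI and a hypothetical macro _m_PushRegs; only three registers are shown and unknown names are silently ignored here rather than reported:

Code:
use64

; Hypothetical flag bits, one per optional register
REG_RBX = 1 shl 0
REG_RSI = 1 shl 1
REG_RDI = 1 shl 2

; _m_PushRegs is a hypothetical macro used only to show the mask technique
macro _m_PushRegs regs* {
  local regmask
  regmask = 0
  irps r, regs \{
    match =rbx, r \\{ regmask = regmask or REG_RBX \\}
    match =rsi, r \\{ regmask = regmask or REG_RSI \\}
    match =rdi, r \\{ regmask = regmask or REG_RDI \\}
  \}
  if regmask and REG_RBX <> 0
    push rbx
  end if
  if regmask and REG_RSI <> 0
    push rsi
  end if
  if regmask and REG_RDI <> 0
    push rdi
  end if
}

_m_PushRegs rbx rdi     ; intended to emit: push rbx, then push rdi

A single numeric variable replaces the seven separate ones, and later code can test several registers at once by combining the bits.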
MacroZ 14 Oct 2018, 22:49
Something like that.

I will try it out as soon as I get back to coding. I hate it when I can't get macros as clean as that.