flat assembler
Message board for the users of flat assembler.

Index > Windows > Optimization advice

AE 02 Mar 2023, 17:58
Furs wrote:
here's an explanation

First of all, thank you for the detailed explanation.

Let me try to explain what I meant when I asked the question.
Here is my original code with your version:
Code:
format PE64 GUI 6.0
entry start
include 'win64w.inc'
section '.data' data readable writeable

    struct UNICODE_STRING
        Length             dw ?
        MaximumLength      dw ?
                           dd ?
        Buffer             dq ?
    ends
    SlashPat    dq ?,?  ; 128 bit mask for SSE2 search
    Path        du '\??\X:\1\1\11',0
    CurPath     UNICODE_STRING

section '.text' code readable executable
start:
    sub     rsp,8
    ; Create and save pattern '\' mask
    mov    eax, 005C005Ch                ; du '\\'
    movd   xmm0, eax                     ;
    pshufd xmm0, xmm0, 0                 ; fill whole xmm0 with DWORD value from xmm0
    movdqu dqword [SlashPat], xmm0       ; save xmm0 to var

    mov     [CurPath.Length], 13*2
    mov     [CurPath.Buffer], Path
    @@:
    fastcall StepBackward
    invoke  MessageBox, HWND_DESKTOP, addr Path, 0, MB_ICONERROR or MB_OK or MB_TOPMOST
    cmp     [CurPath.Length], 14
    ja      @b
    error_exit:
    invoke  ExitProcess,0


    proc StepBackward
        movzx   rcx, [CurPath.Length]       ; String length (in bytes)
        mov     r8,  [CurPath.Buffer]       ; String buffer
        mov     r10, r8                     ; Stop  addr
        lea r8, [r8 + rcx - 2]              ; Start addr
        lea     rdx, [SlashPat]             ; pattern (pre-generated)
        movdqu  xmm1,  [rdx]                ; load it
        ;
        mov edx, r8d
        and r8, -16
        and edx, 15
        mov ecx, 31
        sub ecx, edx
        ;
        movdqa xmm0, [r8]
        pcmpeqw xmm0, xmm1
        pmovmskb eax, xmm0
        shl eax, cl
        test eax, eax
        jnz .found_in_tail
        ; here do aligned 16-byte-at-a-time loop
        .sse_search:
        sub      r8, 16
        cmp      r8, r10
        jb       error_exit
        movdqa   xmm0, [r8]
        pcmpeqw  xmm0, xmm1
        pmovmskb eax, xmm0
        test     eax, eax
        jz       .sse_search
        ; sse2 loop match found
        bsr    eax, eax                      ; get last bit
        lea    rax, [r8+rax]                 ; calc addr
        and    eax, -2                       ; fix bit shift
        mov    ebx, eax
        jmp    .EOS
        .found_in_tail:
        bsr eax, eax
        xor ebx, ebx
        mov bx, [CurPath.Length]
        sub ebx, 32+1
        add ebx, eax
        js error_exit                        ; exception (impossible situation when '\' is not found in a string)
        add     rbx, r10                     ; <-- (buffer addr + your offset) points to SECOND BYTE of '\'
        ;
        .EOS:                                ; done
        mov     rdx, rbx
        sub     rbx, [CurPath.Buffer]        ; Calc new UNICODE_STRING size
        cmp     rbx, 14                      ; NT path used ('\??\X:\')
        jbe     @f                           ; if not disk root
        mov     [rdx], word 0                ; add zeroterm (Special case. Not normally used in UNICODE_STRING)
        jmp     .skip
        @@:
        mov     [rdx+2], word 0              ; if disk root leave '\'
        add     rbx, 2                       ; and write zero after it
        .skip:
        mov     [CurPath.Length], bx         ; save new UNICODE_STRING size
        ret
    endp


section '.idata' import data readable writeable
library kernel32,'KERNEL32.DLL',\
            user32,  'user32.dll'
    include 'api/KERNEL32.inc'
    include 'api/USER32.inc'    

I left the code without any optimizations so that your part stays exactly as you posted it.
If you stop in the debugger on line 79 (marked with ; <--), you will see that the result is one byte too far forward.
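
For checking the value seen in the debugger, a scalar reference can help. The following is only a minimal C sketch and not part of the program above; the function name last_backslash_offset is made up for illustration. It returns the byte offset of the first byte of the last '\', which is the value the SSE2 search is expected to produce relative to the buffer start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: byte offset of the last '\' (0x005C) in a UTF-16
   string of n_chars code units, or -1 if there is none.
   Hypothetical helper, not taken from the thread's program. */
long last_backslash_offset(const uint16_t *s, size_t n_chars)
{
    assert(s != NULL);
    for (size_t i = n_chars; i > 0; i--) {
        if (s[i - 1] == 0x005C)
            return (long)((i - 1) * 2);   /* 2 bytes per code unit */
    }
    return -1;
}
```

For the 13-character test path it returns 20, so a result of 21 from the SSE2 version is exactly the one-byte-forward offset described above.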
Furs 02 Mar 2023, 18:35
Oops, as I suspected, I messed something up because of the wide chars ;)

The problem is at the top when calculating the shift:
Code:
        and r8, -16
        and edx, 15
        mov ecx, 31
        sub ecx, edx    
This must be:
Code:
        and r8, -16
        and edx, 15
        mov ecx, 30
        sub ecx, edx    
(30 instead of 31)

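If you want fewer magic numbers, write 32 - 2 (or 32 - size_of_wchar).

That's because the scan does not start at the last byte of the string (the last bit in the mask) but at the second-to-last byte, which is the first byte of the last character. The second byte of that character, at offset edx + 1 in the block, has to land in bit 31, so the shift count is 31 - (edx + 1) = 30 - edx.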
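
To see the effect of the constant without a debugger, the tail step can be modelled in plain C. This is only a sketch under assumptions: a scalar loop stands in for pcmpeqw + pmovmskb, the string is assumed to be 2-byte aligned, index 0 of mem is treated as a 16-byte aligned address, and the names tail_offset, put_wide and NOT_IN_TAIL are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NOT_IN_TAIL INT_MIN   /* hypothetical sentinel: no match in the tail block */

/* Scalar stand-in for pcmpeqw + pmovmskb on one 16-byte block:
   every 16-bit word equal to `pattern` sets two adjacent mask bits. */
static uint32_t word_match_mask(const uint8_t *block, uint16_t pattern)
{
    uint32_t mask = 0;
    for (int i = 0; i < 8; i++) {
        uint16_t w = (uint16_t)(block[2 * i] | (block[2 * i + 1] << 8));
        if (w == pattern)
            mask |= 3u << (2 * i);
    }
    return mask;
}

/* Index of the highest set bit, like bsr; x must be nonzero. */
static int highest_bit(uint32_t x)
{
    assert(x != 0);
    int i = 31;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        i--;
    }
    return i;
}

/* Model of the tail step. The string occupies mem[start .. start+len_bytes).
   k is the constant under discussion (31 originally, 30 after the fix).
   Returns the byte offset of the match from the string start. */
int tail_offset(const uint8_t *mem, unsigned start, unsigned len_bytes, unsigned k)
{
    unsigned last  = start + len_bytes - 2;  /* first byte of the last wchar */
    unsigned edx   = last & 15;              /* its offset inside the block  */
    unsigned block = last & ~15u;            /* aligned block start          */
    uint32_t mask    = word_match_mask(mem + block, 0x005C);
    uint32_t shifted = mask << (k - edx);    /* shl eax, cl                  */
    if (shifted == 0)
        return NOT_IN_TAIL;
    return (int)len_bytes - (32 + 1) + highest_bit(shifted);
}

/* Example helper: store an ASCII string as UTF-16LE at mem[start]. */
void put_wide(uint8_t *mem, unsigned start, const char *s)
{
    for (unsigned i = 0; s[i] != '\0'; i++) {
        mem[start + 2 * i]     = (uint8_t)s[i];
        mem[start + 2 * i + 1] = 0;
    }
}
```

With the thread's test path stored at start = 0 (26 bytes), k = 30 gives 20, the first byte of the last '\', while k = 31 gives 21, its second byte.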

A couple more suggestions:
Code:
        mov     r10, [CurPath.Buffer]       ; String buffer
        lea r8, [r10 + rcx - 2]             ; Start addr    

and:
Code:
        xor ebx, ebx
        mov bx, [CurPath.Length]

replace with:
        movzx edx, word [CurPath.Length]    
Then use edx directly; you're already moving ebx into edx further down.

Also, you can get rid of the "js error_exit", since it's not needed in your case. You don't handle that case in the main loop either, so there's no reason to handle it only here, and removing it should speed things up (fewer branches to predict, less code, etc.).

This way you can simplify the tail to:
Code:
        .found_in_tail:
        bsr     eax, eax
        movzx   edx, word [CurPath.Length]
        lea     edx, [rdx + rax - (32+1)]
        add     rdx, r10    
With lea, operand sizes are a bit confusing in 64-bit mode. As a rule of thumb (when using registers below r8): if you're only doing 32-bit arithmetic, not working with actual 64-bit pointers, use a 32-bit register as the destination operand and 64-bit registers in the address. The lea above is encoded without any REX prefix or address-size override, so it's the most efficient form, at least in size.
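
What the simplified tail computes can be written out in C as a rough model. This is a sketch only: the function name simplified_tail and the buffer address used with it are made up, and bsr_result stands for the value bsr leaves in eax. The 32-bit lea destination is modelled by truncating to uint32_t, which is then zero-extended when added to the 64-bit buffer address.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the simplified .found_in_tail sequence (hypothetical name). */
uint64_t simplified_tail(uint64_t buffer, uint16_t length, uint32_t bsr_result)
{
    assert(bsr_result < 32);                                 /* bsr on a 32-bit value    */
    uint64_t rdx = length;                                   /* movzx edx, word [Length] */
    uint32_t edx = (uint32_t)(rdx + bsr_result - (32 + 1));  /* lea edx, [rdx+rax-(32+1)] */
    return buffer + edx;                                     /* add rdx, r10             */
}
```

For Length = 26 and a bsr result of 27 this yields buffer + 20. If the intermediate value were negative, the truncation would turn it into a large unsigned offset instead, which is the situation the removed js check used to catch.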
AE 03 Mar 2023, 17:57
Furs, thank you!
Copyright © 1999-2025, Tomasz Grysztar.