flat assembler
Message board for the users of flat assembler.

Win64 Tokenize command line arguments

TightCoderEx



Joined: 14 Feb 2013
Posts: 58
Location: Alberta
TightCoderEx 06 Dec 2018, 15:20
This procedure creates an array of DWORD pointers to the command line arguments, which are delimited either by spaces or by single/double quotes. It is intended to be called from the prologue like this:
Code:
        enter   sizeof.MSG, 0
        mov     ebx, esp
        and     esp, -16
        or      qword [rbx + MSG.wParam], -1

        call    ParseCL

        call    CreateMainWnd
        jc      @F              ; Bail if main window wasn't created

        sub     esp, 32         ; Shadow space for APIs

  Pump: mov     ecx, ebx
        xor     edx, edx
        mov      r8, rdx
        mov      r9,  r8
        call    [GetMessage]
        or      eax, eax
        jnz     @F

        mov     ecx, dword [ebx + MSG.wParam]
.exit:  leave
        call    [ExitProcess]

    @@: jns     @F
        mov     ecx, eax
        jmp     .exit

    @@: mov     ecx, ebx
        call    [TranslateMessage]
        mov     ecx, ebx
        call    [DispatchMessage]
        jmp     Pump    
I have a few programs where I customize the shortcut to pass parameters or files specific to that invocation. The routine has been tested, although not exhaustively, and I would appreciate any input to make it as bulletproof as possible. As long as there are no mismatched quotes, the return values are predictable, assuming care is taken to formulate the command line properly.
Code:
  Array_Size = 128              ; Max arguments that can be passed to application

  ParseCL:

    ; Create a new stack frame of Array_Size DWORDS.

        pop     rax
        sub     esp, Array_Size shl 2
        and     esp, -32

        push    rax             ; Write return address back to TOS of new stack
        push    rbx             ; Only non-volatile register.
        lea     rbx, [esp + 16] ; Pointer to array of char pointers.

    ; Grab pointer to command line from executive. Shadow space isn't required.

        call    [GetCommandLine]
        mov     esi, eax

    ; Inline a routine to determine length of ASCIIZ string

        xor      al,  al        ; NULL
        or      ecx, -1         ; Max length (is ridiculously large)
        mov     edi, esi        ; RDI required by SCASB
        repne   scasb           ; Look for character in AL
        not     ecx             ; ECX = length + 1. Keeping the NULL in the count stops
                                ; the scan below from clipping the last argument's final char
        xor     edx, edx        ; DL = argument count
        mov     eax, edx        ; Bits above AL are going to be status bits

; Anytime a non-delimiting character (anything but space, single quote or double quote)
; is encountered, that is the indication of the beginning of another argument.

  GetNext:

        mov     al, [esi]       ; Get next character
        or      al, al          ; If last argument is quoted, this needs to be done.
        jz      .done
        cmp     al, ' '         ; Test for probably the most common delimiter
        jnz     @F

    ; This segment eliminates leading spaces

        xor     al, al
        mov     [esi], al
        inc     esi
        dec     ecx
        jmp     GetNext

   @@:  cmp     al, 34          ; Is it double quote
        jz      @F
        cmp     al, 39          ; Is it single quote
        jz      @F

    ; Assume an argument that isn't quoted is terminated with a space.
    ; If the string is suffixed with a quote, the quote will be included in the string.

        mov     al, ' '
        jmp     .update

    ; If a trailing quote doesn't match a leading, this flag will be set on exit

   @@:  bts     eax, 8          ; Indicate we're inside a quoted string
        lodsb
        dec     ecx

     .update:
        inc     dl              ; Bump argument count
        mov     [rbx+rdx*4], esi
        mov     edi, esi        ; Needed for next instruction
        repnz   scasb
        mov     esi, edi        ; Begin from this point
        mov     byte [esi-1], 0 ; Replace delimiter with NULL
        jnz     .done           ; If EOS and this arg isn't quoted ZF=0

        btr     eax, 8          ; Found the trailing quote, get rid of flag
        jmp     GetNext

   ; Now we can pack this array up against the caller's stack and return so RSP will be
   ; 16-byte aligned.

     .done:
        mov     al, dl          ; AH might have mis-balanced quotes flag
        mov     [rbx], eax      ; Write total number of arguments to first array entry

        mov     ecx, edx
        inc      cl             ; Number of arguments at beginning
        mov      dl, Array_Size
        sub     edx, ecx
        shl     edx, 2
        add     edx, esp
        and     edx, -16        ; So RSP will be 16-byte aligned on return
        lea     eax, [edx+16]
        mov     [CmdArgs], rax  ; Save global copy of pointer to array of arguments
        mov     edi, edx
        mov     esi, esp
        mov     esp, edi
        add     cl, 4           ; Include two values already on stack
        rep     movsd

        pop     rbx
        ret
    


Code:
section '.bss' data readable writeable
; ============================================================================================

    MainWnd     dd      ?
    hInst       dd      ?
    CmdArgs     dq      ?
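For anyone reading the result from C rather than assembly, the layout left behind by the .done block can be decoded with a few helpers. This is only a sketch of how the code above reads: the first DWORD holds the argument count in its low byte and the mismatched-quote flag in bit 8, followed by one 32-bit address per argument. The names arg_count, arg_unbalanced and arg_address are made up for illustration and are not part of the original routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; the bit positions are taken from ParseCL's
   "bts eax, 8" and "mov al, dl" instructions. */
enum { ARG_COUNT_MASK = 0xFF, ARG_UNBALANCED = 0x100 };

/* Number of arguments: low byte of the first DWORD. */
static unsigned arg_count(const uint32_t *args) {
    return args[0] & ARG_COUNT_MASK;
}

/* Non-zero if a quoted argument ran to the end of the line unclosed. */
static int arg_unbalanced(const uint32_t *args) {
    return (args[0] & ARG_UNBALANCED) != 0;
}

/* 32-bit address of argument k, where k runs from 1 to arg_count(). */
static uint32_t arg_address(const uint32_t *args, unsigned k) {
    assert(k >= 1 && k <= arg_count(args));
    return args[k];
}
```

Since GetCommandLine returns the whole command line, entry 1 would normally be the program name and the real arguments start at entry 2.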
    
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 07 Dec 2018, 07:10
But RSI and RDI are also non-volatile, aren’t they? I believe a short explanation of why having at most N parameters on the stack can be more profitable than using CommandLineToArgvW would also be a good addition to the example.
TightCoderEx



Joined: 14 Feb 2013
Posts: 58
Location: Alberta
TightCoderEx 07 Dec 2018, 16:56
As RBX is the only register initialized to a value that the application depends on, it is, in essence, the only non-volatile register here. If this routine were called from somewhere else, stricter adherence to the ABI would be prudent. Conversely, if it were the first routine to be called, even RBX wouldn't need to be preserved.

The first issue I had with that API is this excerpt from the Win 8.0 documentation:
Quote:
However, if lpCmdLine starts with any amount of whitespace, CommandLineToArgvW will consider the first argument to be an empty string. Excess whitespace at the end of lpCmdLine is ignored.
That's not to say my code doesn't have limitations, but leading spaces are the one thing I wanted to avoid in any context, and there is no way of changing the API short of hacking it.

Obviously I had nothing to do with the implementation of CommandLineToArgvW, but I can't help wondering why it has to be so convoluted, especially when it comes to allocating space on the heap for probably no more than 10 pointers. If the array that ParseCL creates on the stack isn't needed anymore, simply move RSP.
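The no-heap approach isn't tied to assembly. As a rough sketch under stated assumptions, the same tokenizing rules (skip leading spaces, end an unquoted argument at a space, end a quoted one at the matching quote) can be written in portable C against a caller-supplied array. The function name tokenize, its parameters and the max bound check are hypothetical; ParseCL itself has no such bound check, and this is not a line-for-line translation of it.

```c
#include <assert.h>
#include <stddef.h>

/* Split `line` in place into at most `max` arguments without touching the heap.
   Returns the argument count; *unbalanced is set when a quote is never closed. */
static size_t tokenize(char *line, char **argv, size_t max, int *unbalanced) {
    size_t n = 0;
    char *p = line;

    assert(line && argv && unbalanced);
    *unbalanced = 0;

    while (n < max) {
        while (*p == ' ')               /* skip leading and repeated spaces */
            p++;
        if (*p == '\0')
            break;

        char delim = ' ';               /* unquoted arguments end at a space */
        if (*p == '"' || *p == '\'')
            delim = *p++;               /* quoted ones end at the same quote */

        argv[n++] = p;                  /* argument starts here */
        while (*p != '\0' && *p != delim)
            p++;

        if (*p == '\0') {               /* hit end of line before the delimiter */
            *unbalanced = (delim != ' ');
            break;
        }
        *p++ = '\0';                    /* replace the delimiter with NUL */
    }
    return n;
}
```

A caller would declare something like char *argv[16] on its own stack and pass it in, and nothing needs freeing afterwards. Leading spaces never produce an empty first argument here, which is the behaviour I was after.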
Copyright © 1999-2023, Tomasz Grysztar.