flat assembler
Message board for the users of flat assembler.

64 bit Message Pump
TightCoderEx



Joined: 14 Feb 2013
Posts: 58
Location: Alberta
I find the whole 64-bit calling convention a little bizarre, but be that as it may, I still managed to shrink the message pump by almost 50%.

Conventional
Code:
            @@: invoke  GetMessage, msg, NULL, 0, 0
                or      rax, rax
                jz      @F

                invoke  TranslateMessage, msg
                invoke  DispatchMessage, msg
                jmp     @B
    
91 bytes


Modified version
Code:
                xor     edx, edx
                mov     r8, rdx
                mov     r9, rdx
                mov     rbx, msg

            @@: push    rdx
                push    r8
                push    r9
frame
                invoke  GetMessage, rbx, rdx, r8, r9
                or      rax, rax
                jz      @F

                invoke  TranslateMessage, rbx
                invoke  DispatchMessage, rcx
endf
                pop     r9
                pop     r8
                pop     rdx
                jmp     @B

            @@:  ; Exit Process would probably go here
    
49 bytes
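For anyone who wants to check byte counts like these without a disassembler, here is a minimal sketch that lets FASM do the subtraction at assembly time. The labels pump_start and pump_end and the symbol pump_size are names made up for illustration, and the three pushes are only a stand-in for the code being measured; the digit loop follows the pattern the FASM manual shows for displaying a number.
Code:
                use64

pump_start:
                push    rdx                     ; 52
                push    r8                      ; 41 50
                push    r9                      ; 41 51
pump_end:

pump_size = pump_end - pump_start               ; 5 for the three pushes above

                display 'size: 0x'
        repeat 2                                ; two hex digits, enough for sizes below 256
                d = '0' + pump_size shr (8 - % * 4) and 0Fh
                if d > '9'
                        d = d + 'A' - '9' - 1
                end if
                display d
        end repeat
                display 13, 10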
Post 13 Jun 2013, 06:29
baldr



Joined: 19 Mar 2008
Posts: 1651
TightCoderEx,

Have you tried that? 'invoke DispatchMessage, rcx' looks suspicious.
Post 13 Jun 2013, 07:56
TightCoderEx



baldr wrote:
TightCoderEx,

Have you tried that? 'invoke DispatchMessage, rcx' looks suspicious.


Yes, I have. I started with a file called TEMPLATE.ASM, tweaked it to my style of tabulation, and have been studying the object code to see what the assembler produces.


Attachment: Generic.ASM (2.17 KB)
Description: Changes I've made to TEMPLATE.ASM to date

Post 13 Jun 2013, 13:33
bitRAKE



Joined: 21 Jul 2003
Posts: 2796
Location: dank orb
The smallest way is to use a dialog box as the main window, but that is not always possible. The next best is to use a base address and stack size small enough to ensure that all addresses are below $1'0000'0000, so that 32-bit register operations (marked #32# below) are sufficient.
Code:
    xor edi,edi
    lea esi,[.4] ; MSG
    jmp @F
.mloop:
    mov ecx,esi ; #32#
    call [user32.TranslateMessage]
    mov ecx,esi ; #32#
    call [user32.DispatchMessageW]
@@: mov ecx,esi ; #32#
    mov edx,edi
    mov r8,rdi
    mov r9,rdi
    call [user32.GetMessageW]
    test eax,eax ; -,0,+
    jg .mloop    
Note that this assumes the stack is already aligned MOD 16, which can be as simple as a POP RAX at program start.
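To make the alignment remark concrete, here is a rough sketch of the two ways to fix up RSP at the entry point. It assumes the usual Win64 state on entry, where RSP is 8 bytes off a 16-byte boundary because of the pushed return address; the label start is only illustrative.
Code:
start:
    sub rsp,8       ; 48 83 EC 08   conventional: keeps the return address, 4 bytes
; or, for size coding, replace the line above with:
;   pop rax         ; 58            discards the return address, 1 byte
With the POP variant the entry point can no longer RET, so the program has to leave through ExitProcess.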

Dialog box as main window:
Code:
    entry $
    pop rdx
    push 1
    mov ecx,[rcx]                       ; HINSTANCE ; #32#
    pop rdx                             ; IDD_DIALOG
    xor r8,r8                           ; HWND_DESKTOP
    lea r9,[DialogFunc]                 ;
    ; LPARAM on stack (0)
    dll [user32 DialogBoxParamW]
    xchg ecx,eax ; optional
    dll [kernel32 ExitProcess]    
...or maybe indirect dialog:
Code:
    entry $
    pop rax                             ; 58                align stack, lol
    mov ecx,[rcx]                       ; 8B09              HINSTANCE #32#
    lea edx,[DlgMain]                   ;                   DLGTEMPLATE #32#
    xor r8,r8                           ; 4D 31C0           HWND_DESKTOP
    lea r9,[DialogFunc]                 ; 4C 8D0D D5FFFFFF
    ; LPARAM on stack (0)
dll [user32 DialogBoxIndirectParamW]    ; FF15 49000000
    xchg ecx,eax                        ; 91
dll [kernel32 ExitProcess]              ; FF15 52000000    
If the executable is being compressed there are other options that compress better.

_________________
¯\(°_o)/¯ unlicense.org
Post 13 Jun 2013, 13:52
TightCoderEx



It's probably pretty obvious that FASM is brand new to me, and the purpose of this and subsequent postings is to reinforce the concepts as I come to understand them. Size efficiency is my primary focus, and hopefully speed comes along as a side effect.

I've used almost every variant of ML dating back to 1980, and hopefully that flattens the learning curve with FASM. As there are versions of NASM for M$ and Linux, that is what I've been using lately, especially for my OS project, but it's not particularly GUI friendly in the Windows environment.
Post 13 Jun 2013, 14:56
bitRAKE



A particularly confusing part of my post above is that "call" and "dll" are macros, so slight changes would be needed to use the code. They just get translated into CALL instructions. I don't use the included "invoke" macro because it's too general for size coding. Size coding can be further divided into executable size coding and compressed size coding (which almost no one practices, just me and a couple of other insane people).

Any optimization and "friendly" coding are competing goals. FASM can be very friendly, and the GUI is friendly. So, optimization can only be effective at a high-level in these domains (i.e. removing superfluous usage of API, effective usage of API, etc).

ML can be used in a similar capacity with its HL concepts. FASM goes a long way toward mimicking these features, but has an underlying superior feature set, imho. (As does NASM in comparison to ML.)

Glad to answer any queries in my reductio ad absurdum manner.

Post 13 Jun 2013, 16:07
BAiC



Joined: 22 Mar 2011
Posts: 271
Location: California
TightCoderEx: it looks like you can reduce it further:
Code:
                mov     rbx, msg

            @@: xor ecx, ecx
frame
                invoke  GetMessage, rbx, rcx, rcx, rcx
                or      rax, rax
                jz      @F

                invoke  TranslateMessage, rbx
                invoke  DispatchMessage, rcx
endf
                jmp     @B

            @@:  ; Exit Process would probably go here     


Using the letter registers instead of r8/r9 reduces the byte count by one for every push and subsequent pop (2 total), since they need no REX prefix. Using xor with a 32-bit letter register also shortens that instruction by one byte.
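The savings come from the REX prefix that r8 through r15 always require. A short sketch of the encodings involved (standard x86-64 encodings, shown here only to illustrate the point):
Code:
                push    rdx             ; 52
                push    r8              ; 41 50
                pop     r8              ; 41 58
                pop     rdx             ; 5A
                xor     ecx, ecx        ; 31 C9         also clears the upper half of rcx
                xor     rcx, rcx        ; 48 31 C9
                xor     r8d, r8d        ; 45 31 C0
                xor     r8, r8          ; 4D 31 C0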

edit: pop

_________________
byte me.
Post 16 Jun 2013, 03:54
TightCoderEx



My implementation
Code:
  -  40116c 52              push    rdx
  -  40116d 4150            push    r8
  -  40116f 4151            push    r9

  -  401171 4883ec20        sub     rsp,20h
  -  401175 4889d9          mov     rcx,rbx
  -  401178 ff15aa2f0000    call    [GetMessage]
  -  40117e 4809c0          or      rax,rax
  -  401181 741a            je      0040119d
  -  401183 4889d9          mov     rcx,rbx
  -  401186 ff15d42f0000    call    [TranslateMessage]
  -  40118c ff158e2f0000    call    [DispatchMessage]
  -  401192 4883c420        add     rsp,20h

  -  401196 4159            pop     r9
  -  401198 4158            pop     r8
  -  40119a 5a              pop     rdx
  -  40119b ebcf            jmp     40116c
    

9dh - 6ch = 31h = 49 bytes

Your implementation
Code:
  -  401164 31c9            xor     ecx,ecx

  -  401166 4883ec20        sub     rsp,20h
  -  40116a 4889d9          mov     rcx,rbx
  -  40116d 4889ca          mov     rdx,rcx
  -  401170 4989c8          mov     r8,rcx
  -  401173 4989c9          mov     r9,rcx
  -  401176 ff15ac2f0000    call    [GetMessage]
  -  40117c 4809c0          or      rax,rax
  -  40117f 7415            je      401196
  -  401181 4889d9          mov     rcx,rbx
  -  401184 ff15d62f0000    call    [TranslateMessage]
  -  40118a ff15902f0000    call    [DispatchMessage]
  -  401190 4883c420        add     rsp,20h

  -  401194 ebce            jmp     401164
    

96h - 64h = 32h = 50 bytes

As you can see, for all intents and purposes we might as well say they are the same size. The problem is that yours doesn't work because of what the invoke macro does at 40116a: it loads RCX with the pointer to MSG first, so that pointer is then copied into each of the other parameters instead of NULL. This can easily be overcome by eliminating the macro:
Code:
           @@:  xor     ecx, ecx

                mov     rdx, rcx
                mov      r8, rcx
                mov      r9, rcx
                mov     rcx, rbx
                sub     rsp, 20h
                call    [GetMessage]
                add     rsp, 20h
    
It might be worth looking at changing the invoke macro so that RCX is the last register to be loaded.
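A rough sketch of what that could look like as a separate macro rather than a change to invoke itself. The name call4 is hypothetical and not part of FASM's include files; it assumes exactly four register or immediate arguments, that the caller has already reserved the 20h bytes of shadow space, and that no argument lives in a register that is overwritten earlier in the sequence.
Code:
macro call4 proc*, a1*, a2*, a3*, a4*
{
                mov     r9,  a4
                mov     r8,  a3
                mov     rdx, a2
                mov     rcx, a1                 ; RCX is loaded last
                call    [proc]
}

           @@:  xor     ecx, ecx
                sub     rsp, 20h
                call4   GetMessage, rbx, rcx, rcx, rcx
                add     rsp, 20h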
Post 16 Jun 2013, 10:59
AsmGuru62



Joined: 28 Jan 2004
Posts: 1398
Location: Toronto, Canada
so RCX is preserved during an API call in Win64?
like EBX ESI EDI in Win32?
I did not know that.
Post 16 Jun 2013, 11:18
TightCoderEx



AsmGuru62 wrote:
so RCX is preserved during an API call in Win64?
like EBX ESI EDI in Win32?
I did not know that.
You kinda want to take some of the things I do with a grain of salt. In this case TranslateMessage doesn't trash RCX, but GetMessage does, which is why I have to copy RBX into RCX again at 401183.

All I've done here is increase the probability of my app blowing up if run on Win8, but as I don't take using or developing for M$ very seriously, the focus here is on learning FASM.
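For what it's worth, the Win64 calling convention treats RAX, RCX, RDX and R8 through R11 as volatile, so any callee is allowed to destroy them. A minimal sketch of the more defensive pair of calls, assuming RBX still holds the pointer to msg as in my listing above and that the lines sit inside the same frame/endf block:
Code:
                invoke  TranslateMessage, rbx
                invoke  DispatchMessage, rbx    ; reloads RCX from RBX (48 89 D9) instead of trusting it
That costs three bytes, but it no longer depends on what TranslateMessage happens to leave in RCX.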
Post 16 Jun 2013, 16:23
BAiC



TightCoderEx: my bad. I forgot about the calling convention. You can still reduce it by zeroing the inputs at the start of the loop rather than preserving the constants through each iteration. That reduces instruction bytes and also avoids the false dependency chain.
Code:
                mov rbx, msg
            @@:
                xor edx, edx
                xor r8, r8
                xor r9, r9
frame
                invoke  GetMessage, rbx, rdx, r8, r9
                or      rax, rax
                jz      @F

                invoke  TranslateMessage, rbx
                invoke  DispatchMessage, rcx
endf
                jmp     @B
@@:     

Post 19 Jun 2013, 03:09
TightCoderEx



And, adding to your example:
Code:
                 mov    ecx, ebx             ; Saves one byte (fine while msg lies below 4 GB)
                 call   [GetMessage]
                 or     eax, eax             ; I think this might be safe.
    
As I'm very near to having my new Linux box up and running, I will be abandoning M$ completely, but this and other postings have been good exercises in learning FASM better. I will probably continue developing for 64-bit Linux with FASM.
Post 19 Jun 2013, 03:40
BAiC



Is this code part of a macro with "msg" something other than a constant? If it's a memory location then I wouldn't change anything (your last code segment avoids reading from memory each iteration in that case). That said, if it's a constant then you can get rid of the dependency chain by initializing ecx each iteration:
Code:
            @@:
                xor edx, edx
                xor r8, r8
                xor r9, r9
                mov ecx, msg
frame
                invoke  GetMessage, rcx, rdx, r8, r9
                or      rax, rax
                jz      @F

                invoke  TranslateMessage, msg
                invoke  DispatchMessage, rcx
endf
                jmp     @B
@@:    


By the way, you can reduce the work done in each iteration by moving the call frame out of the loop. I don't see any other way to reduce the code size; however, it is possible to reduce the runtime work by converting the call-jmp to a push-jmp:

Code:
frame
            @@:
                xor edx, edx
                xor r8, r8
                xor r9, r9
                mov ecx, msg
                invoke  GetMessage, rcx, rdx, r8, r9
                or      rax, rax
                jz      @F
                push    @B
                push    [DispatchMessage]
                jmp     [TranslateMessage]
@@:
endf
    

I'm not sure whether this increases the size. However, I do know that "push @B" will need to be converted to "lea reg,[@B] / push reg" for position independence, and that will certainly increase the code size.
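One thing to watch with the push/jmp form: as far as I understand the Win64 convention, a callee owns the 32 bytes directly above its return address, so TranslateMessage may overwrite the pushed @B slot, and it is entered with RSP 8 bytes away from the alignment a normal call would give it. A more conservative sketch that still hoists the stack adjustment out of the loop, assuming msg is a constant address below 4 GB and RSP is 16-byte aligned on entry (the labels pump and done are made up here):
Code:
                sub     rsp, 20h                ; shadow space, reserved once
        pump:
                xor     edx, edx
                xor     r8d, r8d
                xor     r9d, r9d
                mov     ecx, msg
                call    [GetMessage]
                test    eax, eax                ; 32-bit BOOL: 0 = WM_QUIT, -1 = error
                jle     done
                mov     ecx, msg                ; RCX is volatile, so reload it
                call    [TranslateMessage]
                mov     ecx, msg
                call    [DispatchMessage]
                jmp     pump
        done:
                add     rsp, 20h
This gives back the few bytes the trick saves, in exchange for code that does not depend on what the callee does with its home space.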

Post 19 Jun 2013, 07:46
TightCoderEx



I really like your last example. It shaves off a few bytes, I'm sure, and just ballparking it, I think a dozen or so ticks are saved too. This might make a significant difference in apps, as this loop is executed tens of thousands of times.
Post 19 Jun 2013, 17:56

Copyright © 1999-2019, Tomasz Grysztar.