flat assembler
Message board for the users of flat assembler.

Message Pump Optimization

bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
I built this one for pure speed :P
Also, I was tired of seeing the MSG struct in the data section...
Code:
        sub     esp,28                  ; allocate MSG struct
        mov     esi,esp                 ; ESI -> MSG struct
        push    PM_REMOVE               ; PeekMessage::wRemoveMsg
        push    0                       ; PeekMessage::wMsgFilterMax
        push    0                       ; PeekMessage::wMsgFilterMin
        push    NULL                    ; PeekMessage::hWnd
        push    esi                     ; PeekMessage::lpMsg
    MainLoop:
        call    [PeekMessage]
        sub     esp,20                  ; re-allocate PeekMessage params
        test    eax,eax                 ; PeekMessage returns zero if queue is empty
        jz      RenderFrame             ; RenderFrame jumps back into MainLoop
        cmp     dword [esi+4],WM_QUIT   ; MSG::message
        je      ExitLoop
       ;invoke  TranslateMessage,esi
        invoke  DispatchMessage,esi
        jmp     MainLoop
    ExitLoop:
        add     esp,28+20               ; release MSG struct and PeekMessage params
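
The magic numbers 28 and 4 in the listing can also be written with structure names. This is only a sketch of the affected lines, and it assumes the standard fasm Win32 includes (win32a.inc or similar) are in use and define the MSG structure in the usual way:

```asm
        sub     esp,sizeof.MSG                  ; 28 bytes on 32-bit Windows
        mov     esi,esp                         ; ESI -> MSG struct
        ; (parameter pushes and PeekMessage call unchanged)
        cmp     dword [esi+MSG.message],WM_QUIT ; MSG.message is offset 4
        ; (dispatch and loop unchanged)
        add     esp,sizeof.MSG+20               ; MSG struct plus the five PeekMessage params
```

The generated code should be identical; only the source becomes easier to check against the structure definition.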

    

I think it kicks ass, what do you think?

_________________
Coding a 3D game engine with fasm is like trying to eat an elephant,
you just have to keep focused and take it one 'byte' at a time.


Last edited by bitshifter on 29 Jun 2010, 16:15; edited 1 time in total
Post 27 Jun 2010, 10:33
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
How do you know if it is faster than other code?

Have you seen the code executed by the OS when you call PeekMessage?

You have no delay in your loop so it will use full CPU resources while achieving nothing. Perhaps you should consider giving time to other tasks to do the things that you are waiting for?
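
For comparison, a minimal sketch of a loop that gives the CPU back while the queue is empty. It keeps the MSG struct on the stack as in the first post but blocks in GetMessage instead of polling, and it assumes the fasm Win32 includes provide the invoke macro and the user32 imports:

```asm
        sub     esp,28                  ; MSG struct on the stack, as before
        mov     esi,esp                 ; ESI -> MSG struct
    MsgLoop:
        invoke  GetMessage,esi,NULL,0,0 ; blocks until a message arrives
        cmp     eax,1
        jb      ExitLoop                ; EAX = 0: WM_QUIT was retrieved
        jne     MsgLoop                 ; EAX = -1: error, just try again
        invoke  TranslateMessage,esi
        invoke  DispatchMessage,esi
        jmp     MsgLoop
    ExitLoop:
        add     esp,28                  ; release MSG struct
```

This form has no place to render a frame while idle, so it suits an input-driven program rather than a demo loop.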
Post 27 Jun 2010, 10:37
bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
It has to be faster since we are executing fewer instructions in the inner loop.
And yeah, it was meant to be a total resource hog :P
Post 27 Jun 2010, 10:40
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
bitshifter wrote:
It has to be faster since we are executing fewer instructions in the inner loop.
That is a false assumption. You have to test it to see if it really is faster, although I am unsure just how you would test that. :?
bitshifter wrote:
And yeah, it was meant to be a total resource hog :P
So the system slows down and you think that makes it faster? I am confused.
Post 27 Jun 2010, 10:49
bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
Basically I traded pushing 5 parameters each iteration for one subtraction.
In my book, that's always faster...
It's for a single-threaded demo loop, not a normal application.
Post 27 Jun 2010, 11:03
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
bitshifter wrote:
Basically I traded pushing 5 parameters each iteration for one subtraction.
In my book, that's always faster...
I suggest you test to see if it really is faster. Your assumption may be false, and you will never know until you test it. Modern CPUs are very complex, with all sorts of weird things happening internally. You might be surprised at the results.
bitshifter wrote:
It's for a single-threaded demo loop, not a normal application.
Thanks for clarifying the use case. But I want to ask, what is the loop waiting for? If it is waiting for user input, then your 100% resource usage (to save perhaps a few nanoseconds at most) is just power burned for no good reason. The user would never notice such a small improvement in response time. If you are waiting for something else, then what?
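
On the suggestion to measure rather than assume, one rough way is to count cycles with RDTSC around many calls. The fragment below is only a sketch: ITERATIONS is a made-up constant, PM_NOREMOVE is assumed to come from the fasm includes, and the reading is noisy because RDTSC is not serializing and the thread can be preempted or moved between cores.

```asm
ITERATIONS = 100000                     ; hypothetical iteration count

        sub     esp,28                  ; MSG struct on the stack
        mov     esi,esp                 ; ESI -> MSG struct
        mov     ebx,ITERATIONS          ; EBX survives stdcall API calls
        rdtsc                           ; EDX:EAX = start timestamp
        push    edx                     ; save high dword
        push    eax                     ; save low dword
    TimeLoop:
        invoke  PeekMessage,esi,NULL,0,0,PM_NOREMOVE ; peek only, leave the queue intact
        dec     ebx
        jnz     TimeLoop
        rdtsc                           ; EDX:EAX = end timestamp
        sub     eax,[esp]               ; subtract low dword of start
        sbb     edx,[esp+4]             ; subtract high dword with borrow
        add     esp,8+28                ; drop saved timestamp and MSG struct
        ; EDX:EAX now holds the elapsed cycles for ITERATIONS calls
```

Timing the same loop with the stack-reuse trick in place of the invoke line, and repeating each run several times, would show whether the saved pushes are visible at all next to the cost of the call itself.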
Post 27 Jun 2010, 11:16
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
One thing to note here is that Windows does not guarantee to leave the stack parameters untouched. The values may have been changed:
Code:
    MainLoop:
        call    [PeekMessage]
        sub     esp,20  ;<--- the values may have been changed!
        test    eax,eax
        jz      MainLoop    
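
A sketch of the conservative variant, assuming the same includes and labels as the first post: the MSG struct stays on the stack, but the five arguments are pushed fresh on every pass, so nothing depends on what PeekMessage leaves in its parameter area.

```asm
        sub     esp,28                  ; allocate MSG struct
        mov     esi,esp                 ; ESI -> MSG struct
    MainLoop:
        invoke  PeekMessage,esi,NULL,0,0,PM_REMOVE ; arguments rebuilt each iteration
        test    eax,eax                 ; zero means the queue is empty
        jz      RenderFrame             ; as in the first post, assumed to jump back to MainLoop
        cmp     dword [esi+4],WM_QUIT   ; MSG::message
        je      ExitLoop
        invoke  DispatchMessage,esi
        jmp     MainLoop
    ExitLoop:
        add     esp,28                  ; release MSG struct only
```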
Post 27 Jun 2010, 11:23
bitshifter



Joined: 04 Dec 2007
Posts: 764
Location: Massachusetts, USA
bitshifter
I have never seen any WinAPI code play with the stack behind the return address before?
Post 27 Jun 2010, 11:28
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17716
Location: In your JS exploiting you and your system
revolution
bitshifter wrote:
I have never seen any WinAPI code play with the stack behind the return address before?
But are you willing to take the risk that all previous, current and future versions of Windows will never alter the incoming parameters?
Post 29 Jun 2010, 11:27
r22



Joined: 27 Dec 2004
Posts: 805
r22
You forgot to align the MainLoop label to 16 bytes.
Post 30 Jun 2010, 16:08
ass0



Joined: 31 Dec 2008
Posts: 521
Location: ( . Y . )
ass0
why?

_________________
Nombre: Aquiles Castro.
Location2: about:robots
Post 30 Jun 2010, 17:58
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
ass0,

With an aligned target, the cache fill after a miss on jmp MainLoop will probably perform better.
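
In fasm the alignment being discussed is a single directive placed before the label. A sketch against the setup code from the first post:

```asm
        push    esi                     ; PeekMessage::lpMsg
        align   16                      ; fasm pads with NOPs up to the next 16-byte boundary
    MainLoop:
        call    [PeekMessage]
```

The padding sits between the one-time setup and the loop entry, so it is at most 15 bytes and lies outside the path taken by jmp MainLoop.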
Post 30 Jun 2010, 23:52
ass0



Joined: 31 Dec 2008
Posts: 521
Location: ( . Y . )
ass0
Huh? But how will it be filled? With several NOPs to reach the next 16-byte boundary?
In that case, processing each NOP takes 1 cycle, so where is the benefit of aligning?

Not to mention the file size grows.
Post 01 Jul 2010, 00:26
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
ass0, but those NOPs will be executed only once in the entire program run, the remaining N-1 iterations of the loop will skip those NOPs.

Quote:
Not to mention the file size grows.
That only happens if the extra bytes push the section past a 512-byte boundary; otherwise the file size is unaffected.
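
The rounding can be worked through with assembly-time constants. The sizes below are hypothetical; 512 is the boundary quoted above.

```asm
FILE_ALIGN  = 512                       ; file alignment quoted in the post

code_size   = 499                       ; hypothetical raw size of the code section
pad_bytes   = 13                        ; hypothetical NOP padding added by align 16

; round each size up to the next multiple of FILE_ALIGN
size_before = (code_size + FILE_ALIGN - 1) / FILE_ALIGN * FILE_ALIGN
size_after  = (code_size + pad_bytes + FILE_ALIGN - 1) / FILE_ALIGN * FILE_ALIGN

; size_before = 512 and size_after = 512: the padding fits in the existing slack.
; With code_size = 505 the padded code needs 518 bytes, and size_after becomes 1024.
```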
Post 01 Jul 2010, 01:13
ass0



Joined: 31 Dec 2008
Posts: 521
Location: ( . Y . )
ass0
wow!
Post 01 Jul 2010, 01:16
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
This isn't the bottleneck.
The bottleneck is message processing, not message retrieval.

Processing can be done by:
1. jump tables (sequential)
2. message frequency analysis (sequential but more efficient)
3. binary trees on the message value (non-sequential, implemented by C/C++ compilers a long time ago)

And I have doubts about reusing already-passed parameters unless the API states that the function takes them as const.
Anyway, this isn't worth optimizing at all.
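
A sketch of the first two options combined: a sequential table of (message, handler) pairs with the most frequent messages listed first. Everything here is illustrative. The table contents, the handler labels and the ordering are hypothetical, the proc macro from the fasm Win32 includes is assumed, and WindowProc must also be referenced elsewhere (for example in the window class) or the macro will not assemble it.

```asm
; belongs in a data section; hypothetical ordering, most frequent message first
MsgTable:
        dd      WM_PAINT,   WindowProc.on_paint
        dd      WM_KEYDOWN, WindowProc.on_keydown
        dd      WM_DESTROY, WindowProc.on_destroy
MSG_TABLE_ENTRIES = ($ - MsgTable) / 8

; belongs in the code section
proc WindowProc hwnd,wmsg,wparam,lparam
        mov     eax,[wmsg]
        mov     edx,MsgTable
        mov     ecx,MSG_TABLE_ENTRIES
  .scan:
        cmp     eax,[edx]               ; does this entry match the message?
        je      .hit
        add     edx,8                   ; next (message, handler) pair
        dec     ecx
        jnz     .scan
  .default:
        invoke  DefWindowProc,[hwnd],[wmsg],[wparam],[lparam]
        ret
  .hit:
        jmp     dword [edx+4]           ; jump to the handler stored in the table
  .on_paint:
  .on_keydown:
        jmp     .default                ; placeholders: real handlers would go here
  .on_destroy:
        invoke  PostQuitMessage,0
        xor     eax,eax
        ret
endp
```

Reordering the table is the whole "frequency analysis" step here; a sorted table with a binary search would be the third option.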
Post 01 Jul 2010, 07:18
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
As mentioned before, you definitely shouldn't re-use stack parameters. I haven't bumped into cases where API calls modify stack parms, and there might be Microsoft coding guidelines against it - but the day it happens, you'll be pretty sorry.

Also, please stop doing CPU-greedy MessageLoops like the one above... there's really no reason to do it, and it sucks wrt. conserving laptop battery.
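
For a demo that still wants to render continuously, one middle ground is to wait for input with a timeout before draining the queue. This is a sketch under assumptions: the 16 ms timeout is an arbitrary frame budget, RenderFrame is a hypothetical routine that draws one frame and returns, MsgWaitForMultipleObjects is assumed to be in the user32 import list, and QS_ALLINPUT is defined locally in case the includes lack it.

```asm
QS_ALLINPUT = 04FFh                     ; value from winuser.h, defined here as an assumption

        sub     esp,28                  ; MSG struct on the stack
        mov     esi,esp                 ; ESI -> MSG struct
    FrameLoop:
        ; sleep until input arrives or 16 ms pass, whichever comes first
        invoke  MsgWaitForMultipleObjects,0,NULL,FALSE,16,QS_ALLINPUT
    Drain:
        invoke  PeekMessage,esi,NULL,0,0,PM_REMOVE
        test    eax,eax
        jz      Render                  ; queue empty: draw a frame
        cmp     dword [esi+4],WM_QUIT   ; MSG::message
        je      ExitLoop
        invoke  DispatchMessage,esi
        jmp     Drain
    Render:
        call    RenderFrame             ; hypothetical: draws one frame and returns
        jmp     FrameLoop
    ExitLoop:
        add     esp,28                  ; release MSG struct
```

If the renderer already blocks on vertical sync, the timed wait may be unnecessary, since the thread is idle during that wait anyway.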
Post 01 Jul 2010, 22:09
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.