flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
f0dder
optimizing a wndproc is a waste of time
What do you want to "optimize" it for anyway? Speed or size? (Speed is a 100% waste of time for a wndproc.)
|||
![]() |
|
LocoDelAssembly
Well, the "sub esp, 20" has no business being there. Look at what really happens:
Code:
    WindowProc:
        push ebp
        mov  ebp, esp
        .
        .
        .
        sub  esp, 20
        leave                   ; mov esp, ebp | pop ebp
        jmp  [DefWindowProc]
As you can see, the work done by "sub esp, 20" is immediately destroyed by the "leave" instruction.
Certain instructions tend to encode in fewer bytes when they use EAX than when they use another register.
And yes, optimizing this is probably a waste of time, because the benefits are very unlikely to be noticeable by eye. If you are interested in optimizing, optimizing for size is probably the best thing to do.
|||
![]() |
|
hologram
So why does it work ?
|
|||
![]() |
|
f0dder
hologram wrote: So why does it work ?
Look up how the "leave" instruction works.
|||
![]() |
|
LocoDelAssembly
Quote:
Code:
    sub esp, 20
    leave                   ; mov esp, ebp | pop ebp
Pascal-like equivalent:
Code:
    esp := esp - 20;                { sub esp, 20 }
    {*** leave ***}
    esp := ebp;                     { mov esp, ebp }
    ebp := esp^; esp := esp + 4;    { pop ebp }
    {*** end of leave instruction ***}
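The Pascal-like listing above can also be checked with a toy model in C. This is only a sketch: the Cpu struct, the 256-byte "stack" and the helper names are made up for illustration and model nothing beyond ESP, EBP and a few dwords of memory.

```c
#include <stdint.h>

/* Hypothetical toy model: two registers plus a tiny stack. */
typedef struct {
    uint32_t esp;       /* stack pointer, as a byte offset into mem */
    uint32_t ebp;       /* frame pointer, as a byte offset into mem */
    uint32_t mem[64];   /* 256 bytes of stack, addressed as dwords  */
} Cpu;

static void push(Cpu *c, uint32_t v) {
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c) {
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* leave = mov esp, ebp ; pop ebp */
static void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* Prologue, "sub esp, locals", then leave. Returns the final esp. */
static uint32_t run_frame(Cpu *c, uint32_t locals) {
    push(c, c->ebp);    /* push ebp        */
    c->ebp = c->esp;    /* mov  ebp, esp   */
    c->esp -= locals;   /* sub  esp, locals */
    leave(c);           /* overwrites esp, so the sub has no lasting effect */
    return c->esp;
}
```

Calling run_frame with locals = 20 and with locals = 0 from the same starting state leaves ESP and EBP identical, which is why the stray "sub esp, 20" does no harm.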
|||
![]() |
|
r22
If you had "A LOT" of window messages to check, I could see using a jump table as opposed to a chain of repeated cmp and jz instructions. A jump table would increase the size of your code, BUT (if you had A LOT of messages to check) it "could" noticeably improve performance.
Example of a jump table:
Code:
    JMP_TABLE:
        dd jmpdefault           ;; if wmsg = 0
        dd jmpdefault           ;; if wmsg = 1
        dd wm_destroy           ;; if wmsg = 2 = WM_DESTROY
    EJMP_TABLE:
        ...
        mov edx, [wmsg]
        ;; make sure the value is below the number of entries
        ;; in your jump table (3 in this example)
        cmp edx, (EJMP_TABLE - JMP_TABLE) shr 2
        jae jmpdefault          ;; Jcc has no indirect form, so go through the stub below
        jmp dword [JMP_TABLE + edx*4]
        ...
    jmpdefault:
        jmp [DefWindowProc]
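For readers more at home in C, the same idea can be sketched with an array of function pointers. The handler names and the integer return tags are hypothetical; on_default merely stands in for the jump to DefWindowProc.

```c
#include <stddef.h>

typedef int (*Handler)(unsigned msg);

/* Hypothetical handlers: each returns a tag so the caller can see which one ran. */
static int on_default(unsigned msg) { (void)msg; return 0; }  /* stand-in for DefWindowProc */
static int on_destroy(unsigned msg) { (void)msg; return 1; }

/* One slot per message number, exactly like the dd entries above. */
static const Handler jump_table[] = {
    on_default,   /* msg 0 */
    on_default,   /* msg 1 */
    on_destroy,   /* msg 2 = WM_DESTROY */
};

static int dispatch(unsigned msg) {
    /* Bounds check first (the cmp/jae pair), then the indexed jump. */
    if (msg >= sizeof jump_table / sizeof jump_table[0])
        return on_default(msg);
    return jump_table[msg](msg);
}
```

With this table, dispatch(2) reaches on_destroy, while 0, 1 and anything past the end of the table fall through to on_default.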
|||
![]() |
|
m
How many times better is a jump table than a straight cmp-jmp chain ?
|
|||
![]() |
|
f0dder
A jump table decreases code size but increases data size, and easily quite dramatically. You could end up with some heavy cache thrashing, so the CMP+JE sequence (please, JE and not JZ, it's the more logical mnemonic) is probably smarter.
You could even handle the few messages that arrive most of the time with a CMP+JE sequence, and then split into the binary tree approach for the rest - but again, this is fucking overkill for something as speed-insensitive as a wndproc.
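A rough C sketch of that hybrid, for what it's worth. Which messages count as "hot" is purely an assumption here (WM_PAINT and WM_MOUSEMOVE were picked for illustration), the handler ids are made up, and a binary search over a sorted array stands in for the hand-written binary tree of compares.

```c
#include <stddef.h>

typedef struct {
    unsigned msg;         /* WM_* value */
    int      handler_id;  /* hypothetical tag identifying the handler */
} Entry;

/* Rarely seen messages, kept sorted by msg so they can be binary searched. */
static const Entry cold[] = {
    { 0x0001, 10 },  /* WM_CREATE  */
    { 0x0002, 11 },  /* WM_DESTROY */
    { 0x0005, 12 },  /* WM_SIZE    */
    { 0x0010, 13 },  /* WM_CLOSE   */
    { 0x0111, 14 },  /* WM_COMMAND */
};

/* Returns a handler id, or 0 for "pass it to DefWindowProc". */
static int lookup(unsigned msg) {
    /* Hot path: plain compare chain for the assumed frequent messages. */
    if (msg == 0x000F) return 1;  /* WM_PAINT     */
    if (msg == 0x0200) return 2;  /* WM_MOUSEMOVE */

    /* Cold path: binary search over the sorted table. */
    size_t lo = 0, hi = sizeof cold / sizeof cold[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cold[mid].msg == msg)
            return cold[mid].handler_id;
        if (cold[mid].msg < msg)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

The hot messages cost one or two compares, and everything else costs about log2(n) compares instead of walking the whole chain.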
|||
![]() |
|
Enko
Quote:
Aren't they synonyms? Like "dark" and "black", almost the same.
|||
![]() |
|
LocoDelAssembly
Yep, both mnemonics correspond to the same instruction. f0dder means that it looks illogical to compare two values and then jump if the result is "zero" rather than "equal". It regains some sense when you remember that CMP is a SUB that doesn't save the result in the destination, though.
About the jump table: it is not so good when there are too few consecutive values; you then have to mix it with the binary tree approach. I'm not so sure about what f0dder says about cache thrashing. After all, code wastes cache too, and micro-architecture quirks like the Pentium 4's pollute the trace cache much more with branching, because it tends to store the same decoded instruction more than once.
|||
![]() |
|
r22
WM_'s go from 0 - 1023 (not including USER DEFINED).
Each entry in the jump table would take up 4 bytes on a 32-bit system. 1024*4 = 4096 bytes, or 4 KB. Good size for a LUT.
I would probably only implement a jump table for the wndproc if I had ~30 or more messages that I wanted to handle. Any fewer than that and it would just be overkill, IMHO.
But it's an interesting subject. Most of us just assume, from experience and from how the windowing system works, that optimizing the wndproc would be pointless, BUT I don't think we've ever actually set up a benchmark to know once and for all.
Maybe in an OpenGL program you'd want an optimal window proc for key handling... odd that the fasm ddraw example works but the OpenGL example doesn't work on my Win XP64 box.
|||
![]() |
|
f0dder
LocoDelAssembly: IMHO "JE" is more logical than "JZ" in this particular case, since you're (logically) checking for equality, not zero.
Quote:
Well, you waste less cache with the usually rather small number of messages you have to handle. The CMP/JE sequence takes max 11 bytes per message (7 bytes for those that fit within the +127 byte range), meaning that you'd need >370 messages to take up the same space as a jump table.
Hadn't thought of the trace cache; that could be an issue as well (but how much, really? How much effect does it have, considering the amount of API code etc. your wndproc goes through?)
r22 wrote:
It's going to be pretty damn hard to do any timings, even if you find a really old box. I'm not sure how I'd even go about setting up the timing...
r22 wrote:
Don't really think so - even on a 100 MHz 486, key input is infinitely slow... slow in the sense that you're dealing with a pathetic human being with all our mechanical limitations.
The only thing I can really think of where wndproc handling could matter (i.e., with enough messages coming in at a fast rate) would be WM_* async socket stuff... but that sucks bigtime, and its bottleneck lies in the whole message system rather than in how you code your wndproc anyway.
Not saying that jump tables can't be good, though - they can be really nifty when dealing with byte-range input, for instance.
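The size figures from this post and from r22's 4 KB table can be put side by side with a little arithmetic. The sketch below assumes the 11 bytes are "cmp eax, imm32" (5 bytes) plus a near "je rel32" (6 bytes) and the 7 bytes are the same compare plus a short "je rel8" (2 bytes); those encodings are my reading of the numbers, not something stated in the thread. All names are made up.

```c
/* Figures taken from the thread. */
enum {
    TABLE_ENTRIES     = 1024,  /* WM_* values 0..1023                    */
    ENTRY_BYTES       = 4,     /* one dd per entry on a 32-bit system    */
    CHAIN_BYTES_NEAR  = 11,    /* assumed: cmp eax, imm32 + je rel32     */
    CHAIN_BYTES_SHORT = 7      /* assumed: cmp eax, imm32 + je rel8      */
};

static unsigned table_bytes(void) {
    return TABLE_ENTRIES * ENTRY_BYTES;
}

/* Smallest number of handled messages for which a CMP/JE chain
   is at least as large as the full jump table (rounded up). */
static unsigned break_even(unsigned bytes_per_msg) {
    return (table_bytes() + bytes_per_msg - 1) / bytes_per_msg;
}
```

table_bytes() gives 4096, break_even(CHAIN_BYTES_NEAR) gives 373 and break_even(CHAIN_BYTES_SHORT) gives 586, which agrees with the ">370 messages" estimate.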
|||
![]() |
|
LocoDelAssembly
Quote:
And I agree. What I said about JZ was that it is not an intentional obfuscation, because CMP is a subtraction, and when both values are equal the result is zero.
About the trace cache: for some reason I can't find the long and good explanation in Agner Fog's manual. Still, it says this:
Agner Fog - Optimizing Assembly wrote: Microprocessors with a trace cache are likely to store multiple instances of the same
About the 4 KB table: if it is actually used, then f0dder is right about cache thrashing (unless very few cache lines get accessed).
|||
![]() |
|
f0dder
LocoDelAssembly wrote:
You'd have to look at the distribution of WM_* messages handled by your program to answer that one.
|||
![]() |
|
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.