flat assembler
Message board for the users of flat assembler.
AsmGuru62
CMPSD is a nice trick!
It works because the POINT members line up (in meaning) with the RECT members. Very cool! The CPU manuals, however, point out that CALL should be paired with RET to get better performance. But still, baldr, you're an awesome coder!
Sasha
How many things can be learned from this short example!
AsmGuru62
Yes, it is true.
However, I just want to point out that although code should be optimized, sometimes the time spent on optimization is not worth the gain you get from it. So do not fall into the trap of premature optimization. I think optimization should be reserved for cases where the code processes a lot of data.
revolution
baldr wrote: Optimization and readability are somewhat opposite (orthogonal at least), but you may mend this with proper comments. Let's consider unsigned coordinates (and eax==-1 for indication of success):
baldr wrote: xor/dec is fine, but it can stall execution later, due to partial EFLAGS update by dec.
bitRAKE
Size optimization would use the API, and speed optimization wouldn't have a branch.
AsmGuru62
I wonder how it can be done with no branches.
We need to make at most 4 comparisons, so control somehow has to pass on to the next comparison whenever the previous ones have not yet settled the result.
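One general way to approach the question above is to evaluate every comparison unconditionally and combine the 0/1 results with a bitwise AND, so that no comparison depends on the outcome of another. The C sketch below is only an illustration, not anyone's code from this thread: `in_range` and `rect_contains_point` are hypothetical names, each range is assumed to satisfy lo <= hi, and whether the compiler actually emits branch-free machine code depends on the compiler and its flags. It also uses the unsigned-subtraction range check, which folds the two comparisons per axis into one.

```c
#include <stdint.h>

/* Hypothetical helper: 1 when lo <= v < hi, else 0. Assumes lo <= hi.
   The subtractions are done in unsigned arithmetic, so a v below lo
   wraps around to a large value and fails the single comparison. */
int in_range(int32_t v, int32_t lo, int32_t hi)
{
    return ((uint32_t)v - (uint32_t)lo) < ((uint32_t)hi - (uint32_t)lo);
}

/* Hypothetical helper: 1 when left <= x < right and top <= y < bottom.
   Bitwise & (not &&) is used so that both tests are always evaluated. */
int rect_contains_point(int32_t x, int32_t y,
                        int32_t left, int32_t top,
                        int32_t right, int32_t bottom)
{
    return in_range(x, left, right) & in_range(y, top, bottom);
}
```

The left and top edges count as inside and the right and bottom edges as outside, matching the `l <= x < r` and `t <= y < b` convention used later in the thread.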
tthsqe
Code:
Point:
  .x dd ?
  .y dd ?
Rect:
  .left   dd ?
  .top    dd ?
  .right  dd ?
  .bottom dd ?

IsPointInRectangle:            ; return 1 for true
        movaps   xmm0,[Rect]   ; xmm0 = l | t | r | b
        movddup  xmm1,[Point]  ; xmm1 = x | y | x | y
        pcmpgtd  xmm0,xmm1     ; xmm0 = l>x | t>y | r>x | b>y
        movmskps ecx,xmm0      ; ecx = l>x | t>y | r>x | b>y (in 4 bits)
        mov      eax,ecx
        shr      ecx,2
        xor      eax,ecx
        and      eax,3         ; eax = l <= x < r | t <= y < b (in 2 bits)
        cmp      eax,3
        sete     al
        ret

I got this from the pristine code at Redmond.
EDIT: comment on eax was wrong. fixed it now.
Last edited by tthsqe on 07 Jun 2013, 19:55; edited 1 time in total
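For readers who want to check the mask arithmetic in the post above without running SSE code, the same bit logic can be modeled in scalar C. This is a sketch under assumptions: the function name `point_in_rect_branchless` and the C struct layouts are made up for illustration, the bit order follows what `movmskps` would produce for the lanes in the post, and, like the assembly, it assumes left <= right and top <= bottom.

```c
#include <stdint.h>

typedef struct { int32_t x, y; } Point;
typedef struct { int32_t left, top, right, bottom; } Rect;

/* Hypothetical scalar model of the SSE routine: returns 1 when
   left <= x < right and top <= y < bottom, else 0. */
int point_in_rect_branchless(const Point *p, const Rect *r)
{
    /* One bit per comparison, in lane order:
       bit0 = l>x, bit1 = t>y, bit2 = r>x, bit3 = b>y. */
    unsigned mask = ((unsigned)(r->left   > p->x))
                  | ((unsigned)(r->top    > p->y) << 1)
                  | ((unsigned)(r->right  > p->x) << 2)
                  | ((unsigned)(r->bottom > p->y) << 3);

    /* XOR the low pair with the high pair: a bit ends up set only
       when the lower bound is not above the coordinate and the upper
       bound is, i.e. the coordinate is inside on that axis. */
    unsigned inside = (mask ^ (mask >> 2)) & 3u;

    /* Both axes must be inside. */
    return inside == 3u;
}
```

For a point at (5, 5) and a rectangle (0, 0, 10, 10) the mask is 1100b, the XOR with the shifted copy leaves 11b in the low two bits, and the function returns 1.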
AsmGuru62
Awesome!!
bitRAKE
That's odd how they mixed integer and floating point instructions - there really is no need, as it can be done with just integer SIMD. Not sure if the mixing penalty (pg 117) applies in this case.
Edit: Oh, they prolly wanted to just use plain SSE.
Nah. Edit, edit: MOVDDUP is SSE3.
Last edited by bitRAKE on 07 Jun 2013, 20:53; edited 2 times in total
tthsqe
Oh man - I thought it was clear I was joking. M$ does the straightforward 4x cmp+branch. I just cooked this up as a branchless example.
bitRAKE
Kudos to you then. I'm quite dry this early in the day - that went right over my head.
baldr
revolution wrote: Can we assume this is optimised for size? Because I can't see how this could be considered optimised for anything else.
revolution wrote: Okay, you are probably correct here for most contemporary CPUs in use today, but now you seem to have jumped into an optimise-for-speed mindset. And even then I seriously doubt whether any difference could be measurable between the two options. Aside from some very special purpose test code to show some minor difference such things are going to be inconsequential in the larger frame of reference.
tthsqe
baldr,
Your solution for RectContainsPoint is very funny. I especially like the jump into the middle of an already-decoded instruction. Could you try some self-modifying code next?
baldr
tthsqe wrote: Could you try some self-modifying code next?
Code:
RectContainsPoint:
        pop     eax ecx edx
        push    eax                     ; get ready to fall through
        sub     edx, ecx
        mov     [disp32], edx
        or      eax, -1                 ; assume it's in
        call    AoE?                    ; x>=left?
        add     ecx, 4
        call    AoE?                    ; y>=top?
        add     [disp32], 8             ; advance to right/bottom
        mov     [tumbler], 0x90         ; turn 'AoE?' into 'B?'
        call    B?                      ; y<bottom?
        sub     ecx, 4                  ; fall through; x<right?
AoE?:
B?:
        mov     edx, [ecx]
        cmp     edx, [ecx+128]          ; 128 gets patched
label disp32 dword at $-4
label tumbler byte
        cmc
        sbb     edx, edx
        and     eax, edx
        ret
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
Website powered by rwasa.