flat assembler
Message board for the users of flat assembler.
Indirect addressing (page 2)
AsmGuru62 07 Jun 2013, 15:24
CMPSD is a nice trick!
It works because the POINT members are aligned (by meaning) with the RECT members. Very cool! The CPU manuals, however, point out that CALL should be paired with RET for better performance. But still, baldr, you're an awesome coder!!
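For readers who have not seen baldr's code (it is on the previous page), here is a rough C model of why that layout matters. It is a sketch, not baldr's routine: the names Point, Rect and rect_contains_point are hypothetical, and it assumes the Win32-style layout of two and four consecutive 32-bit fields with a half-open range (left <= x < right). Viewed as dword arrays, point[i] pairs with rect[i] for the lower bound and with rect[i + 2] for the upper bound, which is what lets a dword-by-dword compare walk both structures in lockstep.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types with the Win32 POINT/RECT layout: consecutive 32-bit fields. */
typedef struct { int32_t x, y; } Point;
typedef struct { int32_t left, top, right, bottom; } Rect;

_Static_assert(sizeof(Point) == 2 * sizeof(int32_t), "Point must be two dwords");
_Static_assert(sizeof(Rect) == 4 * sizeof(int32_t), "Rect must be four dwords");

/* Half-open test: left <= x < right and top <= y < bottom. */
static bool rect_contains_point(const Rect *r, const Point *p)
{
    int32_t pt[2], rc[4];
    memcpy(pt, p, sizeof pt);   /* pt = {x, y} */
    memcpy(rc, r, sizeof rc);   /* rc = {left, top, right, bottom} */

    /* The same index i walks {x, y} against {left, top} (rc[i]) and
       against {right, bottom} (rc[i + 2]): the "aligned by meaning" layout. */
    for (int i = 0; i < 2; ++i) {
        if (pt[i] < rc[i] || pt[i] >= rc[i + 2])
            return false;
    }
    return true;
}
```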
Sasha 07 Jun 2013, 16:15
How many things can be learned from this short example!
AsmGuru62 07 Jun 2013, 17:12
Yes, it is true.
However, I just want to point out that, although code should be optimized, sometimes the time spent on optimization is not worth the gain you get from it. So do not fall into the trap of premature optimization. I think optimization should be reserved for cases where the code processes a lot of data.
revolution 07 Jun 2013, 18:01
baldr wrote:
Optimization and readability are somewhat opposite (orthogonal at least), but you may mend this with proper comments. Let's consider unsigned coordinates (and eax==-1 for indication of success):
baldr wrote:
xor/dec is fine, but it can stall execution later, due to partial EFLAGS update by dec.
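baldr's quoted code is not reproduced here, but one well-known reason unsigned coordinates help is that a two-sided range check collapses into a single unsigned comparison. A minimal C sketch of that trick, not necessarily what baldr's code does; UPoint, URect, in_range_u32 and rect_contains_point_u are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types: unsigned 32-bit coordinates. */
typedef struct { uint32_t x, y; } UPoint;
typedef struct { uint32_t left, top, right, bottom; } URect;

/* left <= v < right with one unsigned compare: if v < left, v - left
   wraps around to a huge value and fails the test. Requires left <= right. */
static bool in_range_u32(uint32_t v, uint32_t left, uint32_t right)
{
    assert(left <= right);
    return v - left < right - left;
}

static bool rect_contains_point_u(const URect *r, const UPoint *p)
{
    return in_range_u32(p->x, r->left, r->right)
        && in_range_u32(p->y, r->top, r->bottom);
}
```

Each axis then costs one subtraction and one compare instead of two compares.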
bitRAKE 07 Jun 2013, 18:29
Size optimization would use the API, and speed optimization wouldn't have a branch.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
AsmGuru62 07 Jun 2013, 19:17
I wonder how it can be done with no branches.
We need to make at most 4 comparisons, so we somehow have to pass control on to the next comparison if the previous ones have not reached a decision yet.
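One scalar answer, sketched in C under the assumption of signed 32-bit, half-open coordinates (the names are hypothetical): evaluate all four comparisons unconditionally and combine the 0/1 results with bitwise AND instead of a short-circuiting &&. Whether the compiler really emits branch-free setcc/and code depends on the compiler and optimization level, so treat it as a sketch.

```c
#include <stdint.h>

/* Hypothetical types with the Win32 POINT/RECT layout. */
typedef struct { int32_t x, y; } Point;
typedef struct { int32_t left, top, right, bottom; } Rect;

/* Each comparison yields 0 or 1; '&' does not short-circuit, so all four
   are always evaluated and no early exit is needed. Returns 1 if inside. */
static int rect_contains_point_nb(const Rect *r, const Point *p)
{
    return (p->x >= r->left) & (p->x < r->right)
         & (p->y >= r->top)  & (p->y < r->bottom);
}
```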
tthsqe 07 Jun 2013, 19:36
Code:
Point:
  .x dd ?
  .y dd ?
align 16                        ; movaps below needs a 16-byte-aligned Rect
Rect:
  .left   dd ?
  .top    dd ?
  .right  dd ?
  .bottom dd ?

IsPointInRectangle:             ; return 1 for true
        movaps   xmm0,[Rect]    ; xmm0 = l | t | r | b
        movddup  xmm1,[Point]   ; xmm1 = x | y | x | y
        pcmpgtd  xmm0,xmm1      ; xmm0 = l>x | t>y | r>x | b>y
        movmskps ecx,xmm0       ; ecx = l>x | t>y | r>x | b>y (in 4 bits)
        mov      eax,ecx
        shr      ecx,2
        xor      eax,ecx
        and      eax,3          ; eax = l <= x < r | t <= y < b (in 2 bits)
        cmp      eax,3
        sete     al
        ret

I got this from the pristine code at Redmond.
EDIT: comment on eax was wrong. Fixed it now.
Last edited by tthsqe on 07 Jun 2013, 19:55; edited 1 time in total
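To follow the bit logic, here is a scalar C model of the same steps (a sketch; Rect, Point and is_point_in_rectangle are hypothetical names mirroring the labels above). Bit i of the mask corresponds to lane i of the signed pcmpgtd result, and the shift/XOR pairs l>x with r>x and t>y with b>y.

```c
#include <stdint.h>

/* Hypothetical types with the same dword order as the Rect/Point labels. */
typedef struct { int32_t x, y; } Point;
typedef struct { int32_t left, top, right, bottom; } Rect;

static int is_point_in_rectangle(const Rect *r, const Point *p)
{
    /* movmskps: one bit per lane of the signed "rect > {x, y, x, y}" compare */
    unsigned m = (unsigned)(r->left   > p->x)         /* bit 0: l > x */
               | (unsigned)(r->top    > p->y) << 1    /* bit 1: t > y */
               | (unsigned)(r->right  > p->x) << 2    /* bit 2: r > x */
               | (unsigned)(r->bottom > p->y) << 3;   /* bit 3: b > y */
    /* (l > x) xor (r > x) is 1 exactly when l <= x < r, given l <= r */
    unsigned inside = (m ^ (m >> 2)) & 3;             /* shr / xor / and */
    return inside == 3;                               /* cmp eax,3 / sete al */
}
```

This assumes a normalized rectangle (left <= right, top <= bottom). With left > right the XOR also comes out 1 for right <= x < left, so an inverted rectangle can report points as inside.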
AsmGuru62 07 Jun 2013, 19:53
Awesome!!
bitRAKE 07 Jun 2013, 20:13
It's odd that they mixed integer and floating-point instructions; there really is no need, as it can be done with just integer SIMD. I'm not sure whether the mixing penalty (pg 117) applies in this case.
Edit: Oh, they probably just wanted to use plain SSE. Edit, edit: Nah, MOVDDUP is SSE3.
Last edited by bitRAKE on 07 Jun 2013, 20:53; edited 2 times in total
tthsqe 07 Jun 2013, 20:21
Oh man, I thought it was clear I was joking. M$ does the straightforward 4x cmp+branch. I just cooked this up as a branchless example.
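For comparison, the straightforward four compare-and-branch shape looks roughly like this in C. It is only the generic pattern, not Microsoft's actual code (which is not shown in this thread); the names are hypothetical and half-open bounds are assumed.

```c
#include <stdint.h>

/* Hypothetical types with the Win32 POINT/RECT layout. */
typedef struct { int32_t x, y; } Point;
typedef struct { int32_t left, top, right, bottom; } Rect;

/* Up to four compares, each followed by a conditional early exit. */
static int rect_contains_point_br(const Rect *r, const Point *p)
{
    if (p->x < r->left)    return 0;
    if (p->x >= r->right)  return 0;
    if (p->y < r->top)     return 0;
    if (p->y >= r->bottom) return 0;
    return 1;
}
```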
bitRAKE 07 Jun 2013, 20:26
Kudos to you then. I'm quite dry this early in the day - that went right over my head.
baldr 08 Jun 2013, 02:17
revolution wrote:
Can we assume this is optimised for size? Because I can't see how this could be considered optimised for anything else.
revolution wrote:
Okay, you are probably correct here for most contemporary CPUs in use today, but now you seem to have jumped into an optimise-for-speed mindset. And even then I seriously doubt whether any difference could be measurable between the two options. Aside from some very special purpose test code to show some minor difference such things are going to be inconsequential in the larger frame of reference.
tthsqe 08 Jun 2013, 02:28
baldr,
Your solution for RectContainsPoint is very funny. I especially like the jump into the middle of an already-decoded instruction. Could you try some self-modifying code next?
baldr 09 Jun 2013, 08:56
tthsqe wrote:
Could you try some self-modifying code next?
Code:
RectContainsPoint:
        pop     eax ecx edx     ; eax = return address, ecx = point ptr, edx = rect ptr
        push    eax             ; get ready to fall through
        sub     edx, ecx
        mov     [disp32], edx
        mov     [tumbler], 0xF5 ; restore 'cmc' (F5h); a previous call left a 'nop' here
        or      eax, -1         ; assume it's in
        call    AoE?            ; x>=left?
        add     ecx, 4
        call    AoE?            ; y>=top?
        add     [disp32], 8     ; advance to right/bottom
        mov     [tumbler], 0x90 ; turn 'AoE?' into 'B?'
        call    B?              ; y<bottom?
        sub     ecx, 4
                                ; fall through; x<right?
AoE?:
B?:
        mov     edx, [ecx]
        cmp     edx, [ecx+128]  ; 128 gets patched
label disp32 dword at $-4
label tumbler byte
        cmc
        sbb     edx, edx
        and     eax, edx
        ret
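A C model of what each pass through AoE?/B? computes may make the trick easier to follow. This is a sketch, not a translation of the self-modifying code: check_mask and rect_contains_point_smc are hypothetical names, the patched "tumbler" byte becomes an ordinary parameter, and the arguments are plain dword arrays. From reading the code, the first stack argument appears to be the POINT pointer and the second the RECT pointer, and the comparisons are unsigned (cmp sets CF).

```c
#include <stdint.h>

/* One pass through AoE?/B?: cmp sets CF when a < b (unsigned compare),
   cmc flips CF unless the tumbler byte has been patched to nop, and
   sbb edx, edx turns CF into an all-ones (CF=1) or all-zeros mask. */
static uint32_t check_mask(uint32_t a, uint32_t b, int cmc_is_nop)
{
    uint32_t cf = (uint32_t)(a < b);      /* cmp edx, [ecx+disp32] */
    cf ^= (uint32_t)!cmc_is_nop;          /* cmc, or nop after patching */
    return 0u - cf;                       /* sbb edx, edx */
}

/* pt = {x, y}, rc = {left, top, right, bottom}, both as unsigned dwords. */
static uint32_t rect_contains_point_smc(const uint32_t pt[2], const uint32_t rc[4])
{
    uint32_t eax = UINT32_MAX;                  /* or eax, -1 */
    eax &= check_mask(pt[0], rc[0], 0);         /* x >= left */
    eax &= check_mask(pt[1], rc[1], 0);         /* y >= top */
    eax &= check_mask(pt[1], rc[3], 1);         /* y < bottom: disp += 8, cmc -> nop */
    eax &= check_mask(pt[0], rc[2], 1);         /* x < right: falls through to ret */
    return eax;                                 /* all ones inside, 0 outside */
}
```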
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.