flat assembler
Message board for the users of flat assembler.

Index > Main > Indirect addressing

baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
Sasha,

Optimization and readability are somewhat opposite (orthogonal at least), but you may mend this with proper comments. Let's consider unsigned coordinates (and eax==-1 for indication of success):
Code:
RectContainsPoint:
        pop     edx ecx eax     ; retaddr, pRect, pPoint
        xchg    esi, eax        ; prepare for 'cmps',
        xchg    edi, ecx        ; saving esi & edi too
        cmpsd
        jb      .notCF          ; x<left? return !CF
        cmpsd
        jb      .notCF          ; y<top? return !CF
        sub     esi, 8          ; get back to the point
        cmpsd
        jnb     .CF             ; x>=right? return CF
        cmpsd
        db      0x8B            ; skip 'cmc'; actually 'mov esi, ebp'
.notCF: cmc
.CF:    mov     esi, eax        ; restore esi & edi
        mov     edi, ecx        ; to conform with 'stdcall'
        sbb     eax, eax        ; transfer CF into eax; drop it if CF alone suffices
        jmp     edx    
xor/dec is fine, but it can stall execution later, due to partial EFLAGS update by dec.
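As an illustration of the remark above, and assuming 'xor/dec' refers to the xor eax, eax / dec eax pair for producing -1, here are three ways to put -1 in eax and what each does to the flags. Byte counts are for 32-bit code; no timings are claimed, and whether the partial flags update matters depends on the CPU and on what reads the flags next.

```asm
; Three ways to leave -1 ("true") in eax; fasm syntax, 32-bit code.

        xor     eax, eax        ; eax = 0, CF and the other status flags are written
        dec     eax             ; eax = -1, but 'dec' leaves CF untouched
                                ; (a partial EFLAGS update); 2+1 bytes

        or      eax, -1         ; eax = -1, CF is written too; 3 bytes,
                                ; but it depends on the old value of eax

        mov     eax, -1         ; eax = -1, no flags touched; 5 bytes
```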
Post 07 Jun 2013, 03:04
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
CMPSD is a nice trick!
It works because POINT members are aligned (by meaning) with RECT members. Very cool!
The CPU manuals point out, however, that a CALL should be paired with a RET for better performance.
But still, baldr -- you're an awesome coder!!
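A sketch of what pairing CALL with RET could look like for this routine. The name RectContainsPointRet is made up for illustration; the contract is assumed to match baldr's routine (stdcall, pRect at [esp+4], pPoint at [esp+8], unsigned coordinates, eax = -1 when the point is inside). It gives up the cmpsd trick and is longer, but it ends in ret 8, so the caller's CALL is matched by a RET.

```asm
; Hypothetical variant: plain compares, returns with 'ret 8'.
; RECT = left, top, right, bottom (dwords); POINT = x, y (dwords).
RectContainsPointRet:
        mov     ecx, [esp+4]    ; ecx = pRect
        mov     edx, [esp+8]    ; edx = pPoint
        mov     eax, [edx]      ; eax = x
        cmp     eax, [ecx]      ; x < left?
        jb      .out
        cmp     eax, [ecx+8]    ; x >= right?
        jae     .out
        mov     eax, [edx+4]    ; eax = y
        cmp     eax, [ecx+4]    ; y < top?
        jb      .out
        cmp     eax, [ecx+12]   ; y >= bottom?
        jae     .out
        or      eax, -1         ; all four tests passed: inside
        ret     8               ; pop both arguments
.out:   xor     eax, eax        ; outside
        ret     8
```

Only eax, ecx and edx are used, so nothing has to be saved to stay within stdcall.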
Post 07 Jun 2013, 15:24
Sasha



Joined: 17 Nov 2011
Posts: 93
Sasha
How many things can be learned from this short example!
Post 07 Jun 2013, 16:15
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
Yes, it is true.

However, I just want to point out that although code should be optimized, the time spent on optimization
is sometimes not worth the gain you get from it.
So do not fall into the trap of premature optimization.
I think optimization should be reserved for cases where the code processes a lot of data.
Post 07 Jun 2013, 17:12
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17247
Location: In your JS exploiting you and your system
revolution
baldr wrote:
Optimization and readability are somewhat opposite (orthogonal at least), but you may mend this with proper comments. Let's consider unsigned coordinates (and eax==-1 for indication of success):
Code:
RectContainsPoint:
        pop     edx ecx eax     ; retaddr, pRect, pPoint
        xchg    esi, eax        ; prepare for 'cmps',
        xchg    edi, ecx        ; saving esi & edi too
        cmpsd
        jb      .notCF          ; x<left? return !CF
        cmpsd
        jb      .notCF          ; y<top? return !CF
        sub     esi, 8          ; get back to the point
        cmpsd
        jnb     .CF             ; x>=right? return CF
        cmpsd
        db      0x8B            ; skip 'cmc'; actually 'mov esi, ebp'
.notCF: cmc
.CF:    mov     esi, eax        ; restore esi & edi
        mov     edi, ecx        ; to conform with 'stdcall'
        sbb     eax, eax        ; transfer CF into eax; drop it if CF alone suffices
        jmp     edx    
Can we assume this is optimised for size? Because I can't see how this could be considered optimised for anything else. It certainly is not going to be particularly efficient with use of cmpsd and jmp edx. And I wonder how the CPU will cope with the decoder having to redo the instruction boundary computations when you jump directly into the middle of the 'mov esi, ebp' instruction!
baldr wrote:
xor/dec is fine, but it can stall execution later, due to partial EFLAGS update by dec.
Okay, you are probably correct here for most contemporary CPUs in use today, but now you seem to have jumped into an optimise-for-speed mindset. And even then I seriously doubt whether any difference could be measurable between the two options. Aside from some very special purpose test code to show some minor difference such things are going to be inconsequential in the larger frame of reference.
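For readers puzzling over the 'db 0x8B' line: the fall-through path and the .notCF path decode the same bytes differently. The comments below show the byte layout, followed by a sketch of the same routine with an ordinary short jump in place of the overlap. The name RectContainsPointPlain is made up; the variant is one byte longer and makes no claim about speed.

```asm
; Tail of the original routine, byte by byte:
;   A7        cmpsd             ; fourth compare (y against bottom)
;   8B F5     mov esi, ebp      ; fall-through path: 'db 0x8B' swallows the F5
;      F5     cmc               ; .notCF points at this second byte
; esi is reloaded from eax right afterwards, so the stray 'mov' is harmless.

RectContainsPointPlain:
        pop     edx ecx eax     ; retaddr, pRect, pPoint
        xchg    esi, eax        ; esi = pPoint, old esi kept in eax
        xchg    edi, ecx        ; edi = pRect,  old edi kept in ecx
        cmpsd
        jb      .notCF          ; x<left? return !CF
        cmpsd
        jb      .notCF          ; y<top? return !CF
        sub     esi, 8          ; get back to the point
        cmpsd
        jnb     .CF             ; x>=right? return CF
        cmpsd
        jmp     .CF             ; EB 01: explicit skip over 'cmc'
.notCF: cmc
.CF:    mov     esi, eax        ; restore esi & edi
        mov     edi, ecx
        sbb     eax, eax        ; eax = -1 if inside, 0 otherwise
        jmp     edx
```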
Post 07 Jun 2013, 18:01
bitRAKE



Joined: 21 Jul 2003
Posts: 2887
Location: [RSP+8*5]
bitRAKE
Size optimization would use the API, and speed optimization wouldn't have a branch.

_________________
¯\(°_o)/¯ unlicense.org
Post 07 Jun 2013, 18:29
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
I wonder how it can be done with no branches.
We need at most 4 comparisons, so control somehow has to pass
to the next comparison whenever the previous ones have not settled the result yet.
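One possible answer, as a sketch: let every comparison produce a 0/-1 mask with sbb and combine the masks with and, so that no intermediate result has to steer control flow. The routine name is hypothetical; the contract is assumed to be the same as baldr's (stdcall, pRect at [esp+4], pPoint at [esp+8], unsigned coordinates, eax = -1 when the point is inside).

```asm
; Hypothetical branchless variant: all four compares always execute.
RectContainsPointNoBranch:
        mov     ecx, [esp+4]    ; ecx = pRect  (left, top, right, bottom)
        mov     edx, [esp+8]    ; edx = pPoint (x, y)
        push    ebx             ; callee-saved scratch register
        mov     ebx, [edx]      ; ebx = x
        cmp     ebx, [ecx]      ; CF = (x < left)
        sbb     eax, eax        ; eax = -1 if x < left, else 0
        not     eax             ; eax = -1 if x >= left
        cmp     ebx, [ecx+8]    ; CF = (x < right)
        sbb     ebx, ebx        ; ebx = -1 if x < right
        and     eax, ebx        ; eax = -1 if left <= x < right
        mov     ebx, [edx+4]    ; ebx = y
        cmp     ebx, [ecx+4]    ; CF = (y < top)
        sbb     edx, edx        ; edx = -1 if y < top (pPoint no longer needed)
        not     edx             ; edx = -1 if y >= top
        and     eax, edx
        cmp     ebx, [ecx+12]   ; CF = (y < bottom)
        sbb     ebx, ebx        ; ebx = -1 if y < bottom
        and     eax, ebx        ; eax = -1 only if all four tests hold
        pop     ebx
        ret     8
```

The price is that nothing is skipped: a point that fails the first test still pays for the other three.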
Post 07 Jun 2013, 19:17
tthsqe



Joined: 20 May 2009
Posts: 724
tthsqe
Code:
Point:
  .x       dd ?
  .y       dd ?

Rect:
  .left    dd ?
  .top     dd ?
  .right   dd ?
  .bottom  dd ?

IsPointInRectangle:     ; return 1 for true (signed coordinates)
        movaps   xmm0,[Rect]    ; xmm0 = l   | t   | r   | b    (Rect must be 16-byte aligned)
        movddup  xmm1,[Point]   ; xmm1 = x   | y   | x   | y
        pcmpgtd  xmm0,xmm1      ; xmm0 = l>x | t>y | r>x | b>y
        movmskps ecx,xmm0       ; ecx  = l>x | t>y | r>x | b>y  (in 4 bits, l>x is bit 0)
        mov      eax,ecx
        shr      ecx,2          ; ecx  = r>x | b>y              (in 2 bits)
        xor      eax,ecx
        and      eax,3          ; eax  = l <= x < r | t <= y < b  (in 2 bits, given l <= r and t <= b)
        cmp      eax,3
        sete     al             ; eax was 0..3, so eax ends up 1 or 0
        ret

I got this from the pristine code at Redmond. :lol:

EDIT: comment on eax was wrong. fixed it now.


Last edited by tthsqe on 07 Jun 2013, 19:55; edited 1 time in total
Post 07 Jun 2013, 19:36
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
Awesome!!
Post 07 Jun 2013, 19:53
bitRAKE



Joined: 21 Jul 2003
Posts: 2887
Location: [RSP+8*5]
bitRAKE
That's odd how they mixed integer and floating point instructions - there really is no need, as it can be done with just integer SIMD. Not sure if the mixing penalty (pg 117) applies in this case.

Edit: Oh, they prolly wanted to just use plain SSE. Nah,

Edit, edit: MOVDDUP is SSE3.
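A sketch of what an integer-only version might look like, using nothing beyond SSE2 integer instructions. It assumes the Point and Rect labels from tthsqe's listing and signed coordinates (pcmpgtd is a signed compare); the routine name is made up. movdqu is used so that Rect does not have to be 16-byte aligned.

```asm
; Hypothetical integer-only SSE2 variant.
IsPointInRectangleInt:                  ; returns eax = 1 if inside, 0 otherwise
        movdqu   xmm0, [Rect]           ; xmm0 = l   | t   | r   | b
        movq     xmm1, qword [Point]    ; xmm1 = x   | y   | 0   | 0
        pshufd   xmm1, xmm1, 0x44       ; xmm1 = x   | y   | x   | y
        pcmpgtd  xmm0, xmm1             ; xmm0 = l>x | t>y | r>x | b>y  (all ones or zero per dword)
        pmovmskb eax, xmm0              ; 4 mask bits per dword, 16 bits in total
        cmp      eax, 0xFF00            ; inside: l>x and t>y false, r>x and b>y true
        sete     al
        movzx    eax, al                ; clear the leftover mask bits
        ret
```

Comparing the whole mask against 0xFF00 replaces the shr/xor/and sequence and also rejects rectangles with left > right or top > bottom.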

_________________
¯\(°_o)/¯ unlicense.org


Last edited by bitRAKE on 07 Jun 2013, 20:53; edited 2 times in total
Post 07 Jun 2013, 20:13
tthsqe



Joined: 20 May 2009
Posts: 724
tthsqe
Oh man - I thought it was clear I was joking. M$ does the straightforward 4x cmp+branch. I just cooked this up as a branchless example.
Post 07 Jun 2013, 20:21
bitRAKE



Joined: 21 Jul 2003
Posts: 2887
Location: [RSP+8*5]
bitRAKE
Kudos to you then. I'm quite dry this early in the day - that went right over my head. :D
Post 07 Jun 2013, 20:26
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
revolution wrote:
Can we assume this is optimised for size? Because I can't see how this could be considered optimised for anything else.
It was optimized for minimal readability. ;)
revolution wrote:
Okay, you are probably correct here for most contemporary CPUs in use today, but now you seem to have jumped into an optimise-for-speed mindset. And even then I seriously doubt whether any difference could be measurable between the two options. Aside from some very special purpose test code to show some minor difference such things are going to be inconsequential in the larger frame of reference.
I didn't write that it must or should, it simply can. And yes, I'm good with jumping. ;)
Post 08 Jun 2013, 02:17
tthsqe



Joined: 20 May 2009
Posts: 724
tthsqe
baldr,
Your solution for RectContainsPoint is very funny. I especially like the jump into the middle of an already-decoded instruction. Could you try some self-modifying code next?
Post 08 Jun 2013, 02:28
baldr



Joined: 19 Mar 2008
Posts: 1651
baldr
tthsqe wrote:
Could you try some self-modifying code next?
As you wish, sire! ;)
Code:
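RectContainsPoint:                      ; needs a writable code section
        pop     eax ecx edx             ; retaddr, pPoint, pRect (note the order)
        push    eax                     ; get ready to fall through
        sub     edx, ecx                ; edx = pRect - pPoint
        mov     [disp32], edx
        mov     [tumbler], 0xF5         ; put 'cmc' back; a previous call left 'nop' here
        or      eax, -1                 ; assume it's in
        call    AoE?                    ; x>=left?
        add     ecx, 4
        call    AoE?                    ; y>=top?
        add     [disp32], 8             ; advance to right/bottom
        mov     [tumbler], 0x90         ; turn 'AoE?' into 'B?'
        call    B?                      ; y<bottom?
        sub     ecx, 4
                                        ; fall through; x<right?
AoE?: B?:
        mov     edx, [ecx]              ; point coordinate
        cmp     edx, [ecx+128]          ; 128 gets patched to reach the rect member
label disp32 dword at $-4
label tumbler byte
        cmc                             ; AoE?: CF = above-or-equal; as 'nop', B?: CF = below
        sbb     edx, edx                ; edx = -1 if this test passed
        and     eax, edx
        ret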
Post 09 Jun 2013, 08:56

Copyright © 1999-2020, Tomasz Grysztar.
