flat assembler
Message board for the users of flat assembler.

Index > Main > Quadrant of a point (x,y)

Author
Thread Post new topic Reply to topic
r22



Joined: 27 Dec 2004
Posts: 805
r22 15 Mar 2012, 16:19
A formula that seems pretty obvious and simple, but for some reason I found interesting. A 2d plane has 4 quadrants, I, II, III, IV.

(+, +) = 1
(+, -) = 4
(-, +) = 2
(-, -) = 3

The obvious solution is a couple of IF statements. Alas, boring and not optimal, everyone knows branching is for the plebeians.

My usual solution is throw a LUT at it and replace the IF statements with shifts to get the sign bit. Seems reasonable...

Code:
.code
Quad_LUT:
        MOV     eax, [esp+4]
        MOV     ecx, [esp+8]
        AND     ecx, 0x40000000
        SHR     eax, 31
        SHR     ecx, 30
        ADD     eax, ecx
        MOVZX     eax, byte[Quad_LUT_LUT + eax]
        RET     8

.data
Quad_LUT_LUT:        db 1,2,4,3
    


But come on... a memory look up for something that can only be four possible values. And a LUT means you can't push the algorithm onto SIMD and get the quadrants for giant arrays of points at once (for some odd reason you might want to do this). At least it will work on floating point numbers... NOT GOOD ENOUGH!

XOR seems like a good place to start (the stuff you can do with well placed XORs is practically magic right?!)

Just XOR the signs

(+, +) = 0
(+, -) = 1
(-, +) = 1
(-, -) = 0

XORing the signs gets the EVEN answers equal to 1 and the ODD answers equal to zero.

For (+, +) we have 0 but we want 1
For (+, -) we have 1 but we want 4
For (-, +) we have 1 but we want 2
For (-, -) we have 0 but we want 3

GASP! when Y is positive we are only 1 number away from what we want. When Y is negative we are 3 away. If we subtract 1 from both problems we are 0 away when Y is positive and 2 away when Y is negative. For everyone not following, ADDing 2 0's together is zero and adding 2 1's together is 2.

Code:
Quad:
        MOV     eax, [esp+4] ;; x
        MOV     ecx, [esp+8] ;; y
        SHR     eax, 31
        SHR     ecx, 31
        XOR     eax, ecx
        LEA     eax, [eax + ecx * 2 + 1]
        RET     8  
    


http://stackoverflow.com/questions/9718059/determining-the-quadrant-of-a-point I came up with the above staring at this question from StackOverflow. I was so enamored with my own ability to do simple binary maths that I had to post my answer on there as well.

It should go without saying, but no one is allowed to reply to this post (lest my ego be shattered) with the obligatory "What-his-name figured this simple trick out #-centuries ago" or "this has been on Wikipedia for #-years".


Last edited by r22 on 16 Mar 2012, 18:30; edited 1 time in total
Post 15 Mar 2012, 16:19
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 15 Mar 2012, 19:41
franckly, this algo cannot be more optimized, otherwise intel (or amd) implement it in silicon. Smile

but now it should be made in sse, to paralellise 4 quadrants computations. but if it is more than 24 instructions, it is pointless, but not with floating point.

thanks for this really cool trick, i am pretty sure i will need it one day for 2D and 3D computations.

what about a 3D quadrant?
in 3D, there are only 8 quadrants, and i suppose this test can really improve the 3D depth test, in order to determine who is behind who, first

cool Smile
Post 15 Mar 2012, 19:41
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 16 Mar 2012, 14:02
@edfed - I think the 3d version for Octants would probably be less efficient than the LUT... (at least for non parallel/SIMD implementations)

According to the internet, 3d *octants* are.
Quote:

first octant (+, +, +)
top-back-right (−, +, +)
top-back-left (−, −, +)
top-front-left (+, −, +)
bottom-front-left (+, −, −)
bottom-back-left (−, −, −)
bottom-back-right (−, +, −)
bottom-front-right (+, +, −)

http://en.wikipedia.org/wiki/Octant#Octant_in_three-dimensional_space
Post 16 Mar 2012, 14:02
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
bubach



Joined: 17 Sep 2004
Posts: 341
Location: Trollhättan, Sweden
bubach 17 Mar 2012, 02:30
excuse my ignorance, but what do you use this algorithm for? when is 2d x and y coordinates negative? :$
Post 17 Mar 2012, 02:30
View user's profile Send private message Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 17 Mar 2012, 03:30
(Hope you don't consider this a violation of your no-reply request)

Perhaps a more natural way to see what you have derived is just to construct a truth table of the required output (zero based) with the signs of x and y as inputs and then apply boolean algebra:
Code:
x y o1 o0
0 0  0  0
0 1  1  1
1 0  0  1
1 1  1  0


o1 = x'y + xy = y(x' + x) = y
o0 = x'y + xy' ; i.e. o0 = x XOR y    
Then, the multiplication in LEA is just the bit shifting of o1 required to make the output a single number.
Post 17 Mar 2012, 03:30
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 17 Mar 2012, 18:00
@LocoDelAssembly - essentially, but I skipped the algebra part because it's hard...

@bubach - absolutely nothing, also you get negatives when the X or Y axis breaks and the numbers fall through the other side (but this sound more like mysticism than maths, so perhaps faith is required to have negative numbers).

* I guess you could use this algorithm to quickly generate 2d space quad trees. You'd just have to transform your x,y to current thresholds.
Post 17 Mar 2012, 18:00
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
bubach



Joined: 17 Sep 2004
Posts: 341
Location: Trollhättan, Sweden
bubach 17 Mar 2012, 22:07
heheh, ok.. my first thought was collision detection in 2d games so you could see which way a collision came from.. Laughing

well, i guess mathematics just for fun is good too. Smile
Post 17 Mar 2012, 22:07
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 20 Mar 2012, 13:03
I guess SSE is just:
Code:
QuadSSE:
        mov     eax, [esp+4] ;; arr_x
        mov     ecx, [esp+8] ;; arr_y
        mov     edx, [esp+12] ;; result
        mov     ebx, [esp+16] ;; len (in bytes)
      @@:
        sub     ebx,16
        jc      .exit
        movdqa  xmm0,[eax]
        movdqa  xmm1,[ecx]
        pcmpeqd xmm2,xmm2
        psrld   xmm0,31
        psrld   xmm1,31
        psrld   xmm2,31
        pxor    xmm0,xmm1
        paddd   xmm0,xmm1
        paddd   xmm0,xmm1
        psubd   xmm0,xmm2
        movdqa  [edx],xmm0
        add     eax,16
        add     ecx,16
        add     edx,16
        jmp     @b
      .exit:
        ret     8
    

I tried retaining most of the registers although I wouldn't use it as is (a function) but inline it when needed. Usually in tight loops you need early exits and test results after every iteration.

I guess one optimization would be BT instead of MOV. That way only some ADC/SBB/SETC trickery is needed. I just can't figure it out yet.

EDIT: added incrementing of GPRs


Last edited by Madis731 on 18 Apr 2012, 17:57; edited 1 time in total
Post 20 Mar 2012, 13:03
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4354
Location: Now
edfed 20 Mar 2012, 19:44
Code:
xor eax,eax
bt [esp+4],31
adc al,0
bt [esp+8],31
adc ah,0
xor al,ah
shl ah,1
or al,ah
movzx eax,al
    

???
Post 20 Mar 2012, 19:44
View user's profile Send private message Visit poster's website Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 21 Mar 2012, 14:21
@madis - I'd likely pad the arrays out to 32 bytes, interleave the instructions and move the creation of the +1 constant out of the loop.

Code:
QuadSSE:
        mov     eax, [esp+4] ;; arr_x
        mov     ecx, [esp+8] ;; arr_y
        mov     edx, [esp+12] ;; result
        mov     ebx, [esp+16] ;; len (in bytes)
        pcmpeqd xmm4,xmm4 ;; -1
        sub     ebx,32
        jc      .exit
        psrld   xmm4,31 ;; +1
      @@:
        movdqa  xmm0,[eax+ebx] ;; x
        movdqa  xmm2,[eax+ebx+16] ;; x_2
        psrld   xmm0,31 ;; x = x.sign
        movdqa  xmm1,[ecx+ebx] ;; y
        psrld   xmm2,31 ;; x_2 = x_2.sign
        movdqa  xmm3,[ecx+ebx+16] ;; y_2
        psrld   xmm1,31 ;; y = y.sign
        psrld   xmm3,31 ;; y_2 = y_2.sign

        pxor    xmm0,xmm1 ;;x = x xor y
        pxor    xmm2,xmm3 ;;x_2 = x_2 xor y_2

        paddd   xmm0,xmm4 ;;x = x + 1
        psubd   xmm2,xmm4

        paddd   xmm1,xmm1 ;;y = y * 2
        paddd   xmm3,xmm3

        paddd   xmm0,xmm1 ;;x = x + y
        paddd   xmm2,xmm3

        movdqa  [edx+ebx],xmm0
        movdqa  [edx+ebx+16],xmm2

        sub     ebx,32
        jc      @b
      .exit:
        ret     16
    

something like that


Last edited by r22 on 17 Dec 2013, 16:16; edited 1 time in total
Post 21 Mar 2012, 14:21
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 21 Mar 2012, 14:25
I'm curious how fast the VPGATHER avx2 instruction would be, if it would make the LUT solution viable even in SIMD land.
Post 21 Mar 2012, 14:25
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 29 Mar 2012, 20:26
I guess we'll be tracking quadrants really fast in 2013 Smile
Of course with AVX2 a lot of algorithms should be reviewed.
Post 29 Mar 2012, 20:26
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
tripledot



Joined: 06 Jan 2009
Posts: 49
tripledot 30 Mar 2012, 18:03
@edfed:

Right with you. With all the current hype surrounding sparse octree voxel engines, this could be interesting. I don't have much faith in VPGATHER, but I have faith in your genius. Fancy working this out for 3d octants? Wink
Post 30 Mar 2012, 18:03
View user's profile Send private message Reply with quote
tripledot



Joined: 06 Jan 2009
Posts: 49
tripledot 18 Apr 2012, 08:04
Never mind...

Code:
mov     eax, [esp+4]  ; x
mov     ebx, [esp+8]  ; y
mov     ecx, [esp+12] ; z
shr     eax, 31
shr     ebx, 31
shr     ecx, 31
xor     eax, ebx
xor     ebx, ecx
xor     eax, ecx
shl     ecx, 2
lea     eax, [eax+ebx*2+1]
add     eax, ecx
    


Easy to SIMD, and probably still better than thrashing the cache.
Post 18 Apr 2012, 08:04
View user's profile Send private message Reply with quote
tripledot



Joined: 06 Jan 2009
Posts: 49
tripledot 18 Apr 2012, 10:00
Erm, regarding the two vectorised examples posted by Madis731 and r22, have you forgotten to increment your pointers inside the loop?
Post 18 Apr 2012, 10:00
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 18 Apr 2012, 17:54
@tripledot: Yes, you are correct! Will fix mine.
Post 18 Apr 2012, 17:54
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
aq83326



Joined: 25 Jun 2011
Posts: 21
aq83326 29 Apr 2012, 05:41
first pass at changing quad code to octant, have done no math or checking
Code:
Octant:
        MOV     eax, [esp+4] ;; x
        MOV     ecx, [esp+8] ;; y
        MOV     ebx, [esp+12] ;; z
        SHR     eax, 31
        SHR     ecx, 31
        SHR     ebx, 31
        XOR     eax, ecx
        LEA     eax, [eax + ecx * 2 + 1]
        SHL     ebx 
        RET     12
    
Post 29 Apr 2012, 05:41
View user's profile Send private message Visit poster's website Reply with quote
tripledot



Joined: 06 Jan 2009
Posts: 49
tripledot 29 Apr 2012, 19:12
@aq83326: See my earlier post for the octant version. Your code fails; you haven't taken the z-coordinate into account - for octants (1,2,3,4,5,6,7,8...) your code returns (1,2,3,4,4,3,2,1).
Post 29 Apr 2012, 19:12
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.