flat assembler
Message board for the users of flat assembler.

 Index > Main > Quadrant of a point (x,y)
Author
r22

Joined: 27 Dec 2004
Posts: 805
r22
A formula that seems pretty obvious and simple, but for some reason I found interesting. A 2d plane has 4 quadrants, I, II, III, IV.

(+, +) = 1
(+, -) = 4
(-, +) = 2
(-, -) = 3

The obvious solution is a couple of IF statements. Alas, boring and not optimal, everyone knows branching is for the plebeians.

My usual solution is throw a LUT at it and replace the IF statements with shifts to get the sign bit. Seems reasonable...

Code:
```.code
MOV     eax, [esp+4]
MOV     ecx, [esp+8]
AND     ecx, 0x40000000
SHR     eax, 31
SHR     ecx, 30
MOVZX     eax, byte[Quad_LUT_LUT + eax]
RET     8

.data
```

But come on... a memory look up for something that can only be four possible values. And a LUT means you can't push the algorithm onto SIMD and get the quadrants for giant arrays of points at once (for some odd reason you might want to do this). At least it will work on floating point numbers... NOT GOOD ENOUGH!

XOR seems like a good place to start (the stuff you can do with well placed XORs is practically magic right?!)

Just XOR the signs

(+, +) = 0
(+, -) = 1
(-, +) = 1
(-, -) = 0

XORing the signs gets the EVEN answers equal to 1 and the ODD answers equal to zero.

For (+, +) we have 0 but we want 1
For (+, -) we have 1 but we want 4
For (-, +) we have 1 but we want 2
For (-, -) we have 0 but we want 3

GASP! when Y is positive we are only 1 number away from what we want. When Y is negative we are 3 away. If we subtract 1 from both problems we are 0 away when Y is positive and 2 away when Y is negative. For everyone not following, ADDing 2 0's together is zero and adding 2 1's together is 2.

Code:
```Quad:
MOV     eax, [esp+4] ;; x
MOV     ecx, [esp+8] ;; y
SHR     eax, 31
SHR     ecx, 31
XOR     eax, ecx
LEA     eax, [eax + ecx * 2 + 1]
RET     8
```

http://stackoverflow.com/questions/9718059/determining-the-quadrant-of-a-point I came up with the above staring at this question from StackOverflow. I was so enamored with my own ability to do simple binary maths that I had to post my answer on there as well.

It should go without saying, but no one is allowed to reply to this post (lest my ego be shattered) with the obligatory "What-his-name figured this simple trick out #-centuries ago" or "this has been on Wikipedia for #-years".

Last edited by r22 on 16 Mar 2012, 18:30; edited 1 time in total
15 Mar 2012, 16:19
edfed

Joined: 20 Feb 2006
Posts: 4267
Location: Now
edfed
franckly, this algo cannot be more optimized, otherwise intel (or amd) implement it in silicon.

but now it should be made in sse, to paralellise 4 quadrants computations. but if it is more than 24 instructions, it is pointless, but not with floating point.

thanks for this really cool trick, i am pretty sure i will need it one day for 2D and 3D computations.

in 3D, there are only 8 quadrants, and i suppose this test can really improve the 3D depth test, in order to determine who is behind who, first

cool
15 Mar 2012, 19:41
r22

Joined: 27 Dec 2004
Posts: 805
r22
@edfed - I think the 3d version for Octants would probably be less efficient than the LUT... (at least for non parallel/SIMD implementations)

According to the internet, 3d *octants* are.
Quote:

first octant (+, +, +)
top-back-right (−, +, +)
top-back-left (−, −, +)
top-front-left (+, −, +)
bottom-front-left (+, −, −)
bottom-back-left (−, −, −)
bottom-back-right (−, +, −)
bottom-front-right (+, +, −)

http://en.wikipedia.org/wiki/Octant#Octant_in_three-dimensional_space
16 Mar 2012, 14:02
bubach

Joined: 17 Sep 2004
Posts: 341
Location: TrollhÃ¤ttan, Sweden
bubach
excuse my ignorance, but what do you use this algorithm for? when is 2d x and y coordinates negative? :\$
17 Mar 2012, 02:30
LocoDelAssembly
Your code has a bug

Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly
(Hope you don't consider this a violation of your no-reply request)

Perhaps a more natural way to see what you have derived is just to construct a truth table of the required output (zero based) with the signs of x and y as inputs and then apply boolean algebra:
Code:
```x y o1 o0
0 0  0  0
0 1  1  1
1 0  0  1
1 1  1  0

o1 = x'y + xy = y(x' + x) = y
o0 = x'y + xy' ; i.e. o0 = x XOR y    ```
Then, the multiplication in LEA is just the bit shifting of o1 required to make the output a single number.
17 Mar 2012, 03:30
r22

Joined: 27 Dec 2004
Posts: 805
r22
@LocoDelAssembly - essentially, but I skipped the algebra part because it's hard...

@bubach - absolutely nothing, also you get negatives when the X or Y axis breaks and the numbers fall through the other side (but this sound more like mysticism than maths, so perhaps faith is required to have negative numbers).

* I guess you could use this algorithm to quickly generate 2d space quad trees. You'd just have to transform your x,y to current thresholds.
17 Mar 2012, 18:00
bubach

Joined: 17 Sep 2004
Posts: 341
Location: TrollhÃ¤ttan, Sweden
bubach
heheh, ok.. my first thought was collision detection in 2d games so you could see which way a collision came from..

well, i guess mathematics just for fun is good too.
17 Mar 2012, 22:07

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
I guess SSE is just:
Code:
```QuadSSE:
mov     eax, [esp+4] ;; arr_x
mov     ecx, [esp+8] ;; arr_y
mov     edx, [esp+12] ;; result
mov     ebx, [esp+16] ;; len (in bytes)
@@:
sub     ebx,16
jc      .exit
movdqa  xmm0,[eax]
movdqa  xmm1,[ecx]
pcmpeqd xmm2,xmm2
psrld   xmm0,31
psrld   xmm1,31
psrld   xmm2,31
pxor    xmm0,xmm1
psubd   xmm0,xmm2
movdqa  [edx],xmm0
jmp     @b
.exit:
ret     8
```

I tried retaining most of the registers although I wouldn't use it as is (a function) but inline it when needed. Usually in tight loops you need early exits and test results after every iteration.

I guess one optimization would be BT instead of MOV. That way only some ADC/SBB/SETC trickery is needed. I just can't figure it out yet.

EDIT: added incrementing of GPRs

Last edited by Madis731 on 18 Apr 2012, 17:57; edited 1 time in total
20 Mar 2012, 13:03
edfed

Joined: 20 Feb 2006
Posts: 4267
Location: Now
edfed
Code:
```xor eax,eax
bt [esp+4],31
bt [esp+8],31
xor al,ah
shl ah,1
or al,ah
movzx eax,al
```

???
20 Mar 2012, 19:44
r22

Joined: 27 Dec 2004
Posts: 805
r22
@madis - I'd likely pad the arrays out to 32 bytes, interleave the instructions and move the creation of the +1 constant out of the loop.

Code:
```QuadSSE:
mov     eax, [esp+4] ;; arr_x
mov     ecx, [esp+8] ;; arr_y
mov     edx, [esp+12] ;; result
mov     ebx, [esp+16] ;; len (in bytes)
pcmpeqd xmm4,xmm4 ;; -1
sub     ebx,32
jc      .exit
psrld   xmm4,31 ;; +1
@@:
movdqa  xmm0,[eax+ebx] ;; x
movdqa  xmm2,[eax+ebx+16] ;; x_2
psrld   xmm0,31 ;; x = x.sign
movdqa  xmm1,[ecx+ebx] ;; y
psrld   xmm2,31 ;; x_2 = x_2.sign
movdqa  xmm3,[ecx+ebx+16] ;; y_2
psrld   xmm1,31 ;; y = y.sign
psrld   xmm3,31 ;; y_2 = y_2.sign

pxor    xmm0,xmm1 ;;x = x xor y
pxor    xmm2,xmm3 ;;x_2 = x_2 xor y_2

paddd   xmm0,xmm4 ;;x = x + 1
psubd   xmm2,xmm4

paddd   xmm1,xmm1 ;;y = y * 2

paddd   xmm0,xmm1 ;;x = x + y

movdqa  [edx+ebx],xmm0
movdqa  [edx+ebx+16],xmm2

sub     ebx,32
jc      @b
.exit:
ret     16
```

something like that

Last edited by r22 on 17 Dec 2013, 16:16; edited 1 time in total
21 Mar 2012, 14:21
r22

Joined: 27 Dec 2004
Posts: 805
r22
I'm curious how fast the VPGATHER avx2 instruction would be, if it would make the LUT solution viable even in SIMD land.
21 Mar 2012, 14:25

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
I guess we'll be tracking quadrants really fast in 2013
Of course with AVX2 a lot of algorithms should be reviewed.
29 Mar 2012, 20:26
tripledot

Joined: 06 Jan 2009
Posts: 49
tripledot
@edfed:

Right with you. With all the current hype surrounding sparse octree voxel engines, this could be interesting. I don't have much faith in VPGATHER, but I have faith in your genius. Fancy working this out for 3d octants?
30 Mar 2012, 18:03
tripledot

Joined: 06 Jan 2009
Posts: 49
tripledot
Never mind...

Code:
```mov     eax, [esp+4]  ; x
mov     ebx, [esp+8]  ; y
mov     ecx, [esp+12] ; z
shr     eax, 31
shr     ebx, 31
shr     ecx, 31
xor     eax, ebx
xor     ebx, ecx
xor     eax, ecx
shl     ecx, 2
lea     eax, [eax+ebx*2+1]
```

Easy to SIMD, and probably still better than thrashing the cache.
18 Apr 2012, 08:04
tripledot

Joined: 06 Jan 2009
Posts: 49
tripledot
Erm, regarding the two vectorised examples posted by Madis731 and r22, have you forgotten to increment your pointers inside the loop?
18 Apr 2012, 10:00

Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
@tripledot: Yes, you are correct! Will fix mine.
18 Apr 2012, 17:54
aq83326

Joined: 25 Jun 2011
Posts: 21
aq83326
first pass at changing quad code to octant, have done no math or checking
Code:
```Octant:
MOV     eax, [esp+4] ;; x
MOV     ecx, [esp+8] ;; y
MOV     ebx, [esp+12] ;; z
SHR     eax, 31
SHR     ecx, 31
SHR     ebx, 31
XOR     eax, ecx
LEA     eax, [eax + ecx * 2 + 1]
SHL     ebx
RET     12
```
29 Apr 2012, 05:41
tripledot

Joined: 06 Jan 2009
Posts: 49
tripledot
@aq83326: See my earlier post for the octant version. Your code fails; you haven't taken the z-coordinate into account - for octants (1,2,3,4,5,6,7,8...) your code returns (1,2,3,4,4,3,2,1).
29 Apr 2012, 19:12
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum