flat assembler
Message board for the users of flat assembler.
Index
> Main > Quadrant of a point (x,y) 
Author 

r22 15 Mar 2012, 16:19
A formula that seems pretty obvious and simple, but for some reason I found interesting. A 2d plane has 4 quadrants, I, II, III, IV.
(+, +) = 1 (+, ) = 4 (, +) = 2 (, ) = 3 The obvious solution is a couple of IF statements. Alas, boring and not optimal, everyone knows branching is for the plebeians. My usual solution is throw a LUT at it and replace the IF statements with shifts to get the sign bit. Seems reasonable... Code: .code Quad_LUT: MOV eax, [esp+4] MOV ecx, [esp+8] AND ecx, 0x40000000 SHR eax, 31 SHR ecx, 30 ADD eax, ecx MOVZX eax, byte[Quad_LUT_LUT + eax] RET 8 .data Quad_LUT_LUT: db 1,2,4,3 But come on... a memory look up for something that can only be four possible values. And a LUT means you can't push the algorithm onto SIMD and get the quadrants for giant arrays of points at once (for some odd reason you might want to do this). At least it will work on floating point numbers... NOT GOOD ENOUGH! XOR seems like a good place to start (the stuff you can do with well placed XORs is practically magic right?!) Just XOR the signs (+, +) = 0 (+, ) = 1 (, +) = 1 (, ) = 0 XORing the signs gets the EVEN answers equal to 1 and the ODD answers equal to zero. For (+, +) we have 0 but we want 1 For (+, ) we have 1 but we want 4 For (, +) we have 1 but we want 2 For (, ) we have 0 but we want 3 GASP! when Y is positive we are only 1 number away from what we want. When Y is negative we are 3 away. If we subtract 1 from both problems we are 0 away when Y is positive and 2 away when Y is negative. For everyone not following, ADDing 2 0's together is zero and adding 2 1's together is 2. Code: Quad: MOV eax, [esp+4] ;; x MOV ecx, [esp+8] ;; y SHR eax, 31 SHR ecx, 31 XOR eax, ecx LEA eax, [eax + ecx * 2 + 1] RET 8 http://stackoverflow.com/questions/9718059/determiningthequadrantofapoint I came up with the above staring at this question from StackOverflow. I was so enamored with my own ability to do simple binary maths that I had to post my answer on there as well. It should go without saying, but no one is allowed to reply to this post (lest my ego be shattered) with the obligatory "Whathisname figured this simple trick out #centuries ago" or "this has been on Wikipedia for #years". Last edited by r22 on 16 Mar 2012, 18:30; edited 1 time in total 

15 Mar 2012, 16:19 

r22 16 Mar 2012, 14:02
@edfed  I think the 3d version for Octants would probably be less efficient than the LUT... (at least for non parallel/SIMD implementations)
According to the internet, 3d *octants* are. Quote:
http://en.wikipedia.org/wiki/Octant#Octant_in_threedimensional_space 

16 Mar 2012, 14:02 

bubach 17 Mar 2012, 02:30
excuse my ignorance, but what do you use this algorithm for? when is 2d x and y coordinates negative? :$


17 Mar 2012, 02:30 

LocoDelAssembly 17 Mar 2012, 03:30
(Hope you don't consider this a violation of your noreply request)
Perhaps a more natural way to see what you have derived is just to construct a truth table of the required output (zero based) with the signs of x and y as inputs and then apply boolean algebra: Code: x y o1 o0 0 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 o1 = x'y + xy = y(x' + x) = y o0 = x'y + xy' ; i.e. o0 = x XOR y 

17 Mar 2012, 03:30 

r22 17 Mar 2012, 18:00
@LocoDelAssembly  essentially, but I skipped the algebra part because it's hard...
@bubach  absolutely nothing, also you get negatives when the X or Y axis breaks and the numbers fall through the other side (but this sound more like mysticism than maths, so perhaps faith is required to have negative numbers). * I guess you could use this algorithm to quickly generate 2d space quad trees. You'd just have to transform your x,y to current thresholds. 

17 Mar 2012, 18:00 

bubach 17 Mar 2012, 22:07
heheh, ok.. my first thought was collision detection in 2d games so you could see which way a collision came from..
well, i guess mathematics just for fun is good too. 

17 Mar 2012, 22:07 

Madis731 20 Mar 2012, 13:03
I guess SSE is just:
Code: QuadSSE: mov eax, [esp+4] ;; arr_x mov ecx, [esp+8] ;; arr_y mov edx, [esp+12] ;; result mov ebx, [esp+16] ;; len (in bytes) @@: sub ebx,16 jc .exit movdqa xmm0,[eax] movdqa xmm1,[ecx] pcmpeqd xmm2,xmm2 psrld xmm0,31 psrld xmm1,31 psrld xmm2,31 pxor xmm0,xmm1 paddd xmm0,xmm1 paddd xmm0,xmm1 psubd xmm0,xmm2 movdqa [edx],xmm0 add eax,16 add ecx,16 add edx,16 jmp @b .exit: ret 8 I tried retaining most of the registers although I wouldn't use it as is (a function) but inline it when needed. Usually in tight loops you need early exits and test results after every iteration. I guess one optimization would be BT instead of MOV. That way only some ADC/SBB/SETC trickery is needed. I just can't figure it out yet. EDIT: added incrementing of GPRs Last edited by Madis731 on 18 Apr 2012, 17:57; edited 1 time in total 

20 Mar 2012, 13:03 

edfed 20 Mar 2012, 19:44
Code: xor eax,eax bt [esp+4],31 adc al,0 bt [esp+8],31 adc ah,0 xor al,ah shl ah,1 or al,ah movzx eax,al ??? 

20 Mar 2012, 19:44 

r22 21 Mar 2012, 14:21
@madis  I'd likely pad the arrays out to 32 bytes, interleave the instructions and move the creation of the +1 constant out of the loop.
Code: QuadSSE: mov eax, [esp+4] ;; arr_x mov ecx, [esp+8] ;; arr_y mov edx, [esp+12] ;; result mov ebx, [esp+16] ;; len (in bytes) pcmpeqd xmm4,xmm4 ;; 1 sub ebx,32 jc .exit psrld xmm4,31 ;; +1 @@: movdqa xmm0,[eax+ebx] ;; x movdqa xmm2,[eax+ebx+16] ;; x_2 psrld xmm0,31 ;; x = x.sign movdqa xmm1,[ecx+ebx] ;; y psrld xmm2,31 ;; x_2 = x_2.sign movdqa xmm3,[ecx+ebx+16] ;; y_2 psrld xmm1,31 ;; y = y.sign psrld xmm3,31 ;; y_2 = y_2.sign pxor xmm0,xmm1 ;;x = x xor y pxor xmm2,xmm3 ;;x_2 = x_2 xor y_2 paddd xmm0,xmm4 ;;x = x + 1 psubd xmm2,xmm4 paddd xmm1,xmm1 ;;y = y * 2 paddd xmm3,xmm3 paddd xmm0,xmm1 ;;x = x + y paddd xmm2,xmm3 movdqa [edx+ebx],xmm0 movdqa [edx+ebx+16],xmm2 sub ebx,32 jc @b .exit: ret 16 something like that Last edited by r22 on 17 Dec 2013, 16:16; edited 1 time in total 

21 Mar 2012, 14:21 

r22 21 Mar 2012, 14:25
I'm curious how fast the VPGATHER avx2 instruction would be, if it would make the LUT solution viable even in SIMD land.


21 Mar 2012, 14:25 

Madis731 29 Mar 2012, 20:26
I guess we'll be tracking quadrants really fast in 2013
Of course with AVX2 a lot of algorithms should be reviewed. 

29 Mar 2012, 20:26 

tripledot 30 Mar 2012, 18:03
@edfed:
Right with you. With all the current hype surrounding sparse octree voxel engines, this could be interesting. I don't have much faith in VPGATHER, but I have faith in your genius. Fancy working this out for 3d octants? 

30 Mar 2012, 18:03 

tripledot 18 Apr 2012, 08:04
Never mind...
Code: mov eax, [esp+4] ; x mov ebx, [esp+8] ; y mov ecx, [esp+12] ; z shr eax, 31 shr ebx, 31 shr ecx, 31 xor eax, ebx xor ebx, ecx xor eax, ecx shl ecx, 2 lea eax, [eax+ebx*2+1] add eax, ecx Easy to SIMD, and probably still better than thrashing the cache. 

18 Apr 2012, 08:04 

tripledot 18 Apr 2012, 10:00
Erm, regarding the two vectorised examples posted by Madis731 and r22, have you forgotten to increment your pointers inside the loop?


18 Apr 2012, 10:00 

Madis731 18 Apr 2012, 17:54
@tripledot: Yes, you are correct! Will fix mine.


18 Apr 2012, 17:54 

aq83326 29 Apr 2012, 05:41
first pass at changing quad code to octant, have done no math or checking
Code: Octant: MOV eax, [esp+4] ;; x MOV ecx, [esp+8] ;; y MOV ebx, [esp+12] ;; z SHR eax, 31 SHR ecx, 31 SHR ebx, 31 XOR eax, ecx LEA eax, [eax + ecx * 2 + 1] SHL ebx RET 12 

29 Apr 2012, 05:41 

tripledot 29 Apr 2012, 19:12
@aq83326: See my earlier post for the octant version. Your code fails; you haven't taken the zcoordinate into account  for octants (1,2,3,4,5,6,7,8...) your code returns (1,2,3,4,4,3,2,1).


29 Apr 2012, 19:12 

< Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.