flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
r22 15 Mar 2012, 16:19
A formula that seems pretty obvious and simple, but for some reason I found interesting. A 2d plane has 4 quadrants, I, II, III, IV.
(+, +) = 1 (+, -) = 4 (-, +) = 2 (-, -) = 3 The obvious solution is a couple of IF statements. Alas, boring and not optimal, everyone knows branching is for the plebeians. My usual solution is throw a LUT at it and replace the IF statements with shifts to get the sign bit. Seems reasonable... Code: .code Quad_LUT: MOV eax, [esp+4] MOV ecx, [esp+8] AND ecx, 0x40000000 SHR eax, 31 SHR ecx, 30 ADD eax, ecx MOVZX eax, byte[Quad_LUT_LUT + eax] RET 8 .data Quad_LUT_LUT: db 1,2,4,3 But come on... a memory look up for something that can only be four possible values. And a LUT means you can't push the algorithm onto SIMD and get the quadrants for giant arrays of points at once (for some odd reason you might want to do this). At least it will work on floating point numbers... NOT GOOD ENOUGH! XOR seems like a good place to start (the stuff you can do with well placed XORs is practically magic right?!) Just XOR the signs (+, +) = 0 (+, -) = 1 (-, +) = 1 (-, -) = 0 XORing the signs gets the EVEN answers equal to 1 and the ODD answers equal to zero. For (+, +) we have 0 but we want 1 For (+, -) we have 1 but we want 4 For (-, +) we have 1 but we want 2 For (-, -) we have 0 but we want 3 GASP! when Y is positive we are only 1 number away from what we want. When Y is negative we are 3 away. If we subtract 1 from both problems we are 0 away when Y is positive and 2 away when Y is negative. For everyone not following, ADDing 2 0's together is zero and adding 2 1's together is 2. Code: Quad: MOV eax, [esp+4] ;; x MOV ecx, [esp+8] ;; y SHR eax, 31 SHR ecx, 31 XOR eax, ecx LEA eax, [eax + ecx * 2 + 1] RET 8 http://stackoverflow.com/questions/9718059/determining-the-quadrant-of-a-point I came up with the above staring at this question from StackOverflow. I was so enamored with my own ability to do simple binary maths that I had to post my answer on there as well. It should go without saying, but no one is allowed to reply to this post (lest my ego be shattered) with the obligatory "What-his-name figured this simple trick out #-centuries ago" or "this has been on Wikipedia for #-years". Last edited by r22 on 16 Mar 2012, 18:30; edited 1 time in total |
|||
![]() |
|
r22 16 Mar 2012, 14:02
@edfed - I think the 3d version for Octants would probably be less efficient than the LUT... (at least for non parallel/SIMD implementations)
According to the internet, 3d *octants* are. Quote:
http://en.wikipedia.org/wiki/Octant#Octant_in_three-dimensional_space |
|||
![]() |
|
bubach 17 Mar 2012, 02:30
excuse my ignorance, but what do you use this algorithm for? when is 2d x and y coordinates negative? :$
|
|||
![]() |
|
LocoDelAssembly 17 Mar 2012, 03:30
(Hope you don't consider this a violation of your no-reply request)
Perhaps a more natural way to see what you have derived is just to construct a truth table of the required output (zero based) with the signs of x and y as inputs and then apply boolean algebra: Code: x y o1 o0 0 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 o1 = x'y + xy = y(x' + x) = y o0 = x'y + xy' ; i.e. o0 = x XOR y |
|||
![]() |
|
r22 17 Mar 2012, 18:00
@LocoDelAssembly - essentially, but I skipped the algebra part because it's hard...
@bubach - absolutely nothing, also you get negatives when the X or Y axis breaks and the numbers fall through the other side (but this sound more like mysticism than maths, so perhaps faith is required to have negative numbers). * I guess you could use this algorithm to quickly generate 2d space quad trees. You'd just have to transform your x,y to current thresholds. |
|||
![]() |
|
bubach 17 Mar 2012, 22:07
heheh, ok.. my first thought was collision detection in 2d games so you could see which way a collision came from..
![]() well, i guess mathematics just for fun is good too. ![]() |
|||
![]() |
|
Madis731 20 Mar 2012, 13:03
I guess SSE is just:
Code: QuadSSE: mov eax, [esp+4] ;; arr_x mov ecx, [esp+8] ;; arr_y mov edx, [esp+12] ;; result mov ebx, [esp+16] ;; len (in bytes) @@: sub ebx,16 jc .exit movdqa xmm0,[eax] movdqa xmm1,[ecx] pcmpeqd xmm2,xmm2 psrld xmm0,31 psrld xmm1,31 psrld xmm2,31 pxor xmm0,xmm1 paddd xmm0,xmm1 paddd xmm0,xmm1 psubd xmm0,xmm2 movdqa [edx],xmm0 add eax,16 add ecx,16 add edx,16 jmp @b .exit: ret 8 I tried retaining most of the registers although I wouldn't use it as is (a function) but inline it when needed. Usually in tight loops you need early exits and test results after every iteration. I guess one optimization would be BT instead of MOV. That way only some ADC/SBB/SETC trickery is needed. I just can't figure it out yet. EDIT: added incrementing of GPRs Last edited by Madis731 on 18 Apr 2012, 17:57; edited 1 time in total |
|||
![]() |
|
edfed 20 Mar 2012, 19:44
Code: xor eax,eax bt [esp+4],31 adc al,0 bt [esp+8],31 adc ah,0 xor al,ah shl ah,1 or al,ah movzx eax,al ??? |
|||
![]() |
|
r22 21 Mar 2012, 14:21
@madis - I'd likely pad the arrays out to 32 bytes, interleave the instructions and move the creation of the +1 constant out of the loop.
Code: QuadSSE: mov eax, [esp+4] ;; arr_x mov ecx, [esp+8] ;; arr_y mov edx, [esp+12] ;; result mov ebx, [esp+16] ;; len (in bytes) pcmpeqd xmm4,xmm4 ;; -1 sub ebx,32 jc .exit psrld xmm4,31 ;; +1 @@: movdqa xmm0,[eax+ebx] ;; x movdqa xmm2,[eax+ebx+16] ;; x_2 psrld xmm0,31 ;; x = x.sign movdqa xmm1,[ecx+ebx] ;; y psrld xmm2,31 ;; x_2 = x_2.sign movdqa xmm3,[ecx+ebx+16] ;; y_2 psrld xmm1,31 ;; y = y.sign psrld xmm3,31 ;; y_2 = y_2.sign pxor xmm0,xmm1 ;;x = x xor y pxor xmm2,xmm3 ;;x_2 = x_2 xor y_2 paddd xmm0,xmm4 ;;x = x + 1 psubd xmm2,xmm4 paddd xmm1,xmm1 ;;y = y * 2 paddd xmm3,xmm3 paddd xmm0,xmm1 ;;x = x + y paddd xmm2,xmm3 movdqa [edx+ebx],xmm0 movdqa [edx+ebx+16],xmm2 sub ebx,32 jc @b .exit: ret 16 something like that Last edited by r22 on 17 Dec 2013, 16:16; edited 1 time in total |
|||
![]() |
|
r22 21 Mar 2012, 14:25
I'm curious how fast the VPGATHER avx2 instruction would be, if it would make the LUT solution viable even in SIMD land.
|
|||
![]() |
|
Madis731 29 Mar 2012, 20:26
I guess we'll be tracking quadrants really fast in 2013
![]() Of course with AVX2 a lot of algorithms should be reviewed. |
|||
![]() |
|
tripledot 30 Mar 2012, 18:03
@edfed:
Right with you. With all the current hype surrounding sparse octree voxel engines, this could be interesting. I don't have much faith in VPGATHER, but I have faith in your genius. Fancy working this out for 3d octants? ![]() |
|||
![]() |
|
tripledot 18 Apr 2012, 08:04
Never mind...
Code: mov eax, [esp+4] ; x mov ebx, [esp+8] ; y mov ecx, [esp+12] ; z shr eax, 31 shr ebx, 31 shr ecx, 31 xor eax, ebx xor ebx, ecx xor eax, ecx shl ecx, 2 lea eax, [eax+ebx*2+1] add eax, ecx Easy to SIMD, and probably still better than thrashing the cache. |
|||
![]() |
|
tripledot 18 Apr 2012, 10:00
Erm, regarding the two vectorised examples posted by Madis731 and r22, have you forgotten to increment your pointers inside the loop?
|
|||
![]() |
|
Madis731 18 Apr 2012, 17:54
@tripledot: Yes, you are correct! Will fix mine.
|
|||
![]() |
|
aq83326 29 Apr 2012, 05:41
first pass at changing quad code to octant, have done no math or checking
Code: Octant: MOV eax, [esp+4] ;; x MOV ecx, [esp+8] ;; y MOV ebx, [esp+12] ;; z SHR eax, 31 SHR ecx, 31 SHR ebx, 31 XOR eax, ecx LEA eax, [eax + ecx * 2 + 1] SHL ebx RET 12 |
|||
![]() |
|
tripledot 29 Apr 2012, 19:12
@aq83326: See my earlier post for the octant version. Your code fails; you haven't taken the z-coordinate into account - for octants (1,2,3,4,5,6,7,8...) your code returns (1,2,3,4,4,3,2,1).
|
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.