flat assembler
Message board for the users of flat assembler.
Index
> Main > Compute 10^n/10 using only 1 register with 2 instructions Goto page Previous 1, 2, 3 
Author 

Furs 29 Jan 2019, 12:40
Amazing thread, should be made a sticky, I don't want it lost. Maybe in some useful tricks section or so
Maybe we should have a thread with such code tricks, possibly in more languages (I have in constexpr C++ also, computing the constant itself fully at compiletime, zero overhead there). Then we can link this thread and others from there so it's not lost, and only that thread be made sticky. Tomasz Grysztar wrote: Well, except that B needs to be odd, but if it's even you can start with shifting both A and B right in parallel until B becomes odd 

29 Jan 2019, 12:40 

Tomasz Grysztar 29 Jan 2019, 13:06
Furs wrote: What if A is 1? Shifting it right would make it zero, so it's clearly wrong. 

29 Jan 2019, 13:06 

Tomasz Grysztar 15 Oct 2019, 11:10
For the purpose of playing with 2adic numbers using fasmg I made this macro that implements the same algorithm for division:
Code: struc div2adic? A, B, BITS:64 .a = A .b = B while .b and 1 = 0 assert .a and 1 = 0 .b = .b shr 1 .a = .a shr 1 end while . = 0 .product = 0 while .product xor .a .index = bsf (.product xor .a) if .index >= BITS break end if . = . or 1 shl .index .product = .product + .b shl .index end while end struc Code: i3 div2adic 1,3 dq i3 ; 0AAAAAAAAAAAAAAABh Code: i7 div2adic 1,7, 512 ddqq i7 ; 0B6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB7h 

15 Oct 2019, 11:10 

Tomasz Grysztar 07 Nov 2019, 19:39
There is one more thing worth mentioning: even though 2adic reciprocals are only useful for division when you know that the number is divisible exactly, it is also quite easy to verify that it was the case (so you can have error handling).
For example, if we compute 32 lowest digits of A/B (let's call that 32bit value Q), then if we multiply it by B, we get 32 lowest digits of (A/B)*B = A. The only problem is that when A was not exactly divisible by B, then Q*B has some additional digits (above the 32) and it just happens to have the same low 32 digits as A. Therefore to test whether the number was exactly divisible, it is enough to check whether Q*B overflows out of 32 bits. For unsigned case this can be tested by comparing Q with (1 shl 32)/B. For example, to divide an unsigned number by 7 and signal an error when the number was not exactly divisible: Code: imul eax,0B6DB6DB7h cmp eax,(1 shl 32)/7 ja not_divisible Last edited by Tomasz Grysztar on 08 Nov 2019, 06:34; edited 1 time in total 

07 Nov 2019, 19:39 

revolution 08 Nov 2019, 05:49
Yes, makes sense.
But it fails for 0xfffffffc (0x24924924 * 7) 0xB6DB6DB7 * 0xfffffffc = 0x(...B6DB6DB4)24924924 (1 shl 32) / 7 = 0x24924924(.6DB6DB6DB...) So the JAE test is taken. 

08 Nov 2019, 05:49 

Tomasz Grysztar 08 Nov 2019, 06:21
Oh, right, this should be JA there. The divisor must be odd, so (1 shl 32)/B is never going to be exact (and it is rounded down). So when the numbers are equal, there is still no overflow.
In other words the condition for no overflow is Q <= (1 shl 32)/B < . 

08 Nov 2019, 06:21 

Tomasz Grysztar 24 Nov 2019, 17:52
I have also made a two part video that teaches the basics of applying 2adic numbers to assembly programming:
2adic numbers for x86 programmers, part 1 2aidc numbers for x86 programmers, part 2 Perhaps this topic might be more understandable to beginners when presented such way. 

24 Nov 2019, 17:52 

Furs 26 Nov 2019, 14:29
Thanks so much for those videos, I finally understood how 2adic numbers work now, and your multiplication/division was intuitive (second part).
They rely on "infinite" being special, as in (1+2+4+8...) equals 2*(1+2+4+...) even if second one has one "less" term, because infinity works that way. 

26 Nov 2019, 14:29 

Tomasz Grysztar 26 Nov 2019, 15:20
Furs wrote: They rely on "infinite" being special, as in (1+2+4+8...) equals 2*(1+2+4+...) even if second one has one "less" term, because infinity works that way. So, for example, when you consider 1+2+4+8+...+P and 1+2*(1+2+4+8+...+P), the difference between them is a high power two, and you can take P as large as you want. So you can simply choose P large enough that it is no longer relevant to you. The 2adic metric formalizes this by saying that when you multiply something by a power of two, you move it closer to zero. The measure of closeness to zero is how many 0 bits you have at the bottom of your number  the more there are, the closer to zero the number is (as zero is the number that has indefinitely many of them). And in general, any two numbers are closer to each other the more identical bits they have on the low end. Once you have this metric welldefined, you can use limits, just as you do with real numbers (which use classic euclidean metric). But you can also have a more practical view of it: the closer 2adic numbers are to each other, the more bits of storage you need to be able to tell a difference. 

26 Nov 2019, 15:20 

revolution 10 Feb 2020, 20:07
Just now the thought occurs to me that a 2adic reciprocal is exactly the same as the inverse modulo 2^bit_length.
Since the reciprocalofx multiplied by x must be 1 mod 2^bit_length. Therefore: 2adic 1/x == x^(1) mod 1^bit_length. If this is already mentioned, or alluded to, above then I totally missed that point. Sorry. 

10 Feb 2020, 20:07 

revolution 11 Feb 2020, 13:40
So based upon the above, this should also work:
Code: macro div_2 dest, numerator, denominator { ;compute dest = numerator / denominator mov dest,denominator repeat 32  3 imul dest,dest imul dest,denominator end repeat imul dest,numerator } ; compute (1/13+8/17)*(210/11+1) = 11 div_2 eax, 1, 13 div_2 ebx, 8, 17 div_2 ecx, 210, 11 add eax,ebx inc ecx imul eax,ecx cmp eax,11 jnz .failed 

11 Feb 2020, 13:40 

revolution 12 Feb 2020, 10:39
Since each iteration of the imul pairs produces one more valid bit of output then we can use shorter registers for the first stage. This could be a win for 64bit CPUs if the multiplier has a faster path for 32bit operations.
Code: macro div_2 dest, numerator, denominator { ;compute dest = numerator / denominator mov dest,denominator repeat 16  3 imul dest,dest imul dest,denominator end repeat repeat 32  16 imul e#dest,e#dest imul e#dest,denominator end repeat imul e#dest,numerator } main: ; compute (1/13+8/17)*(210/11+1) = 11 div_2 ax, 1, 13 div_2 bx, 8, 17 div_2 cx, 210, 11 add eax,ebx inc ecx imul eax,ecx cmp eax,11 jnz .failed Also the initial seed value isn't important. Using the denominator gives us three bits. Any other odd value will also work fine but it will need two more iterations since only the first bit would be guaranteed valid. 

12 Feb 2020, 10:39 

Goto page Previous 1, 2, 3 < Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.