flat assembler
Message board for the users of flat assembler.

 Index > Main > Compute 10^n/10 using only 1 register with 2 instructions Goto page Previous  1, 2, 3
Author
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17668
Location: In your JS exploiting you and your system
revolution
Tomasz Grysztar wrote:
What is especially nice about truncated 2-adic arithmetics is that you can use it to perform several operations on rational numbers, and as long as you know that the final result is going to be an integer in 32-bit range, you know that result is going to be precise. You need such additional knowledge because just by looking at the lowest 32 bits of 2-adic result you are not able to tell an integer from a fraction.
I understand the 2-adic thing much better now. Thanks for the explanations.

I find them to be very interesting.

I also wonder if it could be "faster" (hehe, whatever that means) when computing a division to make the divide loop very efficient for only a fixed numerator of 1 and when the loop is done multiply by the final numerator. Pseudo code
Code:
 mov B,divisor
compute A = 1/B ;efficient single purpose loop
return A * numerator ;using imul    
29 Jan 2019, 09:49
Furs

Joined: 04 Mar 2016
Posts: 1520
Furs
Amazing thread, should be made a sticky, I don't want it lost. Maybe in some useful tricks section or so

Maybe we should have a thread with such code tricks, possibly in more languages (I have in constexpr C++ also, computing the constant itself fully at compile-time, zero overhead there). Then we can link this thread and others from there so it's not lost, and only that thread be made sticky.

Tomasz Grysztar wrote:
Well, except that B needs to be odd, but if it's even you can start with shifting both A and B right in parallel until B becomes odd
What if A is 1? Shifting it right would make it zero, so it's clearly wrong.
29 Jan 2019, 12:40
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
Furs wrote:
What if A is 1? Shifting it right would make it zero, so it's clearly wrong.
Yes, this is the "underflow" condition in my function, which is treated as invalid input (sets CF to 1). We lose any bits that go too far right. But this can be handled in general by shifting the entire working space left by a sufficient number of digits (p-adic numbers always extend only finitely to the right, so you should be able to find some upper bound for your set of calculations). Then it becomes similar to fixed-point arithmetics.
29 Jan 2019, 13:06
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
For the purpose of playing with 2-adic numbers using fasmg I made this macro that implements the same algorithm for division:
Code:
struc div2adic? A, B, BITS:64
.a = A
.b = B
while .b and 1 = 0
assert .a and 1 = 0
.b = .b shr 1
.a = .a shr 1
end while
. = 0
.product = 0
while .product xor .a
.index = bsf (.product xor .a)
if .index >= BITS
break
end if
. = . or 1 shl .index
.product = .product + .b shl .index
end while
end struc    
Use it like this:
Code:
i3 div2adic 1,3
dq i3 ; 0AAAAAAAAAAAAAAABh    
Note that it can compute an arbitrary number of bits:
Code:
i7 div2adic 1,7, 512
ddqq i7 ; 0B6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB6DB7h    
15 Oct 2019, 11:10
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
There is one more thing worth mentioning: even though 2-adic reciprocals are only useful for division when you know that the number is divisible exactly, it is also quite easy to verify that it was the case (so you can have error handling).

For example, if we compute 32 lowest digits of A/B (let's call that 32-bit value Q), then if we multiply it by B, we get 32 lowest digits of (A/B)*B = A. The only problem is that when A was not exactly divisible by B, then Q*B has some additional digits (above the 32) and it just happens to have the same low 32 digits as A.

Therefore to test whether the number was exactly divisible, it is enough to check whether Q*B overflows out of 32 bits. For unsigned case this can be tested by comparing Q with (1 shl 32)/B.

For example, to divide an unsigned number by 7 and signal an error when the number was not exactly divisible:
Code:
        imul    eax,0B6DB6DB7h
cmp     eax,(1 shl 32)/7
ja      not_divisible    
For signed case we would have two boundaries to check (we need to ensure that Q*B has high bits all the same as the sign bit of A).

Last edited by Tomasz Grysztar on 08 Nov 2019, 06:34; edited 1 time in total
07 Nov 2019, 19:39
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17668
Location: In your JS exploiting you and your system
revolution
Yes, makes sense.

But it fails for 0xfffffffc (0x24924924 * 7)

0xB6DB6DB7 * 0xfffffffc = 0x(...B6DB6DB4)24924924
(1 shl 32) / 7 = 0x24924924(.6DB6DB6DB...)

So the JAE test is taken.
08 Nov 2019, 05:49
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
Oh, right, this should be JA there. The divisor must be odd, so (1 shl 32)/B is never going to be exact (and it is rounded down). So when the numbers are equal, there is still no overflow.

In other words the condition for no overflow is Q <= (1 shl 32)/B < .
08 Nov 2019, 06:21
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
I have also made a two part video that teaches the basics of applying 2-adic numbers to assembly programming:

2-adic numbers for x86 programmers, part 1
2-aidc numbers for x86 programmers, part 2

Perhaps this topic might be more understandable to beginners when presented such way.
24 Nov 2019, 17:52
Furs

Joined: 04 Mar 2016
Posts: 1520
Furs
Thanks so much for those videos, I finally understood how 2-adic numbers work now, and your multiplication/division was intuitive (second part).

They rely on "infinite" being special, as in (1+2+4+8...) equals 2*(1+2+4+...) even if second one has one "less" term, because infinity works that way.
26 Nov 2019, 14:29
Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Tomasz Grysztar
Furs wrote:
They rely on "infinite" being special, as in (1+2+4+8...) equals 2*(1+2+4+...) even if second one has one "less" term, because infinity works that way.
You can still view it as just a potential infinity, not an actual one. You could say that when performing arithmetic operations on 2-adic numbers you simply keep computing (like adding with carry) until you reach such high powers of two that they are no longer relevant. After all, in reality you always have only a finite amount of bits to store the number, so there always is a power of two large enough that it no longer makes any difference to you.

So, for example, when you consider 1+2+4+8+...+P and 1+2*(1+2+4+8+...+P), the difference between them is a high power two, and you can take P as large as you want. So you can simply choose P large enough that it is no longer relevant to you.

The 2-adic metric formalizes this by saying that when you multiply something by a power of two, you move it closer to zero. The measure of closeness to zero is how many 0 bits you have at the bottom of your number - the more there are, the closer to zero the number is (as zero is the number that has indefinitely many of them). And in general, any two numbers are closer to each other the more identical bits they have on the low end. Once you have this metric well-defined, you can use limits, just as you do with real numbers (which use classic euclidean metric). But you can also have a more practical view of it: the closer 2-adic numbers are to each other, the more bits of storage you need to be able to tell a difference.
26 Nov 2019, 15:20
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17668
Location: In your JS exploiting you and your system
revolution
Just now the thought occurs to me that a 2-adic reciprocal is exactly the same as the inverse modulo 2^bit_length.

Since the reciprocal-of-x multiplied by x must be 1 mod 2^bit_length.

Therefore: 2-adic 1/x == x^(-1) mod 1^bit_length.

If this is already mentioned, or alluded to, above then I totally missed that point. Sorry.
10 Feb 2020, 20:07
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17668
Location: In your JS exploiting you and your system
revolution
So based upon the above, this should also work:
Code:
macro div_2 dest, numerator, denominator {
;compute dest = numerator / denominator
mov     dest,denominator
repeat 32 - 3
imul    dest,dest
imul    dest,denominator
end repeat
imul    dest,numerator
}

; compute (1/13+8/17)*(210/11+1) = 11
div_2   eax,   1, 13
div_2   ebx,   8, 17
div_2   ecx, 210, 11
inc     ecx
imul    eax,ecx
cmp     eax,11
jnz     .failed    
I unrolled the loop because I was lazy to put a label. But with the pure straight line code it might even perform batter than the bit-by-bit version in some circumstances.
11 Feb 2020, 13:40
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 17668
Location: In your JS exploiting you and your system
revolution
Since each iteration of the imul pairs produces one more valid bit of output then we can use shorter registers for the first stage. This could be a win for 64-bit CPUs if the multiplier has a faster path for 32-bit operations.
Code:
macro div_2 dest, numerator, denominator {
;compute dest = numerator / denominator
mov     dest,denominator
repeat 16 - 3
imul    dest,dest
imul    dest,denominator
end repeat
repeat 32 - 16
imul    e#dest,e#dest
imul    e#dest,denominator
end repeat
imul    e#dest,numerator
}

main:

; compute (1/13+8/17)*(210/11+1) = 11
div_2   ax,   1, 13
div_2   bx,   8, 17
div_2   cx, 210, 11
inc     ecx
imul    eax,ecx
cmp     eax,11
jnz     .failed    
The first mov gives 3-bits. Then the next 13 iterations brings that to a valid 16-bit value. And finally the next 16 iterations brings it to 32-bit. And if it was then extended to use 64-bit registers we can do another 32 iterations to give a 64-bit result. Although if the denominator is larger than the register size then we have to use the larger registers from the start.

Also the initial seed value isn't important. Using the denominator gives us three bits. Any other odd value will also work fine but it will need two more iterations since only the first bit would be guaranteed valid.
12 Feb 2020, 10:39
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area
Goto page Previous  1, 2, 3

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum